Commit

Merge branch 'main' into improve-experimental-resize-performance
banasraf authored Oct 29, 2024
2 parents 44b9ca2 + bf7a0a5 commit 5e38b78
Showing 241 changed files with 2,355 additions and 1,410 deletions.
34 changes: 34 additions & 0 deletions Acknowledgements.txt
@@ -4410,3 +4410,37 @@ products or services of Licensee, or any third party.
8. By copying, installing or otherwise using Python, Licensee
agrees to be bound by the terms and conditions of this License
Agreement.

==============================================================================
str2bool


BSD 3-Clause License

Copyright (c) 2017, SymonSoft
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
2 changes: 1 addition & 1 deletion DALI_DEPS_VERSION
@@ -1 +1 @@
6d93550b1340c2010fc356b1e16ab6e4dfdc27c0
a72649c13fc6282960976760e4b88b1d315d3528
9 changes: 6 additions & 3 deletions README.rst
@@ -114,18 +114,19 @@ DALI success stories:
---------------------

- `During Kaggle computer vision competitions <https://www.kaggle.com/code/theoviel/rsna-breast-baseline-faster-inference-with-dali>`__:
`"*DALI is one of the best things I have learned in this competition*" <https://www.kaggle.com/competitions/rsna-breast-cancer-detection/discussion/391059>`__
`"DALI is one of the best things I have learned in this competition" <https://www.kaggle.com/competitions/rsna-breast-cancer-detection/discussion/391059>`__
- `Lightning Pose - state of the art pose estimation research model <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10168383/>`__
- `To improve the resource utilization in Advanced Computing Infrastructure <https://arcwiki.rs.gsu.edu/en/dali/using_nvidia_dali_loader>`__
- `MLPerf - the industry standard for benchmarking compute and deep learning hardware and software <https://developer.nvidia.com/blog/mlperf-hpc-v1-0-deep-dive-into-optimizations-leading-to-record-setting-nvidia-performance/>`__
- `"we optimized major models inside eBay with the DALI framework" <https://www.nvidia.com/en-us/on-demand/session/gtc24-s62578/>`__

----

DALI Roadmap
------------

`The following issue represents <https://github.com/NVIDIA/DALI/issues/4578>`__ a high-level overview of our 2023 plan. You should be aware that this
roadmap may change at any time and the order below does not reflect any type of priority.
`The following issue represents <https://github.com/NVIDIA/DALI/issues/5320>`__ a high-level overview of our 2024 plan. You should be aware that this
roadmap may change at any time and the order of its items does not reflect any type of priority.

We strongly encourage you to comment on our roadmap and provide us feedback on the mentioned
GitHub issue.
@@ -177,6 +178,8 @@ depending on your version.
Additional Resources
--------------------

- GPU Technology Conference 2024; **Optimizing Inference Model Serving for Highest Performance at eBay**; Yiheng Wang:
`event <https://www.nvidia.com/en-us/on-demand/session/gtc24-s62578/>`__
- GPU Technology Conference 2023; **Developer Breakout: Accelerating Enterprise Workflows With Triton Server and DALI**; Brandon Tuttle:
`event <https://www.nvidia.com/en-us/on-demand/session/gtcspring23-se52140/>`__.
- GPU Technology Conference 2023; **GPU-Accelerating End-to-End Geospatial Workflows**; Kevin Green:
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
1.43.0dev
1.44.0dev
3 changes: 1 addition & 2 deletions cmake/lint.cmake
@@ -67,10 +67,9 @@ add_custom_target(lint-python-flake
COMMENT
"Performing Python linter check"
)
add_dependencies(lint-python-flake lint-python-black)

add_custom_target(lint-python)
add_dependencies(lint-python lint-python-flake lint-python-bandit)
add_dependencies(lint-python lint-python-black lint-python-flake lint-python-bandit)

add_custom_target(lint)
add_dependencies(lint lint-cpp lint-python)
5 changes: 5 additions & 0 deletions conda/dali_native_libs/recipe/meta.yaml
@@ -97,6 +97,11 @@ requirements:
# Since we link statically, we need to add those dependencies explicitly
- libwebp-base
- openjpeg
# libprotobuf-static we link statically depends on libabseil so add libprotobuf here as a runtime
# dependency to install the right version on the libabseil (as protobuf depends on
# libprotobuf-static and a newer version of libprotobuf-static may be available than
# the protobuf was build with)
- libprotobuf =5.27.4
- cfitsio
- nvidia-nvimagecodec-cuda{{ environ.get('CUDA_VERSION', '') | replace(".","") }}

7 changes: 7 additions & 0 deletions conda/dali_python_bindings/recipe/meta.yaml
@@ -81,14 +81,21 @@ requirements:
- astunparse >=1.6.0
- gast >=0.3.3
- dm-tree >=0.1.8
- packaging
- nvidia-dali-core{% if environ.get('NVIDIA_DALI_BUILD_FLAVOR', '')|length %}{{"-" + environ.get('NVIDIA_DALI_BUILD_FLAVOR', '')}}{% endif %}-cuda{{ environ.get('CUDA_VERSION', '') | replace(".","") }} ={{ environ.get('DALI_CONDA_BUILD_VERSION', '') }}
- nvidia-nvimagecodec-cuda{{ environ.get('CUDA_VERSION', '') | replace(".","") }}
run:
- python
# libprotobuf-static we link statically depends on libabseil so add libprotobuf here as a runtime
# dependency to install the right version on the libabseil (as protobuf depends on
# libprotobuf-static and a newer version of libprotobuf-static may be available than
# the protobuf was build with)
- libprotobuf =5.27.4
- future
- astunparse >=1.6.0
- gast >=0.3.3
- dm-tree >=0.1.8
- packaging
- nvidia-dali-core{% if environ.get('NVIDIA_DALI_BUILD_FLAVOR', '')|length %}{{"-" + environ.get('NVIDIA_DALI_BUILD_FLAVOR', '')}}{% endif %}-cuda{{ environ.get('CUDA_VERSION', '') | replace(".","") }} ={{ environ.get('DALI_CONDA_BUILD_VERSION', '') }}
- nvidia-nvimagecodec-cuda{{ environ.get('CUDA_VERSION', '') | replace(".","") }}
about:
8 changes: 4 additions & 4 deletions conda/third_party/dali_ffmpeg/recipe/meta.yaml
@@ -12,16 +12,16 @@
# See the License for the specific language governing permissions and
# limitations under the License.

{% set build_version = "7.0.2" %}
{% set build_version = "7.1" %}

package:
name: dali-ffmpeg
version: {{ build_version }}

source:
fn: FFmpeg-n7.0.2.tar.gz
url: https://developer.download.nvidia.com/compute/redist/nvidia-dali/FFmpeg-n7.0.2.tar.gz
sha256: 5eb46d18d664a0ccadf7b0adee03bd3b7fa72893d667f36c69e202a807e6d533
fn: FFmpeg-n7.1.tar.gz
url: https://developer.download.nvidia.com/compute/redist/nvidia-dali/FFmpeg-n7.1.tar.gz
sha256: 7ddad2d992bd250a6c56053c26029f7e728bebf0f37f80cf3f8a0e6ec706431a

build:
number: 0
2 changes: 1 addition & 1 deletion conda/third_party/jpeg_turbo/recipe/meta.yaml
@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

{% set build_version = "3.0.3" %}
{% set build_version = "3.0.90" %}

package:
name: jpeg-turbo
3 changes: 1 addition & 2 deletions dali/benchmark/operator_bench.h
@@ -29,8 +29,7 @@ class OperatorBench : public DALIBenchmark {
template <typename OutputContainer, typename OperatorPtr, typename Workspace>
void Setup(OperatorPtr &op_ptr, const OpSpec &spec, Workspace &ws, int batch_size) {
std::vector<OutputDesc> outputs;
bool can_infer_outs = op_ptr->CanInferOutputs();
if (op_ptr->Setup(outputs, ws) && can_infer_outs) {
if (op_ptr->Setup(outputs, ws)) {
int num_out = outputs.size();
for (int i = 0; i < num_out; i++) {
auto data_out = std::make_shared<OutputContainer>(batch_size);
51 changes: 50 additions & 1 deletion dali/core/cuda_event_pool_test.cc
@@ -1,4 +1,4 @@
// Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
// Copyright (c) 2020, 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
@@ -20,6 +20,7 @@
#include "dali/core/cuda_error.h"
#include "dali/core/cuda_event_pool.h"
#include "dali/core/cuda_stream.h"
#include "dali/core/cuda_shared_event.h"

namespace dali {
namespace test {
@@ -58,5 +59,53 @@ TEST(EventPoolTest, PutGet) {
t.join();
}

TEST(CUDASharedEventTest, RefCounting) {
int devices = 0;
(void)cudaGetDeviceCount(&devices);
if (devices == 0) {
(void)cudaGetLastError(); // No CUDA devices - we don't care about the error
GTEST_SKIP();
}

CUDASharedEvent ev1 = CUDASharedEvent::GetFromPool();
CUDASharedEvent ev2 = CUDASharedEvent::GetFromPool();
ASSERT_EQ(ev1, ev1.get()) << "Sanity check failed - object not equal to itself.";
ASSERT_NE(ev1.get(), nullptr) << "Sanity check failed - returned null instead of throwing.";
ASSERT_NE(ev2.get(), nullptr) << "Sanity check failed - returned null instead of throwing.";
ASSERT_NE(ev1, nullptr) << "Sanity check failed - comparison to null broken.";
ASSERT_NE(ev1, ev2) << "Sanity check failed - returned the same object twice.";

EXPECT_EQ(ev1.use_count(), 1);
EXPECT_EQ(ev2.use_count(), 1);
CUDASharedEvent ev3 = ev1;
EXPECT_EQ(ev1, ev3);
EXPECT_EQ(ev1.use_count(), 2);
EXPECT_EQ(ev3.use_count(), 2);
ev1.reset();
EXPECT_EQ(ev1.use_count(), 0);
EXPECT_EQ(ev3.use_count(), 1);
}

TEST(CUDASharedEventTest, ReturnToPool) {
int devices = 0;
(void)cudaGetDeviceCount(&devices);
if (devices == 0) {
(void)cudaGetLastError(); // No CUDA devices - we don't care about the error
GTEST_SKIP();
}

CUDAEventPool pool;

CUDASharedEvent ev1 = CUDASharedEvent::GetFromPool(pool);
EXPECT_NE(ev1, nullptr);
cudaEvent_t orig = ev1.get();
ev1.reset();
EXPECT_EQ(ev1, nullptr);
CUDASharedEvent ev2 = CUDASharedEvent::GetFromPool(pool);
EXPECT_EQ(ev2.get(), orig) << "Should have got the sole event from the pool";
ev1 = CUDASharedEvent::GetFromPool(pool);
EXPECT_NE(ev1, ev2);
}

} // namespace test
} // namespace dali
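
Reviewer note: the two `CUDASharedEventTest` cases above check reference counting and that a released event goes back to its pool. The sketch below illustrates the same two properties without CUDA, using `std::shared_ptr` with a custom deleter. It is an assumption-laden illustration only: `FakeEvent`, `FakeEventPool`, `SharedFakeEvent`, the free function `GetFromPool` and `ExampleRoundTrip` are hypothetical names, and nothing here claims to reflect how `CUDASharedEvent` or `CUDAEventPool` are implemented in DALI.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a CUDA event handle; not a DALI type.
struct FakeEvent {
  int id;
};

// Hypothetical pool: hands out events and accepts them back for reuse.
class FakeEventPool {
 public:
  FakeEventPool() = default;
  FakeEventPool(const FakeEventPool &) = delete;
  FakeEventPool &operator=(const FakeEventPool &) = delete;

  ~FakeEventPool() {
    for (FakeEvent *e : free_) delete e;
  }

  // Reuse the most recently returned event, or create a new one.
  FakeEvent *Get() {
    if (free_.empty())
      return new FakeEvent{next_id_++};
    FakeEvent *e = free_.back();
    free_.pop_back();
    return e;
  }

  void Put(FakeEvent *e) { free_.push_back(e); }

 private:
  std::vector<FakeEvent *> free_;
  int next_id_ = 0;
};

// Shared handle; the deleter returns the event to the pool instead of freeing it.
// Assumption: the pool outlives every handle created from it.
using SharedFakeEvent = std::shared_ptr<FakeEvent>;

inline SharedFakeEvent GetFromPool(FakeEventPool &pool) {
  return SharedFakeEvent(pool.Get(), [&pool](FakeEvent *e) { pool.Put(e); });
}

// Mirrors the shape of the RefCounting and ReturnToPool tests on the stand-in types.
inline void ExampleRoundTrip() {
  FakeEventPool pool;
  SharedFakeEvent ev1 = GetFromPool(pool);
  SharedFakeEvent ev3 = ev1;           // second owner of the same event
  assert(ev1.use_count() == 2);
  FakeEvent *orig = ev1.get();
  ev1.reset();                         // drop one owner
  assert(ev1.use_count() == 0);
  assert(ev3.use_count() == 1);
  ev3.reset();                         // last owner gone: event returns to the pool
  SharedFakeEvent ev2 = GetFromPool(pool);
  assert(ev2.get() == orig);           // the sole pooled event is handed out again
}
```

In this sketch the deleter captures the pool by reference, so a handle must not outlive its pool; a production wrapper would need its own answer to that lifetime question.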
46 changes: 46 additions & 0 deletions dali/core/exec/tasking_test.cc
@@ -336,6 +336,52 @@ TEST(TaskingTest, MultiOutputTuple) {
EXPECT_EQ(ret, 1 + 3 + 42 + 5 + 10);
}

TEST(TaskingTest, ZeroResults) {
Executor ex(4);
ex.Start();
auto producer1 = Task::Create(0, []() {
return std::tuple<>();
});
auto producer2 = Task::Create(0, []() {
return std::vector<std::any>();
});

auto consumer = Task::Create([]() { });
consumer->Succeed(producer1);
consumer->Succeed(producer2);

ex.AddSilentTask(producer1);
ex.AddSilentTask(producer2);
auto fut = ex.AddTask(consumer);
EXPECT_NO_THROW(fut.Value<void>());
}

TEST(TaskingTest, ZeroResultsThrow) {
Executor ex(4);
ex.Start();
auto producer1 = Task::Create(0, []() {
return std::tuple<>();
});
class MyError {};
auto producer2 = Task::Create(0, []() {
throw MyError();
return std::tuple<>();
});

auto consumer = Task::Create([](Task *t) {
t->GetInputValue<void>(0);
t->GetInputValue<void>(1);
});
consumer->Subscribe(producer1);
consumer->Subscribe(producer2);

ex.AddSilentTask(producer1);
ex.AddSilentTask(producer2);
auto fut = ex.AddTask(consumer);
EXPECT_THROW(fut.Value<void>(), MyError);
}


namespace {

template <typename T>
1 change: 1 addition & 0 deletions dali/kernels/erase/erase_gpu_test.cu
@@ -17,6 +17,7 @@
#include <complex>
#include <tuple>
#include <vector>
#include <iomanip>

#include "dali/kernels/common/utils.h"
#include "dali/kernels/erase/erase_gpu.h"
1 change: 0 additions & 1 deletion dali/kernels/slice/slice_flip_normalize_permute_pad_cpu.h
@@ -26,7 +26,6 @@
#include "dali/kernels/kernel.h"
#include "dali/kernels/slice/slice_flip_normalize_permute_pad_common.h"
#include "dali/kernels/slice/slice_kernel_utils.h"
#include "dali/util/half.hpp"

namespace dali {
namespace kernels {
1 change: 0 additions & 1 deletion dali/operators/audio/mel_scale/mel_filter_bank.h
@@ -69,7 +69,6 @@ class MelFilterBank : public StatelessOperator<Backend> {
}

protected:
bool CanInferOutputs() const override { return true; }
bool SetupImpl(std::vector<OutputDesc> &output_desc, const Workspace &ws) override;
void RunImpl(Workspace &ws) override;

1 change: 0 additions & 1 deletion dali/operators/audio/mfcc/mfcc.h
@@ -79,7 +79,6 @@ class MFCC : public StatelessOperator<Backend> {
: StatelessOperator<Backend>(spec) {}

protected:
bool CanInferOutputs() const override { return true; }
bool SetupImpl(std::vector<OutputDesc> &output_desc, const Workspace &ws) override;
void RunImpl(Workspace &ws) override;

4 changes: 0 additions & 4 deletions dali/operators/audio/nonsilence_op.h
@@ -142,10 +142,6 @@ class NonsilenceOperator : public StatelessOperator<Backend> {
StatelessOperator<Backend>(spec) {}


bool CanInferOutputs() const override {
return true;
}

bool SetupImpl(std::vector<OutputDesc> &output_desc, const Workspace &ws) override {
AcquireArgs(spec_, ws);
TensorShape<> scalar_shape = {};
4 changes: 0 additions & 4 deletions dali/operators/audio/preemphasis_filter_op.h
@@ -62,10 +62,6 @@ class PreemphasisFilter : public StatelessOperator<Backend> {
~PreemphasisFilter() override = default;
DISABLE_COPY_MOVE_ASSIGN(PreemphasisFilter);

bool CanInferOutputs() const override {
return true;
}

bool SetupImpl(std::vector<::dali::OutputDesc> &output_desc,
const Workspace &ws) override {
const auto &input = ws.Input<Backend>(0);
4 changes: 0 additions & 4 deletions dali/operators/audio/resample.h
@@ -50,10 +50,6 @@ class ResampleBase : public StatelessOperator<Backend> {
}
}

bool CanInferOutputs() const override {
return true;
}

bool SetupImpl(std::vector<OutputDesc> &outputs, const Workspace &ws) override {
outputs.resize(1);
if (dtype_ == DALI_NO_TYPE)
4 changes: 0 additions & 4 deletions dali/operators/bbox/bb_flip.h
@@ -38,10 +38,6 @@ class BbFlip : public StatelessOperator<Backend> {
DISABLE_COPY_MOVE_ASSIGN(BbFlip);

protected:
bool CanInferOutputs() const override {
return true;
}

bool SetupImpl(std::vector<OutputDesc> &output_descs, const Workspace &ws) override {
const auto &input = ws.Input<Backend>(0);
DALI_ENFORCE(input.type() == DALI_FLOAT, "Bounding box in wrong format");
4 changes: 4 additions & 0 deletions dali/operators/bbox/bbox_paste.h
@@ -36,6 +36,10 @@ class BBoxPaste : public StatelessOperator<Backend> {
protected:
bool use_ltrb_ = false;

bool HasContiguousOutputs() const override {
return false;
}

bool SetupImpl(std::vector<OutputDesc> &output_desc, const Workspace &ws) override {
return false;
}
4 changes: 4 additions & 0 deletions dali/operators/debug/dump_image.h
@@ -38,6 +38,10 @@ class DumpImage : public StatelessOperator<Backend> {
inline ~DumpImage() override = default;

protected:
bool HasContiguousOutputs() const override {
return false;
}

bool SetupImpl(std::vector<OutputDesc> &output_desc, const Workspace &ws) override {
return false;
}
4 changes: 0 additions & 4 deletions dali/operators/decoder/audio/audio_decoder_op.h
@@ -59,10 +59,6 @@ class AudioDecoderCpu : public StatelessOperator<CPUBackend> {
void RunImpl(Workspace &ws) override;


bool CanInferOutputs() const override {
return true;
}


private:
template<typename OutputType>