Merge pull request #2704 from ROCm/develop-upstream-sync-241001
Develop upstream sync 241001
mmakevic-amd authored Oct 13, 2024
2 parents 1537a0c + cece9eb commit 7ae1692
Showing 2,047 changed files with 53,657 additions and 29,083 deletions.
6 changes: 4 additions & 2 deletions .bazelrc
@@ -155,6 +155,8 @@ build:android_x86_64 --fat_apk_cpu=x86_64
# Build everything statically for Android since all static libs are later
# bundled together into a single .so for deployment.
build:android --dynamic_mode=off
# TODO(belitskiy): Remove once on Clang 20.
build:android --define=xnn_enable_avxvnniint8=false

# Sets the default Apple platform to macOS.
build:macos --apple_platform_type=macos
@@ -245,6 +247,8 @@ build:cuda_clang --copt=-Qunused-arguments
# major release. Example: sm_80 kernels can run on sm_89 GPUs but
# not on sm_90 GPUs. compute_80 kernels though can also run on sm_90 GPUs.
build:cuda_clang --repo_env=HERMETIC_CUDA_COMPUTE_CAPABILITIES="sm_60,sm_70,sm_80,sm_89,compute_90"
# Permit newer CUDA versions than Clang is aware of
build:cuda_clang --copt="-Wno-unknown-cuda-version"
# Set lld as the linker.
build:cuda_clang --host_linkopt="-fuse-ld=lld"
build:cuda_clang --host_linkopt="-lm"
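
The compatibility rule described in the comment above the `HERMETIC_CUDA_COMPUTE_CAPABILITIES` line can be sketched in Python. This is an illustrative model only, not part of the build: `parse_target`, `can_run`, and `usable_targets` are hypothetical helper names, and the sketch assumes that `sm_XY` binaries run only on GPUs with the same major version and an equal or higher minor version, while `compute_XY` (PTX) targets run on any GPU with equal or higher capability. Architecture-specific suffixes such as a trailing letter are not handled.

```python
# Targets taken from the .bazelrc line above.
TARGETS = "sm_60,sm_70,sm_80,sm_89,compute_90".split(",")


def parse_target(target):
    """Split a target such as 'sm_89' or 'compute_90' into (kind, major, minor)."""
    kind, _, digits = target.partition("_")
    if kind not in ("sm", "compute") or len(digits) < 2 or not digits.isdigit():
        raise ValueError(f"unrecognized CUDA target: {target!r}")
    # The last digit is the minor version; everything before it is the major.
    return kind, int(digits[:-1]), int(digits[-1])


def can_run(target, gpu_capability):
    """Return True if code built for `target` can run on a GPU with
    `gpu_capability`, given as a (major, minor) tuple."""
    kind, major, minor = parse_target(target)
    gpu_major, gpu_minor = gpu_capability
    if kind == "sm":
        # Binary kernels: same major release, equal or newer minor (assumed rule).
        return gpu_major == major and gpu_minor >= minor
    # PTX: forward compatible with any equal or newer capability (assumed rule).
    return (gpu_major, gpu_minor) >= (major, minor)


def usable_targets(gpu_capability):
    """List the configured targets that a given GPU could execute."""
    return [t for t in TARGETS if can_run(t, gpu_capability)]


print(usable_targets((8, 9)))  # an sm_89 GPU
print(usable_targets((9, 0)))  # an sm_90 GPU
```

Under these assumptions an sm_89 GPU can use the `sm_80` and `sm_89` kernels, while an sm_90 GPU falls back to the `compute_90` PTX, which is why the list ends with a `compute_` entry rather than another `sm_` one.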
@@ -354,8 +358,6 @@ build:linux --copt="-Werror=unused-result"
# Add switch as an error on Linux.
build:linux --copt="-Wswitch"
build:linux --copt="-Werror=switch"
# Required for building with clang
build:linux --copt="-Wno-error=unused-but-set-variable"
# We have some invalid linker scripts in the build,
# so we need to disable this check
build:linux --linkopt=-Wl,--undefined-version
1 change: 1 addition & 0 deletions .gitignore
@@ -28,6 +28,7 @@ tensorflow/contrib/cmake/_build/
/api_init_files_list.txt
/estimator_api_init_files_list.txt
*.whl
dist

# Android
.gradle
113 changes: 60 additions & 53 deletions RELEASE.md
@@ -1,4 +1,4 @@
# Release 2.18.0
# Release 2.19.0

## TensorFlow

@@ -9,26 +9,6 @@
* <DOCUMENT BREAKING CHANGES HERE>
* <THIS SECTION SHOULD CONTAIN API, ABI AND BEHAVIORAL BREAKING CHANGES>

* `tf.lite`
* C API:
* An optional fourth parameter was added to `TfLiteOperatorCreate` as a step
towards a cleaner API for `TfLiteOperator`. `TfLiteOperatorCreate` was itself
added recently, in TensorFlow Lite version 2.17.0 (released on 7/11/2024), so
we do not expect much code to be using it yet. Any breakage can be resolved by
passing `nullptr` as the new fourth parameter.
* SignatureRunner is now supported for models with no signatures.

* TensorRT support is disabled in CUDA builds for code health improvement.

* Hermetic CUDA support is added.

Hermetic CUDA uses a specific downloadable version of CUDA instead of the
user’s locally installed CUDA. Bazel will download CUDA, CUDNN and NCCL
distributions, and then use CUDA libraries and tools as dependencies in
various Bazel targets. This enables more reproducible builds for Google ML
projects and supported CUDA versions.

### Known Caveats

* <CAVEATS REGARDING THE RELEASE (BUT NOT BREAKING CHANGES).>
@@ -40,44 +20,12 @@
* <INSERT MAJOR FEATURE HERE, USING MARKDOWN SYNTAX>
* <IF RELEASE CONTAINS MULTIPLE FEATURES FROM SAME AREA, GROUP THEM TOGETHER>

* `tf.lite`:
* The LiteRT [repo](https://github.com/google-ai-edge/LiteRT) is
live (see [announcement](https://developers.googleblog.com/en/tensorflow-lite-is-now-litert/)), which means that in the coming months there will be changes to the development experience
for TFLite. The TF Lite Runtime source will be moved later this year,
and sometime after that we will start accepting contributions through that repo.

### Bug Fixes and Other Changes

* <SIMILAR TO ABOVE SECTION, BUT FOR OTHER IMPORTANT CHANGES / BUG FIXES>
* <IF A CHANGE CLOSES A GITHUB ISSUE, IT SHOULD BE DOCUMENTED HERE>
* <NOTES SHOULD BE GROUPED PER AREA>

* `tf.data`
* Add optional `synchronous` argument to `map`, to specify that the `map`
should run synchronously, as opposed to being parallelizable when
`options.experimental_optimization.map_parallelization=True`. This saves
memory compared to setting `num_parallel_calls=1`.
* Add optional `use_unbounded_threadpool` argument to `map`, to specify that
the `map` should use an unbounded threadpool instead of the default pool
that is based on the number of cores on the machine. This can improve
throughput for map functions which perform IO or otherwise release the
CPU.
* Add [`tf.data.experimental.get_model_proto`](https://www.tensorflow.org/api_docs/python/tf/data/experimental/get_model_proto)
to allow users to peek into the analytical model inside of a dataset
iterator.

* `tf.lite`
* `Dequantize` op supports `TensorType_INT4`.
* This change includes per-channel dequantization.
* Add support for `stablehlo.composite`.
* `EmbeddingLookup` op supports per-channel
quantization and `TensorType_INT4` values.
* `FullyConnected` op supports `TensorType_INT16` activation and
`TensorType_INT4` weight per-channel quantization.

* `tf.tensor_scatter_update`, `tf.tensor_scatter_add`, and the variants for other reduce types.
* Support `bad_indices_policy`.

## Keras

<INSERT SMALL BLURB ABOUT RELEASE FOCUS AREA AND POTENTIAL TOOLCHAIN CHANGES>
@@ -110,6 +58,65 @@ This release contains contributions from many people at Google, as well as:

<INSERT>, <NAME>, <HERE>, <USING>, <GITHUB>, <HANDLE>

# Release 2.18.0

## TensorFlow

### Breaking Changes

* `tf.lite`
* C API:
* An optional fourth parameter was added to `TfLiteOperatorCreate` as a step towards a cleaner API for `TfLiteOperator`. `TfLiteOperatorCreate` was itself added recently, in TensorFlow Lite version 2.17.0 (released on 7/11/2024), so we do not expect much code to be using it yet. Any breakage can be resolved by passing `nullptr` as the new fourth parameter.
* SignatureRunner is now supported for models with no signatures.

* TensorRT support is disabled in CUDA builds for code health improvement.

* TensorFlow now supports and is compiled with NumPy 2.0 by default. Please see the [NumPy 2 release notes](https://numpy.org/doc/stable/release/2.0.0-notes.html) and the [NumPy 2 migration guide](https://numpy.org/devdocs/numpy_2_0_migration_guide.html#numpy-2-migration-guide).
* Note that NumPy's type promotion rules have changed (see [NEP 50](https://numpy.org/neps/nep-0050-scalar-promotion.html#nep50) for details). This may change the precision at which computations happen, leading either to type errors or to numerical changes in results.
* TensorFlow will continue to support NumPy 1.26 until 2025, in line with the community-standard deprecation timeline described [here](https://scientific-python.org/specs/spec-0000/).

* Hermetic CUDA support is added.

Hermetic CUDA uses a specific downloadable version of CUDA instead of the user’s locally installed CUDA. Bazel will download CUDA, CUDNN and NCCL distributions, and then use CUDA libraries and tools as dependencies in various Bazel targets. This enables more reproducible builds for Google ML projects and supported CUDA versions.
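
The NumPy 2 type promotion change (NEP 50) noted in the breaking changes above can be observed directly with a few scalar operations. The following is a minimal sketch, assuming NumPy is installed; `scalar_promotion_report` is a hypothetical helper written for this illustration, and the dtypes it reports depend on the installed NumPy major version.

```python
import numpy as np


def scalar_promotion_report():
    """Return the result dtype names of a few operations that mix NumPy
    values with plain Python scalars."""
    return {
        # NumPy scalar combined with a Python int.
        "uint8 + int": (np.uint8(1) + 2).dtype.name,
        # NumPy scalar combined with a Python float.
        "float32 + float": (np.float32(3) + 3.0).dtype.name,
        # Array combined with a Python int: the array dtype wins in both
        # NumPy 1.x and NumPy 2.x.
        "uint8 array + int": (np.array([1], dtype=np.uint8) + 2).dtype.name,
    }


report = scalar_promotion_report()
numpy_major = int(np.__version__.split(".")[0])

for expression, dtype_name in report.items():
    print(f"NumPy {np.__version__}: {expression} -> {dtype_name}")
```

Under NumPy 2 the scalar cases keep the NumPy scalar's dtype (`uint8`, `float32`), whereas NumPy 1.x promotes them to the wider default types. Code that relied on the implicit widening may therefore see lower precision or overflow errors after upgrading.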

### Known Caveats

### Major Features and Improvements

* `tf.lite`:
* The LiteRT [repo](https://github.com/google-ai-edge/LiteRT) is live (see [announcement](https://developers.googleblog.com/en/tensorflow-lite-is-now-litert/)), which means that in the coming months there will be changes to the development experience for TFLite. The TF Lite Runtime source will be moved later this year, and sometime after that we will start accepting contributions through that repo.

### Bug Fixes and Other Changes

* `tf.data`
* Add optional `synchronous` argument to `map`, to specify that the `map` should run synchronously, as opposed to being parallelizable when `options.experimental_optimization.map_parallelization=True`. This saves memory compared to setting `num_parallel_calls=1`.
* Add optional `use_unbounded_threadpool` argument to `map`, to specify that the `map` should use an unbounded threadpool instead of the default pool that is based on the number of cores on the machine. This can improve throughput for map functions which perform IO or otherwise release the CPU.
* Add [`tf.data.experimental.get_model_proto`](https://www.tensorflow.org/api_docs/python/tf/data/experimental/get_model_proto) to allow users to peek into the analytical model inside of a dataset iterator.

* `tf.lite`
* `Dequantize` op supports `TensorType_INT4`.
* This change includes per-channel dequantization.
* Add support for `stablehlo.composite`.
* `EmbeddingLookup` op supports per-channel quantization and `TensorType_INT4` values.
* `FullyConnected` op supports `TensorType_INT16` activation and `TensorType_INT4` weight per-channel quantization.

* `tf.tensor_scatter_update`, `tf.tensor_scatter_add`, and the variants for other reduce types.
* Support `bad_indices_policy`.

## Thanks to our Contributors

This release contains contributions from many people at Google, as well as:

Akhil Goel, akhilgoe, Alexander Pivovarov, Amir Samani, Andrew Goodbody, Andrey Portnoy, Anthony Platanios, bernardoArcari, Brett Taylor, buptzyb, Chao, Christian Clauss, Cocoa, Daniil Kutz, Darya Parygina, dependabot[bot], Dimitris Vardoulakis, Dragan Mladjenovic, Elfie Guo, eukub, Faijul Amin, flyingcat, Frédéric Bastien, ganyu.08, Georg Stefan Schmid, Grigory Reznikov, Harsha H S, Harshit Monish, Heiner, Ilia Sergachev, Jan, Jane Liu, Jaroslav Sevcik, Kaixi Hou, Kanvi Khanna, Kristof Maar, Kristóf Maár, LakshmiKalaKadali, Lbertho-Gpsw, lingzhi98, MarcoFalke, Masahiro Hiramori, Mmakevic-Amd, mraunak, Nobuo Tsukamoto, Notheisz57, Olli Lupton, Pearu Peterson, pemeliya, Peyara Nando, Philipp Hack, Phuong Nguyen, Pol Dellaiera, Rahul Batra, Ruturaj Vaidya, sachinmuradi, Sergey Kozub, Shanbin Ke, Sheng Yang, shengyu, Shraiysh, Shu Wang, Surya, sushreebarsa, Swatheesh-Mcw, syzygial, Tai Ly, terryysun, tilakrayal, Tj Xu, Trevor Morris, Tzung-Han Juang, wenchenvincent, wondertx, Xuefei Jiang, Ye Huang, Yimei Sun, Yunlong Liu, Zahid Iqbal, Zhan Lu, Zoranjovanovic-Ns, Zuri Obozuwa

# Release 2.17.1

### Bug Fixes and Other Changes

* Add necessary header files in the aar library. These are needed if developers build apps with header files unpacked from tflite aar files from maven.
* Implement Name() for GCSWritableFile to fix the profiler trace viewer cache file generation.
* Fix `cstring.h` missing file issue with the Libtensorflow archive.

# Release 2.17.0

## TensorFlow
6 changes: 6 additions & 0 deletions WORKSPACE
@@ -32,6 +32,12 @@ load("@local_xla//third_party/py:python_init_repositories.bzl", "python_init_rep

python_init_repositories(
default_python_version = "system",
local_wheel_dist_folder = "dist",
local_wheel_inclusion_list = [
"tensorflow*",
"tf_nightly*",
],
local_wheel_workspaces = ["//:WORKSPACE"],
requirements = {
"3.9": "//:requirements_lock_3_9.txt",
"3.10": "//:requirements_lock_3_10.txt",
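
To illustrate what the `local_wheel_inclusion_list` patterns in the WORKSPACE hunk above would select from the `dist` folder, here is a minimal Python sketch. It assumes the patterns are shell-style globs matched against wheel filenames; the actual matching logic inside `python_init_repositories` is not shown in this diff, and both `select_local_wheels` and the filenames are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns copied from the WORKSPACE change above.
INCLUSION_LIST = ["tensorflow*", "tf_nightly*"]


def select_local_wheels(filenames, patterns=INCLUSION_LIST):
    """Return the wheel files whose names match any inclusion pattern.

    Assumption: patterns are case-sensitive shell-style globs.
    """
    return [
        name
        for name in filenames
        if name.endswith(".whl")
        and any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


# Hypothetical contents of the local "dist" folder.
dist_contents = [
    "tensorflow-2.19.0-cp310-cp310-linux_x86_64.whl",
    "tf_nightly-2.19.0.dev20241001-cp310-cp310-linux_x86_64.whl",
    "numpy-2.0.2-cp310-cp310-manylinux_2_17_x86_64.whl",
    "notes.txt",
]

for wheel in select_local_wheels(dist_contents):
    print(wheel)
```

With these inputs only the `tensorflow` and `tf_nightly` wheels are selected; unrelated wheels and non-wheel files in `dist` are ignored.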
44 changes: 44 additions & 0 deletions ci/official/containers/ml_build/Dockerfile
@@ -0,0 +1,44 @@
################################################################################
FROM ubuntu:22.04@sha256:58b87898e82351c6cf9cf5b9f3c20257bb9e2dcf33af051e12ce532d7f94e3fe AS devel
################################################################################

# Install devtoolset build dependencies
COPY setup.sources.sh /setup.sources.sh
COPY setup.packages.sh /setup.packages.sh
COPY builder.packages.txt /builder.packages.txt

RUN /setup.sources.sh && /setup.packages.sh /builder.packages.txt

# Install devtoolset-9 in /dt9 with glibc 2.17 and libstdc++ 4.8, for building
# manylinux2014-compatible packages.
COPY builder.devtoolset/fixlinks.sh /fixlinks.sh
COPY builder.devtoolset/rpm-patch.sh /rpm-patch.sh
COPY builder.devtoolset/build_devtoolset.sh /build_devtoolset.sh
COPY builder.devtoolset/glibc2.17-inline.patch /glibc2.17-inline.patch
RUN /build_devtoolset.sh devtoolset-9 /dt9

# Make sure clang is on the path
RUN ln -s /usr/lib/llvm-18/bin/clang /usr/bin/clang

# Install various tools.
# - bats: bash unit testing framework
# - bazelisk: always use the correct bazel version
# - buildifier: clean bazel build deps
# - buildozer: clean bazel build deps
# - gcloud SDK: communicate with Google Cloud Platform (GCP) for RBE, CI
# - patchelf: Utility tool to modify existing ELF executables and libraries
RUN git clone --branch v1.11.0 https://github.com/bats-core/bats-core.git && bats-core/install.sh /usr/local && rm -rf bats-core
RUN wget https://github.com/bazelbuild/bazelisk/releases/download/v1.21.0/bazelisk-linux-amd64 -O /usr/local/bin/bazel && chmod +x /usr/local/bin/bazel
RUN wget https://github.com/bazelbuild/buildtools/releases/download/v7.3.1/buildifier-linux-amd64 -O /usr/local/bin/buildifier && chmod +x /usr/local/bin/buildifier
RUN wget https://github.com/bazelbuild/buildtools/releases/download/v7.3.1/buildozer-linux-amd64 -O /usr/local/bin/buildozer && chmod +x /usr/local/bin/buildozer
RUN curl -sSL https://sdk.cloud.google.com > /tmp/gcloud && bash /tmp/gcloud --install-dir=~/usr/local/bin --disable-prompts
# Download and install patchelf v0.18.0 from GitHub. The default Ubuntu focal
# packages only provide the "0.10-2build1" version. We use patchelf to manipulate
# certain shared libraries during the wheel building process (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/pip_package/build_pip_package.sh#L255-L262).
# When we use Patchelf versions <0.12, those shared libraries end up with a
# corrupted PT_NOTE program header. This was fixed in v0.12, see https://github.com/NixOS/patchelf/commit/43a33482b501b0f5ee9da312aabfca3806570cc9.
RUN wget https://github.com/NixOS/patchelf/releases/download/0.18.0/patchelf-0.18.0-x86_64.tar.gz && tar -zxvf patchelf-0.18.0-x86_64.tar.gz -C /usr && rm -rf patchelf-0.18.0-x86_64.tar.gz

# Don't use the bazel cache when a new docker image is created.
RUN echo build --action_env=DOCKER_CACHEBUSTER=$(date +%s%N)$RANDOM >> /etc/bazel.bazelrc
RUN echo build --host_action_env=DOCKER_HOST_CACHEBUSTER=$(date +%s%N)$RANDOM >> /etc/bazel.bazelrc
8 changes: 8 additions & 0 deletions ci/official/containers/ml_build/README.md
@@ -0,0 +1,8 @@
WIP ML Build Docker container for ML repositories (TensorFlow, JAX and XLA).

This container branches off from
/tensorflow/tools/tf_sig_build_dockerfiles/. However, since
hermetic CUDA and hermetic Python are now available for TensorFlow, a lot of the
requirements installed on the original container can be removed to reduce the
footprint of the container and make it more reusable across different ML
repositories.