
Sync 231206 #2321

Merged
merged 397 commits into from
Dec 15, 2023

Conversation

jayfurmanek

No description provided.

mihaimaruseac and others added 30 commits November 28, 2023 07:53
This picks up, among other things, the fix for the
invalid memcpy call in google/XNNPACK@07e1a4a

The new XNNPACK requires a new cpuinfo, so update that too.

PiperOrigin-RevId: 585992920
…tor private targets

Imported from GitHub PR openxla/xla#7323

Fixed the ROCm build broken by openxla/xla@33fc605

@xla-rotation
Copybara import of the project:

--
ad859aa6fa0d44e2a7609eaee6bedbcd4d3968da by Chao Chen <cchen104@amd.com>:

remove command_buffer and kernel links in rocm build

Merging this change closes tensorflow#7323

PiperOrigin-RevId: 585997544
…nterface.

Reimplement local collectives to utilize thread-parallelism, rather than having one thread do all the work. They are simpler this way!

PiperOrigin-RevId: 585998597
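
A rough illustration of the idea (not the actual XLA collectives code; the names and structure here are hypothetical): each thread reduces its own slice of the participants' buffers instead of one thread doing all the work.

```
// Minimal sketch: a local all-reduce (sum) where each worker thread
// handles a disjoint element range of every participant's buffer.
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

void LocalAllReduceSum(std::vector<float*>& participant_buffers,
                       size_t num_elements, size_t num_threads) {
  std::vector<std::thread> workers;
  const size_t chunk = (num_elements + num_threads - 1) / num_threads;
  for (size_t t = 0; t < num_threads; ++t) {
    workers.emplace_back([&, t] {
      const size_t begin = t * chunk;
      const size_t end = std::min(num_elements, begin + chunk);
      for (size_t i = begin; i < end; ++i) {
        float sum = 0.0f;
        for (float* buf : participant_buffers) sum += buf[i];
        for (float* buf : participant_buffers) buf[i] = sum;
      }
    });
  }
  for (std::thread& w : workers) w.join();
}
```
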
triton_support - basic Triton support checks
triton_tiling_propagation - the code for propagating the tilings in a functional paradigm
triton_fusion_analysis - FusionContext and TritonFusionAnalysis
gemm_rewriter_triton - GemmRewriterTriton

PiperOrigin-RevId: 586006558
Adds parameter and return type annotations to the majority of public functions
and methods in the `test_util` module. This includes annotations for methods on
`TensorFlowTestCase` which return values, but omits the assertion methods.

If adding types is currently infeasible (due to the complexity of the signature, limitations of the supported Python versions, type-checker limitations, etc.), then this change simply does not add those annotations.

PiperOrigin-RevId: 586008562
…d we can remove the compatibility support here.

PiperOrigin-RevId: 586020786
Factored out a common pattern of mutating `NodeDef`s by iterating all node defs in a `GraphDef` into a templated function and applied it for both `enable_dump_tensor` and `change_dump_tensor_file_name`.

PiperOrigin-RevId: 586028409
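
As a sketch of the pattern described above (the real helper's name and signature may differ), a templated function that applies a mutation to every `NodeDef` in a `GraphDef` might look like:

```
#include "tensorflow/core/framework/graph.pb.h"
#include "tensorflow/core/framework/node_def.pb.h"

// Hypothetical helper: applies `mutate` to every NodeDef in the graph.
template <typename NodeDefMutator>
void MutateNodeDefs(tensorflow::GraphDef& graph_def, NodeDefMutator&& mutate) {
  for (tensorflow::NodeDef& node_def : *graph_def.mutable_node()) {
    mutate(node_def);
  }
}

// Example use (illustrative only), e.g. to edit attributes of matching nodes:
// MutateNodeDefs(graph_def, [](tensorflow::NodeDef& node) {
//   if (node.op() == "DumpTensor") { /* edit node attrs here */ }
// });
```
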
…ate*

Also remove some unused #includes from the .cc files.
Also use "= default" syntax for destructor.

PiperOrigin-RevId: 586069299
… data size.

This is an implementation detail.

PiperOrigin-RevId: 586097271
There is apparently no feasible way of resolving the TODO comment.

PiperOrigin-RevId: 586106981
PiperOrigin-RevId: 586130880
This adds the configs needed for us to be able to run a Linux Arm64 GitHub presubmit on incoming PRs. It runs tests by cross-compiling test binaries on remote Linux x86 VMs using RBE and then executing the built test binaries on the host Arm64 VM.

On average, this presubmit should take ~30 minutes, which is ~83% faster than the current GitHub Linux Arm64 presubmit (https://github.com/tensorflow/tensorflow/actions/workflows/arm-ci.yml).

I have changed the name of the cross-compile env file to add the Python version it runs and to be consistent with other env names.

PiperOrigin-RevId: 586144808
Imported from GitHub PR openxla/xla#7136

This PR adds the `Allocate` command to the command buffer.

The `Allocate` command is constructed with a pointer to a `BufferAllocation`. The allocation is performed when the command is recorded, and the allocated address is tracked by the command buffer runtime through the allocation index. Consumer commands that want to access the allocated buffer should provide the record-parameter buffer address as an se::DeviceMemoryBase with the special address (LAZY_ALLOCATE_ADDRESS_MARKER) and a non-zero size; such an address can be created with the API se::DeviceMemory<>::MakeLazyAllocAddressFromByteSize(byte_length).

Below is an example of how to construct a command sequence that accesses buffers allocated inside the command buffer:

```
  BufferAllocation alloc_a(/*index=*/0, byte_length, /*color=*/0);
  BufferAllocation alloc_b(/*index=*/1, byte_length, /*color=*/0);
  BufferAllocation alloc_c(/*index=*/2, byte_length, /*color=*/0);
  BufferAllocation::Slice slice_a(&alloc_a, 0, byte_length);
  BufferAllocation::Slice slice_b(&alloc_b, 0, byte_length);
  BufferAllocation::Slice slice_c(&alloc_c, 0, byte_length);

  // Prepare the command sequence for constructing the command buffer.
  CommandBufferCmdSequence commands;
  commands.Emplace<AllocateCmd>(&alloc_b);
  commands.Emplace<MemcpyDeviceToDeviceCmd>(slice_b, slice_a, byte_length);
  commands.Emplace<MemcpyDeviceToDeviceCmd>(slice_c, slice_b, byte_length);

  // Construct a thunk with command sequence.
  CommandBufferThunk thunk(std::move(commands), Thunk::ThunkInfo(nullptr));

  // Prepare arguments: a=42, b=0
  se::DeviceMemory<int32_t> a = executor->AllocateArray<int32_t>(length, 0);
  stream.ThenMemset32(&a, 42, byte_length);

  se::DeviceMemory<int32_t> b = se::DeviceMemory<int32_t>::MakeLazyAllocAddressFromByteSize(byte_length);
  se::DeviceMemory<int32_t> c = executor->AllocateArray<int32_t>(length, 0);
  BufferAllocations allocations({a, b, c}, 0, executor->GetAllocator());

  ServiceExecutableRunOptions run_options;
  Thunk::ExecuteParams params(run_options, allocations, &stream, {});

  // Execute command buffer thunk and verify that it copied the memory.
  TF_ASSERT_OK(thunk.ExecuteOnStream(params));

```

For the CUDA implementation, the command has no update parameters, which means that once the command is added to a command buffer, the address range allocated for it is fixed across command buffer launches.

The `Allocate` command is currently only implemented for the CUDA platform.
Copybara import of the project:

--
d2cdd0423fe5947e06d8d7b8d5192a8845b2beae by Shawn Wang <shawnw@nvidia.com>:

Add Allocate command to command buffer

Merging this change closes tensorflow#7136

PiperOrigin-RevId: 586150993
These are not ready yet.

PiperOrigin-RevId: 586160312
PiperOrigin-RevId: 586163360
…to generate sharding strategies. For those that cannot be, we rely on pre-existing convolution handling code.

PiperOrigin-RevId: 586163568
…n in a While command

PiperOrigin-RevId: 586164354
…ation test is skipped for now because the full support for convolution is not implemented.

Refactored the target op quantization pattern matching to be compatible with dot-like ops.

PiperOrigin-RevId: 586166124
…vides built-in utilities for saving & loading).

PiperOrigin-RevId: 586169032
Our Mac builds require some specific build-environment setup, such as installing Bazelisk, upgrading Pyenv, and installing Python. Since these scripts are meant to be run by both internal CI builds and external users, we rework some conditional logic that was previously only meant to run for internal CI builds. These conditionals now use the `TFCI_*_ENABLE` variables instead. This turns what were possibly confusing system checks in scripts into explicit settings in "envs" files, and allows both internal CI builds and external users to decide whether to enable or disable a particular macOS build-environment setup task.

PiperOrigin-RevId: 586173730
tyb0807 and others added 20 commits December 4, 2023 15:15
PiperOrigin-RevId: 587849841
PiperOrigin-RevId: 587851943
…tandalone utility

PiperOrigin-RevId: 587857877
Block and thread dimensions are already available in device kernels, so there should be no reason to add extra kernel parameters for them. For CUTLASS gemm argument packing we know the thread dimensions statically from an operation template.

PiperOrigin-RevId: 587863262
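
For illustration, a generic CUDA sketch (not the CUTLASS kernel in question): the launch dimensions are visible inside the kernel through built-in variables, so they do not need to be passed as arguments.

```
// Generic CUDA sketch: block/thread dimensions come from built-in variables,
// so there is no need to pass them as extra kernel parameters.
__global__ void ScaleKernel(float* data, int n, float factor) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;  // built-ins, not params
  int stride = gridDim.x * blockDim.x;              // grid-stride loop
  for (int i = idx; i < n; i += stride) {
    data[i] *= factor;
  }
}
```
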
PiperOrigin-RevId: 587865806
1. Deduplicate the postprocessing code for dots and convs.
2. Combine the InferInputShardingForTopK function with the GetInputSharding function, and get rid of an unused parameter in the latter.

PiperOrigin-RevId: 587868586
…a standalone utility

PiperOrigin-RevId: 587869254
…ding HloValue in the producer instruction, if the producer instruction is a tuple.

PiperOrigin-RevId: 587874472
We got unlucky and hit a seed which happens to fail the KS test.

PiperOrigin-RevId: 587885112
…rd-swish-fusion-fp32-bf16

PiperOrigin-RevId: 587906135
…ingerprint; this contains information (like solver wall time) that can vary between runs.

PiperOrigin-RevId: 587906345
…_sinking

by doing a prepass to detect whether to construct a fusion.

PiperOrigin-RevId: 587914972
…ion library.

This change fixes a rare issue where two component functions are registered on a remote eager context and their function libraries contain a function with the same name but a different body. When this happens, the second registration fails due to a duplicate function upon adding it to the context-wide `FunctionLibraryDefinition`.

To avoid this problem, when registering a component function, we use the shipped `FunctionDefLibrary` to create a private `FunctionLibraryDefinition` for running that function. We can do this relatively easily because the eager `ClusterFunctionLibraryRuntime` ships all reachable functions along with the root component function, and we have long-standing support for instantiating a function with an "overlay" `FunctionLibraryDefinition`.

The behavior matches the TF1 `ClusterFunctionLibraryRuntime`, which ships an entire private library as part of the subgraph it registers with a remote worker, and creates a new `FunctionLibraryDefinition` and `ProcessFunctionLibraryRuntime` for that subgraph.

Note that removing a component function via the `ClusterFunctionLibraryRuntime` was already unsupported. We rely on this to simplify the ownership of the private `FunctionLibraryDefinition` objects, which are owned by the `EagerContext` and never deleted. Future support for removal would likely require using refcounted or otherwise shared `FunctionLibraryDefinition` objects in the FLR stack.

(In our experience, the issue is the result of an MLIR rewrite that canonicalizes the same source function in two different ways, so e.g. the choice of retained node for common subexpression elimination is different, but the two versions are functionally equivalent. In principle, making that rewrite deterministic, or making it choose a new name for the rewritten function would also solve the problem. However, I prefer this approach because it is robust to less-than-perfect rewrite passes, and we have a lot of rewrite passes.)

PiperOrigin-RevId: 587920362
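
A minimal sketch of the overlay mechanism described above (the actual registration code path differs; this only shows the two long-standing TF APIs involved): build a private `FunctionLibraryDefinition` from the shipped `FunctionDefLibrary`, then pass it as the overlay library when instantiating the component function.

```
#include <memory>

#include "tensorflow/core/framework/function.h"
#include "tensorflow/core/framework/op.h"

// Sketch: build a private FunctionLibraryDefinition from the FunctionDefLibrary
// shipped with the component function, so name clashes with the context-wide
// library no longer matter.
std::unique_ptr<tensorflow::FunctionLibraryDefinition> MakePrivateLibDef(
    const tensorflow::FunctionDefLibrary& shipped_lib) {
  return std::make_unique<tensorflow::FunctionLibraryDefinition>(
      tensorflow::OpRegistry::Global(), shipped_lib);
}

// At instantiation time (illustrative only):
//   tensorflow::FunctionLibraryRuntime::InstantiateOptions opts;
//   opts.lib_def = private_lib_def.get();   // the overlay library
//   flr->Instantiate(function_name, attrs, opts, &handle);
```
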
We need to install Bazelisk and Pyenv manually, as these are not present on the x86 Mac VMs. Note that uploads from these new jobs are disabled because they are not yet ready. However, the old Mac x86 nightly builds will still run and upload to tf-nightly, so there won't be any missing nightly packages while we are doing this migration.

PiperOrigin-RevId: 587930871
jayfurmanek and others added 3 commits December 12, 2023 19:17
Conflicts:
        third_party/xla/xla/service/gpu/BUILD
        third_party/xla/xla/service/gpu/buffer_comparator_test.cc
        third_party/xla/xla/stream_executor/device_description.h
        third_party/xla/xla/stream_executor/rocm/hip_blas_lt.cc
        third_party/xla/xla/stream_executor/rocm/hip_blas_lt.h
        third_party/xla/xla/tests/BUILD
…nDef()`.

Some compilers do not like using the name of a class as a method, which is fair enough.

PiperOrigin-RevId: 588312567
@draganmladjenovic

Retest Ubuntu-GPU-single please.
Retest Ubuntu-CPU please.

@draganmladjenovic

Retest gpu-pycpp please.

@draganmladjenovic draganmladjenovic merged commit e4b2051 into develop-upstream Dec 15, 2023
9 checks passed