Develop upstream sync 230717 #2160
Conversation
PiperOrigin-RevId: 547264730
PiperOrigin-RevId: 547272211
… executable Setting `xla_gpu_cuda_graph_num_runs_to_instantiate` to a negative value will instantiate all CUDA graphs before executing the main function PiperOrigin-RevId: 547275419
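As a hedged illustration (not part of this change), XLA debug flags like this one are commonly passed through the `XLA_FLAGS` environment variable before the XLA-backed framework initializes. Only the flag name comes from the commit message; the use of JAX and the surrounding program are assumptions.

```python
# Hedged sketch: setting the flag via the XLA_FLAGS environment variable.
# Only the flag name comes from the commit message; using JAX here is an assumption.
import os

# A negative value instantiates all CUDA graphs before the main function runs.
os.environ["XLA_FLAGS"] = "--xla_gpu_cuda_graph_num_runs_to_instantiate=-1"

import jax  # import after XLA_FLAGS is set so the flag is picked up at startup
print(jax.devices())
```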
…exports.py. PiperOrigin-RevId: 547277744
Updates LLVM usage to match [be29fe2f987b](llvm/llvm-project@be29fe2f987b) PiperOrigin-RevId: 547284303
…_portable. PiperOrigin-RevId: 547287511
…s cusparseLt. It performs C=C+A*B and assumes the inputs A, B, C and the output are dense arrays on the host. Pruning and compression will be done after the data are transferred to the device. PiperOrigin-RevId: 547288029
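A minimal NumPy sketch of the dense reference semantics described above (C = C + A*B on host arrays); this only illustrates the math, not the custom call's API.

```python
# Minimal NumPy sketch of the reference semantics C = C + A * B on dense host arrays.
# Pruning and compression happen later, after the data are transferred to the device.
import numpy as np

m, k, n = 4, 8, 3
A = np.random.rand(m, k).astype(np.float32)
B = np.random.rand(k, n).astype(np.float32)
C = np.random.rand(m, n).astype(np.float32)

C += A @ B  # accumulate the matrix product into C
```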
PiperOrigin-RevId: 547300343
PiperOrigin-RevId: 547303330
PiperOrigin-RevId: 547305655
PiperOrigin-RevId: 547306800
PiperOrigin-RevId: 547306878
PiperOrigin-RevId: 547310976
…work/tensor.py. PiperOrigin-RevId: 547326089
…etween buffer on_device_shape and expected execution shape. Buffer on_device_shape may be dynamic. ShapeUtil::Equal will fail if it is dynamic. An alternative fix is to compare logical_on_device_shape. But getting logical_on_device_shape is blocking and may have performance impact. PiperOrigin-RevId: 547327689
…ing dimensions PiperOrigin-RevId: 547334655
PiperOrigin-RevId: 547336766
- Fix the function to check whether the multiply is effectively scalar, as opposed to a true scalar.
- This fixes a regression in RS pattern matching caused by the `ReshapeMover` pass, which pushes down the reshape in the offset computation generated by the SPMD partitioner, causing the RS pattern matching to fail. PiperOrigin-RevId: 547340786
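A small, hypothetical sketch of the distinction the fix relies on (shapes modeled as plain tuples, not the XLA C++ Shape class): a true scalar has rank 0, while an effectively scalar array may have nonzero rank as long as every dimension is 1.

```python
# Hypothetical sketch; not the XLA implementation.
def is_true_scalar(shape: tuple[int, ...]) -> bool:
    # Rank-0 shape, e.g. ().
    return len(shape) == 0

def is_effective_scalar(shape: tuple[int, ...]) -> bool:
    # Holds a single element even if the rank is nonzero, e.g. (1, 1, 1).
    return all(dim == 1 for dim in shape)

assert is_effective_scalar(())          # a true scalar is also effectively scalar
assert is_effective_scalar((1, 1, 1))   # e.g. after ReshapeMover pushes a reshape down
assert not is_effective_scalar((2, 1))
```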
PiperOrigin-RevId: 547342340
Encoded args/rets create a lot of store instructions that LLVM tries to optimize very hard, but we don't really expect any optimizations to improve performance. By marking store instructions volatile we suppress most of the expensive LLVM optimizations. PiperOrigin-RevId: 547359822
The input of C++ API ToLiteral can have a specific host layout (tile dimensions not supported right now). Add it to corresponding PJRT C API. PiperOrigin-RevId: 547374250
PiperOrigin-RevId: 547375586
…lugin. PiperOrigin-RevId: 547380253
PiperOrigin-RevId: 547383616
PiperOrigin-RevId: 547383747
This is a leftover from when we removed the handling for the special Softmax custom call. PiperOrigin-RevId: 547402693
Previously, we had duplicated functionality for the refinement of polymorphic shapes in refine_polymorphic_shapes (used from JAX) and xla_call_module_loader (used by tf.XlaCallModule). We now consolidate and share this functionality in refine_polymorphic_shapes, and we incorporate ValidateStaticShapes into RefinePolymorphicShapes. This is in preparation for augmenting refine_polymorphic_shapes with shape assertion handling. PiperOrigin-RevId: 547409633
PiperOrigin-RevId: 548620777
PiperOrigin-RevId: 548629503
…m and to an opaque tensor. PiperOrigin-RevId: 548633760
PiperOrigin-RevId: 548636217
…writer if the CUDA compute capability is older than Ampere, since they result in unsupported PTX instructions. PiperOrigin-RevId: 548638356
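A hedged sketch of the guard described above (the exact check is not shown in the commit message; the only assumption encoded here is that Ampere corresponds to CUDA compute capability 8.0):

```python
# Hedged sketch, not the actual rewriter code: skip the rewrite on GPUs older than
# Ampere (compute capability 8.0), where the emitted PTX instructions are unsupported.
AMPERE = (8, 0)

def should_apply_rewrite(compute_capability: tuple[int, int]) -> bool:
    return compute_capability >= AMPERE

assert should_apply_rewrite((9, 0))       # Hopper: ok
assert not should_apply_rewrite((7, 5))   # Turing: skip the rewrite
```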
…s an effective scalar. This short-circuit avoids crashing within last_dimension when attempting a match where either the operand or the result of the bitcast has a rank-0 shape. PiperOrigin-RevId: 548645429
…factor-threadpool PiperOrigin-RevId: 548651797
PiperOrigin-RevId: 548658687
… ptxas we are using. This fixes a failure in the case that the user installs a new ptxas from the CUDA pip packages, but has an older nvlink installed system-wide that cannot understand the output of ptxas. Fixes jax-ml/jax#16586 PiperOrigin-RevId: 548664235
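A hedged illustration of the version skew being guarded against, not the actual XLA code: a pip-installed ptxas can be newer than a system-wide nvlink, which then cannot understand the ptxas output. The version parsing below is a simplified assumption.

```python
# Hedged sketch: compare ptxas and nvlink versions to detect the skew described above.
# Both tools print a "release X.Y" string with --version; the parsing is an assumption.
import re
import subprocess

def cuda_tool_version(tool: str) -> tuple[int, int] | None:
    out = subprocess.run([tool, "--version"], capture_output=True, text=True).stdout
    match = re.search(r"release (\d+)\.(\d+)", out)
    return (int(match.group(1)), int(match.group(2))) if match else None

ptxas, nvlink = cuda_tool_version("ptxas"), cuda_tool_version("nvlink")
if ptxas and nvlink and nvlink < ptxas:
    print(f"warning: nvlink {nvlink} is older than ptxas {ptxas}; linking may fail")
```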
This avoids running out of stack space when the number of stacked computations is huge. PiperOrigin-RevId: 548678213
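A generic, hedged sketch of the technique (not the actual XLA pass): replacing recursion over nested computations with an explicit worklist, so that very deep nesting cannot exhaust the call stack.

```python
# Generic worklist traversal; recursion depth no longer grows with nesting depth.
def visit_all(root, children):
    worklist, visited = [root], set()
    while worklist:
        node = worklist.pop()
        if id(node) in visited:
            continue
        visited.add(id(node))
        worklist.extend(children(node))
    return visited
```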
jenkins : retest Ubuntu-CPU please
CPU FAILURE: FAILED: Build did NOT complete successfully
Test output for //tensorflow/python/ops:nn_test_gpu:
Traceback (most recent call last):
Ran 284 tests in 97.165s
FAILED (failures=1, skipped=38)
//tensorflow/compiler/xla/service/gpu/tests:fused_slice_amdgcn.hlo.test
//tensorflow/compiler/xla/service/gpu/tests:dynamic_update_slice_inplace_amdgcn.hlo.test
The last commit fixes the failing unit test on CPU.
Two failed unit tests will be fixed in the next weekly sync by this PR: https://github.com/openxla/xla/pull/4311/files
//tensorflow/compiler/xla/service/gpu/tests:dynamic_update_slice_inplace_amdgcn.hlo.test
//tensorflow/compiler/xla/service/gpu/tests:fused_slice_amdgcn.hlo.test
Thanks!
//tensorflow/compiler/tests:unary_ops_test_gpu FAILED
Also, we will enable the graph flag in the next weekly sync, since this PR is going to merge: openxla/xla#3960
Removed the API GetWindowedOutputSizeVerboseV2.