
Develop upstream sync 230717 #2160

Merged · 400 commits merged into develop-upstream on Jul 19, 2023

Conversation

weihanmines

Removed the API GetWindowedOutputSizeVerboseV2.

rickeylev and others added 30 commits July 11, 2023 12:35
PiperOrigin-RevId: 547264730
… executable

Setting `xla_gpu_cuda_graph_num_runs_to_instantiate` to a negative value will instantiate all CUDA graphs before executing the main function
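The flag can be supplied like any other XLA debug option. A minimal sketch, assuming the usual convention of passing XLA flags through the `XLA_FLAGS` environment variable before the runtime is imported:

```python
import os

# The flag name comes from this commit; per the description, a negative value
# instantiates all CUDA graphs before the main function executes. The use of
# the XLA_FLAGS environment variable is an assumption about the setup, not
# taken from this PR.
os.environ["XLA_FLAGS"] = "--xla_gpu_cuda_graph_num_runs_to_instantiate=-1"
```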

PiperOrigin-RevId: 547275419
Updates LLVM usage to match
[be29fe2f987b](llvm/llvm-project@be29fe2f987b)

PiperOrigin-RevId: 547284303
…s cusparseLt. It performs C = C + A*B and assumes that the inputs A, B, C and the output are dense arrays on the host. Pruning and compression will be done after the data are transferred to the device.
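As a reference point for what the op computes (the helper name below is hypothetical, and this is plain host-side semantics, not the cusparseLt path), the operation is a dense matmul-accumulate:

```python
# Reference semantics of the new op: C = C + A * B, with A, B, and C dense
# host arrays. Pruning and compression for cusparseLt happen afterwards, on
# the device, so they do not appear in this sketch.
def matmul_accumulate(A, B, C):
    k, n = len(B), len(B[0])
    return [[C[i][j] + sum(A[i][p] * B[p][j] for p in range(k))
             for j in range(n)] for i in range(len(A))]

# 2x3 times 3x2, accumulated into an all-ones C.
out = matmul_accumulate([[1, 0, 2], [0, 1, 0]],
                        [[1, 1], [2, 0], [0, 3]],
                        [[1, 1], [1, 1]])
# out == [[2, 8], [3, 1]]
```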

PiperOrigin-RevId: 547288029
PiperOrigin-RevId: 547305655
PiperOrigin-RevId: 547306878
PiperOrigin-RevId: 547310976
…etween buffer on_device_shape and expected execution shape.

Buffer on_device_shape may be dynamic, and ShapeUtil::Equal fails on dynamic shapes. An alternative fix is to compare logical_on_device_shape, but getting logical_on_device_shape is blocking and may have a performance impact.
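A sketch of the relaxed comparison this motivates, using a hypothetical representation in which each buffer dimension carries a dynamic flag; this is illustrative only, not the actual XLA code:

```python
# An exact equality check (like ShapeUtil::Equal) fails on any dynamic
# dimension, so a relaxed check treats a dynamic dim as an upper bound on the
# runtime size rather than an exact size.
def dims_compatible(buffer_dims, expected_dims):
    if len(buffer_dims) != len(expected_dims):
        return False
    for (size, is_dynamic), expected in zip(buffer_dims, expected_dims):
        if is_dynamic:
            if expected > size:  # dynamic dim: bound, not exact match
                return False
        elif size != expected:   # static dim: must match exactly
            return False
    return True

# Static dims must match exactly; the dynamic dim only bounds the size.
ok = dims_compatible([(8, False), (16, True)], [8, 10])
```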

PiperOrigin-RevId: 547327689
PiperOrigin-RevId: 547336766
It breaks FP8 gemms on Hopper

PiperOrigin-RevId: 547339624
- Fix the function to check if the multiply is effectively scalar as opposed to
  true scalar
- This fixes a regression in RS pattern matching caused by `ReshapeMover` pass,
  which pushes down the reshape in the offset computation generated by SPMD
  partitioner, causing the RS pattern matching to fail
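The distinction being fixed can be sketched as follows (the helper names are hypothetical): a "true" scalar has rank 0, while an "effective" scalar merely has a single element, e.g. shape (1, 1), and the check must accept both:

```python
import math

def is_true_scalar(shape):
    # Rank-0 only.
    return len(shape) == 0

def is_effective_scalar(shape):
    # Exactly one element; includes rank-0, since prod(()) == 1.
    return math.prod(shape) == 1

# (1, 1) is an effective scalar but not a true scalar.
result = (is_effective_scalar((1, 1)), is_true_scalar((1, 1)))
```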

PiperOrigin-RevId: 547340786
PiperOrigin-RevId: 547342340
Encoded args/rets create a lot of store instructions that LLVM tries very hard to optimize, but we don't really expect any of those optimizations to improve performance. By marking the store instructions volatile, we suppress most of the expensive LLVM optimizations.

PiperOrigin-RevId: 547359822
The input of the C++ API ToLiteral can have a specific host layout (tile dimensions are not supported right now). Add it to the corresponding PJRT C API.

PiperOrigin-RevId: 547374250
PiperOrigin-RevId: 547383616
PiperOrigin-RevId: 547383747
This is a leftover from when we removed the handling for the special Softmax
custom call.

PiperOrigin-RevId: 547402693
Previously, we had duplicated functionality for the refinement of polymorphic shapes in refine_polymorphic_shapes (used from JAX) and xla_call_module_loader (used by tf.XlaCallModule).

We now consolidate and share this functionality in refine_polymorphic_shapes.
We also incorporate ValidateStaticShapes into RefinePolymorphicShapes.

This is in preparation for augmenting the refine_polymorphic_shapes with
shape assertion handling.

PiperOrigin-RevId: 547409633
tensorflower-gardener and others added 13 commits July 17, 2023 02:12
PiperOrigin-RevId: 548620777
…m and to an opaque tensor.

PiperOrigin-RevId: 548633760
…writer if the CUDA compute capability is older than Ampere, since they result in unsupported PTX instructions.

PiperOrigin-RevId: 548638356
…s an effective scalar. This short-circuit avoids crashing within last_dimension when attempting to match while either the operand or the result of the bitcast has a rank-0 shape.

PiperOrigin-RevId: 548645429
…factor-threadpool

PiperOrigin-RevId: 548651797
… ptxas we are using.

This fixes a failure in the case where the user installs a new ptxas from the CUDA pip packages but has an older system-wide nvlink that cannot understand the output of ptxas.

Fixes jax-ml/jax#16586
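The compatibility rule being enforced can be sketched as follows: nvlink must be at least as new as the ptxas whose output it links. The banner format and helper names below are assumptions for illustration, not the actual XLA code:

```python
import re

def parse_version(banner):
    # Assumed banner shape, e.g. "release 12.2, V12.2.140" -> (12, 2, 140).
    m = re.search(r"V(\d+)\.(\d+)\.(\d+)", banner)
    return tuple(int(g) for g in m.groups())

def nvlink_understands_ptxas(ptxas_banner, nvlink_banner):
    # nvlink can only consume output from a ptxas no newer than itself.
    return parse_version(nvlink_banner) >= parse_version(ptxas_banner)

# A pip-installed ptxas newer than the system-wide nvlink is rejected.
ok = nvlink_understands_ptxas("release 12.2, V12.2.140",
                              "release 11.8, V11.8.89")
# ok == False
```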

PiperOrigin-RevId: 548664235
This avoids running out of stack space when the number of stacked computations is huge.

PiperOrigin-RevId: 548678213
@weihanmines
Author

jenkins : retest Ubuntu-CPU please

@weihanmines
Author

CPU FAILURE: FAILED: Build did NOT complete successfully
//tensorflow/compiler/xla/python/pjrt_ifrt:xla_sharding_serdes_test

@weihanmines
Author

weihanmines commented Jul 18, 2023

Test output for //tensorflow/python/ops:nn_test_gpu:
Running test /home/jenkins/sharedspace/bazel-ci_build-cache/.cache/bazel/_bazel_jenkins/eab0d61a99b6696edb3d2aff87b585e8/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/ops/nn_test_gpu.runfiles/org_tensorflow/tensorflow/python/ops/nn_test_gpu on GPU 0
2023-07-17 22:41:18.972500: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Running tests under Python 3.10.9: /home/jenkins/sharedspace/bazel-ci_build-cache/.cache/bazel/_bazel_jenkins/eab0d61a99b6696edb3d2aff87b585e8/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/ops/nn_test_gpu.runfiles/python_x86_64-unknown-linux-gnu/bin/python3
faulthandler.register(SIGTERM) failed AttributeError("module 'signal' has no attribute 'SIGTERM'"); ignoring.
FAIL: testDropout_generator_no_0.1 (main.DropoutTest)
DropoutTest.testDropout_generator_no_0.1
testDropout_generator_no_0.1('generator', functools.partial(<function general_dropout at 0x7f842dc80f70>, uniform_sampler=<function at 0x7f8526d58f70>), 'no', 0.1)

Traceback (most recent call last):
File "/home/jenkins/sharedspace/bazel-ci_build-cache/.cache/bazel/_bazel_jenkins/eab0d61a99b6696edb3d2aff87b585e8/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/ops/nn_test_gpu.runfiles/absl_py/absl/testing/parameterized.py", line 316, in bound_param_test
return test_method(self, *testcase_params)
File "/home/jenkins/sharedspace/bazel-ci_build-cache/.cache/bazel/_bazel_jenkins/eab0d61a99b6696edb3d2aff87b585e8/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/ops/nn_test_gpu.runfiles/org_tensorflow/tensorflow/python/ops/nn_test.py", line 372, in testDropout
self.assertLess(rel_error, 0.15)
AssertionError: 0.16666666666666666 not less than 0.15


Ran 284 tests in 97.165s

FAILED (failures=1, skipped=38)
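The assertion that failed compares an observed statistic against its expected value and requires the relative error to stay below 0.15. A sketch with hypothetical numbers that reproduce the 0.1666... seen in the log:

```python
# The decomposition into observed/expected values is illustrative; only the
# resulting relative error and the 0.15 threshold appear in the log above.
def rel_error(observed, expected):
    return abs(observed - expected) / expected

# An observed rate of 7/60 against a target of 0.1 gives exactly 1/6,
# i.e. 0.1666..., just over the 0.15 bound -- a fairly tight margin.
err = rel_error(7 / 60, 0.1)
```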

@weihanmines
Author

//tensorflow/compiler/xla/service/gpu/tests:fused_slice_amdgcn.hlo.test:

@weihanmines
Author

//tensorflow/compiler/xla/service/gpu/tests:dynamic_update_slice_inplace_amdgcn.hlo.test

@weihanmines
Author

The last commit fixes the failing unit test on CPU.

@i-chaochen

i-chaochen commented Jul 18, 2023

The 2 failed unit tests will be fixed in the next weekly sync, due to this PR: https://github.com/openxla/xla/pull/4311/files

//tensorflow/compiler/xla/service/gpu/tests:dynamic_update_slice_inplace_amdgcn.hlo.test //tensorflow/compiler/xla/service/gpu/tests:fused_slice_amdgcn.hlo.test

@weihanmines
Author

Thanks!

@weihanmines
Author

weihanmines commented Jul 18, 2023

//tensorflow/compiler/tests:unary_ops_test_gpu FAILED
disabled for now

@i-chaochen

Also, we will enable the graph flag at the next weekly sync, once this PR merges: openxla/xla#3960

@weihanmines weihanmines merged commit e721d6f into develop-upstream Jul 19, 2023
1 check passed