Develop upstream sync 230717 #2160
Conversation
PiperOrigin-RevId: 547264730
PiperOrigin-RevId: 547272211
… executable Setting `xla_gpu_cuda_graph_num_runs_to_instantiate` to a negative value will instantiate all CUDA graphs before executing the main function PiperOrigin-RevId: 547275419
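As a hedged illustration (not part of this change), XLA debug flags like this one are commonly passed through the `XLA_FLAGS` environment variable before the XLA-backed framework initializes. Only the flag name comes from the commit message; the use of JAX and the surrounding program are assumptions.

```python
# Hedged sketch: setting the flag via the XLA_FLAGS environment variable.
# Only the flag name comes from the commit message; using JAX here is an assumption.
import os

# A negative value instantiates all CUDA graphs before the main function runs.
os.environ["XLA_FLAGS"] = "--xla_gpu_cuda_graph_num_runs_to_instantiate=-1"

import jax  # import after XLA_FLAGS is set so the flag is picked up at startup
print(jax.devices())
```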
…exports.py. PiperOrigin-RevId: 547277744
Updates LLVM usage to match [be29fe2f987b](llvm/llvm-project@be29fe2f987b) PiperOrigin-RevId: 547284303
…_portable. PiperOrigin-RevId: 547287511
…s cusparseLt. It performs C=C+A*B and assumes the inputs A, B, C and the output are dense arrays on the host. Pruning and compression will be done after the data are transferred to the device. PiperOrigin-RevId: 547288029
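A minimal NumPy sketch of the dense reference semantics described above (C = C + A*B on host arrays); this only illustrates the math, not the custom call's API.

```python
# Minimal NumPy sketch of the reference semantics C = C + A * B on dense host arrays.
# Pruning and compression happen later, after the data are transferred to the device.
import numpy as np

m, k, n = 4, 8, 3
A = np.random.rand(m, k).astype(np.float32)
B = np.random.rand(k, n).astype(np.float32)
C = np.random.rand(m, n).astype(np.float32)

C += A @ B  # accumulate the matrix product into C
```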
PiperOrigin-RevId: 547300343
PiperOrigin-RevId: 547303330
PiperOrigin-RevId: 547305655
PiperOrigin-RevId: 547306800
PiperOrigin-RevId: 547306878
PiperOrigin-RevId: 547310976
…work/tensor.py. PiperOrigin-RevId: 547326089
…etween buffer on_device_shape and expected execution shape. Buffer on_device_shape may be dynamic. ShapeUtil::Equal will fail if it is dynamic. An alternative fix is to compare logical_on_device_shape. But getting logical_on_device_shape is blocking and may have performance impact. PiperOrigin-RevId: 547327689
…ing dimensions PiperOrigin-RevId: 547334655
PiperOrigin-RevId: 547336766
- Fix the function to check whether the multiply is effectively scalar, as opposed to a true scalar.
- This fixes a regression in RS pattern matching caused by the `ReshapeMover` pass, which pushes down the reshape in the offset computation generated by the SPMD partitioner, causing the RS pattern matching to fail. PiperOrigin-RevId: 547340786
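A small, hypothetical sketch of the distinction the fix relies on (shapes modeled as plain tuples, not the XLA C++ Shape class): a true scalar has rank 0, while an effectively scalar array may have nonzero rank as long as every dimension is 1.

```python
# Hypothetical sketch; not the XLA implementation.
def is_true_scalar(shape: tuple[int, ...]) -> bool:
    # Rank-0 shape, e.g. ().
    return len(shape) == 0

def is_effective_scalar(shape: tuple[int, ...]) -> bool:
    # Holds a single element even if the rank is nonzero, e.g. (1, 1, 1).
    return all(dim == 1 for dim in shape)

assert is_effective_scalar(())          # a true scalar is also effectively scalar
assert is_effective_scalar((1, 1, 1))   # e.g. after ReshapeMover pushes a reshape down
assert not is_effective_scalar((2, 1))
```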
PiperOrigin-RevId: 547342340
Encoded args/rets create a lot of store instructions that LLVM tries to optimize very hard, but we don't really expect any optimizations to improve performance. By marking store instructions volatile we suppress most of the expensive LLVM optimizations. PiperOrigin-RevId: 547359822
The input of C++ API ToLiteral can have a specific host layout (tile dimensions not supported right now). Add it to corresponding PJRT C API. PiperOrigin-RevId: 547374250
PiperOrigin-RevId: 547375586
…lugin. PiperOrigin-RevId: 547380253
PiperOrigin-RevId: 547383616
PiperOrigin-RevId: 547383747
This is a leftover from when we removed the handling for the special Softmax custom call. PiperOrigin-RevId: 547402693
Previously, we had duplicated functionality for the refinement of polymorphic shapes in refine_polymorphic_shapes (used from JAX) and xla_call_module_loader (used by tf.XlaCallModule). We now consolidate and share this functionality in refine_polymorphic_shapes, and we incorporate ValidateStaticShapes into RefinePolymorphicShapes. This is in preparation for augmenting refine_polymorphic_shapes with shape assertion handling. PiperOrigin-RevId: 547409633
PiperOrigin-RevId: 548620777
PiperOrigin-RevId: 548629503
…m and to an opaque tensor. PiperOrigin-RevId: 548633760
PiperOrigin-RevId: 548636217
…writer if the CUDA compute capability is older than Ampere, since they result in unsupported PTX instructions. PiperOrigin-RevId: 548638356
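A hedged sketch of the guard described above (the exact check is not shown in the commit message; the only assumption encoded here is that Ampere corresponds to CUDA compute capability 8.0):

```python
# Hedged sketch, not the actual rewriter code: skip the rewrite on GPUs older than
# Ampere (compute capability 8.0), where the emitted PTX instructions are unsupported.
AMPERE = (8, 0)

def should_apply_rewrite(compute_capability: tuple[int, int]) -> bool:
    return compute_capability >= AMPERE

assert should_apply_rewrite((9, 0))       # Hopper: ok
assert not should_apply_rewrite((7, 5))   # Turing: skip the rewrite
```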
…s an effective scalar. This short-circuit avoids crashing within last_dimension when attempting a match where either the operand or the result of the bitcast has a rank-0 shape. PiperOrigin-RevId: 548645429
…factor-threadpool PiperOrigin-RevId: 548651797
PiperOrigin-RevId: 548658687
… ptxas we are using. This fixes a failure in the case that the user installs a new ptxas from the CUDA pip packages, but has an older nvlink installed system-wide that cannot understand the output of ptxas. Fixes jax-ml/jax#16586 PiperOrigin-RevId: 548664235
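A hedged illustration of the version skew being guarded against, not the actual XLA code: a pip-installed ptxas can be newer than a system-wide nvlink, which then cannot understand the ptxas output. The version parsing below is a simplified assumption.

```python
# Hedged sketch: compare ptxas and nvlink versions to detect the skew described above.
# Both tools print a "release X.Y" string with --version; the parsing is an assumption.
import re
import subprocess

def cuda_tool_version(tool: str) -> tuple[int, int] | None:
    out = subprocess.run([tool, "--version"], capture_output=True, text=True).stdout
    match = re.search(r"release (\d+)\.(\d+)", out)
    return (int(match.group(1)), int(match.group(2))) if match else None

ptxas, nvlink = cuda_tool_version("ptxas"), cuda_tool_version("nvlink")
if ptxas and nvlink and nvlink < ptxas:
    print(f"warning: nvlink {nvlink} is older than ptxas {ptxas}; linking may fail")
```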
This avoids running out of stack space when the number of stacked computations is huge. PiperOrigin-RevId: 548678213
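A generic, hedged sketch of the technique (not the actual XLA pass): replacing recursion over nested computations with an explicit worklist, so that very deep nesting cannot exhaust the call stack.

```python
# Generic worklist traversal; recursion depth no longer grows with nesting depth.
def visit_all(root, children):
    worklist, visited = [root], set()
    while worklist:
        node = worklist.pop()
        if id(node) in visited:
            continue
        visited.add(id(node))
        worklist.extend(children(node))
    return visited
```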
jenkins : retest Ubuntu-CPU please
CPU FAILURE: FAILED: Build did NOT complete successfully
Test output for //tensorflow/python/ops:nn_test_gpu:
Traceback (most recent call last):
Ran 284 tests in 97.165s
FAILED (failures=1, skipped=38)
//tensorflow/compiler/xla/service/gpu/tests:fused_slice_amdgcn.hlo.test
//tensorflow/compiler/xla/service/gpu/tests:dynamic_update_slice_inplace_amdgcn.hlo.test
The last commit fixes the failing unit test on CPU.
Two failed unit tests will be fixed in the next weekly sync by this PR: https://github.com/openxla/xla/pull/4311/files
//tensorflow/compiler/xla/service/gpu/tests:dynamic_update_slice_inplace_amdgcn.hlo.test
//tensorflow/compiler/xla/service/gpu/tests:fused_slice_amdgcn.hlo.test
Thanks!
//tensorflow/compiler/tests:unary_ops_test_gpu FAILED
Also, we will enable the graph flag in the next weekly sync, since this PR is going to merge: openxla/xla#3960
Removed the API GetWindowedOutputSizeVerboseV2.