
Develop upstream sync 230731 #2170

Merged
merged 453 commits into develop-upstream on Aug 7, 2023

Conversation

weihanmines

No description provided.

tensorflower-gardener and others added 30 commits July 26, 2023 09:08
…in global or local view.

If the attribute is set on a CallOp, then the verification logic converts the program's arguments and results from local view to global view to verify that the local-view shape plus sharding is equivalent to the expected global-view shape.

PiperOrigin-RevId: 551222813
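As a hedged illustration of that check (all names below are hypothetical, not the actual verifier code), the local-to-global conversion amounts to scaling each dimension by the number of shards along it:

```python
# Hypothetical sketch of the local-view -> global-view expansion described
# above; not the actual verifier implementation.
def local_to_global_shape(local_shape, shards_per_dim):
    """Expand a per-device (local view) shape to the whole-program (global
    view) shape, given how many shards each dimension is split into."""
    assert len(local_shape) == len(shards_per_dim)
    return [dim * shards for dim, shards in zip(local_shape, shards_per_dim)]

# A [4, 8] local shard split 2 ways on dim 0 corresponds to an [8, 8] global tensor.
assert local_to_global_shape([4, 8], [2, 1]) == [8, 8]
```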
Updates LLVM usage to match
[365d6eb1f7d8](llvm/llvm-project@365d6eb1f7d8)

PiperOrigin-RevId: 551229328
…ion/configuration out of experimental.

PiperOrigin-RevId: 551235514
Also fix typo in SetAllowBufferHandleOutput comment: false->true.
Also fix #include order to match style guide.

PiperOrigin-RevId: 551247708
PiperOrigin-RevId: 551261650
`TF_STATUS_ASSIGN_OR_RETURN` and `TF_STATUS_RETURN_IF_ERROR`

PiperOrigin-RevId: 551278625
…_heuristics

PiperOrigin-RevId: 551297374
This CL adds patterns to fold a Transpose feeding an FC into a BMM, like below:

FC(lhs, Transpose(rhs)) -> BMM(lhs, rhs, false, false)

Strictly, the right thing would be to apply the pattern only when `keep_num_dims == true`, because an output rank lower than the input rank means `keep_num_dims` has reduced the output. Checking the rank directly, however, improves coverage, so the pattern now works in those cases as well.

PiperOrigin-RevId: 551297769
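As a sanity check of the fold, assuming TFLite FC semantics of output = input × filterᵀ with no bias or fused activation, a minimal NumPy sketch shows the equivalence:

```python
import numpy as np

def fc(inputs, filters):
    # Simplified stand-in for TFLite FullyConnected: inputs @ filters.T,
    # ignoring bias and fused activation.
    return inputs @ filters.T

rng = np.random.default_rng(0)
lhs = rng.standard_normal((3, 4))
rhs = rng.standard_normal((4, 5))

# FC(lhs, Transpose(rhs)) == lhs @ (rhs.T).T == lhs @ rhs == BMM(lhs, rhs, false, false)
np.testing.assert_allclose(fc(lhs, rhs.T), lhs @ rhs)
```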
To improve debuggability, we want the shape refinement to make as few changes as possible to the module. In this change we remove one use of inlining.

PiperOrigin-RevId: 551325242
PiperOrigin-RevId: 551347216
PiperOrigin-RevId: 551353292
…ac compiler error

Apparently ssize_t is sometimes only a long (at least 32-bit) instead of a long long (at least 64-bit). I don't have a Mac, so I can't reproduce the failing build, but hopefully this fixes it based on the error message.

PiperOrigin-RevId: 551376003
PiperOrigin-RevId: 551401683
PiperOrigin-RevId: 551408554
The BFS algorithm didn't have a visited set and therefore had a complexity of O(N*E).

PiperOrigin-RevId: 551414282
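For context, without a visited set BFS can re-enqueue a node once per incoming edge; tracking visited nodes restores the usual O(N + E) bound. A generic sketch of the fixed traversal (not the actual code in this change):

```python
from collections import deque

def bfs(adjacency, start):
    """BFS with a visited set: each node is enqueued at most once,
    so the traversal is O(N + E) rather than O(N * E)."""
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in adjacency[node]:
            if neighbor not in visited:  # the check a visited-set-less BFS is missing
                visited.add(neighbor)
                queue.append(neighbor)
    return order
```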
…sion pattern.

This change implements a conversion pattern that converts stablehlo.convolution to tfl.conv_2d.
This is a minimal version that converts quantized `stablehlo.convolution` under certain assumptions, such as the filter having the format `[0, 1, i, o]`.

PiperOrigin-RevId: 551419638
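For reference, tfl.conv_2d takes its filter in [o, h, w, i] order, so a `[0, 1, i, o]` (HWIO) stablehlo filter has to be permuted during conversion. A hedged NumPy sketch of that permutation, assuming those two layouts:

```python
import numpy as np

# Assumed layouts: stablehlo filter [h, w, i, o] (the "[0, 1, i, o]" format
# above), tfl.conv_2d filter [o, h, w, i].
hwio_filter = np.zeros((3, 3, 8, 16))  # h=3, w=3, in_channels=8, out_channels=16
ohwi_filter = np.transpose(hwio_filter, (3, 0, 1, 2))
assert ohwi_filter.shape == (16, 3, 3, 8)
```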
…ll on SplitShardingDimension.

PiperOrigin-RevId: 551438695
This is in preparation for another change improving the state of copies
in while loops.

PiperOrigin-RevId: 551451818
bixia1 and others added 12 commits July 31, 2023 16:21
…end-Recv sequence.

This is to prevent the latency-hiding scheduler from interleaving two Send-Recv sequences.

PiperOrigin-RevId: 552621536
InitializeCreateGcsFileSystemFnPtr was a temporary fix and is no longer needed.

PiperOrigin-RevId: 552624923
This removes some unnecessary `cuDeviceGetCount()` calls when custom ops are used.

PiperOrigin-RevId: 552634342
…c in tf.constant according to auto dtype conversion semantics.

WeakTensor is created if it satisfies both of the following conditions:
1. tf.constant is called with no dtype arg specified.
2. Input is a nested Python type.

PiperOrigin-RevId: 552634845
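A hedged sketch of that two-condition check (`should_return_weak_tensor` and its helper are hypothetical names, not the actual implementation):

```python
def should_return_weak_tensor(value, dtype=None):
    # WeakTensor only when (1) no explicit dtype was passed and
    # (2) the input is a nested Python type (scalars or lists/tuples of them).
    def is_nested_python_type(v):
        if isinstance(v, (int, float, complex)):
            return True
        if isinstance(v, (list, tuple)):
            return all(is_nested_python_type(item) for item in v)
        return False
    return dtype is None and is_nested_python_type(value)

assert should_return_weak_tensor([1, 2, 3])                     # no dtype, Python input
assert not should_return_weak_tensor([1, 2, 3], dtype="int64")  # explicit dtype
```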
…nd 0 of a gather, assume that the sharding of that operand does not matter.

PiperOrigin-RevId: 552637713
@i-chaochen

Since I covered this here https://github.com/openxla/xla/pull/4603/files#diff-fc02eb6aea06ad0d72011e9a64da28e8c6fae000a9e0cab9b15b40b4e914af4aR956-R958

could you try enabling the triton-softmax flag to have a go, please?
da2cefb#diff-04cb485c1774fda54f8346ece0ade9efbcd813235768905b234d596c22660cf6R170

We need to see whether any other unit tests are affected by that flag.

@i-chaochen left a comment


Since the gpu.graph PR is merged, you should enable xla_graph_level = 1

@weihanmines
Author

Since the gpu.graph PR is merged, you should enable xla_graph_level = 1

Turned it on in the latest commit.

@weihanmines
Author

Since I covered this here https://github.com/openxla/xla/pull/4603/files#diff-fc02eb6aea06ad0d72011e9a64da28e8c6fae000a9e0cab9b15b40b4e914af4aR956-R958

could you try enabling the triton-softmax flag to have a go, please? da2cefb#diff-04cb485c1774fda54f8346ece0ade9efbcd813235768905b234d596c22660cf6R170

We need to see whether any other unit tests are affected by that flag.

Turned it on in the latest commit.

@weihanmines
Author

Jenkins: retest Ubuntu-GPU-single please


This was added in tensorflow@56f261b; we need a ticket to track this test.

Author


Sure.

@weihanmines merged commit 7a70b16 into develop-upstream on Aug 7, 2023
1 check passed