Develop upstream sync 230731 #2169

Closed
wants to merge 517 commits into from

Conversation

weihanmines

No description provided.

daniel-lang and others added 30 commits July 26, 2023 13:16
Similar to tensorflow#58677, the capitalization of FlatBuffers needs to match.
Otherwise using TFLite via find_package() will fail to find FlatBuffers.
…ednn-3.0-final

PiperOrigin-RevId: 551186750
Create reduction_utils for utils related to reduction codegen.
Move some functions to gpu_fusible.

PiperOrigin-RevId: 551187475
As of CUDA 12.2 additional input validation allows NULL for the row offsets
only when rows=0.
…in global or local view.

If the attribute is set on a CallOp, then verification logic converts the program's arguments and results from local view to global view to verify that local view shape + sharding is equivalent to the expected global view shape.

PiperOrigin-RevId: 551222813
Updates LLVM usage to match
[365d6eb1f7d8](llvm/llvm-project@365d6eb1f7d8)

PiperOrigin-RevId: 551229328
…ion/configuration out of experimental.

PiperOrigin-RevId: 551235514
Also fix typo in SetAllowBufferHandleOutput comment: false->true.
Also fix #include order to match style guide.

PiperOrigin-RevId: 551247708
PiperOrigin-RevId: 551261650
`TF_STATUS_ASSIGN_OR_RETURN` and `TF_STATUS_RETURN_IF_ERROR`

PiperOrigin-RevId: 551278625
…_heuristics

PiperOrigin-RevId: 551297374
This CL adds patterns that fold a Transpose into an FC and convert the pair into a BMM, as below:

FC(lhs, Transpose(rhs)) -> BMM(lhs, rhs, false, false)

Strictly, the pattern should be applied only if keep_num_dims==True, because an output rank lower than the input rank means the `keep_num_dims` setting has reduced the output. Checking the rank instead improves the coverage. This pattern will now work
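The rewrite above can be checked numerically with a minimal sketch. It assumes that FC multiplies its input by the transpose of its weight operand (bias and activation omitted) and that the two boolean BMM arguments are the adjoint flags for the left and right operands. The helper names `fully_connected`, `batch_matmul`, `transpose`, and `matmul` are hypothetical stand-ins written for this illustration, not the TFLite kernels, and only the 2-D, single-batch case is covered.

```python
def transpose(m):
    """Transpose a 2-D matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of a [M, K] and b [K, N]."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def fully_connected(inputs, weights):
    """Assumed FC semantics: inputs [M, K] times weights [N, K] transposed."""
    return matmul(inputs, transpose(weights))


def batch_matmul(x, y, adj_x=False, adj_y=False):
    """Assumed BMM semantics with optional transposition of either operand."""
    if adj_x:
        x = transpose(x)
    if adj_y:
        y = transpose(y)
    return matmul(x, y)


lhs = [[1, 2, 3], [4, 5, 6]]    # shape [2, 3]
rhs = [[1, 0], [0, 1], [2, 2]]  # shape [3, 2]

before = fully_connected(lhs, transpose(rhs))  # FC(lhs, Transpose(rhs))
after = batch_matmul(lhs, rhs, False, False)   # BMM(lhs, rhs, false, false)
print(before == after)
```

Under these assumptions the explicit Transpose and the implicit transpose inside FC cancel, which is why both adjoint flags can be false.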

PiperOrigin-RevId: 551297769
To improve debuggability, we want the shape refinement to make as few changes as possible to the module. In this change we remove one use of inlining.

PiperOrigin-RevId: 551325242
PiperOrigin-RevId: 551347216
PiperOrigin-RevId: 551353292
…ac compiler error

Apparently ssize_t is only a long sometimes (at least 32-bit), instead
of long long (at least 64-bit). I don't have a mac so I can't repro
the failing build, but hopefully this fixes it based on the error
message.
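As a rough way to inspect the widths mentioned above on a given machine, the following sketch queries the native sizes through Python's `struct` module. It only reports byte widths on the platform where it runs; it does not reproduce the compiler error, and the function name `native_widths` is made up for this illustration.

```python
import struct


def native_widths():
    """Return the native byte widths of ssize_t, long, and long long."""
    return {
        "ssize_t": struct.calcsize("n"),    # 'n' is ssize_t in native mode
        "long": struct.calcsize("l"),       # 'l' is long
        "long long": struct.calcsize("q"),  # 'q' is long long
    }


widths = native_widths()
for name, size in widths.items():
    print(f"{name}: {size} bytes")
```

Two types can share a width and still be distinct to the compiler, so matching sizes here do not imply that the types are interchangeable in C++ code.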

PiperOrigin-RevId: 551376003
PiperOrigin-RevId: 551401683
PiperOrigin-RevId: 551408554
ezhulenev and others added 27 commits July 31, 2023 11:26
…memcpy API call

hlo.sort operation compiled to a memcpy + a sequence of device kernel launches

PiperOrigin-RevId: 552539521
PiperOrigin-RevId: 552543030
Use memref descriptor to get offset if we do not know it at compile time.

PiperOrigin-RevId: 552554429
A total of three new ops are added: Mul, Equal, and While. The control flow op works for one float32 input only.

PiperOrigin-RevId: 552571044
…fixes two things:

1. When compiling and executing the program, set the option properly for multi-partition.
2. Use a constant launch_id for TF to align with the previous non-PJRT implementation.

PiperOrigin-RevId: 552577435
This is needed by the DUCC FFT library in order to use `tsl::condition_variable`
as a direct replacement for `std::condition_variable`.

PiperOrigin-RevId: 552595622
- Add an option to provide XLA the device memory limit to use
- Plumb that to HloModuleConfig through different objects

PiperOrigin-RevId: 552596103
…end-Recv

sequence.

This prevents the latency hiding scheduler from interleaving two Send-Recv
sequences.

PiperOrigin-RevId: 552621536
InitializeCreateGcsFileSystemFnPtr is a temporary fix and it is no longer needed.

PiperOrigin-RevId: 552624923
This removes some unnecessary `cuDeviceGetCount()` calls when custom ops are used.

PiperOrigin-RevId: 552634342
…c in tf.constant according to auto dtype conversion semantics.

WeakTensor is created if it satisfies both of the following conditions:
1. tf.constant is called with no dtype arg specified.
2. Input is a nested Python type.
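The two conditions above can be read as a simple predicate. The sketch below is an illustration only: it does not call TensorFlow, the function names `is_nested_python_value` and `would_create_weak_tensor` are hypothetical, and it assumes that "nested Python type" means a Python `int`, `float`, or `complex`, or lists and tuples built only from those.

```python
def is_nested_python_value(value):
    """Assumed reading: a Python number, or lists/tuples made only of them."""
    if isinstance(value, (int, float, complex)):
        return True
    if isinstance(value, (list, tuple)):
        return all(is_nested_python_value(item) for item in value)
    return False


def would_create_weak_tensor(value, dtype=None):
    """Both conditions must hold: no dtype given, and a nested Python input."""
    return dtype is None and is_nested_python_value(value)


print(would_create_weak_tensor([1, [2.0, 3]]))         # no dtype, nested numbers
print(would_create_weak_tensor([1, 2], dtype="int32"))  # explicit dtype given
print(would_create_weak_tensor(object()))               # not a Python number
```

The predicate treats an explicit dtype as disabling the weak result entirely, which matches condition 1 as stated; how the real implementation classifies edge cases such as empty lists is not covered by the commit message.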

PiperOrigin-RevId: 552634845
…nd 0 of a gather, assume that the sharding of that operand does not matter.

PiperOrigin-RevId: 552637713