Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AutoBump] Merge with 23f8fac7 (May 14) (44) #303

Merged
merged 172 commits into from
Sep 2, 2024

Conversation

mgehre-amd
Copy link
Collaborator

No description provided.

RKSimon and others added 30 commits May 11, 2024 12:50
…tCFInstrCost implementations.

We were using the default implementations instead of the CRTP versions.
…r each i1 mask element

These can nearly always be folded into the existing cost of the branch, and brings the throughput costs of the scalarised gather/scatter code much closer to the llvm-mca/uica estimates
Being able to add custom dialects is one of the big missing pieces of
the C API. This change should make it achievable via IRDL. Hopefully
this should open custom dialect definition to non-C++ users of MLIR.
Previously, isRoot() would return true for pointers with a base
of sizeof(InlineDescriptor), even if the actual metadata size of the
pointee was 0.
…lvm#91844)

I'm planning to remove StringRef::equals in favor of
StringRef::operator==.

- StringRef::operator==/!= outnumber StringRef::equals by a factor of
  24 under clang/ in terms of their usage.

- The elimination of StringRef::equals brings StringRef closer to
  std::string_view, which has operator== but not equals.

- S == "foo" is more readable than S.equals("foo"), especially for
  !Long.Expression.equals("str") vs Long.Expression != "str".
…trieve

source location in `ConvertConstructorToDeductionGuideTransform`.

The commit fec4716 was reverted by accident in 7415524.

Reland it with a testcase.
…llvm#91738)

There is a follow-up commit for llvm#90319. The Windows test was disabled in
that commit, but it should pass on this operating system. Therefore, it
would be beneficial to have it enabled for MS Windows.
Avoid using bitfield in dxbc::ProgramHeader.
It could potentially be read incorrectly on any host depending on the
compiler.

From [C++17's
[class.bit]](https://timsong-cpp.github.io/cppwp/n4659/class.bit#1)
> Bit-fields are packed into some addressable allocation unit. [ Note:
Bit-fields straddle allocation units on some machines and not on others.
Bit-fields are assigned right-to-left on some machines, left-to-right on
others.  — end note ]

For llvm#91793
Fix race condition in internal NFC test.
PR llvm#87090 amended `accumulateBitfields` to do the correct clipping. The scissor is no longer necessary and  `checkBitfieldClipping` can compute its location directly when needed.
…OpCost directly

The generic getCommonMaskedMemoryOpCost now gives the same cost estimates for scalarized gather/scatter.
getGSVectorCost has supported other TargetCostKind since a551272
…#91807)

The pass runs a `DataFlowSolver` and collects state information on the
input IR. Then, the rewrite driver and folding is applied. During
pattern application and folding it can happen that an Op from the input
IR is deleted and a new Op is created at the same address. When the
newly created Ops is looked up in the `DataFlowSolver` state memory, the
state of the original Op is returned.

This patch adds a method to `DataFlowSolver` which removes all state
related to a `ProgramPoint`. It also adds a listener to the Pass which
clears the state information of deleted Ops from the `DataFlowSolver`.

Fix llvm#81228
…ggregate initialization using a default member initializer (llvm#87933)

This PR complete [DR1815](https://wg21.link/CWG1815) under the guidance
of `FIXME` comments. And reuse `CXXDefaultInitExpr` rewrite machinery to
clone the initializer expression on each use that would lifetime extend
its temporaries.

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
)

This is probably the most involved addition, as it tries to make use of
isTriviallyVectorizable with isVectorIntrinsicWithScalarOpAtArg to handle a
number of different intrinsics that are all lane-wise. Additional tests have
been added for some of the different intrinsics from
isVectorIntrinsicWithScalarOpAtArg / isVectorIntrinsicWithOverloadTypeAtArg.
… YAML

Fix an issue where the profile for all branches that have a BRANCHENTRY
is dropped. If the branch has an entry in BAT, it will be translated to
its input offset. We used to only permit the basic block offset as a
branch source. Perform a lookup of containing basic block instead.

Test Plan: Updated bolt-address-translation-yaml.test

Reviewers: maksfb, dcci, rafaelauler, ayermolo

Reviewed By: maksfb

Pull Request: llvm#91273
…st (llvm#89170)

This patch made following changes:
1. Support ISD FDIV/UDIV/SDIV/UREM/SREM
2. Classify instructions which cost the same
…m#90995)

GVNSink used to order instructions based on their pointer values and was
prone to non-determinism because of that.
This patch ensures all the values stored are using a deterministic
order. I have also added a verfier(`ModelledPHI::verifyModelledPHI`) to
assert when ordering isn't preserved.

Additionally, I have added a test case (mirror graph image of an
existing test) that would have failed before this patch.

Fixes: llvm#77852
…icInstrCost (llvm#89170)

Insert a break to fix the implicit-fallthrough caught by sanitizer.

Original commit message:

This patch made following changes:
1. Support ISD FDIV/UDIV/SDIV/UREM/SREM
2. Classify instructions which cost the same
fhahn and others added 24 commits May 13, 2024 20:44
This avoids 'Permission denied' when PWD is read-only.
While here, change the triple from a Linux one to a generic ELF one.
…s in RISCVRegisterInfo::needsFrameBaseReg.

Instead of using getReservedRegs, just check the subtarget reserved
list. getReservedRegs considers the frame pointer to be reserved when
it is being used, but we do need to save/restore it so it should be
counted as a callee saved register. AArch64 hardcodes their callee
saved size, but the comment mentions the Frame Pointer being counted.
…:needsFrameBaseReg

The vector callee saved registers shouldn't affect the frame pointer
offset so we don't want to consider them.

I've listed the GPR, FPR32, and FPR64 register classes explicitly
because getMinimalPhysRegClass is slow and this function is called
frequently. So explicitly listing the interesting classs should be
a compile time improvement.
The testing we have for vector ptradd was a bit lacking. In adding tests
this patch found a couple of issues mostly with the way v3 vectors of
ptrs were sometimes legalized via i64, and with non-i64 additions. It
does not attempt to fix the issue with mergevalues from returning vector
ptrs.
…2016)

This is a proof of concept recognition of the most basic forms of ReLu
operations, used to show-case sparsification of end-to-end PyTorch
models. In the long run, we must avoid lowering such constructs too
early (with this need for raising them back).

See discussion at

https://discourse.llvm.org/t/min-max-abs-relu-recognition-starter-project/78918
Switch from FuncBranchData intermediate maps (Intra/InterIndex)
to aggregated Data, same as one used by DataReader:
https://github.com/llvm/llvm-project/blob/e62ce1f8842cca36eb14126d79dcca0a85bf6d36/bolt/lib/Profile/DataReader.cpp#L385-L389
This aligns the order of the output between YAMLProfileWriter and
writeBATYAML.

Test Plan: updated bolt-address-translation-yaml.test

Reviewers: rafaelauler, dcci, ayermolo, maksfb

Reviewed By: ayermolo, maksfb

Pull Request: llvm#91289
There is nothing specific here and it is not different from i16 or f16.
## Why
Currently, the system header `errno.h` is included in `libc_errno.h`,
which is supposed to be consumed by internal implementations only. As
unit and hermetic tests should never use `#include <errno.h>` but
instead use `#include "src/errno/libc_errno.h"`, we do not want to
implicitly include `errno.h`. In order to have a clear seperation
between those two, we want to pull out the definitions of errno numbers
from `errno.h`.

## What
* Extract the definitions of errno numbers from
[include/errno.h.def](https://github.com/llvm/llvm-project/pull/91150/files#diff-ed38ed463ed50571b498a5b69039cab58dc9d145da7f751a24da9d77f07781cd)
and place it under
[include/llvm-libc-macros/linux/error-number-macros.h](https://github.com/llvm/llvm-project/pull/91150/files#diff-d6192866629690ebb7cefa1f0a90b6675073e9642f3279df08a04dcdb05fd892)
* Provide mips-specific errno numbers in
[include/llvm-libc-macros/linux/mips/error-number-macros.h](https://github.com/llvm/llvm-project/pull/91150/files#diff-3fd35a4c94e0cc359933e497b10311d857857b2e173e8afebc421b04b7527743)
* Find definition of mips errno numbers in glibc
[here](https://github.com/bminor/glibc/blob/ea73eb5f581ef5931fd67005aa0c526ba43366c9/sysdeps/unix/sysv/linux/mips/bits/errno.h#L32-L50)
(equally defined in the Linux kernel)
* Provide sparc-specific errno numbers in
[include/llvm-libc-macros/linux/sparc/error-number-macros.h](https://github.com/llvm/llvm-project/pull/91150/files#diff-5f16ffb2a51a6f72ebd4403aca7e1edea48289c99dd5978a1c84385bec4f226b)
* Find definition of sparc errno numbers in glibc
[here](https://github.com/bminor/glibc/blob/ea73eb5f581ef5931fd67005aa0c526ba43366c9/sysdeps/unix/sysv/linux/sparc/bits/errno.h#L33-L51)
(equally defined in the Linux kernel)
* Include proxy header `errno_macros.h` instead of the system header
`errno.h` in `libc_errno.h`/`libc_errno.cpp`

Closes llvm#80172
…ack ops. (llvm#90641)

Windows build of `mlir` with Visual Studio (19.36.32538 for x64) using
with the following command:

`cmake.exe -GNinja -DCMAKE_BUILD_TYPE=Release
-DLLVM_ENABLE_PROJECTS=mlir -DLLVM_ENABLE_EH=ON -DLLVM_ENABLE_RTTI=1
-DLLVM_TARGETS_TO_BUILD=host ../llvm`

is leading to a crash when calling canonicalization on
`tensor.pack`/`tensor.unpack` ops `mlir-opt --canonicalize input.mlir`
where the `input.mlir` is as follows (this is taken from one of the
filecheck tests for `tensor.pack`):

```
func.func @pack_unpack(%arg0: tensor<128x256xf32>) -> tensor<128x256xf32> {
          %pack_dest = tensor.empty() : tensor<8x16x8x32xf32>
          %unpack_dest = tensor.empty() : tensor<128x256xf32>
          %tp = tensor.pack %arg0 outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [8, 32] into %pack_dest : tensor<128x256xf32> -> tensor<8x16x8x32xf32>
          %tup = tensor.unpack %tp outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [8, 32] into %unpack_dest : tensor<8x16x8x32xf32> -> tensor<128x256xf32>
          return %tup : tensor<128x256xf32>
        }
```

The crash is seemingly coming from invalid memory access during
iterating over `innerDimsPos` within `getPackOpResultTypeShape`.

This crash is also causing the following tests to fail:


```
MLIR :: Dialect/Linalg/canonicalize.mlir                                                                                                                    
MLIR :: Dialect/Linalg/data-layout-propagation.mlir                                                                                                         
MLIR :: Dialect/Linalg/generalize-tensor-pack-tile.mlir                                                                                                     
MLIR :: Dialect/Linalg/generalize-tensor-pack.mlir                                                                                                          
MLIR :: Dialect/Linalg/generalize-tensor-unpack-tile.mlir                                                                                                   
MLIR :: Dialect/Linalg/generalize-tensor-unpack.mlir                                                                                                        
MLIR :: Dialect/Linalg/transform-lower-pack.mlir                                                                                                            
MLIR :: Dialect/Linalg/transform-op-fuse.mlir                                                                                                               
MLIR :: Dialect/Linalg/transform-op-pack.mlir                                                                                                               
MLIR :: Dialect/Linalg/transform-pack-greedily.mlir                                                                                                         
MLIR :: Dialect/Tensor/canonicalize.mlir                                                                                                                   
MLIR :: Dialect/Tensor/fold-into-pack-and-unpack.mlir                                                                                                       
MLIR :: Dialect/Tensor/invalid.mlir                                                                                                                         
MLIR :: Dialect/Tensor/ops.mlir                                                                                                                             
MLIR :: Dialect/Tensor/simplify-pack-unpack.mlir                                                                                                           
MLIR :: Dialect/Tensor/tiling.mlir
```
This fixes the new test linkerscript/enable-non-contiguous-regions.test
from llvm#90007 in -stdlib=libc++ -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_DEBUG builds.

adjustOutputSections does not discard the output section .potential_a
because it contained .a (which would be spilled to .actual_a).
.potential_a and .bc have the same address and will cause an assertion
failure.
This reapplies
llvm@195d8ac
[DirectX] Fix DXIL part header version encoding. The endian issue was
fixed by
llvm@f42117c.

Move MinorVersion be the lower 8 bit.
Set DXIL version in DXContainerObjectWriter::writeObject.

Fixes llvm#89952
…on Windows

This marks delayed-definition-die-searching.test as unsupported on
Windows. Clang uses link.exe as default linker if not marked explicitly
to use lld. When used with link.exe clang produces PDB format debug info
even when -gdwarf is specified.
This test will be unsupported until we make lldb-aarch64-windows buildbot
to use lld.
Instead of hardcoding all of the register name strings.
This PR:

- Make `clock_gettime` a header-only library
- Add `clock_conversion` header library to allow conversion between
clocks relative to the time of call
- Add `timeout` header library to manage the absolute timeout used in
POSIX's timed locking/waiting APIs
)

This change improves the matching algorithm by using the diff algorithm,
the current matching algorithm only processes the callsites grouped by
the same name functions, it doesn't consider the order relationships
between different name functions, this sometimes fails to handle this
ambiguous anchor case. For example. (`Foo:1` means a
calliste[callee_name: callsite_location])
```
IR :      foo:1  bar:2  foo:4  bar:5 
Profile :        bar:3  foo:5  bar:6
```
The `foo:1` is matched to the 2nd `foo:5` and using the diff
algorithm(finding longest common subsequence ) can help on this issue.
One well-known diff algorithm is the Myers diff algorithm(paper "An
O(ND) Difference Algorithm and Its Variations∗" Eugene W. Myers), its
variations have been implemented and used in many famous tools, like the
GNU diff or git diff. It provides an efficient way to find the longest
common subsequence or the shortest edit script through graph searching.
There are several variations/refinements for the algorithm, but as in
our case, the num of function callsites is usually very small, so we
implemented the basic greedy version in this change which should be good
enough.
We observed better matchings and positive perf improvement on our
internal services.
… V/Zve is not enabled.

We can't save vector registers without V/Zve.
Patch llvm#91150 added a proxy header for errno macros. This patch fixes the
bazel build since it needs to be added as a dependency.
… PRs (llvm#91826)

We have been collecting release notes from the PRs for most of the
18.1.x releases and this just helps automate the process.
…ult in LLVM (llvm#89799)""

This reverts commit 91446e2 and
a unittest followup 1530f31 (llvm#90476).

In a stage-2 -flto=thin -gsplit-dwarf -g -fdebug-info-for-profiling
-fprofile-sample-use= build of clang, a ThinLTO backend compile has
assertion failures:

    Global is external, but doesn't have external or weak linkage!
    ptr @_ZN5clang12ast_matchers8internal18makeAllOfCompositeINS_8QualTypeEEENS1_15BindableMatcherIT_EEN4llvm8ArrayRefIPKNS1_7MatcherIS5_EEEE
    function declaration may only have a unique !dbg attachment
    ptr @_ZN5clang12ast_matchers8internal18makeAllOfCompositeINS_8QualTypeEEENS1_15BindableMatcherIT_EEN4llvm8ArrayRefIPKNS1_7MatcherIS5_EEEE

The failures somehow go away if -fprofile-sample-use= is removed.
Base automatically changed from bump_to_9d66dcaf172c to bump_to_82383d5f3fa8 August 27, 2024 07:19
Base automatically changed from bump_to_82383d5f3fa8 to feature/fused-ops September 2, 2024 08:49
@cferry-AMD cferry-AMD merged commit 5520c5c into feature/fused-ops Sep 2, 2024
11 checks passed
@cferry-AMD cferry-AMD deleted the bump_to_23f8fac7 branch September 2, 2024 11:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.