[AutoBump] Merge with 23f8fac7 (May 14) (44) #303

mgehre-amd · 2024-08-23T12:26:35Z

No description provided.

…tCFInstrCost implementations. We were using the default implementations instead of the CRTP versions.

…r each i1 mask element These can nearly always be folded into the existing cost of the branch, and brings the throughput costs of the scalarised gather/scatter code much closer to the llvm-mca/uica estimates

…1581)

Being able to add custom dialects is one of the big missing pieces of the C API. This change should make it achievable via IRDL. Hopefully this should open custom dialect definition to non-C++ users of MLIR.

Previously, isRoot() would return true for pointers with a base of sizeof(InlineDescriptor), even if the actual metadata size of the pointee was 0.

…lvm#91844)

I'm planning to remove StringRef::equals in favor of StringRef::operator==.

- StringRef::operator==/!= outnumber StringRef::equals by a factor of 24 under clang/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for !Long.Expression.equals("str") vs Long.Expression != "str".

…trieve source location in `ConvertConstructorToDeductionGuideTransform`. The commit fec4716 was reverted by accident in 7415524. Reland it with a testcase.

…ypes (llvm#87716)

…llvm#91738) There is a follow-up commit for llvm#90319. The Windows test was disabled in that commit, but it should pass on this operating system. Therefore, it would be beneficial to have it enabled for MS Windows.

This effectively reverts 5cd2804 and changes to QualifierFixerTest.cpp from e62ce1f. Failed buildbots: https://lab.llvm.org/buildbot/#/builders/236/builds/11223 https://lab.llvm.org/buildbot/#/builders/239/builds/6968

Avoid using bitfield in dxbc::ProgramHeader. It could potentially be read incorrectly on any host depending on the compiler.

From [C++17's [class.bit]](https://timsong-cpp.github.io/cppwp/n4659/class.bit#1)

> Bit-fields are packed into some addressable allocation unit. [ Note: Bit-fields straddle allocation units on some machines and not on others. Bit-fields are assigned right-to-left on some machines, left-to-right on others. — end note ]

For llvm#91793
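A minimal sketch of the alternative to bit-fields: store the raw integer and extract fields with explicit shifts and masks, so the layout no longer depends on how the compiler allocates bit-fields. The type `PackedVersion`, its accessors, and the 4-bit field widths are assumptions made for illustration; they are not taken from `dxbc::ProgramHeader`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical packed version byte: low 4 bits hold the minor version,
// high 4 bits hold the major version. The widths are illustrative only.
struct PackedVersion {
  std::uint8_t Raw = 0;

  // Builds the raw byte with explicit shifts instead of bit-field members.
  static PackedVersion make(unsigned Major, unsigned Minor) {
    assert(Major < 16 && Minor < 16 && "each field must fit in 4 bits");
    PackedVersion V;
    V.Raw = static_cast<std::uint8_t>((Major << 4) | Minor);
    return V;
  }

  // Field positions are fixed by these expressions, not by the compiler.
  unsigned getMajor() const { return Raw >> 4; }
  unsigned getMinor() const { return Raw & 0x0F; }
};
```

With this shape, the same bytes decode to the same fields on every host, at the cost of writing the accessors by hand.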

…nters"" This reverts commit fb1c2db.

Fix race condition in internal NFC test.

PR llvm#87090 amended `accumulateBitfields` to do the correct clipping. The scissor is no longer necessary and `checkBitfieldClipping` can compute its location directly when needed.

…OpCost directly The generic getCommonMaskedMemoryOpCost now gives the same cost estimates for scalarized gather/scatter.

getGSVectorCost has supported other TargetCostKind since a551272

…#91807) The pass runs a `DataFlowSolver` and collects state information on the input IR. Then, the rewrite driver and folding is applied. During pattern application and folding it can happen that an Op from the input IR is deleted and a new Op is created at the same address. When the newly created Ops is looked up in the `DataFlowSolver` state memory, the state of the original Op is returned. This patch adds a method to `DataFlowSolver` which removes all state related to a `ProgramPoint`. It also adds a listener to the Pass which clears the state information of deleted Ops from the `DataFlowSolver`. Fix llvm#81228
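The address-reuse problem described above can be sketched with a plain map keyed by object address. This is an illustration under stated assumptions, not the `DataFlowSolver` API: `StateCache`, `set`, `lookup`, and `erase` are hypothetical names, and a string stands in for the analysis state.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for analysis state keyed by the address of an op.
class StateCache {
public:
  void set(const void *Point, std::string State) {
    States[Point] = std::move(State);
  }

  // Returns nullptr when no state is recorded for this address.
  const std::string *lookup(const void *Point) const {
    auto It = States.find(Point);
    return It == States.end() ? nullptr : &It->second;
  }

  // Meant to be called from a deletion hook, so that a new object later
  // allocated at the same address does not inherit the old state.
  void erase(const void *Point) { States.erase(Point); }

private:
  std::unordered_map<const void *, std::string> States;
};
```

Without the `erase` call on deletion, a lookup for a freshly created object at a recycled address would return the state of the object that used to live there.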

…ggregate initialization using a default member initializer (llvm#87933) This PR completes [DR1815](https://wg21.link/CWG1815) under the guidance of `FIXME` comments, and reuses the `CXXDefaultInitExpr` rewrite machinery to clone the initializer expression on each use that would lifetime-extend its temporaries. --------- Signed-off-by: yronglin <yronglin777@gmail.com>

) This is probably the most involved addition, as it tries to make use of isTriviallyVectorizable with isVectorIntrinsicWithScalarOpAtArg to handle a number of different intrinsics that are all lane-wise. Additional tests have been added for some of the different intrinsics from isVectorIntrinsicWithScalarOpAtArg / isVectorIntrinsicWithOverloadTypeAtArg.

… YAML Fix an issue where the profile for all branches that have a BRANCHENTRY is dropped. If the branch has an entry in BAT, it will be translated to its input offset. We used to only permit the basic block offset as a branch source. Perform a lookup of containing basic block instead. Test Plan: Updated bolt-address-translation-yaml.test Reviewers: maksfb, dcci, rafaelauler, ayermolo Reviewed By: maksfb Pull Request: llvm#91273

…91846) This is how MSVC handles it. https://godbolt.org/z/fG386bjnf

This reverts commit 0869204, which caused a buildbot failure: https://lab.llvm.org/buildbot/#/builders/5/builds/43322

…st (llvm#89170)

This patch makes the following changes:

1. Support ISD FDIV/UDIV/SDIV/UREM/SREM
2. Classify instructions which cost the same

Fix the following buildbot failures by making LangOpts in the unit test static: https://lab.llvm.org/buildbot/#/builders/236/builds/11223 https://lab.llvm.org/buildbot/#/builders/239/builds/6968

…m#90995) GVNSink used to order instructions based on their pointer values and was prone to non-determinism because of that. This patch ensures all the stored values use a deterministic order. I have also added a verifier (`ModelledPHI::verifyModelledPHI`) to assert when ordering isn't preserved. Additionally, I have added a test case (mirror graph image of an existing test) that would have failed before this patch. Fixes: llvm#77852
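As a rough illustration of why pointer-based ordering is non-deterministic and what a deterministic replacement can look like, the sketch below sorts by a stable number instead of by address. It is not the GVNSink implementation: `Inst`, its `Order` field, and `sortDeterministically` are hypothetical, and the assumption is that each object carries a number assigned in program order.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical instruction record. `Order` is assumed to be a stable number
// assigned in program order, independent of where the object is allocated.
struct Inst {
  unsigned Order;
  const char *Name;
};

// Sorts by the stable number rather than by address, so the result is the
// same on every run regardless of allocator behaviour.
inline void sortDeterministically(std::vector<const Inst *> &Insts) {
  std::sort(Insts.begin(), Insts.end(),
            [](const Inst *A, const Inst *B) { return A->Order < B->Order; });
}
```

Comparing the pointers themselves would compile just as well, but the resulting order could change between runs whenever the allocator hands out different addresses.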

…cInstrCost (llvm#89170)" This reverts commit ed16e7a.

…icInstrCost (llvm#89170)

Insert a break to fix the implicit-fallthrough caught by sanitizer.

Original commit message:

This patch makes the following changes:

1. Support ISD FDIV/UDIV/SDIV/UREM/SREM
2. Classify instructions which cost the same

This avoids 'Permission denied' when PWD is read-only. While here, change the triple from a Linux one to a generic ELF one.

…s in RISCVRegisterInfo::needsFrameBaseReg. Instead of using getReservedRegs, just check the subtarget reserved list. getReservedRegs considers the frame pointer to be reserved when it is being used, but we do need to save/restore it so it should be counted as a callee saved register. AArch64 hardcodes their callee saved size, but the comment mentions the Frame Pointer being counted.

…:needsFrameBaseReg The vector callee saved registers shouldn't affect the frame pointer offset so we don't want to consider them. I've listed the GPR, FPR32, and FPR64 register classes explicitly because getMinimalPhysRegClass is slow and this function is called frequently. So explicitly listing the interesting classes should be a compile time improvement.

The testing we have for vector ptradd was a bit lacking. In adding tests this patch found a couple of issues mostly with the way v3 vectors of ptrs were sometimes legalized via i64, and with non-i64 additions. It does not attempt to fix the issue with mergevalues from returning vector ptrs.

…2016) This is a proof of concept recognition of the most basic forms of ReLu operations, used to show-case sparsification of end-to-end PyTorch models. In the long run, we must avoid lowering such constructs too early (with this need for raising them back). See discussion at https://discourse.llvm.org/t/min-max-abs-relu-recognition-starter-project/78918

Switch from FuncBranchData intermediate maps (Intra/InterIndex) to aggregated Data, same as one used by DataReader: https://github.com/llvm/llvm-project/blob/e62ce1f8842cca36eb14126d79dcca0a85bf6d36/bolt/lib/Profile/DataReader.cpp#L385-L389 This aligns the order of the output between YAMLProfileWriter and writeBATYAML. Test Plan: updated bolt-address-translation-yaml.test Reviewers: rafaelauler, dcci, ayermolo, maksfb Reviewed By: ayermolo, maksfb Pull Request: llvm#91289

There is nothing specific here and it is not different from i16 or f16.

## Why

Currently, the system header `errno.h` is included in `libc_errno.h`, which is supposed to be consumed by internal implementations only. As unit and hermetic tests should never use `#include <errno.h>` but instead use `#include "src/errno/libc_errno.h"`, we do not want to implicitly include `errno.h`. In order to have a clear separation between those two, we want to pull out the definitions of errno numbers from `errno.h`.

## What

* Extract the definitions of errno numbers from [include/errno.h.def](https://github.com/llvm/llvm-project/pull/91150/files#diff-ed38ed463ed50571b498a5b69039cab58dc9d145da7f751a24da9d77f07781cd) and place them under [include/llvm-libc-macros/linux/error-number-macros.h](https://github.com/llvm/llvm-project/pull/91150/files#diff-d6192866629690ebb7cefa1f0a90b6675073e9642f3279df08a04dcdb05fd892)
* Provide mips-specific errno numbers in [include/llvm-libc-macros/linux/mips/error-number-macros.h](https://github.com/llvm/llvm-project/pull/91150/files#diff-3fd35a4c94e0cc359933e497b10311d857857b2e173e8afebc421b04b7527743)
* Find definition of mips errno numbers in glibc [here](https://github.com/bminor/glibc/blob/ea73eb5f581ef5931fd67005aa0c526ba43366c9/sysdeps/unix/sysv/linux/mips/bits/errno.h#L32-L50) (equally defined in the Linux kernel)
* Provide sparc-specific errno numbers in [include/llvm-libc-macros/linux/sparc/error-number-macros.h](https://github.com/llvm/llvm-project/pull/91150/files#diff-5f16ffb2a51a6f72ebd4403aca7e1edea48289c99dd5978a1c84385bec4f226b)
* Find definition of sparc errno numbers in glibc [here](https://github.com/bminor/glibc/blob/ea73eb5f581ef5931fd67005aa0c526ba43366c9/sysdeps/unix/sysv/linux/sparc/bits/errno.h#L33-L51) (equally defined in the Linux kernel)
* Include proxy header `errno_macros.h` instead of the system header `errno.h` in `libc_errno.h`/`libc_errno.cpp`

Closes llvm#80172

…ack ops. (llvm#90641)

Windows build of `mlir` with Visual Studio (19.36.32538 for x64) using the following command:

`cmake.exe -GNinja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS=mlir -DLLVM_ENABLE_EH=ON -DLLVM_ENABLE_RTTI=1 -DLLVM_TARGETS_TO_BUILD=host ../llvm`

is leading to a crash when calling canonicalization on `tensor.pack`/`tensor.unpack` ops:

`mlir-opt --canonicalize input.mlir`

where `input.mlir` is as follows (this is taken from one of the filecheck tests for `tensor.pack`):

```
func.func @pack_unpack(%arg0: tensor<128x256xf32>) -> tensor<128x256xf32> {
  %pack_dest = tensor.empty() : tensor<8x16x8x32xf32>
  %unpack_dest = tensor.empty() : tensor<128x256xf32>
  %tp = tensor.pack %arg0 outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [8, 32] into %pack_dest : tensor<128x256xf32> -> tensor<8x16x8x32xf32>
  %tup = tensor.unpack %tp outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [8, 32] into %unpack_dest : tensor<8x16x8x32xf32> -> tensor<128x256xf32>
  return %tup : tensor<128x256xf32>
}
```

The crash seemingly comes from an invalid memory access while iterating over `innerDimsPos` within `getPackOpResultTypeShape`. This crash is also causing the following tests to fail:

```
MLIR :: Dialect/Linalg/canonicalize.mlir
MLIR :: Dialect/Linalg/data-layout-propagation.mlir
MLIR :: Dialect/Linalg/generalize-tensor-pack-tile.mlir
MLIR :: Dialect/Linalg/generalize-tensor-pack.mlir
MLIR :: Dialect/Linalg/generalize-tensor-unpack-tile.mlir
MLIR :: Dialect/Linalg/generalize-tensor-unpack.mlir
MLIR :: Dialect/Linalg/transform-lower-pack.mlir
MLIR :: Dialect/Linalg/transform-op-fuse.mlir
MLIR :: Dialect/Linalg/transform-op-pack.mlir
MLIR :: Dialect/Linalg/transform-pack-greedily.mlir
MLIR :: Dialect/Tensor/canonicalize.mlir
MLIR :: Dialect/Tensor/fold-into-pack-and-unpack.mlir
MLIR :: Dialect/Tensor/invalid.mlir
MLIR :: Dialect/Tensor/ops.mlir
MLIR :: Dialect/Tensor/simplify-pack-unpack.mlir
MLIR :: Dialect/Tensor/tiling.mlir
```

…lvm#92041)

This fixes the new test linkerscript/enable-non-contiguous-regions.test from llvm#90007 in -stdlib=libc++ -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_DEBUG builds. adjustOutputSections does not discard the output section .potential_a because it contained .a (which would be spilled to .actual_a). .potential_a and .bc have the same address and will cause an assertion failure.

This reapplies llvm@195d8ac [DirectX] Fix DXIL part header version encoding. The endian issue was fixed by llvm@f42117c. Move MinorVersion to be the lower 8 bits. Set DXIL version in DXContainerObjectWriter::writeObject. Fixes llvm#89952

…on Windows This marks delayed-definition-die-searching.test as unsupported on Windows. Clang uses link.exe as the default linker unless it is explicitly told to use lld. When used with link.exe, clang produces PDB format debug info even when -gdwarf is specified. This test will be unsupported until we make the lldb-aarch64-windows buildbot use lld.

Instead of hardcoding all of the register name strings.

This PR:

- Make `clock_gettime` a header-only library
- Add `clock_conversion` header library to allow conversion between clocks relative to the time of call
- Add `timeout` header library to manage the absolute timeout used in POSIX's timed locking/waiting APIs
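One way to read "conversion between clocks relative to the time of call" is: keep the remaining duration constant and re-anchor the deadline on the other clock. The sketch below shows that arithmetic only. `TimePoint`, `normalize`, and `convertClock` are hypothetical names and do not reflect the actual LLVM libc headers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical timespec-like value; Nsec is kept in [0, 1'000'000'000).
struct TimePoint {
  std::int64_t Sec;
  std::int64_t Nsec;
};

constexpr std::int64_t NsPerSec = 1000000000;

// Folds an out-of-range nanosecond count back into the seconds field.
inline TimePoint normalize(std::int64_t Sec, std::int64_t Nsec) {
  Sec += Nsec / NsPerSec;
  Nsec %= NsPerSec;
  if (Nsec < 0) {
    Nsec += NsPerSec;
    --Sec;
  }
  return {Sec, Nsec};
}

// Re-expresses an absolute deadline from clock A on clock B, given the
// readings of both clocks taken at the time of the call.
inline TimePoint convertClock(TimePoint Deadline, TimePoint NowA,
                              TimePoint NowB) {
  return normalize(NowB.Sec + (Deadline.Sec - NowA.Sec),
                   NowB.Nsec + (Deadline.Nsec - NowA.Nsec));
}
```

The result is only as accurate as the two "now" readings are simultaneous, which is why the conversion is tied to the time of call.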

) This change improves the matching algorithm by using the diff algorithm. The current matching algorithm only processes callsites grouped by function name; it doesn't consider the ordering between functions with different names, so it sometimes fails to handle the ambiguous anchor case. For example (`foo:1` means a callsite [callee_name: callsite_location]):

```
IR : foo:1 bar:2 foo:4 bar:5
Profile : bar:3 foo:5 bar:6
```

The `foo:1` is matched to the 2nd `foo:5`; using the diff algorithm (finding the longest common subsequence) can help with this issue.

One well-known diff algorithm is the Myers diff algorithm (paper: "An O(ND) Difference Algorithm and Its Variations", Eugene W. Myers). Its variations have been implemented and used in many well-known tools, such as GNU diff and git diff. It provides an efficient way to find the longest common subsequence or the shortest edit script through graph searching. There are several variations/refinements of the algorithm, but since in our case the number of function callsites is usually very small, we implemented the basic greedy version in this change, which should be good enough.

We observed better matchings and positive perf improvement on our internal services.
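To make the longest-common-subsequence idea concrete, here is a small sketch that matches callee names in order. It uses a plain dynamic-programming LCS rather than the greedy Myers variant the change describes, and `lcsMatch` and `Match` are hypothetical names, not the functions added by the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// A matched pair: (index into the IR sequence, index into the profile sequence).
using Match = std::pair<std::size_t, std::size_t>;

// Returns the index pairs of one longest common subsequence of the two
// callee-name sequences, computed with the textbook O(N*M) table.
inline std::vector<Match> lcsMatch(const std::vector<std::string> &IR,
                                   const std::vector<std::string> &Prof) {
  const std::size_t N = IR.size(), M = Prof.size();
  // Len[I][J] is the LCS length of the suffixes IR[I..] and Prof[J..].
  std::vector<std::vector<std::size_t>> Len(
      N + 1, std::vector<std::size_t>(M + 1, 0));
  for (std::size_t I = N; I-- > 0;)
    for (std::size_t J = M; J-- > 0;)
      Len[I][J] = IR[I] == Prof[J]
                      ? Len[I + 1][J + 1] + 1
                      : std::max(Len[I + 1][J], Len[I][J + 1]);

  // Walk the table from the front, emitting a pair at every match.
  std::vector<Match> Out;
  for (std::size_t I = 0, J = 0; I < N && J < M;) {
    if (IR[I] == Prof[J]) {
      Out.push_back({I, J});
      ++I;
      ++J;
    } else if (Len[I + 1][J] >= Len[I][J + 1]) {
      ++I;
    } else {
      ++J;
    }
  }
  return Out;
}
```

On the example above (IR callees `foo bar foo bar`, profile callees `bar foo bar`), the sketch leaves the first IR `foo` unmatched and pairs the remaining three callsites in order, so IR `foo:4` lines up with profile `foo:5`.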

… V/Zve is not enabled. We can't save vector registers without V/Zve.

Patch llvm#91150 added a proxy header for errno macros. This patch fixes the bazel build since it needs to be added as a dependency.

… PRs (llvm#91826) We have been collecting release notes from the PRs for most of the 18.1.x releases and this just helps automate the process.

…ult in LLVM (llvm#89799)""

This reverts commit 91446e2 and a unittest followup 1530f31 (llvm#90476).

In a stage-2 -flto=thin -gsplit-dwarf -g -fdebug-info-for-profiling -fprofile-sample-use= build of clang, a ThinLTO backend compile has assertion failures:

    Global is external, but doesn't have external or weak linkage!
    ptr @_ZN5clang12ast_matchers8internal18makeAllOfCompositeINS_8QualTypeEEENS1_15BindableMatcherIT_EEN4llvm8ArrayRefIPKNS1_7MatcherIS5_EEEE
    function declaration may only have a unique !dbg attachment
    ptr @_ZN5clang12ast_matchers8internal18makeAllOfCompositeINS_8QualTypeEEENS1_15BindableMatcherIT_EEN4llvm8ArrayRefIPKNS1_7MatcherIS5_EEEE

The failures somehow go away if -fprofile-sample-use= is removed.

RKSimon and others added 30 commits May 11, 2024 12:50

[TTI] getCommonMaskedMemoryOpCost - use the target getMemoryOpCost/ge…

079fdef

…tCFInstrCost implementations. We were using the default implementations instead of the CRTP versions.

[TTI][X86] getGSScalarCost - don't bother with adding cost of ICMP fo…

23fe1fc

…r each i1 mask element These can nearly always be folded into the existing cost of the branch, and brings the throughput costs of the scalarised gather/scatter code much closer to the llvm-mca/uica estimates

[DebugInfo][JumpThreading] Fix missing debug location updates (llvm#9…

3773191

…1581)

[DebugInfo][LICM] Fix missing debug location updates (llvm#91729)

cdd7821

[MLIR] Add IRDL dialect loading to C API (llvm#91852)

1337622

Being able to add custom dialects is one of the big missing pieces of the C API. This change should make it achievable via IRDL. Hopefully this should open custom dialect definition to non-C++ users of MLIR.

[clang][Interp] Use pointee metadata size in isRoot()

379b777

Previously, isRoot() would return true for pointers with a base of sizeof(InlineDescriptor), even if the actual metadata size of the pointee was 0.

[lldb] Add required skipIfLLVMTargetMissing for X86 (NFC)

baffaf0

Reland: [clang] Use getDefaultArgRange instead of getDefaultArg to re…

2b38688

…trieve source location in `ConvertConstructorToDeductionGuideTransform`. The commit fec4716 was reverted by accident in 7415524. Reland it with a testcase.

[libc++] Vectorize std::mismatch with trivially equality comparable t…

05cc2d5

…ypes (llvm#87716)

[clang-format] Fix buildbot failures

0869204

This effectively reverts 5cd2804 and changes to QualifierFixerTest.cpp from e62ce1f. Failed buildbots: https://lab.llvm.org/buildbot/#/builders/236/builds/11223 https://lab.llvm.org/buildbot/#/builders/239/builds/6968

[X86][CodeGen] Add NDD entries for transform TEST+AND -> TEST

e586556

Reapply "Reapply "[clang][Interp] Create full type info for dummy poi…

63224d7

…nters"" This reverts commit fb1c2db.

[BOLT] Fix race condition in a test (llvm#91866)

c8864bc

Fix race condition in internal NFC test.

[clang][NFC] Remove class layout scissor (llvm#89055)

1d6bf0c

PR llvm#87090 amended `accumulateBitfields` to do the correct clipping. The scissor is no longer necessary and `checkBitfieldClipping` can compute its location directly when needed.

[CostModel][X86] Remove getGSScalarCost and use getCommonMaskedMemory…

a477004

…OpCost directly The generic getCommonMaskedMemoryOpCost now gives the same cost estimates for scalarized gather/scatter.

[CostModel][X86] getGSVectorCost - remove FIXME

502e77d

getGSVectorCost has supported other TargetCostKind since a551272

[X86][vectorcall] Pass built types byval when xmm0~6 exhausted (llvm#…

5bde801

…91846) This is how MSVC handles it. https://godbolt.org/z/fG386bjnf

Revert "[clang-format] Fix buildbot failures"

626025a

This reverts commit 0869204, which caused a buildbot failure: https://lab.llvm.org/buildbot/#/builders/5/builds/43322

[RISCV][TTI] Support fdiv/udiv/sdiv/srem/urem in getArithmeticInstrCo…

ed16e7a

…st (llvm#89170)

This patch makes the following changes:

1. Support ISD FDIV/UDIV/SDIV/UREM/SREM
2. Classify instructions which cost the same

[clang-format] Fix buildbot failures

de641e2

Fix the following buildbot failures by making LangOpts in the unit test static: https://lab.llvm.org/buildbot/#/builders/236/builds/11223 https://lab.llvm.org/buildbot/#/builders/239/builds/6968

Revert "[RISCV][TTI] Support fdiv/udiv/sdiv/srem/urem in getArithmeti…

d67c3a4

…cInstrCost (llvm#89170)" This reverts commit ed16e7a.

fhahn and others added 24 commits May 13, 2024 20:44

[LV] Use VPBuilder to create Select (NFCI).

e122380

[gn build] Port 31a203f

a4b3422

[test] Use conventional -emit-llvm-only

a6d7828

This avoids 'Permission denied' when PWD is read-only. While here, change the triple from a Linux one to a generic ELF one.

[test] Fix check prefixes

ef9090f

[AMDGPU] Make v2bf16 BUILD_VECTOR legal (llvm#92022)

efc7bbb

There is nothing specific here and it is not different from i16 or f16.

[libc][errno] Include <linux/errno.h> for Linux in full build mode. (l…

8960078

…lvm#92041)

[RISCV] Use printRegName in RISCVInstPrinter::printRlist. NFC

ec3bc2f

Instead of hardcoding all of the register name strings.

[RISCV] Inogre CallingConv::RISCV_VectorCall in getCalleeSavedRegs if…

4357712

… V/Zve is not enabled. We can't save vector registers without V/Zve.

[libc] add errno_macro header to bazel build (llvm#92044)

4c79d38

Patch llvm#91150 added a proxy header for errno macros. This patch fixes the bazel build since it needs to be added as a dependency.

[workflows] Add a job for requesting a release note on release branch…

c99d115

… PRs (llvm#91826) We have been collecting release notes from the PRs for most of the 18.1.x releases and this just helps automate the process.

[AutoBump] Merge with 23f8fac (May 14)

c155219

mgehre-amd requested a review from cferry-AMD August 26, 2024 09:09

cferry-AMD approved these changes Aug 26, 2024

Base automatically changed from bump_to_9d66dcaf172c to bump_to_82383d5f3fa8 August 27, 2024 07:19

Base automatically changed from bump_to_82383d5f3fa8 to feature/fused-ops September 2, 2024 08:49

cferry-AMD merged commit 5520c5c into feature/fused-ops Sep 2, 2024
11 checks passed

cferry-AMD deleted the bump_to_23f8fac7 branch September 2, 2024 11:11
