forked from llvm/llvm-project
[AutoBump] Merge with 23f8fac7 (May 14) (44) #303
Merged
…tCFInstrCost implementations. We were using the default implementations instead of the CRTP versions.
…r each i1 mask element These can nearly always be folded into the existing cost of the branch, which brings the throughput costs of the scalarised gather/scatter code much closer to the llvm-mca/uica estimates.
Being able to add custom dialects is one of the big missing pieces of the C API. This change should make it achievable via IRDL. Hopefully this should open custom dialect definition to non-C++ users of MLIR.
Previously, isRoot() would return true for pointers with a base of sizeof(InlineDescriptor), even if the actual metadata size of the pointee was 0.
…lvm#91844) I'm planning to remove StringRef::equals in favor of StringRef::operator==.

- StringRef::operator==/!= outnumber StringRef::equals by a factor of 24 under clang/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for !Long.Expression.equals("str") vs Long.Expression != "str".
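The readability argument can be illustrated with a minimal sketch. It uses `std::string_view` as a stand-in for `llvm::StringRef` so that it compiles without LLVM headers; `equalsLegacy`, `isNotStrLegacy`, and `isNotStr` are hypothetical names invented for this example.

```cpp
#include <string_view>

// Hypothetical free-function stand-in for the member StringRef::equals.
// std::string_view replaces llvm::StringRef here (an assumption made so the
// sketch builds without LLVM).
inline bool equalsLegacy(std::string_view LHS, std::string_view RHS) {
  return LHS.size() == RHS.size() && LHS.compare(RHS) == 0;
}

// Negated comparison written in the equals() style.
inline bool isNotStrLegacy(std::string_view S) {
  return !equalsLegacy(S, "str");
}

// The same check written with the comparison operator.
inline bool isNotStr(std::string_view S) { return S != "str"; }
```

Both helpers return the same result for every input; the operator form simply keeps the negation next to the comparison.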
…llvm#91738) This is a follow-up commit for llvm#90319. The Windows test was disabled in that commit, but it should pass on this operating system. Therefore, it would be beneficial to have it enabled for MS Windows.
This effectively reverts 5cd2804 and changes to QualifierFixerTest.cpp from e62ce1f. Failed buildbots: https://lab.llvm.org/buildbot/#/builders/236/builds/11223 https://lab.llvm.org/buildbot/#/builders/239/builds/6968
Avoid using bitfield in dxbc::ProgramHeader. It could potentially be read incorrectly on any host depending on the compiler. From [C++17's [class.bit]](https://timsong-cpp.github.io/cppwp/n4659/class.bit#1) > Bit-fields are packed into some addressable allocation unit. [ Note: Bit-fields straddle allocation units on some machines and not on others. Bit-fields are assigned right-to-left on some machines, left-to-right on others. — end note ] For llvm#91793
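One common way to avoid depending on the compiler's bit-field layout is to store a plain integer and pack the fields with explicit shifts and masks. The sketch below assumes a hypothetical one-byte layout (minor version in the low nibble, major version in the high nibble); it is not the actual `dxbc::ProgramHeader` definition, and `PackedVersion` is an invented name.

```cpp
#include <cstdint>

// Hypothetical layout, for illustration only: 4-bit minor version in the low
// nibble and 4-bit major version in the high nibble of a single byte.
struct PackedVersion {
  std::uint8_t Raw = 0;

  static PackedVersion make(std::uint8_t Major, std::uint8_t Minor) {
    PackedVersion V;
    // Explicit shifts and masks fix the bit positions regardless of how a
    // given compiler would have laid out bit-fields.
    V.Raw = static_cast<std::uint8_t>(((Major & 0xF) << 4) | (Minor & 0xF));
    return V;
  }

  std::uint8_t getMajor() const { return static_cast<std::uint8_t>(Raw >> 4); }
  std::uint8_t getMinor() const { return static_cast<std::uint8_t>(Raw & 0xF); }
};
```

Because the byte is assembled arithmetically, the stored value is the same on every host compiler.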
…nters"" This reverts commit fb1c2db.
Fix race condition in internal NFC test.
PR llvm#87090 amended `accumulateBitfields` to do the correct clipping. The scissor is no longer necessary and `checkBitfieldClipping` can compute its location directly when needed.
…OpCost directly The generic getCommonMaskedMemoryOpCost now gives the same cost estimates for scalarized gather/scatter.
getGSVectorCost has supported other TargetCostKind since a551272
…#91807) The pass runs a `DataFlowSolver` and collects state information on the input IR. Then, the rewrite driver and folding are applied. During pattern application and folding, an Op from the input IR can be deleted and a new Op created at the same address. When the newly created Op is looked up in the `DataFlowSolver` state memory, the state of the original Op is returned. This patch adds a method to `DataFlowSolver` which removes all state related to a `ProgramPoint`. It also adds a listener to the Pass which clears the state information of deleted Ops from the `DataFlowSolver`. Fix llvm#81228
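The address-reuse hazard and its fix can be sketched with a table keyed by object address. `Op`, `StateTable`, and `eraseState` below are hypothetical stand-ins written for this example; they are not the MLIR `DataFlowSolver` API.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an IR operation.
struct Op {
  int Id;
};

// Analysis state keyed by the address of the op it describes.
class StateTable {
public:
  void set(const Op *Point, std::string State) {
    States[Point] = std::move(State);
  }

  // Returns nullptr when nothing is recorded for this program point.
  const std::string *lookup(const Op *Point) const {
    auto It = States.find(Point);
    return It == States.end() ? nullptr : &It->second;
  }

  // Meant to be called from an "op erased" notification, so that a later op
  // allocated at the same address does not inherit the old state.
  void eraseState(const Op *Point) { States.erase(Point); }

private:
  std::unordered_map<const Op *, std::string> States;
};
```

If `eraseState` were never called, a lookup for a new op that happens to occupy the old address would return the state recorded for the deleted op.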
…ggregate initialization using a default member initializer (llvm#87933) This PR completes [DR1815](https://wg21.link/CWG1815) under the guidance of `FIXME` comments, and reuses the `CXXDefaultInitExpr` rewrite machinery to clone the initializer expression on each use that would lifetime extend its temporaries.

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
) This is probably the most involved addition, as it tries to make use of isTriviallyVectorizable with isVectorIntrinsicWithScalarOpAtArg to handle a number of different intrinsics that are all lane-wise. Additional tests have been added for some of the different intrinsics from isVectorIntrinsicWithScalarOpAtArg / isVectorIntrinsicWithOverloadTypeAtArg.
… YAML Fix an issue where the profile for all branches that have a BRANCHENTRY is dropped. If the branch has an entry in BAT, it will be translated to its input offset. We used to only permit the basic block offset as a branch source. Perform a lookup of containing basic block instead.

Test Plan: Updated bolt-address-translation-yaml.test
Reviewers: maksfb, dcci, rafaelauler, ayermolo
Reviewed By: maksfb
Pull Request: llvm#91273
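A lookup of the containing basic block can be sketched as an ordered-map query on block start offsets. `BlockStarts` and `containingBlock` are hypothetical names for this illustration and do not reflect BOLT's actual data layout.

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>

// Maps each basic block's start offset to a block index (hypothetical
// structure, for illustration only).
using BlockStarts = std::map<std::uint32_t, unsigned>;

// Returns the index of the block whose range contains Offset, or nullopt if
// Offset lies before the first block.
inline std::optional<unsigned> containingBlock(const BlockStarts &Blocks,
                                               std::uint32_t Offset) {
  // upper_bound finds the first block that starts strictly after Offset; the
  // block before it is the one that contains Offset.
  auto It = Blocks.upper_bound(Offset);
  if (It == Blocks.begin())
    return std::nullopt;
  return std::prev(It)->second;
}
```

With this kind of query, a branch source in the middle of a block still resolves to that block instead of being rejected for not sitting exactly on a block start.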
…91846) This is how MSVC handles it. https://godbolt.org/z/fG386bjnf
This reverts commit 0869204, which caused a buildbot failure: https://lab.llvm.org/buildbot/#/builders/5/builds/43322
…st (llvm#89170) This patch makes the following changes:

1. Support ISD FDIV/UDIV/SDIV/UREM/SREM
2. Classify instructions which cost the same
Fix the following buildbot failures by making LangOpts in the unit test static: https://lab.llvm.org/buildbot/#/builders/236/builds/11223 https://lab.llvm.org/buildbot/#/builders/239/builds/6968
…m#90995) GVNSink used to order instructions based on their pointer values and was prone to non-determinism because of that. This patch ensures all the values stored are using a deterministic order. I have also added a verifier (`ModelledPHI::verifyModelledPHI`) to assert when ordering isn't preserved. Additionally, I have added a test case (mirror graph image of an existing test) that would have failed before this patch. Fixes: llvm#77852
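The general idea of replacing pointer-value ordering with a stable key can be sketched as follows. `Inst`, its `Seq` field, and `sortDeterministically` are hypothetical and are not the GVNSink data structures.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical instruction record carrying a stable sequence number assigned
// in program order.
struct Inst {
  unsigned Seq;
  const char *Name;
};

// Sort by the stable sequence number rather than by pointer value, so the
// result does not depend on where the allocator placed each object.
inline void sortDeterministically(std::vector<const Inst *> &Insts) {
  std::sort(Insts.begin(), Insts.end(),
            [](const Inst *A, const Inst *B) { return A->Seq < B->Seq; });
}
```

Sorting the same instructions by raw pointer value could produce a different order from run to run, since addresses depend on allocation.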
…cInstrCost (llvm#89170)" This reverts commit ed16e7a.
…icInstrCost (llvm#89170) Insert a break to fix the implicit-fallthrough caught by sanitizer. Original commit message: This patch makes the following changes:

1. Support ISD FDIV/UDIV/SDIV/UREM/SREM
2. Classify instructions which cost the same
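For readers unfamiliar with this class of bug, here is a minimal sketch of a cost switch where labels with the same cost share one body and each body ends in a `break`. The opcodes and cost values are hypothetical and do not come from the patch.

```cpp
// Hypothetical opcodes and costs, for illustration only.
enum class Opcode { Add, SDiv, UDiv, SRem };

inline int costOf(Opcode Op) {
  int Cost = 0;
  switch (Op) {
  case Opcode::Add:
    Cost = 1;
    break;
  case Opcode::SDiv:
  case Opcode::UDiv: // Grouped labels: both opcodes share the body below.
    Cost = 20;
    break; // Without this break, control would fall into the SRem case.
  case Opcode::SRem:
    Cost = 25;
    break;
  }
  return Cost;
}
```

Dropping the marked `break` would silently make the division opcodes return the remainder cost, which is the kind of mistake an implicit-fallthrough diagnostic reports.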
This avoids 'Permission denied' when PWD is read-only. While here, change the triple from a Linux one to a generic ELF one.
…s in RISCVRegisterInfo::needsFrameBaseReg. Instead of using getReservedRegs, just check the subtarget reserved list. getReservedRegs considers the frame pointer to be reserved when it is being used, but we do need to save/restore it so it should be counted as a callee saved register. AArch64 hardcodes their callee saved size, but the comment mentions the Frame Pointer being counted.
…:needsFrameBaseReg The vector callee saved registers shouldn't affect the frame pointer offset so we don't want to consider them. I've listed the GPR, FPR32, and FPR64 register classes explicitly because getMinimalPhysRegClass is slow and this function is called frequently. So explicitly listing the interesting classes should be a compile time improvement.
The testing we have for vector ptradd was a bit lacking. In adding tests this patch found a couple of issues mostly with the way v3 vectors of ptrs were sometimes legalized via i64, and with non-i64 additions. It does not attempt to fix the issue with mergevalues from returning vector ptrs.
…2016) This is a proof of concept recognition of the most basic forms of ReLU operations, used to showcase sparsification of end-to-end PyTorch models. In the long run, we must avoid lowering such constructs too early (with this need for raising them back). See discussion at https://discourse.llvm.org/t/min-max-abs-relu-recognition-starter-project/78918
Switch from FuncBranchData intermediate maps (Intra/InterIndex) to aggregated Data, same as one used by DataReader: https://github.com/llvm/llvm-project/blob/e62ce1f8842cca36eb14126d79dcca0a85bf6d36/bolt/lib/Profile/DataReader.cpp#L385-L389 This aligns the order of the output between YAMLProfileWriter and writeBATYAML.

Test Plan: updated bolt-address-translation-yaml.test
Reviewers: rafaelauler, dcci, ayermolo, maksfb
Reviewed By: ayermolo, maksfb
Pull Request: llvm#91289
There is nothing specific here and it is not different from i16 or f16.
## Why

Currently, the system header `errno.h` is included in `libc_errno.h`, which is supposed to be consumed by internal implementations only. As unit and hermetic tests should never use `#include <errno.h>` but instead use `#include "src/errno/libc_errno.h"`, we do not want to implicitly include `errno.h`. In order to have a clear separation between those two, we want to pull out the definitions of errno numbers from `errno.h`.

## What

* Extract the definitions of errno numbers from [include/errno.h.def](https://github.com/llvm/llvm-project/pull/91150/files#diff-ed38ed463ed50571b498a5b69039cab58dc9d145da7f751a24da9d77f07781cd) and place it under [include/llvm-libc-macros/linux/error-number-macros.h](https://github.com/llvm/llvm-project/pull/91150/files#diff-d6192866629690ebb7cefa1f0a90b6675073e9642f3279df08a04dcdb05fd892)
* Provide mips-specific errno numbers in [include/llvm-libc-macros/linux/mips/error-number-macros.h](https://github.com/llvm/llvm-project/pull/91150/files#diff-3fd35a4c94e0cc359933e497b10311d857857b2e173e8afebc421b04b7527743)
  * Find definition of mips errno numbers in glibc [here](https://github.com/bminor/glibc/blob/ea73eb5f581ef5931fd67005aa0c526ba43366c9/sysdeps/unix/sysv/linux/mips/bits/errno.h#L32-L50) (equally defined in the Linux kernel)
* Provide sparc-specific errno numbers in [include/llvm-libc-macros/linux/sparc/error-number-macros.h](https://github.com/llvm/llvm-project/pull/91150/files#diff-5f16ffb2a51a6f72ebd4403aca7e1edea48289c99dd5978a1c84385bec4f226b)
  * Find definition of sparc errno numbers in glibc [here](https://github.com/bminor/glibc/blob/ea73eb5f581ef5931fd67005aa0c526ba43366c9/sysdeps/unix/sysv/linux/sparc/bits/errno.h#L33-L51) (equally defined in the Linux kernel)
* Include proxy header `errno_macros.h` instead of the system header `errno.h` in `libc_errno.h`/`libc_errno.cpp`

Closes llvm#80172
…ack ops. (llvm#90641) Windows build of `mlir` with Visual Studio (19.36.32538 for x64) using the following command:

`cmake.exe -GNinja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS=mlir -DLLVM_ENABLE_EH=ON -DLLVM_ENABLE_RTTI=1 -DLLVM_TARGETS_TO_BUILD=host ../llvm`

is leading to a crash when calling canonicalization on `tensor.pack`/`tensor.unpack` ops:

`mlir-opt --canonicalize input.mlir`

where the `input.mlir` is as follows (this is taken from one of the filecheck tests for `tensor.pack`):

```
func.func @pack_unpack(%arg0: tensor<128x256xf32>) -> tensor<128x256xf32> {
  %pack_dest = tensor.empty() : tensor<8x16x8x32xf32>
  %unpack_dest = tensor.empty() : tensor<128x256xf32>
  %tp = tensor.pack %arg0 outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [8, 32] into %pack_dest : tensor<128x256xf32> -> tensor<8x16x8x32xf32>
  %tup = tensor.unpack %tp outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [8, 32] into %unpack_dest : tensor<8x16x8x32xf32> -> tensor<128x256xf32>
  return %tup : tensor<128x256xf32>
}
```

The crash is seemingly coming from invalid memory access during iterating over `innerDimsPos` within `getPackOpResultTypeShape`. This crash is also causing the following tests to fail:

```
MLIR :: Dialect/Linalg/canonicalize.mlir
MLIR :: Dialect/Linalg/data-layout-propagation.mlir
MLIR :: Dialect/Linalg/generalize-tensor-pack-tile.mlir
MLIR :: Dialect/Linalg/generalize-tensor-pack.mlir
MLIR :: Dialect/Linalg/generalize-tensor-unpack-tile.mlir
MLIR :: Dialect/Linalg/generalize-tensor-unpack.mlir
MLIR :: Dialect/Linalg/transform-lower-pack.mlir
MLIR :: Dialect/Linalg/transform-op-fuse.mlir
MLIR :: Dialect/Linalg/transform-op-pack.mlir
MLIR :: Dialect/Linalg/transform-pack-greedily.mlir
MLIR :: Dialect/Tensor/canonicalize.mlir
MLIR :: Dialect/Tensor/fold-into-pack-and-unpack.mlir
MLIR :: Dialect/Tensor/invalid.mlir
MLIR :: Dialect/Tensor/ops.mlir
MLIR :: Dialect/Tensor/simplify-pack-unpack.mlir
MLIR :: Dialect/Tensor/tiling.mlir
```
This fixes the new test linkerscript/enable-non-contiguous-regions.test from llvm#90007 in -stdlib=libc++ -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_DEBUG builds. adjustOutputSections does not discard the output section .potential_a because it contained .a (which would be spilled to .actual_a). .potential_a and .bc have the same address and will cause an assertion failure.
This reapplies llvm@195d8ac [DirectX] Fix DXIL part header version encoding. The endian issue was fixed by llvm@f42117c. Move MinorVersion to be the lower 8 bits. Set DXIL version in DXContainerObjectWriter::writeObject. Fixes llvm#89952
…on Windows This marks delayed-definition-die-searching.test as unsupported on Windows. Clang uses link.exe as the default linker unless it is explicitly told to use lld. When used with link.exe, clang produces PDB format debug info even when -gdwarf is specified. This test will be unsupported until we make the lldb-aarch64-windows buildbot use lld.
Instead of hardcoding all of the register name strings.
This PR:

- Makes `clock_gettime` a header-only library
- Adds a `clock_conversion` header library to allow conversion between clocks relative to the time of call
- Adds a `timeout` header library to manage the absolute timeout used in POSIX's timed locking/waiting APIs
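Conversion "relative to the time of call" can be read as keeping the remaining duration fixed while rebasing the deadline onto another clock. The function below is a sketch under that reading; `convertClock` and its parameters are hypothetical and are not the interface added by the PR.

```cpp
#include <cstdint>

// All values are in nanoseconds. NowFrom and NowTo are readings of the source
// and target clocks taken at (approximately) the same instant.
inline std::int64_t convertClock(std::int64_t Deadline, std::int64_t NowFrom,
                                 std::int64_t NowTo) {
  // The remaining time (Deadline - NowFrom) does not depend on which clock is
  // used, so rebase it onto the target clock's current reading.
  return NowTo + (Deadline - NowFrom);
}
```

A deadline that has already passed on the source clock yields a value earlier than `NowTo`, so it still reads as expired on the target clock.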
) This change improves the matching algorithm by using a diff algorithm. The current matching algorithm only processes the callsites grouped by same-name functions; it doesn't consider the order relationships between functions with different names, so it sometimes fails to handle the ambiguous anchor case. For example (`Foo:1` means a callsite [callee_name: callsite_location]):

```
IR : foo:1 bar:2 foo:4 bar:5
Profile : bar:3 foo:5 bar:6
```

The `foo:1` is matched to the 2nd `foo:5`, and using the diff algorithm (finding the longest common subsequence) can help with this issue.

One well-known diff algorithm is the Myers diff algorithm (paper "An O(ND) Difference Algorithm and Its Variations", Eugene W. Myers). Its variations have been implemented and used in many famous tools, like GNU diff or git diff. It provides an efficient way to find the longest common subsequence or the shortest edit script through graph searching. There are several variations/refinements of the algorithm, but in our case the number of function callsites is usually very small, so we implemented the basic greedy version in this change, which should be good enough.

We observed better matchings and positive perf improvement on our internal services.
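To make the order-aware matching concrete, here is a small sketch that finds a longest common subsequence of callee names. It uses the textbook dynamic-programming formulation rather than the greedy Myers variant described in the change, and `Anchors` and `matchAnchors` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Anchors = std::vector<std::string>;

// Returns (IR index, profile index) pairs for one longest common subsequence
// of callee names. Plain O(N*M) dynamic programming, not the Myers algorithm.
inline std::vector<std::pair<std::size_t, std::size_t>>
matchAnchors(const Anchors &IR, const Anchors &Profile) {
  const std::size_t N = IR.size();
  const std::size_t M = Profile.size();
  // Len[I][J] is the LCS length of the suffixes IR[I..] and Profile[J..].
  std::vector<std::vector<std::size_t>> Len(
      N + 1, std::vector<std::size_t>(M + 1, 0));
  for (std::size_t I = N; I-- > 0;)
    for (std::size_t J = M; J-- > 0;)
      Len[I][J] = IR[I] == Profile[J]
                      ? Len[I + 1][J + 1] + 1
                      : std::max(Len[I + 1][J], Len[I][J + 1]);

  // Walk the table from the front to recover the matched pairs.
  std::vector<std::pair<std::size_t, std::size_t>> Matches;
  std::size_t I = 0, J = 0;
  while (I < N && J < M) {
    if (IR[I] == Profile[J]) {
      Matches.emplace_back(I, J);
      ++I;
      ++J;
    } else if (Len[I + 1][J] >= Len[I][J + 1]) {
      ++I; // Skipping the IR anchor loses nothing.
    } else {
      ++J; // Skipping the profile anchor loses nothing.
    }
  }
  return Matches;
}
```

For the callee sequences in the example above (IR: foo, bar, foo, bar; profile: bar, foo, bar), this pairs IR positions 1, 2, 3 with profile positions 0, 1, 2, leaving the first `foo` unmatched instead of anchoring it to the profile's only `foo`.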
… V/Zve is not enabled. We can't save vector registers without V/Zve.
Patch llvm#91150 added a proxy header for errno macros. This patch fixes the bazel build since it needs to be added as a dependency.
… PRs (llvm#91826) We have been collecting release notes from the PRs for most of the 18.1.x releases and this just helps automate the process.
…ult in LLVM (llvm#89799)"" This reverts commit 91446e2 and a unittest followup 1530f31 (llvm#90476). In a stage-2 -flto=thin -gsplit-dwarf -g -fdebug-info-for-profiling -fprofile-sample-use= build of clang, a ThinLTO backend compile has assertion failures:

    Global is external, but doesn't have external or weak linkage!
    ptr @_ZN5clang12ast_matchers8internal18makeAllOfCompositeINS_8QualTypeEEENS1_15BindableMatcherIT_EEN4llvm8ArrayRefIPKNS1_7MatcherIS5_EEEE
    function declaration may only have a unique !dbg attachment
    ptr @_ZN5clang12ast_matchers8internal18makeAllOfCompositeINS_8QualTypeEEENS1_15BindableMatcherIT_EEN4llvm8ArrayRefIPKNS1_7MatcherIS5_EEEE

The failures somehow go away if -fprofile-sample-use= is removed.
cferry-AMD
approved these changes
Aug 26, 2024