forked from llvm/llvm-project
-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AutoBump] Merge with 76782e28 (needs torch-mlir bump, has downstream PR) (31) #291
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Like with the 'default' clause, this is being applied to only Compute Constructs for now. The 'if' clause takes a condition expression which is used as a runtime value. This is not a particularly complex semantic implementation, as there isn't much to this clause, other than its interactions with 'self', which will be managed in the patch to implement that.
Fix for llvm#88451 Do not perform semantic check about data transfer on assignment statement in device context.
Closes llvm#88066. Compared to before, the function names in the stdbit table are sorted by function name, not order-of-appearance in the standard. Since macros aren't printed by docgen.py and are still a TODO in the code, they are also not printed in the new stdbit.h docs. Adds some checks to docgen.py for conditions that tripped me up. Add code to docgen.py to add the include of the `|check|` rewriter, since all other generated files need it.
Leftover from a previous commit, this ends up not being used, so remove it.
While working on a followup patch, it became clear that this extra bit of 'OpenACC' before each clause name was redundant with the visitors, so remove it to make this a little less verbose.
…esystem` to have it use cached `status` (llvm#88152) As-is, calls to `exists()` fallback on the implementation in `ProxyFileSystem::exists` which explicitly calls out to the underlying `FS`, which for the `DependencyScanningFilesystem` (overlay) is the real underlying filesystem. Instead, directly overloading `exists` allows us to have it rely on the cached `status` behavior used elsewhere by the `DependencyScanningFilesystem`.
Looks like I forgot to do build CIndex.cpp when validating myself!
Running the test-release.sh script with PGO enabled causes build errors like: ld.lld: error: Function Import: link error: linking module flags 'ProfileSummary': IDs have conflicting values I believe this a build system bug due to the PGO profile data being generated unconditionally. If you run `ninja check-all` and then `ninja install` like we do in test-release.sh, then the profile data is regenerated during `ninja install` and some of the clang tools which are not test dependencies get build during the ninja install step with different profile data. When these tools link against the LLVM libraries, like libSupport, we end up with these errors.
…llvm#87533) InstCombine transforms add of 0 to or of 0. For system atomics, this is problematic because while PCIe supports add, it does not support the other operations. Undo this for system scope atomics.
The malloc->calloc fold creates a new MemoryAccess, which may end of at the same address as a previously deleted access inside SkipStores. To the most part, this is not a problem, because SkipStores is normally only used together with MemDefs. Neither the old malloc access nor the new calloc access will be part of MemDefs, so there is no problem here. However, SkipStores is also used in one more place: In the main DSE loop, ToCheck entries are checked against it. Fix this by not using SkipStores here, and instead using a separate set to track deletions inside this loop. This way it is not affected by the calloc optimization that happens outside it. This is all pretty ugly, but I haven't found another good way to fix it. Suggestions welcome. No test case as I don't have a reliable DSE-only test-case for this. Fixes llvm#84458.
…vm#88211) StringMap uses xxHash instead of DJB.
Otherwise the test would fail on Darwin and other platforms that use Itanium ABI but do not support alias/ifunc.
… NFC After inlining, `scanSection` is significantly longer (more than 100+ instructions on x86-64 built with Clang) when `i` does not always increment by one (MIPS).
Enable frame pointer optimization by default to match it with gcc. Fixes: llvm#75013
This patch moves CUDA-related `Sema` function into new `SemaCUDA` class, following the recent example of SYCL, OpenACC, and HLSL. This is a part of the effort to split Sema. Additional context can be found in llvm#82217, llvm#84184, llvm#87634.
This patch implements intrinsic that supports `std::is_pointer_interconvertible_base_of` type trait from [P0466R5](https://wg21.link/p0466r5) "Layout-compatibility and Pointer-interconvertibility Traits". Normative wording: > Comment: If `Base` and Derived are non-union class types and are not (possibly _cv_-qualified) versions of the same type, `Derived` is a complete type. > Condition: `Derived` is unambiguously derived from `Base` without regard to _cv_-qualifiers, and each object of type `Derived` is pointer-interconvertible (6.7.2 [basic.compound]) with its `Base` subobject, or `Base` and `Derived` are not unions and name the same class type without regard to _cv_-qualifiers. The paper also express the following intent: > Note that `is_pointer_interconvertible_base_of_v<T,T>` is always true under this wording, even though `T` is not derived from itself. I find the treatment of unions in the wording contradictory to this intent, and I'm not able to find anything relevant in minutes or on the reflector. That said, this patch implements what the wording says, since it's very explicit about unions.
This patch is breaking `dr` to `cwg` equivalence in our terminology, making room for tests for LWG issues that concern compiler intrinsics.
…tem modules (llvm#88432) Co-authored-by: Alex Lorenz <arphaman@gmail.com>
Some tests in `dr6xx.cpp` were using C11 `_Static_assert`, and were expecting extension warnings in C++98 mode because of that. This is noise, and we can do better than that.
Coding my way home: 4.42841S, 102.96190W
1. Move expansion to combineFP_EXTEND to help with small vectors; 2. Combine FP_ROUND to reduce fptrunc then fpextend after promotion;
This patch uses `getConstantRangeAtUse` to infer nsw/nuw flags with at-use info. It will enables more optimizations in InstCombine. Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=a5ed14bc8e122fa5ac0aa81f8d8390931bd6b4e4&to=a83d3402b663439b91cb37a046fb7ac0220ba5c7&stat=instructions%3Au Related issue: llvm#87854
This patch enables vectorization for non-power-of-2 VFs. Initially only VFs where adding 1 makes the VF a power-of-2, i.e. we can still make relatively effective use of the vectors. It relies on the existing target cost-models to return accurate costs for non-power-of-2 vectors. I checked mostly AArch64 and X86 and there the costs seem reasonable for the costs I checked, although I expect there will be a need to refine both the cost-models and lowering to make most effective use of non-power-of-2 SLP vectorization. Note that re-ordering and shuffling is not implemented for nodes requiring padding yet to keep the initial implementation simpler. The feature is guarded by a new flag, off by defaul for now. PR: llvm#77790
The overhead of taking a std::mutex is much lower than taking a reader lock on a shared mutex, even under heavy contention. The benefit of shared_mutex only occurs as the amount of time spent in the critical sections grows large enough. In our case all we do is read a pointer and return the lock. As a result, using a shared lock can be ~50%-100% slower Here are the results for the provided benchmark on my machine: ``` 2024-04-07T12:48:51-04:00 Running ./libcxx/benchmarks/shared_mutex_vs_mutex.libcxx.out Run on (12 X 400 MHz CPU s) CPU Caches: L1 Data 32 KiB (x6) L1 Instruction 32 KiB (x6) L2 Unified 1024 KiB (x6) L3 Unified 32768 KiB (x1) Load Average: 2.70, 2.70, 1.63 --------------------------------------------------------------------- Benchmark Time CPU Iterations --------------------------------------------------------------------- BM_shared_mutex/threads:1 13.9 ns 13.9 ns 50533700 BM_shared_mutex/threads:2 34.5 ns 68.9 ns 9957784 BM_shared_mutex/threads:4 38.4 ns 137 ns 4987772 BM_shared_mutex/threads:8 51.1 ns 358 ns 1974160 BM_shared_mutex/threads:32 57.1 ns 682 ns 1043648 BM_mutex/threads:1 5.54 ns 5.53 ns 125867422 BM_mutex/threads:2 15.5 ns 30.9 ns 21830116 BM_mutex/threads:4 15.4 ns 57.2 ns 12136920 BM_mutex/threads:8 19.3 ns 140 ns 4997080 BM_mutex/threads:32 20.8 ns 252 ns 2859808 ```
…plemented (llvm#87871) Implemented: https://wg21.link/P0905R1 in 0913ca1 - https://eel.is/c++draft/cmp.weakeq (Removed in https://wg21.link/P1959R0) - https://eel.is/c++draft/cmp.strongeq (Removed in https://wg21.link/P1959R0) - https://eel.is/c++draft/cmp.partialord - https://github.com/llvm/llvm-project/blob/4abb722ffa7fcf809faa4a479fdf2f78c685b351/libcxx/include/__compare/ordering.h#L101 - https://eel.is/c++draft/cmp.weakord - https://github.com/llvm/llvm-project/blob/4abb722ffa7fcf809faa4a479fdf2f78c685b351/libcxx/include/__compare/ordering.h#L173 - https://eel.is/c++draft/cmp.strongord - https://github.com/llvm/llvm-project/blob/4abb722ffa7fcf809faa4a479fdf2f78c685b351/libcxx/include/__compare/ordering.h#L250 References: - https://wg21.link/P1959R0 - Remove `std::weak_equality` and `std::strong_equality`
…issing (llvm#88573) When then linker creates runtime pseudo relocations, it places them in a list with the assumption that the runtime will fix these relocations later, when the image gets loaded. If the relevant runtime function doesn't seem to be present in the linked image, error out. Normally when linking the mingw-w64 runtime libraries, this function always is available. However, if linking without including the mingw-w64 CRT startup files, and the image needs runtime pseudo relocations, make it clear that this won't work as expected at runtime. With ld.bfd, this situation is a hard error too; ld.bfd adds an undefined reference to this symbol if runtime pseudo relocations are needed. A later alternative would be to actually try to pull in the symbol (if seen in a static library, but not included yet). This would allow decoupling the function from the main mingw-w64 CRT startup code (making it optional, only running if the linker actually produced runtime pseudo relocations). Doing that would require restructuring the lld code (gathering pseudo relocations earlier, then loading the relocator function, then pulling in more object files to satisfy the dependencies of the relocator) though. Also, ld.bfd doesn't currently successfully pull in more object files to satisfy the dependency on _pei386_runtime_relocator, so with that in mind, there's not much extra value in making LLD do it currently either; we can't make such a change in mingw-w64's CRT until both linkers handle it. This fixes one issue brought up in llvm#84424.
We rename `TuneNoStripWSuffix` to `TunePreferWInst`. If all the users of an instruction just use the low 32 bits, we can convert it to its W variant. A quick test on Coremark (`-O3 -march=rv64gc`): | | W instructions | code size(.text) | |--------|----------------|------------------| | before | 302 | 12257 | | after | 343 | 12265 | | | +13.58% | +0.065% | Reviewers: asb, dtcxzyw, preames, lukel97, michaelmaitland, topperc Reviewed By: topperc, dtcxzyw Pull Request: llvm#87237
Close llvm#87609 We tried to profile the body of the lambda expressions in https://reviews.llvm.org/D153957. But as the original comments show, it is indeed dangerous. After we tried to skip calculating the ODR hash values recently, we have fall into this trap twice. So in this patch, I choose to not profile the body of the lambda expression. The signature of the lambda is still profiled.
This patch changes the preprocessor directives surrounding getCurrentTID, particularly moving it out of the block that is only defined when not building for Android. The getCurrentTID function is called in places that only require Linux definitions, so this function should have the same preprocessor scoping around it to prevent link time failures.
This PR includes: * vadd.vv/vand.vv/vor.vv/vxor.vv * vmseq.vv/vmsne.vv * vmin.vv/vminu.vv/vmax.vv/vmaxu.vv * vmul.vv/vmulh.vv/vmulhu.vv * vwadd.vv/vwaddu.vv * vwmul.vv/vwmulu * vwmacc.vv/vwmaccu.vv * vadc.vvm There is no test change, I may add it later. Fixes part of llvm#64422 Reviewers: michaelmaitland, preames, lukel97, topperc, asb Reviewed By: topperc, lukel97 Pull Request: llvm#88379
Functions inside __cpu_traits were needlessly prefixed with __parallel, which doesn't serve a real purpose anymore now that they are inside a traits class.
https://bugs.llvm.org/show_bug.cgi?id=51559 llvm#50901 IPSCCP pass removes the global variable and does not create a constant expression for the initializer value. Extend test coverage to include: - half, bfloat types. - checks for undef (int32 and ptr). There is no support for: - fp128, x86_fp80, ppc_fp128 types. llvm#88102
This commit ensures that the `CallDescription`s in `MallocChecker` are matched with the mode `CDM::CLibrary`, so: - they don't match methods or functions within user-defined namespaces; - they also match builtin variants of these functions (if any), so the checker can model `__builtin_alloca()` like `alloca()`. This change fixes llvm#81597. New tests were added to verify that `std::malloc` and `std::free` (from `<cstdlib>`) are modeled, but a method that's named e.g. `free` isn't confused with the memory release function. The responsibility for modeling `__builtin_alloca` and `__builtin_alloca_with_align` was moved from `BuiltinFunctionChecker` to `MallocChecker`, to avoid buggy interactions between the checkers and ensure that the builtin and non-builtin variants are handled by exactly the same logic. This change might be a step backwards for the users who don't have `unix.Malloc` enabled; but I suspect that `__builtin_alloca()` is so rare that it would be a waste of time to implement backwards compatibility for them. There were several test files that relied on `__builtin_alloca()` calls to get an `AllocaRegion`, these were modified to enable `unix.Malloc`. One of these files (cxx-uninitialized-object-ptr-ref.cpp) had some tests that relied on the fact that `malloc()` was treated as a "black box" in them, these were updated to use `calloc()` (to get initialized memory) and `free()` (to avoid memory leak reports). While I was developing this change, I found a very suspicious assert in `MallocChecker`. As it isn't blocking the goals of this commit, I just marked it with a FIXME, but I'll try to investigate and fix it in a follow-up change.
llvm#87980) This commit generalizes and cleans up the `ValueBoundsConstraintSet` API. The API used to provide function overloads for comparing/computing bounds of: - index-typed SSA value - dimension of shaped value - affine map + operands This commit removes all overloads. There is now a single entry point for each `compare` variant and each `computeBound` variant. These functions now take a `Variable`, which is internally represented as an affine map and map operands. This commit also adds support for computing bounds for an affine map + operands. There was previously no public API for that.
…m#88751) Fixes llvm#88478 Promoting the `EHCleanup` to `NormalAndEHCleanup` in `EmitCallArgs` surfaced another bug with deactivation of normal cleanups. Here we missed emitting CPP scope ends for deactivated normal cleanups. This patch also fixes that bug. We missed emitting CPP scope ends because we remove the `fallthrough` (clears the insertion point) before deactivating normal cleanups. This is to make the emitted "normal" cleanup code unreachable. But we still need to emit CPP scope ends in the original basic block even for a deactivated normal cleanup. (This worked correctly before we did not remove `fallthrough` for `EHCleanup`s).
We were calling classfiyPrim() instead of classify().
This gives us one extra SGPR to play with. The comment suggested that it could cause bugs, but I have tested it with Vulkan CTS with the default wave size for compute shaders set to 32 and did not find any problems.
…Y) & NegPow2C` (llvm#88859) Alive2: https://alive2.llvm.org/ce/z/NFtkSX This optimization will be beneficial to jemalloc users.
llvm-project/mlir/test/lib/Dialect/Test/TestDialect.cpp:597:31: error: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int64_t' (aka 'long') [-Werror,-Wsign-compare] if (getVarOperands().size() != expectedNumOperands) ~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~ 1 error generated.
This patch updates the definition of `omp.distribute` to enforce the restrictions of a wrapper operation.
Re-land llvm#88395 Two build-bots were broken by the old version: - https://lab.llvm.org/buildbot/#/builders/285/builds/245 - https://lab.llvm.org/buildbot/#/builders/21/builds/96988 The problem in both cases was that the compiler did not support `std::filesystem` (which I use in the unit test). I have removed the dependency upon std::filesystem because there isn't an easy way to add the right linker options so that this is supported correctly in all build environments [1] [1] https://gitlab.kitware.com/cmake/cmake/-/issues/17834 --- This is a GNU extension: https://gcc.gnu.org/onlinedocs/gfortran/ACCESS.html Used in SALMON: https://salmon-tddft.jp/download.html Unfortunately the intrinsic takes a file path to operate on so there isn't an easy way to make the test robust. The unit test expects to be able to create, set read write and execute permissions, and delete files called std::filesystem::temp_directory_path() / <test_name>.<pid> The test will fail if a file already exists with that name. I have not implemented the intrinsic on Windows because this is wrapping a POSIX system call and Windows doesn't support all of the permission bits tested by the intrinsic. I don't have a Windows machine easily available to check if Gfortran implements this intrinsic on Windows.
The old code was replaced by llvm#80019.
mgehre-amd
approved these changes
Aug 20, 2024
mgehre-amd
changed the title
[AutoBump] Merge with 76782e28 (31)
[AutoBump] Merge with 76782e28 (needs torch-mlir bump) (31)
Aug 20, 2024
mgehre-amd
changed the title
[AutoBump] Merge with 76782e28 (needs torch-mlir bump) (31)
[AutoBump] Merge with 76782e28 (needs torch-mlir bump, has downstream PR) (31)
Aug 20, 2024
An error occurred while trying to automatically change base from
bump_to_be006372
to
feature/fused-ops
August 21, 2024 20:25
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This goes up to April 16.
Needs Xilinx/torch-mlir#240