[AutoBump] Merge with 5ec73b7d (Aug 21) (8) #361

mgehre-amd · 2024-09-20T15:34:53Z

No description provided.

The original implementation provided a simple method to check whether the forest of nested cycles is well-formed. This is now augmented with other methods to check well-formedness of all cycles, either individually or as the entire forest. These will be used by future transforms that modify CycleInfo.

Use the nuw attribute of GEPs to prove that pointers do not alias, in cases matching the following layout: the V2 access starts at LHS and covers V2Size bytes, the V1 access starts at RHS and covers V1Size bytes, and RHS = LHS + BaseOffset +<nuw> Indices. If the difference between the pointers is Offset +<nuw> Indices, then we know that the addition does not wrap the pointer index type (add nuw) and the constant Offset is a lower bound on the distance between the pointers. We can then prove NoAlias via Offset u>= V2Size.
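A minimal sketch of the argument, assuming offsets are modeled as plain unsigned byte counts and that `Indices` is already known not to wrap; `provablyNoAlias` and `overlaps` are hypothetical helpers, not BasicAA functions:

```cpp
#include <cassert>
#include <cstdint>

// True if the half-open byte ranges [StartA, StartA + SizeA) and
// [StartB, StartB + SizeB) share at least one byte.
bool overlaps(uint64_t StartA, uint64_t SizeA, uint64_t StartB, uint64_t SizeB) {
  return StartA < StartB + SizeB && StartB < StartA + SizeA;
}

// Hypothetical check: RHS == LHS + BaseOffset +<nuw> Indices, so the
// distance RHS - LHS is at least BaseOffset. The access at LHS covers
// V2Size bytes; if BaseOffset u>= V2Size it ends at or before RHS.
bool provablyNoAlias(uint64_t BaseOffset, uint64_t V2Size) {
  return BaseOffset >= V2Size;
}
```

Note that `V1Size` never enters the check: only the lower access has to end before the upper one begins.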

…102613) After decomposition of OpenMP compound constructs and assignment of applicable clauses to each leaf construct, composite constructs are then combined again into a single element in the construct queue. This helped later lowering stages easily identify composite constructs. However, as a result of the re-composition stage, the same list of clauses is used to produce all MLIR operations corresponding to each leaf of the original composite construct. This undoes existing logic introducing implicit clauses and deciding to which leaf construct(s) each clause applies. This patch removes construct re-composition logic and updates Flang lowering to be able to identify composite constructs from a list of leaf constructs. As a result, the right set of clauses is produced for each operation representing a leaf of a composite construct. PR stack: - llvm#102612 - llvm#102613

…vm#104595) This new interface is supposed to capture the core functionality of DLTI: querying for values at keys. As such, this new interface unifies the ability to query DLTI attributes in a single method: query(). All existing DLTI interfaces exposing their own query methods now 1) extend this new interface and 2) provide a default implementation for `query()`. As DLTIQueryInterface::query() returns an attribute, it naturally enables recursive queries on nested DLTI attrs. A utility function, `dlti::query()`, implements the logic for nested lookups. A new `#dlti.map` attribute is introduced to capture the most generic form of a finite DLTI-mapping. One of the benefits is that it makes it easier to encode hierarchical information that is suitably queryable, i.e. by means of nested attributes. In line with the above, `transform.dlti.query` is modified so as to take an arbitrary number of keys and to perform a nested lookup using the above utility function.
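To illustrate the nested-lookup idea, here is a toy model in which each entry is either a leaf value or a map to further entries. `Entry`, `query`, and the key names are hypothetical and do not correspond to the MLIR `DLTIQueryInterface` or `dlti::query()` signatures:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a DLTI-style attribute: a leaf carries a
// Value, a map carries Children keyed by name.
struct Entry {
  std::optional<int> Value;
  std::map<std::string, std::shared_ptr<Entry>> Children;
};

// Resolve each key in turn against the current entry, descending one
// level per key; a missing key at any level yields no result.
std::optional<int> query(const Entry &Root, const std::vector<std::string> &Keys) {
  const Entry *Cur = &Root;
  for (const std::string &Key : Keys) {
    auto It = Cur->Children.find(Key);
    if (It == Cur->Children.end())
      return std::nullopt;
    Cur = It->second.get();
  }
  return Cur->Value;
}
```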

…atalyst (llvm#104872) Mac Catalyst is the iOS platform, but it builds against the macOS SDK and so it needs to be checking the macOS SDK version instead of the iOS one. Add tests against a greater-than SDK version just to make sure this works beyond the initially supporting SDKs.

…lvm#102300)" This reverts commit b432afc. Reverted due to linker failures in expensive-checks.

…#104805) This extends SimplifyCFG hoisting to also hoist instructions with commuted operands, for example a+b on one side and b+a on the other side. This should address the issue mentioned in: llvm#91185 (comment)
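A minimal sketch of the matching idea, using a hypothetical three-field instruction model rather than LLVM's `Instruction` class: two instructions can be hoisted together if they are identical, or if the opcode is commutative and the operands appear in swapped order.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified instruction: an opcode and two named operands.
struct BinOp {
  char Opcode; // '+', '*', '-', ...
  std::string LHS, RHS;
};

bool isCommutative(char Opcode) { return Opcode == '+' || Opcode == '*'; }

// Identical, or the same commutative opcode with swapped operands.
bool isIdenticalOrCommuted(const BinOp &A, const BinOp &B) {
  if (A.Opcode != B.Opcode)
    return false;
  if (A.LHS == B.LHS && A.RHS == B.RHS)
    return true;
  return isCommutative(A.Opcode) && A.LHS == B.RHS && A.RHS == B.LHS;
}
```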

Avoids implicit sint_to_fp which wasn't occurring on strict fp codegen Fixes llvm#104848

…cy zero (llvm#102915) A long time ago (back in 2009) there was a commit 52d4d82 that changed the scheduler to not dirty height/depth when adding or removing SUnit predecessors when the latency on the edge was zero. That commit message claims that the depth or height isn't affected when the latency is zero. As a matter of fact, the depth/height can change even with a zero latency on the edge. For example, if a new SUnit A is added as a predecessor of SUnit B with a zero-latency edge, then both the height of A and the depth of B should be marked as dirty: if B has a greater height than A, the height of A needs to be adjusted even though the latency is zero. I think this has been wrong for many years. Downstream we have had commit 52d4d82 reverted since back in 2016. There is no motivating lit test for 52d4d82 (only an incomplete C-level reproducer in llvm#3613). After commit 13d04fa there finally appeared an upstream lit test that shows that we get better code if marking height/depth as dirty (llvm/test/CodeGen/AArch64/abds.ll).
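To see why a zero-latency edge can still change a height, here is a minimal sketch assuming height is the longest latency-weighted path from a node to an exit node. `Node`, `Edge`, and `height` are hypothetical stand-ins, not LLVM's `SUnit`/`SDep` API:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical scheduling DAG: each node lists its successor edges.
struct Edge {
  int Succ;
  unsigned Latency;
};
struct Node {
  std::vector<Edge> Succs;
};

// Height = max over successors of (successor height + edge latency);
// a node with no successors has height 0.
unsigned height(const std::vector<Node> &G, int N) {
  unsigned H = 0;
  for (const Edge &E : G[N].Succs)
    H = std::max(H, height(G, E.Succ) + E.Latency);
  return H;
}
```

With nodes A = 0, B = 1, C = 2 and an edge B -> C of latency 5, adding a zero-latency edge A -> B raises A's height from 0 to 5, because A inherits B's height.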

…lvm#104781) E.g.: https://godbolt.org/z/G8zK5svjK Based on Evgenii's work.

This change does two kinds of splits: - Splits each target into a different file. Some targets are left in the same files, such as riscv32/64 and x86/_64 as these tests and lists are very similar. - Splits up the very long 'note:' lines which contain a list of CPUs, using `CHECK-SAME`. There was a note about this not being possible before, but with `{{^}}`, this is now possible -- I have verified that this does the right thing if a single CPU anywhere in the list is left out. These tests had become quite annoying to change when adding a CPU, and I believe this change makes these easier to maintain, and should cut down on conflicts in these files (or at least makes conflicts easier to resolve). I apologise in advance for downstream conflicts, but hopefully that's a small amount of short term pain, in return for fewer conflicts in future.

Small PR to add additional getters for LLVMContextRef in the C API.

…lvm#104775) Another upstreaming of C API extensions we have in Julia/LLVM.jl. Although [we went](maleadt/LLVM.jl#431) with a string-based API there, here I'm proposing something that's similar to existing metadata/attribute APIs: - explicit functions to map syncscope names to IDs, and back - `LLVM*SyncScope` versions of builder APIs that already take a `SingleThread` argument: atomic rmw, atomic xchg, fence - `LLVMGetAtomicSyncScopeID` and `LLVMSetAtomicSyncScopeID` for other atomic instructions - testing through `llvm-c-test`'s `--echo` functionality

Add a hint to use the no-verify-fixpoint option.

These are annoying to update, and are redundant since the tests in clang/test/Driver/print-enabled-extensions/ were added.

@davemgreen

Patterns were previously added to allow the following reductions - fminimum(abs(a), abs(b)) -> famin(a, b) - fmaximum(abs(a), abs(b)) -> famax(a, b) - llvm#103027 It was suggested by @davemgreen that the following reductions are also possible - fminnum[nnan](abs(a), abs(b)) -> famin(a, b) - fmaxnum[nnan](abs(a), abs(b)) -> famax(a, b) ('nnan' documentation: https://llvm.org/docs/LangRef.html#fast-math-flags) The 'no NaNs' flag allows optimisations to assume that neither argument is a NaN, and so the differing NaN propagation semantics of llvm.maxnum/llvm.minnum and FAMAX/FAMIN can be ignored in this reduction. (llvm.maxnum/llvm.minnum: https://llvm.org/docs/LangRef.html#llvm-minnum-intrinsic) - Changes to LLVM - lib/Target/AArch64/AArch64InstrInfo.td - add 'fminnm_nnan' and 'fmaxnm_nnan'; patfrags on fminnm/fmaxnm that are predicated on the intrinsic call having the 'nnan' flag. - add AArch64famin and AArch64famax patfrags, containing the new and existing reductions. - test/CodeGen/AArch64/aarch64-neon-faminmax.ll - add positive and negative tests for the new reduction, based on the presence of 'nnan' in the IR intrinsic call.
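The following sketch models why the `nnan` flag is needed. It assumes `std::fmax` as a stand-in for `llvm.maxnum` (both return the non-NaN operand when exactly one operand is NaN) and a hypothetical `famaxModel` for NaN-propagating FAMAX behaviour; it is not the AArch64 lowering itself:

```cpp
#include <cassert>
#include <cmath>

// llvm.maxnum-like behaviour on absolute values: a single NaN operand
// is ignored in favour of the other operand.
double fmaxnumAbs(double A, double B) { return std::fmax(std::fabs(A), std::fabs(B)); }

// Hypothetical NaN-propagating absolute maximum: any NaN input yields NaN.
double famaxModel(double A, double B) {
  if (std::isnan(A) || std::isnan(B))
    return NAN;
  return std::fmax(std::fabs(A), std::fabs(B));
}
```

For NaN-free inputs the two functions agree, which is exactly the guarantee `nnan` provides; with a NaN operand they differ, so the fold is only valid under the flag.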

This patch moves utilities from `offload/plugins-nextgen/amdgpu/utils/UtilitiesRTL.h` to `llvm/Frontend/Offloading/Utility.h` to be reused by other projects. Concretely the following changes were made: - Rename `KernelMetaDataTy` to `AMDGPUKernelMetaData`. - Remove unused fields `KernelObject`, `KernelSegmentSize`, `ExplicitArgumentCount` and `ImplicitArgumentCount` from `AMDGPUKernelMetaData`. - Return the produced error if `ELFObj.sections()` failed instead of using `cantFail`. - Added `AGPRCount` field to `AMDGPUKernelMetaData`. - Added a default invalid value to all the fields in `AMDGPUKernelMetaData`.

…vm#104692) Inline asm operands could contain any kind of relocation, so remove the checks. Fixes llvm#103493

…e_map (llvm#104918) This test is already disabled for Windows because of symlinks. Disable it for cross build on Windows host too.

This change looks for instruction sequences that store a 64-bit constant whose upper and lower 32-bit halves are identical (symmetric). Such a sequence usually consists of several `MOV` instructions and at most one `ORR`. If one is found, only the lower 32-bit constant is materialized, and a single `STP` instruction stores it to both halves. For example, `renamable $x8 = MOVZXi 49370, 0`, `renamable $x8 = MOVKXi $x8, 320, 16`, `renamable $x8 = ORRXrs $x8, $x8, 32`, `STRXui killed renamable $x8, killed renamable $x0, 0` becomes `$w8 = MOVZWi 49370, 0`, `$w8 = MOVKWi $w8, 320, 16`, `STPWi killed renamable $w8, killed renamable $w8, killed renamable $x0, 0`. Related issue: llvm#51483
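A minimal sketch of the symmetry test applied to the constant built by the MOVZ/MOVK sequence in the example, assuming the pass only needs to compare the two 32-bit halves; `symmetricLowHalf` is a hypothetical helper, not the AArch64 pass code:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Returns the low 32-bit half if it equals the high half, i.e. if storing
// the low half twice writes the same 8 bytes as storing the full value.
std::optional<uint32_t> symmetricLowHalf(uint64_t Imm) {
  uint32_t Lo = static_cast<uint32_t>(Imm);
  uint32_t Hi = static_cast<uint32_t>(Imm >> 32);
  if (Lo != Hi)
    return std::nullopt;
  return Lo;
}
```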

…lvm#102300)" This reverts commit 4aacc60. The original implementation provided a simple method to check whether the forest of nested cycles is well-formed. This is now augmented with other methods to check well-formedness of every cycle, either individually, or as the entire forest. These will be used by future transforms that modify CycleInfo.

This extends the existing sxtw peephole optimization (llvm#96293) to uxtw, which in LLVM is an ORRWrr that clears the top bits. Fixes llvm#98481

…odel=aggressive (llvm#100453) This change modifies -ffp-model=fast to select options that more closely match -funsafe-math-optimizations, and introduces a new model, -ffp-model=aggressive, which matches the existing behavior (except for a minor change in the fp-contract behavior). The primary motivation for this change is to make -ffp-model=fast more user-friendly, particularly in light of LLVM's aggressive optimizations when -fno-honor-nans and -fno-honor-infinities are used. This was previously proposed here: https://discourse.llvm.org/t/making-ffp-model-fast-more-user-friendly/78402

Based on worst-case llvm-mca numbers for CTLZ/CTTZ(+ZERO_UNDEF) codegen. Prep work for llvm#102885

Reland [CGData] llvm-cgdata llvm#89884 using `Opt` instead of `cl` - Action options are required: `--convert`, `--show`, or `--merge`. These are similar to the previously implemented sub-commands, but take a `--` prefix. - A `--format` option is added, which specifies `text` or `binary`. Co-authored-by: Kyungwoo Lee <kyulee@fb.com>

This breaks using -passes=atomic-expand (but only sometimes?). Somehow an AtomicExpand pass ends up running without a TargetMachine, despite always being constructed with one.

A new section of a test is failing on aarch64 and ppc64le; disable it while I sort things out.

CFI programs may have more saves than restores and this is completely benign from BOLT's perspective. Reduce the verbosity and print the warning only under `-v=1` and above.

…ich can trap (llvm#105214) This allows the use of a single wider operation with a restricted EVL instead of having to split and cover via decreasing power-of-two sizes. On RISCV, this avoids the need for a bunch of vslidedown and vslideup instructions to extract subvectors, and VL toggles to switch between the various widths. Note there is a potential downside of using vp nodes; we lose any generic DAG combines which might have applied to the split form.
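For a sense of what the split form costs, this sketch counts the power-of-two pieces needed to cover N elements (one piece per set bit of N, largest first); a single VP operation with EVL = N replaces all of them. `powerOfTwoPieces` is a hypothetical illustration, not the SelectionDAG code:

```cpp
#include <cassert>
#include <vector>

// Decompose N into decreasing powers of two, one per set bit of N.
std::vector<unsigned> powerOfTwoPieces(unsigned N) {
  std::vector<unsigned> Pieces;
  for (unsigned Bit = 1u << 31; Bit != 0; Bit >>= 1)
    if (N & Bit)
      Pieces.push_back(Bit);
  return Pieces;
}
```

For a 7-element operation the split form needs pieces of 4, 2, and 1 elements (plus the subvector extracts and VL changes between them), whereas the VP form is one operation.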

Previously the libc startup code was marked `EXCLUDE_FROM_ALL` due to build issues. This patch removes that as no longer necessary.

Nowadays, an ASM_MASM tool is required for building the BLAKE3 assembly in llvm/lib/Support - the llvm-ml tool can do this.

In llvm#100024 we moved from safe_load to load for reading the yaml in newheadergen due to dependency issues. Those should be resolved by now so this should be a simple safety improvement.

Rework `IntrinsicEmitter::EmitIntrinsicToBuiltinMap` to improve performance and refactor the code. Performance: - The currently generated code does a linear search on the TargetPrefix, followed by a binary search on the builtin names for that target's builtins. - Improve the performance of this code in 2 ways: (a) Use binary search on the target prefix to look up the builtin table for the target. (b) Improve the (common) case where all builtins for a target share a common prefix: check this common prefix first, and then do the binary search in the builtin table using the builtin name with the common prefix removed. This should help both data size (by creating a smaller static string table) and runtime (by reducing the cost of binary search on smaller strings). Refactor: - Use range-based for loops for iterating over maps. - Use formatv() and C++ raw string literals to simplify the emission code. - Change the generated `getIntrinsicForClangBuiltin` and `getIntrinsicForMSBuiltin` to take a `StringRef` instead of `const char *` for the prefix.
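A minimal sketch of the two-level lookup described here, using hypothetical table types, builtin names, and intrinsic IDs rather than the code TableGen actually emits. It shows (a) a binary search on the target prefix and (b) a common-prefix check followed by a binary search on the shorter suffix:

```cpp
#include <algorithm>
#include <cassert>
#include <string_view>
#include <vector>

// Hypothetical layout: targets sorted by prefix; each builtin table is
// sorted by the name suffix that remains after the common prefix.
struct BuiltinEntry {
  std::string_view Suffix;
  int IntrinsicID;
};
struct TargetTable {
  std::string_view TargetPrefix;
  std::string_view CommonPrefix;
  std::vector<BuiltinEntry> Entries;
};

// Returns 0 (standing in for "no intrinsic") when nothing matches.
int lookup(const std::vector<TargetTable> &Targets, std::string_view Target,
           std::string_view Name) {
  // (a) Binary search on the target prefix.
  auto T = std::lower_bound(Targets.begin(), Targets.end(), Target,
                            [](const TargetTable &TT, std::string_view P) {
                              return TT.TargetPrefix < P;
                            });
  if (T == Targets.end() || T->TargetPrefix != Target)
    return 0;
  // (b) Check the shared prefix once, then search on the remaining suffix.
  if (Name.substr(0, T->CommonPrefix.size()) != T->CommonPrefix)
    return 0;
  Name.remove_prefix(T->CommonPrefix.size());
  auto E = std::lower_bound(T->Entries.begin(), T->Entries.end(), Name,
                            [](const BuiltinEntry &BE, std::string_view S) {
                              return BE.Suffix < S;
                            });
  if (E == T->Entries.end() || E->Suffix != Name)
    return 0;
  return E->IntrinsicID;
}
```

Storing only suffixes is what shrinks the string table, and comparing shorter strings is what makes each binary-search step cheaper.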

Don't try to fold x87 extended precision operations in a test unless it's targeting x86-64.

The mismatch between the comment on this test and the test itself was pointed out in llvm#100699 (comment), but apparently I failed to actually commit the fix.

…a `cold` function Closes llvm#101298

This recently added test is failing on Windows with:
```
c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\lldb.exe --no-lldbinit -S C:/Users/tcwg/llvm-worker/lldb-aarch64-windows/build/tools/lldb\test\Shell\lit-lldb-init-quiet C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\Expr\Output\TestAnonNamespaceParamFunc.cpp.tmp -o run -o "expression func(a)" -o exit | c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\filecheck.exe C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\Expr\TestAnonNamespaceParamFunc.cpp
executed command: 'c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\lldb.exe' --no-lldbinit -S 'C:/Users/tcwg/llvm-worker/lldb-aarch64-windows/build/tools/lldb\test\Shell\lit-lldb-init-quiet' 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\Expr\Output\TestAnonNamespaceParamFunc.cpp.tmp' -o run -o 'expression func(a)' -o exit
.---command stderr------------
| TestAnonNamespaceParamFunc.cpp.tmp :: Class 'tagARRAYDESC' has a member 'tdescElem' of type 'tagTYPEDESC' which does not have a complete definition.error: TestAnonNamespaceParamFunc.cpp.tmp :: Class 'tagARRAYDESC' has a member 'tdescElem' of type 'tagTYPEDESC' which does not have a complete definition.
| (lldb) TestAnonNamespaceParamFunc.cpp.tmp :: Class 'std::partial_ordering' has a member 'less' of type 'std::partial_ordering' which does not have a complete definition.error: TestAnonNamespaceParamFunc.cpp.tmp :: Class 'std::partial_ordering' has a member 'less' of type 'std::partial_ordering' which does not have a complete definition.
| (lldb) TestAnonNamespaceParamFunc.cpp.tmp :: Class 'std::strong_ordering' has a member 'less' of type 'std::strong_ordering' which does not have a complete definition.error: TestAnonNamespaceParamFunc.cpp.tmp :: Class 'std::strong_ordering' has a member 'less' of type 'std::strong_ordering' which does not have a complete definition.
| (lldb) TestAnonNamespaceParamFunc.cpp.tmp :: Class 'std::weak_ordering' has a member 'less' of type 'std::weak_ordering' which does not have a complete definition.error: TestAnonNamespaceParamFunc.cpp.tmp :: Class 'std::weak_ordering' has a member 'less' of type 'std::weak_ordering' which does not have a complete definition.
| (lldb) error: Couldn't look up symbols:
| int func(struct `anonymous namespace'::InAnon)
| Hint: The expression tried to call a function that is not present in the target, perhaps because it was optimized out by the compiler.
`-----------------------------
executed command: 'c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\filecheck.exe' 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\Expr\TestAnonNamespaceParamFunc.cpp'
.---command stderr------------
| C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\Expr\TestAnonNamespaceParamFunc.cpp:10:11: error: CHECK: expected string not found in input
| // CHECK: (int) $0 = 15
|           ^
| <stdin>:16:26: note: scanning from here
| (lldb) expression func(a)
|                          ^
```
So the function is still not callable. But AFAICT, this is not a regression, since this function wasn't callable prior to the patch anyway. I currently do not have a Windows setup to test this on, so XFAIL for now.

Remove widenToNextPow2 from StoreActions. Reorder clampScalar and lowerIfMemSizeNotByteSizePow2 for StoreActions. These match AArch64 and got me further on a test case I was playing with that contained an i129 store.

…104523) Compilers and language runtimes often use helper functions that are fundamentally uninteresting when debugging anything but the compiler/runtime itself. This patch introduces a user-extensible mechanism that allows for these frames to be hidden from backtraces and automatically skipped over when navigating the stack with `up` and `down`. This does not affect the numbering of frames, so `f <N>` will still provide access to the hidden frames. The `bt` output will also print a hint that frames have been hidden. My primary motivation for this feature is to hide thunks in the Swift programming language, but I'm including an example recognizer for `std::function::operator()` that I wished for myself many times while debugging LLDB. rdar://126629381 Example output. (Yes, my proof-of-concept recognizer could hide even more frames if we had a method that returned the function name without the return type or I used something that isn't based off regex, but it's really only meant as an example). 
before:
```
(lldb) thread backtrace --filtered=false
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10
    frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25
    frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12
    frame #3: 0x0000000100003968 a.out`std::__1::__function::__alloc_func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()[abi:se200000](this=0x000000016fdff280, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:171:12
    frame #4: 0x00000001000026bc a.out`std::__1::__function::__func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()(this=0x000000016fdff278, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:313:10
    frame #5: 0x0000000100003c38 a.out`std::__1::__function::__value_func<int (int, int)>::operator()[abi:se200000](this=0x000000016fdff278, __args=0x000000016fdff224, __args=0x000000016fdff220) const at function.h:430:12
    frame #6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10
    frame #7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10
    frame #8: 0x0000000183cdf154 dyld`start + 2476
(lldb)
```
after:
```
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10
    frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25
    frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12
    frame #6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10
    frame #7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10
    frame #8: 0x0000000183cdf154 dyld`start + 2476
Note: Some frames were hidden by frame recognizers
```
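The key property, hiding frames without renumbering the ones that remain (frames #3 to #5 disappear above, yet #6 keeps its number), can be sketched as follows. This is a hypothetical model using `std::regex` on plain function-name strings, not LLDB's frame recognizer interface:

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Keep each visible frame paired with its original index, so that
// selecting frame N still refers to the same frame after filtering.
std::vector<std::pair<std::size_t, std::string>>
visibleFrames(const std::vector<std::string> &Frames, const std::regex &Hide) {
  std::vector<std::pair<std::size_t, std::string>> Out;
  for (std::size_t I = 0; I < Frames.size(); ++I) {
    if (std::regex_search(Frames[I], Hide))
      continue; // hidden from the listing, but its index is not reused
    Out.emplace_back(I, Frames[I]);
  }
  return Out;
}
```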

…entwiseOpFusion (llvm#104409) This commit changes the getPreservedProducerResults function so that it takes the consumer into account along with the producer, in order to predict which of the producer’s outputs can be dropped during the fusion process. It provides a more accurate prediction, considering that the fusion process also depends on the consumer.

This wires up dxil-op-lower, dxil-intrinsic-expansion, dxil-translate-metadata, and dxil-pretty-printer to the new pass manager, both as a matter of future proofing the backend and so that they can be used more flexibly in tests. A few arbitrary tests are updated in order to test the new PM path, and we drop the "print-dxil-resource-md" pass since it's redundant with the pretty printer. Pull Request: llvm#104250

It didn't crash so I thought this worked now, but upon further review it miscalculates the stack address for the return.

Specifically, to illustrate our general lowering strategy for non-power of two vectors.

…vm#104826) With -fsanitize-cfi-icall-experimental-normalize-integers, Clang appends ".normalized" to KCFI types in CodeGenModule::CreateKCFITypeId, which changes type hashes also for functions that don't have integer types in their signatures. However, llvm::setKCFIType does not take integer normalization into account, which means LLVM generated functions with KCFI types, e.g. sanitizer constructors, will fail KCFI checks when integer normalization is enabled in Clang. Add a cfi-normalize-integers module flag to indicate integer normalization is used, and append ".normalized" to KCFI types also in llvm::setKCFIType to fix the type mismatch.

For the tests I just added +sve instead of what actual hardware has, which is only SME, since otherwise all the test functions need to be marked as streaming mode. rdar://121864771

…Lowering::lowerReturn. NFC This is similar to X86 and AArch64 structure.

/llvm-project/mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp:124:7: error: ignoring return value of function declared with 'nodiscard' attribute [-Werror,-Wunused-result] opOperandsToIgnore.pop_back_val(); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1 error generated.

Since version 2.2 of the F and D extensions, `fmin.s`/`fmax.s` follow IEEE 754-2019 (minimumNumber/maximumNumber) if the F extension is available, and `fmin.d`/`fmax.d` likewise follow IEEE 754-2019 if the D extension is available. So, let's mark them as Legal.
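As a rough model of the semantics being mapped onto `fmin.s`/`fmin.d`, here is IEEE 754-2019 minimumNumber restricted to quiet NaNs (signaling-NaN handling is deliberately left out); `minimumNumber` is a hypothetical helper, not LLVM or RISC-V code:

```cpp
#include <cassert>
#include <cmath>

// A NaN operand is ignored in favour of the number, and -0.0 is treated
// as smaller than +0.0 even though the two compare equal.
double minimumNumber(double A, double B) {
  if (std::isnan(A))
    return B;
  if (std::isnan(B))
    return A;
  if (A == 0.0 && B == 0.0)
    return std::signbit(A) ? A : B; // prefer the negative zero
  return A < B ? A : B;
}
```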

This reverts commit 6476a1d. d3fb41d relanded in 9bb5556. ...amended to incorporate changes from the reland.

ssahasra and others added 30 commits August 20, 2024 15:23

[LLVM-Reduce] - Distinct Metadata Reduction (llvm#104624)

42067f2

Revert "[CycleAnalysis] Methods to verify cycles and their nesting. (l…

4aacc60

[clang][bytecode] Fix discarding CompoundLiteralExprs (llvm#104909)

c99347a

[X86] Add clang codegen test coverage for llvm#104848

3b49d27

[X86] Use correct fp immediate types in _mm_set_ss/sd

6dcce42

[gn build] Port 42067f2

21de049

[X86][AVX10] Fix unexpected error and warning when using intrinsic (l…

3f25f23

[llvm-c] Add getters for LLVMContextRef for various types (llvm#99087)

7cfc9a3

[InstCombine] Adjust fixpoint error message (NFC)

2511cdb

[AArch64] Remove TargetParser CPU/Arch feature tests (llvm#104587)

34e15ad

[SPARC] Remove assertions in printOperand for inline asm operands (ll…

576b7a7

[lldb][Windows] Fixed the API test breakpoint_with_realpath_and_sourc…

fc04490

[AArch64] Extend sxtw peephole to uxtw. (llvm#104516)

fe946bf

[InstCombine] Thwart complexity-based canonicalization in test (NFC)

fd83b86

[CostModel][X86] Add missing costkinds for scalar CTLZ/CTTZ instructions

254da5a

arsenm and others added 29 commits August 21, 2024 00:19

AMDGPU: Temporarily stop adding AtomicExpand to new PM passes

dd90c72

[flang] Disable part of failing test (temporary) (llvm#105350)

0c48986

[BOLT] Reduce CFI warning verbosity (llvm#105336)

8f30506

[libc] Include startup code when installing all (llvm#105203)

2353f48

[cmake] Set up llvm-ml as ASM_MASM tool in WinMsvc.cmake (llvm#104903)

aeeb74f

[libc] move newheadergen back to safe_load (llvm#105374)

a3c66c8

[RISCV][GISel] Remove s32 support for G_ABS on RV64.

5e6a198

[bazel] Add missing dependencies for c8a678b

019e1a3

[flang] Fix test on ppc64le & aarch64 (llvm#105439)

c9a4c51

[DXIL][Analysis] Update test to match comment. NFC (llvm#105409)

1a2a18f

[FunctionAttrs] Add tests for deducing attr cold on functions; NFC

26b79f8

[FunctionAttrs] deduce attr cold on functions if all CG paths call …

b7eac8c

[RISCV][GISel] Split LoadStoreActions in LoadActions and StoreActions.

1e9d002

[RISCV][GISel] Allow >2*XLen integers in isSupportedReturnType.

a16f0dc

Revert "[RISCV][GISel] Allow >2*XLen integers in isSupportedReturnType."

a8ef679

[RISCV] Add coverage for int reductions of <3 x i8> vectors

3145cff

[AArch64] Basic SVE PCS support for handling scalable vectors on Darwin.

39ec1f7

[RISCV][GISel] Merge RISCVCallLowering::lowerReturnVal into RISCVCall…

381a803

RISC-V: Add fminimumnum and fmaximumnum support (llvm#104411)

2b84fe6

Reland "[gn build] Port d3fb41d (llvm-cgdata)"

5ec73b7

[AutoBump] Merge with 5ec73b7 (Aug 21)

64e887e

cferry-AMD approved these changes Sep 30, 2024

mgehre-amd commented Sep 20, 2024
