[AutoBump] Merge with 02654f73 (Aug 30) (18) #371

mgehre-amd · 2024-09-25T13:09:31Z

No description provided.

…m#105617) Follow-up on 8ac140f. The test `SemaTemplate/default-parm-init.cpp` has existed since the fix llvm#80288 and mainly did two things. First, it ensured the default arguments are properly substituted inside either the primary template or its explicit / out-of-line specializations. Second, it ensured the strategy doesn't mess up the substitution of a lambda expression used as a default argument. The first targets the bug in llvm#68490, yet it does some redundant work: each of the member functions is duplicated for the `sizeof` and `alignof` operators, and the principle under the hood is essentially the same for both. So this patch removes the duplication and reduces the 8 functions to 4 functions that reveal the same thing. The second presumably tests that the fix in llvm#80288 doesn't affect a complicated substitution. However, that seems unnecessary and unrelated to the original issue, and more importantly, we have never had any problems with it. Hence, I'll remove that test in this patch. The test for default arguments is merged into `SemaTemplate/default-arguments.cpp` under a new namespace, which hopefully reduces the entropy of our test cases.

The PR llvm#105996 broke taking the address of a vector:

**compound-literal.c**

```C
typedef int v4i32 __attribute((vector_size(16)));
v4i32 *y = &(v4i32){1,2,3,4};
```

This is because the current interpreter handles vector unary operators as a fallback when the generic code path fails, but the new interpreter did not. We need to handle `UO_AddrOf` in `Compiler<Emitter>::VisitVectorUnaryOperator`.

Signed-off-by: yronglin <yronglin777@gmail.com>

CompilerInstance can reuse the same SourceManager across multiple frontend actions. During this process it calls `SourceManager::clearIDTables` to reset any caches based on FileIDs. It didn't reset IncludeLocMap, resulting in wrong include locations for workflows that triggered multiple frontend actions through the same CompilerInstance.
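A minimal sketch of the invariant being restored, using simplified stand-in types: every cache keyed by FileID has to be cleared together, because FileIDs are handed out afresh for the next frontend action. `FileIDCaches`, `LineTableCache` and the `FileID`/`SourceLocation` aliases are hypothetical; only `clearIDTables` and `IncludeLocMap` are names taken from the commit, and this is not clang's actual `SourceManager` code.

```cpp
#include <unordered_map>

// Hypothetical stand-ins, not clang's real FileID / SourceLocation classes.
using FileID = int;
using SourceLocation = unsigned;

struct FileIDCaches {
  // Hypothetical example of another cache keyed by FileID.
  std::unordered_map<FileID, SourceLocation> LineTableCache;
  // Include location recorded for each FileID.
  std::unordered_map<FileID, SourceLocation> IncludeLocMap;

  // FileIDs are reassigned when the manager is reused, so every map keyed
  // by them must be reset here; forgetting one leaves stale entries behind.
  void clearIDTables() {
    LineTableCache.clear();
    IncludeLocMap.clear();
  }
};
```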

This patch adds a check for the multiples of `tosa.tile`. The `multiples` of `tosa.tile` indicate how many times the tensor should be replicated along each dimension. Zero and negative values are invalid, except for -1, which represents a dynamic value. Therefore, each element of `multiples` should be a positive integer or -1. Fix llvm#106167.
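The validity rule above can be sketched as a small predicate. The function name and the use of `std::vector<int64_t>` are assumptions for illustration; this is not the MLIR verifier code.

```cpp
#include <cstdint>
#include <vector>

// Each entry of `multiples` must be a positive integer, or -1 to mark a
// dynamic value. Zero and any other negative value are rejected.
bool isValidTileMultiples(const std::vector<int64_t>& multiples) {
  for (int64_t m : multiples) {
    if (m == -1) continue;     // dynamic multiple is allowed
    if (m <= 0) return false;  // zero or another negative value
  }
  return true;
}
```

For example, `{2, 1, 3}` and `{4, -1}` pass, while `{2, 0}` and `{-2, 3}` are rejected.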

An optimizable cast can also be removed by VPlan simplifications. Remove the restriction from planContainsAdditionalSimplifications, as it causes the function to miss relevant simplifications, triggering false positives in the cost decision verification. Also add debug output for printing additional cost precomputations. Fixes llvm#106641.

…ectorsCombine. (llvm#104774) UZP2 requires both operands to match the result type, but the combine tries to replace a truncate by passing the pre-truncated operands directly to a UZP2 with the truncated result type. This patch nop-casts the operands to keep the DAG consistent. There should be no changes to the generated code, which is fine as is. This patch also enables more target-specific getNode() validation for fixed-length vector types.

…ecks (llvm#104478) The CMake docs state that `check_c_source_compiles()` checks whether the supplied code "can be compiled as a C source file and linked as an executable (so it must contain at least a `main()` function)." https://cmake.org/cmake/help/v3.30/module/CheckCSourceCompiles.html In practice, this command is a wrapper around `try_compile()`: - https://gitlab.kitware.com/cmake/cmake/blob/2904ce00d2ed6ad5dac6d3459af62d8223e06ce0/Modules/CheckCSourceCompiles.cmake#L54 - https://gitlab.kitware.com/cmake/cmake/blob/2904ce00d2ed6ad5dac6d3459af62d8223e06ce0/Modules/Internal/CheckSourceCompiles.cmake#L101 When `CMAKE_SOURCE_DIR` is compiler-rt/lib/builtins/, `CMAKE_TRY_COMPILE_TARGET_TYPE` is set to `STATIC_LIBRARY`, so the checks for `float16` and `bfloat16` support work as intended in a Clang + compiler-rt runtime build for example, as it runs CMake recursively from that directory. However, when using llvm/ or compiler-rt/ as CMake source directory, as `CMAKE_TRY_COMPILE_TARGET_TYPE` defaults to `EXECUTABLE`, these checks will indeed fail if the code doesn't have a `main()` function. This results in LLVM using x86 SIMD registers when generating calls to builtins that, with Arch Linux's compiler-rt package for example, actually use a GPR for their argument or return value as they use `uint16_t` instead of `_Float16`. This had been caught in post-commit review: https://reviews.llvm.org/D145237#4521152. Use of the internal `CMAKE_C_COMPILER_WORKS` variable is not what hides the issue, however. PR llvm#69842 tried to fix this by unconditionally setting `CMAKE_TRY_COMPILE_TARGET_TYPE` to `STATIC_LIBRARY`, but it apparently caused other issues, so it was reverted. This PR just adds a `main()` function in the checks, as per the CMake docs.

Ever since 6859685 (or, precisely, 84428da), relative jumps emitted by the AVR codegen are off by two bytes; this pull request fixes it.

## Abstract

As compared to absolute jumps, relative jumps (such as rjmp, rcall or brsh) have an implied `pc+2` behavior; that is, `jmp 100` is `pc = 100`, but `rjmp 100` gets understood as `pc = pc + 100 + 2`. This is not reflected in the AVR codegen:

https://github.com/llvm/llvm-project/blob/f95026dbf66e353128a3a3d7b55f3e52d5985535/llvm/lib/Target/AVR/MCTargetDesc/AVRAsmBackend.cpp#L89

... which always emits relative jumps that are two bytes too far, or rather it _would_ emit such jumps if not for this check:

https://github.com/llvm/llvm-project/blob/f95026dbf66e353128a3a3d7b55f3e52d5985535/llvm/lib/Target/AVR/MCTargetDesc/AVRAsmBackend.cpp#L517

... which causes most of the relative jumps to actually be resolved late, by the linker, which applies the offsetting logic on its own, hiding the issue within LLVM.

[Some time ago](llvm@697a162) we had a similar "jumps are off" problem that got solved by touching `shouldForceRelocation()`, but I think that only worked by accident. It exploited the fact that absolute vs relative jumps in the parsed assembly can be distinguished through a "side channel" check relying on the existence of labels (i.e. absolute jumps go to named labels, but relative jumps are anonymous, so to say). This was an alright idea back then, but it got broken by 6859685.

I propose a different approach:

- when emitting relative jumps, offset them by `-2` (well, `-1`, strictly speaking, because those instructions rely on a right-shifted offset),
- when parsing relative jumps, treat `.` as `+2` and read `rjmp .+1234` as `rjmp (1234 + 2)`.

This approach seems to be sound, and we now generate the same assembly as avr-gcc, which can be confirmed with:

```cpp
// avr-gcc test.c -O3 && avr-objdump -d a.out
int main() {
  asm(
    " foo:\n\t"
    " rjmp .+2\n\t"
    " rjmp .-2\n\t"
    " rjmp foo\n\t"
    " rjmp .+8\n\t"
    " rjmp end\n\t"
    " rjmp .+0\n\t"
    " end:\n\t"
    " rjmp .-4\n\t"
    " rjmp .-6\n\t"
    " x:\n\t"
    " rjmp x\n\t"
    " .short 0xc00f\n\t"
  );
}
```

avr-gcc is also how I got the opcodes for all new tests like `inst-brbc.s`, so we should be good.
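To make the `pc+2` arithmetic concrete, here is a small sketch of encoding and decoding the 12-bit `rjmp` field under the rules described above. The function names are hypothetical and range checking is omitted, so this illustrates the offset arithmetic rather than reproducing the `AVRAsmBackend` logic. With the parsing rule above, `rjmp .+N` corresponds to `byteDisp = N + 2`.

```cpp
#include <cstdint>

// Encode an AVR `rjmp` whose target lies `byteDisp` bytes away from the
// address of the rjmp itself (byteDisp must be even). The CPU executes
// PC <- PC + k + 1 in 16-bit words, i.e. k is counted from the next
// instruction, so 2 bytes are subtracted before converting bytes to words.
uint16_t encodeRjmp(int32_t byteDisp) {
  int32_t k = (byteDisp - 2) / 2;  // exact for even displacements
  return static_cast<uint16_t>(0xC000 | (k & 0x0FFF));  // 1100 kkkk kkkk kkkk
}

// Inverse: recover the byte displacement from the rjmp's own address.
int32_t decodeRjmp(uint16_t insn) {
  int32_t k = insn & 0x0FFF;
  if (k & 0x0800) k -= 0x1000;  // sign-extend the 12-bit field
  return (k + 1) * 2;
}
```

For example, `rjmp .+0` (`byteDisp = 2`) encodes as `0xC000`, and `rjmp .-2` (`byteDisp = 0`) encodes as `0xCFFF`, a jump to itself.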

llvm#100692 changes clang template deduction, and an error is now emitted when building flang with top-of-tree clang when mapping std::pow in intrinsics-library.cpp for constant folding: `error: address of overloaded function 'pow' is ambiguous` (see https://lab.llvm.org/buildbot/#/builders/4/builds/1670). I am not expert enough to understand whether the new error is justified here, but it is easy to help the compiler with explicit wrappers to fix the builds.
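A minimal sketch of the wrapper approach, with hypothetical names (`PowWrapper`, `BinaryDoubleFn`, `kPowFolder`) rather than the ones used in intrinsics-library.cpp. A plain, non-overloaded function forwards to `std::pow`, so taking its address never requires choosing among the `<cmath>` overloads.

```cpp
#include <cmath>

// Non-overloaded wrapper: there is exactly one function named PowWrapper,
// so &PowWrapper is never ambiguous, unlike &std::pow.
static double PowWrapper(double x, double y) { return std::pow(x, y); }

using BinaryDoubleFn = double (*)(double, double);

// The wrapper's address can be stored wherever a folding callback is needed.
constexpr BinaryDoubleFn kPowFolder = &PowWrapper;
```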

If the operand node has the same scalars as one of the vectorized nodes, the compiler could miss this and incorrectly request minbitwidth data for the wrong node. It may lead to a compiler crash, because the vectorized node might have a different minbw result. Fixes llvm#106667

The worst possible case for a double literal goes like:

```
mov ...
movk ..., lsl #16
movk ..., lsl #32
movk ..., lsl #48
fmov ...
```

The limit of 5 in the code gives the impression that `Insn` includes all instructions, including the `fmov`, but that's not true. It only counts the integer moves. This led me astray on some other work in this area.
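A simplified sketch of the counting involved, under the assumption of a plain MOVZ/MOVK sequence: one integer move per non-zero 16-bit chunk of the bit pattern (at least one), plus one `fmov` to transfer the result to an FP register. Real AArch64 lowering also considers MOVN, logical immediates and `fmov` immediates, so it can need fewer instructions; the function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Integer moves for a naive MOVZ/MOVK sequence: one per non-zero 16-bit
// chunk, and a single MOVZ for an all-zero pattern.
int countIntegerMoves(uint64_t bits) {
  int n = 0;
  for (int shift = 0; shift < 64; shift += 16)
    if ((bits >> shift) & 0xFFFF) ++n;
  return n == 0 ? 1 : n;
}

// Total for a double literal built this way: the integer moves plus one FMOV.
int countDoubleLiteralInsns(double d) {
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);  // reinterpret the IEEE-754 bits
  return countIntegerMoves(bits) + 1;
}
```

For `0.1` (bit pattern `0x3FB999999999999A`) all four chunks are non-zero, which is the worst case: `countIntegerMoves` returns 4 and `countDoubleLiteralInsns` returns 5. Only the first number corresponds to what `Insn` counts.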

There is no need to support Python 2.7 anymore; Python 3.3+ has `subprocess.DEVNULL`. Using it is good practice and also prevents file handles from staying open unnecessarily. Also remove a couple of unused or unneeded `__future__` imports.

After landing support for actual vectorization of the "clustered" loads, we need to better estimate the cost of masked gathers versus clustered loads. This includes estimating the address calculation and better estimating the gathered loads. Also, this estimation now relies on the SLPCostThreshold option, allowing users to modify the behavior of the compiler. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: llvm#105858

…mber of elements in operands. This patch adds basic support for a non-power-of-2 number of elements in operands. The patch still requires that this number addresses whole registers. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: llvm#106449

…optional" (llvm#106778) Reverts llvm#104668 This commit triggers an edge case that can cause circular `unrealized_conversion_cast` ops. llvm#106760 may fix it, but it has other issues. Reverting this PR for now, until I find a solution for that problem.

This patch updates the source cache dump command to print both the actual (on-disk) checksum and the expected (line table) checksum. To achieve that, we now read and store the on-disk checksum in the cached object. The same information will be used in a future patch to print a warning when the checksums differ.

…m#98214) We split up all the headers into top-level modules when we broke up cycles with the C compatibility headers. However, this resulted in a large number of small modules, which is awkward and clearly against the philosophy of Clang modules. This was necessary to make things work. This patch regroups a few headers from two leaf modules: stop_token and pstl. It should be pretty uncontroversial that grouping these headers into a single module doesn't introduce any cyclic dependency, and it's a first step towards reducing the number of top-level modules in our modulemap.

…lvm#106494) This patch contains two parts: first, revert the patch llvm#101428; second, remove the `atomic_fetch_and_*()` to `atomic_<op>()` conversion (when the return value is not used), but preserve the conversion of `__sync_fetch_and_add()` to a locked insn with cpu v1/v2.

dklimkin and others added 30 commits August 30, 2024 10:21

Fix bazel build past 89e6a28 (llvm#106685)

d6dc7cf

[LoongArch] Pre-commit test for immediate value materialization using…

5b77e25

… BSTRINS_D Reviewed By: SixWeining Pull Request: llvm#106331

[LoongArch] Optimize for immediate value materialization using BSTRIN…

eaf87d3

…S_D instruction Reviewed By: heiher, SixWeining Pull Request: llvm#106332

[RISCV][NFC] Splits f16 cast tests into a separate file (llvm#106692)

8f4aafb

precommit f16 test for llvm#87506 fp-int conversion

Add no-op handing for HLSLAttributedResource switch cases (llvm#106698)

1b32c3e

New value added in e00e9a3

[AArch64] Add accelerate test coverage for acos/asin/atan and cosh/si…

c4b5cb0

…nh/tanh intrinsics to support llvm#106584

[mlir][ArmSME] Fix test after llvm#98043 (NFC)

833ce5d

[Inline][X86] Regenerate inline-target-cpu-* tests

b065ec0

[InstCombine][X86] Split off vperm shuffle tests from other avx512 tests

fda7649

Revert: [AMDGPU] Graph-based Module Splitting Rewrite (llvm#104763) (l…

6345604

…lvm#106707) * Revert "Fix MSVC "not all control paths return a value" warning. NFC." Dep to revert c9b6e01 * Revert "[AMDGPU] Graph-based Module Splitting Rewrite (llvm#104763)" Breaks tests.

[dsymutil] return EXIT_FAILURE when Crashed (llvm#106619)

2d5613a

Make dsymutil return a non-zero exit code when crashing during linking.

[SLP]Fix PR106655: Use FinalShuffle for alternate cast nodes.

87a988e

The FinalShuffle function needs to be used for all vectorized results to correctly produce the vectorized value. Fixes llvm#106655

[LLVM][VPlan] Pick more optimal initial value for VPBlend. (llvm#104019)

ce5620b

By choosing an initial value whose mask is only used by the blend we can remove the need for the mask entirely.

Fix stack overflow in allPathsGoThroughCold past 6b11573 (llvm#106384)

64f1995

Recursion here causes a stack overflow on large inputs. Fix it by unrolling the recursion into an explicit stack.
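The general technique, replacing recursion with an explicit worklist, can be sketched as follows. The graph representation and the exact predicate are assumptions made for illustration (here: every path from the entry to a node without successors passes through a cold node); this is not the code of allPathsGoThroughCold.

```cpp
#include <vector>

// succs[n] lists the successors of node n; cold[n] marks cold nodes.
// Returns true if no path from `entry` reaches a node without successors
// while avoiding cold nodes. The explicit stack lives on the heap, so very
// deep graphs cannot overflow the call stack the way recursion could.
bool allPathsGoThroughColdIterative(const std::vector<std::vector<int>>& succs,
                                    const std::vector<bool>& cold, int entry) {
  std::vector<bool> visited(succs.size(), false);
  std::vector<int> stack{entry};
  while (!stack.empty()) {
    int n = stack.back();
    stack.pop_back();
    if (visited[n]) continue;
    visited[n] = true;
    if (cold[n]) continue;               // path is covered; stop exploring here
    if (succs[n].empty()) return false;  // reached an exit without a cold node
    for (int s : succs[n])
      if (!visited[s]) stack.push_back(s);
  }
  return true;
}
```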

[RISCV] Add full test coverage for acos/asin/atan and cosh/sinh/tanh …

ceb613a

…intrinsics to support llvm#106584

[flang] Don't generate empty else blocks (llvm#106618)

8586d03

Code lowering always generates fir.if else blocks for source level if statements, whether needed or not. Change this to only generate else blocks that are needed.

LICM: extend hoistAddSub to unsigned case (llvm#106373)

2a8fda4

Trivially extend dd0cf23 ([LICM] Reassociate & hoist sub expressions) to handle unsigned predicates as well. Alive2 proofs: https://alive2.llvm.org/ce/z/GdDBtT.
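One identity of the kind such a reassociation relies on, for unsigned predicates: `x - c1 <u c2` can be rewritten as `x <u c1 + c2` (where `c1 + c2` can be hoisted when both are loop-invariant), provided neither the subtraction nor the addition wraps. The sketch below checks this exhaustively over 8-bit values and shows that dropping the no-wrap conditions breaks it. The preconditions here are our own assumption; the precise conditions the patch uses are what the linked Alive2 proofs cover.

```cpp
#include <cstdint>

// Exhaustively compares `x - c1 <u c2` with `x <u c1 + c2` over all 8-bit
// triples. With requireNoWrap, triples where the 8-bit subtraction or
// addition would wrap are skipped (the no-unsigned-wrap preconditions).
bool subReassociationHolds(bool requireNoWrap) {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned c1 = 0; c1 < 256; ++c1)
      for (unsigned c2 = 0; c2 < 256; ++c2) {
        if (requireNoWrap && (x < c1 || c1 + c2 > 255))
          continue;
        uint8_t lhs = static_cast<uint8_t>(x - c1);     // x - c1, modulo 256
        uint8_t bound = static_cast<uint8_t>(c1 + c2);  // c1 + c2, modulo 256
        bool original = lhs < c2;
        bool rewritten = x < bound;
        if (original != rewritten) return false;
      }
  return true;
}
```

With the no-wrap conditions the rewrite is exact; without them, `x = 0, c1 = 1, c2 = 5` is a counterexample, because `0 - 1` wraps to 255.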

[compiler-rt][rtsan] NFC: Rename rtsan_on->rtsan_enable rtsan_off->rt…

a919588

…san_disable (llvm#106727) This better matches lsan_enable and disable, which we are trying to emulate.

[RemoveDIs] Fix asan-identified leak in unittest (llvm#106723)

7ffe67c

Fixes issue found here llvm#106691 (comment) The issue wasn't in the code change itself, just the unittest; the trailing marker wasn't properly cleaned up.

[SLP] vectorizeChainsInBlock - remove superfluous continue at the end…

96ad495

… of for loop. NFC.

[SLP] findBestRootPair - fix incorrect argument name comment. NFC.

b719c92

Harini0924 and others added 29 commits August 30, 2024 10:15

Revert "[llvm-lit] Add precommit test to verify current behavior of g…

5af4ba2

…lob expansion in lit's internal shell" (llvm#106763) Reverts llvm#106325 Broke some Buildbots.

Revert "[LLDB][DWARF] Add an option to silence unsupported DW_FORM wa…

5500e21

…rnings" (llvm#106765) Reverts llvm#106609

[SLP][NFC]Add a function description, NFC.

6023d17

[X86] Rename trailing whitespace. NFC.

ef7b18a

Noticed in clang-formatting of llvm#106750

[RISCV] Check VL dominates and potentially move in tryReduceVL (llvm#…

0efa386

…106753) Similar to what we do in foldVMV_V_V with the passthru, if we end up changing the Src's VL in tryReduceVL we need to make sure it dominates. Fixes llvm#106735

[lldb] Deal with SupportFiles in SourceManager (NFC) (llvm#106740)

130eddf

To support detecting MD5 checksum mismatches, deal with SupportFiles rather than plain FileSpecs in the SourceManager.

[SandboxIR] Implement ConstantFP (llvm#106648)

2c7e1b8

This patch implements sandboxir::ConstantFP mirroring llvm::ConstantFP.

Fix cl::desc typos in aarch64-enable-dead-defs and arm-implicit-it. (l…

0717898

…lvm#106712)

[SLP][NFC]Remove unused variable

8a267b7

[PtrUseVisitor] Allow using Argument as a starting point (llvm#106308)

688a274

Argument is another possible starting point for the pointer traversal, and PtrUseVisitor should be able to handle it.

Reuse getBinOpIdentity in createAnyOfTargetReduction [nfc]

897b00f

Consolidate the code so that there is one copy, instead of several, of the reasoning about the identity element. Note that we're (deliberately) not passing the FMF flags to the common utility, to preserve behavior in this change.
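For context, the identity element of a binary operator is the value `e` for which `x op e == x` for every `x`, which makes it the neutral starting value for a reduction. A minimal sketch over 64-bit integers with a hypothetical `BinOp` enum; this is not the signature of LLVM's `getBinOpIdentity`.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical opcode set for illustration only.
enum class BinOp { Add, Mul, And, Or, Xor, UDiv };

// Returns e such that (x op e) == x and (e op x) == x for every 64-bit x,
// or nullopt if no such constant exists.
std::optional<uint64_t> binOpIdentity(BinOp op) {
  switch (op) {
    case BinOp::Add:
    case BinOp::Or:
    case BinOp::Xor:
      return 0;
    case BinOp::Mul:
      return 1;
    case BinOp::And:
      return std::numeric_limits<uint64_t>::max();  // all bits set
    case BinOp::UDiv:
      return std::nullopt;  // x / 1 == x, but 1 / x != x in general
  }
  return std::nullopt;
}
```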

[VP] Reduce duplicate code in vp.reduce expansions

c315d78

Primary goal is having one way of doing this, to ensure that we don't end up with accidental divergence.

[libc++][NFC] Minor reformatting in <cstddef>

a3f8790

[VPlan] Manually jumpthread a bit of reduction code for readability […

c53008d

…nfc]

[WebAssembly] Update FP16 opcodes to match current spec. (llvm#106759)

923a1c1

https://github.com/WebAssembly/half-precision/blob/f267a3d54432e5723dcc13ad4530c3581a0cc4b3/proposals/half-precision/Overview.md#binary-format

[llvm][LoongArch] Avoid shift overflow (llvm#106785)

432e9f4

Follow up fix to llvm#106332 `LoongArchMatInt.cpp:96:33: runtime error: shift exponent 64 is too large for 64-bit type` https://lab.llvm.org/buildbot/#/builders/169/builds/2681
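For context: in C++, shifting a 64-bit value by 64 or more bits is undefined behaviour, which is what UBSan flagged above. A common way this arises is building a low-bit mask of width 64, and a guarded helper avoids it. This is a generic sketch with a hypothetical name, not the actual change to LoongArchMatInt.cpp.

```cpp
#include <cstdint>

// Mask with the low `width` bits set. `uint64_t{1} << 64` would be undefined
// behaviour, so the full-width case is handled separately.
uint64_t lowBitsMask(unsigned width) {
  if (width >= 64)
    return ~uint64_t{0};
  return (uint64_t{1} << width) - 1;
}
```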

Revert "AtomicExpand: Allow incrementally legalizing atomicrmw" (llvm…

982d244

…#106792) Reverts llvm#103371 There is a `heap-use-after-free`, commented on 206b5af. Maybe `if (Next == E || BB != Next->getParent()) {` is enough, but I'm not sure what the intent was there.

[gn build] Port 06c531e

d66765d

[HLSL][Doc] Document multi-argument resolution (llvm#104474)

02654f7

This updates the expected differences document to capture the difference in multi-argument overload resolution between Clang and DXC. Fixes llvm#99530

[AutoBump] Merge with 02654f7 (Aug 30)

bf6bee8

cferry-AMD approved these changes Sep 30, 2024