[AutoBump] Merge with b851c7f1 (Apr 17) (2) #295

mgehre-amd · 2024-08-22T08:24:24Z

No description provided.

…ze (llvm#83124) When in-place new-ing a local variable of an array of trivial type, the generated code now calls `memset` with the correct size of the array; earlier it generated the wrong size (the square of the typedef array size + size). The cause: given `typedef TYPE TArray[8]; TArray x;`, the type of the declarator is TArray[8], and in `SemaExprCXX.cpp::BuildCXXNew` we check whether it is a typedef of constant size and, if so, get the original type, which works fine for non-dependent cases. In the template case, however, we go through `TreeTransform.h:TransformCXXNewExpr`, where we again check the allocated type, which is TArray[8] and stays that way, so ArraySize=(TArray[8] type, alloc TArray[8*type]), hence the squared size allocation. ArraySize gets calculated earlier in `TreeTransform.h`, so the `if(!ArraySize)` condition was failing. Fix: I changed that condition to `if(ArraySize)`. Fixes llvm#41441
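The pattern described above can be sketched as a small stand-alone program. This is only an illustration of the construct (a typedef'd array of a dependent element type, value-initialized through placement new), not the test added by the commit; the helper name `zeroedAfterPlacementNew` and the oversized buffer are assumptions made for the sketch.

```cpp
#include <cstddef>
#include <new>

// Hypothetical helper: value-initializes a typedef'd array of a dependent
// element type via placement new and counts how many elements read as zero.
template <typename TYPE>
std::size_t zeroedAfterPlacementNew() {
  typedef TYPE TArray[8];  // array typedef depending on a template parameter

  // Oversized on purpose, to leave slack past the 8 elements, and pre-filled
  // with non-zero bytes so that zeroing is observable.
  alignas(TYPE) unsigned char storage[16 * sizeof(TArray)];
  for (unsigned char &byte : storage)
    byte = 0xFF;

  // The construct the commit message talks about: placement new of the
  // typedef'd array type inside a template.
  TYPE *elems = new (storage) TArray();

  std::size_t zeros = 0;
  for (std::size_t i = 0; i < 8; ++i)
    if (elems[i] == TYPE(0))
      ++zeros;
  return zeros;
}
```

Calling `zeroedAfterPlacementNew<int>()` should report that all 8 elements were zeroed; the interesting part for the fix is how many bytes the compiler passes to `memset`, which is only visible in the generated code.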

Some very basic tests for a case where we could fold BLEND(PERMUTE(X),PERMUTE(Y)) -> PERMUTE(BLEND(X,Y)) These assume the permute masks are the same, and "complete" (no undefs/duplicate elements) but we could relax that depending on the blend mask
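As a rough illustration of why the fold is sound when both permutes share one complete mask, the following sketch models 4-element vectors with `std::array`. All names here (`permute`, `blend`, `remapBlendMask`, `foldHolds`) are hypothetical, and the blend-mask remapping is an assumption about how the equivalence works, not a description of the actual X86 lowering code.

```cpp
#include <array>
#include <cstddef>

using Vec4 = std::array<int, 4>;
using PermMask = std::array<std::size_t, 4>;
using BlendMask = std::array<bool, 4>;

// out[i] = v[mask[i]]
Vec4 permute(const Vec4 &v, const PermMask &mask) {
  Vec4 out{};
  for (std::size_t i = 0; i < 4; ++i)
    out[i] = v[mask[i]];
  return out;
}

// out[i] = pickY[i] ? y[i] : x[i]
Vec4 blend(const Vec4 &x, const Vec4 &y, const BlendMask &pickY) {
  Vec4 out{};
  for (std::size_t i = 0; i < 4; ++i)
    out[i] = pickY[i] ? y[i] : x[i];
  return out;
}

// Moves the blend decision for output lane i to source lane mask[i].
// Only well defined when mask is a complete permutation (no duplicates).
BlendMask remapBlendMask(const BlendMask &pickY, const PermMask &mask) {
  BlendMask out{};
  for (std::size_t i = 0; i < 4; ++i)
    out[mask[i]] = pickY[i];
  return out;
}

// Checks BLEND(PERMUTE(X), PERMUTE(Y)) == PERMUTE(BLEND'(X, Y)).
bool foldHolds(const Vec4 &x, const Vec4 &y, const PermMask &mask,
               const BlendMask &pickY) {
  const Vec4 before = blend(permute(x, mask), permute(y, mask), pickY);
  const Vec4 after = permute(blend(x, y, remapBlendMask(pickY, mask)), mask);
  return before == after;
}
```

With a duplicate or undef lane in the permute mask, `remapBlendMask` would no longer be well defined, which matches the restriction stated in the commit message.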

This should ensure we explore the same VFs as before 6d66db3. Fixes llvm#88640.

On macOS, file paths start with /Users/..., which clang-cl interprets as the /U switch followed by a preprocessor macro name to undefine. Put the filename after `--` to prevent this. For consistency, move %s to the end of the regular `clang` lines (where this isn't needed) as well.

This patch moves OpenMP-related entities out of `Sema` to a newly created `SemaOpenMP` class. This is a part of the effort to split `Sema` up, and follows the recent example of CUDA, OpenACC, SYCL, HLSL. Additional context can be found in llvm#82217, llvm#84184, llvm#87634.

Summary: AIX headers define this, so we need to work around it. In the future this will be removed but for now we should just rename it to avoid these issues.

…sions" and related commits (llvm#88884) The original change caused widespread breakages in msan/ubsan tests and causes `use-after-free`. Most likely we are adding more cleanups than necessary.

…llvm#87632) We had some instances when LLVM would not inline fixed-count memcpy and ended up attempting to lower it as a libcall, which would not work on AMDGPU as the address space doesn't meet the requirement, causing a compiler crash. The patch relaxes the threshold used for -Os/-Oz compilation so we're always allowed to inline memory copy functions. This patch basically does the same thing as https://reviews.llvm.org/D158226 for AMDGPU. Fix llvm#88497.

…signed arg.

… smax/smin intrinsics. Need to check that an unsigned argument can be safely used in smax/smin intrinsics by checking that at least a single sign bit is clear; otherwise its value may be treated as negative instead of positive.
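A runtime analogue of the hazard, as a minimal sketch: the vectorizer reasons about known bits at compile time, whereas this example simply compares concrete 32-bit values. The helper names are hypothetical, and the unsigned-to-signed casts assume the usual two's-complement wraparound.

```cpp
#include <cstdint>

// True when the top (sign) bit is clear, so the value means the same thing
// whether it is read as signed or as unsigned.
bool signBitClear(std::uint32_t v) { return (v >> 31) == 0; }

std::int32_t smax(std::int32_t a, std::int32_t b) { return a > b ? a : b; }
std::uint32_t umax(std::uint32_t a, std::uint32_t b) { return a > b ? a : b; }

// Feeds unsigned values through a signed max and checks whether the result
// still agrees with the unsigned max.
bool signedMaxMatchesUnsigned(std::uint32_t a, std::uint32_t b) {
  const std::int32_t sa = static_cast<std::int32_t>(a);
  const std::int32_t sb = static_cast<std::int32_t>(b);
  return static_cast<std::uint32_t>(smax(sa, sb)) == umax(a, b);
}
```

For `0x80000000` and `1`, the signed view sees a negative number and picks `1`, while the unsigned maximum is `0x80000000`; when both inputs satisfy `signBitClear`, the two agree.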

Use v4 of UTC to improve regex matching of argument names to fix a filecheck matching in a future patch

…ble, fix possible UB. Using a floating-point type in the compiler is not the best idea; here it is used in a comparison for equality with 0, which may cause undefined behavior in some cases. Reviewers: fhahn Reviewed By: fhahn Pull Request: llvm#87241

…nops tests from shuffle.ll

`self` clauses on compute constructs take an optional condition expression. We again limit the implementation to ONLY compute constructs to ensure we get all the rules correct for others. However, this one will be particularly complicated, as it takes a `var-list` for `update`, so when we get to that construct/clause combination, we need to do that as well. This patch also furthers uses of the `OpenACCClauses.def` as it became useful while implementing this (as well as some other minor refactors as I went through). Finally, `self` and `if` clauses interact with each other: if an `if` clause evaluates to `true`, the `self` clause has no effect. While this is intended and can be used 'meaningfully', we are warning on this with a very granular warning, so that this edge case will be noticed by newer users, but can be disabled trivially.

We need file-level - not target-level - dependencies for these custom commands to re-trigger when their dependencies change.

Prior to llvm#85863, the required parameters of llvm::isKnownNonZero were Value and DataLayout. After, they are Value, Depth, and SimplifyQuery, where SimplifyQuery is implicitly constructible from DataLayout. The change to move Depth before SimplifyQuery needed callers to be updated unnecessarily, and as commented in llvm#85863, we actually want Depth to be after SimplifyQuery anyway so that it can be defaulted and the caller does not need to specify it.
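The parameter-order argument can be illustrated with stand-in types. Everything below is hypothetical (`FakeValue`, `FakeDataLayout`, `FakeSimplifyQuery`, `isKnownNonZeroSketch`, and the depth limit of 6); it mirrors only the shape described above, namely a query type implicitly constructible from a data layout, followed by a defaulted depth.

```cpp
#include <cassert>

// Stand-ins for the real LLVM classes; not the actual API.
struct FakeDataLayout {};

struct FakeSimplifyQuery {
  const FakeDataLayout &DL;
  // Intentionally non-explicit, so a data layout converts to a query.
  FakeSimplifyQuery(const FakeDataLayout &DL) : DL(DL) {}
};

struct FakeValue {
  int Constant;
};

// With Depth last and defaulted, callers that only have a data layout do not
// need to spell out a depth at all.
bool isKnownNonZeroSketch(const FakeValue &V, const FakeSimplifyQuery &Q,
                          unsigned Depth = 0) {
  (void)Q;                          // unused in this toy version
  constexpr unsigned MaxDepth = 6;  // hypothetical recursion limit
  if (Depth >= MaxDepth)
    return false;                   // give up conservatively
  return V.Constant != 0;
}
```

A call such as `isKnownNonZeroSketch(V, DL)` compiles unchanged, which is the property the commit message wants to preserve; with the depth placed before the query, every such caller would have had to add an explicit depth argument.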

Not sure how best to test this, but I think it fixes the error https://github.com/llvm/mlir-www/actions/runs/8699908058/job/23859264085#step:7:1111 Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com> Co-authored-by: Jacques Pienaar <jpienaar@google.com>

…#88008) (llvm#88014) Use refactored `CheckForConstantInitializer()` to skip checking expr with error. --------- Co-authored-by: Aaron Ballman <aaron@aaronballman.com>

…nch" (llvm#88907) Reverts llvm#86312

…ccess and bad_function_call (llvm#87390) This patch uses our availability machinery to allow defining a key function for bad_function_call and bad_expected_access at all times but only rely on it when we can. This prevents compilers from complaining about weak vtables and reduces code bloat and the amount of work done by the dynamic linker. rdar://111917845

…vm#86410) When we initially implemented the C++20 synchronization library, we reluctantly accepted for the implementation to be backported to C++03 upon request from the person who provided the patch. This was when we were only starting to have experience with the issues this can create, so we flinched. Nowadays, we have a much stricter stance about not backporting features to previous standards. We have recently started fixing several bugs (and near bugs) in our implementation of the synchronization library. A recurring theme during these reviews has been how difficult to understand the current code is, and upon inspection it becomes clear that being able to use a few recent C++ features (in particular lambdas) would help a great deal. The code would still be pretty intricate, but it would be a lot easier to reason about the flow of callbacks through things like __thread_poll_with_backoff. As a result, this patch deprecates support for the synchronization library before C++20. In the next release, we can remove that support entirely.

This test requires the OpenMP runtime to be present, but the way that the lit config detects it fails when "openmp" is added to RUNTIMES instead of PROJECTS. This caused the tests to be skipped as unsupported in local and upstream tests. The actual bug was a missing word in the message, and putting the check at the wrong line.

…llvm#88743

This reverts commit 82f479b due to bot breakage.

CASE_VFMA_OPCODE_VV and CASE_VFMA_CHANGE_OPCODE_VV need to match up if we are to avoid "Unexpected opcode" errors, but in CASE_VFMA_CHANGE_OPCODE_VV, CASE_VFMA_CHANGE_OPCODE_LMULS_MF2 had mistakenly been used instead of CASE_VFMA_CHANGE_OPCODE_LMULS_MF4.

This change updates a few of the transformations in foldFMulReassoc to respect absent fast-math flags in cases where fmul and fdiv, fadd, or fsub instructions were being folded but the code was only checking for fast-math flags on the fmul instruction and was transferring flags to the folded instruction that were not present on the other original instructions. This fixes llvm#82857

llvm#88249) …se of tensor pack When the vector sizes are not passed as inputs to the vector transform operation, the vector sizes are queried from the static result shape in the case of tensor.pack op.

Since 97fe519, in ARM64EC mode, we don't define `__aarch64__`. Fix various preprocessor guards to account for this.

See https://discourse.llvm.org/t/rfc-fastmath-flags-support-in-complex-dialect/71981

This reverts commit 7d4e8c1. Contrary to the commit description, this does cause large compile-time regressions (up to 10% on individual files).

- Those special register stores are STORE and their memory operands are input operands instead of output ones.

Reviewers: JDevlieghere, arsenm, yinying-lisa-li, koachan, PeimingLiu, jyknight, aartbik, matthias-springer
Reviewed By: arsenm
Pull Request: llvm#88971

- If a def operand includes multiple sub-operands, count them when generating instr info.
- Found issues in x86 and sparc backends, where memory operands of store or store-like instructions are wrongly placed in the output list.

Reviewers: jayfoad, arsenm, Pierre-vh
Reviewed By: arsenm
Pull Request: llvm#88972

…h" (llvm#89006) Reverts llvm#88546 Leak and performance regression. Details in llvm#88546

RFC: https://discourse.llvm.org/t/rfc-introduce-new-clang-builtin-builtin-allow-runtime-check/78281 --------- Co-authored-by: Noah Goldstein <goldstein.w.n@gmail.com> Co-authored-by: Aaron Ballman <aaron@aaronballman.com>

getFileOffsetFor() was replaced with getFileOffsetForAddress().

RewriteInstance::isKSymtabSection() is deprecated.

…ference (llvm#88843) Reapply for llvm#88765. Partially fixes: llvm#60895.

The VTYPE operands of a vsetvli pseudo are always immediates

…6811) Currently the CFI offsets for RVV registers are not handled entirely; this patch adds that information so that stack unwinding and debuggers work correctly on RVV callee-saved stack objects. Depends On D154576 Differential Revision: https://reviews.llvm.org/D156846

See https://discourse.llvm.org/t/rfc-c-20-modules-introduce-thin-bmi-and-decls-hash/74755, llvm#75894 and llvm#85050 for the background.

The `clang-scan-deps` tool can be used for fast scanning of batches of compilation commands passed in via the `-compilation-database` option. This gets awkward in our tests where we have to resort to using `.in`/`.template` JSON files and running them through `sed` in order to embed LIT's `%t` variable into them. However, most of our tests only need to pass a single compilation command, so this dance is entirely unnecessary. This patch makes sure the existing "per-file" mode (where the compilation command is passed in-line after the `--` argument) works for all output formats, not only `P1689`.

This makes it possible to pass "-o /dev/null" to `clang-scan-deps` and skip some potentially expensive work, making timings less noisy. Also removes the need for stream redirection.

llvm::SmallVector::operator== exactly meets our needs.

…89001) Instead of searching all encodings, we can convert the encoding back to a register and use getMatchingSuperReg.

Per [tab:time.format.spec]:

%z: The offset from UTC as specified in ISO 8601-1:2019, subclause 5.3.4.1. For example -0430 refers to 4 hours 30 minutes behind UTC. If the offset is zero, +0000 is used. The modified commands %Ez and %Oz insert a : between the hours and minutes: -04:30. If the offset information is not available, an exception of type format_error is thrown.

Typically the modified versions Oz or Ez would have wording like "The modified command %OS produces the locale's alternative representation." In this case the modified version does not depend on the locale.

This change is a preparation for formatting sys_info which has time zone information. The function time_put<_CharT>::put() does not have proper time zone support, therefore it's a manual implementation.

Fixes llvm#78184
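The formatting rule quoted above can be sketched as a small free function. This is not the libc++ implementation; `formatUtcOffset` and its `withColon` flag are hypothetical names, with `withColon` standing in for the modified `%Ez`/`%Oz` forms.

```cpp
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <string>

// Formats a UTC offset as +hhmm, or +hh:mm when withColon is set.
std::string formatUtcOffset(std::chrono::seconds offset, bool withColon) {
  const char sign = offset < std::chrono::seconds::zero() ? '-' : '+';
  const long long totalMinutes =
      std::llabs(static_cast<long long>(offset.count())) / 60;
  const long long hours = totalMinutes / 60;
  const long long minutes = totalMinutes % 60;

  char buffer[32];
  std::snprintf(buffer, sizeof(buffer),
                withColon ? "%c%02lld:%02lld" : "%c%02lld%02lld", sign, hours,
                minutes);
  return buffer;
}
```

An offset of 4 hours 30 minutes behind UTC yields `-0430` or `-04:30`, and a zero offset yields `+0000`, matching the examples in the quoted wording. The case where offset information is unavailable (which throws `format_error` per the specification) is outside this sketch.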

Fixes llvm#88181

Relates to llvm#88181

…lvm#88872) This patch adds a test that assert-fails without the fix.

nico and others added 30 commits April 16, 2024 08:14

[gn] port fe48bf6

d4602a9

[SLP] Make sure MinVF is a power-of-2 by using PowerOf2Ceil.

b73476c

This should ensure we explore the same VFs as before 6d66db3. Fixes llvm#88640.
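For readers unfamiliar with the rounding involved, here is a stand-alone analogue of rounding up to a power of two. It is a sketch under the assumption that the intent is "smallest power of two not less than the input"; `powerOf2CeilSketch` is a hypothetical name and not the LLVM helper itself.

```cpp
#include <cstdint>

// Returns the smallest power of two that is >= A, or 0 when A is 0 or the
// result would not fit in 64 bits.
std::uint64_t powerOf2CeilSketch(std::uint64_t A) {
  if (A == 0)
    return 0;
  --A;
  // Smear the highest set bit into every lower position.
  A |= A >> 1;
  A |= A >> 2;
  A |= A >> 4;
  A |= A >> 8;
  A |= A >> 16;
  A |= A >> 32;
  return A + 1;  // wraps to 0 if the input exceeded 2^63
}
```

For example, a minimum VF of 3 would round up to 4, while 8 stays 8, so a loop that halves or doubles the VF only ever visits powers of two.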

[CUDA] Rename SM_32 to SM_32_ to work around AIX headers (llvm#88779)

9e7aab9

Summary: AIX headers define this, so we need to work around it. In the future this will be removed but for now we should just rename it to avoid these issues.

Switch release notes links to using markup for github issues; NFC

e7fb49c

Revert "[codegen] Emit missing cleanups for stmt-expr and coro suspen…

9d8be24

…sions" and related commits (llvm#88884) The original change caused widespread breakages in msan/ubsan tests and causes `use-after-free`. Most likely we are adding more cleanups than necessary.

[SLP][NFC]Add a test with the incorrect vectorization of smax with un…

6ab5927

…signed arg.

[VectorCombine][X86] Regenerate shuffle.ll + shuffle-of-casts.ll

e185978

Use v4 of UTC to improve regex matching of argument names to fix a filecheck matching in a future patch

[VectorCombine][X86] shuffle-of-binops.ll - split off foldShuffleOfBi…

254df2e

…nops tests from shuffle.ll

[bazel] Add missing dependency for 1c076b4

ac79188

[libclc] Give built bytecode objects a .bc extension. NFC

a0f8191

[libclc] Fix dependencies between targets

3d118f9

We need file-level - not target-level - dependencies for these custom commands to re-trigger when their dependencies change.

[AST][RecoveryExpr] Fix a crash on c89/c90 invalid InitListExpr (llvm…

b632476

…#88008) (llvm#88014) Use refactored `CheckForConstantInitializer()` to skip checking expr with error. --------- Co-authored-by: Aaron Ballman <aaron@aaronballman.com>

Revert "[JumpThreading] Thread over BB with only an unconditional bra…

d2d4a1b

…nch" (llvm#88907) Reverts llvm#86312

[RISCV] Add coverage for strength reduction of mul 2^N +/- 3/5/9

bd28889

[VectorCombine][X86] Add initial shuffle-of-shuffles.ll test cover for …

bf1ad1d

…llvm#88743

Revert "Add asan tests for libsanitizers. (llvm#88349)"

f8e2ec1

This reverts commit 82f479b due to bot breakage.

keith and others added 25 commits April 16, 2024 19:17

[bazel] Add support for lldb-server (llvm#88989)

1bc0921

[mlir][vector] Determine vector sizes from the result shape in the ca… (

ce5381e

llvm#88249) …se of tensor pack When the vector sizes are not passed as inputs to the vector transform operation, the vector sizes are queried from the static result shape in the case of tensor.pack op.

[ARM64EC] Fix arm_neon.h on ARM64EC. (llvm#88572)

8c9f45e

Since 97fe519, in ARM64EC mode, we don't define `__aarch64__`. Fix various preprocessor guards to account for this.

[mlir][complex] Fastmath flag for complex angle (llvm#88658)

8c9d814

See https://discourse.llvm.org/t/rfc-fastmath-flags-support-in-complex-dialect/71981

Revert "[SLP]Attempt to vectorize long stores, if short one failed."

efd6055

This reverts commit 7d4e8c1. Contrary to the commit description, this does cause large compile-time regressions (up to 10% on individual files).

Revert "Improve stack usage to increase recursive initialization dept…

d0f718e

…h" (llvm#89006) Reverts llvm#88546 Leak and performance regression. Details in llvm#88546

[BOLT][NFC] Remove unused function (llvm#89009)

52a4d81

getFileOffsetFor() was replaced with getFileOffsetForAddress().

[BOLT][NFC] Remove another unused function (llvm#89011)

0af8cae

RewriteInstance::isKSymtabSection() is deprecated.

[clang analysis] ExprMutationAnalyzer support recursive forwarding re…

f40f4fc

…ference (llvm#88843) Reapply for llvm#88765. Partially fixes: llvm#60895.

[RISCV] Convert VTYPE operand check to assert in RISCVInsertVSETVLI. NFC

3204f3e

The VTYPE operands of a vsetvli pseudo are always immediates

[C++20] [Modules] Add Release Notes and Documents for Reduced BMI

e6ecff8

See https://discourse.llvm.org/t/rfc-c-20-modules-introduce-thin-bmi-and-decls-hash/74755, llvm#75894 and llvm#85050 for the background.

[clang][deps] Add -o flag to specify output path (llvm#88767)

6a4eaf9

This makes it possible to pass "-o /dev/null" to `clang-scan-deps` and skip some potentially expensive work, making timings less noisy. Also removes the need for stream redirection.

[memprof] Simplify IndexedMemProfRecord::operator== (NFC) (llvm#88986)

f71e25b

llvm::SmallVector::operator== exactly meets our needs.

[RISCV] Simplify FindRegWithEncoding in copyPhysRegVector. NFC (llvm#…

fca2a49

…89001) Instead of searching all encodings, we can convert the encoding back to a register and use getMatchingSuperReg.

[analyzer] Fix a security.cert.env.InvalidPtr crash

e096c14

Fixes llvm#88181

[analyzer] Harden security.cert.env.InvalidPtr checker fn matching

024281d

Relates to llvm#88181

[clang][dataflow] Support StmtExpr in PropagateResultObject(). (l…

b851c7f

…lvm#88872) This patch adds a test that assert-fails without the fix.

[AutoBump] Merge with b851c7f (Apr 17)

91788e3

cferry-AMD approved these changes Aug 22, 2024


Base automatically changed from bump_to_1c076b43 to feature/fused-ops August 22, 2024 15:06


mgehre-amd merged commit ac378c2 into feature/fused-ops Aug 22, 2024
11 checks passed

mgehre-amd deleted the bump_to_b851c7f1 branch August 22, 2024 15:06
