[AutoBump] Merge with fe0dee4d (Jun 10) (62) #325

jorickert · 2024-09-05T12:08:33Z

No description provided.

gcc patch: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=1f2ca510065a2033bac408eb5a960ef0126f25cc

…#94455) The function is doing two fairly different things, depending on how it is called. While this allows for some code reuse, it also makes it hard to override it correctly. Possibly for this reason ValueObjectSynthetic overrides GetChildAtIndex instead, which forces it to reimplement some of its functionality, most notably caching of generated children. Splitting this up makes it easier to move the caching to a common place (and hopefully makes the code easier to follow in general).

…m#94557)" (llvm#94730) This reverts commit d843c02.

Since the constructor of ContextEdge takes ContextIds by value, we should move it to the corresponding member variable as suggested by clang-tidy's performance-unnecessary-value-param. While we are at it, this patch updates a couple of callers. To avoid the ambiguity in the evaluation order among the constructor arguments, I'm calling computeAllocType before calling the constructor.
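The two points in this message (moving a by-value parameter into its member, and computing a dependent argument before the constructor call) can be sketched as follows. This is a minimal illustration only: `Edge`, `computeType` and `makeEdge` are hypothetical stand-ins, not the actual LLVM types or functions.

```cpp
#include <cstdint>
#include <set>
#include <utility>

// Hypothetical stand-in for an edge type; not the actual LLVM class.
struct Edge {
  uint8_t AllocTypes;
  std::set<uint32_t> ContextIds;

  // The set is a by-value "sink" parameter and is moved into the member,
  // so a caller that passes an rvalue pays for no copy.
  Edge(uint8_t Types, std::set<uint32_t> Ids)
      : AllocTypes(Types), ContextIds(std::move(Ids)) {}
};

// Hypothetical helper that has to read the ids to derive a type tag.
inline uint8_t computeType(const std::set<uint32_t> &Ids) {
  return Ids.empty() ? 0 : 1;
}

inline Edge makeEdge(std::set<uint32_t> Ids) {
  // Compute the tag first. In Edge(computeType(Ids), std::move(Ids)) the
  // order in which the two parameters are initialized is unspecified, so
  // the by-value parameter could be move-constructed before Ids is read.
  uint8_t Type = computeType(Ids);
  return Edge(Type, std::move(Ids));
}
```

Hoisting the computation into a local makes the read of `Ids` happen before the move regardless of how the compiler orders argument evaluation.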

This allows the ReportError functor to hold move-only types.

…RI instructions (llvm#94552)

…rs whose return values are unused (llvm#94590) This patch adds a peephole pass `LoongArchDeadRegisterDefinitions`. It rewrites `rd` to `r0` when `rd` is marked as dead. It may improve the register allocation and reduce pipeline hazards on CPUs without register renaming and OOO.

And change the previous GetPtrField to only peek() the base pointer. We can get rid of a whole bunch of DupPtr ops this way.

In preparation for adding essentially the same visitor to StreamChecker, this patch factors this visitor out to a common header. I'll be the first to admit that the interface of these classes is not terrific, but it is rather tightly held back by its main technical debt, which is NoStoreFuncVisitor, the main descendant of NoStateChangeVisitor. Change-Id: I99d73ccd93a18dd145bbbc83afadbb432dd42b90

…ave Zvfbfmin" (llvm#94565)"

This PR fixes an incorrect line for setting scaling_governor in benchmarking tips.

It's not strictly needed and did cause some test failures.

This PR handles translation of DIStringType. Mostly mechanical changes to translate DIStringType to/from DIStringTypeAttr. The 'stringLength' field is 'DIVariable' in DIStringType. As there was no `DIVariableAttr` previously, it has been added to ease the translation. --------- Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>

Fixes llvm#94599

…lvm#94598) Use tablegen to generate the pass constructor. This pass is supposed to add function attributes so it does not need to operate on other top level operations.

As noted on llvm#94466, NEON has ABDS/ABDU instructions but only handles them via intrinsics, plus some VABDL custom patterns. This patch flags basic ABDS/ABDU for neon types as legal and updates all tablegen patterns to use abds/abdu instead. Fixes llvm#94466

This operation extracts a number of bits at a given offset and sign or zero extends them, which is done by emitting it as a left shift followed by a right shift. This is being added for use in clang for C++ structured bindings of bitfields that have offset or size that aren't a byte multiple. A new operation is being added, instead of shifts being used directly, as it makes correctly handling it in optimisations (which will be done in a later patch) much easier.
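The shift-left-then-shift-right technique described above can be sketched in plain C++. This is an illustration on a fixed 32-bit value, with hypothetical function names; it does not claim to match the width or the encoding of the new operation itself.

```cpp
#include <cassert>
#include <cstdint>

// Extract Width bits starting at bit Offset (bit 0 is the least significant
// bit) and zero-extend them. Hypothetical helper for illustration.
inline uint32_t extractBitsZExt(uint32_t Value, unsigned Offset,
                                unsigned Width) {
  assert(Width > 0 && Offset + Width <= 32 && "field must fit in 32 bits");
  uint32_t Up = Value << (32 - Offset - Width); // field now sits at the top
  return Up >> (32 - Width);                    // logical shift fills with 0
}

// Same extraction, but sign-extend using the top bit of the field.
inline int32_t extractBitsSExt(uint32_t Value, unsigned Offset,
                               unsigned Width) {
  assert(Width > 0 && Offset + Width <= 32 && "field must fit in 32 bits");
  uint32_t Up = Value << (32 - Offset - Width);
  // The cast and the right shift of a negative value are fully defined
  // (two's complement, arithmetic shift) since C++20; earlier standards
  // leave them implementation-defined, though mainstream compilers agree.
  return static_cast<int32_t>(Up) >> (32 - Width);
}
```

For example, the 4-bit field at offset 4 of `0xF0` yields 15 when zero-extended and -1 when sign-extended. The left shift discards the bits above the field, and the choice of logical or arithmetic right shift decides the extension.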

Currently, during a loop pipelining transformation, operations may be hoisted out without any checks on the loop bounds, which leads to incorrect transformations and unexpected behaviour. The following [issue](llvm#90870) describes the problem more extensively, including an example. The proposed fix adds checks on the loop bounds beforehand and then applies the maximum hoisting.

They do not count into lambda captures, so visit them lazily.

The check lines in this test were clearly not generated by UTC.

Regenerate these with --check-globals. The manual global CHECKS get dropped during regeneration otherwise. Annoyingly UTC insists on putting the globals directly before the first function, so the first comment is a bit out of place now.

This patch implements the lowering for vector deinterleave for vectors of n dimensions. The process involves unrolling the n-d vector into a series of one-dimensional vectors. The deinterleave operation is then used on these vectors.
From:
```
%0, %1 = vector.deinterleave %a : vector<2x8xi8> -> vector<2x4xi8>
```
To:
```
%cst = arith.constant dense<0> : vector<2x4xi32>
%0 = vector.extract %arg0[0] : vector<8xi32> from vector<2x8xi32>
%res1, %res2 = vector.deinterleave %0 : vector<8xi32> -> vector<4xi32>
%1 = vector.insert %res1, %cst [0] : vector<4xi32> into vector<2x4xi32>
%2 = vector.insert %res2, %cst [0] : vector<4xi32> into vector<2x4xi32>
%3 = vector.extract %arg0[1] : vector<8xi32> from vector<2x8xi32>
%res1_0, %res2_1 = vector.deinterleave %3 : vector<8xi32> -> vector<4xi32>
%4 = vector.insert %res1_0, %1 [1] : vector<4xi32> into vector<2x4xi32>
%5 = vector.insert %res2_1, %2 [1] : vector<4xi32> into vector<2x4xi32>
...etc.
```
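The unrolling idea can be modelled outside MLIR with ordinary containers. The sketch below is only an analogy under stated assumptions: `Row`, `deinterleave1D` and `deinterleave2D` are hypothetical names, and the 2-D case stands in for the general n-d case.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Row = std::vector<int>;

// 1-D deinterleave: even-indexed elements go to the first result,
// odd-indexed elements to the second.
inline std::pair<Row, Row> deinterleave1D(const Row &In) {
  assert(In.size() % 2 == 0 && "input length must be even");
  Row Even, Odd;
  for (std::size_t I = 0; I < In.size(); I += 2) {
    Even.push_back(In[I]);
    Odd.push_back(In[I + 1]);
  }
  return {Even, Odd};
}

// 2-D deinterleave by unrolling the outer dimension: extract each row,
// deinterleave it as a 1-D vector, and insert the halves into the results.
inline std::pair<std::vector<Row>, std::vector<Row>>
deinterleave2D(const std::vector<Row> &In) {
  std::vector<Row> Evens, Odds;
  for (const Row &R : In) {
    auto [E, O] = deinterleave1D(R);
    Evens.push_back(std::move(E));
    Odds.push_back(std::move(O));
  }
  return {Evens, Odds};
}
```

Each loop iteration corresponds to one extract / deinterleave / insert group in the lowered IR above.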

When using the -mframe-chain=aapcs or -mframe-chain=aapcs-leaf options, we cannot use r11 as an allocatable register, even if -fomit-frame-pointer is also used. This is so that r11 will always point to a valid frame record, even if we don't create one in every function.

This makes codegen for array initialization simpler in two ways:
1. Drop the zero-index GEP at the start, which is no longer needed with opaque pointers.
2. Emit GEPs directly to the correct element, instead of having a long chain of +1 GEPs. This is more canonical, and also avoids regressions in unoptimized builds from llvm#93823.

Refactor the pass to only support `IntrinsicInst` calls. `ReplaceWithVecLib` used to support instructions, as AArch64 was using this pass to replace a vectorized frem instruction to the fmod vector library call (through TLI). As this replacement is now done by the codegen (llvm#83859), there is no need for this pass to support instructions. Additionally, removed 'frem' tests from:
- AArch64/replace-with-veclib-armpl.ll
- AArch64/replace-with-veclib-sleef-scalable.ll
- AArch64/replace-with-veclib-sleef.ll
Such testing is done at codegen level:
- llvm#83859

…lvm#94754) Prior to this patch VisualStudio._get_step_info incorrectly identifies the reason the debugger has stopped. e.g., stepping through a program would be reported as a StopReason.Breakpoint rather than StopReason.Step. Fix. No test added as there are no VisualStudio tests (tested locally).

Reapply after llvm#93956, which changed clang array initialization codegen to avoid size regressions for unoptimized builds. ----- This fold is subtly incorrect, because DL-unaware constant folding does not know the correct index type to use, and just performs the addition in the type that happens to already be there. This is incorrect, since sext(X)+sext(Y) is generally not the same as sext(X+Y). See the `@constexpr_gep_of_gep_with_narrow_type()` for a miscompile with the current implementation. One could try to restrict the fold to cases where no overflow occurs, but I'm not bothering with that here, because the DL-aware constant folding will take care of this anyway. I've only kept the straightforward zero-index case, where we just concatenate two GEPs.
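The claim that sext(X)+sext(Y) is generally not the same as sext(X+Y) can be checked with a small worked example. The sketch below uses 8-bit indices widened to 64 bits; the function names are hypothetical and the widths are chosen only for illustration.

```cpp
#include <cstdint>

// Add in the narrow type (wrapping modulo 2^8), then sign-extend: sext(X+Y).
inline int64_t addThenExtend(int8_t X, int8_t Y) {
  // Unsigned arithmetic models the wrapping addition without signed overflow.
  uint8_t Wrapped = static_cast<uint8_t>(static_cast<uint8_t>(X) +
                                         static_cast<uint8_t>(Y));
  // Conversion back to int8_t is modular since C++20 and behaves the same
  // way on mainstream compilers under earlier standards.
  return static_cast<int64_t>(static_cast<int8_t>(Wrapped));
}

// Sign-extend each operand first, then add in the wide type: sext(X)+sext(Y).
inline int64_t extendThenAdd(int8_t X, int8_t Y) {
  return static_cast<int64_t>(X) + static_cast<int64_t>(Y);
}
```

With X = Y = 100, `extendThenAdd` gives 200, while `addThenExtend` wraps to -56. The two only agree when the narrow addition does not overflow, which is why adding indices in whatever type happens to be present can change the computed offset.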

The checker is made more exact (only pointer into array is allowed, check array index) and more tests are added.

…on funcs (llvm#92417) Add `LLVMPositionBuilderBeforeDbgRecords` and `LLVMPositionBuilderBeforeInstrAndDbgRecords` to `llvm/include/llvm-c/Core.h` which behave the same as `LLVMPositionBuilder` and `LLVMPositionBuilderBefore` except that the position is set before debug records attached to the target instruction (the existing functions set the insertion point to after any attached debug records). More info on debug records and the migration towards using them can be found here: https://llvm.org/docs/RemoveDIsDebugInfo.html The distinction is important in some situations. An important example is when inserting a phi before another instruction which has debug records attached to it (these come "before" the instruction). Inserting before the instruction but after the debug records would result in having debug records before a phi, which is illegal. That results in an assertion failure: `llvm/lib/IR/Instruction.cpp:166: Assertion '!isa<PHINode>(this) && "Inserting PHI after debug-records!"' failed.` In llvm (C++) we've added a bit to instruction iterators that carries around the extra information. Adding dedicated functions seemed like the least invasive and least surprising way to update the C API. Update llvm/tools/llvm-c-test/debuginfo.c to test this functionality. Update the OCaml bindings, the migration docs and release notes.

Allocate result statically on the stack (using max rank) and use the runtime to fill it in correctly.

…94841) The `else if` condition for checking `m_compression_type` is redundant as it matches with a previous `if` condition, making the expression always false. Reported by cppcheck as a possible cut-and-paste error. Caught by cppcheck - lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunication.cpp:543:35: style: Expression is always false because 'else if' condition matches previous condition at line 535. [multiCondition] Fix llvm#91222

This issue is reported by cppcheck as a pointless test in the watch mask check. The `else if` condition is opposite to the previous `if` condition, making the expression always true. Caught by cppcheck - lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:509:25: style: Expression is always true because 'else if' condition is opposite to previous condition at line 505. [multiCondition] Fix llvm#91223
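The two cppcheck findings above (an `else if` that repeats an earlier condition, and an `else if` that is the opposite of the earlier condition) follow a common shape. The sketch below shows that shape with made-up functions; it is not the lldb code from either file.

```cpp
// Pattern 1: the else-if repeats the earlier test, so it is always false
// when reached and its branch is dead. Hypothetical example.
inline int kindDuplicated(int Type) {
  if (Type == 1)
    return 10;
  else if (Type == 1) // always false here: same test as above
    return 20;
  return 0;
}

// Pattern 2: the else-if is the exact opposite of the earlier test, so it
// is always true when reached and the test is pointless.
inline int signBucketRedundant(int V) {
  if (V < 0)
    return -1;
  else if (V >= 0) // always true here: opposite of the first test
    return 1;
  return 0; // unreachable
}

// Equivalent form of pattern 2 with the pointless test removed.
inline int signBucket(int V) { return V < 0 ? -1 : 1; }
```

In the first pattern the second condition is usually a cut-and-paste slip and should test something else. In the second, a plain `else` expresses the same logic.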

…ace (llvm#93212)

…e' feature is missing (llvm#94903) Do not let compilation fail when the target platform does not support the C++ 'coroutine' feature. Just compile without it and let lldb know about the missing/unsupported feature.

Use fast unsigned arithmetic before constructing an APInt. This gives me a ~2x speed up when running this in my Release+Asserts build: $ unittests/Support/SupportTests --gtest_filter=KnownBitsTest.*Exhaustive

llvm#94362) …and operators that have non-const overloads. This allows `unnecessary-copy-initialization` to warn on more cases. The common case is a class with a set of const/non-const overloads (e.g. std::vector::operator[]).
```
void F() {
  std::vector<Expensive> v;
  // ...
  const Expensive e = v[i];
}
```
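To make the cost of the flagged pattern concrete, the sketch below counts copies for the by-value form and for a const-reference form. `Expensive`, its copy counter and the two helper functions are hypothetical, and the reference form is only safe on the assumption that the vector is not modified while the reference is in use.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type that records how often it is copied.
struct Expensive {
  std::string Payload;
  static inline int Copies = 0; // C++17 inline static data member

  explicit Expensive(std::string P) : Payload(std::move(P)) {}
  Expensive(const Expensive &Other) : Payload(Other.Payload) { ++Copies; }
};

// The flagged pattern: V is non-const, so the non-const operator[] is
// selected, and the element is copied into a const local.
inline std::size_t payloadSizeCopy(std::vector<Expensive> &V, std::size_t I) {
  const Expensive E = V[I];
  return E.Payload.size();
}

// Binding a const reference reads the same element without copying it.
inline std::size_t payloadSizeRef(std::vector<Expensive> &V, std::size_t I) {
  const Expensive &E = V[I];
  return E.Payload.size();
}
```

Both functions return the same value, but only the first one increments `Expensive::Copies`.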

…xtendedValue (llvm#94822) The hlfir::Entity to fir::ExtendedValue conversion usually uses the "fir base" output of hlfir.declare (which is the same as the input) to avoid introducing temporary descriptors for the sole purpose of updating lower bound information. This is possible because local lower bounds, if any, are tracked in a vector inside the fir::ExtendedValue. With assumed-ranks, the lower bounds cannot be tracked inside the fir::ExtendedValue vector (their number is unknown at compile time). Hence, the fir.box/fir.class used in fir::ExtendedValue in lowering must always contain accurate local lower bound information.

…4739) Use tablegen to automatically create the pass constructor. The purpose of this pass is to add attributes to functions, so it doesn't need to work on other top level operations.

…ents per P2448 (llvm#94123) Fixes llvm#92583

…st. (llvm#94943) Exit early if known bits have a conflict. This gives me a ~15% speed up when running this in my Release+Asserts build: $ unittests/Support/SupportTests --gtest_filter=KnownBitsTest.*Exhaustive

…m#94330) Co-authored-by: Andrew Gozillon <Andrew.Gozillon@amd.com>

Implements parts of: - P0355 Extending chrono to Calendars and Time Zones

FreddyLeaf and others added 30 commits June 7, 2024 14:17

[X86] Assign AVX10_1 feature priority to align with gcc. (llvm#94557)

d843c02

gcc patch: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=1f2ca510065a2033bac408eb5a960ef0126f25cc

Revert "[X86] Assign AVX10_1 feature priority to align with gcc. (llv…

c007883

…m#94557)" (llvm#94730) This reverts commit d843c02.

[ORC] Switch ExecutionSession::ErrorReporter to use unique_function.

4a7b800

This allows the ReportError functor to hold move-only types.
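Why a move-only function wrapper matters here can be shown with a small stand-in. `std::function` requires its target to be copy-constructible, so a lambda that captures a `std::unique_ptr` cannot be stored in it. The sketch below is a hand-written, minimal move-only wrapper; `UniqueReporter` and `reportTwice` are hypothetical names, and this is not the LLVM `unique_function` implementation.

```cpp
#include <memory>
#include <string>
#include <utility>

// Minimal move-only callable wrapper for void(const std::string &).
class UniqueReporter {
  struct Concept {
    virtual ~Concept() = default;
    virtual void call(const std::string &Msg) = 0;
  };
  template <typename F> struct Model final : Concept {
    F Fn;
    explicit Model(F Callable) : Fn(std::move(Callable)) {}
    void call(const std::string &Msg) override { Fn(Msg); }
  };
  std::unique_ptr<Concept> Impl; // owning pointer makes the wrapper move-only

public:
  // The callable is only ever moved, never copied.
  template <typename F>
  UniqueReporter(F Callable)
      : Impl(std::make_unique<Model<F>>(std::move(Callable))) {}

  void operator()(const std::string &Msg) { Impl->call(Msg); }
};

// The lambda owns its counter through a unique_ptr, which makes the lambda
// itself move-only.
inline int reportTwice() {
  auto Count = std::make_unique<int>(0);
  int *Observed = Count.get();
  UniqueReporter Report(
      [Owned = std::move(Count)](const std::string &) { ++*Owned; });
  Report("first");
  Report("second");
  return *Observed; // Report is still alive, so the counter is too
}
```

Here `reportTwice()` returns 2. Replacing `UniqueReporter` with `std::function<void(const std::string &)>` would fail to compile because the lambda cannot be copied.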

[LoongArch] Set isReMaterializable on LU{12,32,52}I.D/ADDI.D and {X}O…

f21c2fa

…RI instructions (llvm#94552)

[SCEV] Use insert_or_assign() (NFC)

d224a03

[clang][Interp][NFC] Add GetPtrFieldPop opcode

c15b867

And change the previous GetPtrField to only peek() the base pointer. We can get rid of a whole bunch of DupPtr ops this way.

Reland "[RISCV] Support select/merge like ops for bf16 vectors when h…

be18daa

…ave Zvfbfmin" (llvm#94565)"

[docs] Fix benchmarking tips (llvm#94724)

8ef5c98

This PR fixes an incorrect line for setting scaling_governor in benchmarking tips.

[test] Don't generate extra file for regalloc-amdgpu.mir test.

36bc741

[clang][Interp] Remove StorageKind limitation in Pointer assign operators

1c0063b

It's not strictly needed and did cause some test failures.

[NFC][LoongArch] Update test for llvm#94590

ac40463

[clangd] Fix crash with null check for Token at Loc (llvm#94528)

5f1adf0

Fixes llvm#94599

[flang][Transforms][NFC] Remove boilerplate from vscale range pass (l…

8f11649

…lvm#94598) Use tablegen to generate the pass constructor. This pass is supposed to add function attributes so it does not need to operate on other top level operations.

[PowerPC] modify the frameaddress case, NFC

0749b01

[PowerPC] return correct frame address for frameaddress intrinsic

3453ded

[CMake][Release] Use the TXZ cpack generator for binaries (llvm#90138)

0d1b367

[clang][Interp][NFC] Properly assign block pointer Pointee

5d6acf8

[clang][Interp] Fix refers_to_enclosing_variable_or_capture DREs

3a31eae

They do not count into lambda captures, so visit them lazily.

[SimplifyCFG] Remove bogus UTC line from test (NFC)

1934c1a

The check lines in this test were clearly not generated by UTC.

nikic and others added 27 commits June 10, 2024 09:19

[flang] add source to SHAPE API (llvm#94781)

e58f830

[clang][Interp] Diagnose casts from void pointers

c0b65a2

[clang][analyzer] Improved PointerSubChecker (llvm#93676)

26224ca

The checker is made more exact (only pointer into array is allowed, check array index) and more tests are added.

[flang] lower SHAPE with assumed-rank arguments (llvm#94812)

0257f9c

Allocate result statically on the stack (using max rank) and use the runtime to fill it in correctly.

NFC fix typos from llvm#92417

38c01c3

[scudo] Apply filling when realloc shrinks and re-grows a block in-pl…

760d880

…ace (llvm#93212)

[flang][runtime] add LBOUND API for assumed-rank arrays (llvm#94808)

a0faf79

[KnownBits] Speed up ForeachKnownBits in unit test. NFC. (llvm#94939)

f97bcdb

Use fast unsigned arithmetic before constructing an APInt. This gives me a ~2x speed up when running this in my Release+Asserts build: $ unittests/Support/SupportTests --gtest_filter=KnownBitsTest.*Exhaustive

[AMDGPU] Remove unused checks left over from X86

c9fd7b1

[AMDGPU] Fix typos in GCN-PROMOTE check prefix

317ed77

[flang][Transforms][NFC] reduce boilerplate in func attr pass (llvm#9…

a6129a5

…4739) Use tablegen to automatically create the pass constructor. The purpose of this pass is to add attributes to functions, so it doesn't need to work on other top level operations.

[Clang][C++23] update constexpr diagnostics for missing return statem…

ae9d89d

…ents per P2448 (llvm#94123) Fixes llvm#92583

[KnownBits] Speed up conflict handling in ForeachKnownBits in unit te…

ecb9d94

…st. (llvm#94943) Exit early if known bits have a conflict. This gives me a ~15% speed up when running this in my Release+Asserts build: $ unittests/Support/SupportTests --gtest_filter=KnownBitsTest.*Exhaustive

[flang][OpenMP] Fix unused prefixes in function-filtering-2 test (llv…

8dc8b9f

…m#94330) Co-authored-by: Andrew Gozillon <Andrew.Gozillon@amd.com>

[libc++][TZDB] Implements time_zone::to_local. (llvm#91003)

da03175

Implements parts of: - P0355 Extending chrono to Calendars and Time Zones

[AArch64] Add tests for extending mul. NFC

fe0dee4

[AutoBump] Merge with fe0dee4 (Jun 10)

a77d993

Base automatically changed from bump_to_46672c1d to feature/fused-ops September 11, 2024 12:08

mgehre-amd merged commit a77d993 into feature/fused-ops Sep 11, 2024
6 checks passed

mgehre-amd deleted the bump_to_fe0dee4d branch September 11, 2024 12:08
