[AutoBump] Merge with fe0dee4d (Jun 10) (62) #323

…m#94557)" (llvm#94730) This reverts commit d843c02.

Since the constructor of ContextEdge takes ContextIds by value, we should move it to the corresponding member variable as suggested by clang-tidy's performance-unnecessary-value-param. While we are at it, this patch updates a couple of callers. To avoid the ambiguity in the evaluation order among the constructor arguments, I'm calling computeAllocType before calling the constructor.
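The two points above (moving a by-value parameter into its member, and computing a dependent value before the constructor call) can be sketched as follows. `Edge`, `computeType` and `makeEdge` are hypothetical stand-ins for illustration, not the actual types or functions touched by the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>

// Hypothetical stand-in for an edge type that owns a set of context IDs.
struct Edge {
  std::uint8_t AllocTypes;
  std::set<std::uint32_t> ContextIds;

  // Sink parameter: taken by value, then moved into the member, so the
  // set is not copied a second time.
  Edge(std::uint8_t Type, std::set<std::uint32_t> Ids)
      : AllocTypes(Type), ContextIds(std::move(Ids)) {}
};

// Hypothetical helper that has to read the IDs before they are moved from.
inline std::uint8_t computeType(const std::set<std::uint32_t> &Ids) {
  return Ids.empty() ? 0 : 1;
}

inline Edge makeEdge(std::set<std::uint32_t> Ids) {
  const std::size_t Count = Ids.size();
  // Compute the type first. Writing Edge(computeType(Ids), std::move(Ids))
  // would leave the order of the two argument initializations unspecified,
  // so Ids might already be moved from when computeType reads it.
  std::uint8_t Type = computeType(Ids);
  Edge E(Type, std::move(Ids));
  assert(E.ContextIds.size() == Count && "IDs were moved, not lost");
  return E;
}
```

Calling `makeEdge({1, 2, 3})` yields an edge holding three IDs whose type was computed from the set before it was moved.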

This allows the ReportError functor to hold move-only types.

…RI instructions (llvm#94552)

…rs whose return values are unused (llvm#94590) This patch adds a peephole pass `LoongArchDeadRegisterDefinitions`. It rewrites `rd` to `r0` when `rd` is marked as dead. It may improve the register allocation and reduce pipeline hazards on CPUs without register renaming and OOO.

And change the previous GetPtrField to only peek() the base pointer. We can get rid of a whole bunch of DupPtr ops this way.

In preparation for adding essentially the same visitor to StreamChecker, this patch factors this visitor out to a common header. I'll be the first to admit that the interface of these classes is not terrific, but it is rather tightly held back by its main technical debt, which is NoStoreFuncVisitor, the main descendant of NoStateChangeVisitor. Change-Id: I99d73ccd93a18dd145bbbc83afadbb432dd42b90

…ave Zvfbfmin" (llvm#94565)"

This PR fixes an incorrect line for setting scaling_governor in benchmarking tips.

It's not strictly needed and did cause some test failures.

This PR handles translation of DIStringType. Mostly mechanical changes to translate DIStringType to/from DIStringTypeAttr. The 'stringLength' field is 'DIVariable' in DIStringType. As there was no `DIVariableAttr` previously, it has been added to ease the translation. --------- Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>

Fixes llvm#94599

…lvm#94598) Use tablegen to generate the pass constructor. This pass is supposed to add function attributes so it does not need to operate on other top level operations.

As noted on llvm#94466, NEON has ABDS/ABDU instructions but only handles them via intrinsics, plus some VABDL custom patterns. This patch flags basic ABDS/ABDU for neon types as legal and updates all tablegen patterns to use abds/abdu instead. Fixes llvm#94466

This operation extracts a number of bits at a given offset and sign or zero extends them, which is done by emitting it as a left shift followed by a right shift. This is being added for use in clang for C++ structured bindings of bitfields that have offset or size that aren't a byte multiple. A new operation is being added, instead of shifts being used directly, as it makes correctly handling it in optimisations (which will be done in a later patch) much easier.
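The shift-based extraction described above can be sketched on plain 64-bit integers. The function names are hypothetical and this only models the arithmetic, not the operation added by the patch.

```cpp
#include <cassert>
#include <cstdint>

// Extract Width bits starting at bit Offset and zero-extend them.
inline std::uint64_t extractZExt(std::uint64_t Value, unsigned Offset,
                                 unsigned Width) {
  assert(Width >= 1 && Offset + Width <= 64 && "field must fit in 64 bits");
  // Left shift so the field's top bit lands in bit 63 ...
  std::uint64_t Top = Value << (64 - Offset - Width);
  // ... then a logical right shift brings it down, filling with zeros.
  return Top >> (64 - Width);
}

// Extract Width bits starting at bit Offset and sign-extend them.
inline std::int64_t extractSExt(std::uint64_t Value, unsigned Offset,
                                unsigned Width) {
  assert(Width >= 1 && Offset + Width <= 64 && "field must fit in 64 bits");
  std::uint64_t Top = Value << (64 - Offset - Width);
  // An arithmetic right shift replicates the field's top bit. This is
  // guaranteed since C++20 and is what mainstream compilers did before.
  return static_cast<std::int64_t>(Top) >> (64 - Width);
}
```

For the value `0xB4` (binary `10110100`), the 4-bit field at offset 2 is `1101`, which reads as 13 when zero-extended and -3 when sign-extended.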

Currently, during a loop pipelining transformation, operations may be hoisted out without any checks on the loop bounds, which leads to incorrect transformations and unexpected behaviour. The following [issue](llvm#90870) describes the problem more extensively, including an example. The proposed fix adds checks on the loop bounds before applying the maximum hoisting.

They do not count into lambda captures, so visit them lazily.

The check lines in this test were clearly not generated by UTC.

Regenerate these with --check-globals. The manual global CHECKS get dropped during regeneration otherwise. Annoyingly UTC insists on putting the globals directly before the first function, so the first comment is a bit out of place now.

This patch implements the lowering of vector deinterleave for n-dimensional vectors. The process unrolls the n-D vector into a series of one-dimensional vectors. The deinterleave operation is then applied to each of these vectors.

From:
```
%0, %1 = vector.deinterleave %a : vector<2x8xi8> -> vector<2x4xi8>
```
To:
```
%cst = arith.constant dense<0> : vector<2x4xi32>
%0 = vector.extract %arg0[0] : vector<8xi32> from vector<2x8xi32>
%res1, %res2 = vector.deinterleave %0 : vector<8xi32> -> vector<4xi32>
%1 = vector.insert %res1, %cst [0] : vector<4xi32> into vector<2x4xi32>
%2 = vector.insert %res2, %cst [0] : vector<4xi32> into vector<2x4xi32>
%3 = vector.extract %arg0[1] : vector<8xi32> from vector<2x8xi32>
%res1_0, %res2_1 = vector.deinterleave %3 : vector<8xi32> -> vector<4xi32>
%4 = vector.insert %res1_0, %1 [1] : vector<4xi32> into vector<2x4xi32>
%5 = vector.insert %res2_1, %2 [1] : vector<4xi32> into vector<2x4xi32>
...etc.
```
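As a rough scalar model of the unrolling described above (not the MLIR pattern itself), a 2-D deinterleave can be written as a loop over rows that applies a 1-D deinterleave to each row. The names `deinterleave1D` and `deinterleave2D` are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Row = std::vector<int>;

// 1-D deinterleave: even-indexed elements go to the first result,
// odd-indexed elements to the second.
inline std::pair<Row, Row> deinterleave1D(const Row &In) {
  assert(In.size() % 2 == 0 && "need an even number of elements");
  Row Even, Odd;
  for (std::size_t I = 0; I < In.size(); I += 2) {
    Even.push_back(In[I]);
    Odd.push_back(In[I + 1]);
  }
  return {Even, Odd};
}

// 2-D case: unroll the outer dimension, deinterleave each row, and insert
// the two halves at the same row index of the two results.
inline std::pair<std::vector<Row>, std::vector<Row>>
deinterleave2D(const std::vector<Row> &In) {
  std::vector<Row> Res1(In.size()), Res2(In.size());
  for (std::size_t R = 0; R < In.size(); ++R) {
    std::pair<Row, Row> Halves = deinterleave1D(In[R]);
    Res1[R] = std::move(Halves.first);
    Res2[R] = std::move(Halves.second);
  }
  return {Res1, Res2};
}
```

Each loop iteration corresponds to one extract / deinterleave / insert group in the lowered IR.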

When using the -mframe-chain=aapcs or -mframe-chain=aapcs-leaf options, we cannot use r11 as an allocatable register, even if -fomit-frame-pointer is also used. This is so that r11 will always point to a valid frame record, even if we don't create one in every function.

…#94601) Removes residual ARM handling for vXi64 ABS nodes to prevent infinite loops.

from PEP8 (https://peps.python.org/pep-0008/#programming-recommendations): > Comparisons to singletons like None should always be done with is or is not, never the equality operators. Co-authored-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>

Cortex-R52+ is an Armv8-R AArch32 CPU. Technical Reference Manual for Cortex-R52+: https://developer.arm.com/documentation/102199/latest/

llvm#89811 caused this test to fail, somehow. I think it may not be at fault, but actually be exposing some existing undefined behaviour, see llvm#94741. Skipping this for now to get the bots green again.

This change seeks to add support for vendor flavoured SPIRV - more specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV that carries some extra bits of information that are only usable by AMDGCN targets, forfeiting absolute genericity to obtain greater expressiveness for target features:

- AMDGCN inline ASM is allowed/supported, under the assumption that the [SPV_INTEL_inline_assembly](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_inline_assembly.asciidoc) extension is enabled/used
- AMDGCN target specific builtins are allowed/supported, under the assumption that e.g. the `--spirv-allow-unknown-intrinsics` option is enabled when using the downstream translator
- the featureset matches the union of AMDGCN targets' features
- the datalayout string is overspecified to affix both the program address space and the alloca address space, the latter under the assumption that the [SPV_INTEL_function_pointers](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc) extension is enabled/used, in which case the extant SPIRV datalayout string would lead to pointers to functions pointing to the private address space, which would be wrong.

Existing AMDGCN tests are extended to cover this new target. It is currently dormant / will require some additional changes, but I thought I'd rather put it up for review to get feedback as early as possible. I will note that an alternative option is to place this under AMDGPU, but that seems slightly less natural, since this is still SPIRV, albeit relaxed in terms of preconditions & constrained in terms of postconditions, and only guaranteed to be usable on AMDGCN targets (it is still possible to obtain pristine portable SPIRV through usage of the flavoured target, though).

Both `reverseBranchCondition` and `replaceBranchTarget` return a success boolean. But all-but-one caller ignores the return value, and the exception emits a fatal error on failure. Thus, just return nothing.

This "small" set grows quite large and it's more performant to store whether a node has been combined before in the node itself. As this information is only relevant for nodes that are currently not in the worklist, add a second state to the CombinerWorklistIndex (-2) to indicate that a node is currently not in a worklist, but was combined before. This brings a substantial performance improvement.
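A minimal sketch of the idea, assuming a simplified LIFO worklist and a hypothetical `Node` type: the index stored in the node doubles as a "combined before" flag through the extra sentinel value -2. The real combiner's worklist handling is more involved than this.

```cpp
#include <cassert>
#include <vector>

struct Node {
  // >= 0: position in the worklist.
  // -1:   not in the worklist and never combined.
  // -2:   not in the worklist, but combined before.
  int CombinerWorklistIndex = -1;
};

class Worklist {
  std::vector<Node *> Items;

public:
  void add(Node &N) {
    if (N.CombinerWorklistIndex >= 0)
      return; // already queued
    N.CombinerWorklistIndex = static_cast<int>(Items.size());
    Items.push_back(&N);
  }

  // Take the next node and record, in the node itself, that it has now
  // been visited by the combiner.
  Node *popAndMarkCombined() {
    if (Items.empty())
      return nullptr;
    Node *N = Items.back();
    Items.pop_back();
    assert(N->CombinerWorklistIndex >= 0 && "queued nodes carry an index");
    N->CombinerWorklistIndex = -2;
    return N;
  }
};

inline bool wasCombinedBefore(const Node &N) {
  return N.CombinerWorklistIndex == -2;
}
```

No separate set lookup is needed: the query is a single integer comparison on the node.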

They need to be fully initialized, similar to global variables.

Check this by looking at the VarDecl.

Follow-up to llvm#86912.

The motivation of the patch series is that, for a module interface unit `X`, when the dependent modules of `X` change, if the changes are not relevant to `X`, we hope the BMI of `X` won't change. For this specific patch, we hope that if the changes only concern irrelevant declarations, the BMI of `X` won't change. **However**, I found the patch itself is not very useful in practice, since adding or removing declarations will change the state of identifiers and types in most cases.

That said, for the simplest example,

```
// partA.cppm
export module m:partA;

// partA.v1.cppm
export module m:partA;
export void a() {}

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
    b();
}
```

the BMI of `onlyUseB` will change after we change the implementation of `partA.cppm` to `partA.v1.cppm`, since `partA.v1.cppm` introduces new identifiers and types (the function prototype).

So in this patch, we have to write the tests as:

```
// partA.cppm
export module m:partA;
export int getA() { ... }
export int getA2(int) { ... }

// partA.v1.cppm
export module m:partA;
export int getA() { ... }
export int getA(int) { ... }
export int getA2(int) { ... }

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
    b();
}
```

so that the newly introduced declaration `int getA(int)` doesn't introduce new identifiers and types, and the BMI of `onlyUseB` can stay unchanged.

While this does not look great, the patch should be the base of the patch that erases the transitive change for identifiers and types, since I don't know how we can introduce new types and identifiers without introducing new declarations. Given how tight the relationship between declarations, types and identifiers is, I think we can only reach the ideal state after we have completed the series for all three entities.

The design of the patch is similar to llvm#86912, which extends the 32-bit DeclID to 64 bits and uses the higher bits to store the module file index and the lower bits to store the local Decl ID. A slight difference is that we only use 48 bits to store the new DeclID, since we try to use the higher 16 bits to store the module ID in the prefix of the Decl class. Previously, we used 32 bits to store the module ID and 32 bits to store the DeclID. I don't want to allocate additional space, so I tried to keep the total at 64 bits. A potentially interesting thing here is the relationship between the module ID and the module file index. I feel we can get the module file index from the module ID, but I didn't prove it or implement it, since I want to make the patch itself as small as possible. We can do it in the future if we want.

Another change in the patch is the new concept of Decl Index, which means the index into the very big array `DeclsLoaded` in ASTReader. Previously, the index of a loaded declaration was simply the Decl ID minus PREDEFINED_DECL_NUMs, so there are some places where the two were used ambiguously. This patch tries to split these two concepts.

As with llvm#86912, the change will increase the on-disk PCM file sizes. As declaration IDs may be the most numerous IDs in the PCM file, this can have the biggest impact on the size. In my experiments, this change brings a 6.6% increase in the on-disk PCM size. No compile-time performance regression was observed. Given the benefits in the motivating example, I think the cost is worthwhile.
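A minimal sketch of packing a module file index and a local Decl ID into a 48-bit value. The exact split used here (16 bits of index above 32 bits of local ID) is an assumption made for illustration; the text above only says that the higher bits hold the module file index and that 48 bits are used in total. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout for this sketch only: [47:32] module file index,
// [31:0] local Decl ID. The top 16 bits of the 64-bit word stay free.
constexpr unsigned LocalBits = 32;
constexpr unsigned IndexBits = 16;

inline std::uint64_t makeDeclID(std::uint64_t ModuleFileIndex,
                                std::uint64_t LocalDeclID) {
  assert(ModuleFileIndex < (1ull << IndexBits) && "index must fit");
  assert(LocalDeclID < (1ull << LocalBits) && "local ID must fit");
  return (ModuleFileIndex << LocalBits) | LocalDeclID;
}

inline std::uint64_t getModuleFileIndex(std::uint64_t ID) {
  return ID >> LocalBits;
}

inline std::uint64_t getLocalDeclID(std::uint64_t ID) {
  return ID & ((1ull << LocalBits) - 1);
}
```

With this layout, `makeDeclID(3, 42)` produces `0x30000002A`, from which both components can be recovered with a shift and a mask.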

…93680) Whole quad mode requires inserting a copy of the initial EXEC mask. In a function that also uses llvm.amdgcn.init.exec, insert the COPY after initializing EXEC.

The file OMP.td is becoming tedious to update by hand due to the seemingly random ordering of various items in it. This patch brings order to it by sorting most of the contents.

The clause definitions are sorted alphabetically with respect to the spelling of the clause.[1]

The directive definitions are split into two groups: leaf directives and compound directives.[2] Within each, definitions are sorted alphabetically with respect to the spelling, with the exception that "end xyz" directives are placed immediately following the definition of "xyz".[3] Within each directive definition, the lists of clauses are also sorted alphabetically.

[1] All spellings are made of lowercase letters, _, or space. Ordering that includes non-letters follows the order assumed by the `sort` utility.
[2] Compound directives refer to the constituent leaf directives, hence the leaf definitions must come first.
[3] Some of the "end xyz" directives have properties derived from the corresponding "xyz" directive. This exception guarantees that "xyz" precedes the "end xyz".

…lvm#94195) Extends delayed privatization support to `target .. private(..)`. With this PR, `private` is supported for `target` **only** in delayed privatization mode.

Summary: The NVPTX build wasn't getting the `C++20` standard necessary for a few files.

This commit adds support for lowering `tensor.unpack` with a non-identity `outer_dims_perm`. This was previously left as a not-yet-implemented case.

This PR adds fusion by collapsing and fusion by expansion patterns for `tensor.pad` ops in ElementwiseOpFusion. Pad ops can be expanded or collapsed as long as none of the padded dimensions will be expanded or collapsed.

…m#94631) After the `output_shape` field was added to `expand_shape` ops, dynamically sized expand shapes are now possible, but this was not accounted for in the folder. This PR tightens the constraints of the folder to fix this.

Change the target triple to remove some unnecessary instructions.

This change is an implementation of llvm#87367's investigation on supporting IEEE math operations as intrinsics, which was discussed in this RFC: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

This PR is just for Tan. Now that the x86 tan backend has landed (llvm#90503), we can add other backends since the shared pieces are in tree now.

Changes:
- `llvm/include/llvm/Analysis/VecFuncs.def` - vectorization of tan for arm64 backends.
- `llvm/lib/Target/AArch64/AArch64FastISel.cpp` - Add tan to the libcall table
- `llvm/lib/Target/AArch64/AArch64ISelLowering.cpp` - Add tan expansion for f128, f16, and vector/NEON operations
- `llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp` - define `G_FTAN` as a legal arm64 instruction

resolves llvm#94755

Summary: The utilities `nvptx-arch` and `amdgpu-arch` are used to support `--offload-arch=native` among other utilities in clang. However, these rely on the GPU drivers to query the features. In certain cases these drivers can become locked up, which will lead to indefinite hangs on any compiler jobs running in the meantime. This patch adds a ten second timeout period for these utilities before it kills the job and errors out.

@CharKeaney

All post-Increment load/store, register-register load/store spec: https://github.com/openhwgroup/cv32e40p/blob/master/docs/source/instruction_set_extensions.rst Contributors: @CharKeaney, @jeremybennett, @lewis-revill, @NandniJamnadas, @PaoloS02, @serkm, @simonpcook, @xingmingjie, @realqhc

This PR depends on llvm#90260

We changed the order in which functions are outlined in Machine Outliner. The formula for priority is found via a black-box Bayesian optimization toolbox. Using this formula for sorting consistently reduces the uncompressed size of large real-world mobile apps. We also ran a few benchmarks using LLVM test suites, and showed that sorting by priority consistently reduces the text segment size.

|run (CTMark/)   |baseline (1)|priority (2)|diff (1 -> 2)|
|----------------|------------|------------|-------------|
|lencod          |349624      |349264      |-0.1030%     |
|SPASS           |219672      |219480      |-0.0874%     |
|kc              |271956      |251200      |-7.6321%     |
|sqlite3         |223920      |223708      |-0.0947%     |
|7zip-benchmark  |405364      |402624      |-0.6759%     |
|bullet          |139820      |139500      |-0.2289%     |
|consumer-typeset|295684      |290196      |-1.8560%     |
|pairlocalalign  |72236       |72092       |-0.1993%     |
|tramp3d-v4      |189572      |189292      |-0.1477%     |

This is part of an enhanced version of machine outliner -- see [RFC](https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-1-fulllto-part-2-thinlto-nolto-to-come/78732).

Parameter "Version" is confusing in deserializeV012 and deserializeV3 because we also have member variable "Version". Fortunately, parameter "Version" and member variable "Version" always have the same value because IndexedMemProfReader::deserialize initializes the member variable and passes it to deserializeV012 and deserializeV3. This patch removes the parameter.

This patch integrates CallStackRadixTreeBuilder into the V3 format, reducing the profile size to about 27% of the V2 profile size.

- Serialization: writeMemProfCallStackArray just needs to write out the radix tree array prepared by CallStackRadixTreeBuilder. Mappings from CallStackIds to LinearCallStackIds are moved by new function CallStackRadixTreeBuilder::takeCallStackPos.
- Deserialization: Deserializing a call stack is the same as deserializing an array encoded in the obvious manner -- the length followed by the payload, except that we need to follow a pointer to the parent to take advantage of common prefixes once in a while. This patch teaches LinearCallStackIdConverter how to handle those pointers.

The "Emulated" sub-directories under "ArmSVE" and "ArmSME" have been removed. Associated tests have been moved up a directory and now include the "REQUIRES" constraint for the arm-emulator.

Allow KnownBits to represent "always poison" values via conflict. close: llvm#94436
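To illustrate what "conflict" means here, the following toy 8-bit model keeps one mask of bits known to be zero and one mask of bits known to be one; a bit set in both masks is a contradiction. The struct and function names are hypothetical and are not LLVM's own `KnownBits` class.

```cpp
#include <cassert>
#include <cstdint>

// Toy 8-bit model: a bit set in Zero is known to be 0, a bit set in One is
// known to be 1. A bit set in both is contradictory.
struct Bits {
  std::uint8_t Zero;
  std::uint8_t One;

  bool hasConflict() const { return (Zero & One) != 0; }

  bool isConstant() const {
    return !hasConflict() && static_cast<std::uint8_t>(Zero | One) == 0xFF;
  }

  std::uint8_t getConstant() const {
    assert(isConstant() && "every bit must be known without conflict");
    return One;
  }
};

// Combine two independent facts about the same value.
inline Bits unionKnown(Bits A, Bits B) {
  return Bits{static_cast<std::uint8_t>(A.Zero | B.Zero),
              static_cast<std::uint8_t>(A.One | B.One)};
}
```

If one fact says bit 0 is zero and another says bit 0 is one, the combined state has a conflict, which is the state the commit uses to represent an "always poison" value.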

…#94646) These tests pass on 64-bit. They were fixed by 5fdd094 on 32-bit. So XFAIL only for 32-bit before clang 19.

If we are extracting the even lanes and the odd lanes and adding them, we can use an addp instruction.
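The equivalence being exploited can be checked with a scalar model: adding the even-lane and odd-lane shuffles of a vector gives the same result as adding each adjacent pair of lanes. The function names are made up for this sketch, and it does not model any particular instruction encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Shuffle form: extract the even lanes and the odd lanes, then add them.
inline std::vector<int> addEvenOddLanes(const std::vector<int> &In) {
  assert(In.size() % 2 == 0 && "need an even number of lanes");
  std::vector<int> Even, Odd;
  for (std::size_t I = 0; I < In.size(); I += 2) {
    Even.push_back(In[I]);
    Odd.push_back(In[I + 1]);
  }
  std::vector<int> Sum(Even.size());
  for (std::size_t I = 0; I < Sum.size(); ++I)
    Sum[I] = Even[I] + Odd[I];
  return Sum;
}

// Pairwise form: add each pair of adjacent lanes directly.
inline std::vector<int> pairwiseAdd(const std::vector<int> &In) {
  assert(In.size() % 2 == 0 && "need an even number of lanes");
  std::vector<int> Out;
  for (std::size_t I = 0; I < In.size(); I += 2)
    Out.push_back(In[I] + In[I + 1]);
  return Out;
}
```

For the lanes `1..8`, both forms give `{3, 7, 11, 15}`.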

llvm#94550) For regex patterns that produce zero-length matches, there is one (imaginary) match in-between every character in the sequence being searched (as well as before the first character and after the last character). It's easiest to demonstrate using replacement: `std::regex_replace("abc"s, std::regex(""), "!")` should produce `!a!b!c!`, where each exclamation mark makes a zero-length match visible.

Currently our implementation doesn't correctly set the prefix of each zero-length match, "swallowing" the characters separating the imaginary matches -- e.g. when going through zero-length matches within `abc`, the corresponding prefixes should be `{'', 'a', 'b', 'c'}`, but before this patch they will all be empty (`{'', '', '', ''}`). This happens in the implementation of `regex_iterator::operator++`. Note that the Standard spells out quite explicitly that the prefix might need to be adjusted when dealing with zero-length matches in [`re.regiter.incr`](http://eel.is/c++draft/re.regiter.incr):

> In all cases in which the call to `regex_search` returns `true`, `match.prefix().first` shall be equal to the previous value of `match[0].second`... It is unspecified how the implementation makes these adjustments.

[Reproduction example](https://godbolt.org/z/8ve6G3dav)

```cpp
#include <iostream>
#include <regex>
#include <string>

int main() {
  std::string str = "abc";
  std::regex empty_matching_pattern("");

  { // The underlying problem is that `regex_iterator::operator++` doesn't update
    // the prefix correctly.
    std::sregex_iterator i(str.begin(), str.end(), empty_matching_pattern), e;
    std::cout << "\"";
    for (; i != e; ++i) {
      const std::ssub_match& prefix = i->prefix();
      std::cout << prefix.str();
    }
    std::cout << "\"\n";
    // Before the patch: ""
    // After the patch: "abc"
  }

  { // `regex_replace` makes the problem very visible.
    std::string replaced = std::regex_replace(str, empty_matching_pattern, "!");
    std::cout << "\"" << replaced << "\"\n";
    // Before the patch: "!!!!"
    // After the patch: "!a!b!c!"
  }
}
```

Fixes llvm#64451 rdar://119912002

Re-apply llvm#87550 with fixes. Details: Some tests in fuchsia failed because of the newly added assertion. This was because `GetExceptionBreakpoint()` could be called before `g_dap.debugger` was initialized. The fix here is to lazily populate the list in GetExceptionBreakpoint() rather than assuming it has already been initialized. (There is some nuisance here because we can't simply populate it in DAP::DAP(), which is a global ctor and is called before `SBDebugger::Initialize()` is called.)

This patch reverts 9b832b7 (llvm#87111): - [libc++] Deprecated `shared_ptr` Atomic Access APIs as per P0718R2 - [libc++] Implemented P2869R3: Remove Deprecated `shared_ptr` Atomic Access APIs from C++26 As explained in [1], the suggested replacement in P2869R3 is `__cpp_lib_atomic_shared_ptr`, which libc++ does not yet implement. Let's not deprecate the old way of doing things before the new way of doing things exists. [1]: llvm#87111 (comment)

…rep expression (and remove an unused argument)

Add SHAPE runtime API (will be used for assumed-rank, lowering is generating other cases inline). I tried to make it in a way where there is no dynamic allocation in the runtime/deallocation expected to be inserted by inline code for arrays that we know are small (lowering will just always stack allocate a rank 15 array to avoid dynamic stack allocation or heap allocation).

…lag (llvm#94749)

) Summary: AMDGPU supports a `target-id` feature which is used to qualify targets with different incompatible features. These are both rules and target features. Currently, we pass `-target-cpu` twice when offloading to OpenMP, and do not pass the target-id features at all. The effect was that passing something like `--offload-arch=gfx90a:xnack+` would show up as `-target-cpu=gfx90a:xnack+ -target-cpu=gfx90a`. Thus ignoring the xnack completely and passing it twice. This patch fixes that to pass it once and then separate it like how HIP does.
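A minimal sketch of separating a target-id string into the processor name and its feature settings. The output spelling (`+xnack` / `-sramecc`) is an assumption of this sketch, and `splitTargetID` is a hypothetical helper, not the driver code touched by the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct TargetID {
  std::string CPU;
  std::vector<std::string> Features; // e.g. "+xnack", "-sramecc"
};

// Split "gfx90a:xnack+:sramecc-" into CPU "gfx90a" and the features
// {"+xnack", "-sramecc"}.
inline TargetID splitTargetID(const std::string &ID) {
  TargetID Result;
  std::size_t Pos = ID.find(':');
  Result.CPU = ID.substr(0, Pos);
  while (Pos != std::string::npos) {
    std::size_t Next = ID.find(':', Pos + 1);
    std::string Item =
        ID.substr(Pos + 1, Next == std::string::npos ? Next : Next - Pos - 1);
    assert(Item.size() >= 2 && (Item.back() == '+' || Item.back() == '-') &&
           "each feature is a name followed by '+' or '-'");
    // Move the trailing sign to the front: "xnack+" becomes "+xnack".
    Result.Features.push_back(Item.back() +
                              Item.substr(0, Item.size() - 1));
    Pos = Next;
  }
  return Result;
}
```

With this split, the processor name is available once on its own and the feature settings are kept separately instead of being dropped.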

…m#94592) As discussed in llvm#94443, this PR changes the wording to be more correct.

…lvm#94756)

Otherwise, older copies of LLD may not understand the latest bitcode versions (for example, if we increase `ModuleSummaryIndex::BitCodeSummaryVersion`) Related to llvm#90692 (comment)

…lvm#94538) It also moves the test near other similar test cases.

) Prior to this change, `SelectionDAGBuilder` was producing `SDNode`s of the form: `f32 = extract_vector_elt <1 x bfloat|half>, i32 0` when lowering phis of `<1 x bfloat|half>` and running on a target that promotes this type to `f32` (like some x86 or AMDGPU targets).

This construct is invalid since this type of node only allows type extensions for integer types. It went unnoticed because the `extract_vector_elt` node is later broken down into a `bitcast` followed by `bf16_to_fp|fp_extend`. However, when the argument of the phi is a constant we were crashing because the existing code would try to constant fold this `extract_vector_elt` into an any_ext.

This patch fixes this by using a proper decomposition for `<1 x bfloat|half>`:
```
bfloat|half = bitcast <1 x bfloat|half>
float = fp_extend bfloat|half
```

This change should be NFC for the non-constant-folding cases and fix the SDISel crashes (reported in llvm#94449) for the folding cases.

Note: The change on the arm test is a missing fp16 to f32 constant folding exposed by this patch. I'll push a separate improvement for that.

…X, 0))))) to vmax+vnclipu. (llvm#94720) This pattern is an obscured way to express saturating a signed value into a smaller unsigned value. If (setltu, X, 256) is true, then the value is already in the desired range so we can pick X. If it's false, we select (sext (setgt X, 0)) which is 0 for negative values and all ones for positive values. The all ones value when truncated to the final type will still be all ones like we want.
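The reasoning above can be checked with a scalar model, assuming a 32-bit signed input truncated to 8 bits. `selectForm` mirrors the select-based pattern and `clampForm` mirrors "max with zero, then unsigned saturating clip"; both names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Select-based pattern: if X <u 256 pick X, otherwise pick sext(X > 0),
// then truncate to 8 bits.
inline std::uint8_t selectForm(std::int32_t X) {
  std::uint32_t U = static_cast<std::uint32_t>(X);
  std::uint32_t Sel = U < 256u ? U : (X > 0 ? 0xFFFFFFFFu : 0u);
  return static_cast<std::uint8_t>(Sel); // all ones truncates to 0xFF
}

// Equivalent clamp: signed max with zero, then saturate to 255.
inline std::uint8_t clampForm(std::int32_t X) {
  std::int32_t NonNegative = std::max<std::int32_t>(X, 0);
  return static_cast<std::uint8_t>(std::min<std::int32_t>(NonNegative, 255));
}

// Compare both forms over the closed range [Lo, Hi].
inline bool formsAgree(std::int32_t Lo, std::int32_t Hi) {
  assert(Lo <= Hi && "empty range");
  for (std::int64_t X = Lo; X <= Hi; ++X)
    if (selectForm(static_cast<std::int32_t>(X)) !=
        clampForm(static_cast<std::int32_t>(X)))
      return false;
  return true;
}
```

Negative inputs map to 0, inputs above 255 map to 255, and everything in between is passed through unchanged in both forms.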

…lvm#94698) Most other instructions accept addresses that start with a '(' without an immediate before it. The .insn cases were missing. This is also supported by binutils.

This reverts commit 35fa2de. It broke the LLDB bots on green dragon

Similar to f7018ba, this adds patterns for floating point faddp from an fadd and shuffles.

Only in expensive checks, to match other LazyCallGraph verification. Is helpful for verifying LazyCallGraph updates. Many issues only surface when we reuse the LazyCallGraph.

…m#94673) Debug info generation won't emit the alignment of types that have a standard alignment. It was not taking that case into account. rdar://127785973

…lvm#94762) For `using std::literals`, we now output:

    error: using declaration cannot refer to a namespace
        4 | using std::literals;
          |       ~~~~~^
    note: did you mean 'using namespace'?
        4 | using std::literals;
          |       ^
          |       namespace

Previously, we didn't have the note. This only fires for qualified namespaces. Just `using std;` doesn't trigger this, since using declarations without cxx scope specifier are rejected earlier. Making that work is an exercise for future selves :)

For baremetal targets that don't support FILE, this version of printf just writes directly to a function provided by a vendor. To do this both printf and vprintf were moved to /generic (vprintf since they need the same flags and cmake gets funky about setting variables in one file and reading them in another).

…criminator (llvm#94506) It's useful if the probe-based build can consume a DWARF-based profile (e.g. during a profile transition). Before this change there is a conflict for the discriminator; this change tries to mitigate the issue by encoding the DWARF base discriminator into the probe discriminator. As the probe id (the number of basic blocks and calls) starts from 1, there is some unused space, so we try to reuse some bits of the probe id. The new encoding rule is:

- Use bit [28:28] to indicate whether the DWARF base discriminator is encoded (fortunately we can borrow this bit from the `PseudoProbeType`).
- If the bit is set, use [15:3] for the probe id and [18:16] for the DWARF base discriminator. Otherwise, still use [18:3] for the probe id.

Note that this doesn't affect the original probe id capacity: we still prioritize probe id encoding, i.e. the base discriminator is not encoded when the probe id is bigger than [15:3]. Then adjust `getBaseDiscriminatorFromDiscriminator` to use the base discriminator from the probe discriminator.
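The bit layout above can be sketched as a pair of pack/unpack helpers. This only models the three fields named in the description and leaves all other discriminator bits zero; the function names, and the choice to skip the base discriminator when it does not fit in three bits, are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t DwarfBaseFlag = 1u << 28; // bit [28:28]

// Pack a probe id and a DWARF base discriminator.
inline std::uint32_t encodeProbe(std::uint32_t ProbeId,
                                 std::uint32_t DwarfBase) {
  assert(ProbeId <= 0xFFFF && "probe id must fit in [18:3]");
  // The probe id has priority: the base discriminator is only encoded when
  // the id fits in the 13 bits of [15:3] (and, in this sketch, when the
  // base discriminator itself fits in the 3 bits of [18:16]).
  if (ProbeId <= 0x1FFF && DwarfBase <= 0x7)
    return DwarfBaseFlag | (DwarfBase << 16) | (ProbeId << 3);
  return ProbeId << 3;
}

inline std::uint32_t decodeProbeId(std::uint32_t D) {
  return (D & DwarfBaseFlag) ? (D >> 3) & 0x1FFF : (D >> 3) & 0xFFFF;
}

inline std::uint32_t decodeDwarfBase(std::uint32_t D) {
  return (D & DwarfBaseFlag) ? (D >> 16) & 0x7 : 0;
}
```

A probe id of 5 with base discriminator 3 packs to `0x10030028`; a probe id of `0x2000` is too large for [15:3], so it is stored in [18:3] with the flag bit clear and no base discriminator.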

Fixes: 56c4971 If the default target triple uses visualstudio::Linker::ConstructJob, when a MSVC installation cannot be found, there will be a -Wmsvc-not-found diagnostic, which is turned into an error due to -Werror. We have many driver tests that don't specify --target= and would get a -Wmsvc-not-found warning, but this might be the only one that uses -Werror and is not skipped by an `UNSUPPORTED`.

Pass the linker LTO options enabled by the clang '-flto' command line options when targeting bare-metal. --------- Co-authored-by: Keith Walker <keith.walker@arm.com>

Needed for getaddrinfo().

…en SPIR-V entities and required capability/extensions (llvm#94626) This PR continues llvm#94467 and contains fixes in emission of type intrinsics, constant recording and corresponding test cases:

* type-deduce-global-dup.ll -- fix of integer constant emission on 32-bit platforms and correct type deduction for globals
* type-deduce-simple-for.ll -- fix of GEP translation (there was an issue previously that led to incorrect translation/broken logic of for-range implementation)

This PR also:

* fixes a cast between identical storage classes and updates the test case to include a validation run by spirv-val,
* ensures that Bitcast for pointers satisfies the requirement that the address spaces must match and adds the corresponding test case,
* improves the Tablegen encoding and in-code decoding of dependencies between SPIR-V entities and required capability/extensions,
* prevents emission of identical OpTypePointer instructions.

…rt (llvm#94613) When compiling the following code:

```cpp
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <stdbool.h>

int main() {
  int a;
  float f;
  scanf("%d", &a);
  scanf("%f", &f);
  a += (int)f;
  return a;
}
```

for `-march=rv32ima_zbb` we get a libcall:

```
call scanf
lw a0, -20(s0)
call __fixsfsi
mv a1, a0
```

When we try to use GlobalISel we get this error:

```
error in backend: unable to legalize instruction: %9:_(s32) = G_FPTOSI %8:_(s32) (in function: main)
```

(Here is a link to a reproducer in Godbolt: https://godbolt.org/z/f67vEEb41 )

The goal of this PR is to do a libcall for the legalization of `G_FPTOSI` and `G_FPTOUI` instead of doing a fallback to Selection DAG to do the same libcall later.

…93156) Fixes llvm#93104 Prevent a crash by only printing DWARFUnit-unaware information in cases in which `DWARFUnit* U` is `nullptr`.

Two folds unlocked:

`(icmp eq/ne (xor x, C0), C1)` -> `(icmp eq/ne x, C2)`
`(icmp eq/ne (xor x, y), 0)` -> `(icmp eq/ne x, y)`

This fixes regressions associated with llvm#87180

Closes llvm#87275
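Both folds follow from xor being its own inverse. The sketch below checks them exhaustively over a small range, taking `C2 = C0 ^ C1`; that value is not spelled out in the description and is derived here from `(x ^ C0) == C1` being equivalent to `x == (C0 ^ C1)`. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Check both folds for every combination of values below Limit.
inline bool foldsAgree(std::uint32_t Limit) {
  assert(Limit > 0 && "need at least one value to test");
  for (std::uint32_t X = 0; X < Limit; ++X) {
    for (std::uint32_t A = 0; A < Limit; ++A) {
      // Fold 2: (x ^ y) == 0  <=>  x == y, with y = A.
      if (((X ^ A) == 0) != (X == A))
        return false;
      for (std::uint32_t C1 = 0; C1 < Limit; ++C1) {
        // Fold 1: (x ^ C0) == C1  <=>  x == C2, with C0 = A, C2 = C0 ^ C1.
        std::uint32_t C2 = A ^ C1;
        if (((X ^ A) == C1) != (X == C2))
          return false;
      }
    }
  }
  return true;
}
```

The `ne` variants follow by negating both sides of each equivalence.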

…4801) Without a value set, conditional checks like if(NOT ${OPENMP_STANDALONE_BUILD}) will not be able to evaluate to true. Fixes an issue introduced by PR llvm#93463, which did not allow the OMPT variable to be propagated up to offload during a runtimes build.

There were some tests in this file with "noerrno" in the name, but all the tests were no errno since all the libcalls were declared with memory(none). Ensure we have adequate coverage for the errno and no-errno cases by duplicating the libcall transform cases into errno and non-errno versions with callsite attributes.

…lvm#94679) This PR extends Dwarf.def to include the number of operands and the arity (the number of entries on the DWARF stack).

- The arity is used in LLDB's DWARF expression evaluator.
- The number of operands is unused, but is present in the table to avoid confusing the arity with the operands. Keeping the latter up to date should be straightforward as it maps directly to a table present in the DWARF standard.

…iomVectorize (llvm#94081) To facilitate sharing LoopIdiomTransform between AArch64 and RISC-V, this first patch moves AArch64LoopIdiomTransform from lib/Target/AArch64 to lib/Transforms/Vectorize and renames it to LoopIdiomVectorize. The following patch (llvm#94082) will teach LoopIdiomVectorize how to generate VP intrinsics (in addition to the current masked vector style) in favor of RVV.

GNU ld's relocatable linking behaviors:

* Sections with the `SHF_GROUP` flag are handled like sections matched by the `--unique=pattern` option. They are processed like orphan sections and ignored by input section descriptions.
* Section groups' (usually named `.group`) content is updated as the section indexes are updated. Section groups can be discarded with `/DISCARD/ : { *(.group) }`.

`-r --force-group-allocation` discards section groups and allows sections with the `SHF_GROUP` flag to be matched like normal sections. If two section group members are placed into the same output section, their relocation sections (if present) are combined as well. This behavior can be useful when -r output is used as a pseudo shared object (e.g., FreeBSD's amd64 kernel modules, CHERIoT compartments).

This patch implements --force-group-allocation:

* Input SHT_GROUP sections are discarded.
* Input sections do not get the SHF_GROUP flag, so `addInputSec` will combine relocation sections if their relocated section group members are combined.

The default behavior is:

* Input SHT_GROUP sections are retained.
* Input SHF_GROUP sections can be matched (unlike GNU ld)
* Input SHF_GROUP sections keep the SHF_GROUP flag, so `addInputSec` will create different OutputDesc copies.

GNU ld provides the `FORCE_GROUP_ALLOCATION` command, which is not implemented.

Pull Request: llvm#94704

Summary: Use workaround for quadratic behavior inside AttemptToFoldSymbolOffsetDifference called from BinaryEmitter::emitLSDA. llvm@b06e736#commitcomment-142836456

These instructions are very similar to narrowing shift instructions which already have this. Remove TargetConstraintType parameter from VPseudoBinaryV_WV class. Only 2 was ever passed to it. Pass 2 directly to the classes instantiated from VPseudoBinaryV_WV instead.

Part of llvm#93566.

Add new check misc-use-internal-linkage to detect variables and functions that can be marked as static. --------- Co-authored-by: Danny Mösch <danny.moesch@icloud.com>

See Buildbot failure: https://lab.llvm.org/buildbot/#/builders/138/builds/67337.

This reverts commit b6824c9. This relands commit 0a6c74e. The original commit was reverted due to buildbot failures. These bots should be updated now, so hopefully this will stick.

…Interface` (llvm#94516) This patch adds `getLoopInductionVars`, `getLoopLowerBounds`, `getLoopBounds`, `getLoopSteps` interface methods to `LoopLikeOpInterface`. The corresponding single-value versions have been moved to the shared class declaration and have been implemented based on the new interface methods.

To help debug or surface matching issues, add more statistics to the matching. Also add optional emission of each context seen in the function profiles along with its allocation type, size in bytes, and whether it was matched. This information is emitted along with a hash of the full stack context, to allow deduplication across modules for allocations within header files.

…ryCarryIn. NFC They were always passed the same values, 1 for CarryIn and "" for Constraint.

It doesn't always have a CarryIn. One of the parameters is named CarryIn. It always has CarryOut or a CarryIn and in some cases both.

This is useful if you have a transcript of a user session and want to rerun those commands with RunCommandInterpreter. The same functionality is also useful in testing. I'm adding it primarily for the second reason. In a subsequent patch, I'm adding the ability for Python-based commands to provide their "auto-repeat" command. Among other things, that will allow potentially state-destroying user commands to prevent auto-repeat. Testing this with Shell or pexpect tests is not nearly as accurate or convenient as using RunCommandInterpreter, but to use that I need to allow auto-repeat. I think for consistency's sake, having interactive sessions always do auto-repeats is the right choice, though that's a lightly held opinion...

We call llvm::sort in a couple of places in the V3 encoding:

- We sort Frames by FrameIds for stability of the output.
- We sort call stacks in the dictionary order to maximize the length of the common prefix between adjacent call stacks.

It turns out that we can improve the deserialization performance by modifying the comparison functions -- without changing the format at all. Both places take advantage of the histogram of Frames -- how many times each Frame occurs in the call stacks.

- Frames: We serialize popular Frames in the descending order of popularity for improved cache locality. For two equally popular Frames, we break a tie by serializing one that tends to appear earlier in call stacks. Here, "earlier" means a smaller index within llvm::SmallVector<FrameId>.
- Call Stacks: We sort the call stacks to reduce the number of times we follow pointers to parents during deserialization. Specifically, instead of comparing two call stacks in the strcmp style -- integer comparisons of FrameIds, we compare two FrameIds F1 and F2 with Histogram[F1] < Histogram[F2] at respective indexes. Since we encode from the end of the sorted list of call stacks, we tend to encode popular call stacks first.

Since the two places use the same histogram, we compute it once and share it in the two places.

Sorting the call stacks reduces the number of "jumps" by 74% when we deserialize all MemProfRecords. The cycle and instruction counts go down by 10% and 1.5%, respectively.

If we sort the Frames in addition to the call stacks, then the cycle and instruction counts go down by 14% and 1.6%, respectively, relative to the same baseline (that is, without this patch).
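
The two comparison functions described above can be sketched roughly as follows. This is a minimal illustration using standard containers rather than the LLVM ones; `FrameStat`, `computeHistogram`, `orderFrames`, and `sortCallStacks` are hypothetical names, the position-sum tie-break is only one possible reading of "tends to appear earlier", and the final tie-break on the raw `FrameId` is an assumption added to keep the ordering deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using FrameId = std::uint64_t;
using CallStack = std::vector<FrameId>;

// Per-frame statistics (hypothetical layout).
struct FrameStat {
  std::uint64_t Count = 0;       // how many times the frame occurs
  std::uint64_t PositionSum = 0; // sum of its indexes within call stacks
};

using Histogram = std::unordered_map<FrameId, FrameStat>;

// Compute the histogram once so both sorts can share it.
Histogram computeHistogram(const std::vector<CallStack> &Stacks) {
  Histogram H;
  for (const CallStack &CS : Stacks)
    for (std::size_t I = 0; I < CS.size(); ++I) {
      FrameStat &S = H[CS[I]];
      ++S.Count;
      S.PositionSum += I;
    }
  return H;
}

// Frames: most popular first; ties go to the frame with the smaller
// position sum, then to the smaller FrameId.
std::vector<FrameId> orderFrames(const Histogram &H) {
  std::vector<FrameId> Frames;
  for (const auto &KV : H)
    Frames.push_back(KV.first);
  std::sort(Frames.begin(), Frames.end(), [&](FrameId A, FrameId B) {
    const FrameStat &SA = H.at(A);
    const FrameStat &SB = H.at(B);
    if (SA.Count != SB.Count)
      return SA.Count > SB.Count;
    if (SA.PositionSum != SB.PositionSum)
      return SA.PositionSum < SB.PositionSum;
    return A < B;
  });
  return Frames;
}

// Call stacks: dictionary order, but element comparison uses popularity
// (Histogram[F1] < Histogram[F2]) instead of the raw FrameId values.
void sortCallStacks(std::vector<CallStack> &Stacks, const Histogram &H) {
  std::sort(Stacks.begin(), Stacks.end(),
            [&](const CallStack &L, const CallStack &R) {
              return std::lexicographical_compare(
                  L.begin(), L.end(), R.begin(), R.end(),
                  [&](FrameId F1, FrameId F2) {
                    std::uint64_t H1 = H.at(F1).Count;
                    std::uint64_t H2 = H.at(F2).Count;
                    if (H1 != H2)
                      return H1 < H2;
                    return F1 < F2;
                  });
            });
}
```

With this ordering, stacks made of popular frames end up at the back of the sorted list, which matches the note above that encoding proceeds from the end of the list.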

@nikic

… + (Op01C + Op1C) (llvm#94586) This patch simplifies `sdiv` to `udiv` by preserving the `nsw` flag for `(X | Op01C) + Op1C --> X + (Op01C + Op1C)` if the sum of `Op01C` and `Op1C` will not overflow, and preserves the `nuw` flag unconditionally. Alive2 Proofs (provided by @nikic): https://alive2.llvm.org/ce/z/nrdCZT, https://alive2.llvm.org/ce/z/YnJHnH

So long as ld -r links using bitcode always result in an ELF object, and not a merged bitcode object, the output from a relocatable link using FatLTO objects should not have a .llvm.lto section. Prior to this, using the object code sections would cause the bitcode section in the output of a relocatable link to be corrupted, by concatenating all the .llvm.lto sections together. This patch discards SHT_LLVM_LTO sections when not using --fat-lto-objects, so that the relocatable ELF output won't contain invalid bitcode.

While we are at it, this patch changes the type of ValueCounts to std::array<double, ...> so that we can use std::array::fill. Identified with modernize-use-default-member-init.

The scheduler class name is hardcoded in the class, so it's not a general class.

…ritance. NFC VPseudoVROR can inherit from VPseudoVROL. Adjust the names to VPseudoVROT_VV_VX and VPseudoVROT_VV_VX_VI.

These pseudoinstructions have a policy operand so calling them TU is confusing.

This class is most closely related to VPatBinaryM.

…atBinaryW_VV_VX_VI_VWSLL. NFC

…ic analyzer (llvm#94106) This job will run once per day on the main branch, and for every commit on a release branch. It currently only builds llvm, but could add more sub-projects in the future. OpenSSF Best Practices recommends running a static analyzer on software before it is released: https://www.bestpractices.dev/en/criteria/0#0.static_analysis

…(NFC) /llvm-project/mlir/lib/Target/LLVMIR/ModuleImport.cpp:48: tools/mlir/include/mlir/Dialect/LLVMIR/LLVMConversionEnumsFromLLVM.inc:158:11: error: enumeration value 'Reserved' not handled in switch [-Werror,-Wswitch] switch (value) { ^~~~~ 1 error generated.

DFSan's sscanf is incorrect (llvm#94769), which results in erroneous matches when scraping RSS from /proc/maps. This patch works around the issue by using strstr as a secondary check. It also adds a loose validity check for the initial RSS measurement, to guard against regressions in get_rss_kb(). Fixes llvm#91287

…haderType (llvm#93847) `HLSLShaderAttr::ShaderType` enum is a subset of `llvm::Triple::EnvironmentType`. We can use `llvm::Triple::EnvironmentType` directly and avoid converting one enum to another.

…uilds. NFC. (llvm#94835)

* generate Clang configuration file with provided target sysroot (TOOLCHAIN_TARGET_SYSROOTFS)
* explicitly pass provided target sysroot into the compiler-rt tests configuration.
* added ability to configure a type of the build libraries -- shared or static (TOOLCHAIN_SHARED_LIBS, default OFF)

On behalf of: llvm#94284

These usually have a single value that is always used. We can just hardcode into the class body.

Also remove some unused Constraint parameters that appeared before the sew parameter.

) Passing the result of c_str() to a stream is slow and redundant. This change removes unnecessary c_str() calls and uses the string object directly. Caught by cppcheck - lldb/tools/debugserver/source/JSON.cpp:398:19: performance: Passing the result of c_str() to a stream is slow and redundant. [stlcstrStream] lldb/tools/debugserver/source/JSON.cpp:408:64: performance: Passing the result of c_str() to a stream is slow and redundant. [stlcstrStream] lldb/tools/debugserver/source/JSON.cpp:420:54: performance: Passing the result of c_str() to a stream is slow and redundant. [stlcstrStream] lldb/tools/debugserver/source/JSON.cpp:46:13: performance: Passing the result of c_str() to a stream is slow and redundant. [stlcstrStream] Fix llvm#91212
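
As a small illustration of the cppcheck finding above, the sketch below contrasts the two ways of streaming a `std::string`. The helper names are hypothetical and are not taken from `JSON.cpp`. Going through `c_str()` forces the stream to rescan for the terminating NUL, while streaming the string object uses its stored length.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Streams the string through c_str(): the stream has to find the
// terminating NUL again, and stops at the first embedded NUL.
std::string quoteViaCStr(const std::string &S) {
  std::ostringstream OS;
  OS << '"' << S.c_str() << '"';
  return OS.str();
}

// Streams the string object directly: the known length is used.
std::string quoteDirect(const std::string &S) {
  std::ostringstream OS;
  OS << '"' << S << '"';
  return OS.str();
}
```

For ordinary strings both helpers produce the same text. They differ only when the string contains an embedded NUL, which the `c_str()` variant silently truncates.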

…lvm#94843) Reverts llvm#87635 On some corner cases, lld generated an object file with an empty REL section with `sh_info` set to 0. This file triggers an lld error when used as its input. See llvm#87635 (comment) for details.

…ue. NFC

`BlockT *LoopBase<BlockT, LoopT>::getLoopPreheader()` was changed in 7243607 to use `llvm::size` rather than checking that `child_begin() + 1 == child_end()`. `llvm::size` requires that `std::distance` be O(1) and hence that clients support random access. Use `llvm::hasSingleElement` instead.
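
To show why a single-element test does not need random access, here is a minimal stand-in written against the standard library. `hasSingleElementSketch` is a hypothetical name used for illustration; it is not the LLVM helper itself, only the same idea of comparing `begin`, `next(begin)`, and `end`.

```cpp
#include <cassert>
#include <forward_list>
#include <iterator>

// True when the range holds exactly one element. Only forward iteration
// is needed: at most one increment, no distance computation.
template <typename Range> bool hasSingleElementSketch(const Range &R) {
  auto B = std::begin(R);
  auto E = std::end(R);
  return B != E && std::next(B) == E;
}
```

This works on `std::forward_list`, whose iterators cannot compute a distance in constant time.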

Some Ubuntu builds were broken after 20d497c "[Driver] Remove unneeded *-linux-gnu after D158183". This patch by Fangrui Song fixes this with a handling in config.guess.

… instead of inheritance (llvm#92785) This commit simplifies the design of the `GreedyPatternRewriterDriver` class. This class used to inherit from both `PatternRewriter` and `RewriterBase::Listener` and then attached itself as a listener. In the new design, the class has a `PatternRewriter` field instead of inheriting from `PatternRewriter`, which is generally preferred in object-oriented programming. --------- Co-authored-by: Markus Böck <markus.boeck02@gmail.com>

…m#90588) Use handleFloatFloatBinOp to properly diagnose NaN results and divisions by zero. Fixes llvm#84871

In some cases, constants can survive early constant folding optimizations because they are hidden behind several layers of type changes. E.g., consider the following sequence (extracted from the arm test that this commit changes):

```
t2: v1f16 = BUILD_VECTOR ConstantFP:f16<APFloat(0)>
t4: v1f16 = insert_vector_elt t2, ConstantFP:f16<APFloat(0)>, Constant:i32<0>
t5: f16 = bitcast t4
t6: f32 = fp_extend t5
```

Because the constant (APFloat(0)) is hidden behind a <1 x ty> type, all the constant folding that normally happens for scalar nodes when using `SelectionDAG::getNode` is blocked. As a result the constant manages to survive as an actual conversion instruction down to the select phase:

```
t11: f32 = fp16_to_fp Constant:i32<0>
```

With the change in this patch, we try to do constant folding one more time during dag combine, which in the motivating example results in the much better sequence:

```
t7: ch = CopyToReg t0, Register:f32 %0, ConstantFP:f32<0.000000e+00>
```

Note: I'm sure we have this problem in a lot of other places. Generally speaking I believe SDISel is not that good with <1 x ty> compared to pure scalar. However, I only changed what I could easily test.

…ray (llvm#94171) `std::aligned_storage` is deprecated with C++23, see [here](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1413r3.pdf). This replaces the usages of `std::aligned_storage` within compiler-rt with an aligned `std::byte` array. I will provide patches for other subcomponents as well.
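
A minimal sketch of the replacement pattern follows, assuming C++17 for `std::byte`. `RawStorage` and `roundTripThroughStorage` are hypothetical names for illustration and do not come from compiler-rt.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Uninitialized storage with the size and alignment of T, expressed as an
// aligned std::byte array instead of std::aligned_storage.
template <typename T> struct RawStorage {
  alignas(T) std::byte Buf[sizeof(T)];
};

// Construct an int inside the raw buffer with placement new and read it
// back. int is trivially destructible, so no destructor call is needed.
int roundTripThroughStorage(int V) {
  RawStorage<int> S;
  int *P = ::new (static_cast<void *>(S.Buf)) int(V);
  return *P;
}
```

The `alignas(T)` on the array is what carries the alignment guarantee that `std::aligned_storage` used to provide.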

2e1788f reverted llvm#94843. It was creating `%t` as a directory and causes an error in incremental builds.

This patch removes LLDB from a list of projects that are excluded from building and testing on pre-merge CI on Linux. Windows environment needs to be prepared in order to test LLDB (llvm#94208 (comment)), but we don't have enough maintenance resources to do that at the moment. Because LLDB has been in the list of projects that need to be tested on Clang changes, this PR makes this happen on Linux. This seems to be the consensus in the discussion of this PR.

…hes with an unreachable default case (llvm#94468) When transforming a switch with holes into a lookup table, we currently use a mask to check if the current index is handled by the switch or if it is a hole. If it is a hole, we skip loading from the lookup table. Normally, if the switch's default case is unreachable this has no impact, as the mask test gets optimized away by subsequent passes. However, if the switch is large enough that the number of lookup table entries exceeds the target's register width, we won't be able to fit all the cases into a mask and the switch won't get transformed into a lookup table. If we know that the switch's default case is unreachable, we know that the mask is unnecessary and can skip constructing it entirely, which allows us to transform the switch into a lookup table. [Example](https://godbolt.org/z/7x7qfx8M1) In the future, it might be interesting to consider allowing lookup table masks to be more than one register large (e.g. using a constant array of bit flags, similar to `std::bitset`).
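
The difference between the masked and unmasked lookup can be sketched in source form. This is an illustrative model rather than the IR the pass emits; the table contents, the mask value, and both function names are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical switch over 0..4 with holes at 1 and 3.
// Reachable default: a bit mask records which indexes are real cases, and
// holes (or out-of-range values) fall through to the default value.
int lookupWithHoleMask(unsigned Idx, int Default) {
  static const int Table[5] = {10, 0, 30, 0, 50};
  const std::uint32_t CaseMask = 0x15; // bits 0, 2, 4 set
  if (Idx < 5 && ((CaseMask >> Idx) & 1u))
    return Table[Idx];
  return Default;
}

// Unreachable default: holes can never be hit, so no mask is needed and the
// table size is no longer limited by the register width.
int lookupUnreachableDefault(unsigned Idx) {
  static const int Table[5] = {10, 0, 30, 0, 50};
  assert(Idx < 5 && "default case is assumed unreachable");
  return Table[Idx];
}
```

In the first function the mask must fit in one register, which is the limit the commit message describes. The second function has no such limit.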

The PR description in llvm#94008 mismatches with the code.

> + When VT is smaller than ShiftVT, it is safe to use trunc.
> + When VT is larger than ShiftVT, it is safe to use zext iff `is_zero_poison` is true (i.e., `opcode == ISD::CTTZ_ZERO_UNDEF`). See also the counterexample `src_shl_cttz2 -> tgt_shl_cttz2` in the alive2 proofs.

Closes llvm#94824.

This patch relands llvm#91469 and uses `uint64_t` for repeat count to avoid a miscompilation caused by overflow llvm#91469 (comment).

Since we can bitcast and then do the same thing sub does in the table section above, I figured it was trivial to add fsub, fmul, and fdiv.

While I am at it, this patch adds const to a couple of places.

'override' makes 'virtual' redundant. Identified with modernize-use-override.

This depends on enabling the use of extensions.

This broadly follows how in almost all places, we accept `(<reg>)` to mean `0(<reg>)`, but I think these are the first like this for Jumps rather than Loads/Stores. These are accepted by binutils but not by LLVM: https://godbolt.org/z/GK7MGE7q7

Identified with modernize-use-default-member-init.

… (NFC) (llvm#94840) Cppcheck recommends using a const reference for range variables in a for-each loop. This avoids unnecessary copying of elements, improving performance. Caught by cppcheck - lldb/source/API/SBBreakpoint.cpp:717:22: performance: Range variable 'name' should be declared as const reference. [iterateByValue] lldb/source/API/SBTarget.cpp:1150:15: performance: Range variable 'name' should be declared as const reference. [iterateByValue] lldb/source/Breakpoint/Breakpoint.cpp:888:26: performance: Range variable 'name' should be declared as const reference. [iterateByValue] lldb/source/Breakpoint/BreakpointIDList.cpp:262:26: performance: Range variable 'name' should be declared as const reference. [iterateByValue] Fix llvm#91213 Fix llvm#91217 Fix llvm#91219 Fix llvm#91220
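
The cost that `iterateByValue` points at can be made visible with a type that counts its copies. Everything in this sketch (`Name`, `CopyCount`, and the two functions) is hypothetical and is not taken from the lldb sources.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

int CopyCount = 0; // incremented by every copy construction of Name

struct Name {
  std::string Text;
  explicit Name(std::string T) : Text(std::move(T)) {}
  Name(const Name &Other) : Text(Other.Text) { ++CopyCount; }
};

// Range variable declared by value: each iteration copies one element.
std::size_t totalLengthByValue(const std::vector<Name> &Names) {
  std::size_t Total = 0;
  for (auto N : Names)
    Total += N.Text.size();
  return Total;
}

// Range variable declared as const reference: no copies are made.
std::size_t totalLengthByConstRef(const std::vector<Name> &Names) {
  std::size_t Total = 0;
  for (const auto &N : Names)
    Total += N.Text.size();
  return Total;
}
```

Both loops compute the same result; only the by-value loop pays for one copy per element.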

…1537) This is an implementation of floating point multiplication: It will consist of - `double x double -> float`

Added remquof128 function. Closes llvm#94312

Apply the onlyFirstPartUsed logic generally to all per-part VPInstructions. Note that the test changes remove the second part of an unused first-order recurrence splice.

This patch adds initial support to the half type on RISC-V.

Now that FOR exit and resume value creation is explicitly modeled in VPlan (05e1b53, 07b3301) it doesn't depend on the first order recurrence splice being preserved and it can now be marked as not having side-effects. This allows removal of first-order-recurrence-splice if the FOR is only used in the exit or as scalar ph resume value.

…C) (llvm#94862) readNext has two variants:

- readNext<uint64_t, endian>(ptr)
- readNext<uint64_t>(ptr, endian)

This patch uses the latter to simplify readBinaryIdsInternal. Both forms default to unaligned.

…ive shifts; NFC

When we fold `(shift (shift C0, x), C1)` we can propagate flags that are common to both shifts. Proofs: https://alive2.llvm.org/ce/z/LkEzXD Closes llvm#94872

Decrease the uses of getFragmentList() to make it easier to change the fragment list representation.

Transform `addc imm, %rs, %rd` into `addc %rs, imm, %rd`. This is used in some GNU and Linux code. Reviewers: s-barannikov, rorth, jrtc27, brad0 Reviewed By: s-barannikov Pull Request: llvm#94245

This adds support for GNU %uhi and %ulo extensions. Those resolve to the same relocations as %hh and %hm. Reviewers: cyndyishida, dcci, brad0, jrtc27, aaupov, Endilll, rorth, maksfb, #reviewers-libcxxabi, s-barannikov, rafaelauler, ayermolo, #reviewers-libunwind, #reviewers-libcxx, daniel-grumberg, tbaederr Reviewed By: s-barannikov Pull Request: llvm#94246

This adds %set_softint and %clear_softint alias for %asr20 and %asr21 as defined in JPS1. Reviewers: jrtc27, brad0, s-barannikov, rorth Reviewed By: s-barannikov Pull Request: llvm#94247

Really perform the conversion always if the flag is set and don't make it dependent on whether we're checking the result for initialization.

…rentheses (llvm#94654) Do not emit warnings for non-math operators. Closes llvm#92516

…m#94736)

Based on feedback on llvm#94863

…onversion (llvm#94512) Ignore implicit declarations and defaulted functions. Helps with issues in generated code like, C++ spaceship operator. Closes llvm#93409

…tions (llvm#94651) Add StrictCStandardCompliance and StrictCppStandardCompliance options that default to true. Closes llvm#83732

…llvm#94524) Ignore implicit pointer conversions that are part of a cast expression Closes llvm#93959

…ring bitcasted constants (llvm#94863) We currently only constant fold binop(bitcast(c1),bitcast(c2)) if c1 and c2 are both bitcasted and from the same type. This patch relaxes this assumption to allow the constant build vector to originate from different types (and allow cases where only one operand was bitcasted). We still ensure we bitcast back to one of the original types if both operand were bitcasted (we assume that if we have a non-bitcasted constant then its legal to keep using that type).

llvm#93882) This moves the combine of fdiv by constant to fmul out of an 'if (Options.UnsafeFPMath || Flags.hasAllowReciprocal()' block, so that it triggers if the divide is exact. An extra check for Recip.isDenormal() is added as multiple places make reference to it being unsafe or slow on certain platforms.
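
The condition "the divide is exact" can be illustrated on plain `double` values. The sketch below is a simplified stand-in and not the DAG combine itself: it assumes IEEE-754 binary64, treats a constant as eligible only when it is a power of two whose reciprocal is a normal number, and uses the hypothetical names `hasExactNormalReciprocal` and `divideByConstant`.

```cpp
#include <cassert>
#include <cmath>

// True when 1/C is exactly representable and is a normal number.
// For binary floating point this means C is a power of two (mantissa 0.5
// in frexp form) and the reciprocal neither overflows nor goes denormal.
bool hasExactNormalReciprocal(double C) {
  if (C == 0.0 || !std::isfinite(C))
    return false;
  int Exp = 0;
  double Mant = std::frexp(C, &Exp); // C == Mant * 2^Exp, |Mant| in [0.5, 1)
  if (std::fabs(Mant) != 0.5)
    return false;
  return std::isnormal(1.0 / C);
}

// Divide X by the constant C, using a multiply when that is bit-exact.
double divideByConstant(double X, double C) {
  if (hasExactNormalReciprocal(C))
    return X * (1.0 / C);
  return X / C;
}
```

The denormal check mirrors the extra `Recip.isDenormal()` guard mentioned above: a reciprocal such as `1 / 2^1023` is exact but denormal, so the division is kept.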

Handle binary ops and a few other instructions in onlyFirstPartUsed; they only use the first part if they themselves only have their first part used.

Swap out range metadata to range attribute for calls to be able to deprecate range metadata on calls in the future.

This adds named tag constants (such as `#one_write` and `#one_read`) for the prefetch instruction. Reviewers: jrtc27, rorth, brad0, s-barannikov Reviewed By: s-barannikov Pull Request: llvm#94249

This adds support for `prefetcha` instruction for prefetching from alternate address spaces. Reviewers: jrtc27, brad0, rorth, s-barannikov Reviewed By: s-barannikov Pull Request: llvm#94250

If the Count passed into writeNopData is not a multiple of four, pad with a few zero bytes before writing the NOP stream. This makes it match the behavior of GNU binutils. Reviewers: brad0, rorth, s-barannikov, jrtc27 Reviewed By: s-barannikov Pull Request: llvm#94251
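
A rough sketch of that padding scheme, under stated assumptions: the real `writeNopData` writes to an LLVM output stream, whereas this version returns a byte vector, and the 4-byte NOP encoding `0x01000000` written big-endian is assumed here (it is the SPARC `nop`; the message above does not spell out the encoding). `writeNopDataSketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Fill Count bytes: first Count % 4 zero bytes, then whole 4-byte NOPs.
std::vector<std::uint8_t> writeNopDataSketch(std::uint64_t Count) {
  std::vector<std::uint8_t> Out;
  Out.reserve(Count);
  // Leading zero padding for the part that cannot hold a full instruction.
  for (std::uint64_t I = 0; I < Count % 4; ++I)
    Out.push_back(0x00);
  // Assumed NOP encoding 0x01000000, emitted big-endian.
  for (std::uint64_t I = 0; I < Count / 4; ++I) {
    Out.push_back(0x01);
    Out.push_back(0x00);
    Out.push_back(0x00);
    Out.push_back(0x00);
  }
  assert(Out.size() == Count);
  return Out;
}
```

Putting the zeros first keeps every NOP that follows aligned to the end of the padded region.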

This adds the alternate mnemonics for movrz and movrnz. Reviewers: s-barannikov, jrtc27, brad0, rorth Reviewed By: s-barannikov Pull Request: llvm#94252

Implements parts of: - P0355 Extending chrono to Calendars and Time Zones

We can implicitly convert RemainingVDs to an ArrayRef. Note that RemainingVDs is of type SmallVector<InstrProfValueData, 24>.

…lvm#94878)

…td. NFC Remove unneeded parameters or sync into class if they are only ever used with one value.

…pShlConstant` (llvm#94899) Closes llvm#94897.

Vectors are supported for fp operations now, so remove the assert. The supported type/operation combinations are best left for the verifier. Avoids regression in future commit that starts treating some vector cases as legal.

…interfaces (llvm#94908) Fully qualify all namespaces appearing in `GPUTargetAttrInterface` and `OffloadingLLVMTranslationAttrInterface`. If they're not fully qualified then out-of-tree dialects might encounter name resolution errors.

…ScopeDirectiveClass case (llvm#77535) (llvm#84135) Fix llvm#77535, Change unstable assertion into compilation error, and add a test for it.

…C) (llvm#94859) VTableNamePtr and CompressedVTableNamesLen are always used together to create a StringRef in getSymtab. We can create the StringRef ahead of time in readHeader. This way, IndexedInstrProfReader becomes a tiny bit simpler with fewer member variables. Also, StringRef default-constructs itself with its Data and Length set to nullptr and 0, respectively, which is exactly what we need.

This implements the throwing overload and the exception classes thrown by this overload. Implements parts of: - P0355 Extending chrono to Calendars and Time Zones

…94918) Make `sym_name` an inherent attr in GPUModuleOp so that it doesn't show in the discardable attributes. The change is safe as the attribute is always expected to be present.

MCInst is primarily used in local variables and MCRelaxableFragment (mostly JMP/JCC for x86). Reducing the inline element count can make MCRelaxableFragment smaller, potentially leading to a lower peak RSS. When compiling sqlite3.c, x86-64 has the largest maximum numOperands. aarch64: 5; ppc64: 6; riscv64: 3; s390x: 6; x86-64: 8 Here is the frequency table for x86-64: max getNumOperands: 8 0: 676 1: 37892 2: 84046 3: 26767 4: 1640 5: 1222 6: 80794 7: 768 8: 22 Pull Request: llvm#94913

This PR fixes attribute registration for `SI8Attr` and `UI8Attr` in `ir.py`.

…4922) BinaryIdsStart and BinaryIdsSize in IndexedInstrProfReader are always used together, so this patch packages them into an ArrayRef<uint8_t>. For now, readBinaryIdsInternal immediately unpacks ArrayRef into its constituents to avoid touching the rest of readBinaryIdsInternal.

relax-recompute-align.s might change when we change the fragment relaxation approach.

Part of llvm#93566.

Lazy relaxation caused hash table lookups (`getFragmentOffset`) and complex use/compute interdependencies. Some expressions involving forward declared symbols (e.g. `subsection-if.s`) cannot be computed. Recursion detection requires complex `IsBeingLaidOut` (https://reviews.llvm.org/D79570). D76114's `invalidateFragmentsFrom` makes lazy relaxation even less useful. Switch to eager relaxation to greatly simplify code and resolve these issues. This change also removes a `getPrevNode` use, which makes it more feasible to replace the fragment representation, which might yield a large peak RSS win. Minor downsides: The number of section relaxations may increase (offset by avoiding the hash table lookup). For relax-recompute-align.s, the computed layout is not optimal.

…4896) This commit fixes a crash in the ownership-based buffer deallocation pass when indirectly calling a function via SSA value. Such functions must be conservatively assumed to be public. Fixes llvm#94780.

This implements the overload with the choose argument and adds this enum. Implements parts of: - P0355 Extending chrono to Calendars and Time Zones

The symbol including this member is being overwritten by memcpy here: https://github.com/llvm/llvm-project/blob/2117677e304d334326f6591f3c75fb2f34dc4bcb/lld/COFF/SymbolTable.cpp#L496-L509

This makes codegen for array initialization simpler in two ways: 1. Drop the zero-index GEP at the start, which is no longer needed with opaque pointers. 2. Emit GEPs directly to the correct element, instead of having a long chain of +1 GEPs. This is more canonical, and also avoids regressions in unoptimized builds from llvm#93823.

Refactor the pass to only support `IntrinsicInst` calls. `ReplaceWithVecLib` used to support instructions, as AArch64 was using this pass to replace a vectorized frem instruction to the fmod vector library call (through TLI). As this replacement is now done by the codegen (llvm#83859), there is no need for this pass to support instructions.

Additionally, removed 'frem' tests from:

- AArch64/replace-with-veclib-armpl.ll
- AArch64/replace-with-veclib-sleef-scalable.ll
- AArch64/replace-with-veclib-sleef.ll

Such testing is done at codegen level:

- llvm#83859

…lvm#94754) Prior to this patch VisualStudio._get_step_info incorrectly identifies the reason the debugger has stopped. e.g., stepping through a program would be reported as a StopReason.Breakpoint rather than StopReason.Step. Fix. No test added as there are no VisualStudio tests (tested locally).

Reapply after llvm#93956, which changed clang array initialization codegen to avoid size regressions for unoptimized builds. ----- This fold is subtly incorrect, because DL-unaware constant folding does not know the correct index type to use, and just performs the addition in the type that happens to already be there. This is incorrect, since sext(X)+sext(Y) is generally not the same as sext(X+Y). See the `@constexpr_gep_of_gep_with_narrow_type()` for a miscompile with the current implementation. One could try to restrict the fold to cases where no overflow occurs, but I'm not bothering with that here, because the DL-aware constant folding will take care of this anyway. I've only kept the straightforward zero-index case, where we just concatenate two GEPs.
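
The claim that sext(X)+sext(Y) is generally not the same as sext(X+Y) can be checked with ordinary integer arithmetic. The sketch below models an 8-bit index type widened to 64 bits; the function names are hypothetical, and the narrowing cast assumes two's-complement wrapping (guaranteed from C++20, and what mainstream compilers do before that).

```cpp
#include <cassert>
#include <cstdint>

// sext(X) + sext(Y): widen each 8-bit index first, then add in 64 bits.
std::int64_t addAfterSext(std::int8_t X, std::int8_t Y) {
  return static_cast<std::int64_t>(X) + static_cast<std::int64_t>(Y);
}

// sext(X + Y): add in 8 bits with wraparound, then widen the result.
std::int64_t sextAfterAdd(std::int8_t X, std::int8_t Y) {
  std::uint8_t Wrapped = static_cast<std::uint8_t>(
      static_cast<std::uint8_t>(X) + static_cast<std::uint8_t>(Y));
  return static_cast<std::int64_t>(static_cast<std::int8_t>(Wrapped));
}
```

For small operands the two agree, but `100 + 100` gives `200` when widened first and `-56` when the addition wraps in the narrow type. This is the kind of mismatch that makes the narrow-type fold a miscompile.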

The checker is made more exact (only pointer into array is allowed, check array index) and more tests are added.

…on funcs (llvm#92417) Add `LLVMPositionBuilderBeforeDbgRecords` and `LLVMPositionBuilderBeforeInstrAndDbgRecords` to `llvm/include/llvm-c/Core.h` which behave the same as `LLVMPositionBuilder` and `LLVMPositionBuilderBefore` except that the position is set before debug records attached to the target instruction (the existing functions set the insertion point to after any attached debug records). More info on debug records and the migration towards using them can be found here: https://llvm.org/docs/RemoveDIsDebugInfo.html The distinction is important in some situations. An important example is when inserting a phi before another instruction which has debug records attached to it (these come "before" the instruction). Inserting before the instruction but after the debug records would result in having debug records before a phi, which is illegal. That results in an assertion failure: `llvm/lib/IR/Instruction.cpp:166: Assertion '!isa<PHINode>(this) && "Inserting PHI after debug-records!"' failed.` In llvm (C++) we've added a bit to instruction iterators that carries around the extra information. Adding dedicated functions seemed like the least invasive and least surprising way to update the C API. Update llvm/tools/llvm-c-test/debuginfo.c to test this functionality. Update the OCaml bindings, the migration docs and release notes.

Allocate result statically on the stack (using max rank) and use the runtime to fill it in correctly.

…94841) The `else if` condition for checking `m_compression_type` is redundant as it matches with a previous `if` condition, making the expression always false. Reported by cppcheck as a possible cut-and-paste error. Caught by cppcheck - lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunication.cpp:543:35: style: Expression is always false because 'else if' condition matches previous condition at line 535. [multiCondition] Fix llvm#91222

This issue is reported by cppcheck as a pointless test in the watch mask check. The `else if` condition is opposite to the previous `if` condition, making the expression always true. Caught by cppcheck - lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:509:25: style: Expression is always true because 'else if' condition is opposite to previous condition at line 505. [multiCondition] Fix llvm#91223

…ace (llvm#93212)

…e' feature is missing (llvm#94903) Do not let the build fail when the target platform does not support the 'coroutine' C++ feature. Just compile without it and let lldb know about the missing/unsupported feature.

Use fast unsigned arithmetic before constructing an APInt. This gives me a ~2x speed up when running this in my Release+Asserts build: $ unittests/Support/SupportTests --gtest_filter=KnownBitsTest.*Exhaustive

llvm#94362) …and operators that have non-const overloads. This allows `unnecessary-copy-initialization` to warn on more cases. The common case is a class with a set of const/non-const overloads (e.g. std::vector::operator[]).

```
void F() {
  std::vector<Expensive> v;
  // ...
  const Expensive e = v[i];
}
```

…xtendedValue (llvm#94822) The hlfir::Entity to fir::ExtendedValue conversion usually uses the "fir base" output of hlfir.declare (which is the same as the input) to avoid introducing temporary descriptors for the sole purpose of updating lower bound information. This is possible because local lower bounds, if any, are tracked in a vector inside the fir::ExtendedValue. With assumed-ranks, the lower bounds cannot be tracked inside the fir::ExtendedValue vector (their number is unknown at compile time). Hence, the fir.box/fir.class used in fir::ExtendedValue in lowering must always contain accurate local lower bound information.

…4739) Use tablegen to automatically create the pass constructor. The purpose of this pass is to add attributes to functions, so it doesn't need to work on other top level operations.

…ents per P2448 (llvm#94123) Fixes llvm#92583

…st. (llvm#94943) Exit early if known bits have a conflict. This gives me a ~15% speed up when running this in my Release+Asserts build: $ unittests/Support/SupportTests --gtest_filter=KnownBitsTest.*Exhaustive

…m#94330) Co-authored-by: Andrew Gozillon <Andrew.Gozillon@amd.com>

Implements parts of: - P0355 Extending chrono to Calendars and Time Zones

Remove the copy into fresh variables done when lowering scf.for into emitc.for and use the variables carrying the init and iter values as the loop's results.

Commits on Jun 7, 2024

Commits on Jun 8, 2024

Commits on Jun 9, 2024

Commits on Jun 10, 2024

Commits on Sep 5, 2024