[AutoBump] Merge with f4b9839d (Sep 04) (20) #373

…#106835) The target name and the message are wrong -- both should say "cuda" for the filtering to work. Fixes commit 300e5b9 (llvm#93186).

In llvm#92581 the `LibomptargetUtils.cmake` helpers were removed, but only uses of `libomptarget_say` were migrated. Migrate the remaining few warning and error messages so that the `check-offload` target does not fail due to the missing `libomptarget_warning_say`. While at it, update the `check-offload` unavailability message to say `check-offload` instead of `check-libomptarget`. Fixes llvm#92581

…lvm#106634) Summary: The `langinfo.h` header is a POSIX extension, so ideally we would be able to build the C++ library without it. Currently the LLVM C library doesn't support / provide it. This allows us to build the C++ library with locales enabled. We can either disable it here, or just provide stubs that do nothing as in llvm#106620.

…6632) Summary: We currently do not provide a more complicated rune table, so we want the default.

This patch adds cost model support for [u|s]cmp.
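As a reference for what these intrinsics compute, here is a minimal C++ sketch of the scalar semantics of `llvm.scmp`/`llvm.ucmp` on 32-bit operands. The function names and the `int8_t` result type are choices made for this sketch. They are not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Three-way comparison: -1 if a < b, 0 if a == b, 1 if a > b.
// scmp treats the operands as signed integers, ucmp as unsigned integers.
int8_t scmp32(int32_t a, int32_t b) {
  return static_cast<int8_t>((a > b) - (a < b));
}

int8_t ucmp32(uint32_t a, uint32_t b) {
  return static_cast<int8_t>((a > b) - (a < b));
}
```

The same bit pattern can order differently. `scmp32(-1, 1)` is -1, while `ucmp32(0xFFFFFFFFu, 1u)` is 1.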

/llvm-project/llvm/lib/Target/RISCV/RISCVISelLowering.cpp:21558:14: error: unused variable 'ValLMUL' [-Werror,-Wunused-variable]
    unsigned ValLMUL =
             ^
/llvm-project/llvm/lib/Target/RISCV/RISCVISelLowering.cpp:21561:14: error: unused variable 'PartLMUL' [-Werror,-Wunused-variable]
    unsigned PartLMUL =
             ^
2 errors generated.

…mber of elements in operands. Patch adds basic support for non-power-of-2 number of elements in operands. The patch still requires that this number addresses whole registers. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: llvm#106449

First step toward supporting the WaveSize attribute described in https://microsoft.github.io/DirectX-Specs/d3d/HLSL_SM_6_6_WaveSize.html and https://microsoft.github.io/hlsl-specs/proposals/0013-wave-size-range.html. A new attribute, HLSLWaveSizeAttr, is added to the AST. Both the wave size and the wave size range are implemented together, since implementing them separately might require more work. For llvm#70118

HLSL output parameters are denoted with the `inout` and `out` keywords in the function declaration. When an argument is passed to an output parameter, a temporary value is constructed for it. For `inout` parameters, the temporary is initialized via copy-initialization from the argument lvalue expression to the parameter type. For `out` parameters, the temporary is not initialized before the call. In both cases, on return from the function the temporary value is written back to the argument lvalue expression through an implicit assignment binary operator, with casting as required.

This change introduces a new HLSLOutArgExpr AST node which represents the output argument behavior. The OutArgExpr has three defined children:

- An OpaqueValueExpr of the argument lvalue expression.
- An OpaqueValueExpr of the copy-initialized parameter.
- A BinaryOpExpr assigning the value of the second to the first.

Fixes llvm#87526

---------

Co-authored-by: Damyan Pepper <damyanp@microsoft.com>
Co-authored-by: John McCall <rjmccall@gmail.com>
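The copy-in/copy-out behavior described above can be modelled in plain C++. This illustrates the semantics only. It does not show how Clang lowers `HLSLOutArgExpr`. The names `addOne`, `setHalf`, `callInOut`, and `callOut` are hypothetical.

```cpp
#include <cassert>

// Hypothetical callee with an HLSL-style `inout float` parameter, modelled as
// a reference to the caller-provided temporary.
void addOne(float &P) { P += 1.0f; }

// Hypothetical callee with an HLSL-style `out float` parameter. It must write
// the value, because the temporary is not initialized before the call.
void setHalf(float &P) { P = 0.5f; }

// What `addOne(Arg)` means when Arg is an `int` lvalue.
void callInOut(int &Arg) {
  float Tmp = static_cast<float>(Arg); // copy-initialize the temporary
  addOne(Tmp);                         // the callee only sees the temporary
  Arg = static_cast<int>(Tmp);         // write-back: assignment with a cast
}

// What `setHalf(Arg)` means: no copy-in, but the same write-back.
void callOut(double &Arg) {
  float Tmp;                           // deliberately left uninitialized
  setHalf(Tmp);
  Arg = static_cast<double>(Tmp);
}
```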

…lvm#106828) Stop adding liveins for virtual registers. In the livein interface, the register goes through an MCPhysReg, which is a uint16_t. This drops the virtual register bit and makes the register alias some nonsense physical register. Recompute the liveins for the continue block to handle any live registers that are needed by instructions that were spliced from the original block. This fixes the machine verifier error, so we can now remove that FIXME.
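A minimal sketch of why the virtual register bit is lost. It assumes, as in `llvm::Register`, that virtual registers are marked by bit 31 of a 32-bit value. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the top bit of a 32-bit register value marks it as virtual.
constexpr uint32_t VirtualRegFlag = 1u << 31;

uint32_t makeVirtualReg(uint32_t Index) { return Index | VirtualRegFlag; }
bool isVirtualReg(uint32_t Reg) { return (Reg & VirtualRegFlag) != 0; }

// A 16-bit physical-register type keeps only the low 16 bits, so the
// "virtual" marker is silently dropped on the way through.
using MCPhysReg = uint16_t;
MCPhysReg asLiveIn(uint32_t Reg) { return static_cast<MCPhysReg>(Reg); }
```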

llvm#105740) …limination ArgumentPromotion and DeadArgumentElimination passes may change a function's signature. This makes BPF tracing difficult, since users are either not aware of the signature change or need to poke into the IR or assembly to understand it. This patch emits remarks. When recompiling with -foptimization-record-file=<file>, users can check the remarks to see what kind of signature change was made to a particular function. The following are some examples of the implemented remarks:

```
Pass: deadargelim
Name: ReturnValueRemoved
DebugLoc: { File: 'bpf-next/net/mptcp/protocol.c', Line: 572, Column: 0 }
Function: mptcp_check_data_fin
Args:
  - String: 'removing return value '
  - String: '0'

Pass: deadargelim
Name: ArgumentRemoved
DebugLoc: { File: 'bpf-next/kernel/bpf/syscall.c', Line: 1670, Column: 0 }
Function: map_delete_elem
Args:
  - String: 'eliminating argument '
  - ArgName: uattr.coerce0
  - String: '('
  - ArgIndex: '1'
  - String: ')'

Pass: argpromotion
Name: ArgumentPromoted
DebugLoc: { File: 'bpf-next/net/mptcp/protocol.h', Line: 570, Column: 0 }
Function: mptcp_subflow_ctx
Args:
  - String: 'promoting argument '
  - ArgName: sk
  - String: '('
  - ArgIndex: '0'
  - String: ')'
  - String: ' to pass by value'
```

[1] llvm#104678

This applies the same provisional wording change used in the call context to function template non-call partial ordering: don't perform the consistency check on the return type or on parameters from which no template parameters were deduced. Fixes a regression introduced in llvm#100692, which was reported on the PR.

…m#101414) We have been discussing changes to our commit access policies recently. Based on some feedback from clattner here: https://discourse.llvm.org/t/rfc-new-criteria-for-commit-access/76290/81 we need to update our Developer Policy so that it matches what we are actually doing in this project. We currently grant commit access to anyone with a valid justification, not just contributors who have submitted high-quality patches in the past. --------- Co-authored-by: Shilei Tian <i@tianshilei.me>

…m#106489)

This matches the MachineBasicBlock liveins used to populate it.

…vm#105728) Basic infrastructure to collect Function properties in Metadata Analysis:

- Add a `SmallVector` of entry properties to the metadata information.
- Add a structure to represent function properties. Currently `numthreads` and shader kind properties of shader entry functions are represented.

fixes llvm#103300

@test

…llvm#105510) This patch replaces all dominated uses of a condition with true/false to improve context-sensitive optimizations. It eliminates a bunch of branches in llvm-opt-benchmark. As a side effect, it may introduce new phi nodes in some corner cases. See the following case:

```
define i1 @test(i1 %cmp, i1 %cond) {
entry:
  br i1 %cond, label %bb1, label %bb2
bb1:
  br i1 %cmp, label %if.then, label %if.else
if.then:
  br label %bb2
if.else:
  br label %bb2
bb2:
  %res = phi i1 [%cmp, %entry], [%cmp, %if.then], [%cmp, %if.else]
  ret i1 %res
}
```

It will be simplified into:

```
define i1 @test(i1 %cmp, i1 %cond) {
entry:
  br i1 %cond, label %bb1, label %bb2
bb1:
  br i1 %cmp, label %if.then, label %if.else
if.then:
  br label %bb2
if.else:
  br label %bb2
bb2:
  %res = phi i1 [%cmp, %entry], [true, %if.then], [false, %if.else]
  ret i1 %res
}
```

I am planning to fix this in the late pipeline/CGP, since this problem existed before the patch.

Fixes llvm#106761.

…asts. NFC

Fix the DeclID not being set in global temporaries and use the same strategy for deciding if a temporary is readable as the current interpreter.

As far as I can tell, there's no way to call this. There are no calls in the X86 directory. It has the same name as a function in MCRegisterInfo, but that function takes a MCRegister and isn't virtual. The function in MCRegisterInfo uses a DenseMap populated by `X86_MC::initLLVMToSEHAndCVRegMapping`. The DenseMap is populated for every physical register using the encoding value. I think that means the function in MCRegisterInfo would return the same value as the function in X86RegisterInfo.

…106886) The LegalizeDAG expansion will go through memory since i16 isn't a legal type. Avoid this by using FMV nodes.

…lvm#106882)

…n Windows (llvm#106794) Suppresses the copyright banner for `ml64` compiling BLAKE3 assembly sources with MSVC and Ninja on Windows:

```
[157/3758] Building ASM_MASM object lib\Support\BLAKE3\CMa...upportBlake3.dir\blake3_avx512_x86-64_windows_msvc.asm.obj
Microsoft (R) Macro Assembler (x64) Version 14.41.34120.0
Copyright (C) Microsoft Corporation. All rights reserved.
Assembling: C:\path\to\llvm-project\llvm\lib\Support\BLAKE3\blake3_avx512_x86-64_windows_msvc.asm
```

is now just:

```
Assembling: C:\path\to\llvm-project\llvm\lib\Support\BLAKE3\blake3_avx512_x86-64_windows_msvc.asm
```

We can suppress that last line with `/quiet` in more recent versions of `ml64` (from MSVC 2022 17.6), but it is not supported by all potential MASM compilers.

) This doesn't seem to have any use other than the possibility of merge conflicts and accidentally forgetting to update `NUM_PREDEF_DECL_IDS`.

When the shuffle masks are `PoisonMaskElem`, there is no need to check the cost of `SK_ExtractSubvector`, because it is free. Checking it anyway crashes the compiler with: Assertion `(Idx + EltsPerVector) <= alignTo(NumElts, EltsPerVector) && "SK_ExtractSubvector index out of range"' failed.

…vm#106902)

llvm#106889) We can't assume a closed world even in the full LTO post-link stage. It is only true if we are building a "GPU executable". However, AMDGPU does support "dynamic libraries". I'm not aware of any way to tell whether it is a relocatable link when we create the pass. For now, let's revert the patch, as it is currently breaking things. We can re-enable it once we can handle it correctly.

NEON has non-IEEE-compliant denormal flushing, and the compiler should check whether it is safe to vectorize instructions for NEON in non-fast-math mode. Fixes llvm#106909
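To show why denormal flushing matters for vectorization, this sketch emulates a flush-to-zero multiply in software and compares it with ordinary IEEE multiplication. It assumes the host does not flush denormals itself. It is a software model for illustration and does not describe NEON's hardware behavior beyond the flushing named above.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Emulated flush-to-zero: a subnormal value becomes a zero of the same sign.
float flushDenormal(float X) {
  return std::fpclassify(X) == FP_SUBNORMAL ? std::copysign(0.0f, X) : X;
}

// IEEE multiply, as scalar code computes it by default.
float ieeeMul(float A, float B) { return A * B; }

// The same multiply on a unit that flushes inputs and results.
float flushingMul(float A, float B) {
  return flushDenormal(flushDenormal(A) * flushDenormal(B));
}
```

The two disagree whenever a product is subnormal. One example is the smallest normal `float` times 0.5.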

disabled.

Address comment llvm#106747 (comment).

Implement cost computation for VPWidenCallRecipe. In some cases, targets use argument info to compute intrinsic costs. If all operands of the call are VPValues with an underlying IR value, use the IR values as arguments. PR: llvm#106731

This patch reduces the memory usage for import lists by employing memory-efficient data structures. With this patch, an import list for a given destination module is basically a DenseSet<uint32_t>, with each element indexing into the deduplication table containing tuples of:

{SourceModule, GUID, Definition/Declaration}

In one of our large applications, the peak memory usage goes down by 9.2%, from 6.120GB to 5.555GB, during the LTO indexing step.

This patch addresses several sources of space inefficiency associated with std::unordered_map:

- std::unordered_map<GUID, ImportKind> takes up 16 bytes per entry because of padding, even though ImportKind only carries one bit of information.
- std::unordered_map uses pointers to elements, both in the hash table proper and for collision chains.
- We allocate an instance of std::unordered_map for each {Destination Module, Source Module} pair for which we have at least one import. Most import lists have fewer than 10 imports, so metadata such as the size of std::unordered_map and the pointer to the hash table costs a lot relative to the actual contents.
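A minimal sketch of the layout described above. A shared table stores each `{SourceModule, GUID, Definition/Declaration}` tuple once, and an import list is just a set of 32-bit indices. Standard containers stand in for LLVM's `DenseSet`/`DenseMap`. `ImportIDTable` and its methods are hypothetical names, not the patch's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <unordered_set>
#include <vector>

enum class ImportKind : uint8_t { Definition, Declaration };

// Source modules are identified by name here purely for illustration.
using ImportKey = std::tuple<std::string, uint64_t, ImportKind>;

class ImportIDTable {
public:
  // Returns the existing ID for Key, or assigns the next free one.
  uint32_t getID(const ImportKey &Key) {
    auto [It, Inserted] =
        IDs.try_emplace(Key, static_cast<uint32_t>(Keys.size()));
    if (Inserted)
      Keys.push_back(Key);
    return It->second;
  }

  const ImportKey &lookup(uint32_t ID) const { return Keys.at(ID); }
  std::size_t size() const { return Keys.size(); }

private:
  std::map<ImportKey, uint32_t> IDs; // tuple -> ID
  std::vector<ImportKey> Keys;       // ID -> tuple
};

// One import list per destination module: just a set of table indices.
using ImportList = std::unordered_set<uint32_t>;
```

Identical tuples imported by many destination modules share one table entry, and each list pays 4 bytes per import.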

We have special handling for this in type legalization, but we didn't have a test.

Branches exiting the loop will remain regardless, so don't consider them in collectValuesToIgnore. This fixes another divergence between legacy and VPlan-based cost model. Fixes llvm#106780.

This is for an upcoming change to the threshold on Apple targets for using a constant pool for FP literals versus building them with integer moves. This file is based on literal_pools_float.ll. I tried to bolt on to the existing test, but it got messy as that file is already testing a matrix of combinations, so creating this new file instead.

…finx. We should use RMM instead of DYN.

…stead of using isel patterns.

…=` (llvm#106803) Time trace profiler support was added into LLVMgold in cd3255a. This patch adds its `-plugin-opt` counterpart, which is just an alias to `--time-trace=`, into LLD for compatibility.

Widening/narrowing the source data type to match the destination data type may require multiple steps. To model the costs, the patch generates the interim type by following the logic in RISCVTargetLowering::lowerVPFPIntConvOp.

`HighMask` is the value that sets bits from `Msb+1` to 63 to 1, while the other bits are set to 0.
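A minimal sketch of the mask described above. It assumes `Msb` is a bit index in the range 0..63. The commit does not show the surrounding code, so `highMask` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Sets bits Msb+1 through 63 and clears bits 0 through Msb.
// Shifting a 64-bit value by 64 is undefined behaviour in C++, so Msb == 63
// (an empty high range) is handled explicitly.
uint64_t highMask(unsigned Msb) {
  assert(Msb <= 63 && "Msb must name a bit of a 64-bit value");
  return Msb == 63 ? 0 : ~uint64_t{0} << (Msb + 1);
}
```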

Currently the option prints a path to a nonexistent directory with the full triple, `lib/powerpc64-ibm-aix7.2.0.0`. It should only be `lib/aix`.

…d. NFC (llvm#106671) This predicate isn't bound to the scheduler model, and we may want to reuse it in the future. We have already moved it in our downstream to reuse it there.

…CI (llvm#106486)

…m#106890) `FindInstantiatedDecl()` relies on the `CurContext` to find the corresponding class template instantiation for a class template declaration. Previously, we pushed the semantic declaration context for constraint comparison, which is incorrect for constraints on friend declarations. In issue llvm#78101, the semantic context of the friend is the TU, so we missed the implicit template specialization `Template<void, 4>` when looking for the instantiation of the primary template `Template` at the time of checking the member instantiation; instead, we mistakenly picked up the explicit specialization `Template<float, 5>`, hence the error. As a bonus, this also fixes a crash when diagnosing constraints. The DeclarationName is not necessarily an identifier, so it's incorrect to call `getName()` on e.g. overloaded operators. Since the DiagnosticBuilder has correctly handled Decl printing, we don't need to find the printable name ourselves. Fixes llvm#78101

This patch extends TypeQuery matching to support anonymous namespaces. A new flag is added to control the behavior. In the "strict" mode, the query must match the type exactly -- all anonymous namespaces included. The dynamic type resolver in the Itanium ABI (the motivating use case for this) uses this flag, because it queries using the name from the demangler, which includes anonymous namespaces. This ensures we don't confuse a type with a same-named type in an anonymous namespace. However, this does *not* ensure we don't confuse two types in anonymous namespaces (in different CUs). To resolve this, we would need to use a completely different lookup algorithm, which probably also requires a DWARF extension. In the "lax" mode (the default), the anonymous namespaces in the query are optional, and this allows one to search for the type using the usual language rules (`::A` matches `::(anonymous namespace)::A`). This patch also changes the type context computation algorithm in DWARFDIE, so that it includes anonymous namespace information. This causes a slight change in behavior. The algorithm previously stopped computing the context after encountering an anonymous namespace, which caused the outer namespaces to be ignored. This meant that a type like `NS::(anonymous namespace)::A` would be (incorrectly) recognized as `::A`. This can cause code depending on the old behavior to misbehave. The fix is to specify all the enclosing namespaces in the query, or use a non-exact match.
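The strict and lax matching rules can be sketched as follows. This is a hypothetical model, not LLDB's `TypeQuery` API. Contexts are lists of scope names, ordered outermost first and ending with the type name. The `(anonymous namespace)` marker string is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

const std::string AnonNS = "(anonymous namespace)";

// Strict: the query must spell out every anonymous namespace.
// Lax: anonymous namespaces in the type's context may be omitted from the
// query, but named namespaces may not.
bool contextMatches(const std::vector<std::string> &Query,
                    const std::vector<std::string> &Context, bool Strict) {
  if (Strict)
    return Query == Context;
  std::size_t Q = 0;
  for (const std::string &Component : Context) {
    if (Q < Query.size() && Query[Q] == Component)
      ++Q;          // component spelled out in the query
    else if (Component != AnonNS)
      return false; // a named scope cannot be skipped
  }
  return Q == Query.size();
}
```

The last test case below reflects the fixed context computation. The outer `NS` is kept, so `::A` no longer matches `NS::(anonymous namespace)::A`.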

…llvm#106075) The op of phi transform wants to prevent moving an operation across a backedge, as this may lead to an infinite combine loop. Currently, this is done using isPotentiallyReachable(). The problem with that is that all blocks inside a loop are reachable from each other. This means that the op of phi transform is effectively completely disabled for code inside loops, even when it's not actually operating on a loop phi (just a phi that happens to be in a loop). Fix this by explicitly computing the backedges inside the function instead. Do this via RPOT, which is a bit more efficient than using FindFunctionBackedges() (which does it without any pre-computed analyses). For irreducible cycles, there may be multiple possible choices of backedge, and this just picks one of them. This is still sufficient to prevent combine loops. This also removes the last use of LoopInfo in InstCombine -- I'll drop the analysis in a followup.
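One way to compute backedges from a reverse post-order is shown below on a toy adjacency-list CFG. The `Graph` type and `findBackedges` are hypothetical and are not InstCombine's code. An edge whose target does not come after its source in RPO is treated as a backedge.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Adjacency-list CFG; block 0 is the entry block.
using Graph = std::vector<std::vector<int>>;

static void postOrder(const Graph &G, int N, std::vector<bool> &Seen,
                      std::vector<int> &Order) {
  Seen[N] = true;
  for (int Succ : G[N])
    if (!Seen[Succ])
      postOrder(G, Succ, Seen, Order);
  Order.push_back(N);
}

// U -> V is a backedge when V does not come after U in reverse post-order
// (this includes self-loops). For an irreducible cycle, only one of the
// possible backedges is picked, depending on the DFS order.
std::set<std::pair<int, int>> findBackedges(const Graph &G) {
  assert(!G.empty() && "need an entry block");
  std::vector<bool> Seen(G.size(), false);
  std::vector<int> PostOrder;
  postOrder(G, 0, Seen, PostOrder);

  std::vector<std::size_t> RPOIndex(G.size(), 0);
  for (std::size_t I = 0; I < PostOrder.size(); ++I)
    RPOIndex[PostOrder[PostOrder.size() - 1 - I]] = I;

  std::set<std::pair<int, int>> Backedges;
  for (int U : PostOrder) // only blocks reachable from the entry
    for (int V : G[U])
      if (RPOIndex[V] <= RPOIndex[U])
        Backedges.insert({U, V});
  return Backedges;
}
```

Unlike a reachability query, this only flags edges that close a cycle. A phi in block 2 of the graph `0 -> 1 -> 2 -> {1, 3}` can still be transformed along every edge except `2 -> 1`.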

Most of it is redundant with bfloat-convert.ll. One testcase is found in bfloat-imm.ll. The load and stores are more thoroughly tested in bfloat-mem.ll.

…am::operator<< (llvm#106877) These would implicitly cast the register to `unsigned`. Switching most of them to use printReg gives more readable output. Some others are changed to use Register::id() so we can eventually remove the implicit cast to `unsigned`.

Recently added HLSL diagnostics (89fb849) pushed the Swift compiler over the existing limit. rdar://135126738

The `dli_sname` field in `Dl_info` may be `NULL`, which could cause a crash if it is used without a check.
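A minimal defensive pattern, assuming the address was already resolved with `dladdr()`. When no symbol covers the address, `dli_sname` is left null, so callers should fall back to something printable. `symbolNameOr` is a hypothetical helper.

```cpp
#include <cassert>
#include <dlfcn.h>

// Returns the symbol name recorded by dladdr(), or Fallback when none was
// found (dli_sname is null in that case).
const char *symbolNameOr(const Dl_info &Info, const char *Fallback) {
  return Info.dli_sname ? Info.dli_sname : Fallback;
}
```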

It may be profitable to revert SCCP propagation of C++ static values, if such constants are pointers, in order to avoid redundant pointer computation, since the method returning the constant is non-removable.

) If the uint64_t constructor is used, assert that the value is actually a signed or unsigned N-bit integer depending on whether the isSigned flag is set. Provide an implicitTrunc flag to restore the previous behavior, where the argument is silently truncated instead. In this commit, implicitTrunc is enabled by default, which means that the new assertions are disabled and no actual change in behavior occurs. The plan is to flip the default once all places violating the assertion have been fixed. See llvm#80309 for the scope of the necessary changes. The primary motivation for this change is to avoid incorrectly specified isSigned flags. A recurring problem we have is that people write something like `APInt(BW, -1)` and this works perfectly fine -- until the code path is hit with `BW > 64`. Most of our i128 specific miscompilations are caused by variants of this issue. The cost of the change is that we have to specify the correct isSigned flag (and make sure there are no excess bits) for uses where BW is always <= 64 as well.
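The `APInt(BW, -1)` pitfall works like this. `-1` converts to a `uint64_t` with all 64 bits set. With `isSigned` defaulting to false, a 128-bit value built from it is 2^64 - 1 rather than -1. The toy `Wide128` type and the `fits*` helpers below are hypothetical stand-ins. They sketch the extension behavior and the kind of range check the new assertions perform for widths of at most 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Toy 128-bit value standing in for a wide APInt built from a uint64_t.
struct Wide128 {
  uint64_t Lo, Hi;
};

// The upper word is filled with copies of the sign bit only when the caller
// says the input is signed; otherwise it is zero-extended.
Wide128 fromUInt64(uint64_t Val, bool IsSigned) {
  bool Negative = IsSigned && (Val >> 63) != 0;
  return {Val, Negative ? ~uint64_t{0} : 0};
}

// Does Val already fit in BitWidth bits as an unsigned integer?
bool fitsUnsigned(uint64_t Val, unsigned BitWidth) {
  assert(BitWidth >= 1 && BitWidth <= 64);
  return BitWidth == 64 || (Val >> BitWidth) == 0;
}

// Does Val already fit in BitWidth bits as a signed integer?
bool fitsSigned(int64_t Val, unsigned BitWidth) {
  assert(BitWidth >= 1 && BitWidth <= 64);
  if (BitWidth == 64)
    return true;
  int64_t Min = -(int64_t{1} << (BitWidth - 1));
  int64_t Max = (int64_t{1} << (BitWidth - 1)) - 1;
  return Val >= Min && Val <= Max;
}
```

For widths of 64 or less, implicit truncation hides a wrong `isSigned` flag. A range check like `fitsUnsigned(-1, 8)` fails and forces the caller to state the signedness.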

…106721) This test case was failing to compile with a "ran out of registers during register allocation" error at -O0. This was because CMP_SWAP_64 has 3 operands which must be an even-odd register pair, and two other GPR operands. All of the def operands are also early-clobber, so registers can't be shared between uses and defs. Because the function has an over-aligned alloca it needs frame and base pointers, so r6 and r11 are both reserved. That leaves r0/r1, r2/r3, r4/r5 and r8/r9 as the only valid register pairs, and if the two individual GPR operands happen to get allocated to registers in different pairs then only 2 pairs will be available for the three GPRPair operands. To fix this, I've merged the two GPR operands into a single GPRPair operand. This means that the instruction now has 4 GPRPair operands, which can always be allocated without relying on luck. This does constrain register allocation a bit more, but this pseudo instruction is only used at -O0, so I don't think that's a problem.

…NFC) Multiple buildbots were previously failing.

…lvm#106742)

llvm#106075 has removed the last dependency on LoopInfo in InstCombine, so don't fetch the analysis anymore and remove the use-loop-info pass option.

…6662) Fixes llvm#106418.

This patch adds vectorization support for [u|s]cmp intrinsic calls.

) We weren't taking account of the space we require in the stubs for things that are dllimported, and as a result we could hit the assertion failure for running out of stub space. Fix that. rdar://133473673 --------- Co-authored-by: Saleem Abdulrasool <compnerd@compnerd.org> Co-authored-by: Lang Hames <lhames@gmail.com> Co-authored-by: Ben Barham <b.n.barham@gmail.com>

f80 is only a thing on x86, and even then the size of long double can be changed with compiler flags. Instead set the size according to the host system (this is what is already done for integer types).

…text. (llvm#106849) Fixes llvm#106713.

…lvm#106954) Looks like I missed an `override` (maybe that warning was enabled recently?). Will revert and fix. Reverts llvm#102586

We can infer the range/nonnull attributes in non-interprocedural SCCP as well. The results may be better after the function has been simplified.

…for OpenBSD (llvm#106956) The thread name length is derived from _MAXCOMLEN which is 24.

When we decompose the GEP offset expression, and the arithmetic is not performed using nuw operations, we cannot retain the nuw flag on the decomposed GEP. For example, if we have `gep nuw p, (a-1)`, this is not at all the same as `gep nuw (gep nuw p, a), -1`. Fix this by tracking NUW through linear expression decomposition, similarly to what we already do for the NSW flag. This fixes the miscompilation reported in llvm#105496 (comment).
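A worked instance of the example above. It models pointers as 64-bit addresses and assumes an `i8` element type, so indices are byte offsets. `gepNUW` is a hypothetical helper. With `p = 100` and `a = 5`, `gep nuw p, (a-1)` is fine. In the decomposed form, the second step adds `-1`, which is 2^64 - 1 as an unsigned offset, and that addition wraps.

```cpp
#include <cassert>
#include <cstdint>

// Computes P + Off with the offset read as an unsigned 64-bit number.
// Returns false when the addition wraps, i.e. when a `nuw` GEP would be poison.
bool gepNUW(uint64_t P, int64_t Off, uint64_t &Result) {
  Result = P + static_cast<uint64_t>(Off); // wraps modulo 2^64
  return Result >= P;
}
```

Both forms compute address 104, but only the original keeps the `nuw` promise.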

…ice (llvm#106755) This renames:

- `arm_sme.move_tile_slice_to_vector` to `arm_sme.extract_tile_slice`
- `arm_sme.move_vector_to_tile_slice` to `arm_sme.insert_tile_slice`

The new names are more consistent with the rest of MLIR and should be easier to understand. The current names (to me personally) are hard to parse and easy to mix up when skimming through code. Additionally, the syntax for `insert_tile_slice` has changed from:

```mlir
%4 = arm_sme.insert_tile_slice %0, %1, %2 : vector<[16]xi8> into vector<[16]x[16]xi8>
```

To:

```mlir
%4 = arm_sme.insert_tile_slice %0, %1[%2] : vector<[16]xi8> into vector<[16]x[16]xi8>
```

This is for consistency with `extract_tile_slice`, but also helps with readability as it makes it clear which operand is the index.

… Solaris/illumos (llvm#106944)

…h fixes (llvm#106947) This reverts commit fa93be4, restoring commit d884b77, with fixes that ensure the CAPI declarations are exported properly. This commit implements LLVM_DIRecursiveTypeAttrInterface for the DISubprogramAttr to ensure cyclic subprograms can be imported properly. In the process multiple shortcuts around the recently introduced DIImportedEntityAttr can be removed.

For example, if the argument has an alignment attribute, preserve it.

When serialising to textual IR, there can be constant Values referred to by DbgRecords that don't appear anywhere else, and have types hidden even deeper inside them. Enumerate these when enumerating all types. Test by Mikael Holmén.

… state of vector legalization/lowering

)

llvm@1e65b76

) We weren't taking account of the space we require in the stubs for things that are dllimported, and as a result we could hit the assertion failure for running out of stub space. Fix that. Also add a couple of `override` specifiers that were missing last time (llvm#102586). rdar://133473673

…fic procedure (llvm#106693) This may happen when using modules. Fixes llvm#93707

This does nothing and returns 0.

... if done via an ImplicitValueInitExpr. We were already doing this later in visitZeroRecordInitializer().

…ce aliases (llvm#106706)

An `emitc.include` should be usable even when its parent is not a ModuleOp. This requirement is therefore removed.

…empt (llvm#106902)" This reverts commit 7c4cffd. This commit broke compilation in environments that don't use winpthreads.

http://itanium-cxx-abi.github.io/cxx-abi/

> This website may be mirrored in many places, some of which may become stale. The current canonical location is:
> * http://itanium-cxx-abi.github.io/cxx-abi/

https://github.com/ARM-software/abi-aa

> This is the official place for the latest documents of the Application Binary Interface for the Arm® Architecture, both for source files and officially released documents.

The LoopIdiomVectorize pass already creates calls to the intrinsic experimental_cttz_elts, but PR llvm#88385 will start calling this more too, so I've created a helper for it.
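For reference, here is a scalar model of what `llvm.experimental.cttz.elts` computes, as we understand its LangRef description. It counts trailing zero elements starting from element 0. For an all-zero input it returns the element count, when zero is not declared poison. `cttzElts` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the first non-zero (true) element counting from element 0,
// or the number of elements if every element is zero.
std::size_t cttzElts(const std::vector<bool> &Mask) {
  for (std::size_t I = 0; I < Mask.size(); ++I)
    if (Mask[I])
      return I;
  return Mask.size();
}
```

Applied to a compare mask, this gives the index of the first lane where the compare fired. That is the pattern loop idioms such as a vectorized mismatch search rely on.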

Previously, we were returning an error if we couldn't read the whole region. This doesn't matter most of the time, because lldb caches memory reads, and in that process it aligns them to cache line boundaries. As (LLDB) cache lines are smaller than pages, the reads are unlikely to cross page boundaries. Nonetheless, this can cause a problem for large reads (which bypass the cache), where we're unable to read anything even if just a single byte of the memory is unreadable. This patch changes lldb-server to return the partial result in that case. It also changes the Linux implementation to reuse any partial results it got from the process_vm_readv call (to avoid having to re-read everything again using ptrace, only to find that it stopped at the same place). This matches debugserver behavior. It is also consistent with the gdb remote protocol documentation, but -- notably -- not with actual gdbserver behavior (which returns errors instead of partial results). We filed a [clarification bug](https://sourceware.org/bugzilla/show_bug.cgi?id=24751) several years ago. Though we did not really reach a conclusion there, I think this is the most logical behavior. The associated test does not currently pass on Windows, because the Windows memory read APIs don't support partial reads (I have a WIP patch to work around that).
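The partial-read behavior can be sketched as computing the readable prefix of a request. This is a hypothetical model, not lldb-server code. Readability is given per page by a callback. In this sketch, a zero-length result is what a caller would report as an error.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Returns how many bytes of [Addr, Addr + Len) can be read before the first
// unreadable page. The caller reports these bytes as a partial result.
std::size_t readablePrefix(uint64_t Addr, std::size_t Len, uint64_t PageSize,
                           const std::function<bool(uint64_t)> &IsPageReadable) {
  assert(PageSize > 0);
  std::size_t Done = 0;
  while (Done < Len) {
    uint64_t Cur = Addr + Done;
    uint64_t Page = Cur / PageSize;
    if (!IsPageReadable(Page))
      break; // stop at the first unreadable page, keep what we have
    uint64_t PageEnd = (Page + 1) * PageSize;
    Done += static_cast<std::size_t>(std::min<uint64_t>(PageEnd - Cur, Len - Done));
  }
  return Done;
}
```

For example, a 200-byte read starting 96 bytes before an unreadable page yields those 96 bytes instead of nothing.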

This fixes a divergence between the legacy and VPlan-based cost models, e.g. if one of the operands has a first-order recurrence phi as an operand.

…#106969) because that doesn't work (results in `LINK : error LNK2001: unresolved external symbol malloc`). Based on the title of llvm#91862 it was only intended for use in 64-bit builds.

Due to a reviewer request on PR llvm#88385 I have created this patch to add a getPredicatedExitCount function, which is similar to getExitCount except that it uses the predicated backedge taken information. With PR llvm#88385 we will start to care about more loops with multiple exits, and want the ability to query exit counts for a particular exiting block. Such loops may require predicates in order to be vectorised. New tests added here: Analysis/ScalarEvolution/predicated-exit-count.ll

This patch introduces lowering of the partial add reduction intrinsic to a udot or svdot for AArch64. This also involves adding a `shouldExpandPartialReductionIntrinsic` target hook, which AArch64 will return false from in the cases that it can be lowered.
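For context, here is a scalar model of the 128-bit AArch64 `UDOT` form (`Vd.4S, Vn.16B, Vm.16B`). It accumulates groups of four unsigned byte products into each 32-bit lane. That is why a partial add reduction of zero-extended byte products into 32-bit lanes can map onto it. The function is a hypothetical reference model, not an LLVM or ACLE API.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Each 32-bit accumulator lane adds the dot product of the four unsigned
// bytes at the matching positions of the two 16-byte sources.
std::array<uint32_t, 4> udot(std::array<uint32_t, 4> Acc,
                             const std::array<uint8_t, 16> &A,
                             const std::array<uint8_t, 16> &B) {
  for (int Lane = 0; Lane < 4; ++Lane)
    for (int J = 0; J < 4; ++J)
      Acc[Lane] += uint32_t{A[4 * Lane + J]} * uint32_t{B[4 * Lane + J]};
  return Acc;
}
```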

Run for clang-tidy checks available in the release/19.x branch. Some notable findings:

- altera-id-dependent-backward-branch stays slow at 13%.
- misc-const-correctness became faster, going from 261% to 67%, but is still above the 8% threshold.
- misc-header-include-cycle is a new SLOW check with 10% runtime implications.
- readability-container-size-empty went from 16% to 13%, still SLOW.

- Add validation subtest that tests assert failures in assert enabled builds, and that validation is disabled in assert disabled builds.

Fix intrinsic function attributes to not generate attribute sets that are empty in `getIntrinsicFnAttributeSet`. Refactor the code to use helper functions to get effective memory effects for an intrinsic and to check if it has non-default attributes. This eliminates one case statement in `getIntrinsicFnAttributeSet` that we generate today for the case when intrinsic attributes are default ones. Also rename `Intrinsic` to `Int` to follow the naming convention used in this file, and adjust the emission code to not emit an unnecessary empty line between generated cases.

…lvm#106117) This patch covers CWG issues regarding declaration matching when `friend` declarations are involved: [CWG138](https://cplusplus.github.io/CWG/issues/138.html), [CWG386](https://cplusplus.github.io/CWG/issues/386.html), [CWG1477](https://cplusplus.github.io/CWG/issues/1477.html), and [CWG1900](https://cplusplus.github.io/CWG/issues/1900.html). Unusually for our CWG tests, the ones in this patch are quite extensively commented in-line, explaining the mechanics. The PR description focuses on high-level concerns and references.

[CWG138](https://cplusplus.github.io/CWG/issues/138.html) "Friend declaration name lookup"
-----------

[P1787R6](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1787r6.html):
> [CWG138](https://cplusplus.github.io/CWG/issues/138.html) is resolved according to [N1229](http://wg21.link/n1229), except that using-directives that nominate nested namespaces are considered.

I find it hard to pin down the scope of this issue, so I'm relying on three examples from the filing to define it. Because of that, it's also hard to pinpoint exact wording changes that resolve it. Relevant references are: [[dcl.meaning.general]/2](http://eel.is/c++draft/dcl.meaning#general-2), [[namespace.udecl]/10](https://eel.is/c++draft/namespace.udecl#10), [[dcl.type.elab]/3](https://eel.is/c++draft/dcl.type.elab#3), [[basic.lookup.elab]/1](https://eel.is/c++draft/basic.lookup.elab#1).
[CWG386](https://cplusplus.github.io/CWG/issues/386.html) "Friend declaration of name brought in by _using-declaration_"
-----------

[P1787R6](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1787r6.html):
> [CWG386](https://cplusplus.github.io/CWG/issues/386.html), [CWG1839](https://cplusplus.github.io/CWG/issues/1839.html), [CWG1818](https://cplusplus.github.io/CWG/issues/1818.html), [CWG2058](https://cplusplus.github.io/CWG/issues/2058.html), [CWG1900](https://cplusplus.github.io/CWG/issues/1900.html), and Richard’s observation in [“are non-type names ignored in a class-head-name or enum-head-name?”](http://lists.isocpp.org/core/2017/01/1604.php) are resolved by describing the limited lookup that occurs for a declarator-id, including the changes in Richard’s [proposed resolution for CWG1839](http://wiki.edg.com/pub/Wg21cologne2019/CoreWorkingGroup/cwg1839.html) (which also resolves CWG1818 and what of CWG2058 was not resolved along with CWG2059) and rejecting the example from [CWG1477](https://cplusplus.github.io/CWG/issues/1477.html).

Wording ([[dcl.meaning.general]/2](http://eel.is/c++draft/dcl.meaning#general-2)):
> — If the [id-expression](http://eel.is/c++draft/expr.prim.id.general#nt:id-expression) E in the [declarator-id](http://eel.is/c++draft/dcl.decl.general#nt:declarator-id) of the [declarator](http://eel.is/c++draft/dcl.decl.general#nt:declarator) is a [qualified-id](http://eel.is/c++draft/expr.prim.id.qual#nt:qualified-id) or a [template-id](http://eel.is/c++draft/temp.names#nt:template-id):
>      — [...]
>      — The [declarator](http://eel.is/c++draft/dcl.decl.general#nt:declarator) shall correspond to one or more declarations found by the lookup; they shall all have the same target scope, and the target scope of the [declarator](http://eel.is/c++draft/dcl.decl.general#nt:declarator) is that scope[.](http://eel.is/c++draft/dcl.meaning#general-2.2.2.sentence-1)

This issue focuses on the interaction of `friend` declarations that use a template-id or a qualified-id with using-declarations. The short answer is that the terminal names in such declarations undergo lookup, and using-declarations do what they usually do: they help that lookup. The target scope of such a friend declaration is the target scope of the lookup result, so no conflicts arise with the using-declarations.

[CWG1477](https://cplusplus.github.io/CWG/issues/1477.html) "Definition of a `friend` outside its namespace"
-----------

[P1787R6](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1787r6.html):
> [...] and rejecting the example from [CWG1477](https://cplusplus.github.io/CWG/issues/1477.html).

Wording ([[dcl.meaning.general]/3.4](http://eel.is/c++draft/dcl.meaning#general-3.4)):
> Otherwise, the terminal name of the [declarator-id](http://eel.is/c++draft/dcl.decl.general#nt:declarator-id) is not looked up[.](http://eel.is/c++draft/dcl.meaning#general-3.4.sentence-1) If it is a qualified name, the [declarator](http://eel.is/c++draft/dcl.decl.general#nt:declarator) shall correspond to one or more declarations nominable in S; all the declarations shall have the same target scope and the target scope of the [declarator](http://eel.is/c++draft/dcl.decl.general#nt:declarator) is that scope[.](http://eel.is/c++draft/dcl.meaning#general-3.4.sentence-2)

This issue focuses on befriending a function in one scope, then defining it from another scope using a qualified-id. Contrary to what P1787R6 says in prose, this example is accepted by the wording in that paper.
In the wording quote above, note the absence of a statement like "terminal name of the declarator-id is not bound", contrary to similar statements made before that in [dcl.meaning.general] about friend declarations and template-ids. There's also a note in [basic.scope.scope] that supports the rejection, but it's considered incorrect and expected to be removed in the future. This is tracked in cplusplus/draft#7238.

[CWG1900](https://cplusplus.github.io/CWG/issues/1900.html) "Do `friend` declarations count as “previous declarations”?"
------------------

[P1787R6](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1787r6.html):
> [CWG386](https://cplusplus.github.io/CWG/issues/386.html), [CWG1839](https://cplusplus.github.io/CWG/issues/1839.html), [CWG1818](https://cplusplus.github.io/CWG/issues/1818.html), [CWG2058](https://cplusplus.github.io/CWG/issues/2058.html), [CWG1900](https://cplusplus.github.io/CWG/issues/1900.html), and Richard’s observation in [“are non-type names ignored in a class-head-name or enum-head-name?”](http://lists.isocpp.org/core/2017/01/1604.php) are resolved by describing the limited lookup that occurs for a declarator-id, including the changes in Richard’s [proposed resolution for CWG1839](http://wiki.edg.com/pub/Wg21cologne2019/CoreWorkingGroup/cwg1839.html) (which also resolves CWG1818 and what of CWG2058 was not resolved along with CWG2059) and rejecting the example from [CWG1477](https://cplusplus.github.io/CWG/issues/1477.html).
Wording ([[dcl.meaning.general]/2.3](http://eel.is/c++draft/dcl.meaning#general-2.3)): > The declaration's target scope is the innermost enclosing namespace scope; if the declaration is contained by a block scope, the declaration shall correspond to a reachable ([[module.reach]](http://eel.is/c++draft/module.reach)) declaration that inhabits the innermost block scope[.](http://eel.is/c++draft/dcl.meaning#general-2.3.sentence-2) Wording ([[basic.scope.scope]/7](http://eel.is/c++draft/basic.scope#scope-7)): > A declaration is [nominable](http://eel.is/c++draft/basic.scope#def:nominable) in a class, class template, or namespace E at a point P if it precedes P, it does not inhabit a block scope, and its target scope is the scope associated with E or, if E is a namespace, any element of the inline namespace set of E ([[namespace.def]](http://eel.is/c++draft/namespace.def))[.](http://eel.is/c++draft/basic.scope#scope-7.sentence-1) Wording ([[dcl.meaning.general]/3.4](http://eel.is/c++draft/dcl.meaning#general-3.4)): > If it is a qualified name, the [declarator](http://eel.is/c++draft/dcl.decl.general#nt:declarator) shall correspond to one or more declarations nominable in S; [...] In the new wording it's clear that while `friend` declarations of functions do not bind names, a declaration is still introduced and is nominable, making it eligible for a later definition by a qualified-id.

)

) Sometimes a collection of multilibs has a gap in it, where a set of driver command-line options can't work with any of the available libraries. For example, the Arm MVE extension requires special startup code (you need to initialize FPSCR.LTPSIZE), and also benefits greatly from -mfloat-abi=hard. So a multilib provider might build a library for systems without MVE, and another for MVE with -mfloat-abi=hard, anticipating that that's what most MVE users would want. But then if a user compiles for MVE _without_ -mfloat-abi=hard, they can't use either of those libraries: one has an ABI mismatch, and the other will fail to set up LTPSIZE. In that situation, it's useful to include a multilib.yaml entry for the unworkable intermediate situation, and have it map to a fatal error message rather than a set of actual libraries. Then the user gets a build failure with a sensible explanation, instead of selecting an unworkable library and silently generating bad output. The new regression test demonstrates this case. This patch introduces extra syntax into multilib.yaml, so that a record in the `Variants` list can omit the `Dir` key, and in its place, provide a `FatalError` key. Then, if that variant is selected, the error message is emitted as a clang diagnostic, and multilib selection fails. In order to emit the error message in `MultilibSet::select`, I had to pass a `Driver &` to that function, which involved plumbing one through to every call site, and in the unit tests, constructing one specially.

…emove-no-kernel-id-attribute.ll`

…lvm#106992) They don't mutate the context at all, so mark them const.

I don't think we need this node. We can isel fp_extend directly. fp_extend to f64 requires two instructions, but we can emit them with an isel pattern. I have not removed RISCVISD::FP_ROUND_BF16 because f64->bf16 needs more work to fix the double rounding.

…ied features (llvm#106625) (llvm#106850) Relands 2497739 addressing the buffer overflow caused when dereferencing an iterator past the end of ExtensionMap.

Follow-up to 9ccf825, adjust computeCost to also pass IntrinsicInst to TTI if available, as there are multiple places in TTI which use the IntrinsicInst. Fixes llvm#107016.

This moves the logic to create simplified operands using SCEV to MUL recipe creation. This is needed to match the behavior of the legacy's cost model. TODOs are to extend to other opcodes and move to a transform. Note that this also restricts the number of SCEV simplifications we apply to more precisely match the cases handled by the legacy cost model. Fixes llvm#107015.

…llvm#106904)

…ed by ISA (llvm#94695) … instructions. This is a fix I stumbled upon while working on something else. I decided to break it out since it seems like a good "first issue" to submit. I updated the comments in the "wrong error" test files to indicate that the messages are no longer incorrect, but I left the names of the test files alone. I was not sure what to do with those, so I would appreciate thoughts or guidance.

…vm#106783) Make ExpandFNEG return SDValue() when it doesn't expand. The caller already knows how to Unroll when Results is empty.

Apparently DragonFly BSD and Solaris/illumos call these APIs `pthread_get_name_np` / `pthread_set_name_np` (with an extra underscore) instead of `pthread_getname_np` / `pthread_setname_np`.

Use FCvtF16ToF32 instead of FCvtF32ToF16.

Without this patch, {ImportMapTy,SortedImportList}::{begin,end} make unnecessary copies of ImportIDTable via: map_iterator(Imports.begin(), IDs); The second parameter, IDs, is passed by value, so we make a copy of the MapVector inside ImportIDTable every time we call begin and end. These begin and end calls show up as time-consuming functions in the performance profile. This patch fixes the problem by passing IDs by reference with std::cref. While we are at it, this patch deletes the copy constructor and assignment operator. I cannot think of any legitimate reason to make a copy of the deduplication table.
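The copy-versus-reference difference can be illustrated with a small standalone sketch. `CountingTable` and `MappedCursor` are hypothetical stand-ins (not LLVM's `ImportIDTable` or `map_iterator`); the only thing borrowed from the description is that the mapping object is taken by value, and that wrapping it in `std::cref` makes the stored value a cheap `std::reference_wrapper` instead of a full copy.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for ImportIDTable: a lookup table that counts how
// many times it gets copied.
struct CountingTable {
  static inline int Copies = 0;
  std::vector<int> IDs{10, 20, 30};

  CountingTable() = default;
  CountingTable(const CountingTable &Other) : IDs(Other.IDs) { ++Copies; }
  CountingTable &operator=(const CountingTable &) = delete;

  // Maps an index to its ID, like the function object handed to map_iterator.
  int operator()(int Index) const { return IDs[Index]; }
};

// Hypothetical, simplified stand-in for map_iterator: the mapping function is
// taken and stored by value, as the commit message describes.
template <typename FuncT> struct MappedCursor {
  int Index;
  FuncT Fn;
  int operator*() const { return Fn(Index); }
};

template <typename FuncT> MappedCursor<FuncT> makeCursor(int Index, FuncT Fn) {
  return MappedCursor<FuncT>{Index, Fn};
}
```

Passing `Table` directly copies the whole table at least once per call, while `makeCursor(I, std::cref(Table))` only copies a reference wrapper whose `operator()` forwards to the original table. The sketch keeps a copy constructor purely so the copies can be counted; the patch itself goes further and deletes it.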

Use _bf16 or _h instead of _s. The _s was copied from float-arith.ll

…lvm#102444)" This reverts commit 2eeeff8. See the post commit discussion in llvm@2eeeff8

…lvm#106033) Fixes llvm#80235 When trying to overload a function within `extern "C"`, the diagnostic `functions that differ only in their return type cannot be overloaded` is given. This diagnostic is inappropriate because overloading is basically not allowed in the C language. However, if the redeclared function has the `((overloadable))` attribute, it should be diagnosed as `functions that differ only in their return type cannot be overloaded`. This patch uses `isExternC()` to provide an appropriate diagnostic during the diagnostic process. `isExternC()` updates the linkage information cache internally, so calling it before merging functions can cause clang to crash. An example is declaring `static void foo()` and `void foo()` within an `extern "C"` block. Therefore, I decided to call `isExternC()` after the compilation error is confirmed and select the diagnostic message. The diagnostic message is `conflicting types for 'func'` similar to the diagnostic in C, and `functions that differ only in their return type cannot be overloaded` if the `((overloadable))` attribute is given. Regression tests verify that the expected diagnostics are given when trying to overload functions within `extern "C"` and when the `((overloadable))` attribute is present. --------- Co-authored-by: Sirraide <aeternalmail@gmail.com>

…107039) The LegalizeDAG expansion will go through memory since i16 isn't a legal type. Avoid this by using FMV nodes. Similar to what we did for llvm#106886 for FNEG and FABS. Special care is needed to handle the Sign operand being a different type.

…106967)

…vm#106520)

There is a typo in an assertion that causes the instruction break-point test to be unresolved

Current RecMII calculation is bigger than it needs to be. The calculation was refined in this patch.

…6995) Since it's SiFive VCIX specific register, it's better to have a prefix so that it's more understandable.

…vm#106874) Compiler-rt can be built for Windows, and most parts of it work. Some parts only really work on x86/x86_64 (like address sanitizers), but the OS overall is supported.

…lpha package (llvm#102636)

ReadProcessMemory will not perform the read if part of the memory is unreadable, even though the API has a `number_of_bytes_read` argument. To make this work, I explicitly inspect the memory region being read and only read the accessible part.

…os/asin/atan and cosh/sinh/tanh libcalls (llvm#106844) Followup to llvm#106584 - ensure acos/asin/atan and cosh/sinh/tanh libcalls correctly map to the llvm intrinsic equivalents

Instead of doing this ourselves, just rely on printing the APValue.

Bazel builds currently fail with `Failed to fetch blobs because they do not exist remotely.`. These extra bazel flags hopefully fix it.

Previously we tracked data sharing attributes by the symbol itself not by the ultimate symbol. When the private clause came first, subsequent uses of the symbol found a host-associated version instead of the ultimate symbol and so the check didn't consider them to be the same symbol. Always adding and checking for the ultimate symbol ensures that we have the same behaviour no matter the order of clauses. The modified list is only used for this multiple clause check. Closes llvm#78235

…des do not generate poison

Currently, the testing infrastructure for SPIR-V is based on FileCheck. Those tests are great to check some level of codegen, but when the test needs to check both the CFG layout and the content of each basic block, things become messy. - Because the CHECK/CHECK-DAG/CHECK-NEXT state is limited, it is sometimes hard to catch the right block: if 2 basic blocks have similar instructions, FileCheck can match the wrong one. - Cross-lane interaction can be a bit difficult to understand, and writing a FileCheck test that is strong enough to catch bad CFG transforms while not being broken every time some unrelated codegen part changes is hard. And lastly, the spirv-val tooling we have checks that the generated SPIR-V respects the spec, not that it is correct with regard to the source IR. For those reasons, I believe the best way to test the structurizer is to: - run spirv-val to make sure the CFG respects the spec. - simulate the function to validate the result for each lane, making sure the generated code is correct. This simulator has no dependencies other than core Python. It also only supports a very limited set of instructions, as we can test most features through control flow and some basic cross-lane interactions. As-is, the added tests are just a harness for the simulator itself. If this gets merged, the structurizer PR will benefit from this, as I'll be able to add extensive testing using it. --------- Signed-off-by: Nathan Gauër <brioche@google.com>

…06960) Move the AMDGPU target specific testcases in MachineVerifier separately into new directory. Reference : llvm#105494 (comment)

…C) (llvm#105509) Overview of changes: - All memref input arguments are re-named to %mem. - All vector input arguments are re-named to %vec. - All index input arguments are re-named to %idx. - All tensor input arguments are re-named to %src/%dst. - LIT variables were updated to be consistent with input arguments. - Renamed all output arguments as %res. - Removed unused argument in `transfer_write_broadcast_unit_dim`. - Unified indentation of `FileCheck` commands. - Split `transfer_write_permutations` and `transfer_write_broadcast_unit_dim` into tensor and memref variants. - Renamed `transfer_write_permutations_tensor` as `transfer_write_permutations_tensor_masked`.

Avoid most of the code being optimized away as a result of optimization improvements.

Reverts llvm#104020 Looks like it caused build failures.

This replaces `BENCHMARK_TEMPLATE` with `BENCHMARK` and uses `BENCHMARK_MAIN()` when possible.

)

@slydiman

This prevents the callback function from being called in a busy loop. Discovered by @slydiman on llvm#106955.

The existing function already used the MainLoop class, which allows one to wait on multiple events at once. It needed to do this in order to wait for v4 and v6 connections simultaneously. However, since it was creating its own instance of MainLoop, this meant that it was impossible to multiplex these sockets with anything else. This patch simply adds a version of this function which uses an externally provided main loop instance, which allows the caller to add any events it deems necessary. The previous function becomes a very thin wrapper over the new one.

This patch adds a common lower action for `G_FABS`, which generates `and x8, x8, #0x7fffffffffffffff` to clear the sign bit. The action does not support vectors since `G_AND` does not support fp128. This approach is different from what SDAG does. SDAG stores the value onto the stack, clears the sign bit in the most significant byte, and loads the value back into a register. This involves multiple memory ops and is likely slower.
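The bit-level idea behind the lowering can be sketched in plain C++. This is an illustration of clearing bit 63 of the high 64-bit half, not the GlobalISel code itself; `F128Halves`, `fabsHalves`, and `fabsViaMask` are hypothetical names, and modelling an fp128 as a low/high pair with the sign in the high half (presumably the half held in `x8` above) is an assumption. A `double` version is included so the result can be checked with ordinary floating-point code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical view of an fp128 as two 64-bit halves; the IEEE sign bit is
// the top bit of the high half.
struct F128Halves {
  uint64_t Lo;
  uint64_t Hi;
};

constexpr uint64_t SignClearMask = 0x7fffffffffffffffULL;

// fabs on the integer view: one AND on the high half, the low half untouched.
F128Halves fabsHalves(F128Halves V) {
  V.Hi &= SignClearMask;
  return V;
}

// The same mask applied to a double, whose sign is also bit 63.
double fabsViaMask(double X) {
  uint64_t Bits;
  std::memcpy(&Bits, &X, sizeof Bits);
  Bits &= SignClearMask;
  std::memcpy(&X, &Bits, sizeof Bits);
  return X;
}
```

Because only the sign bit changes, the mask also maps -0.0 to +0.0 and leaves NaN payloads intact, with no trip through memory.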

…llvm#107003) Assigning to a pointer parameter does not leak the stack address because it stays within the function and is not shared with the caller. Previous implementation reported any association of a pointer parameter with a local address, which is too broad. This fix enforces that the pointer to a stack variable is related by at least one level of indirection. CPP-5642 Fixes llvm#106834
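The distinction drawn above, reassigning the pointer parameter versus storing through it, can be shown with two small functions. Both function names are hypothetical and only illustrate the two shapes of code; they are not taken from the patch's tests.

```cpp
#include <cassert>

// Reassigning the parameter itself: the local's address never leaves the
// function, because the caller's pointer is a separate copy.
void reassignParam(int *P) {
  int Local = 0;
  P = &Local;
  *P = 1;
}

// Storing through one level of indirection: after return, the caller's
// pointer refers to a dead stack slot. This is the case that should still
// be reported.
void storeThroughParam(int **Out) {
  int Local = 0;
  *Out = &Local;
}
```

After `reassignParam(P)`, the caller's `P` still points at its original target, which is why a report there would be a false positive.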

So we don't have to retrieve them from the InterpFrame, which is slow.

Avoid optimizing away most of the code if we resolve this to a specific value.

It appears that the RUNTIMES build prefers the x86-64-unknown-linux-gnu triple notation for the host. This fixes runtime / test breakages when compiler-rt is used as the CLANG_DEFAULT_RTLIB.

…llvm#102747) As detailed in Issue llvm#101667, two `profile` tests `FAIL` on 32-bit SPARC, both Linux/sparc64 and Solaris/sparcv9 (where the tests work when enabled): ``` Profile-sparc :: ContinuousSyncMode/runtime-counter-relocation.c Profile-sparc :: ContinuousSyncMode/set-file-object.c ``` The Solaris linker provides the crucial clue as to what's wrong: ``` ld: warning: symbol '__llvm_profile_counter_bias' has differing sizes: (file runtime-counter-relocation-17ff25.o value=0x8; file libclang_rt.profile-sparc.a(InstrProfilingFile.c.o) value=0x4); runtime-counter-relocation-17ff25.o definition taken ``` In fact, the types in `llvm` and `compiler-rt` differ: - `__llvm_profile_counter_bias`/`INSTR_PROF_PROFILE_COUNTER_BIAS_VAR` is created in `llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp` (`InstrLowerer::getCounterAddress`) as `int64_t`, while `compiler-rt/lib/profile/InstrProfilingFile.c` uses `intptr_t`. While this doesn't matter in the 64-bit case, the type sizes differ for 32-bit. - `__llvm_profile_bitmap_bias`/`INSTR_PROF_PROFILE_BITMAP_BIAS_VAR` has the same issue: created in `InstrProfiling.cpp` (`InstrLowerer::getBitmapAddress`) as `int64_t`, while `InstrProfilingFile.c` again uses `intptr_t`. This patch changes the `compiler-rt` types to match `llvm`. At the same time, the affected testcases are enabled on Solaris, too, where they now just `PASS`. Tested on `sparc64-unknown-linux-gnu`, `sparcv9-sun-solaris2.11`, `x86_64-pc-linux-gnu`, and `amd64-pc-solaris2.11`.

…questing its size. Only some instructions should be considered as potentially reducing the size of the operand types, not all instructions. Fixes llvm#107036

Don't just leave the result as unknown. I think this currently works out thanks to undef resolution, but the correct thing to do is set it to overdefined explicitly.

collectInstsToScalarize may decide to scalarize a call. If so, we have to update the widening decision for the call, otherwise the call won't be scalarized as expected during VPlan construction. This issue was uncovered by f82543d509.

Need to check that a whole number of registers is being vectorized before actually trying to build the node, to avoid a compiler crash.

…106212)

…7014) Functionally, this change affects only our printed stack traces. The new version does not expose any internal rtsan inner workings.

/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:10310:26: error: unused variable 'isExtractSubvectorMask' [-Werror,-Wunused-variable] bool isExtractSubvectorMask = ^ 1 error generated.

Before llvm20, (void)__sync_fetch_and_add(...) always generates locked xadd insns. In linux kernel upstream discussion [1], it is found that for arm64 architecture, the original semantics of (void)__sync_fetch_and_add(...), i.e., __atomic_fetch_add(...), is preferred in order for jit to emit proper native barrier insns. In llvm commits [2] and [3], (void)__sync_fetch_and_add(...) will generate the following insns: - for cpu v1/v2: locked xadd insns to keep backward compatibility - for cpu v3/v4: __atomic_fetch_add() insns To ensure proper barrier semantics for (void)__sync_fetch_and_add(...), cpu v3/v4 is recommended. This patch enables cpu=v3 as the default cpu version. For users wanting to use cpu v1, -mcpu=v1 needs to be explicitly added to clang/llc command line. [1] https://lore.kernel.org/bpf/ZqqiQQWRnz7H93Hc@google.com/T/#mb68d67bc8f39e35a0c3db52468b9de59b79f021f [2] llvm#101428 [3] llvm#106494

They are quite long and not templated.

…lvm#106702) This patch moves to the new style of writing patterns for matching opcodes and thus deprecates using wip_match_opcode. It moves G_FCONSTANT, G_ICMP, G_STORE, and G_OR.

Use IRBuilder when creating the new invariant instruction, so that the constant-folder has an opportunity to constant-fold the new Instruction that we desire to create.

…lvm#106769) Nothing went wrong in this case, we just successfully matched a module by identifier. No need to print to std::error like we would for something that should be user-visible. Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

This PR: llvm#106995 names the vendor CSR in a wrong way, it should be `sf.` rather than `sf_` for prefix.

…ng mode if RNE. (llvm#106948) The rounding mode has no effect on the instruction behavior. Using RNE matches what we do for fcvt.s.h, fcvt.d.f, and fcvt.d.h, which are similarly not affected by the rounding mode. This appears to match the behavior of binutils. According to Compiler Explorer, objdump is unable to disassemble fcvt.s.bf16 with a non-zero rounding mode.

This patch deprecates DenseMap::getOrInsertDefault in favor of DenseMap::operator[], which does the same thing, has been around longer, and is also a household name as part of std::map and std::unordered_map. Note that DenseMap provides several equivalent ways to insert or default-construct a key-value pair: - operator[Key] - try_emplace(Key).first->second - getOrInsertDefault(Key) - FindAndConstruct(Key).second
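Two of the equivalent spellings listed above also exist on the standard containers, which makes the equivalence easy to check without building LLVM. The sketch below uses `std::unordered_map` as a stand-in for `DenseMap`; `getOrInsertDefault` and `FindAndConstruct` have no standard counterpart, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// operator[]: inserts a value-initialized entry if the key is absent.
int &viaSubscript(std::unordered_map<std::string, int> &M, const std::string &K) {
  return M[K];
}

// try_emplace(Key).first->second: same effect, spelled through the
// returned (iterator, inserted) pair.
int &viaTryEmplace(std::unordered_map<std::string, int> &M, const std::string &K) {
  return M.try_emplace(K).first->second;
}
```

Both helpers return a reference to the same slot and never overwrite an existing value, which is why keeping only `operator[]` loses nothing.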

…cks (llvm#107059) rdar://135044923

llvm@26bf0b4

* Move the setup_host_tool calls to the directories of their tool. Although it works to call it in libclc, it can only appear in a single location so it fails the "what if everyone did this?" test and causes problems for downstream code that also wants to use native versions of these tools from other projects. * Correct the TARGET "${${tool}_target}" check. "${${tool}_target}" may be set to the path to the executable, which works in dependencies but cannot be tested using if(TARGET). For lack of a better alternative, just check that "${${tool}_target}" is non-empty and trust that if it is, it is set to a meaningful value. If somehow it turns out to be a valid target, its value will still show up in error messages anyway. * Account for llvm-spirv possibly being provided in-tree. Per https://github.com/KhronosGroup/SPIRV-LLVM-Translator?tab=readme-ov-file#llvm-in-tree-build it is possible to drop llvm-spirv into LLVM and have it built as part of LLVM's build. In this configuration, cross builds of LLVM require a native version of llvm-spirv to be built.

Trivially extend hoistBOAssociation to also handle the BinaryOperator Mul. Alive2 proofs: https://alive2.llvm.org/ce/z/zjtR5g
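A source-level analogue of the reassociation is sketched below, assuming the rewrite has the shape `(LV * C1) * C2` to `LV * (C1 * C2)` with `C1` and `C2` loop-invariant (the Alive2 link is the authoritative statement of what is proven). The function names are hypothetical. Unsigned multiplication wraps modulo 2^32 and is associative, so both loops compute the same value while the second keeps only one multiply in the body.

```cpp
#include <cassert>
#include <cstdint>

// Before: two multiplies per iteration.
uint32_t sumBefore(uint32_t N, uint32_t C1, uint32_t C2) {
  uint32_t Sum = 0;
  for (uint32_t I = 0; I < N; ++I)
    Sum += (I * C1) * C2;
  return Sum;
}

// After: C1 * C2 is loop-invariant, so it is computed once outside the loop.
uint32_t sumAfter(uint32_t N, uint32_t C1, uint32_t C2) {
  const uint32_t Hoisted = C1 * C2;
  uint32_t Sum = 0;
  for (uint32_t I = 0; I < N; ++I)
    Sum += I * Hoisted;
  return Sum;
}
```

For example, with `N = 4`, `C1 = 2`, `C2 = 3`, both return `(0 + 1 + 2 + 3) * 6 = 36`.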

) This has been required by `lld/test/ELF/zsectionheader.s` since it was added in 5d972c5.

Bazel builds currently fail with `Failed to fetch blobs because they do not exist remotely.`. Set a cache-silo-key to start a new cache.

…06770) This is a follow up to 924907b, and is mostly motivated by consistency but does include one additional optimization. In general, we prefer 0.0 over -0.0 as the identity value for an fadd. We use that value in several places, but don't in others. So, let's be consistent and use the same identity (when nsz allows) everywhere. This creates a bunch of test churn, but due to 924907b, most of that churn doesn't actually indicate a change in codegen. The exception is that this change enables the use of 0.0 for nsz, but *not* reasoc, fadd reductions. Or said differently, it allows the neutral value of an ordered fadd reduction to be 0.0.
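Why 0.0 is only a valid fadd identity under `nsz` can be seen with a scalar model of an ordered reduction. `faddReduce` is a hypothetical helper, and default IEEE round-to-nearest is assumed (no fast-math). With `-0.0` as the start value, a reduction over negative zeros keeps its sign. With `0.0`, it becomes `+0.0`. That difference is exactly what `nsz` allows the compiler to ignore.

```cpp
#include <cassert>
#include <cmath>

// Ordered (in-order) fadd reduction with an explicit start value.
double faddReduce(const double *Vals, int N, double Start) {
  double Acc = Start;
  for (int I = 0; I < N; ++I)
    Acc += Vals[I];
  return Acc;
}
```

For any input that does not produce a zero result, the two start values give identical results. Only the sign of a zero result can differ.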

- After 'RemoveLoadsIntoFakeUses' is enabled to support llvm.fake.use

The effect is the same, but this version doesn't take as long to evaluate.

These recurrence types don't have a meaningful identity, and the routine was abused to return the start value instead. Out of the three callers to this routine, only one actually wants this behavior. This is a prep change for removing the routine entirely and commoning it with other copies of the same logic.

This PR is adding support for `fp8` and `bfp8` on gfx12

…ing instructions (llvm#106966) This PR improves correctness of emitted MIR between passes for branching instructions and thus increases the number of passing tests when expensive checks are on. Specifically, we address here such issues with the machine verifier as: * fix switch generation: generate correct successors and undo the "address taken" status to reflect that a successor doesn't actually correspond to an IR-level basic block; * fix incorrect definition of OpBranch and OpBranchConditional in TableGen (SPIRVInstrInfo.td) to set isBarrier status properly and set a correct type of virtual registers; * fix a case when Phi refers to a type definition that goes after the Phi instruction, so that the virtual register definition of the type doesn't dominate all uses. This PR decreases the number of failing tests under expensive checks from 56 to 50.

…Shader_DebugInfo_100 are not mixed up with other OpExtInst instructions (llvm#107007) This PR is to ensure that OpExtInst instructions generated by NonSemantic_Shader_DebugInfo_100 are not mixed up with other OpExtInst instructions. Original implementation (llvm#97558) has introduced an issue by moving OpExtInst instruction with the 3rd operand equal to DebugSource (value 35) or DebugCompilationUnit (value 1) even if OpExtInst is not generated by NonSemantic_Shader_DebugInfo_100 implementation code. The reproducer is attached as a new test case. The code of the test case reproduces the issue, because "lgamma" has the same code (35) inside OpenCL_std as DebugSource inside NonSemantic_Shader_DebugInfo_100.

Track it as an operand swap + a `setShuffleMask` and delegate to the `llvm::ShuffleVectorInst` implementation.

Make into enum class. Output really should be InputOutput since it also verifies the input IR.

…lvm#107009) std::is_virtual_base_of was implemented in llvm#105847

…vm#106603) Not every toolchain provides or wants to use libatomic, which is a part of GCC; some toolchains may opt into using the compiler-rt atomic library.

This patch adds the remaining ConstantInt:: functions and it also implements the IntegerType class.

…lines (llvm#106790) Remove flag that turns on the PGOForceFunctionAttrs pass and always add it to default pipelines when using PGO. This is NFC by default since PGOOpt->ColdOptType is by default ColdFuncOpt::Default. Remove -O2 RUN line in basic.ll since we now have the pipeline tests.

This also disables the use of `__datasizeof`, since it's currently broken for empty types.

…rs analysis" This reverts commit b74e09c after post-commit review. The number of parts is calculated incorrectly.

…ster) number of elements in operands." This reverts commit a3ea90f after the post commit review. The number of parts is calculated incorrectly.

I'm planning to deprecate and eventually remove DenseMap::FindAndConstruct in favor of operator[].

This ensures forward compatibility, where old BOLT versions can consume the profile created by newer versions with extra keys. Test Plan: added yaml-unknown-keys.test

…lvm#107055) We were incorrectly not deduplicating results when looking up `_` which, for a lambda init capture, would result in an ambiguous lookup. The same bug caused some diagnostic notes to be emitted twice. Fixes llvm#107024

Analogous to 2c7786e, clean up a case where the vectorizer is emitting a non-canonical identity value given the available flags. We use the largest/smallest value during ISEL and VP expansion, but not during vectorization. Since the fmin/fmax/fminimum/fmaximum intrinsics don't require a start value, this difference is only visible when masking of inactive lanes is required. The primary motivation of this change is simply to remove a difference between versions of code which reason about the identity value of a reduction, so I can kill all but one off. In review, it was pointed out that this is actually a functional fix as well. The old code used inf on a noinf reduction instruction, whose result is poison! That wasn't the intent of the code.
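A scalar sketch of the start-value choice for min/max reductions follows. The helper names are hypothetical, and treating "smallest value" for fmax as the most negative finite `double` is an assumption. Without `ninf`, ±infinity is the natural neutral element. With `ninf`, an infinite operand would make the result poison, so the largest (or most negative) finite value is used instead and is still neutral for every finite input.

```cpp
#include <cassert>
#include <cmath>
#include <initializer_list>
#include <limits>

// Start value for an fmin reduction: +inf normally, largest finite with ninf.
double fminIdentity(bool NoInfs) {
  return NoInfs ? std::numeric_limits<double>::max()
                : std::numeric_limits<double>::infinity();
}

// Start value for an fmax reduction: -inf normally, most negative finite with ninf.
double fmaxIdentity(bool NoInfs) {
  return NoInfs ? std::numeric_limits<double>::lowest()
                : -std::numeric_limits<double>::infinity();
}
```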

…vm#106875) Compiler-rt does support Windows just fine, even if outdated docs pages didn't list it as one of the supported OSes; this is being rectified in llvm#106874. MinGW is another environment configuration on Windows, where compiler-rt or libgcc is linked in automatically, so there's no issue with having such builtins functions available. For MSVC style environments, compiler-rt builtins do work just fine, but Clang doesn't automatically link them in. See e.g. https://discourse.llvm.org/t/improve-autolinking-of-compiler-rt-and-libc-on-windows-with-lld-link/71392 for a discussion on how to improve this situation. But none of that means that compiler-rt itself doesn't support Windows.

…Z7 (llvm#106686) This fixes a regression from f58330c. That commit changed the clang-cl options /Zi and /Z7 to be implemented as aliases of -g rather than having separate handling. This had the unintended effect that, when assembling .s files with clang-cl, the /Z7 option (which implies using CodeView debug info) was treated as a -g option, which causes `ClangAs::ConstructJob` to pick up the option as part of `Args.getLastArg(options::OPT_g_Group)`, which sets the `WantDebug` variable. Within `Clang::ConstructJob`, we check for whether explicit `-gdwarf` or `-gcodeview` options have been set, and if not, we pick the default debug format for the current toolchain. However, in `ClangAs`, if debug info has been enabled, it always adds DWARF debug info. Add similar logic in `ClangAs`: check if the user has explicitly requested either DWARF or CodeView, otherwise look up the toolchain default. If we (either implicitly or explicitly) should be producing CodeView, don't enable the default `ClangAs` DWARF generation. This fixes the issue where assembling a single `.s` file with clang-cl, with the /Z7 option, causes the file to contain some DWARF sections. This causes the output executable to contain DWARF, in addition to the separate intended main PDB file. By having the output executable contain DWARF sections, LLDB only looks at the (very little) DWARF info in the executable, rather than looking for a separate standalone PDB file. This caused an issue with LLDB's tests, llvm#101710.

Similarly to dd94537, setVectorizedCallDecision also did not consider ForcedScalars. This led to VPlans not reflecting the decision by the legacy cost model (cost computation would use scalar cost, VPlan would have VPWidenCallRecipe). To fix this, check if the call has been forced to scalar in setVectorizedCallDecision. Note that this requires moving setVectorizedCallDecision after collectLoopUniforms (which sets ForcedScalars). collectLoopUniforms does not depend on call decisions and can safely be moved. Fixes llvm#107051.

Separate the path, which may begin with e.g. /Users, with "--" from the other options, to make it clear that it is a path, not an option. This fixes a test from fcb7b39.

This avoids having isel patterns that emit two instructions. It also allows us to remove sext.w and slli+srli pairs by using fcvt.s.w(u) on RV64.

…vm#105855) Apple's API_AVAILABLE macro has its own notion of platform names which are supported by \_\_API_AVAILABLE_PLATFORM_<name> macros. They don't follow a consistent naming convention, but there's at least one that matches a valid availability attribute platform name. Instead of lowercasing the source spelling name, search for a defined macro and use that in the fix-it.

In function spillFPBP we already try to skip terminator, but there is a logic error, so when there is only terminator instruction in the MBB, it still tries to save/restore fp/bp around it if the terminator clobbers fp/bp, for example a tail call with ghc calling convention. Now this patch really skips terminator even if it is the only instruction in the MBB.

Use an explicit MSVC triple with an architecture that does have proper handling for MSVC style targets. This fixes a test from fcb7b39.

`MaxReleasedCachePages` has been set to 4. Initially, in llvm#105009 , we set `MaxReleasedCachePages` to 0 so that the partial chunk heuristic could be introduced incrementally as we observed its impact on retrieval order and more generally, performance. Co-authored-by: Joshua Baehring <josh.baehring@yale.edu>

Adjust register binding diagnostic flags code in a couple of ways: - Store the resource class in the Flags struct to avoid duplicated scanning for HLSLResourceClassAttribute - Avoid unnecessary indirection when converting resource class to register type - Remove recursion and reduce duplicated code Also fixes a case where struct with an array was incorrectly diagnosed unfit for `c` register binding. This will also simplify work that is needed to be done in this area for llvm#104861.

) ALLOCATE/DEALLOCATE statements for module allocatable variable with the pinned attribute can be lowered to the standard runtime call and do not need further action since these variables will have a unique descriptor that is on the host.

This patch fixes: clang/lib/Sema/SemaHLSL.cpp:838:12: error: unused variable 'TheVarDecl' [-Werror,-Wunused-variable] clang/lib/Sema/SemaHLSL.cpp:840:19: error: unused variable 'CBufferOrTBuffer' [-Werror,-Wunused-variable]

Now that more parts of LLDB know about SupportFiles, avoid going through FileSpec (and losing the Checksum in the process). Instead, use the SupportFile directly.

…vm#107143) This patch replaces the find-try_emplace sequence with just one call to try_emplace, thereby avoiding two successive hash lookups on the same key. I am not using the "inserted" boolean from try_emplace to preserve the original behavior (that is, before PR 107123) that checks to see if the value is nullptr or not.
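The single-lookup pattern can be sketched with `std::unordered_map`, whose `try_emplace` has the same shape. `Node` and `getOrCreate` are hypothetical. As in the description, the `inserted` flag is deliberately ignored and the slot is filled whenever it is still null, so a pre-existing null entry is also populated.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>

struct Node {
  int Key;
};

// One hash lookup: try_emplace either finds the existing slot or inserts a
// null unique_ptr; the null check then decides whether to create the object.
Node &getOrCreate(std::unordered_map<int, std::unique_ptr<Node>> &Map, int Key) {
  std::unique_ptr<Node> &Slot = Map.try_emplace(Key).first->second;
  if (!Slot)
    Slot = std::make_unique<Node>(Node{Key});
  return *Slot;
}
```

Checking `inserted` instead would skip the existing-but-null case, which is the behavior difference the patch avoids.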

….inc. NFC

Avoid creating an uncacheable conf variable by using a string instead of a function reference. Also has the effect of avoiding triggering the "config.cache" sphinx warning. Requires myst_parser 0.19.0 (specifically executablebooks/MyST-Parser#696) which is over a year old by now. Do we mandate any minimum version for these dependencies?

…s that emit two instructions. (llvm#107011) All of the test changes are because integer type legalization prefers to promote fp_to_uint to fp_to_sint if neither is "Legal".

Bump the lldb-dap version number so that we can publish an updated version in the Visual Studio Marketplace.

…ions reduced values Need to correctly track reduced values with multiple uses in the same reduction emission attempt. Otherwise, the number of the reuses might be calculated incorrectly, and may cause compiler crash. Fixes llvm#107037

Add three more special cases for loading registers with immediates. The first allows values in the range of [-255, 255] to be loaded with MOVEQ, even if the register is more than 8 bits and the sign extension is unwanted. This is done by loading the bitwise complement of the desired value, then performing a NOT instruction on the loaded register. This special case is only used when a simple MOVEQ cannot be used, and is only used for 32 bit data registers. Address registers cannot support MOVEQ, and the two-instruction sequence is no faster or smaller than a plain MOVE instruction when loading 16 bit immediates on the 68000, and likely slower for more sophisticated microarchitectures. However, the instruction sequence is both smaller and faster than the corresponding MOVE instruction for 32 bit register widths. The second special case is for zeroing address registers. This simply expands to subtracting a register from itself, consuming one instruction word rather than 2-3, with a small improvement in speed as well. The last special case is for assigning sign-extended 16-bit values to a full address register. This takes advantage of the fact that the movea.w instruction sign extends the output, permitting the immediate to be smaller. This is similar to using lea with a 16-bit address, which is not added in this patch as 16-bit absolute addressing is not yet implemented. This is a v2 submission of llvm#90817. It also creates a 'Data' test directory to better align with the backend's tablegen layout.
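The MOVEQ + NOT trick can be checked with a small register model. The description does not state the operand size of the NOT. This sketch assumes `NOT.B` (complementing only bits 0..7), which is the width for which the trick reaches the stated range. All function names are hypothetical. MOVEQ sign-extends the complemented low byte, which sets bits 8..31 to the opposite of bit 7 of the target value. For values outside [-128, 127] in this range, that is exactly the upper part the target needs. NOT.B then restores the low byte.

```cpp
#include <cassert>
#include <cstdint>

// MOVEQ: sign-extend an 8-bit immediate into a 32-bit data register.
uint32_t moveq(int8_t Imm) {
  return static_cast<uint32_t>(static_cast<int32_t>(Imm));
}

// NOT.B (assumed width): complement bits 0..7, keep bits 8..31.
uint32_t notB(uint32_t Reg) { return Reg ^ 0xFFu; }

// True when plain MOVEQ cannot produce V but the two-instruction form can.
bool useMoveqNot(int32_t V) {
  bool FitsMoveq = V >= -128 && V <= 127;
  return !FitsMoveq && V >= -255 && V <= 255;
}

// Immediate for the MOVEQ: the complement of V's low byte, as a signed byte.
int8_t moveqNotImm(int32_t V) {
  return static_cast<int8_t>(static_cast<uint8_t>(~static_cast<uint32_t>(V)));
}

// movea.w sign-extends a 16-bit immediate into a 32-bit address register.
bool fitsMoveaW(int32_t V) { return V >= INT16_MIN && V <= INT16_MAX; }
```

For example, 200 is loaded as `moveq #55` (0x37) followed by a NOT.B, giving 0xC8.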

Select only needs branches and moves, so we don't need to promote it. Promoting would canonicalize NaNs, which select shouldn't do.

The only instance where we weren't already passing a `StringRef` with a known length to `Symbol`'s constructor is where the argument is a string literal. Even in that case, lazy `strlen` calls don't make sense, as the compiler can constant-evaluate the `StringRef(const char*)` constructor. For symbols that go into the symbol table we need the length when calculating the hash anyway. We could get away with not calling `getName()` for local symbols, but the total contribution of `strlen` to the run time is already below 1%, so that would just complicate the code for a negligible benefit.
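A minimal sketch of the eager-length approach, using `std::string_view` as a stand-in for `llvm::StringRef`; the `Symbol` class below is hypothetical and much simpler than the real one. Storing the length at construction means `getName()` never calls `strlen`, and for a string literal the length is computed at compile time because `std::string_view`'s `const char*` constructor is `constexpr`.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical symbol type that stores its name's length eagerly instead of
// recomputing it with strlen whenever the name is requested.
class Symbol {
  const char *Data;
  std::size_t Size;

public:
  constexpr explicit Symbol(std::string_view Name)
      : Data(Name.data()), Size(Name.size()) {}
  constexpr std::string_view getName() const { return {Data, Size}; }
};

// Constructed from a string literal: the length is a compile-time constant.
constexpr Symbol MainSym("main");
static_assert(MainSym.getName().size() == 4, "length known at compile time");
```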

Add an overload of `InlineFunction` that updates the contextual profile. If there is no contextual profile, this overload is equivalent to the non-contextual profile variant.

Post-inlining, the update mainly consists of:

- making the PGO instrumentation of the callee "the caller's": the owner function (the "name" parameter of the instrumentation instructions) becomes the caller, and new index values are allocated for each of the callee's indices (this happens for both increment and callsite instrumentation instructions)
- in the contextual profile:
  - each context corresponding to the caller has its counters updated to incorporate the counters inherited from the callee at the inlined callsite. Counter values are copied as-is because no scaling is required since the profile is contextual.
  - the contexts of the callee (at the inlined callsite) are moved to the caller.
  - the callee context at the inlined callsite is deleted.
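The profile-update steps above can be sketched on a toy data structure. Everything here is hypothetical and simplified: `ContextNode` is not LLVM's contextual-profile class, each callsite holds at most one callee context, and the renumbering scheme (callee counter `i` becomes caller counter `old size + i`, callee callsite `j` becomes caller callsite `CallerNumCallsites + j`) is an assumption chosen to illustrate "new index values are allocated for each of the callee's indices".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Hypothetical, simplified context node: one counter vector per context and
// at most one callee context per callsite.
struct ContextNode {
  std::vector<uint64_t> Counters;                              // by counter id
  std::map<uint32_t, std::unique_ptr<ContextNode>> Callsites;  // callsite id -> callee
};

// Update one caller context after inlining the call at InlinedCallsite.
void updateCallerContext(ContextNode &Caller, uint32_t InlinedCallsite,
                         uint32_t CallerNumCallsites,
                         std::size_t CalleeNumCounters) {
  auto It = Caller.Callsites.find(InlinedCallsite);
  if (It == Caller.Callsites.end()) {
    // No callee context in this caller context: the new counters start at 0.
    Caller.Counters.resize(Caller.Counters.size() + CalleeNumCounters, 0);
    return;
  }
  std::unique_ptr<ContextNode> Callee = std::move(It->second);
  Caller.Callsites.erase(It);  // the callee context at this callsite is deleted
  // Counter values are appended as-is: no scaling, since the profile is contextual.
  Caller.Counters.insert(Caller.Counters.end(), Callee->Counters.begin(),
                         Callee->Counters.end());
  // The callee's own callee contexts move to the caller under new callsite ids.
  for (auto &[J, Node] : Callee->Callsites)
    Caller.Callsites[CallerNumCallsites + J] = std::move(Node);
}
```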

…ctorizations reduced values" This reverts commit 98bb354 to fix buildbots https://lab.llvm.org/buildbot/#/builders/155/builds/2056 and https://lab.llvm.org/buildbot/#/builders/11/builds/4407

Uses a static lock to ensure multiple threads reporting issues at the same time don't have printing collisions. This isn't so important now, but will be with continue mode in the future.
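A minimal sketch of the static-lock pattern. `std::mutex` is only a stand-in for whatever lock type the runtime actually uses, and the string sink replaces real output to stderr; `reportIssue` and `ReportLog` are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical report sink; a real runtime would write to stderr.
std::string ReportLog;

// Every thread shares one function-local static mutex, so a multi-line
// report is emitted as a unit and cannot interleave with another report.
void reportIssue(const std::string &Summary, const std::string &Stack) {
  static std::mutex ReportMutex;
  std::lock_guard<std::mutex> Lock(ReportMutex);
  ReportLog += "issue: " + Summary + "\n";
  ReportLog += Stack;
}
```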

`memory read` will return an error if you try to read more than 1k bytes in a single command, instructing you to set `target.max-memory-read-size` or use `--force` if you intended to read more than that. This is a safeguard for a command where people are being explicit about how much memory they would like lldb to read (either to display, or save to a file) and is an annoyance every time you need to read more than a small amount. If someone confuses the --count argument with the start address, lldb may begin dumping gigabytes of data but I'd rather that behavior than requiring everyone to special-case their way around a common use case. I don't want to remove the setting because many people have added (much larger) default max read sizes to their ~/.lldbinit files after hitting this behavior. Another option would be to stop reading/using the value in Target.cpp, but I see no harm in leaving the setting if someone really does prefer to have a small cap on their memory read size.

Despite the stale comments, none of these actually use TTI, and they're solely generating standard LLVM IR.

…ion) (llvm#107131) [CWG2486](https://cplusplus.github.io/CWG/issues/2486.html) "Call to `noexcept` function via `noexcept(false)` pointer/lvalue" allows `noexcept` functions to be called via `noexcept(false)` pointers or lvalues. There appears to be no implementation divergence whatsoever: https://godbolt.org/z/3afTfeEM8. That said, in C++14 and earlier we do not issue all the diagnostics we issue in C++17 and newer, so I'm specifying the status of the issue accordingly.
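The situation CWG2486 covers looks like this in C++17 and later, where `noexcept` is part of the function type. A pointer or reference to a `noexcept` function implicitly converts to the potentially-throwing type (the reverse conversion is ill-formed), and calling through it is valid. The names are illustrative.

```cpp
#include <cassert>

int CallCount = 0;

void bump() noexcept { ++CallCount; }

// Function pointer conversion: noexcept function -> noexcept(false) pointer.
void (*MayThrowPtr)() = bump;
// The same conversion allows binding a noexcept(false) lvalue reference.
void (&MayThrowRef)() = bump;

void callThroughBoth() {
  MayThrowPtr();  // call to a noexcept function via a noexcept(false) pointer
  MayThrowRef();  // ... and via a noexcept(false) lvalue
}
```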

Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965

This patch implements sandboxir::ConstantAggregate, ConstantStruct, ConstantArray and ConstantVector, mirroring LLVM IR.

…in/Zvfhmin+Zfbfmin/Zfhmin. (llvm#106637) Previously, if Zfbfmin/Zfhmin were enabled, we only handled build_vectors that could be turned into splat_vectors. We promoted them to f32 splats by extending in the scalar domain and narrowing in the vector domain. This patch fixes a crash where we failed to account for whether the f32 vector type fit in LMUL<=8. Because the new lowering occurs after type legalization, we have to be careful to use XLenVT for the scalar integer type and use custom cast nodes.
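The LMUL constraint can be illustrated with a short sketch for scalable vector types, assuming LLVM's convention of 64 bits per `vscale` block (so LMUL = elements × element bits / 64). Fixed-length vectors are first mapped to such a container type based on the minimum VLEN, a step omitted here, and `fitsInLMUL8` is a hypothetical helper. Promoting f16/bf16 elements to f32 doubles the element width and hence the LMUL, so a type at LMUL 8 has no legal f32 counterpart.

```cpp
#include <cassert>

// Bits per vscale block for RVV scalable types (assumed 64, as in LLVM).
constexpr unsigned RVVBitsPerBlock = 64;

// True if <vscale x MinNumElts x iEltBits> fits in a register group with
// LMUL <= 8 (fractional LMULs trivially fit).
constexpr bool fitsInLMUL8(unsigned MinNumElts, unsigned EltBits) {
  return MinNumElts * EltBits <= 8 * RVVBitsPerBlock;
}

static_assert(fitsInLMUL8(16, 16), "nxv16f16 is LMUL 4");
static_assert(fitsInLMUL8(16, 32), "nxv16f32 is LMUL 8");
static_assert(fitsInLMUL8(32, 16), "nxv32f16 is LMUL 8");
static_assert(!fitsInLMUL8(32, 32), "nxv32f32 would need LMUL 16");
```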

…m#107157) The `Kind` argument does not need to be passed separately.

… `outer_dims_perm` attribute (llvm#106687)

This patch covers Core issues about language linkage during declaration matching resolved in [P1787R6](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1787r6.html), namely [CWG563](https://cplusplus.github.io/CWG/issues/563.html) and [CWG1818](https://cplusplus.github.io/CWG/issues/1818.html).

[CWG563](https://cplusplus.github.io/CWG/issues/563.html) "Linkage specification for objects"
-----------

[P1787R6](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1787r6.html):
> [CWG563](https://cplusplus.github.io/CWG/issues/563.html) is resolved by simplifications that follow its suggestions.

Wording ([[dcl.link]/5](https://eel.is/c++draft/dcl.link#5)):
> In a [linkage-specification](https://eel.is/c++draft/dcl.link#nt:linkage-specification), the specified language linkage applies to the function types of all function declarators and to all functions and variables whose names have external linkage[.](https://eel.is/c++draft/dcl.link#5.sentence-5)

Now the wording clearly says that a linkage-specification applies to variables with external linkage.

[CWG1818](https://cplusplus.github.io/CWG/issues/1818.html) "Visibility and inherited language linkage"
------------

[P1787R6](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1787r6.html):
> [CWG386](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#386), [CWG1839](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1839), [CWG1818](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1818), [CWG2058](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#2058), [CWG1900](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1900), and Richard’s observation in [“are non-type names ignored in a class-head-name or enum-head-name?”](http://lists.isocpp.org/core/2017/01/1604.php) are resolved by describing the limited lookup that occurs for a declarator-id, including the changes in Richard’s [proposed resolution for CWG1839](http://wiki.edg.com/pub/Wg21cologne2019/CoreWorkingGroup/cwg1839.html) (which also resolves CWG1818 and what of CWG2058 was not resolved along with CWG2059) and rejecting the example from [CWG1477](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1477).

Wording ([[dcl.link]/6](https://eel.is/c++draft/dcl.link#6)):
> A redeclaration of an entity without a linkage specification inherits the language linkage of the entity and (if applicable) its type[.](https://eel.is/c++draft/dcl.link#6.sentence-2)

The answer to the question in the example is `extern "C"`, not a linkage mismatch. Further analysis of the example is provided as inline comments in the test itself. Note that https://eel.is/c++draft/dcl.link#7 does NOT apply in this example, as it's focused squarely on declarations that are already known to have C language linkage, and declarations of variables in the global scope.
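The two quoted rules can be illustrated with a few declarations. This is an illustration of the [dcl.link]/5 and /6 wording, not the example from the patch's test; the names are made up.

```cpp
#include <cassert>

// [dcl.link]/5: the linkage-specification applies to a variable whose name
// has external linkage, so this variable has C language linkage.
extern "C" int example_hits;

// [dcl.link]/6: a redeclaration without a linkage-specification inherits the
// entity's language linkage, so this definition is still extern "C".
int example_hits = 0;

extern "C" void example_hit();
void example_hit() { ++example_hits; }  // also inherits C language linkage
```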

These don't look like they've been used since the original 'use-diet' branch was merged in 2008 (f6caff6).

…5614) In this patch, we implement the `computeCost()` function in `VPWidenMemoryRecipe`.

…vm#105686) The lowering happens in post-legalizer lowering if any source registers from G_BUILD_VECTOR are not constants. Add a pattern fragment that treats `scalar_to_vector ($src)` as equivalent to `vector_insert (undef), ($src), (i64 0)`.

@jeliebig

…lvm#107041) This patch is provided by @jeliebig. Fixes llvm#107017.

…07021) Fixes llvm#106994.

Fixes llvm#105972. Co-authored-by: Qiu Chaofan <qcf@ecnelises.com>

…lvm#106094) Fix `RankedTensorType` equality check in unpack op canonicalization.

…7074) A macro definition needs its own scope stack in the annotator, so we add the MacroBodyScopes stack and use ScopeStack to refer to it when in the macro definition body. Also, we need to have a scope type for a child block because its parent line is parsed (and thus the scope type for the braces is popped off the scope stack) before the lines in the child block are. Fixes llvm#99271.
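A minimal sketch of keeping a separate scope stack for macro bodies. Only the idea of a `MacroBodyScopes` stack selected while inside a macro definition body comes from the description; the `Annotator` class, its methods, and the `ScopeType` values are hypothetical. Lines in a macro body push and pop their own stack, so they cannot disturb the scopes of the surrounding code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class ScopeType { Class, Function, ChildBlock };

// Hypothetical annotator state: one scope stack for regular lines and a
// separate one for lines inside a macro definition body.
class Annotator {
  std::vector<ScopeType> Scopes;
  std::vector<ScopeType> MacroBodyScopes;

public:
  // Selects the stack the current line should push to and pop from.
  std::vector<ScopeType> &scopeStack(bool InMacroBody) {
    return InMacroBody ? MacroBodyScopes : Scopes;
  }
  void openScope(bool InMacroBody, ScopeType Type) {
    scopeStack(InMacroBody).push_back(Type);
  }
  void closeScope(bool InMacroBody) {
    std::vector<ScopeType> &Stack = scopeStack(InMacroBody);
    if (!Stack.empty())
      Stack.pop_back();
  }
  std::size_t depth(bool InMacroBody) { return scopeStack(InMacroBody).size(); }
};
```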

…SPIRV (llvm#107110) This patch adds a type check for `tensor.extract` in TensorToSPIRV: only `tensor.extract` ops with a supported element type are converted. Fixes llvm#74466.

[AutoBump] Merge with f4b9839d (Sep 04) (20) #373

Commits on Aug 31, 2024

Commits on Sep 1, 2024

Commits on Sep 2, 2024

Commits on Sep 3, 2024

Commits on Sep 4, 2024

Commits on Sep 25, 2024
