[AutoBump] Merge with 83891777 (May 16) (48) #307

As mentioned in llvm#68882 and https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699 Gep arithmetic isn't consistent with different types. GVNSink didn't realize this and sank all geps as long as their operands can be wired via PHIs in a post-dominator. Fixes: llvm#85333 Reapply: llvm#88440 after fixing the non-determinism issues in llvm#90995

Adds the LLVM vector.deinterleave2 intrinsic to the MLIR LLVM dialect. The deinterleave2 intrinsic takes a vector and returns two vectors, the first holding the even-indexed elements and the second the odd-indexed elements of the input vector. It is the inverse of vector.interleave2.
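As a rough illustration of the element order described above (not the intrinsic's implementation), the split and its inverse can be sketched on plain `std::vector<int>`. The function names and the even-length requirement are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sketch only: returns (even-indexed elements, odd-indexed elements).
// Assumes an even element count so both halves have the same length.
std::pair<std::vector<int>, std::vector<int>>
deinterleave2(const std::vector<int> &input) {
  assert(input.size() % 2 == 0 && "input must have an even element count");
  std::vector<int> even, odd;
  for (std::size_t i = 0; i < input.size(); i += 2) {
    even.push_back(input[i]);
    odd.push_back(input[i + 1]);
  }
  return {even, odd};
}

// Sketch of the inverse: alternates elements from the two halves.
std::vector<int> interleave2(const std::vector<int> &even,
                             const std::vector<int> &odd) {
  assert(even.size() == odd.size() && "halves must have equal length");
  std::vector<int> out;
  for (std::size_t i = 0; i < even.size(); ++i) {
    out.push_back(even[i]);
    out.push_back(odd[i]);
  }
  return out;
}
```

Feeding the two results of `deinterleave2` back into `interleave2` reproduces the original vector.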

MCOperand has a constructor that permits a nullptr MCInst, and BOLT makes use of that. Adjust MCOperand's dumper to permit such use.

Remove 'Valid' local boolean that has a single use, and return directly instead.

…#91933) Given `foo...[idx]` if idx is value dependent, the expression is type dependent. Fixes llvm#91885 Fixes llvm#91884

Remove excess parentheses and use `boolean ? true-case : false-case` idiom.

When writing the test for this I seemingly forgot to put 'CHECK' on the lines, so I didn't notice that I was printing the identifiers as pointers rather than their names. This patch corrects the tests and the print behavior.

Closes llvm#91188

…92072) Fixes llvm#92062

…0448) This patch rewrites the ArmSME tile allocator to use liveness information to make better tile allocation decisions and improve the correctness of the ArmSME dialect. The algorithm used here is a linear scan over live ranges, where live ranges are assigned to tiles as they appear in the program (chronologically). Live ranges release their assigned tile ID when the current program point is past their end. This is a greedy algorithm, chosen mainly to keep the implementation relatively straightforward and because it seems to be sufficient for most kernels (e.g. matmuls) that use ArmSME. The general steps are roughly those from https://link.springer.com/content/pdf/10.1007/3-540-45937-5_17.pdf, though a few simplifications and assumptions have been made for our use case.

Hopefully, the only changes needed for a user of the ArmSME dialect are that:

- `-allocate-arm-sme-tiles` will no longer be a standalone pass
- `-test-arm-sme-tile-allocation` is only for unit tests
- `-convert-arm-sme-to-llvm` must happen after `-convert-scf-to-cf`
- SME tile allocation is now part of the LLVM conversion

By integrating this into the `ArmSME -> LLVM` conversion we can allow high-level (value-based) ArmSME operations to be side-effect-free, as we can guarantee nothing will rearrange ArmSME operations before we emit intrinsics (which could invalidate the tile allocation). The hope is for ArmSME operations to have no hidden state/side effects and allow easily lowering dialects such as `vector` and `arith` to SME, without making assumptions about how the input IR looks, as the semantics of the operations will be the same. That is, there are no (new) side effects and the IR follows the rules of SSA (a value will never change). The aim is correctness, so we have a base for working on optimizations.
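The greedy linear scan described above can be sketched in a few lines. This is a minimal model, not the ArmSME allocator itself: `LiveRange`, `allocateTiles`, the inclusive end points, and the "leave the tile as -1 when none is free" behaviour are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical live range: program points are non-negative integers.
struct LiveRange {
  int start; // first program point where the value is live
  int end;   // last program point where the value is live (inclusive)
  int tile;  // assigned tile ID, or -1 if no tile was free
};

// Greedy linear scan: visit ranges in order of their start point and give
// each one the lowest-numbered tile whose previous range has already ended.
void allocateTiles(std::vector<LiveRange> &ranges, int numTiles) {
  assert(numTiles > 0 && "need at least one tile");
  std::sort(ranges.begin(), ranges.end(),
            [](const LiveRange &a, const LiveRange &b) {
              return a.start < b.start;
            });
  // busyUntil[t] is the end point of the range currently holding tile t,
  // or -1 if the tile has not been used yet.
  std::vector<int> busyUntil(numTiles, -1);
  for (LiveRange &range : ranges) {
    assert(range.start >= 0 && range.start <= range.end);
    range.tile = -1;
    for (int t = 0; t < numTiles; ++t) {
      if (busyUntil[t] < range.start) { // tile t was released before `start`
        range.tile = t;
        busyUntil[t] = range.end;
        break;
      }
    }
  }
}
```

A range that finds no free tile simply keeps `tile == -1` in this sketch; how the real allocator handles that case is not modelled here.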

A buildbot with expensive checks enabled flagged some problems with my patch. There was also a post-commit nit on the langref changes.

…m#92004) This makes the `vc-rev-enabled` feature unsupported if we fail to retrieve the git revision for any reason, such as if git is not installed.

…straints (llvm#92104) Clangd uses it to determine whether the argument is within the selection range. Fixes clangd/clangd#2033

…filename and location info (llvm#92050)

PR llvm#80680 added bits in the codegen to lazily add convergence intrinsics when required. This logic relied on the LoopStack. The issue is when parsing the condition, the loopstack doesn't yet reflect the correct values, as expected since we are not yet in the loop. However, convergence tokens should sometimes already be available. The solution which seemed the simplest is to greedily generate the tokens when we generate SPIR-V. Fixes llvm#88144 --------- Signed-off-by: Nathan Gauër <brioche@google.com>

Now that we've got (minus some issues around datatypes and invariant loads) working lowerings for address space 7, update the table in the AMDGPU usage guide to properly indicate the nature of these address spaces.

…llvm#92067) cm.push can't save X26 without also saving X27. This removes two other checks for this case. This causes CFI to be emitted since X27 is now explicitly a callee saved register. The affected tests use inline assembly to clobber X26 rather than the whole range of s0-s10.

Allow mixing objects with/without signed class ro data and category class properties as long as it happens before we register the metadata. These combinations are a warning in ld, not a hard error. The only case that is ABI-breaking is if we already registered with the feature enabled but later try to load an object that doesn't support it. rdar://127336061

…inations of 32-bit integers. NFC

tryToCreateDiffCheck has one caller, and exits early if CanUseDiffCheck is false. Hence, we can get/set CanUseDiffCheck in the caller to avoid wastefully calling tryToCreateDiffCheck. This patch is an NFC simplification of program logic.

The target combine is no longer required because InstCombine will transform the DIV by a power of 2 into a multiply, so in practice this case will never trigger. Additionally, the generated code would have been incorrect for streaming(-compatible) functions, because it assumed NEON was available.

…lvm#92086) self.wait_for_running_event(process) is always called after self.runCmd("continue"). It is strange to expect eStateConnected here. This test failed in case of a remote target. The correct state is eStateRunning. Removed incorrect checking.

) The cost of `experimental.cttz.elts` on RISC-V is equal to the cost of vfirst when the zero_is_poison argument is true. Otherwise, we add the additional cost of cmp + select to convert the -1 result from vfirst to EVL.
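To make the extra cmp + select concrete, here is a scalar model of the two cases. It is only a sketch: `vfirst` and `cttzElts` below are hypothetical helper names operating on a `std::vector<bool>` mask, not the intrinsic or the cost-model code.

```cpp
#include <cassert>
#include <vector>

// Models vfirst: index of the first set element among the first `evl`
// mask elements, or -1 if none is set.
int vfirst(const std::vector<bool> &mask, int evl) {
  assert(evl >= 0 && evl <= static_cast<int>(mask.size()));
  for (int i = 0; i < evl; ++i)
    if (mask[i])
      return i;
  return -1;
}

// Models the count of trailing zero elements. When zeroIsPoison is true the
// caller promises that some element is set, so the vfirst result is used
// directly. Otherwise the -1 result is converted to EVL (the cmp + select).
int cttzElts(const std::vector<bool> &mask, int evl, bool zeroIsPoison) {
  int first = vfirst(mask, evl);
  if (zeroIsPoison)
    return first;
  return first < 0 ? evl : first;
}
```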

…#90500) Currently, clang postpones all semantic analysis of unary operators with operands of pointer/pointer to member/array/function type until instantiation whenever that type is dependent (e.g. `T*` where `T` is a type template parameter). Consequently, the uninstantiated AST nodes all have the type `ASTContext::DependentTy` (which, for the purposes of llvm#90152, is undesirable as that type may be the current instantiation! (e.g. `*this`)) This patch moves the point at which we perform semantic analysis for such expression to be prior to instantiation.

llvm#91137 reverted in llvm#92001 A build error fix added in 28d5ece --------- Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>

Most diagnostics obey https://llvm.org/docs/CodingStandards.html#error-and-warning-messages but some diverge. Fix them. While here, adjust some diagnostics. Pull Request: llvm#92024

@kiranchandramohan

Currently, only those global variables which are at compile unit scope are added to the 'globals' list of the DICompileUnit. This does not work for languages which support modules (e.g. Fortran) where hierarchy can be variable -> module -> compile unit. To fix this, if a variable scope points to a module, we walk one level up and see if module is in the compile unit scope. This was initially part of llvm#91582 which adds debug information for Fortran module variables. @kiranchandramohan pointed out that MLIR changes should go in separate PRs.

These larger SEWs aren't in the ratified V spec. Thanks to dzaima and sorear on IRC for pointing this one out. Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

…hCond from versionCallSite (llvm#81181) * This is to be used by llvm#81378 to implement a variant of versionCallSite that compares vtables. * The parent patch is llvm#81051

…idening-of-multiplication-result (llvm#92025) When an expression contains errors (e.g. a missing typedef) and clang-tidy is compiled with asserts enabled, this check crashes on an assert because a type with errors appears as a dependent one. The issue is caused by invalid input, but since there is no point in crashing in such a case and causing additional confusion, expressions with errors are now ignored. Fixes llvm#89515, llvm#55293

…rs (llvm#90500)" (llvm#92149) This reverts commit 8019cbb.

@bjope

Test courtesy of @bjope showing a regression due to ecae3ed.

Although we can't reduce the number of instructions, if we selected `li rd, -1` instead then this could be encoded in a 16-bit instruction.

…nstants (llvm#92131) This follows the same pattern as 20e6265. Although we can't reduce the number of instructions used, if we are able to use a sign-extended 6-bit immediate then the 16-bit c.li instruction can be selected (thus saving code size). Although this _could_ be gated so it only happens if C is enabled, I've opted not to because at worst it's neutral and it doesn't seem helpful to add unnecessary divergence between the RVC and non-RVC paths.
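For reference, the "sign-extended 6-bit immediate" condition mentioned above corresponds to the range [-32, 31]. The helpers below are a hypothetical sketch of that range check, not the code used in the backend.

```cpp
#include <cstdint>

// True if `value` is representable as a sign-extended `bits`-bit immediate.
// Sketch only; assumes 0 < bits < 64.
constexpr bool fitsSignedBits(std::int64_t value, unsigned bits) {
  return value >= -(std::int64_t{1} << (bits - 1)) &&
         value <= (std::int64_t{1} << (bits - 1)) - 1;
}

// Hypothetical helper: can this constant use the 6-bit immediate form?
constexpr bool fitsCompressedLoadImm(std::int64_t value) {
  return fitsSignedBits(value, 6);
}
```

With this check, -1 qualifies while 32 and -33 do not.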

There are cases where a vector value has some users that demand only the single scalar value (NeedsScalar), while other users demand the vector value (see attached test cases). In those cases, the NeedsScalar users should only demand the first lane. Fixes llvm#91883.

Prior to this, fixed point multiplication would lead to this assertion error on AArch64, armv8, and armv7.
```
_Accum f(_Accum x, _Accum y) { return x * y; }

// ./bin/clang++ -ffixed-point /tmp/test2.cc -c -S -o - -target aarch64 -O3
clang++: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:10245: void llvm::TargetLowering::forceExpandWideMUL(SelectionDAG &, const SDLoc &, bool, EVT, const SDValue, const SDValue, const SDValue, const SDValue, SDValue &, SDValue &) const: Assertion `Ret.getOpcode() == ISD::MERGE_VALUES && "Ret value is a collection of constituent nodes holding result."' failed.
```
This path into forceExpandWideMUL should only be taken if we don't support [US]MUL_LOHI or MULH[US] for the operand size (32 in this case). But we should also check if we can just leverage regular wide multiplication. That is, extend the operands from 32 to 64, do a regular 64-bit mul, then trunc and shift. These ops are certainly available on aarch64 but for wider types.
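The "extend, multiply wide, then shift and truncate" idea can be sketched in scalar C++ as follows. This is an illustration only: `fixedMul` is a hypothetical name, the scale of 15 fractional bits used in the usage note is an assumption about the `_Accum` layout, and no saturation or rounding is modelled.

```cpp
#include <cstdint>

// Sketch: multiply two signed 32-bit fixed-point values that share `scale`
// fractional bits (assumes scale < 32). The product is formed in 64 bits,
// shifted right by `scale` to drop the extra fractional bits, and then
// truncated back to 32 bits. Relies on arithmetic right shift of negative
// values, which mainstream compilers provide.
constexpr std::int32_t fixedMul(std::int32_t a, std::int32_t b,
                                unsigned scale) {
  return static_cast<std::int32_t>(
      (static_cast<std::int64_t>(a) * static_cast<std::int64_t>(b)) >> scale);
}
```

With 15 fractional bits, 1.5 is 49152 and 2.0 is 65536, and `fixedMul(49152, 65536, 15)` yields 98304, which is 3.0 in the same format.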

…ts (llvm#91850) This PR is in preparation for some extensions to the `InferIntRangeInterface` around the `nsw` and `nuw` flags supported in the `arith` dialect and LLVM. We provide some common inference logic for `index` and `arith` in `InferIntRangeCommon.h`, but our Test Ops are currently fixed to `Index` Types. As we test the range inference for arith Ops, especially around the overflow behaviour, it's handy to have native support for the typical integer types in the test Ops. This patch

1. Changes the Attributes of `test.with_bounds` ops from `Index` to `APInt`, which matches the internal representation in `ConstantIntRanges`.
2. Allows the use of `AnyInteger` in addition to `Index` for the operands and results of the test Ops. This now requires explicit specification of the type in the IR, where before `Index` was implicit.
3. Requires bounds Attrs to be specified in the precision of the SSA value, eliminating any implicit truncation or extension. (*Could this lead to problems?*)

) This change allows us to pass creduce options to the creduce-clang-crash.py script. With this, `--n` is no longer needed to specify the number of cores, so the flag is removed. The motivation is that llvm#87933 (comment) suggests that disabling creduce renaming passes helps people reduce crashes further by hand.

@bjope

Applying the loop guards to the distance may prevent isSafeDependenceDistance from determining NoDep, unless loop guards are also applied to the backedge-taken-count. Instead of applying the guards to both Dist and the backedge-taken-count, just apply them after handling isSafeDependenceDistance and constant distances; there is no benefit to applying the guards before then. This fixes a regression flagged by @bjope due to ecae3ed.

Summary: The offload library supports basic JIT functionality, however we currently link against every single target even though only AMDGPU and NVPTX are supported. This somewhat bloats the dynamic library list, so we should constrain it to what's actually used.

I'm interested in being CC'd on these changes

Add test cases for llvm#89958.

…nctions. (llvm#91643) The existing code is checking for the presence of the +sve subtarget feature when deciding to use a base pointer for the function, but this check doesn't work when only +sme is used. rdar://126878490

…written for class/variable template specializations (llvm#81642)" (llvm#91393) Reapplies llvm#81642, fixing the crash which occurs when running the lldb test suite.

llvm#92158) …ducer.

This mirrors the LLDB_DEBUGSERVER_PATH environment variable and allows you to have lldb-argdumper in a non-standard location and still use it at runtime.

…del (llvm#92032) SubtargetEmitter::GenSchedClassTables takes a CodeGenProcModel, but calls hasReadOfWrite which loops over all ProcModels. We move hasReadOfWrite to CodeGenProcModel and remove the loop over all ProcModels. This leads to a 144% speedup on the RISC-V backend of our downstream.

As we have debuginfod as symbol locator available in lldb now, we want to make full use of it. In case of post mortem debugging, we don't always have the main executable available. However, the .note.gnu.build-id of the main executable (and some other modules too) should be available in the core file, as those binaries are loaded in memory and dumped in the core file. We try to iterate through the NT_FILE entries, and read and store the gnu build id if possible. This will be very useful as this id is the unique key which is needed for querying the debuginfod server. Test: Build and run lldb. Breakpoint set to https://github.com/llvm/llvm-project/blob/main/lldb/source/Plugins/SymbolLocator/Debuginfod/SymbolLocatorDebuginfod.cpp#L147 Verified that after this commit, module_uuid is the correct gnu build id of the main executable which caused the crash (first in the NT_FILE entry)

Reverts llvm#92078

…91486) The type unit DIE generated by clang contains DW_AT_comp_dir/DW_AT_dwo_name. This was added to clang to help LLDB figure out where a type unit comes from when accessing an entry in a .debug_names accelerator table and type units in a .dwp file. When BOLT writes out .dwo files it changes their names. The user can also specify the directory where they are written out. Added support to BOLT to update those attributes.

After patch llvm#88805, the `I` extension is added automatically. When running a command like `./build/bin/llc -mtriple=riscv32 -mattr=+e -target-abi ilp32e -verify-machineinstrs llvm/test/CodeGen/RISCV/zcmp-additional-stack.ll`, it generates
```
.text
.attribute 4, 16
.attribute 5, "rv32i2p1_e2pe"
.file "zcmp-additional-stack.ll"
.globl func # -- Begin function func
.p2align 1
.type func,@function
```
This patch resets the `I` extension in FeatureBit when `+e` has been specified.

NFC. Makes the VOP1Inst_t16 interface more generic to support future instructions cleanly.

…n of VP_FSHL/FSHR. NFC There's a special path when the promoted type has an element size more than twice the size of the original type.

Summary: One of the downsides of the linker wrapper is that it made debugging more difficult. It is very powerful in that it can resolve a lot of input matching and library handling that could not be done before. However, the old method allowed users to simply copy-paste the script files to modify the output and test it. This patch attempts to make it easier to debug changes by letting the user override all the linker inputs. That is, we provide a user-created binary that is treated like the final output of the device link step. The intended use-case is for using `-save-temps` to get some IR, then modifying the IR and sticking it back in to see if it exhibits the old failures.

Utility converting a profile coming from `compiler_rt` to bitstream, and a reader. `PGOCtxProfileWriter::write` would be used as the `Writer` parameter for the `__llvm_ctx_profile_fetch` API. This is expected to happen in user code, for example in the RPC handler tasked with collecting a profile, and would look like this:
```
// set up an output stream "Out", which could contain other stuff
{
  // constructing the Writer will start the section, in Out, containing
  // the collected contextual profiles.
  PGOCtxProfWriter Writer(Out);
  __llvm_ctx_profile_fetch(&Writer, +[](void* W, const ContextNode &N) {
    reinterpret_cast<PGOCtxProfWriter*>(W)->write(N);
  });
  // Writer going out of scope will finish up the section.
}
```
The reader produces a data structure suitable for maintenance during IPO transformations.

Reverts llvm#91859 Buildbot failures.

…ttening loops (llvm#86961) When flattening the loop, if the GEP was inbound, it should stay inbound, because the only thing that changed is how the pointers are calculated, not the elements being accessed. Proof: https://alive2.llvm.org/ce/z/dApMpQ

Avoid including MD5.h in a core IR header.

… before consuming it Close llvm#91418 Since we load the variable's initializers lazily, it would be problematic if the initializers depend on each other. So here we try to load the initializers of static variables to make sure they are passed to the code generator in order. If we read anything interesting, we consume it before emitting the current declaration.

Spec reference: https://cdrdv2.intel.com/v1/dl/getContent/678938

…ariables before consuming it" This reverts commit 11b0591. The premerge bot is broken.

The ExceptionDemo example was no longer compiling (since llvm 14 at least). The PR makes the example work with the current API and also transition from MCJIT to ORC. Fixes llvm#63702

Split off from llvm#70549, this patch moves RISCVInsertVSETVLI to after phi elimination where we exit SSA and need to move to LiveVariables. The motivation for splitting this off is to avoid the large scheduling diffs from moving completely to after regalloc, and instead focus on converting the pass to work on LiveIntervals. The two main changes required are updating VSETVLIInfo to store VNInfos instead of MachineInstrs, which allows us to still check for PHI defs in needVSETVLIPHI, and fixing up the live intervals of any AVL operands after inserting new instructions. On O3 the pass is inserted after the register coalescer, otherwise we end up with a bunch of COPYs around eliminated PHIs that trip up needVSETVLIPHI. Co-authored-by: Piyou Chen <piyou.chen@sifive.com>

The following small things caught my eye:

1) `EILSEQ` is not part of the generic asm error number macros. See the [full list of generic asm errno codes](https://github.com/torvalds/linux/blob/4b95dc87362aa57bdd0dcbad109ca5e5ef3cbb6c/include/uapi/asm-generic/errno-base.h). AFAIK the generic asm errno numbers are common between different operating systems and architectures. `EILSEQ` is not part of this common set of errnos.
2) `EILSEQ`'s value is wrong. During the addition of `EILSEQ` in https://reviews.llvm.org/D151129, the value `35` was probably chosen because it is the next consecutive number. This is not correct. The actual values can be looked up for example here:
   * [For Linux kernel](https://github.com/search?q=repo%3Atorvalds%2Flinux+EILSEQ&type=code&p=1): `EILSEQ = 84` (uapi; e.g. x86_64), `EILSEQ = 88` (mips), `EILSEQ = 47` (parisc)
   * [For Darwin kernel](https://github.com/apple-oss-distributions/xnu/blob/main/bsd/sys/errno.h#L237): `EILSEQ = 92`

…m#92051) As in the title, fixes llvm#91934

The middle end will remove the inner vsetvli otherwise, and it's more typical to set the AVL to the remaining VL. This also prevents the test from showing up as a regression in llvm#91319

Even as the NPM has been in use by Polly for a while now, the majority of the tests continue using the LPM passes. This patch ports the tests to use the NPM passes (for example, by replacing a flag such as -polly-detect with -passes=polly-detect following the NPM syntax for specifying passes) with some exceptions for some missing features in the new passes. Additionally, the lit substitution %loadPolly is replaced by the substitution of what was %loadNPMPolly and %loadNPMPolly is removed.

This was looking through an addrspacecast, and not finding a later unfoldable cast to another address space. Fixes improperly deleting a required alloca + memcpy and introducing an illegal addrspacecast. This also required fixing some worklist management issues with addrspacecast, and assuming that only memcpy sources could need replacement. Regresses one test function, but this looks like it optimized before by accident. It never saw the pointer use by the call to readonly_callee, which should require insertion of a new cast. Fixes llvm#68120

Reverts llvm#90632. Causing failures on buildbots that dynamically load polly. Reverting while we sort it out.

…al in WebKit checkers. (llvm#91873) Also allow CXXBindTemporaryExpr, which creates a temporary object with a non-trivial destructor, and add a few more std and WTF functions to the explicitly allowed list.

Introduced in llvm#91150. Not needed anymore as llvm#92041 fixed the root cause. `ENAMETOOLONG` and `EOVERFLOW` are well defined in `<linux/errno.h>`. Post mortem: Due to the previously missing inclusion of `<linux/errno.h>` (fixed with llvm#92041), I misinterpreted an undefined macro issue during the development of llvm#91150 as being caused by a missing definition rather than by the missing inclusion of the linux header. I realized too late that `ENAMETOOLONG` and `EOVERFLOW` were correctly defined in `<linux/errno.h>` and that it was my missing inclusion that caused the problem.

Prefer to emit the intrinsic over a libcall in the intrinsic or no-math-errno case.

…llvm#92120) This commit extends the verifier for atomics to properly verify larger types. Beforehand, the verifier strictly rejected larger integer types, while it now consults the data layout to determine if their bitsize is a power of two. This behavior reflects what LLVM's verifier is checking for.
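The size rule can be sketched as a plain predicate on the bit size obtained from the data layout. This is a hypothetical illustration, not the verifier code: the function names are invented here, and the lower bound of 8 bits is an assumption, since the text above only states the power-of-two requirement.

```cpp
#include <cstdint>

// True if `v` is a non-zero power of two.
constexpr bool isPowerOfTwo(std::uint64_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// Hypothetical check on the bit size reported by the data layout.
// Assumption for this sketch: sizes below one byte are rejected as well.
constexpr bool isAcceptableAtomicBitWidth(std::uint64_t bits) {
  return bits >= 8 && isPowerOfTwo(bits);
}
```

Under this sketch a 128-bit integer is accepted while a 48-bit one is not.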

When expanding an L128 (which is used to reload i128) it is possible that the quadword destination register clobbers an address register. This patch adds an assertion against the case where both of the expanded parts clobber the address, and in the case where one of the expanded parts does so, puts it last. Fixes llvm#91437

…ttribute. (llvm#91732) `disable_sanitizer_instrumentation` is attached to functions that shall not be instrumented, e.g. ifunc resolvers, because those run before everything is initialised. Some sanitizers already handle this attribute; this patch adds it to DataFlow and Coverage too.

...which caused issues like > ==42==ERROR: AddressSanitizer failed to deallocate 0x32 (50) bytes at address 0x117e0000 (error code: 28) > ==42==Cannot dump memory map on emscriptenAddressSanitizer: CHECK failed: sanitizer_common.cpp:81 "((0 && "unable to unmmap")) != (0)" (0x0, 0x0) (tid=288045824) > #0 0x14f73b0c in __asan::CheckUnwind()+0x14f73b0c (this.program+0x14f73b0c) > #1 0x14f8a3c2 in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long)+0x14f8a3c2 (this.program+0x14f8a3c2) > #2 0x14f7d6e1 in __sanitizer::ReportMunmapFailureAndDie(void*, unsigned long, int, bool)+0x14f7d6e1 (this.program+0x14f7d6e1) > #3 0x14f81fbd in __sanitizer::UnmapOrDie(void*, unsigned long)+0x14f81fbd (this.program+0x14f81fbd) > #4 0x14f875df in __sanitizer::SuppressionContext::ParseFromFile(char const*)+0x14f875df (this.program+0x14f875df) > #5 0x14f74eab in __asan::InitializeSuppressions()+0x14f74eab (this.program+0x14f74eab) > #6 0x14f73a1a in __asan::AsanInitInternal()+0x14f73a1a (this.program+0x14f73a1a) when trying to use an ASan suppressions file under Emscripten: Even though it would be considered OK by SUSv4, the Emscripten runtime states "We don't support partial munmapping" (see <emscripten-core/emscripten@f4115eb> "Implement MAP_ANONYMOUS on top of malloc in STANDALONE_WASM mode (llvm#16289)"). Co-authored-by: Stephan Bergmann <stephan.bergmann@allotropia.de>

Since the setting of COMPILER_RT_ASAN_SHADOW_SCALE_DEFINITION was removed in commit 8421fa5, clean up the uses of COMPILER_RT_ASAN_SHADOW_SCALE_DEFINITION.

…m#92093) The stack validation heuristic is counter-productive in this case, as the unaligned address is most likely the thing that caused the signal in the first place.

This is already in a if(isBlockPointer()) block.

Just create a local variable for them.

…1531) There are many environments where `errno` is a macro that expands to something like `(*__errno())` (different standard library implementations use different names instead of "__errno"). In these environments the ErrnoModeling checker creates a symbolic region which will be used to represent the return value of this "get the location of errno" function. Previously this symbol was only created when the checker was able to find the declaration of the "get the location of errno" function; but this commit eliminates the complex logic that was responsible for this and always creates the symbolic region when `errno` is not available as a "regular" global variable. This significantly simplifies the code and only introduces a minimal performance reduction (one extra symbol) in the case when `errno` is not declared (neither as a variable nor as a function). In addition to this simplification, this commit specifies that the `CallDescription`s for the "get the location of errno" functions are matched in `CDM::CLibrary` mode. (This was my original goal, but I was sidetracked by resolving a FIXME above the `CallDescriptionSet` in `ErrnoModeling.cpp`.) This change is very close to being NFC, but it fixes weird corner cases like the handling of a C++ method that happens to be named "__errno()" (previously it could've been recognized as an errno location getter function).
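For readers unfamiliar with the pattern, the "macro that expands to a location getter" shape looks roughly like the sketch below. Everything here is hypothetical (`sketch::hypotheticalErrnoLocation`, `SKETCH_ERRNO`, `failWith`); it mimics the structure of such a definition without using any real library's names.

```cpp
#include <cassert>

namespace sketch {
// Stand-in for the storage a real C library would keep per thread.
int errnoStorage = 0;

// Stand-in for the "get the location of errno" function.
int *hypotheticalErrnoLocation() { return &errnoStorage; }
} // namespace sketch

// Stand-in for `errno` defined as a macro over the location getter.
#define SKETCH_ERRNO (*sketch::hypotheticalErrnoLocation())

// A call that fails and reports `code` through the macro.
int failWith(int code) {
  assert(code != 0 && "0 conventionally means no error");
  SKETCH_ERRNO = code; // writes through the pointer returned by the getter
  return -1;
}
```

Every read or write of `SKETCH_ERRNO` is a call to the getter followed by a dereference, which is why a checker needs some region to stand for the getter's return value.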

I thought the test could work there as well, but (of course) it does not, because the lowest bit just means "run the code as thumb".

…lvm#92156) Arguments to openmp regions should not be tagged as dummy arguments. This is particularly unsafe because these openmp blocks will eventually be inlined into the calling function, where they will trivially alias with other values inside of the calling function. This is probably a theoretical issue because the calls to openmp runtime function calls would act as barriers, preventing optimizations that are too aggressive. But a lot more thought would need to go into a bet like that. This came out of discussion on llvm#92036

…mMetadata in emitFunctionEntryLabel. (llvm#92098)

This patch removes incorrect `byval` attribute from pointer argument passed with >128 bit long _BitInt types.

Update VPBlendRecipe::execute to support generating code for first-lane only. This fixes a crash in the newly added test @test_not_first_lane_only_wide_compare_incoming_order_swapped.

Add additional tests with udiv/urem/sdiv/srem in trip counts, where the divisor is constant. For llvm#92177.

Comparing a bit of the mock GDB server code to what was in the document I found these: * QLaunchArch * qSpeedTest * qSymbol qSymbol is the most mysterious but it did have some examples in a comment so I've adapted that.

A lot of `TestConcurrent*.py` expect one of the threads to crash, but we weren't checking for it properly. Possibly because signal reporting got better on FreeBSD at some point, and it now shows the same info as Linux does. ``` lldb-api :: functionalities/inferior-changed/TestInferiorChanged.py lldb-api :: functionalities/inferior-crashing/TestInferiorCrashing.py lldb-api :: functionalities/inferior-crashing/TestInferiorCrashingStep.py lldb-api :: functionalities/inferior-crashing/recursive-inferior/TestRecursiveInferior.py lldb-api :: functionalities/inferior-crashing/recursive-inferior/TestRecursiveInferiorStep.py lldb-api :: functionalities/thread/concurrent_events/TestConcurrentCrashWithBreak.py lldb-api :: functionalities/thread/concurrent_events/TestConcurrentCrashWithSignal.py lldb-api :: functionalities/thread/concurrent_events/TestConcurrentCrashWithWatchpoint.py lldb-api :: functionalities/thread/concurrent_events/TestConcurrentCrashWithWatchpointBreakpointSignal.py ``` Fixes llvm#48777 `TestConcurrentTwoBreakpointsOneSignal.py` no longer fails, at least on an AWS instance, so I've removed the xfail there.

Previously this was only populated later in the create method. This resolves some invalid builder paths. This may also be sufficient that type inference functions no longer have to consider whether property conversion has happened (but I haven't verified that yet). This also makes Attributes corresponding to Properties optional inside the set-from-attributes method. Today that is in effect what happens with Property value initialization, and folks use it to define custom C++ types whose default initialization is what they want. This is the behavior users get if they use properties directly. Propagating Attributes without allowing partial setting would require iterating over the dictionary attribute while considering the properties of the op type that will be created. This could also have been an additional generated method or optional behavior on the set method, but doing it consistently seems better. In terms of what's lost, it doesn't seem like anything compared to the pure Property path, where the Property is default-value-initialized and then partially overwritten (this doesn't seem to buy anything else verification-wise). Default-valued Properties (as specified on the ODS side rather than the C++ side) triggered an error, as the containing class was not yet complete but referenced the nested class, so we couldn't have a default initializer for them in the parent class. Added an additional forwarding builder to avoid needing to update call sites. This could be split out into a separate change. Inlined a templated function in a unit test that was only used once. Moved initialization earlier where seen.

…lvm#90963) We can extract any legal fixed length vector from a scalable vector by using VECTOR_SPLICE.

The tests TestPty and TestPtyServer use the Unix-specific Python builtin module termios. They fail with a Windows host and a Linux target. Disable them for Windows hosts too.

Was added in 88e0b25 and is unused since fcef407.

Remove the `allocate`, because it needs to be used together with a privatizing clause. The only such clause for `taskgroup` is `task_reduction`, but it's not yet supported.

One of the functions in the test has `teams if(...)`. The `if` clause was only allowed on the `teams` directive in OpenMP 5.2.

This fixes a build error on the AMDGPU buildbot introduced in PR llvm#92172

…range empty (llvm#91994) Improves readability by changing comparisons of `*_begin` and `*_end` iterators into `.empty()` on their range.

Avoid using numbers as check prefix - replace with actual triple config names where possible

Avoid using numbers as check prefix - replace with actual triple config names

…er-list (llvm#91992) Previously, the call to `findArgs` for a `CallExpr` inside of a `min` or `max` call would call `findArgs` before checking if the argument is a call to `min` or `max`, which is what `findArgs` is expecting. The fix moves the name checking before the call to `findArgs`, such that only a `min` or `max` function call is used as an argument. Fixes llvm#91982 Fixes llvm#92249
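As background for the nested-call case discussed above, the two equivalent spellings look like this. The function name is hypothetical and the sketch only shows the source-level pattern, not the check's matching logic.

```cpp
#include <algorithm>
#include <cassert>

// Computes the maximum of three values in both spellings and confirms
// that they agree.
int maxOfThree(int a, int b, int c) {
  // Nested form: the inner call is itself an argument of the outer call.
  int nested = std::max(a, std::max(b, c));
  // Flattened form using the initializer-list overload.
  int flat = std::max({a, b, c});
  assert(nested == flat && "both spellings compute the same value");
  return flat;
}
```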

Fixes llvm#91434 PR: llvm#92107

…lvm#91865)

…#92248 Avoid using numbers as check prefix - replace with actual triple config names

…lvm#92248 Avoid using numbers as check prefix - replace with actual triple config names

Avoid using leading numbers in check prefixes - replace with actual triple config names (and makes it easier to add X64 test coverage in a future commit).

…llvm#92248 Don't include "-LABEL" (or any other FileCheck modifier) in the core check prefix name

Avoid using leading numbers in check prefixes - replace with actual triple config names.

…vm#92257) Reverts llvm#69485

llvm#89751) The C++ standard requires that symmetric transfer from one coroutine to another is performed via a tail call. Failure to do so is a miscompile and often breaks programs by quickly overflowing the stack. Until now, the coro split pass tried to ensure this in the `addMustTailToCoroResumes()` function by searching for `llvm.coro.resume` calls to lower as tail calls if the conditions were right: the right function arguments, attributes, calling convention etc., and if a `ret void` was sure to be reached after traversal with some ad-hoc constant folding following the call. This was brittle, as the kind of implicit invariants required for a tail call to happen could easily be broken by other passes (e.g. if some instruction got in between the `resume` and `ret`), see for example 9d1cb18 and 284da04. Also the logic seemed backwards: instead of searching for possible tail call candidates and doing them if the circumstances are right, it seems better to start with the intention of making the tail calls we need, and forcing the circumstances to be right. Now that we have the `llvm.coro.await.suspend.handle` intrinsic (since f786881) which corresponds exactly to symmetric transfer, change the lowering of that to also include the `resume` part, always lowered as a tail call.

Commit 71fbbb6 moved getGUID out of line in llvm/IR/GlobalValue, so users now have to link LLVMCore to get its definition. /usr/bin/ld: CMakeFiles/LLVMBOLTRewrite.dir/PseudoProbeRewriter.cpp.o: in function `(anonymous namespace)::PseudoProbeRewriter::parsePseudoProbe()': PseudoProbeRewriter.cpp:(.text._ZN12_GLOBAL__N_119PseudoProbeRewriter16parsePseudoProbeEv+0x3d0): undefined reference to `llvm::GlobalValue::getGUID(llvm::StringRef)' /usr/bin/ld: CMakeFiles/LLVMBOLTRewrite.dir/PseudoProbeRewriter.cpp.o: in function `(anonymous namespace)::PseudoProbeRewriter::encodePseudoProbes()': PseudoProbeRewriter.cpp:(.text._ZN12_GLOBAL__N_119PseudoProbeRewriter18encodePseudoProbesEv+0x11a1): undefined reference to `llvm::GlobalValue::getGUID(llvm::StringRef)' collect2: error: ld returned 1 exit status make[2]: *** [tools/bolt/lib/Rewrite/CMakeFiles/LLVMBOLTRewrite.dir/build.make:275: lib/libLLVMBOLTRewrite.so.19.0git] Error 1

@s-barannikov

llvm#90338) Make sure that empty structs are treated as if they have a size of one bit in function parameters and return types so that they occupy a full argument and/or return register slot. This fixes crashes and miscompilations when passing and/or returning empty structs. Reviewed by: @s-barannikov

…lvm#92185) Add a privatizing clause to the construct that uses `allocate` clause. Amend the CHECK lines to reflect the expected output.

…vm#92176) Add remaining clauses with the "privatizing" property to construct decomposition, specifically to the part handling the `allocate` clause. --------- Co-authored-by: Tom Eccles <t@freedommail.info>

…#92160) Detect the case when a reduction modifier ends up not being applied after construct decomposition, treat it as an error. This fixes a regression in the gfortran test suite after PR90098.

llvm-project/llvm/lib/Transforms/Coroutines/CoroSplit.cpp:1223:1: error: unused function 'scanPHIsAndUpdateValueMap' [-Werror,-Wunused-function] scanPHIsAndUpdateValueMap(Instruction *Prev, BasicBlock *NewBlock, ^ 1 error generated.

@tblah

We need the information in the `DeclareOp` to generate debug information for variables. Currently, cg-rewrite removes the `DeclareOp`. As `AddDebugInfo` runs after that, it cannot process the `DeclareOp`. My initial plan was to make the `AddDebugInfo` pass run before cg-rewrite, but that has a few issues:
1. Initially I was thinking of using the memref op to carry the variable attr. But as @tblah suggested in llvm#86939, it makes more sense to carry that information on the `DeclareOp`. It also makes it easy to handle in codegen, and no special handling is needed for arguments. For this reason, we need to preserve the `DeclareOp` until codegen.
2. Running earlier, we would miss the changes made by passes that run between cg-rewrite and codegen.

But not removing the DeclareOp in cg-rewrite has the issue that the ShapeOp remains, which causes errors during codegen. To solve this problem, I convert DeclareOp to XDeclareOp in cg-rewrite instead of removing it. This was mentioned as a possible solution by @jeanPerier in https://reviews.llvm.org/D136254. The conversion follows logic similar to that used for other operators in that file. The FortranAttr and CudaAttr are currently not converted, but are left as a TODO for when the need arises. Now the `AddDebugInfo` pass can extract information about local variables from `XDeclareOp` and create `DILocalVariableAttr`. These are attached to `XDeclareOp` using the `FusedLoc` approach. Codegen can use them to create `DbgDeclareOp`. I have added tests that check the debug information in MLIR form and also in LLVM IR. Currently we only handle a very limited set of types; the rest are given a placeholder type. The previous placeholder type was a basic type with `DW_ATE_address` encoding. Once variables were added, it started causing assertions in the LLVM debug info generation logic for some types. It has been changed to an integer type to prevent these issues until we handle those types properly.

…lvm#90030) When analyzing C code with function pointers the checker crashes because of how the implementation extracts `IdentifierInfo`. Without the fix, this test crashes.

getKnownMinValue returns uint64_t, use ULL to make sure the second arg is also 64 bit.

…lvm#92197) There is no reason to call combineMetadata directly with a list of MD_ nodes. The combineMetadataForCSE function handles all the metadata correctly. Partially fixes: llvm#30866

When allocating a memory buffer, we use a non-throwing new so that we can explicitly handle memory buffers that are too large to fit into memory. However, when exceptions are disabled, LLVM installs a custom new handler (https://github.com/llvm/llvm-project/blob/90109d444839683b09f0aafdc50b749cb4b3203b/llvm/lib/Support/InitLLVM.cpp#L61) that explicitly crashes when we run out of memory (https://github.com/llvm/llvm-project/blob/de14b749fee41d4ded711e771e43043ae3100cb3/llvm/lib/Support/ErrorHandling.cpp#L188) and that means this particular out-of-memory situation cannot be gracefully handled. This was discovered while working on #embed (llvm#68620) on Windows and resulted in a crash rather than the preprocessor issuing a diagnostic as expected. This patch switches away from the non-throwing new to a call to malloc (and free), which will return a null pointer without calling a custom new handler. It is the only instance in Clang or LLVM that I could find which used a non-throwing new, so I did not think we would need anything more involved than this change. Testing this would be highly platform dependent and so it does not come with test coverage. And because it doesn't change behavior that users are likely to be able to observe, it does not come with a release note.
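The difference between the two allocation paths can be sketched in standalone C++. This is only an illustration under stated assumptions: `recordingNewHandler` is a hypothetical stand-in for a process-wide out-of-memory handler (it records the call instead of crashing), and the request size is simply chosen to be unsatisfiable. It is not the code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Set by the handler below so callers can tell whether it ran.
static bool HandlerRan = false;

// Hypothetical stand-in for an installed out-of-memory handler. A real one
// might abort; this one records the call and uninstalls itself so that the
// allocation gives up and reports failure.
static void recordingNewHandler() {
  HandlerRan = true;
  std::set_new_handler(nullptr);
}

// A request size that cannot be satisfied. Reading it through a volatile
// keeps the compiler from reasoning about the constant.
static std::size_t impossibleSize() {
  volatile std::size_t N = SIZE_MAX - 4096;
  return N;
}

// Returns true if a failing non-throwing `operator new` ran the handler
// before returning a null pointer.
bool nothrowNewCallsHandler() {
  HandlerRan = false;
  std::new_handler Old = std::set_new_handler(recordingNewHandler);
  void *P = ::operator new(impossibleSize(), std::nothrow);
  std::set_new_handler(Old);
  bool Ran = HandlerRan && P == nullptr;
  ::operator delete(P);
  return Ran;
}

// Returns true if a failing `malloc` ran the handler.
bool mallocCallsHandler() {
  HandlerRan = false;
  std::new_handler Old = std::set_new_handler(recordingNewHandler);
  void *P = std::malloc(impossibleSize());
  std::set_new_handler(Old);
  std::free(P);
  return HandlerRan;
}
```

With the default allocation functions, the non-throwing form of `operator new` still consults the installed handler before giving up, whereas `malloc` never does. That is why a handler that crashes makes the non-throwing `new` path unrecoverable.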

This should fix failures in release builds.

Since we later possibly initialize the value by using operator-new, we need the default value to _not_ allocate memory.

) A variable `typeConverterOp` may be nullptr after the dynamic cast. There is a guard for this, but the variable is dereferenced while logging the error message. Found with static analysis.

We need to check if `GR64Cand` is a valid register before using it. A test is not needed since it's covered in llvm-test-suite. Fixes llvm#90954

…lvm#92111) Resolves llvm#90326

Add .md file documentation with all BOLT options to display it more conveniently.

…d pointee (llvm#92210) For a pointer `int *p`, the test currently crashes with either of the following maps: `map(p, p[:100])` or `map(p, p[1])`.
Currently the IR looks like:
`// &p, &p, sizeof(int), TARGET_PARAM | TO | FROM`
`// &p, p[0], 100*sizeof(float) TO | FROM`
The working IR is:
`// map(p, p[0:100]) to map(p[0:100])`
`// &p, &p[0], 100*sizeof(float), TARGET_PARAM | TO | FROM | PTR_AND_OBJ`
The change adds a new argument `AreBothBasePtrAndPteeMapped` to `generateInfoForComponentList` and uses it to skip the map for `map(p)`; when processing `map(p[:100])`, the map is generated with the right flag.

In a similar vein to llvm#90049, we currently model all of the effects of a vsetvli pseudo:
* VL and VTYPE are marked as defs
* VL-preserving x0,x0 vsetvlis don't get emitted until RISCVInsertVSETVLI, and when they are, they have implicit uses on VL
* Regular vector pseudos are fully modelled too: before RISCVInsertVSETVLI they can be moved between vsetvli pseudos because we will eventually insert vsetvlis to correct VL and VTYPE. Afterwards, they will have implicit uses on VL and VTYPE.

Since we model everything we can remove hasSideEffects=1. This gives us some improvements like sinking in vsetvli-insert-crossbb.ll. We need to update RISCVDeadRegisterDefinitions to keep handling vsetvli pseudos since it only operates on instructions with unmodelled side effects.

Improved modernize-use-constraints check by fixing a crash that occurred in some scenarios and excluded system headers from analysis. The problem was with DependentNameTypeLoc having a null type location as getQualifierLoc().getTypeLoc(). Fixes llvm#91872

…idelines-special-member-functions (llvm#71683) Improved cppcoreguidelines-special-member-functions check with a new option AllowImplicitlyDeletedCopyOrMove, which removes the requirement for explicit copy or move special member functions when they are already implicitly deleted. Closes llvm#62392

After llvm#85605 ([clang] Set correct FPOptions if attribute 'optnone' presents) the current FP options in Sema are saved while parsing a function, because Sema can modify them if optnone is present. However, they were saved too late, which caused failures in some cases when precompiled headers are used. This patch moves the saving earlier.

Commit 6c0665e (https://reviews.llvm.org/D45164) enabled certain constant expression evaluation for `MCObjectStreamer` at parse time (e.g. `.if` directives, see llvm/test/MC/AsmParser/assembler-expressions.s). `getUseAssemblerInfoForParsing` was added to make `clang -c` handling inline assembly similar to `MCAsmStreamer` (e.g. `llvm-mc -filetype=asm`), where such expression folding (related to `AttemptToFoldSymbolOffsetDifference`) is unavailable. I believe this is overly conservative. We can make some parse-time expression folding work for `clang -c` even if `clang -S` would still report an error, a MCAsmStreamer issue (we cannot print `.if` directives) that should not restrict the functionality of MCObjectStreamer.
```
% cat b.cc
asm(R"(
.pushsection .text,"ax"
.globl _start; _start: ret
.if . -_start == 1
  ret
.endif
.popsection
)");
% gcc -S b.cc && gcc -c b.cc
% clang -S -fno-integrated-as b.cc  # succeeded
% clang -c b.cc  # succeeded with this patch
% clang -S b.cc  # still failed
<inline asm>:4:5: error: expected absolute expression
    4 | .if . -_start == 1
      |     ^
1 error generated.
```
Close llvm#62520
Link: https://discourse.llvm.org/t/rfc-clang-assembly-object-equivalence-for-files-with-inline-assembly/78841
Pull Request: llvm#91082

When -ObjC is passed, the linker must force load any object files that contain special sections that store Objective-C / Swift information that is used at runtime. This should work regardless of whether input files are bitcode or native, but it was not working with bitcode. This is because the sections that identify an object file that should be loaded were inconsistent when dealing with a native file vs a bitcode file. In particular, bitcode files were not searched for `__TEXT,__swift` prefixed sections, while native files were. This means LLD wasn't loading certain bitcode files, forcing the user to add --force-load to their linker invocation for that archive. Co-authored-by: Nuri Amari <nuriamari@fb.com>

This amends dceaa0f because ASAN caught an issue where the allocation and deallocation were not properly paired: https://lab.llvm.org/buildbot/#/builders/239/builds/7001 Use malloc and free throughout this file to ensure that all kinds of memory buffers use the proper pairing.

…m#92276) Summary: Previously we had this `LIBOMPTARGET_ENABLED` variable which controlled including `libomptarget`. This is now redundant since it's controlled by `LLVM_ENABLE_RUNTIMES`. However, this had the extra effect of not building it when given unsupported targets. This was lost during the move to `offload`. This patch moves this logic back and makes the `offload` target just quit without doing anything if used on an unsupported architecture. llvm#91881 llvm#91819 --------- Co-authored-by: Sylvestre Ledru <sylvestre@debian.org>

If the `/usr/lib/...` path where compiler-rt is conventionally installed on OpenBSD does not exist, fall back to the regular logic to find it. This is a minimal change to allow OpenBSD cross compilation from a toolchain that doesn't adopt all of OpenBSD's monorepo's conventions.

By default, EmitCmp avoids cmpw with i16 immediates due to 66/67h length-changing prefixes causing stalls, instead extending the value to i32 and using a cmpl with a i32 immediate, unless it has the TuningFastImm16 flag or we're building for optsize/minsize. However, if we're loading the value for comparison, the performance costs of the decode stalls are likely to be exceeded by the impact of the load latency of the folded load, the shorter encoding and not needing an extra register to store the ext-load. This matches the behaviour of gcc and msvc. Fixes llvm#90355

Do not try to run lldb-server on localhost when using a remote target.

…mFile (llvm#92088) The tests `test_file_permissions` and `test_file_permissions_fallback` are disabled for Windows targets. These tests use MockGDBServerResponder and do not depend on the real target. They fail with a Windows host and a Linux target. Disable them for Windows hosts too.

Summary: Even though we moved `libomptarget` this is still present in `omp.h` and can't be removed.

…llvm#90108) Remove parameter `const List<Clause> &clauses` from functions that take construct queue. The clauses should now be accessed from the construct queue.

…#92281) Install the image to the remote target if necessary.

TestNetBSDCore.py contains 3 classes with the same test names, test_aarch64 and test_amd64. This causes conflicts because they share the same build directory. Add suffixes to avoid conflicts.

) This test caused a Python crash on a Windows x86_64 host with the exit code 0xC0000409 (STATUS_STACK_BUFFER_OVERRUN). Close the input stream before exit to avoid this crash.

…92208) On a few compilers (clang 11/12 for example [1]), the following does not result in a copy elision, and since `Error`'s copy constructor is deleted, results in a compile error:
```
Expected<Something> foobar() {
  ...
  if (Error E = aCallReturningError())
    return E;
  ...
}
```
Moving `E` would, conversely, result in the pessimizing-move warning on more recent clangs ("moving a local object in a return statement prevents copy elision"). We just need to make the `Expected` ctor taking an `Error` take it as an r-value reference.
[1] https://lab.llvm.org/buildbot/#/builders/54/builds/10505
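A minimal standalone sketch of the idea, using hypothetical `Err` and `Result` types rather than the real `llvm::Error` and `llvm::Expected`: when the wrapper's constructor takes the move-only error by r-value reference, a plain `return E;` can bind the local through the implicit move on return, without a copy and without an explicit `std::move`.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical move-only error type: copying is deleted, and it converts
// to true when it holds a failure message.
class Err {
  std::string Msg;

public:
  Err() = default;
  explicit Err(std::string M) : Msg(std::move(M)) {}
  Err(const Err &) = delete;
  Err &operator=(const Err &) = delete;
  Err(Err &&) = default;
  Err &operator=(Err &&) = default;

  explicit operator bool() const { return !Msg.empty(); }
  const std::string &message() const { return Msg; }
};

// Hypothetical value-or-error wrapper. The error constructor takes an
// r-value reference, which is the shape the commit message asks for.
template <typename T> class Result {
  bool HasValue;
  T Value{};
  Err Error;

public:
  Result(T V) : HasValue(true), Value(std::move(V)) {}
  Result(Err &&E) : HasValue(false), Error(std::move(E)) {}

  bool ok() const { return HasValue; }
  const T &value() const { return Value; }
  const Err &error() const { return Error; }
};

inline Err mayFail(bool Fail) { return Fail ? Err("boom") : Err(); }

inline Result<int> compute(bool Fail) {
  if (Err E = mayFail(Fail))
    return E; // implicit move into Result(Err &&)
  return 42;
}
```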

….cpp. NFC Most of these were to avoid undefined behavior if a shift left changed the sign of the result. I don't think it's possible to change the sign of the result here. We're shifting left by 12 after an arithmetic right shift by more than 12. The bits we are shifting out with the left shift are guaranteed to be sign bits. Also use SignExtend64<32> to force upper bits to all 1s instead of an Or. We know the value isUInt<32> && !isInt<32>, which means bit 31 is set.
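The equivalence claimed for the last step can be checked with a small standalone sketch. `signExtend64` below is a local helper written in the spirit of LLVM's `SignExtend64<B>`, and `orUpperBits` is a hypothetical name for the replaced Or; neither is copied from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Local sketch of a sign-extension helper: treat the low B bits of X as a
// signed value and widen it to 64 bits.
template <unsigned B> constexpr int64_t signExtend64(uint64_t X) {
  static_assert(B > 0 && B <= 64, "bit width out of range");
  return static_cast<int64_t>(X << (64 - B)) >> (64 - B);
}

// The alternative: set the upper 32 bits by hand. This only matches sign
// extension when X fits in 32 bits and bit 31 of X is set.
constexpr int64_t orUpperBits(uint64_t X) {
  return static_cast<int64_t>(X | 0xFFFFFFFF00000000ULL);
}
```

For any value that fits in 32 unsigned bits but not in 32 signed bits, bit 31 is set, so both helpers produce the same result. The sign-extension form also stays correct when bit 31 is clear, which the Or form does not.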

Add extra test coverage for loops with strided and invariant accesses to the same object.

This reverts commit 03c7458. One of the problems was addressed in llvm#92208. The other problem: `BitstreamReader` needed to be added to the list of link deps of `LLVMProfileData`.

…#909… (llvm#92302) …05)" This reverts commit 61da636. Update llvm#90905 was causing many tests to fail. See comments in llvm#90905.

…lvm#92028) Otherwise the build step fails due to missing dependencies.

This reverts commit c19f2c7. Broke the gcc-7 bot.

Before llvm#91440 a VSETVLIInfo would have had an IMPLICIT_DEF defining instruction, but now we look up a VNInfo which doesn't exist, which triggers an assertion failure. Mark these undef AVLs as AVLIsIgnored.

@yronglin

…m#92113) This patch covers the following Core issues: [CWG930](https://cplusplus.github.io/CWG/issues/930.html) "`alignof` with incomplete array type" [CWG1110](https://cplusplus.github.io/CWG/issues/1110.html) "Incomplete return type should be allowed in `decltype` operand" [CWG1340](https://cplusplus.github.io/CWG/issues/1340.html) "Complete type in member pointer expressions" [CWG1352](https://cplusplus.github.io/CWG/issues/1352.html) "Inconsistent class scope and completeness rules" [CWG1458](https://cplusplus.github.io/CWG/issues/1458.html) "Address of incomplete type vs `operator&()`" [CWG1824](https://cplusplus.github.io/CWG/issues/1824.html) "Completeness of return type vs point of instantiation" [CWG1832](https://cplusplus.github.io/CWG/issues/1832.html) "Casting to incomplete enumeration" [CWG2304](https://cplusplus.github.io/CWG/issues/2304.html) "Incomplete type vs overload resolution" [CWG2310](https://cplusplus.github.io/CWG/issues/2310.html) "Type completeness and derived-to-base pointer conversions" [CWG2430](https://cplusplus.github.io/CWG/issues/2430.html) "Completeness of return and parameter types of member functions" [CWG2512](https://cplusplus.github.io/CWG/issues/2512.html) "`typeid` and incomplete class types" [CWG2630](https://cplusplus.github.io/CWG/issues/2630.html) "Syntactic specification of class completeness" [CWG2718](https://cplusplus.github.io/CWG/issues/2718.html) "Type completeness for derived-to-base conversions" [CWG2857](https://cplusplus.github.io/CWG/issues/2857.html) "Argument-dependent lookup with incomplete class types" Current wording for CWG1110 came from [P0135R1](https://wg21.link/p0135R1) "Wording for guaranteed copy elision through simplified value categories". As a drive-by fix, I fixed incorrect status of CWG1815, test for which was added in llvm#87933. CC @yronglin

Summary: If the user does not have the selected backend enabled, we should still be able to build the LLVM-IR and distribute it. This patch adds logic to suppress tests if the backend can't build it, and removes a build flag that is only present in the NVPTX backend.

llvm#92195) Fixes llvm#92193.

This reverts commit 2c54bf4. Fixed gcc-7 issue.

…eArchString

…ons (llvm#92167) As discussed in the last sync-up call, because these profiles are not yet finalised they shouldn't be exposed to users unless they opt in to them (much like experimental extensions). We may later want to add a more specific flag, but reusing `-menable-experimental-extensions` solves the immediate problem. This is implemented using the new support for marking profiles as experimental added in llvm#91993 to move the unratified profiles to RISCVExperimentalProfile and making the necessary changes to logic in RISCVISAInfo to handle this.

…lysis. (llvm#91616) Assume in fewer places that the analysis is of a `FunctionDecl`, and initialize the `Environment` properly for `Stmt`s. Moves constructors for `Environment` to header to make it more obvious that there are only minor differences between them and very little initialization in the constructors. Tested with check-clang-tooling.

…#91334) This PR adds a string interface to `InternalizePass`' `MustPreserveGV` option, which is a callback function to indicate if a GV is not to be internalized. This is for use in LLVM.jl, the Julia wrapper for LLVM, which uses the C API and is thus required to use the PassBuilder string API for building NewPM pipelines.

They *are* still accepted by the HW but have a conservative effect. Leave them untouched since handling them would complicate the logic a bit, and developers who code to such a low level really need to revisit what they're doing anyway.

Summary: These are lots of random warnings due to inconsistent initialization or signedness.

…iles are sorted Just as we do for the arrays of extension names.

Re-apply llvm#81196, with a fix that handles the absence of llvm formatting: llvm@3ba650e b47d425f

…lls. These are allowed for Darwin and use the same ABI.

…s." (llvm#92321) Reverts llvm#91334 This broke the gcc7 build. I suspect the issue is a mismatch on user-defined move constructor on the return: `return PreservedGVs;` does not match the return type of the function.

This also fixes building ... on Linux. Seems like target_compatible_with isn't enough but you also need a manual tag.

This avoids visiting `co_await` or `co_yield` operand 5 times (it is repeated under transformed awaiter subexpression, and under `await_ready`, `await_suspend`, and `await_resume` generated call subexpressions).

The `ArrayInitLoopExpr` AST node has two occurrences of its as-written initializing expression in its subexpressions through a non-unique `OpaqueValueExpr`. This causes double-visiting of the initializing expression if not handled explicitly, as discussed in llvm#85837.

Most violations are stale and should be removed while a few can be adjusted. Reported at llvm#92238

Update the publisher and add a publish script that we can use from Github actions.

This option is a compilation action that parses a source file and performs semantic analysis on it, like the existing -fdebug-unparse option does. Its output, however, is preceded by the effective contents of all of the non-intrinsic modules on which it depends but does not define, transitively preceded by the closure of all of those modules' dependencies. The output from this option is therefore the analyzed parse tree for a source file encapsulated with all of its non-intrinsic module dependencies. This output may be useful for extracting code from large applications for use as an attachment to a bug report, or as input to a test case reduction tool for problem isolation.

…ication as used during partial ordering (llvm#91534) We do not deduce template arguments from the exception specification when determining the primary template of a function template specialization or when taking the address of a function template. Therefore, this patch changes `isAtLeastAsSpecializedAs` such that we do not mark template parameters in the exception specification as 'used' during partial ordering (per [temp.deduct.partial] p12) to prevent the following from being ambiguous:
```
template<typename T, typename U>
void f(U) noexcept(noexcept(T())); // #1

template<typename T>
void f(T*) noexcept; // #2

template<>
void f<int>(int*) noexcept; // currently ambiguous, selects #2 with this patch applied
```
Although there is no corresponding wording in the standard (see core issue filed here cplusplus/CWG#537), this seems to be the intended behavior given the definition of _deduction substitution loci_ in [temp.deduct.general] p7 (and EDG does the same thing).

ELEMENTAL internal subprograms are pure unless explicitly IMPURE.

…plate-template-args' (llvm#92324) This allows the warning to be disabled in isolation, as it helps when treating them as errors.

When a procedure is defined with a subprogram but never referenced in a compilation unit, it may not be characterized until lowering, and any errors in characterization then may crash the compiler. So always ensure that procedure definitions are characterizable in declaration checking. Fixes llvm#91845.

When a BIND(C) interface or subprogram has a dummy argument whose derived type is not BIND(C) but meets the constraints and requirements of a BIND(C) type, accept it with a warning.

A !$CUF KERNEL DO directive is allowed to have advisory REDUCE clauses similar to those in OpenACC and DO CONCURRENT. Parse and represent them. Semantic validation will follow.

When the MOLD= argument's type is polymorphic, the type of the result cannot be known at compilation time, so the call cannot be folded even when the SOURCE= is constant. Fixes llvm#92264.

The `--temporal-profile-max-trace-length=0` flag in the `llvm-profdata merge` command is used to remove traces from a profile. There was a bug where traces would not be cleared if the profile was already sampled. This patch fixes that.

This was an oversight in llvm#91859. Use the subblock ID mechanism that other users of the bitstream APIs (e.g. `BitstreamRemarkSerializer`) use.

The code paths for MTE enabled and disabled were interleaved, which made each path harder to read at both the source level and the assembly level. In this change, we move the parts where their logic differs into functions and make minor refactors to the code structure.

```
.irp foo,1
nop
.endr
nop
```
expands to an excess EOL between two nop lines. Remove the excess EOL.

…llvm#92327) CallStackIdConverter sets LastUnmappedId when a mapping failure occurs. Now, since toMemProfRecord takes an instance of CallStackIdConverter by value, namely as a std::function, the caller of toMemProfRecord never sees the mapping failure that occurs inside toMemProfRecord. The same problem applies to FrameIdConverter. The patch fixes the problem by passing FrameIdConverter and CallStackIdConverter by reference, namely as llvm::function_ref. While I am at it, this patch deletes the copy constructor and copy assignment operator to avoid accidental copies.
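The lost-state problem can be reproduced outside LLVM with a small sketch. `IdConverter` and the two `convert*` helpers are hypothetical names, and a plain reference parameter is used here as a stand-in for `llvm::function_ref`, which likewise refers to the caller's callable instead of copying it.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>

// Hypothetical converter: maps an ID to a value and remembers the last ID
// it failed to map.
struct IdConverter {
  const std::unordered_map<int, int> &Map;
  bool HasUnmapped = false;
  int LastUnmappedId = 0;

  explicit IdConverter(const std::unordered_map<int, int> &M) : Map(M) {}

  int operator()(int Id) {
    auto It = Map.find(Id);
    if (It == Map.end()) {
      HasUnmapped = true;
      LastUnmappedId = Id;
      return 0;
    }
    return It->second;
  }
};

// By value: std::function stores its own copy of the converter, so any
// failure the copy records is invisible to the caller.
inline int convertByValue(int Id, std::function<int(int)> Callback) {
  return Callback(Id);
}

// By reference: the caller's converter is the one that gets invoked, so
// the recorded failure is visible afterwards.
inline int convertByRef(int Id, IdConverter &Callback) { return Callback(Id); }
```

Deleting the copy constructor, as the commit does, turns the first call shape into a compile error instead of a silent loss of state.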

This reverts commit 03c53c6. This causes very large compile-time regressions in some cases, e.g. sqlite3 at O0 regresses by 5%.

…#90922) Make sure to merge the source location of the dominating instruction that is hoisted in a basic block in the CSEMIRBuilder in the legalizer pass. If this is not done, we can have an incorrect line table entry that makes the instruction pointer jump around. For example, the line table without this patch looks like:
```
Address            Line   Column File   ISA Discriminator OpIndex Flags
------------------ ------ ------ ------ --- ------------- ------- -------------
0x0000000000000000      0      0      1   0             0       0  is_stmt
0x0000000000000010     11     14      1   0             0       0  is_stmt prologue_end
0x0000000000000028     12      1      1   0             0       0  is_stmt
0x000000000000002c     12     15      1   0             0       0
0x000000000000004c     12     13      1   0             0       0
0x000000000000005c     13      1      1   0             0       0  is_stmt
0x0000000000000064     12     13      1   0             0       0  is_stmt
0x000000000000007c     13      7      1   0             0       0  is_stmt
0x00000000000000c8     13      1      1   0             0       0
0x00000000000000e8     13      1      1   0             0       0  epilogue_begin
0x00000000000000f8     13      1      1   0             0       0  end_sequence
```
The line table entry for 0x000000000000005c should be 0. After this patch, the line table looks like:
```
Address            Line   Column File   ISA Discriminator OpIndex Flags
------------------ ------ ------ ------ --- ------------- ------- -------------
0x0000000000000000      0      0      1   0             0       0  is_stmt
0x0000000000000010     11     14      1   0             0       0  is_stmt prologue_end
0x0000000000000028     12      1      1   0             0       0  is_stmt
0x000000000000002c     12     15      1   0             0       0
0x000000000000004c     12     13      1   0             0       0
0x000000000000005c      0      0      1   0             0       0
0x0000000000000064     12     13      1   0             0       0
0x000000000000007c     13      7      1   0             0       0  is_stmt
0x00000000000000c8     13      1      1   0             0       0
0x00000000000000e8     13      1      1   0             0       0  epilogue_begin
0x00000000000000f8     13      1      1   0             0       0  end_sequence
```

…lvm#92220) Co-authored-by: Ryosuke Niwa <rniwa@apple.com>

…lvm#67657) The znver3/znver4 scheduler models are outliers, specifying very large LoopMicroOpBufferSizes at 512, while typical values for other subtargets are on the order of ~50. Even if this information is micro-architecturally correct (*), this does not mean that we want to runtime unroll all loops to a size that completely fills the loop buffer. Unless this is the single hot loop in the entire application, the massive code size increase will bust the micro-op and instruction caches. Protect against this by clamping to the default PartialThreshold of 150, which is the same as the default full-unroll threshold and half the aggressive full-unroll threshold. Allowing more partial unrolling than full unrolling certainly does not make sense. (*) I strongly doubt that this is actually correct -- I believe this may derive from an incorrect reading of Agner Fog's micro-architecture guide. The number 4096 that was originally used here is the size of the general micro-op cache, not that of a loop buffer. A separate loop buffer is not listed for the Zen microarchitecture. Comparing this to the listing for Skylake, it has a 1536 micro-op buffer, but only a 64 micro-op loopback buffer, with a note that it's rarely fully utilized. Our scheduling model specifies LoopMicroOpBufferSize of 50 in that case.
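The clamping itself is a one-line computation. The sketch below only illustrates the arithmetic described above; the function name is hypothetical and the constant is the default value quoted in the text, not a reference to the actual code in the unroller.

```cpp
#include <algorithm>
#include <cassert>

// Default partial-unroll threshold, as quoted in the text above.
constexpr unsigned DefaultPartialThreshold = 150;

// Clamp the threshold derived from the scheduler model's loop buffer size
// so that an outlier such as 512 cannot exceed the default.
inline unsigned clampedPartialThreshold(unsigned LoopMicroOpBufferSize) {
  return std::min(LoopMicroOpBufferSize, DefaultPartialThreshold);
}
```

Under this sketch a subtarget reporting 512 is treated as 150, while a subtarget reporting 50 is unaffected.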

Summary: A previous patch moved the code here and accidentally overwrote the include path that the LSP interface used. This caused incorrect errors when using clangd with the offload project. This patch removes the unnecessary header and makes sure we include the correct folder.

…ors (llvm#90500)" (llvm#92283) This patch reapplies llvm#90500, addressing a bug which caused binary operators with dependent operands to be incorrectly rebuilt by `TreeTransform`.

… before consuming it (llvm#92218) Close llvm#91418. Since we load the variable's initializers lazily, it would be problematic if the initializers depend on each other. For example:
```
SomeType a = ...;
SomeType b = a;
```
Previously, when we loaded variable `b`, we needed to load its initializer, and for that we needed to load `a`. We could only mark `b` as loaded after loading `a`, so `a` was always initialized before `b`. However, this is no longer true once initializers are loaded lazily. So here we try to load the initializers of static variables to make sure they are passed to the code generator in order. If we read anything interesting, we consume it before emitting the current declaration.

…, i) (llvm#89017) This patch adds a peephole to AArch64PostSelectOptimize for codegen that is caused by RegBankSelect limiting G_EXTRACT_VECTOR_ELT only to FPR registers in both the input and output registers. This can cause a generation of COPY from FPR to GPR when, for example, the output register of the G_EXTRACT_VECTOR_ELT is used in a branch condition. This was noticed when looking at codegen differences between SDAG and GI for the s1279 kernel in the TSVC benchmark.

…ts (NFC) Adding these for NVPTX because for AMDGPU the problematic -1 case does not get reordered in the first place.

We were checking the index of GEP twice, instead of checking both GEP and PtrGEP.

C99-C23 6.5.2.5 says: The type name shall specify an object type or an array of unknown size, but not a variable length array type. Fixes llvm#89835.

Use `os.devnull` instead of `/dev/null`.

…#91579) This patch adds nsw flag to the increment of do-variables when a new option is enabled. NOTE 11.10 in the Fortran 2018 standard says they never overflow. See also the discussion in llvm#74709 and the following discourse post. https://discourse.llvm.org/t/rfc-add-nsw-flags-to-arithmetic-integer-operations-using-the-option-fno-wrapv/77584/5

The error checking is only for .macro directives. Move it to the .macro parser to remove one parameter.

…m#90578) This patch adds support for the GNU extension intrinsic ETIME (llvm#84205). Some usage info and an example have been added to `flang/docs/Intrinsics.md`. The patch contains both the lowering and the runtime code and works on both Windows and Linux.

| System  | Implementation  |
|---------|-----------------|
| Windows | GetProcessTimes |
| Linux   | times           |

…andled by LegalizeVectorOps. (llvm#92332) The expand code is present, but we were missing the type query code so the nodes would be ignored until LegalizeDAG.

…erleavedMemoryOpCost. (llvm#91825) isLegalInterleavedAccessType expects the subvector type, but getInterleavedMemoryOpCost is called with the full vector type. So we need to divide by Factor.

rdar://127846581

…ter (llvm#92303) As noted in llvm#91440 (comment), if the pass pipeline stops early because of -stop-after, any allocated passes added with insertPass will not be freed if they haven't already been added. This was showing up as a failure on the address sanitizer buildbots. We can fix it by passing the pass ID instead, so that allocation is deferred.

I built it and confirmed this fixes the issue locally. Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>

Currently the irdl dialect page has no content beyond the header. By referring to the Ops.td in the CMake config, it pulls in all the types, attributes, etc., so that the doc generation can include them all in the page. Rendered locally to confirm it fixes the issue ![image](https://github.com/llvm/llvm-project/assets/2467754/8758f324-6bc3-4f0e-8fa9-8962cdb0177f) Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>

…ariables before consuming it (llvm#92218)" This reverts commit 3a4c1b9. This breaks a bot on clang-s390x-linux

This field is present in LLVM, but was missing from the MLIR wrapper type. This addition allows MLIR languages to add proper DWARF info for GPU programs.

…ion" (llvm#92354) Reverts llvm#90578 This broke the premerge linux buildbot.

Wrongly removed in 45cc6bd.

In .macro, \+ expands to the per-macro invocation count. https://sourceware.org/pipermail/binutils/2024-May/134009.html \+ counts from 0 for .irp/.irpc/.rept. Note: we currently print \q for `.print "\q"` while gas doesn't. This patch does not change this behavior.
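The counting rule for loop directives can be illustrated with a toy model. This is not the MC assembler's implementation; `expand_rept` is a hypothetical helper that substitutes the zero-based iteration index for each `\+` in a `.rept`-style body.

```python
def expand_rept(count, body):
    """Expand a .rept-style body, replacing each \\+ with the 0-based iteration index."""
    return [body.replace("\\+", str(i)) for i in range(count)]

# Each repetition sees its own index, starting from 0.
lines = expand_rept(3, ".byte \\+")
```

Under this model, `lines` holds one expanded copy of the body per repetition, numbered from 0.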

If there is only one non-terminator operation in the update region then the update operation can be found and we can try to generate an atomicrmw instruction. Otherwise use the cmpxchg loop. Fixes llvm#91929

Support `R_AARCH64_AUTH_RELATIVE` relocation compression as described in https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst#relocation-compression

Addresses old TODO about the exp10 intrinsic not existing.

) Unsupported ops on tile types can become dead after `-convert-arm-sme-to-llvm` resulting in incorrect results. Verify such operations don't exist post-conversion and fail if they do. Based on discussion from https://discourse.llvm.org/t/on-improving-arm-sme-lowering-resilience-in-mlir/78543

…#92288)

Commits on May 14, 2024

Commits on May 15, 2024

Commits on May 16, 2024

Commits on Aug 23, 2024