[FXML-3548] bump llvm to d13da154a7c7eff77df8686b2de1cfdfa7cc7029 #84

…lvm#66571) Fold `(saddo (not a), 1)` to `(ssubo 0, a)` and `(saddo_carry (not a), b, c)` to `(ssubo_carry b, a, !c)`. Proof: https://alive2.llvm.org/ce/z/Lj49YM This is the same as https://reviews.llvm.org/D46505 and https://reviews.llvm.org/D59208, but for signed opcodes.
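
The first fold can be sanity-checked exhaustively at a small bit width. The sketch below is not LLVM code: the helper names `to_signed`, `saddo`, `ssubo` and `fold_holds` are hypothetical, and the 8-bit width is an assumption chosen to keep the loop short.

```python
BITS = 8  # assumed width; small so the exhaustive check stays fast


def to_signed(x, bits=BITS):
    """Interpret the low `bits` bits of x as a two's-complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def saddo(a, b, bits=BITS):
    """Signed add with overflow: (wrapped result, overflow flag)."""
    exact = a + b
    wrapped = to_signed(exact, bits)
    return wrapped, wrapped != exact


def ssubo(a, b, bits=BITS):
    """Signed subtract with overflow: (wrapped result, overflow flag)."""
    exact = a - b
    wrapped = to_signed(exact, bits)
    return wrapped, wrapped != exact


def fold_holds(bits=BITS):
    """Check (saddo (not a), 1) == (ssubo 0, a) for every signed value."""
    lo, hi = -(1 << (bits - 1)), 1 << (bits - 1)
    return all(
        saddo(to_signed(~a, bits), 1, bits) == ssubo(0, a, bits)
        for a in range(lo, hi)
    )
```

Both the wrapped value and the overflow flag are compared, so the check covers the one input (the minimum signed value) where negation overflows.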

If the R_AARCH64_CALL26 relocation is against a symbol that has a lower address, then encodeValueAArch64 will return a wrong value. Reviewed By: Kepontry, yota9 Differential Revision: https://reviews.llvm.org/D159513

In the place it used to be linked from.

…ack-move optimization (llvm#66618) Stack-move optimization, the optimization that merges the src and dest allocas of a full-size copy, replaces all uses of the dest alloca with the src alloca. For safety, we need to check that all uses of the dest alloca are dominated by the src alloca before they are replaced. This PR adds that check. Fixes llvm#65225

…ndly. (llvm#65177) See https://wg21.link/LWG3545 for background and details. Differential Revision: https://reviews.llvm.org/D158922

Reviewed By: Chia-hungDuan Differential Revision: https://reviews.llvm.org/D159449

Fixes llvm#66594

This bitcast is no longer necessary with opaque pointers. This results in some annoying variable name changes in tests.

It is possible for a derived type extending a type with private components to define components with the same name as the private components. This was not properly handled by lowering, where several fir.record type component names could end up being the same, leading to bad generated code (only the first component was accessed via fir.field_index). This patch handles the situation by adding the derived type mangled name to private components.

Summary: This patch copies a config file for the GPU similar to the baremetal/embedded implementation. This will configure the implementations of functions like `sprintf` and `snprintf` to be compiled into more simple versions that can be run on the GPU. These functions cannot be enabled yet as Vararg support hasn't landed, but it will be used then.

llvm#66387) …(trunci) expansion This revision adds a rewrite for sequences of vector `bitcast(trunci)` to use a more efficient sequence of vector operations comprising `shuffle` and `bitwise` ops. Such patterns appear naturally when writing quantization / dequantization functionality with the vector dialect. The rewrite performs a simple enumeration of each of the bits in the result vector and determines its provenance in the pre-trunci vector. The enumeration is used to generate the proper sequence of `shuffle`, `andi`, `ori` followed by an optional final `trunci`/`extui`. The rewrite currently only applies to 1-D non-scalable vectors and bails out if the final vector element type is not a multiple of 8. This is a failsafe heuristic determined empirically: if the resulting type is not an even number of bytes, further complexities arise that are not improved by this pattern: the heavy lifting still needs to be done by LLVM.
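
The bit enumeration described above can be illustrated with plain integers. This is a sketch only, not the MLIR rewrite: `bit_provenance` and `bitcast_of_trunc` are hypothetical names, and the little-endian packing (truncated element 0 in the lowest bits) is an assumption of the example.

```python
def bit_provenance(num_elems, narrow_bits, result_bits):
    """Map every bit of the bitcast result back to (source element, source bit).

    Assumes little-endian packing: truncated element 0 occupies the lowest
    bits of the flattened vector.
    """
    total_bits = num_elems * narrow_bits
    if total_bits % result_bits != 0:
        raise ValueError("result element width must divide the total bit count")
    table = []
    for flat in range(total_bits):
        dst = (flat // result_bits, flat % result_bits)  # where the bit lands
        src = (flat // narrow_bits, flat % narrow_bits)  # where it comes from
        table.append((dst, src))
    return table


def bitcast_of_trunc(src, narrow_bits, result_bits):
    """Reference semantics: truncate each element, then reinterpret the bits."""
    out = [0] * (len(src) * narrow_bits // result_bits)
    table = bit_provenance(len(src), narrow_bits, result_bits)
    for (dst_elem, dst_bit), (src_elem, src_bit) in table:
        bit = (src[src_elem] >> src_bit) & 1
        out[dst_elem] |= bit << dst_bit
    return out
```

For example, four elements truncated to 4 bits and bitcast to 8-bit elements pack pairwise: `bitcast_of_trunc([0x1, 0x2, 0x3, 0x4], 4, 8)` gives `[0x21, 0x43]`. A real rewrite would group the table entries into shuffle, mask and or steps rather than moving one bit at a time.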

This patch moves the group of OpenMP MLIR passes used after lowering of Fortran to MLIR into a pipeline to be shared by `flang-new` and `bbc`. Currently, the `bbc` tool does not produce the expected FIR for offloading-enabled OpenMP codes due to not running these passes. Unit tests exercising these passes are updated to check `bbc` output as well.

…tant (llvm#65905) This patch simplifies the pattern `icmp X and/or C1, X and/or C2` when one constant mask is the subset of the other. If `C1 & C2 == C1`, `A = X and/or C1`, `B = X and/or C2`, we can do the following folds: `icmp ule A, B -> true` `icmp ugt A, B -> false` We can apply similar folds for signed predicates when `C1` and `C2` are the same sign: `icmp sle A, B -> true` `icmp sgt A, B -> false` Alive2: https://alive2.llvm.org/ce/z/Q4ekP5 Fixes llvm#65833.
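
A brute-force check of these folds at a small width makes the subset condition concrete. The sketch assumes 6-bit values to keep the search small; `to_signed`, `unsigned_fold_holds` and `signed_fold_holds` are hypothetical helper names, and only the and/and and or/or pairings are checked.

```python
BITS = 6  # assumed width; 3**6 subset pairs times 64 values of X
SIGN = 1 << (BITS - 1)


def to_signed(x):
    """Interpret a BITS-wide unsigned value as two's complement."""
    return x - (1 << BITS) if x & SIGN else x


def subset_pairs():
    """Yield all (C1, C2) with C1 & C2 == C1."""
    for c2 in range(1 << BITS):
        for c1 in range(1 << BITS):
            if c1 & c2 == c1:
                yield c1, c2


def unsigned_fold_holds():
    """icmp ule (X op C1), (X op C2) is always true for op in {and, or}."""
    return all(
        (x & c1) <= (x & c2) and (x | c1) <= (x | c2)
        for c1, c2 in subset_pairs()
        for x in range(1 << BITS)
    )


def signed_fold_holds():
    """The sle variant, restricted to C1 and C2 with the same sign bit."""
    return all(
        to_signed(x & c1) <= to_signed(x & c2)
        and to_signed(x | c1) <= to_signed(x | c2)
        for c1, c2 in subset_pairs()
        if (c1 & SIGN) == (c2 & SIGN)
        for x in range(1 << BITS)
    )
```

The same-sign restriction matters: with `C1 = 0` and `C2` equal to the sign bit, `X & C2` can be negative while `X & C1` is zero, so the signed comparison fails.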

…tination (llvm#65468) This revision adds support for empty tensor elimination to "bufferization.materialize_in_destination" by implementing the `SubsetInsertionOpInterface`. Furthermore, the One-Shot Bufferize conflict detection is improved for "bufferization.materialize_in_destination".

…vm#66385) This gets rid of the separate parameter enable_modules_lsv in favor of adding a named option to the enable_modules parameter. The patch also removes the getModuleFlag helper, which was just a really complicated way of hardcoding "none".

…NFC) /data/llvm-project/mlir/lib/Dialect/Vector/Transforms/VectorEmulateNarrowType.cpp:229:21: error: unused function 'operator<<' [-Werror,-Wunused-function] static raw_ostream &operator<<(raw_ostream &os, ^ 1 error generated.

… undef read (llvm#66211) Update LiveIntervals after rewriting: %reg = INSERT_SUBREG undef %reg, %subreg, subidx to: undef %reg:subidx = COPY %subreg D113044 implemented this for the non-undef case.

Promotion can add/remove arguments. We need to update the indices in the allocsize attribute accordingly. Fixes llvm#66103.
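
The index bookkeeping can be sketched as follows. This is an illustration under assumptions, not the pass's code: `build_index_map` and `remap_allocsize` are hypothetical names, the `expansion` list is an invented way to describe how each old argument was rewritten, and dropping the attribute when a referenced argument disappears is a choice made for the example.

```python
def build_index_map(expansion):
    """Map old argument indices to new ones.

    expansion[i] is how many new arguments replace old argument i
    (0 = removed, 1 = kept as-is, n > 1 = promoted into n arguments).
    Only arguments that are kept as-is get an entry in the map.
    """
    mapping = {}
    new_index = 0
    for old_index, count in enumerate(expansion):
        if count == 1:
            mapping[old_index] = new_index
        new_index += count
    return mapping


def remap_allocsize(allocsize, expansion):
    """Rewrite an (elem_size_arg, num_elems_arg_or_None) pair.

    Returns None when a referenced argument no longer exists unchanged,
    meaning the attribute would have to be dropped (an assumption here).
    """
    elem_size_arg, num_elems_arg = allocsize
    mapping = build_index_map(expansion)
    if elem_size_arg not in mapping:
        return None
    if num_elems_arg is None:
        return (mapping[elem_size_arg], None)
    if num_elems_arg not in mapping:
        return None
    return (mapping[elem_size_arg], mapping[num_elems_arg])
```

For instance, if argument 0 is promoted into two scalars, an `allocsize(1, 2)` pair has to become `allocsize(2, 3)`: `remap_allocsize((1, 2), [2, 1, 1])` returns `(2, 3)`.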

… being treated as titles

…#66206) Thanks to Giuseppe D'Angelo for pointing this out on the cpplang Slack! The example implementation in https://eel.is/c++draft/string.view.comparison#example-1 was necessary when it was written, in C++17, but in C++20 we don't need that complexity anymore, because of the reversed candidates that are synthesized by the compiler.

…) -> (assertzext x) fold. We'll need to generalize this fold to check for any zero upperbits to address some of the D155472 regressions, but this exposes a number of issues. For now, just use the general MaskedValueIsZero test instead of the assertzext.

…pass options, remove bufferization.escape attribute (llvm#66619) This commit removes the deallocation capabilities of one-shot-bufferization. One-shot-bufferization should never deallocate any memrefs as this should be entirely handled by the ownership-based-buffer-deallocation pass going forward. This means the `allow-return-allocs` pass option will default to true now, `create-deallocs` defaults to false and they, as well as the escape attribute indicating whether a memref escapes the current region, will be removed. A new `allow-return-allocs-from-loops` option is added as a temporary workaround for some bufferization limitations.

…ocale "cs_CZ.ISO8859-2" Reviewers: David Tenty, Mark de Wever Differential Revision: https://reviews.llvm.org/D126407

…llvm#66085) Fixes llvm#65570

…m#66653) Summary: The GPU build has a lot of magic around how we package the output. Generally, the GPU needs to exist as a secondary fatbinary image for offloading languages. This is because offloading languages pretend like offloading to an accelerator is a single file. This then needs to be put into a single file to make it mesh with the existing build infrastructure. To work with this, the `libc` makes an installed version of the library that simply embeds the GPU code into an empty stub file. This wasn't being updated correctly, which led to the installed `libc` static library not being updated correctly when the underlying file was changed. The previous behaviour only updated when the entrypoint itself was modified, but not any of its headers. By adding a dependency on the actual *object* file we should now capture the regular CMake semantics.

…t. NFC (llvm#66199) Some VP intrinsic definitions were missing the VP_PROPERTY_FUNCTIONAL_INTRINSIC property. This patch fills them in, and adds a static_assert that all VP intrinsics have an equivalent opcode or intrinsic defined so we don't forget them in future. Some VP intrinsics don't have an equivalent, namely merge and strided load/store. For those, a new property was added to mark that they don't have a non-VP equivalent. This adds a helper method to get the ID of the functionally equivalent intrinsic, similar to the existing getFunctionalOpcodeForVP and getConstrainedIntrinsicIDForVP method.

…nts (llvm#66238) The POINTER= and TARGET= arguments to the intrinsic function ASSOCIATED() can be the results of references to functions that return object pointers or procedure pointers. NULL() was working well but not program-defined pointer-valued functions. Correct the validation of ASSOCIATED() and extend the infrastructure used to detect and characterize procedures and pointers.

…riable fragment size" This reverts commit 47324cf. This exposed incorrect debuginfo in rustc. Revert the verification until this has been fixed.

This is a fix for a subset of legalization problems around 64 bit indices on rv32 targets. For RV32+V, we were using the wrong mask type for the manual truncation lowering for fixed length vectors. Instead, just use the generic TRUNCATE node, and let it be lowered as needed. Note that legalization is still broken for rv32+zve32. That appears to be a different issue.

…66239) The runtime implementation for INQUIRE(EXIST=x) is returning .TRUE. for all non-existent units, which is incorrect for valid unit numbers.

Use the recently introduced llvm.mlir.zero operation for values with LLVM target extension type. Replaces the previous workaround that uses a single zero-valued integer attribute constant operation. Signed-off-by: Lukas Sommer <lukas.sommer@codeplay.com>

Fold references to the (relatively new) intrinsic function NORM2 at compilation time when the argument(s) are all constants. (Getting this done right involved some changes to the API of the accumulator function objects used by the DoReduction<> template, which rippled through some other reduction function folding code.)

… for ISD::PREFETCH. (llvm#66601) The intrinsic uses ImmArg so TargetConstant would be consistent with how other intrinsics are handled. This hides the constants from type legalization so we can remove the promotion support. isel patterns are updated accordingly.

llvm#66563) This is a minor step towards deprecating bufferization.alloc_tensor(). It replaces the examples with tensor.empty() and adjusts the underlying rewriting logic to prepare for this upcoming change.

When running multiple shards in parallel, one shard might write to the cache while another one is reading this cache. Instead of updating the file in place, write to a temporary file and swap the cache file using os.replace(). This is an atomic operation and means shards will either see the old state or the new one.
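
A minimal sketch of the write-then-swap pattern, assuming a JSON cache file. The function names `write_cache_atomically` and `read_cache` are hypothetical; only `os.replace()` comes from the commit message. The temporary file is created in the same directory as the cache so the rename stays on one filesystem.

```python
import json
import os
import tempfile


def write_cache_atomically(path, data):
    """Write `data` as JSON to `path` without exposing a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so os.replace() does not
    # have to cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp_file:
            json.dump(data, tmp_file)
        # Readers see either the old file or the new one, never a mix.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def read_cache(path):
    """Load the cache, returning an empty dict if it does not exist yet."""
    try:
        with open(path) as cache_file:
            return json.load(cache_file)
    except FileNotFoundError:
        return {}
```

Two shards that write at the same time can still overwrite each other's update (last writer wins); the pattern only guarantees that no reader ever parses a partially written file.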

…a large constant. On the first split we create two i32 trunc stores and a srl to shift the high part down. The srl gets constant folded to produce a new i32 constant, but the truncstore for the low store still uses the original constant. This original constant then gets converted to a constant pool before we revisit the stores to further split them. The constant pool prevents further constant folding of the additional srls. After legalization is done, we run DAGCombiner and get some constant folding of srl via computeKnownBits which can peek through the constant pool load. This can create new constants that also need a constant pool.

…::expandUnalignedStore. If the SRL for Hi constant folds, but we don't remove those bits from the Lo, we can end up with strange constant folding through DAGCombine later. I've only seen this with constants being lowered to constant pools during lowering on RISC-V.

The Z#_HI register definitions were created during the very early SVE enablement work and before the SVE calling convention was locked in. As they look entirely unused, they need to go.

The rewriting of the extension intrinsic function SIZEOF was producing results that would reference symbols that were not available in the current scope, leading to crashes in lowering. The symbols could be function result variables, for SIZEOF(func()), or bare derived type component names, for SIZEOF(array(n)%component). Fixing this without regressing on a current test case involved careful threading of some state through the TypeAndShape characterization code and the shape/bounds analyzer, and some clean-up was done along the way.

When the old extension of D debug lines in fixed form source is enabled, recognize continuation lines that begin with a D. (No test is added, since the community's driver doesn't support the GNU Fortran option -fd-lines-as-code.)

… ops (llvm#65777) When checking to see if our index expressions can be converted into strided operations, we previously gave up if the index type wasn't an exact match for the intptrty for the address. Per gep semantics, this mismatch implies a sext or trunc cast to the respective index type. For constants, go ahead and evaluate that cast instead of giving up. Note that the motivation of this is mostly test cleanup. We canonicalize at IR such that the gep index will match the intptrty. This is mostly useful so that we can write both RV32 and RV64 tests from the same source. It's also helpful in preventing confusion - I've stumbled across this at least four times now and wasted time each time. Note: The test change for scatters unit stride cases contains a minor regression for rv32 and 64 bit indices. This is an artifact of the order in which changes are landing. This will be addressed in a near future change for all configurations.

… creating BinaryCoverageReader Only InstrProfSymtab is needed to retrieve function names when debug info correlation is enabled.

…s enabled (llvm#66434)" This reverts commit 003bcad. ARM folks say it regresses some of their benchmarks: llvm#66434 (comment)

…ing zeroes (llvm#66548) As suggested by Greg in llvm#66534, I'm adding a setting at the Target level that controls whether to show leading zeroes in hex ValueObject values. This has the benefit of reducing the amount of characters displayed in certain interfaces, like VSCode.
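
The effect of such a setting on a displayed value can be sketched like this. The function `format_hex` and its `show_leading_zeroes` parameter are hypothetical stand-ins; the actual lldb setting name is not given in the text above.

```python
def format_hex(value, byte_size, show_leading_zeroes=True):
    """Format an unsigned value of `byte_size` bytes as hexadecimal.

    With leading zeroes the output is padded to the full width of the
    type; without them only the significant digits are shown.
    """
    value &= (1 << (8 * byte_size)) - 1  # keep only the bits of the type
    if show_leading_zeroes:
        return "0x{:0{width}x}".format(value, width=2 * byte_size)
    return "0x{:x}".format(value)
```

For a 4-byte value of 31, the padded form is `0x0000001f` and the compact form is `0x1f`.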

This fixes a merge conflict in llvm#66563

llvm#66244) …lues Emit an error at runtime when a list-directed input value is not followed by a value separator or end of record. Previously, the runtime I/O library was consuming as many input characters as were valid for the type of the value, and leaving any remaining characters for the next input edit, if any.

This patch adds `host_assoc` attribute for operations that implement FortranVariableInterface (e.g. `hlfir.declare`). The attribute is used by the alias analysis to make better conclusions about memory overlap. For example, a dummy argument of an inner subroutine and a host's variable used inside the inner subroutine cannot refer to the same object (if the dummy argument does not satisfy exceptions in F2018 15.5.2.13). This closes a performance gap between HLFIR optimization pipeline and FIR ArrayValueCopy for Polyhedron/nf.

llvm#66648) …cast) expansion This revision adds a rewrite for sequences of vector `ext(bitcast)` to use a more efficient sequence of vector operations comprising `shuffle` and `bitwise` ops. Such patterns appear naturally when writing quantization / dequantization functionality with the vector dialect. The rewrite performs a simple enumeration of each of the bits in the result vector and determines its provenance in the source vector. The enumeration is used to generate the proper sequence of `shuffle`, `andi`, `ori` with shifts. The rewrite currently only applies to 1-D non-scalable vectors and bails out if the final vector element type is not a multiple of 8. This is a failsafe heuristic determined empirically: if the resulting type is not an even number of bytes, further complexities arise that are not improved by this pattern: the heavy lifting still needs to be done by LLVM.

…uring phi-of-ops optimization (llvm#66314) - Test for future commit in NewGVN - [NewGVN] Set parent to the temporal instructions that are generated during phi-of-ops optimization

When a COMMON block object has a derived type that is part of a set of mutually-dependent types with other members, the compiler loops. Fixes llvm#65572.

…65272) The method implementations remain in the .cpp file; explicit instantiations have been added for these two types. makeMatrix has been duplicated to makeIntMatrix and makeFracMatrix.

This patch fixes a warning: flang/runtime/edit-input.cpp:27:20: error: unused function 'IsListDirectedFieldComplete' [-Werror,-Wunused-function]

This test was reported as failing by https://lab.llvm.org/buildbot/#/builders/68/builds/60172. The fix is very simple. We need to invoke the correct setting.

…on (llvm#66585) The helper function `__pair_like_explicit_wknd` is only SFINAE-ed with `tuple_size<remove_cvref_t<_PairLike>>::value == 2`, but its function body assumes `std::get` being valid. Fixes llvm#65620

llvm#65272)" This reverts commit efca035. Reverting due to windows build bot failure: https://lab.llvm.org/buildbot/#/builders/13/builds/40242/steps/6/logs/stdio

llvm#66248) …e pointer The procedure characterization package correctly characterizes the result of a reference to a function that returns a procedure pointer. In the event of a result that is a pointer to a function that itself is a procedure pointer, the code in pointer assignment semantics checking was mistakenly using that result's procedure characteristics as the characteristics of the original function reference. This is just wrong; delete it.

Memoize SymbolRef::getAddress() for sorting symbol table entries by their address. Saves about 10 seconds of processing time on large binaries with over 2 million symbols. NFCI. Reviewed By: jobnoorman, Amir Differential Revision: https://reviews.llvm.org/D159524
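
The idea of computing each address once before sorting can be sketched as below. This is not BOLT code: the `Symbol` class, its `lookups` counter and both sort helpers are hypothetical, and the "expensive" lookup is simulated by a counter.

```python
import functools


class Symbol:
    """Toy symbol whose address lookup is assumed to be expensive."""

    def __init__(self, name, address):
        self.name = name
        self._address = address
        self.lookups = 0  # counts how often the address was queried

    def get_address(self):
        self.lookups += 1
        return self._address


def sort_naive(symbols):
    """Comparator-based sort: queries both addresses on every comparison."""
    def compare(a, b):
        return (a.get_address() > b.get_address()) - (
            a.get_address() < b.get_address()
        )
    return sorted(symbols, key=functools.cmp_to_key(compare))


def sort_memoized(symbols):
    """Query each address exactly once, then sort on the cached values."""
    cached = [(sym.get_address(), index, sym) for index, sym in enumerate(symbols)]
    cached.sort(key=lambda entry: entry[:2])  # index keeps the sort stable
    return [sym for _, _, sym in cached]
```

With the memoized variant every symbol is queried once regardless of input size, whereas the comparator variant performs several lookups per comparison.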

Minor refactoring to delete redundant code. Reviewed By: jobnoorman Differential Revision: https://reviews.llvm.org/D159525

…6250) The warning message about a derived type not having a FINAL subroutine for a particular object's rank should not issue for an assumed-rank dummy argument.

…nd/OpenMP/OMP.td`

Doing so allows the use of smaller constants overall, and may allow (for some small vector constants) avoiding the constant pool entirely. This can result in extra VTYPE toggles if we get unlucky. This was reviewed under PR llvm#66405.

…lvm#66251) A NULL() pointer is an acceptable actual argument for association with an (absent) optional allocatable dummy argument. Semantics was unconditionally emitting an error that the actual argument is not allocatable.

Reviewed By: #libc, var-const Differential Revision: https://reviews.llvm.org/D150831

The Fortran standards require (F'2023 C745) that a derived type with the SEQUENCE attribute have at least one component. No Fortran compiler actually enforces this constraint. Accept this usage with a warning.

Projects like libc++ include their own cache files, and it's convenient to just be able to reuse those cache files as part of a runtimes build instead of having to duplicate the settings inside a special runtimes cache file (which will inevitably get out of sync). I'm not completely happy about overloading the existing argument passing behavior based on the argument name, but I also couldn't think of a good alternative. Suggestions are welcome. Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D158791

https://reviews.llvm.org/D71154 prevented Clang from searching for libc++ headers installed alongside the driver when targeting Android. The motivation was the NDK's use of a different libc++ inline namespace (`__ndk1` instead of the standard `__1`), which made regular libc++ headers incompatible with the NDK's libc++ library. Since then, libc++ has gained the ability to install its `__config_site` header (which controls the inline namespace, among other things) to a per-target include directory, which enables per-target customizations. If this directory is present, the user has expressly built libc++ for Android, and we should use those headers. The motivation is that, with the current setup, if a user builds their own libc++ for Android, they'll use the library they built themselves but the NDK's headers instead of their own, which is surprising at best and can cause all sorts of problems (e.g. if you built your own libc++ with a different ABI configuration). It's important to match the headers and libraries in that scenario, and checking for an Android per-target include directory lets us do so without regressing the original scenario which https://reviews.llvm.org/D71154 was addressing. While I'm here, switch to using sys::path::append instead of slashes directly, to get system path separators on Windows, which is consistent with how library paths are constructed (and that consistency will be important in a follow-up, where we use a common search function for the include and library path construction). (As an aside, one of the motivations for https://reviews.llvm.org/D71154 was to support targeting both Android and Apple platforms, which expected libc++ headers to be provided by the toolchain at the time. Apple has since switched to including libc++ headers in the platform SDK instead of in the toolchain, so that specific motivation no longer applies either.) Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D159292

Searching for target-specific standard library header and library paths should perform fallback searches for targets, the same way searching for the runtime libraries does. It's important for the header and library searches to be consistent, otherwise we could end up using mismatched headers and libraries. (See also https://reviews.llvm.org/D146664.) Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D159293

Semantic checking of COMMON blocks and EQUIVALENCE sets has an assumption that the base storage sequence object of each COMMON block object will also be in that COMMON block's list of objects, and emits an error message when this is not the case. This assumption is faulty; it is possible for a base object to have its COMMON block set during offset assignment. Fixes llvm#65922.

7d81813 says that this was used because functions missing certain attributes (e.g. fast math) would inherit behavior from previous functions with those attributes. However, later c378e52 explicitly set those attributes if they were missing and removed the use of DefaultOptions.

llvm#66256) …mmy argument Several compilers accept a null pointer (with or without a MOLD=) as an actual argument for association with an INTENT(IN) allocatable dummy argument. At runtime, the allocatable dummy argument appears to be in the unallocated state. This seems useful, unambiguous, unlikely to invalidate conforming code, and works with Intel, NAG, & XLF, so it should be supported with an optional portability warning in this compiler as well.

…#66678) Broken debug information can give empty names for an inlined frame, e.g., ``` 0x1d605c68: ryKeyINS7_17SmartCounterTypesEEESt10shared_ptrINS7_15AsyncCacheValueIS9_EEESaIhESt6atomicEEE9fetch_subElSt12memory_order at Filename: edata.h Function start filename: edata.h Function start line: 266 Function start address: 0x1d605c68 Line: 267 Column: 0 (inlined by) at Filename: edata.h Function start filename: edata.h Function start line: 274 Function start address: 0x1d605c68 Line: 275 Column: 0 (inlined by) _EEEmmEv at Filename: arena.c Function start filename: arena.c Function start line: 1303 Line: 1308 Column: 0 ``` This patch avoids creating a sample context with an empty function name by stopping tracking at that frame. This prevents a hash failure that leads to an ICE, where an empty context serves as an empty key for the underlying MapVector https://github.com/llvm/llvm-project/blob/7624de5beae2f142abfdb3e32a63c263a586d768/llvm/lib/ProfileData/SampleProfWriter.cpp#L261

…6258) When a DATA statement object is not valid, there's a number of possible reasons. Emit an error message for the most egregious violation, so that an unlucky user doesn't fix something easy (due to a less-severe error message masking one that is worse) and then run into something that might be more serious.

…ates. This change corrects some cases where the source location for an instantiated specialization of a function template or a member function of a class template was assigned the location of a non-defining declaration rather than the location of the definition the specialization was instantiated from. Fixes llvm#26057 Reviewed By: cor3ntin Differential Revision: https://reviews.llvm.org/D64087

The CFI_CDESC_T(rank) macro defined for C (not C++) in ISO_Fortran_binding.h incorporates a cdesc_t structure as a member, which works for data layout but doesn't allow for direct access to its members. (The C++ definition can use inheritance.) Restructure the definitions in that header file so that CFI_CDESC_T(rank) for C defines a struct with the expected members.

Given the transition to opaque pointers we no longer need to emit some pointer casts. Int8PtrTy was set up to be a ptr in the same address space as OrigDest, making the first CreatePointerCast dead. And then NewDestGEP will end up having the same type as OrigDest, making the second CreatePointerCast dead.

Given the transition to opaque pointers, a call to CreatePointerCast in processMemSetMemCpyDependence was found to be redundant. It would cast from "ptr" to "ptr" (both associated with the same address space).

…FString Given the transition to opaque pointers we no longer need to add casts to handle "mismatched pointer types", as all pointer types involved are now the same.

This should be repaired. Fixes build bots quickly. Introduced: https://reviews.llvm.org/D146169

) evalIntegralCast was using makeIntVal, and when _BitInt() types were introduced this exposed a crash in evalIntegralCast as a result. Improve evalIntegralCast to use makeIntVal more efficiently to avoid the crash exposed by use of _BitInt. This was caught with our internal randomized testing. <src-root>/llvm/include/llvm/ADT/APInt.h:1510: int64_t llvm::APInt::getSExtValue() const: Assertion `getSignificantBits() <= 64 && "Too many bits for int64_t"' failed. ... #9 <address> llvm::APInt::getSExtValue() const <src-root>/llvm/include/llvm/ADT/APInt.h:1510:5 llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>, clang::ento::SVal, clang::QualType, clang::QualType) <src-root>/clang/lib/StaticAnalyzer/Core/SValBuilder.cpp:607:24 clang::Expr const*, clang::ento::ExplodedNode*, clang::ento::ExplodedNodeSet&) <src-root>/clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp:413:61 ... Fixes: llvm#61960 Reviewed By: donat.nagy

Add first VPlan-based recipe simplification to fold (MUL A, 1) -> A. Among other things, this enables additional simplifications after applying versioned strides, as follow up to D147783. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D159200
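
The shape of such a simplification can be shown on a toy expression tree. This is a sketch and not VPlan's recipe representation: expressions are modelled as nested tuples `("mul", lhs, rhs)`, and `simplify` is a hypothetical name. Only the `(MUL A, 1)` form with the constant on the right is folded, matching the pattern named above.

```python
def simplify(expr):
    """Recursively fold ("mul", A, 1) into A; leave everything else alone."""
    if isinstance(expr, tuple) and len(expr) == 3 and expr[0] == "mul":
        lhs = simplify(expr[1])
        rhs = simplify(expr[2])
        # Fold only when the right operand is the integer constant 1.
        if isinstance(rhs, int) and not isinstance(rhs, bool) and rhs == 1:
            return lhs
        return ("mul", lhs, rhs)
    return expr
```

Because operands are simplified first, nested cases collapse in one pass: `("mul", ("mul", "stride", 1), 1)` becomes `"stride"`, which is the kind of follow-on simplification that a stride versioned to 1 would expose.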

This config is not actually used anywhere and it is not used on Android. Since it does not test anything not tested elsewhere, remove it. Remove the size class data associated with this config too.

…O. NFC (llvm#66688) We have been getting errors from emscripten users where including the name of the symbol that triggered the inclusion would be useful in the diagnosis. e.g: emscripten-core/emscripten#20275

This test was broken by 710276a because DumpDataExtractor now accesses the Target properties, which somehow ends up relying on the file system. This is an instance of this error https://lab.llvm.org/buildbot/#/builders/96/builds/45607/steps/6/logs/stdio I cannot reproduce this locally, but it seems that the error happens because we are not initializing the FileSystem and the Host as part of the test setup.

…lvm#65887)" This reverts commit 4898c33. Lots of buildbots are failing, probably because many targets do not support large _BitInt types.

This warning needs to be disabled. The format string is deliberately too large.

In 014c41d I tried to fix these tests, but it seems that I needed to change TEST to TEST_F to make that work. It's a pain that these failures don't repro on any of my machines, but I verified that the initialization code for the tests is invoked.

…alls (llvm#66699) `CXXCtorInitializer` may not refer to a FieldDecl because it might also denote another constructor call. Fixes llvm#66324

To fix issue: llvm#66664

… and prevent devirtualization on types with native RTTI Discussion about this approach: https://discourse.llvm.org/t/rfc-safer-whole-program-class-hierarchy-analysis/65144/18 When enabling WPD in an environment where native binaries are present, types we want to optimize can be derived from inside these native files and devirtualizing them can lead to correctness issues. RTTI can be used as a way to determine all such types in native files and exclude them from WPD providing a safe checked way to enable WPD. The approach is: 1. In the linker, identify if RTTI is available for all native types. If not, under `--lto-validate-all-vtables-have-type-infos` `--lto-whole-program-visibility` is automatically disabled. This is done by examining all .symtab symbols in object files and .dynsym symbols in DSOs for vtable (_ZTV) and typeinfo (_ZTI) symbols and ensuring there's always a match for every vtable symbol. 2. During thinlink, if `--lto-validate-all-vtables-have-type-infos` is set and RTTI is available for all native types, identify all typename (_ZTS) symbols via their corresponding typeinfo (_ZTI) symbols that are used natively or outside of our summary and exclude them from WPD. Testing: ninja check-all large Meta service that uses boost, glog and libstdc++.so runs successfully with WPD via --lto-whole-program-visibility. Previously, native types in boost caused incorrect devirtualization that led to crashes. Reviewed By: MaskRay, tejohnson Differential Revision: https://reviews.llvm.org/D155659
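
Step 1 of the approach, matching vtable symbols against typeinfo symbols, can be sketched over a plain list of symbol names. The function `all_vtables_have_typeinfo` is hypothetical and the real linker works on symbol tables rather than strings; only the `_ZTV` and `_ZTI` prefixes come from the description above.

```python
VTABLE_PREFIX = "_ZTV"
TYPEINFO_PREFIX = "_ZTI"


def all_vtables_have_typeinfo(symbols):
    """Return True if every _ZTV<type> symbol has a matching _ZTI<type>."""
    vtable_types = {
        name[len(VTABLE_PREFIX):]
        for name in symbols
        if name.startswith(VTABLE_PREFIX)
    }
    typeinfo_types = {
        name[len(TYPEINFO_PREFIX):]
        for name in symbols
        if name.startswith(TYPEINFO_PREFIX)
    }
    return vtable_types <= typeinfo_types


def whole_program_visibility_allowed(native_symbols, validate):
    """With validation on, a missing typeinfo disables the optimization."""
    return not validate or all_vtables_have_typeinfo(native_symbols)
```

A native object built without RTTI would contribute a `_ZTV` symbol with no `_ZTI` counterpart, so the check fails and whole-program visibility is turned off rather than risking an incorrect devirtualization.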

…llvm#65658) LDR_PXI is a load instruction, so it should be in isLoadFromStackSlot.

Android triples include a version number, which makes direct triple comparisons for per-target runtime directory searching not always work. Instead, look for the triple with the highest compatible version number and use that per-target runtime directory instead. This maintains the existing fallback to a triple without any version number, but I'm hoping we can remove that in the future. https://discourse.llvm.org/t/62717 discusses this further. The one remaining triple mismatch after this is that Android armv7 triples usually have an environment of `androideabi`, which Clang normalizes to `android`. If you use the `androideabi` triple when building the runtimes with a per-target runtimes dir, the directory will get created with `androideabi` in its name, but Clang's triple search uses the normalized triple and will look for an `android` directory instead. https://reviews.llvm.org/D140925 will fix that by normalizing triples when creating the per-target runtimes directories as well. Reviewed By: phosek, pirama Differential Revision: https://reviews.llvm.org/D158476
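
The directory selection can be sketched as a search over candidate directory names. This is an illustration, not Clang's implementation: `split_android_triple` and `pick_runtime_dir` are hypothetical names, and treating "compatible" as "version not newer than the target's" is an assumption of the example. Only normalized triples ending in `android<N>` are handled.

```python
import re


def split_android_triple(triple):
    """Split 'arch-...-android<N>' into (base, N); N is None if absent."""
    match = re.fullmatch(r"(.*-android)(\d*)", triple)
    if match is None:
        return None
    base, version = match.groups()
    return base, (int(version) if version else None)


def pick_runtime_dir(target, available):
    """Pick the per-target directory to use for `target`, or None."""
    if target in available:
        return target  # exact match always wins
    parsed = split_android_triple(target)
    if parsed is None:
        return None
    base, wanted = parsed
    best_dir, best_version = None, -1
    if wanted is not None:
        for candidate in available:
            cand = split_android_triple(candidate)
            if cand is None or cand[0] != base or cand[1] is None:
                continue
            # Assumption: a directory is compatible if it is not newer
            # than the version being targeted.
            if best_version < cand[1] <= wanted:
                best_dir, best_version = candidate, cand[1]
    if best_dir is not None:
        return best_dir
    # Existing fallback: the triple without any version number.
    return base if base in available else None
```

Targeting `aarch64-unknown-linux-android31` with directories for versions 21, 29 and 33 would select the version 29 directory under these assumptions.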

I missed this before I committed.

If we have a gather load whose indices correspond to valid offsets for a gather with element type twice that of our source, we can reduce the number of indices and perform the operation at the larger element type. This is generally profitable since we halve VL - and these operations are linear in VL. This may require some additional VL/VTYPE toggles, but this appears to be worthwhile on the whole.
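
The index condition can be sketched with element indices. This is a simplified model, not the RISC-V lowering: `widen_gather_indices` is a hypothetical name, indices are in units of the narrow element, and requiring each pair to start at an even index (so the wide element is naturally aligned) is an assumption of the example.

```python
def widen_gather_indices(indices):
    """Try to express a gather as one over elements twice as wide.

    Returns the halved index list, or None if the indices do not form
    adjacent, aligned pairs.
    """
    if len(indices) % 2 != 0:
        return None
    wide = []
    for low, high in zip(indices[0::2], indices[1::2]):
        # Each pair must cover two consecutive narrow elements that
        # together form one aligned wide element.
        if low % 2 != 0 or high != low + 1:
            return None
        wide.append(low // 2)
    return wide
```

For example, the narrow indices `[0, 1, 4, 5, 2, 3]` become the wide indices `[0, 2, 1]`, halving the number of elements gathered, while a strided pattern such as `[0, 2, 4, 6]` is left alone.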

The added pass optimizes MLProgram global operations by reducing them to only the minimal load/store operations for global tensors. This avoids unnecessary global operations throughout a program and potentially improves operation fusion. Reviewed By: jpienaar Differential Revision: https://reviews.llvm.org/D159228

This prepares the code for a rework to collect all necessary data before symbolization. Symbolization, like any nontrivial computation, may affect hwasan metadata.

…OpFoldResult`. (llvm#66566)

…vm#65654) Bufferization of tensor.reshape generates a memref.reshape operation. memref.reshape requires the source memref to have an identity layout. The bufferization process may result in the source memref having a non-identity layout, resulting in a verification failure. This change causes the bufferization interface for tensor.reshape to copy the source memref to a new buffer when the source has a non-identity layout.

@A

Given a function like the following: https://godbolt.org/z/T9c99fr88
```c
1161_noReadWrite(int *Preds) {
  for (int i = 0; i < LEN_1D-1; ++i) {
    if (Preds[i] != 0)
      b[i] = c[i] + 1;
    else
      a[i] = i * i;
  }
}
```
LLVM will optimize the IR to a single store by a phi instruction:
```llvm
%1 = load ptr, ptr @A, align 64
%2 = load ptr, ptr @b, align 64
...
for.inc:
  %.sink = phi ptr [ %1, %if.then ], [ %2, %if.else ]
  %add.sink = phi double [ %add, %if.then ], [ %conv8, %if.else ]
  %arrayidx7 = getelementptr inbounds double, ptr %.sink, i64 %indvars.iv
  store double %add.sink, ptr %arrayidx7, align 8
```
LAA is currently unable to analyze such IR, since ScalarEvolution will return a SCEVUnknown for the forked pointer operand of the store. This patch adds initial optional support for analyzing both possibilities for the pointer and allowing LAA to generate runtime checks for the bounds if required. It follows D108699, but here addresses the phi node. Fixes llvm#64888 Reviewed By: huntergr-arm, fhahn Differential Revision: https://reviews.llvm.org/D158965

…#66682)

…llvm#66506) Treat the case where all branch weights are zero as if there was no profile. Fixes llvm#66382
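
As a hedged illustration of the rule (not the actual LLVM code, whose location is not stated here), a consumer of branch weights might behave like this Python sketch. The function name `branch_probabilities` is hypothetical.

```python
def branch_probabilities(weights):
    """Convert branch weights to probabilities.

    Returns None when there is no usable profile, which includes the
    case where every weight is zero.
    """
    if not weights or sum(weights) == 0:
        return None  # treat all-zero weights like missing profile data
    total = sum(weights)
    return [w / total for w in weights]
```

Returning the "no profile" result avoids dividing by a zero total and lets the caller fall back to its default heuristics.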

…expr (llvm#66643) clang was crashing on a lambda attribute with a statement expression that contained a `return`. It attempted to access the lambda type which was unknown at that point. Fixes llvm#48527

Watchpoints in lldb can be either 'read', 'write', or 'read/write'. This is exposing the actual behavior of hardware watchpoints. gdb has a different behavior: a "write" type watchpoint only stops when the watched memory region *changes*. A user is using a watchpoint for one of three reasons: 1. Want to find what is changing/corrupting this memory. 2. Want to find what is writing to this memory. 3. Want to find what is reading from this memory. I believe (1) is the most common use case for watchpoints, and it currently can't be done in lldb: the user needs to manually continue every time the same value is written to the watched memory. I think gdb's behavior is the correct one. There are some use cases where a developer wants to find every function that writes/reads to/from a memory region, regardless of value, and I still want to allow that functionality. This is also a bit of groundwork for my large watchpoint support proposal https://discourse.llvm.org/t/rfc-large-watchpoint-support-in-lldb/72116 where I will be adding support for AArch64 MASK watchpoints which watch power-of-2 memory regions. A user might ask to watch 24 bytes, and a MASK watchpoint stub can do this with a 32-byte MASK watchpoint if it is properly aligned. We then need to ignore writes to the final 8 bytes of that watched region, and not show those hits to the user. This patch adds a new 'modify' watchpoint type and it is the default. rdar://108234227
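
The difference between a 'write' and a 'modify' watchpoint can be modelled with a small Python sketch. The class and method names are hypothetical and this is not lldb's implementation; it only shows the idea of suppressing stops when the watched value is unchanged.

```python
class ModifyWatchpoint:
    """Toy model: report a stop only when the watched value changes."""

    def __init__(self, initial_value):
        self.last_value = initial_value
        self.hits = 0

    def on_hardware_write_trap(self, current_value):
        """Called on every hardware write trap; True means stop for the user."""
        if current_value == self.last_value:
            return False  # same value rewritten: continue silently
        self.last_value = current_value
        self.hits += 1
        return True
```

A plain 'write' watchpoint would correspond to returning True on every trap.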

There is an issue: llvm#64612 This issue happens because RISCVMCCodeEmitter::getImmOpValue only handles MCExpr kinds Target and SymbolRef. When code with format like .+ comes in, it comes with MCExpr kind Binary, the fixupkind remains fixup_riscv_invalid and an error is reported. This patch makes MCExpr kind Binary be handled the same way as MCExpr kind SymbolRef, so code with a binary expression can get the correct fixupkind and be used to generate more complex relocations. Differential Revision: https://reviews.llvm.org/D157694

llvm#66667) When inserting prologue/epilogue, we use the spimm of CM.PUSH/POP to reduce the following offset for sp. Previously, we tried to use the free space of the push stack to minimize the following sp-offset. But it's useless, since the free space must be less than 16 and the required stack should be aligned to 16 before/after the adjustment.

Enable usage where capturing AsmState is good (e.g., avoiding creating AsmState over and over again when walking IR and printing). This also only changes one C API to verify plumbing. But using the AsmState makes the cost more explicit than the flags interface (which hides the traversals and construction here) and also enables a more efficient usage C side.

D159460 regressed the bugfix in D156644. Fix that and emit a warning. Add a test case. Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D159529

After commit cedf2ea, `RISCVMergeBaseOffset` can handle `BlockAddress` currently. But we didn't handle it in `PrintAsmMemoryOperand` so we get `invalid operand in inline asm` error. This patch fixes the error.

When doing a clean build from vscode, it makes the subdirectories in the source tree rather than in the build folder. Elsewhere in LLVM, they prefix the MAKE_DIRECTORY calls, so this appears to be the correct approach.

* Moves several orphaned methods from Operation/OpView -> _OperationBase so that both hierarchies share them (whether unknown or known to ODS). * Adds typing information for missing `MLIRError` exception. * Adds `DiagnosticInfo` typing. * Adds `DenseResourceElementsAttr` typing that was missing.

Also add a --compress-debug-sections=zlib test to demonstrate issue llvm#66738

…6592) The CHECK2 test in code_placement_ext_tsp_large.ll now has the same result as the CHECK test: when chain(0,2,3,4,1) is merged with chain(8), the result is now chain(0,2,3,4,8,1). Ideally we should have test coverage for -ext-tsp-chain-split-threshold=1, but it seems challenging to craft one. Perhaps the default value of -ext-tsp-chain-split-threshold can be decreased as the -ext-tsp-enable-chain-split-along-jumps heuristic is now more powerful.

…6308)" TestStepOverWatchpoint.py and TestUnalignedWatchpoint.py are failing on the ubuntu and debian bots https://lab.llvm.org/buildbot/#/builders/68/builds/60204 https://lab.llvm.org/buildbot/#/builders/96/builds/45623 and the newly added test TestModifyWatchpoint.py does not work on windows bot https://lab.llvm.org/buildbot/#/builders/219/builds/5708 I will debug tomorrow morning and reland. This reverts commit 3692267.

There are some issues in `llvm-libgcc` before this patch: Commit c5a20b5 ([llvm-libgcc] initial commit) uses `$<TARGET_OBJECTS:unwind_static>` to get libunwind objects, which is empty. The built library is actually a shared version of libclang_rt.builtins. When configuring with `llvm/CMakeLists.txt`, target `llvm-libgcc` requires a corresponding target in `llvm-libgcc/CMakeLists.txt`. Per target installation is not handled by `llvm-libgcc`, which is not consistent with `libunwind`. This patch fixes those issues by: Reusing target `unwind_shared` in `libunwind`, linking `compiler-rt.builtins` objects into it, and applying version script. Adding target `llvm-libgcc`, creating symlinks, and utilizing cmake's dependency and component mechanism to ensure link targets will be built and installed along with symlinks. Mimicking `libunwind` to handle per target installation. It is quite hard to set necessary options without further modifying the order of runtime projects in `runtimes/CMakeLists.txt`. So though this patch reveals the possibility of co-existence of `llvm-libgcc` and `compiler-rt`/`libunwind` in `LLVM_ENABLE_RUNTIMES`, we still inhibit it to minimize influence on other projects, considering that `llvm-libgcc` is only intended for toolchain vendors, and not for casual use.

…ccessor()`. (llvm#66742) The additions to the test trigger crashes without the fixes.

This removes `CreateMalloc` from `CallInst` and adds it to the `IRBuilderBase` class. We no longer need the `Instruction *InsertBefore` and `BasicBlock *InsertAtEnd` arguments of the `createMalloc` helper function because we're using `IRBuilder` now. That's why we also don't need 4 `CreateMalloc` functions, but only two. Differential Revision: https://reviews.llvm.org/D158861

…m#66622) In line with llvm#66515, change `MutableArrayRange::begin`/`end` to enumerate `OpOperand &` instead of `Value`. Also remove `ForOp::getIterOpOperands`/`setIterArg`, which are now redundant. Note: `MutableOperandRange` cannot be made a derived class of `indexed_accessor_range_base` (like `OperandRange`), because `MutableOperandRange::assign` can change the number of operands in the range.

…#66345) Running: $ clang-format -i $(find -regex "\./lldb/.*\.c") $(find -regex "\./lldb/.*\.cpp") $(find -regex "\./lldb/.*\.h") Resulted in: 1602 files changed, 25090 insertions(+), 25849 deletions(-) (note: this includes tests which we wouldn't format, just using this as an example) The vast majority of which were whitespace changes. So as far as formatting we're not deviating from llvm for any reason other than not churning old code. Formatting aside, the major features of lldb (single line if, early return) are all reflected in llvm's style. We differ mainly on variable naming (proposed to change in https://llvm.org/docs/Proposals/VariableNames.html anyway) and use of asserts. Which was already documented.

/Users/jiefu/llvm-project/llvm/examples/BrainF/BrainF.cpp:92:15: error: unused variable 'BB' [-Werror,-Wunused-variable] BasicBlock* BB = builder->GetInsertBlock(); ^ 1 error generated.

…rs support Differential Revision: https://reviews.llvm.org/D156049

… dup (llvm#66508) If there are copy instructions between uaddlv with v4i16/v8i16 and dup for transfer from gpr to fpr, try to remove them with duplane. It is a follow-up patch of https://reviews.llvm.org/D159267

710276a added settings to control leading zeros but the initial test case assumed a 64 bit target.

…C) (llvm#66520) This helps to understand what the problem is when vectorization of structured ops fails due to mismatching vector sizes.

…lvm#66541) The mix-in of this op did not allow to pass in no argument. This special case is now handled correctly and covered by the tests.

Differential Revision: https://reviews.llvm.org/D158761

A lot of these use defines that I made up for this purpose, which is not obvious at first glance. Document that at the top of each file.

…vm#66086) strlen(..) call should not propagate taintedness, because it brings in many false positive findings. It is a common pattern to copy user provided input to another buffer. In these cases we always get warnings about tainted data used as the malloc parameter: buf = malloc(strlen(tainted_txt) + 1); // false warning This pattern can lead to a denial of service attack only when the attacker can directly specify the size of the allocated area as an arbitrarily large number (e.g. the value is converted from a user provided string). Later, we could reintroduce strlen() as a taint propagating function with the consideration not to emit warnings when the tainted value cannot be "arbitrarily large" (such as the size of an already allocated memory block). The change has been evaluated on the following open source projects:
- memcached: [1 lost false positive](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=memcached_1.6.8_ednikru_taint_nostrlen_baseline&newcheck=memcached_1.6.8_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved)
- tmux: 0 lost reports
- twin: [3 lost false positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=twin_v0.8.1_ednikru_taint_nostrlen_baseline&newcheck=twin_v0.8.1_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved)
- vim: [1 lost false positive](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=vim_v8.2.1920_ednikru_taint_nostrlen_baseline&newcheck=vim_v8.2.1920_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved)
- openssl: 0 lost reports
- sqlite: [2 lost false positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=sqlite_version-3.33.0_ednikru_taint_nostrlen_baseline&newcheck=sqlite_version-3.33.0_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved)
- ffmpeg: 0 lost reports
- postgresql: [3 lost false positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=postgres_REL_13_0_ednikru_taint_nostrlen_baseline&newcheck=postgres_REL_13_0_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved)
- tinyxml: 0 lost reports
- libwebm: 0 lost reports
- xerces: 0 lost reports

In all cases the lost reports originate from copying untrusted environment variables into another buffer. There are 2 types of lost false positive reports: 1) [Where the warning is emitted at the malloc call by the TaintPropagation Checker](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=memcached_1.6.8_ednikru_taint_nostrlen_baseline&newcheck=memcached_1.6.8_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved&report-id=2648506&report-hash=2079221954026f17e1ecb614f5f054db&report-filepath=%2amemcached.c) ` len = strlen(portnumber_filename)+4+1; temp_portnumber_filename = malloc(len); ` 2) When pointers are set based on the length of the tainted string by the ArrayOutofBoundsv2 checker. For example [this](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=vim_v8.2.1920_ednikru_taint_nostrlen_baseline&newcheck=vim_v8.2.1920_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved&report-id=2649310&report-hash=79dc8522d2cd34ca8e1b2dc2db64b2df&report-filepath=%2aos_unix.c) case.

Add declarations declared with attribute(cleanup(...)) to the CFG, similar to destructors. Differential Revision: https://reviews.llvm.org/D157385

Similarly to D158861 I'm moving the `CreateFree` method from `CallInst` to `IRBuilderBase`. Differential Revision: https://reviews.llvm.org/D159418

RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate the length of a live range for its heuristics. Renumbering all slot indexes with the default instruction distance ensures that this estimate will be as accurate as possible, and will not depend on the history of how instructions have been added to and removed from SlotIndexes's maps. This also means that enabling -early-live-intervals, which runs the SlotIndexes analysis earlier, will not cause large amounts of churn due to different register allocator decisions.

Note: This requires later commits for ZA to function properly, it is split for ease of review. Testing is also in a later patch. The "Matrix" part of the Scalable Matrix Extension is a new register "ZA". You can think of this as a square matrix made up of scalable rows, where each row is one scalable vector long. However it is not made of the existing scalable vector registers, it is its own register. This means that the size of ZA is the vector length in bytes * the vector length in bytes. https://developer.arm.com/documentation/ddi0616/latest/ It uses the streaming vector length, even when streaming mode itself is not active. For this reason, its register data header always includes the streaming vector length. Due to its size I've changed kMaxRegisterByteSize to the maximum possible ZA size and kTypicalRegisterByteSize will be the maximum possible scalable vector size. Therefore ZA transactions will cause heap allocations, and non ZA registers will perform exactly as before. ZA can be enabled and disabled independently of streaming mode. The way this works in ptrace is different to SVE versus streaming SVE. Writing NT_ARM_ZA without register data disables ZA, writing NT_ARM_ZA with register data enables ZA (LLDB will only support the latter, and only because it's convenient for us to do so). https://kernel.org/doc/html/v6.2/arm64/sme.html LLDB does not handle registers that can appear and disappear at runtime. Rather than add complexity to implement that, LLDB will show a block of 0s when ZA is disabled. The alternative is not only updating the vector lengths every stop, but every register definition. It's possible but I'm not sure it's worth pursuing. Users should refer to the SVCR register (added in later patches) for the final word on whether ZA is active or not. Writing to "VG" during streaming mode will change the size of the streaming sve registers and ZA. 
LLDB will not attempt to preserve register values in this case, we'll just read back the undefined content the kernel shows. This is in line with what the kernel ABIs and the prospective software ABIs look like. ZA is defined as a vector of size SVL*SVL, so the display in lldb is very basic: a giant block of values. This is no worse than SVE, just larger. There is scope to improve this but that can wait until we see some use cases. Reviewed By: omjavaid Differential Revision: https://reviews.llvm.org/D159502

This adds a register "svg" which mirrors SVE's "vg" register. This reports the streaming vector length at all times, read from the ZA ptrace header. This register is needed first to implement ZA resizing as the streaming vector length changes. Like vg, svg will be expedited to the client so it can reconfigure its register definitions. The other use is for users to be able to know the streaming vector length without resorting to counting the (many, many) bytes in ZA, or temporarily entering streaming mode (which would be destructive). Some refactoring has been done so we don't have to recalculate the register offsets twice. Testing for this will come in a later patch. Reviewed By: omjavaid Differential Revision: https://reviews.llvm.org/D159503

The verifier was unconditionally accessing the body block terminator, but it's not guaranteed that the block has one in general.

The `cp` command will copy the permission bits from the original file to the new one, which will cause a permission denied error (no write access) for the following "echo" command on some systems. Switch to use `cat` which is more robust.

We should not optimize it in D158062. This adds the test coverage. And unneeded attributes `nonnull` and `inbounds` are removed. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D159530

The size of ZA depends on the streaming vector length regardless of the active mode. So in addition to vg (which reports the active mode) we must send the client svg. Otherwise the mechanics are the same as for non-streaming SVE. Use the svg value to update the defined size of ZA, accounting for the fact that ZA is not a single vector but a square matrix. So if svg is 8, a single streaming vector would be 8*8 = 64 bytes. ZA is that squared, so 64*64 = 4096 bytes. Testing is included in a later patch. Reviewed By: omjavaid Differential Revision: https://reviews.llvm.org/D159504
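
The arithmetic above can be written as a short Python sketch. The function name `za_size_bytes` is hypothetical; the factor of 8 reflects that svg counts 8-byte granules, as in the worked example.

```python
def za_size_bytes(svg):
    """Size of ZA in bytes for a given svg value.

    svg counts 8-byte granules, so one streaming vector is svg * 8 bytes,
    and ZA is a square matrix with that many bytes on each side.
    """
    streaming_vector_bytes = svg * 8
    return streaming_vector_bytes * streaming_vector_bytes
```

For svg = 8 this gives 64 * 64 = 4096 bytes, matching the figures in the commit message.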

An SME enabled program has the following extra state: * Streaming mode or non-streaming mode. * ZA enabled or disabled. * The active vector length. Covering the transition between all possible states and all other possible states is not viable, therefore the testing added here is a cross section of that, all of which found real bugs in LLDB and the Linux Kernel during development. Many of those transitions will not be possible via LLDB (e.g. disabling ZA) and many more are possible but unlikely to be used in normal use. Added testing: * TestSVEThreadedDynamic now checks for correct SVG values. * New test TestZAThreadedDynamic creates 3 threads with different ZA sizes and states and switches between them verifying the register value (derived from the existing threaded SVE test). * New test TestZARegisterSaveRestore starts in a given SME state, runs a set of expressions in various orders, then checks that the original state has been restored. * TestArm64DynamicRegsets has ZA and SVG checks added, including writing to ZA to enable it. Running these tests will as usual require QEMU as there is no real SME hardware available at this time, and a very recent kernel. Reviewed By: omjavaid Differential Revision: https://reviews.llvm.org/D159505

Instead of unsetting flags on the instruction, attempting the fold, and then resetting the flags if it failed, add support to simplifyWithOpReplaced() to ignore poison-generating flags/metadata and collect all instructions where they may need to be dropped. This allows us to perform the fold a) with poison-generating metadata, which was previously not handled and b) with poison-generating flags/metadata that are not on the root instruction. Proof for the ctpop case: https://alive2.llvm.org/ce/z/3H3HFs Fixes llvm#62450.

…vm#66655) Instead of checking whether the last operation might be a terminator, always insert operations to the end of the block. Signed-off-by: Victor Perez <victor.perez@codeplay.com>

@andykaylor

Update handling of math errno. This change updates the logic for generation of math intrinsics in place of math library function calls. The previous logic https://reviews.llvm.org/D151834 was incorrectly using intrinsics when math errno handling was needed at optimization levels above -O0. This also fixes the issue mentioned in https://reviews.llvm.org/D151834 by @uabelho This is joint work with @andykaylor Andy.

Currently if Dexter encounters a parser error with a command, the resulting error message will refer to the most recently declared file (i.e. the source file it is testing) rather than the file containing the command itself. This patch fixes this so that parser errors point towards the correct location.

…nfoOp and DataBoundsOp operations to the OpenMP Dialect This patch adds two new operations: The first is the DataBoundsOp, which is based on OpenACC's DataBoundsOp, which holds stride, index, extent, lower bound and upper bounds which will be used in future follow up patches to perform initial array sectioning of mapped arrays, and Fortran pointer and allocatable mapping. Similarly to OpenACC, this new OpenMP DataBoundsOp also comes with a new OpenMP type, which helps to restrict operations to accepting only DataBoundsOp as an input or output where necessary (or other related operations that implement this type as a return). The patch also adds the MapInfoOp which rolls up some of the old map information stored in target operations into this new operation, and adds new information that will be utilised in the lowering of mapped variables, e.g. the aforementioned DataBoundsOp, but also a new ByCapture OpenMP MLIR attribute, and isImplicit boolean attribute. Both the ByCapture and isImplicit arguments will affect the lowering from the OpenMP dialect to LLVM-IR in minor but important ways, such as shifting the final maptype or generating different load/store combinations to maintain semantics with the OpenMP standard and alignment with the current Clang OpenMP output as best as possible. This MapInfoOp operation is slightly based on OpenACC's DataEntryOp, the main difference other than some slightly different fields (e.g. isImplicit/MapType/ByCapture) is that OpenACC's data operations "inherit" (the MLIR ODS equivalent) from this operation, whereas in OpenMP operations that utilise MapInfoOp's are composed of/contain them. A series of these MapInfoOp (one per map clause list item) is now held by target operations that represent OpenMP directives that utilise map clauses, e.g. TargetOp. MapInfoOp's do not have their own specialised lowering to LLVM-IR, instead the lowering is dependent on the particular container of the MapInfoOp's, e.g. 
TargetOp has its own lowering to LLVM-IR which utilises the information stored inside of MapInfoOp's to affect its lowering and the end result of the LLVM-IR generated, which in turn can differ for host and device. This patch contains these operations, minor changes to the printing and parsing to support them, changes to tests (only those relevant to this segment of the patch, other test additions and changes are in other dependent patches in this series) and some alterations to the OpenMPToLLVM rewriter to support the new OpenMP type and operations. This patch is one in a series that are dependent on each other: https://reviews.llvm.org/D158734 https://reviews.llvm.org/D158735 https://reviews.llvm.org/D158737 Reviewers: kiranchandramohan, TIFitis, razvanlupusoru Differential Revision: https://reviews.llvm.org/D158732

…ations and tie them to relevant Target operations This patch builds on top of a prior patch in review which adds a new map and bounds operation by modifying the OpenMP PFT lowering to support these operations and generate them from the PFT. A significant amount of the support for the Bounds operation is borrowed from OpenACC's own current implementation and lowering, just ported over to OpenMP. The patch also adds very preliminary/initial support for lowering to a new Capture attribute, which is stored on the new Map Operation, which helps the later lowering from OpenMP -> LLVM IR by indicating how a map argument should be handled. This capture type will influence how a map argument is accessed on device and passed by the host (different load/store handling etc.). It is reflective of a similar piece of information stored in the Clang AST which performs a similar role. As well as some minor adjustments to how the map type (map bitshift which dictates to the runtime how it should handle an argument) is generated to further support more use-cases for future patches that build on this work. Finally it adds the map entry operation creation and tying it to the relevant target operations as well as the addition of some new tests and alteration of previous tests to support the new changes. Depends on D158732 reviewers: kiranchandramohan, TIFitis, clementval, razvanlupusoru Differential Revision: https://reviews.llvm.org/D158734

…Entry and declare target globals This patch is a required change for the device side IR to maintain appropriate links for declare target variables to their global variables for later lowering. It is also a requirement to clone over map bounds and entry operations to maintain the correct information for later lowering of the IR. It simply tries to clone over the relevant information, maintaining the appropriate links they would have had prior to the pass, rather than redirecting them to new function arguments, which causes a loss of information in the case of Declare Target and map information. Depends on D158734 reviewers: TIFitis, razvanlupusoru Differential Revision: https://reviews.llvm.org/D158735

…to Bounds and MapEntry operations This patch adjusts the lowering to LLVM-IR inside of OpenMPToLLVMIRTranslation to facilitate the changes made to Target related operations to add the new Map related operations. It also includes adjustments to tests to support these changes, primarily modifying the MLIR as opposed to the LLVM-IR; the LLVM-IR should be identical after this patch. Depends on D158735 Reviewers: kiranchandramohan, TIFitis, razvanlupusoru Differential Revision: https://reviews.llvm.org/D158737

Currently the naming scheme is a bit funky; the specializations are named after the original function followed by an arbitrary decimal number. This makes it hard to debug inlined specializations of recursive functions. With this patch I am adding ".specialized." between the original name and the suffix, which is now a single incrementing counter.
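
A small Python sketch of the new naming scheme. The class name `SpecializationNamer`, the starting value of 1, and the assumption that one counter is shared across all functions are illustrative guesses, not taken from the patch.

```python
import itertools

class SpecializationNamer:
    """Generate names of the form <original>.specialized.<counter>."""

    def __init__(self):
        # Assumption: a single counter shared by every specialized function.
        self._counter = itertools.count(1)

    def name_for(self, original_name):
        return f"{original_name}.specialized.{next(self._counter)}"
```

The fixed ".specialized." infix makes it easy to recover the original function name from a clone's name when debugging.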

Look up extended instruction numbers in the given instruction set so that correct names are now emitted for GLSL.std.450 instructions as well as OpenCL.std. Add a single test to verify correct abs intrinsic names are emitted when targeting logical SPIR-V. Depends on D156424 Differential Revision: https://reviews.llvm.org/D159227

This fixes a bug where functions generated by the MLIR Math dialect, for example ipowi, would fail to link with link.exe on Windows due to having linkonce linkage but no associated comdat. Adding the comdat on ELF also allows linkers to perform better garbage collection in the binary. Simply adding comdats to all functions with this linkage type should also cover future cases where linkonce or linkonce_odr functions might be necessary.

Consider cleanup functions in thread safety analysis. Differential Revision: https://reviews.llvm.org/D152504

Add Int16, Int64 and Float64 capabilities as always available for Vulkan (since 1.0), and add tests covering most of the basic types from clang/test/CodeGenHLSL/basic_types.hlsl except for half floats. Depends on D156049

… a different module (llvm#66549) `unw_getcontext` saves the caller's registers in the context. However, if the caller of `unw_getcontext` is in a different module, the glue code of `unw_getcontext` sets the TOC register (r2) with the new TOC base and saves the original TOC register value in the stack frame. This causes the incorrect TOC value to be used when the caller steps up frames, which fails libunwind LIT test case `unw_resume.pass.cpp`. This PR fixes the problem by using the original TOC register value saved in the stack if the caller is in a different module and enables `unw_resume.pass.cpp` on AIX.

Subsequent PRs will add the scheduling model and support for macro fusions.

Add a DAG combine to form a masked.store from a masked_strided_store intrinsic with stride equal to element size. This is the store analogy to PR llvm#65674. As seen in the tests, this does pickup a few cases that we'd previously missed due to selection ordering. We match strided stores early without going through the recently added generic mscatter combines, and thus weren't recognizing the unit strided store.
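
Why a unit stride makes the strided store equivalent to an ordinary masked store can be seen from the addresses each lane touches. This Python sketch is only an address model with hypothetical helper names, not the DAG combine itself.

```python
def strided_lane_addresses(base, stride, num_lanes):
    """Byte address written by each lane of a strided store."""
    return [base + lane * stride for lane in range(num_lanes)]

def contiguous_lane_addresses(base, elem_size, num_lanes):
    """Byte address written by each lane of a regular (unit-stride) store."""
    return [base + lane * elem_size for lane in range(num_lanes)]

def can_use_plain_store(stride, elem_size):
    """The combine's precondition: stride equals the element size."""
    return stride == elem_size
```

When `can_use_plain_store` holds, both address functions return the same list, so the mask can be carried over unchanged.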

Unlike the load case, stores past the end of the alloca are removed by SROA as undefined behavior. As such, there is no need to handle this case when rewriting stores.

) As reported in llvm#66612, we aren't treating the placeholder expression type correctly, so we ended up trying to get a reference version of it, and this resulted in an assertion, since the placeholder type cannot have a reference added. Fixes: llvm#66612

This function was inconsistent with the remaining API because it accepted `OpOperand &` that do not belong to the op. All the other functions assert. This helper function is also not really necessary, as the iter_arg number is identical to the result number.

…egions (llvm#66754) This commit implements `LoopLikeOpInterface` on `scf.while`. This enables LICM (and potentially other transforms) on `scf.while`. `LoopLikeOpInterface::getLoopBody()` is renamed to `getLoopRegions` and can now return multiple regions. Also fix a bug in the default implementation of `LoopLikeOpInterface::isDefinedOutsideOfLoop()`, which returned "false" for some values that are defined outside of the loop (in a nested op, in such a way that the value does not dominate the loop). This interface is currently only used for LICM and there is no way to trigger this bug, so no test is added.

…vm#66766) This is the VP equivalent of llvm#65674. We already combine MGATHER loads with unit stride to MLOAD, so this extends it for EXPERIMENTAL_VP_STRIDED_LOAD.

…lvm#66774) This is the VP equivalent of llvm#66677. If we have a strided store where the stride is equal to the element width, we can just use a regular VP store.

…#65976) Calling isPlainlyKilled instead of directly checking for a kill flag should make processTiedPairs behave the same with LiveIntervals (i.e. when compiling with -early-live-intervals) as it does with LiveVariables.

The predicates inside the AMOPat class were being overridden by the Predicates = [HasStdExtA] at the instantiation.

Extract non-MPFR math tests into libc-math-smoke-tests. Reviewed By: sivachandra, jhuber6 Differential Revision: https://reviews.llvm.org/D159477

This patch and D156954 were discussed in <https://discourse.llvm.org/t/rfc-improving-lits-debug-output/72839>.

**Motivation**: -a shows output from all tests, and -v shows output from just failed tests. Without this patch, that output from each test includes a section called "Script:", which includes all shell commands that lit has computed from RUN directives and will attempt to run for that test. The effect of -vv (which also implies -v if neither -a nor -v is specified) is to extend that output with shell commands as they are executing so you can easily see which one failed. For example, when using lit's internal shell and -vv:

```
Script:
--
: 'RUN: at line 1'; echo hello world
: 'RUN: at line 2'; 3c40 hello world
: 'RUN: at line 3'; echo hello world
--
Exit Code: 127

Command Output (stdout):
--
$ ":" "RUN: at line 1"
$ "echo" "hello" "world"
hello world

$ ":" "RUN: at line 2"
$ "3c40" "hello" "world"
'3c40': command not found
error: command failed with exit status: 127

--
```

Notice that all shell commands that actually execute appear in the output twice, once for "Script:" and once for -vv. Especially for tests with many RUN directives, the result is noisy. When searching through the output for a particular shell command, it is easy to get lost and mistake shell commands under "Script:" for shell commands that actually executed.

**Change**: With this patch, a test's output changes in two ways. First, the "Script:" section is never shown. Second, omitting -vv no longer disables printing of shell commands as they execute. That is, -a and -v imply -vv, and so -vv is deprecated as it is just an alias for -v.

**Secondary motivation**: We are also working to introduce a PYTHON directive, which can appear between RUN directives. How should PYTHON directives be represented in the "Script:" section, which has previously been just a shell script? We could probably think of something, but adding info about PYTHON directive execution in the -vv trace seems more straightforward and more useful.

(This patch also removes a confusing point in the -vv documentation: at least when using bash as an external shell, -vv echoes commands to the shell's stderr, not stdout.)

Reviewed By: awarzynski, Endill, ldionne, MaskRay
Differential Revision: https://reviews.llvm.org/D154984

This patch and D154984 were discussed in <https://discourse.llvm.org/t/rfc-improving-lits-debug-output/72839>.

Motivation
----------

D154984 removes the "Script:" section that lit prints along with a test's output, and it makes -v and -a imply -vv. For example, after D154984, the "Script:" section below is never shown, but -v is enough to produce the execution trace following it:

```
Script:
--
: 'RUN: at line 1'; echo hello | FileCheck bogus.txt && echo success
--
Exit Code: 2

Command Output (stdout):
--
$ ":" "RUN: at line 1"
$ "echo" "hello"
# command output:
hello

$ "FileCheck" "bogus.txt"
# command stderr:
Could not open check file 'bogus.txt': No such file or directory

error: command failed with exit status: 2

--
```

In the D154984 review, some reviewers point out that they have been using the "Script:" section for copying and pasting a test's shell commands to a terminal window. The shell commands as printed in the execution trace can be harder to copy and paste for the following reasons:

- They drop redirections and break apart RUN lines at `&&`, `|`, etc.
- They add `$` at the start of every command, which makes it hard to copy and paste multiple commands in bulk.
- Command stdout, stderr, etc. are interleaved with the commands and are not clearly delineated.
- They don't always use proper shell quoting. Instead, they blindly enclose all command-line arguments in double quotes.

Changes
-------

D154984 plus this patch converts the above example into:

```
Exit Code: 2

Command Output (stdout):
--
# RUN: at line 1
echo hello | FileCheck bogus-file.txt && echo success
# executed command: echo hello
# .---command stdout------------
# | hello
# `-----------------------------
# executed command: FileCheck bogus-file.txt
# .---command stderr------------
# | Could not open check file 'bogus-file.txt': No such file or directory
# `-----------------------------
# error: command failed with exit status: 2

--
```

Thus, this patch addresses the above issues as follows:

- The entire execution trace can be copied and pasted in bulk to a terminal for correct execution of the RUN lines, which are printed intact as they appeared in the original RUN lines except lit substitutions are expanded. Everything else in the execution trace appears in shell comments so it has no effect in a terminal.
- Each of the RUN line's commands is repeated (in shell comments) as it executes to show (1) that the command actually executed (e.g., `echo success` above didn't) and (2) what stdout, stderr, non-zero exit status, and output files are associated with the command, if any. Shell quoting in the command is now correct and minimal but is not necessarily the original shell quoting from the RUN line.
- The start and end of the contents of stdout, stderr, or an output file are now delineated clearly in the trace.

To help produce some of the above output, this patch extends lit's internal shell with a built-in `@echo` command. It's like `echo` except lit suppresses the normal execution trace for `@echo` and just prints its stdout directly. For now, `@echo` isn't documented for use in lit tests.

Without this patch, libcxx's custom lit test format tries to parse the stdout from `lit.TestRunner.executeScriptInternal` (which runs lit's internal shell) to extract the stdout and stderr produced by shell commands, and that parse no longer works after the above changes. This patch makes a small adjustment to `lit.TestRunner.executeScriptInternal` so libcxx can just request stdout and stderr without an execution trace.

(As a minor drive-by fix that came up in testing: lit's internal `not` command now always produces a numeric exit status and never `True`.)

Caveat
------

This patch only makes the above changes for lit's internal shell. In most cases, we do not know how to force external shells (e.g., bash, sh, Windows `cmd`) to produce execution traces in the manner we want.

To configure a test suite to use lit's internal shell (which is usually better for test portability than external shells anyway), add this to the test suite's `lit.cfg` or other configuration file:

```
config.test_format = lit.formats.ShTest(execute_external=False)
```

Reviewed By: MaskRay, awarzynski
Differential Revision: https://reviews.llvm.org/D156954
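The trace layout described in this commit message can be imitated in a few lines of Python. This is only an illustrative sketch, not lit's implementation: `format_trace` and its parameters are hypothetical names, and `shlex.join` stands in for lit's own quoting logic.

```python
import shlex


def format_trace(argv, stdout=""):
    """Render one executed command in a comment-prefixed trace style.

    Hypothetical helper for illustration only; lit's real formatting
    code is different and handles stderr, exit codes, and output files.
    """
    # shlex.join adds quotes only where the shell needs them.
    lines = ["# executed command: " + shlex.join(argv)]
    if stdout:
        # Delimit the captured output so it cannot be mistaken for a command.
        lines.append("# .---command stdout------------")
        lines.extend("# | " + line for line in stdout.splitlines())
        lines.append("# `-----------------------------")
    return "\n".join(lines)


print(format_trace(["echo", "hello"], stdout="hello\n"))
```

Because every emitted line starts with `#`, pasting the result into a shell is harmless, which is the property the commit message relies on.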

Before <https://reviews.llvm.org/D154984> and <https://reviews.llvm.org/D156954>, lit reported full RUN lines in a `Script:` section. Now, in the case of lit's internal shell, it's the execution trace that includes them. However, if lit is configured to use an external shell (e.g., bash, Windows `cmd`), they aren't reported at all. A fix was requested at the following:

* <https://reviews.llvm.org/D154984#4627605>
* <https://discourse.llvm.org/t/rfc-improving-lits-debug-output/72839/35?u=jdenny-ornl>

This patch does not address the case when the external shell is Windows `cmd`. As discussed at <llvm#65242>, it's not clear whether that's a use case that people still care about, and it seems to be generally broken anyway.

…hose return values are unused When AMOs are used to implement parallel reduction operations, typically the return value would be discarded. This patch adds a peephole pass `RISCVDeadRegisterDefinitions`. It rewrites `rd` to `x0` when `rd` is marked as dead. It may improve the register allocation and reduce pipeline hazards on CPUs without register renaming and OOO. Comparison with GCC: https://godbolt.org/z/bKaxnEcec Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D158759
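The rewrite described here can be pictured with a toy model. The sketch below is not the `RISCVDeadRegisterDefinitions` pass itself: the `Instr` class, the `rd_dead` flag, and `rewrite_dead_defs` are hypothetical names, and real liveness information comes from the compiler rather than a hand-set field.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Instr:
    opcode: str
    rd: str                 # destination register
    operands: tuple
    rd_dead: bool = False   # assumed: True when nothing reads rd afterwards


def rewrite_dead_defs(instrs):
    """Replace a dead destination register with x0 (toy peephole)."""
    return [replace(i, rd="x0") if i.rd_dead else i for i in instrs]


prog = [
    Instr("amoadd.w", "a2", ("a1", "(a0)"), rd_dead=True),   # result unused
    Instr("amoadd.w", "a3", ("a1", "(a0)"), rd_dead=False),  # result used later
]
out = rewrite_dead_defs(prog)
```

Only the first instruction is changed, since writes to `x0` are discarded and that instruction's result was never read.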

…m#66455) These do not produce extension-specific ops and are handled via common patterns for both the KHR and the NV coop matrix extension. Also improve match failure reporting and error handling in type conversion.

This also allows tensor.empty in the "conversion" path of the sparse compiler, further paving the way to deprecate the bufferization.allocated_tensor() op.

…on. (llvm#66789) This adds the shifts and the immediate forms of the instructions that were already supported. There are still more instructions that can be predicated, but this is the rest of what we had in our downstream.

This is a pre-commit test of accessing the variable __stack_chk_guard when the static relocation model is imposed on a module compiled with pic enabled. It confirms issue [llvm#64999](llvm#64999). The intent is to update this test with the fix for the aforementioned issue.

While I'm here, clean up a few implemented TODOs.

…6721) This is a follow-up to 14d95b2. I would have changed it in that commit, but I don't build the intel-pt plugin so I didn't see this until later.

…nImpl::GetSyntheticTypeName (llvm#66724) Instead of copying memory out of the PythonString (via a std::string) and then using that to create a ConstString, it would make more sense to just create the ConstString from the original StringRef in the first place.

…ug info correlation not working on mac for unknown reasons.

I renamed something but forgot to update the uses of it. Minor thinko.

…ssignments. (llvm#66736) This patch places the finalization code for the RHS of a user-defined assignment after the assignment code. The change only affects standalone RegionAssignOp operations.

This fixes a bug in my 928564c that didn't get noticed in review. I found it when looking at the strided load case (upcoming patch), and realized the previous commit was buggy too. p.s. Sorry for the slightly confusing test diff. I'd apparently used the wrong mask for the aligned positive test; it was actually unaligned. Didn't seem worthy of a separate precommit.

Add support for fir.box_addr, fir.array_coor, fir.coordinate, fir.embox, fir.rebox and fir.load.

1) Through the use of the boolean `followBoxAddr`, determine whether the analysis should apply to the address of the box or the address wrapped by the box.
2) Some asserts have been removed to allow more SourceKinds through the flow, in particular SourceKind::Direct.
3) getSource was a public method, but the returned type (SourceKind) was not public, making it impossible to call publicly.
4) About 12 tests have been added to check for real Fortran scenarios.
5) More tests will be added with HLFIR.
6) A few TODOs have been identified and will need to be addressed in follow-up patches. I felt that more changes would increase the complexity of the patch.

… for medium code model Since those data are assumed to be within the relocation offset limit. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D150297

…eviously (llvm#65822) The problem is that when the "attach" command is initiated, the ExecutionContext for the command has a process: the exited one from the previous run. But `attach wait` creates a new process for the attach, and then errors out instead of interrupting when it finds that its process and the one in the command's ExecutionContext don't match. This change ensures that if we're returning a target from GetExecutionContext, we fill the context with its current process, not some historical one.

…efined assignments. (llvm#66736)" This reverts commit a9a1f84.

…r user-defined assignments. (llvm#66736)"" This reverts commit 775754e. Relanding with part of the LIT test removed. There seems to be operation-ordering nondeterminism that is unrelated to my change. I will address this issue separately.

Add pointer write functionality to MemoryAccess that is needed for implementing redirection manager. It also refactors the code a bit by introducing InProcessMemoryAccess class. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D157378

… locations.

vectorized node uses. If the instruction is vectorized in many different vector nodes, it may break the dependency analysis for gathered nodes with matched scalars. We need to properly check the dependency between such gather nodes to avoid a cyclic dependency.

Avoid false positives by requiring space after `/branch` command so the action won't trigger on diffs that include filenames like `.../BranchProbabilityInfo.cpp`.
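A small sketch of why the trailing space matters. The function names are hypothetical, and the case-insensitive substring check is an assumption made here to reproduce the false positive; the actual workflow condition is not shown in this commit message.

```python
def old_trigger(text: str) -> bool:
    # Assumed original behavior: fire on any occurrence of "/branch".
    return "/branch" in text.lower()


def new_trigger(text: str) -> bool:
    # Require a space after the command so path fragments do not match.
    return "/branch " in text.lower()


diff_line = "llvm/lib/Analysis/BranchProbabilityInfo.cpp"
comment = "/branch llvm/llvm-project/release/17.x"

print(old_trigger(diff_line), new_trigger(diff_line))
print(old_trigger(comment), new_trigger(comment))
```

The path matches the old check because `/BranchProbabilityInfo` begins with `/branch` once case is ignored, while the stricter check still accepts a real command followed by an argument.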

…vm#66718) Construct entities that are associations from selectors in ASSOCIATE, CHANGE TEAMS, and SELECT TYPE constructs do not have the ALLOCATABLE or POINTER attributes, even when associating with allocatables or pointers; associations from selectors in SELECT RANK constructs do have those attributes.

I'm using clang-10 to build BOLT, and it doesn't have the `-moutline-atomics` option. So test whether the compiler supports the option before appending it to the list of cxxflags. Differential Revision: https://reviews.llvm.org/D159521

…vm#66696) This patch is part of a larger initiative aimed at fixing floating-point `max` and `min` operations in MLIR: https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671. In this commit, we add conversion patterns for the newly introduced operations `arith.minnumf` and `arith.maxnumf`. When converting to `spirv.CL`, there is no need to insert additional guards to propagate non-NaN values when one of the arguments is NaN because `CL` ops do exactly the same. However, `GL` ops have undefined behavior when one of the arguments is NaN, so we should insert additional guards to enforce the semantics of Arith's ops. This patch addresses the 1.5 task of the mentioned RFC.
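The guard described for the `GL` path amounts to selecting the non-NaN operand before taking the minimum or maximum. The Python sketch below only illustrates that semantics; it is not the SPIR-V lowering, the function names mirror the Arith ops for readability, and the treatment of signed zeros is left out.

```python
import math


def minnumf(a: float, b: float) -> float:
    """Minimum that returns the other operand when one input is NaN."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)


def maxnumf(a: float, b: float) -> float:
    """Maximum that returns the other operand when one input is NaN."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)


print(minnumf(float("nan"), 1.0), maxnumf(2.0, float("nan")))
```

When both operands are NaN, the result is still NaN, since there is no non-NaN value to select.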

__call_once is large and cluttered with #ifdef preprocessor guards. This cleans it up a bit by using an exception guard instead of try-catch. Differential Revision: https://reviews.llvm.org/D112319 Co-authored-by: Louis Dionne <ldionne.2@gmail.com>

We want to activate `llvm-header-guard` (llvm#66477) but the current CMake configuration includes paths that should be `isystem`. This PR restricts the number of `-I` passed to the clang command line and correctly marks the llvm libc include path as `isystem`.

Removed lots of outdated statements that were misleading.

This change matches a masked.stride.load from a mgather node whose index operand is a strided sequence. We can reuse the VID matching from build_vector lowering for this purpose. Note that this duplicates the matching done at IR by RISCVGatherScatterLowering.cpp. Now that we can widen gathers to a wider SEW, I don't see a good way to remove this duplication. The only obvious alternative is to move the widening transform to IR, but that's a no-go as I want other DAGs to run first. I think we should just live with the duplication, particularly since the reuse of isSimpleVIDSequence means the duplication is somewhat minimal.
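As a rough picture of what "the index operand is a strided sequence" means, the sketch below checks whether a list of constant indices has the form `base + i * stride`. It is a simplified stand-in written for illustration, not the logic of `isSimpleVIDSequence`, and `match_strided` is a hypothetical name.

```python
def match_strided(indices):
    """Return (base, stride) if indices[i] == base + i * stride, else None."""
    if len(indices) < 2:
        return None
    base = indices[0]
    stride = indices[1] - indices[0]
    for i, idx in enumerate(indices):
        if idx != base + i * stride:
            return None
    return base, stride


print(match_strided([0, 8, 16, 24]))  # candidate for a strided load
print(match_strided([0, 8, 12, 24]))  # irregular, must stay a gather
```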

This parallels the binutils/BSD flag of the same name. Debugging information is loaded to print line number information for symbols. Defined symbols are symbolized by their section addresses, and undefined symbols by their first text reloc with line info. Differential Revision: https://reviews.llvm.org/D150987

This reverts commit a35a3b7. This broke libc benchmarks.

Differential Review: https://reviews.llvm.org/D158553

Yup, a bit of an oversight ;-)

We were losing the function entry count, which is useful to check profile quality. For the original cases where we want entrypoint-relative MBB frequencies, the user would just need to divide these values by the entrypoint (first MBB, with ID=0) value.
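The division mentioned above is a one-liner. This sketch assumes the frequencies are available as a mapping from MBB ID to value, with the entry block at ID 0 as the message states; `entry_relative` and the sample numbers are made up for illustration.

```python
def entry_relative(freqs):
    """Scale each MBB frequency by the entry block's value (ID 0)."""
    entry = freqs[0]
    return {mbb: value / entry for mbb, value in freqs.items()}


absolute = {0: 200, 1: 100, 2: 50}  # hypothetical per-MBB values
print(entry_relative(absolute))
```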

…xt. (llvm#66021) Per CWG2760, default member initializers should be considered part of the body of constructors, which means they are evaluated in an immediate-escalating context. However, this does not apply to static members. This patch produces some extraneous diagnostics; unfortunately, we do not have a good way to report an error back to the initializer, and this is a pre-existing issue. Fixes llvm#65985 Fixes llvm#66562

This is in preparation for adding a KHR variant which does not share the same parameters and needs a separate attribute.

…L-3548-bump-llvm-to-d13da154a7c7eff77df8686b2de1cfdfa7cc7029

The new python formatting on changed files triggers for all files in the merge from upstream. If we fix those errors, we would get a huge diff to upstream. Therefore temporarily disable the formatter and re-enable it after the bump.

…da154a7c7eff77df8686b2de1cfdfa7cc7029

* Use new operator printing syntax
* Change i64 -> i32 for axis of reduction ops

…da154a7c7eff77df8686b2de1cfdfa7cc7029

…L-3548-bump-llvm-to-d13da154a7c7eff77df8686b2de1cfdfa7cc7029

Commits on Sep 18, 2023

Commits on Sep 19, 2023

Commits on Nov 16, 2023

Commits on Jan 10, 2024

Commits on Jan 11, 2024

Commits on Jan 26, 2024

Commits on Feb 1, 2024