[AutoBump] Merge with 51365212 (Aug 25) (10) #363

…05549) Fix SIInsertWaitcnts to account for this by adding extra waits to avoid WAW dependencies.

…lvm#83807) This fixes an odd problem with the regex when `CMAKE_INSTALL_LIBDIR` is not defined: `string sub-command REGEX, mode REPLACE: regex "$" matched an empty string.` Fixes llvm#83802

…eclare} (llvm#105570) Constify debug DbgVariableRecord::{isDbgValue,isDbgDeclare}.

This reverts commit 6528157. I'm reverting llvm#104523 (llvm@f01f80c) and this fixup belongs to the same series of changes.

This reverts commit 6f45602, which depends on llvm#104523, which I'm reverting.

llvm#104523)" This reverts commit f01f80c. This commit introduces an msan violation. See the discussion on llvm#104523.

Discard the subexpr.

…#105544) - Refactor SetTheory code to use const pointers when possible. - Use auto for variables initialized using dyn_cast<>. - Use range based for loops and early continue.

There was a duplicate link target.

This region is intended to separate alloca operations from reduction variable initialization. This makes it easier to hoist allocas to the entry block before control flow and complex code for initialization. The verifier checks that there is at most one block in the alloc region. This is not sufficient to avoid control flow in general MLIR, but by the time we are converting to LLVMIR structured control flow should already have been lowered to the cf dialect. 1/3 Part 2: llvm#102524 Part 3: llvm#102525

The intention of this change is to ensure that allocas end up in the entry block, not spread out amongst complex reduction variable initialization code. The tests we have are quite minimized for readability and maintainability, making the benefits less obvious. The use case for this is when there are multiple reduction variables, each with multiple blocks inside of the init region for that reduction. 2/3 Part 1: llvm#102522 Part 3: llvm#102525

I removed the `*-hlfir*` tests because they are now duplicates: the other tests have been updated to use the HLFIR lowering. 3/3 Part 1: llvm#102522 Part 2: llvm#102524

Proof: https://alive2.llvm.org/ce/z/v6VtXz

…finitions and partial specializations (llvm#104030) We need to rebuild the template parameters of out-of-line definitions/specializations of member templates in the context of the current instantiation for the purposes of declaration matching. We already do this for function templates and class templates, but not variable templates, partial specializations of variable template, and partial specializations of class templates. This patch fixes the latter cases.

) Convert them to Pointers, do the offset calculation and then convert them back to function pointers.

Tests for cases that would have been regressed by llvm#104941.

@skatrak

…105644) This can be handled in ODS instead of writing custom parsing/printing code. Thanks for the idea @skatrak

…vm#102752) Currently `mlir.llvm.constant` of structure types restricts that the structure type effectively represents a complex type -- it must have exactly two fields of the same type and the field type must be either an integer type or a float type. This PR relaxes this restriction and it allows the structure type to have an arbitrary number of fields.

…ble objects (llvm#104778) Whilst dealing with review comments on llvm#96752 I discovered that SCEV does not know about the dereferenceable attribute on function arguments so I have updated getRangeRef to make use of it by calling getPointerDereferenceableBytes.

These builtins are currently returning CR0 which will have the format [0, 0, flag_true_if_saved, XER]. We only want to return flag_true_if_saved. This patch adds a shift to remove the XER bit before returning.
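The shift can be pictured with a minimal sketch. It assumes the 4-bit CR0 field is laid out most significant bit first as `[0, 0, flag_true_if_saved, XER]`, as described above; `extractSavedFlag` is a hypothetical helper for illustration, not the code emitted by the patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout of the 4-bit CR0 field, most significant bit first:
//   [0, 0, flag_true_if_saved, XER]
// extractSavedFlag is a hypothetical name used only for this sketch.
constexpr std::uint32_t extractSavedFlag(std::uint32_t cr0) {
  // Keep the low four bits, then shift the XER bit out of bit 0.
  return (cr0 & 0xFu) >> 1;
}

// Walk through all four combinations of the flag and the XER bit.
inline void checkExtractSavedFlag() {
  assert(extractSavedFlag(0b0000u) == 0u); // flag clear, XER bit clear
  assert(extractSavedFlag(0b0001u) == 0u); // flag clear, XER bit set
  assert(extractSavedFlag(0b0010u) == 1u); // flag set, XER bit clear
  assert(extractSavedFlag(0b0011u) == 1u); // flag set, XER bit set
}
```

Because the two upper bits are always zero in this layout, a single right shift is enough and no extra mask is needed after it.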

This aligns the transform with what foldLogOpOfMaskedICmp() does.

- Landing page: add link to the libc++ Discord channel - Landing page: reorder "Getting Involved" above "Design documents" - Landing page: remove "Notes and Known Issues" which was completely outdated - Rename "Using Libc++" to "User Documentation" and update contents - Rename "Building Libc++" to "Vendor Documentation" and update contents The "BuildingLibcxx" and "UsingLibcxx" pages have basically been used for vendor and user documentation respectively. However, they were named in a way that doesn't really make that clear. Renaming the pages now gives us a location to clearly document what we target at vendors and what we target at users, and to do that separately.

…ns (llvm#105455) This allows the use of a single wider operation with a restricted EVL instead of padding the vector with the neutral element. For RISCV specifically, it's worth noting that an alternate padded lowering is available when VL is one less than a power of two, and LMUL <= m1. We could slide the vector operand up by one, and insert the padding via a vslide1up. We don't currently pattern match this, but we could. This form would arguably be better iff the surrounding code wanted VL=4. This patch will force a VL toggle in that case instead. Basically, it comes down to a question of whether we think odd sized vectors are going to appear clustered with odd size vector operations, or mixed in with larger power of two operations. Note there is a potential downside of using vp nodes; we lose any generic DAG combines which might have applied to the widened form.

…lvm#104689) This is a fairly narrow transform (at the moment) to reduce the VLs of instructions feeding a store with a smaller VL. Note that the goal of this transform isn't really to reduce VL - it's to reduce VL *toggles*. To our knowledge, small reductions in VL without also changing LMUL are generally not profitable on existing hardware. For a single use instruction without side effects, fp exceptions, or a result dependency on VL, reducing VL is legal if only a subset of elements are legal. We'd already implemented this logic for vmv.v.v, and this patch simply applies it to stores as an alternate root. Longer term, I plan to extend this to other root instructions (i.e. different kinds of stores, reduces, etc.), and add a more general recursive walkback through operands. One risk with the dataflow based approach is that we could be reducing VL of an instruction scheduled in a region with the wider VL (i.e. mixed mode computations), forcing an additional VL toggle. An example of this is the @insert_subvector_dag_loop test case, but it doesn't appear to happen widely. I think this is a risk we should accept.

This patch extends llvm#73964 and optimises SVE cmp intrinsics to zero vector when predicate is zero.

This patch removes obsolete status pages for projects that were completed: LLVM 18 release, C++20 Ranges and Spaceship support. Co-authored-by: Hristo Hristov <zingam@outlook.com>

If we switch over ucmp/scmp and have two switch cases going to the same destination, we can convert into icmp+br. Fixes llvm#105632.

…cks. If the external user of the scalar to be extracted is in an unreachable/landing pad block, we can skip counting its cost. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: llvm#105667

Static analyser tool cppcheck flags ordered comparison with `bool`s. Replace with equivalent logical operators to prevent this. Closes llvm#102912
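As an illustration of the kind of rewrite involved, the sketch below pairs each ordered comparison on `bool` with a logical form that has the same truth table. The helper names are hypothetical and the commit does not say which comparisons were rewritten, so this shows only the general equivalence.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical helpers: each ordered comparison on bools next to a
// logical form that yields the same result (false orders before true).
constexpr bool lessThan(bool a, bool b) { return !a && b; }       // a < b
constexpr bool greaterThan(bool a, bool b) { return a && !b; }    // a > b
constexpr bool lessOrEqual(bool a, bool b) { return !a || b; }    // a <= b
constexpr bool greaterOrEqual(bool a, bool b) { return a || !b; } // a >= b

// Exhaustively compare the logical forms with the built-in operators.
inline void checkBoolComparisons() {
  for (bool a : {false, true}) {
    for (bool b : {false, true}) {
      assert(lessThan(a, b) == (a < b));
      assert(greaterThan(a, b) == (a > b));
      assert(lessOrEqual(a, b) == (a <= b));
      assert(greaterOrEqual(a, b) == (a >= b));
    }
  }
}
```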

This allows combining some test files that were only split because adding new RUN lines introduced too much churn in the checks.

This allows us to handle return values that are too large to fit in x10 and x11. They will be converted to a sret by passing a pointer to where to store the return value.

…d_Resume calls (llvm#105513) Similar to the fix for llvm#57469, ensure that the other `_Unwind_Resume` call emitted by DwarfEHPrepare has a debug location if needed. This fixes nbdd0121/unwinding#34.

SLP vectorizer has an estimation for gather/buildvector nodes, which contain some scalar loads. SLP vectorizer performs a pretty similar (but large in SLOCs) estimation, which is not always correct. Instead, this patch implements clustering analysis and actual node allocation with the full analysis for the vectorized clustered scalars (not only loads, but also some other instructions) with the correct cost estimation and vector insert instructions. Improves overall vectorization quality and simplifies analysis/estimations. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: llvm#104144

…05676)

…ee (llvm#105576) In f9f3316, Adrian fixed an issue where LLDB wouldn't update the target's architecture when the process reported a different triple that only differed in its sub-architecture. This unintentionally regressed core file debugging when the core file reports the base architecture (e.g. armv7) while the main binary knows the correct CPU subtype (e.g. armv7em). After the aforementioned change, we update the target architecture from armv7em to armv7. Fix the issue by trusting the target architecture over the ProcessMachCore process. rdar://133834304

…105677) An assembly input with > .fpu fp-armv8-fullfp16-d16 crashes the compiler because the ELF FPU attribute emitter misses the respective entry. This patch fixes this. Interestingly, compiling with -mfpu=fp-armv8-fullfp16-d16 does not cause the crash because FPv5_D16 is an alias in the compiler and > .fpu fpv5-d16 is emitted instead, which does not crash. The existing .fpu directive test with multiple FPUs serves the purpose of verifying that each possible FPU option is defined, but does not trigger the crash because only the last .fpu directive goes effectively down the code path. Therefore one test for each FPU is required. Fixes llvm#105674.

This mirrors what GISel already does, extending the existing lowering of aarch64_neon_saddlv/aarch64_neon_uaddlv to SADDLV/UADDLV. This allows us to remove some tablegen patterns, and provides a little nicer codegen in places as the nodes represent the result being in a vector register correctly.

…" (llvm#105601) This reverts commit 2704b80 and relands llvm#104404. The Darwin should not fail after llvm#105599.

This patch fixes warnings of the form: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:9300:23: error: loop variable '[E, Idx]' creates a copy from type 'const value_type' (aka 'const std::pair<const llvm::slpvectorizer::BoUpSLP::TreeEntry *, unsigned int>') [-Werror,-Wrange-loop-construct]

This patch fixes: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:6102:9: error: unused variable 'OpVT' [-Werror,-Wunused-variable]

The start state of a new section is `EMS_None`, often leading to a $d/$x at offset 0. Introduce a MCTargetOption/cl::opt "implicit-mapsyms" to allow an alternative behavior (ARM-software/abi-aa#274):
* Set the start state to `EMS_Data` or `EMS_A64`.
* For text sections, add an ending $x only if the final data is not instructions.
* For non-text sections, add an ending $d only if the final data is not data commands.
```
.section .text.1,"ax"
nop
// emit $d
.long 42
// emit $x
.section .text.2,"ax"
nop
```
This new behavior decreases the .symtab size significantly:
```
% bloaty a64-2/bin/clang -- a64-0/bin/clang
    FILE SIZE        VM SIZE
 --------------  --------------
  -5.4% -1.13Mi  [ = ]       0    .strtab
 -50.9% -4.09Mi  [ = ]       0    .symtab
  -4.0% -5.22Mi  [ = ]       0    TOTAL
```
---
This scheme works as long as the user can rule out some error scenarios:
* .text.1 assembled using the traditional behavior is combined with .text.2 using the new behavior
* A linker script combining non-text sections and text sections. The lack of mapping symbols in the non-text sections could make them treated as code, unless the linker inserts extra mapping symbols.
The above mix-and-match scenarios aren't an issue at all for a significant portion of users. A text section may start with data commands in rare cases (e.g. -fsanitize=function) that many users don't care about. When combining `(.text.0; .word 0)` and `(.text.1; .word 0)`, the ending $x of .text.0 and the initial $d of .text.1 may have the same address. If both sections reside in the same file, ensure the ending symbol comes before the initial $d of .text.1, so that a dumb linker respecting the symbol order will place the ending $x before the initial $d. Disassemblers using stable sort will see both symbols at the same address, and the second will win. When section ordering mechanisms (e.g. --symbol-ordering-file, --call-graph-profile-sort, `.text : { second.o(.text) first.o(.text) }`) are involved, the initial data in a text section following a text section with trailing data could be misidentified as code, but the issue is local and the risk could be acceptable. Pull Request: llvm#99718

…lvm#105517) Disable fixed-point iteration in all AMDGPU Combiners after llvm#102163. This saves around 2% compile time in ad hoc testing on some large graphics shaders. I did not notice any regressions in the generated code, just a bunch of harmless differences in instruction selection and register allocation.

-Wa,-mmapsyms=implicit enables the alternative mapping symbol scheme discussed at llvm#99718. While not conforming to the current aaelf64 ABI, the option is invaluable for those with full control over their toolchain, no reliance on weird relocatable files, and a strong focus on minimizing both relocatable and executable sizes. The option is discouraged when portability of the relocatable objects is a concern. https://maskray.me/blog/2024-07-21-mapping-symbols-rethinking-for-efficiency elaborates the risk. Pull Request: llvm#104542

This paper proposes no normative changes, just updates an example in the standard. It was incorrect for us to have marked it as No in the first place.

This better aligns with how the feature is being referred to and what runtimes (V8) are calling it.

…04707) The previous code made this a compile-time decision but it's not.

…est. (llvm#105586) Also allow XFAIL for armv7-*-linux-gnueabihf targets, not only for armv7l-*.

…vm#105460) TargetProperties.td had a few settings listed as signed integral values, but the Target.cpp methods reading those values were reading them as unsigned. e.g. target.max-memory-read-size, some accesses of target.max-children-count, still today, previously target.max-string-summary-length. After Jonas' change to use templates to read these values in https://reviews.llvm.org/D149774, when the code tried to fetch these values, we'd eventually end up calling OptionValue::GetAsUInt64 which checks that the value is actually a UInt64 before returning it; finding that it was an SInt64, it would drop the user setting and return the default value. This manifested as a bug that target.max-memory-read-size is never used for memory read. target.max-children-count is less straightforward, where one read of that setting was fetching it as an int64_t, the other as a uint64_t. I suspect all of these settings were originally marked as SInt64 so a user could do -1 for "infinite", getting it static_cast to a UINT64_MAX value along the way. I can't find any documentation for this behavior, but it seems like something Greg would have done. We've partially lost that behavior already via llvm#72233 for target.max-string-summary-length, and this further removes it. We're still fetching UInt64's and returning them as uint32_t's but I'm not overly pressed about someone setting a count/size limit over 4GB. I added a simple API test for the memory read setting limit.
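The failure mode can be sketched in isolation. The code below is a simplified model, not LLDB's actual classes: `OptionValue` is a hypothetical stand-in built on `std::variant`, and `getAsUInt64` only imitates the type check described above, where a value stored as SInt64 is rejected by an unsigned read and the default is returned instead.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <variant>

// Hypothetical stand-in for a typed option value (not LLDB's OptionValue).
using OptionValue = std::variant<std::int64_t, std::uint64_t>;

// Imitates a type-checked getter: succeeds only for values stored as unsigned.
inline std::optional<std::uint64_t> getAsUInt64(const OptionValue &value) {
  if (const auto *u = std::get_if<std::uint64_t>(&value))
    return *u;
  return std::nullopt;
}

// Reads a limit, falling back to the default when the stored type mismatches.
inline std::uint64_t readLimit(const OptionValue &value, std::uint64_t fallback) {
  return getAsUInt64(value).value_or(fallback);
}

inline void checkReadLimit() {
  // A user setting stored as signed is silently dropped by the unsigned read.
  assert(readLimit(OptionValue{std::int64_t{4096}}, 1024) == 1024);
  // The same setting stored as unsigned is honoured.
  assert(readLimit(OptionValue{std::uint64_t{4096}}, 1024) == 4096);
  // The suspected old "-1 means infinite" trick: -1 cast to unsigned.
  assert(static_cast<std::uint64_t>(std::int64_t{-1}) ==
         std::numeric_limits<std::uint64_t>::max());
}
```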

…aths call a `cold` function" Fixed up the uar test that was failing. It seems with the new `cold` attribute the order of the functions is different. As far as I can tell this is not a concern. Closes llvm#105559

For variable X of type std::optional, X && X.value_or(Y) == Z is equivalent to X == Z when Y != Z.
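A small sketch of the equivalence, using `std::optional<int>` and hypothetical helper names. It only checks the two forms against each other for a few inputs with Y != Z; it is not the implementation of the check itself.

```cpp
#include <cassert>
#include <optional>

// The verbose form: X && X.value_or(Y) == Z.
inline bool guardedValueOrEquals(const std::optional<int> &x, int y, int z) {
  return x && x.value_or(y) == z;
}

// The simplified form: X == Z (an empty optional never equals a value).
inline bool directEquals(const std::optional<int> &x, int z) {
  return x == z;
}

inline void checkOptionalEquivalence() {
  const int y = 0, z = 5; // precondition from the text: Y != Z
  const std::optional<int> inputs[] = {std::nullopt, 0, 5, 7};
  for (const auto &x : inputs)
    assert(guardedValueOrEquals(x, y, z) == directEquals(x, z));
}
```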

Missed IceLakeServer when I updated the other CPUs in 6ec4c9c

This doesn't match uops.info yet - but it matches the existing vpscatterdq/vscatterqpd entries like uops.info says it should Fixes llvm#105675

Move VPWiden[Load|Store]EVLRecipe::execute to VPlanRecipes.cpp, in line with other ::execute implementations that don't depend on anything defined in LoopVectorization.cpp

…04635) PR llvm#102851 marks reference types in union as error on msvc by changing the clang, which makes 'transform_error.mandates.verify.cpp' no longer failing on msvc from ToT. However, all libcxx buildbots do not build clang from source, therefore, this test will still fail on these bots, which is incorrect. This patch changed the expected error message of this test so it can pass with both release branch clang and ToT clang.

Summary: This patch adds all the libc ctype variants. These ignore the locale information completely, so they're pretty much just stubs. Because these use locale information, which is system scope, we do not enable building them outside of full build mode.

This reverts commit 8f005f8.

Summary: This patch adds the macros and entrypoints associated with the `locale.h` entrypoints. These are mostly stubs, as we (for now and the foreseeable future) only expect to support the C and maybe C.UTF-8 locales in the LLVM libc.

This adds the SPIRV fdot, sdot, and udot intrinsics and allows them to be created at codegen depending on the target architecture. This required moving some of the DXIL-specific choices to DXIL instruction expansion out of codegen and providing it with a more generic fdot intrinsic as well. Removed some stale comments that gave the obsolete impression that type conversions should be expected to match overloads. The SPIRV intrinsic handling involves generating multiply and add operations for integers and the existing OpDot operation for floating point. New tests for generating SPIRV float and integer dot intrinsics are added, and the HLSL tests are expanded to include SPIRV generation. Used new dot product intrinsic generation to implement normalize() in SPIRV. Incidentally changed existing dot intrinsic definitions to use DefaultAttrsIntrinsic to match the newly added intrinsics. Fixes llvm#88056

We have received several customer reports of slow stepping in VSCode over the past year. Profiling shows the slow stepping is caused by the `stackTrace` request, which can take around 1 second for certain targets. Since VSCode sends `stackTrace` during each stop event, the slow `stackTrace` request slows down stepping in VSCode. Below is the hot path:
```
|--68.75%--lldb_dap::DAP::HandleObject(llvm::json::Object const&)
| |
| |--57.70%--(anonymous namespace)::request_stackTrace(llvm::json::Object const&)
| | |
| | |--54.43%--lldb::SBThread::GetCurrentExceptionBacktrace()
| | | lldb_private::Thread::GetCurrentExceptionBacktrace()
| | | lldb_private::Thread::GetCurrentException()
| | | lldb_private::ItaniumABILanguageRuntime::GetExceptionObjectForThread(std::shared_ptr<lldb_private::Thread>)
| | | |
| | | |--53.43%--lldb_private::FunctionCaller::ExecuteFunction(lldb_private::ExecutionContext&, unsigned long*, lldb_private::EvaluateExpressionOptions const&, lldb_private::DiagnosticManager&, lldb_private::Value&)
| | | | |
| | | | |--25.23%--lldb_private::FunctionCaller::InsertFunction(lldb_private::ExecutionContext&, unsigned long&, lldb_private::DiagnosticManager&)
| | | | | |
| | | | | |--24.56%--lldb_private::FunctionCaller::WriteFunctionWrapper(lldb_private::ExecutionContext&, lldb_private::DiagnosticManager&)
| | | | | | |
| | | | | | |--19.73%--lldb_private::ExpressionParser::PrepareForExecution(unsigned long&, unsigned long&, std::shared_ptr<lldb_private::IRExecutionUnit>&, lldb_private::ExecutionContext&, bool&, lldb_private::ExecutionPolicy)
| | | | | | | lldb_private::ClangExpressionParser::DoPrepareForExecution(unsigned long&, unsigned long&, std::shared_ptr<lldb_private::IRExecutionUnit>&, lldb_private::ExecutionContext&, bool&, lldb_private::ExecutionPolicy)
| | | | | | | lldb_private::IRExecutionUnit::GetRunnableInfo(lldb_private::Status&, unsigned long&, unsigned long&)
| | | | | | | |
```
The hot path was added by https://reviews.llvm.org/D156465, which should at least be disabled for Linux. Note: I am seeing a similar performance hot path on Mac. This PR hides the feature behind the `enableDisplayExtendedBacktrace` option, which needs to be enabled on demand. --------- Co-authored-by: jeffreytan81 <jeffreytan@fb.com>

MI300 ISA section 4.5 states there is a hazard between "VALU op which uses OPSEL or SDWA with changes the result’s bit position" and "VALU op consumes result of that op". This includes the case where the second op is SDWA with same dest and dst_sel != DWORD && dst_unused == UNUSED_PRESERVE. In this case, there is an implicit read of the first op dst and the compiler needs to resolve this hazard. Confirmed with HW team. We model dst_unused == UNUSED_PRESERVE as tied-def of implicit operand, so this PR checks for that. MI300_SP_MAS section 1.3.9.2 specifies that CVT_SR_FP8_F32 and CVT_SR_BF8_F32 with opsel[3:2] != 0 have a dest forwarding issue. Currently, we only add a check for CVT_SR_FP8_F32 with opsel[3] != 0 -- this PR adds support for opsel[2] != 0 as well.

Summary: This patch adds all the libc ctype variants. These ignore the locale information completely, so they're pretty much just stubs. Because these use locale information, which is system scope, we do not enable building them outside of full build mode.

llvm#105716) …le data" This reverts commit 2c1f064. Many build failures in: CodeGen/X86/scatter-schedule.ll Example of a build failure: https://lab.llvm.org/buildbot/#/builders/155/builds/1675

…vm#105555) The new helper functions make the intent clearer while hiding implementation details, including how we handle previously added entries. Note that: - If we are adding a GUID as a GlobalValueSummary::Definition, then we override a previously added GlobalValueSummary::Declaration entry for the same GUID. - If we are adding a GUID as a GlobalValueSummary::Declaration, then a previously added GlobalValueSummary::Definition entry for the same GUID takes precedence, and no change is made.
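The precedence rule between the two kinds of entries can be sketched as follows. This is a simplified model keyed only by GUID; `ImportKind` and `ImportListSketch` are hypothetical names, and the real helpers may take more arguments and live on a different type.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for GlobalValueSummary::{Declaration,Definition}.
enum class ImportKind { Declaration, Definition };

// Simplified model of an import list keyed by GUID.
class ImportListSketch {
public:
  // A definition always wins: it overrides an earlier declaration entry.
  void addDefinition(std::uint64_t guid) {
    kinds_[guid] = ImportKind::Definition;
  }

  // A declaration is only recorded when the GUID has no entry yet, so an
  // existing definition takes precedence and is left untouched.
  void maybeAddDeclaration(std::uint64_t guid) {
    kinds_.try_emplace(guid, ImportKind::Declaration);
  }

  ImportKind kindOf(std::uint64_t guid) const { return kinds_.at(guid); }

private:
  std::unordered_map<std::uint64_t, ImportKind> kinds_;
};

inline void checkImportPrecedence() {
  ImportListSketch list;
  list.maybeAddDeclaration(1);
  list.addDefinition(1); // upgrades the declaration
  assert(list.kindOf(1) == ImportKind::Definition);

  list.addDefinition(2);
  list.maybeAddDeclaration(2); // no change: the definition stays
  assert(list.kindOf(2) == ImportKind::Definition);
}
```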

These have been replaced with atomicrmw.

Move the logic for pre-computing costs of certain instructions to a separate helper function, allowing re-use in a follow-up patch.

…e functions. (llvm#105499) This implements Fast-Forward Sequences documented in ARM64EC ABI https://learn.microsoft.com/en-us/windows/arm/arm64ec-abi. There are two conditions when linker should generate such thunks:
- For each exported ARM64EC function. It applies only to ARM64EC functions (we may also have pure x64 functions, for which no thunk is needed). MSVC linker creates `EXP+<mangled export name>` symbol in those cases that points to the thunk and uses that symbol for the export. It's observable from the module: it's possible to reference such symbols as I did in the test. Note that it uses export name, not name of the symbol that's exported (as in `foo` in `/EXPORT:foo=bar`). This implies that if the same function is exported multiple times, it will have multiple thunks. I followed this MSVC behavior.
- For hybrid_patchable functions. The linker tries to generate a thunk for each undefined `EXP+*` symbol (and such symbols are created by the compiler as a target of weak alias from the demangled name). MSVC linker tries to find corresponding `*$hp_target` symbol and if fails to do so, it outputs a cryptic error like `LINK : fatal error LNK1000: Internal error during IMAGE::BuildImage`. I just skip generating the thunk in such case (which causes undefined reference error). MSVC linker additionally checks that the symbol complex type is a function (see also llvm#102898). We generally don't do such checks in LLD, so I made it less strict. It should be fine: if it's some data symbol, it will not have `$hp_target` symbol, so we will skip it anyway.

There are cases where VPlans contain some simplifications that are very hard to accurately account for up-front in the legacy cost model. Those cases are caused by un-simplified inputs, which trigger the assert ensuring both the legacy and VPlan-based cost model agree on the VF. To avoid false positives due to missed simplifications in general, only trigger the assert if the chosen VPlan doesn't contain any additional simplifications. Fixes llvm#104714. Fixes llvm#105713.

libunwind shouldn't know that compact_unwind_encoding.h is part of a MachO module that it doesn't own. Delete the mach-o module map, and let whatever is in charge of the mach-o directory be the one to say how its module is organized and where compact_unwind_encoding.h fits in.

…102622) Introduce the `-fsanitize=realtime` flag in clang driver Plug in the RealtimeSanitizer PassManager pass in Codegen, and attribute a function based on if it has the `[[clang::nonblocking]]` function effect.

Diagnose this early after parsing declaration specifiers; this allows us to issue a better diagnostic. This also checks for `concept friend` and concept declarations w/o a template-head because it’s easiest to do that at the same time. Fixes llvm#45182.

… in lit internal shell (llvm#104880) This patch rewrites tests to remove the use of the `unset` command, which is not supported in the lit internal shell. The tests now use the `env -u` to unset environment variables. The `unset` command is used in shell environments to remove the environment variable. However, because the lit internal shell does not support the `unset` command, using it in tests would result in errors or other unexpected behavior. To overcome this limitation, the tests have been updated to use the `env -u` command instead. `env -u` is supported by lit and effectively removes specified environment variables. This allows the tests to achieve the same goal of unsetting environment variables while ensuring compatibility with the lit internal shell. This change is relevant for [[RFC] Enabling the Lit Internal Shell by Default](https://discourse.llvm.org/t/rfc-enabling-the-lit-internal-shell-by-default/80179/3) Fixes: llvm#102397

Fix a bug found when coalescing loops which have iteration arguments, such that the inner loop's terminator may have operands of the inner loop iteration arguments which are about to be replaced by the outer loop's iteration arguments. The current flow leads to a crash within the IR code.

…105579) - Add reverse iterators and `value_type` to StringRef. - Add unit test for all 4 iterator flavors. - This prepares StringRef to be used with `SequenceToOffsetTable`.

llvm#105744) …r (llvm#102622)" This reverts commit d010ec6. Build failure: https://lab.llvm.org/buildbot/#/builders/159/builds/4466

This patch fixes: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7245:1: error: unused function 'planContainsAdditionalSimplifications' [-Werror,-Wunused-function]

I missed this one when I introduced helper functions in: commit 3082a38 Author: Kazu Hirata <kazu@google.com> Date: Thu Aug 22 12:06:47 2024 -0700

) This fixes a crash introduced by my ac6e1fd. I had failed to consider the case where a vector is truncated to an illegal element type. The resulting intermediate VT wasn't an MVT and we'd fail an assertion. Surprisingly, SLP does query illegal element types in some cases.

This patch implements sandboxir::CatchReturnInst mirroring llvm::CatchReturnInst.

…nalysis (llvm#104906)" (llvm#105752) Revert as it breaks libc++ tests, see llvm#104906. This reverts commit c368a72.

…4866) Noticed that the release notes currently have a weird order: C++17, C++14(!), C++20, C++23, C++2c. Reorder them in reverse chronological order, which also matches the [status page](https://clang.llvm.org/cxx_status.html).

ScalarizedMaskedMemIntr contains an optimization where the <N x i1> mask is bitcast into an iN and then bit-tests with powers of two are used to determine whether to load/store/... or not. However, on machines with branch divergence (mainly GPUs), this is a mis-optimization, since each i1 in the mask will be stored in a condition register - that is, each of these "i1"s is likely to be a word or two wide, making these bit operations counterproductive. Therefore, amend this pass to skip the optimization on targets that it pessimizes. Pre-commit tests llvm#104645
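The bit-test scheme can be modelled in plain C++ as a rough sketch. It assumes an 8-lane mask where lane i maps to bit i of the packed integer; the function names are hypothetical and the code only illustrates the idea, not the pass's generated IR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 8;

// Models the "bitcast <8 x i1> to i8" step. Assumption: lane i -> bit i.
inline std::uint8_t packMask(const std::array<bool, kLanes> &mask) {
  std::uint8_t bits = 0;
  for (std::size_t i = 0; i < kLanes; ++i)
    if (mask[i])
      bits |= static_cast<std::uint8_t>(1u << i);
  return bits;
}

// Scalarized masked load: each lane tests its own power-of-two bit and
// loads from memory only when that bit is set; otherwise passthru is kept.
inline std::array<int, kLanes>
scalarizedMaskedLoad(const std::array<int, kLanes> &memory, std::uint8_t bits,
                     const std::array<int, kLanes> &passthru) {
  std::array<int, kLanes> result = passthru;
  for (std::size_t i = 0; i < kLanes; ++i)
    if (bits & (1u << i))
      result[i] = memory[i];
  return result;
}

inline void checkScalarizedMaskedLoad() {
  const std::array<bool, kLanes> mask = {true, false, true,  false,
                                         false, false, false, true};
  const std::uint8_t bits = packMask(mask);
  assert(bits == 133); // bits 0, 2 and 7 set: 1 + 4 + 128
  const std::array<int, kLanes> memory = {1, 2, 3, 4, 5, 6, 7, 8};
  const std::array<int, kLanes> passthru = {};
  const std::array<int, kLanes> expected = {1, 0, 3, 0, 0, 0, 0, 8};
  assert(scalarizedMaskedLoad(memory, bits, passthru) == expected);
}
```

On a target where each mask element already lives in its own condition register, the packing step above is pure overhead, which is the case the change skips.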

This pass performs RAUW by walking the machine function for each RAUW operation. For large functions, the runtime of this pass starts to blow up. Linearize the pass by batching all of the RAUW ops into a single walk.
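A minimal sketch of the batching idea, using a flat list of register uses as a stand-in for the machine function. All names are hypothetical, and the sketch assumes the replacements are independent (no register is both a source and a destination).

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>
#include <vector>

using Reg = unsigned;
using Replacement = std::pair<Reg, Reg>; // (from, to)

// One full walk per replacement: O(#replacements * #uses).
inline void replaceOneByOne(std::vector<Reg> &uses,
                            const std::vector<Replacement> &replacements) {
  for (const auto &[from, to] : replacements)
    for (Reg &use : uses)
      if (use == from)
        use = to;
}

// Batched: collect every replacement in a map, then walk the uses once.
// Assumes independent replacements, as stated in the lead-in.
inline void replaceBatched(std::vector<Reg> &uses,
                           const std::vector<Replacement> &replacements) {
  const std::unordered_map<Reg, Reg> map(replacements.begin(),
                                         replacements.end());
  for (Reg &use : uses) {
    const auto it = map.find(use);
    if (it != map.end())
      use = it->second;
  }
}

inline void checkBatchedMatchesOneByOne() {
  const std::vector<Replacement> replacements = {{1, 10}, {3, 30}};
  std::vector<Reg> a = {1, 2, 3, 1};
  std::vector<Reg> b = a;
  replaceOneByOne(a, replacements);
  replaceBatched(b, replacements);
  assert(a == b);
  assert((b == std::vector<Reg>{10, 2, 30, 10}));
}
```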

On linux lldb-dap uses the location of the lldb-dap binary to search for lldb-server. Previously these were produced in different directories corresponding to the BUILD file paths. It's not ideal that the BUILD file location matters for the binary at runtime but it doesn't hurt to have this tool here too like the others.

Add missing `getIterationDomainTileFromOperandTile` and `getTiledImplementationFromOperandTile` to `tensor.pack` and enable fusing it as a consumer. NOTE: it currently only expects the perfect tiling scenario, without padding semantics.

- Add `EmitStringLiteralDef` to StringToOffsetTable class to emit more readable string table. - Use that in `EmitIntrinsicToBuiltinMap`.

As suggested in review for PR llvm#100067. Refactor code for S_GETPC_B64 bundle updates for use with multiple hazard mitigations.

Also, don't insert a space after ::* for method pointers. See llvm#86253 (comment). Fixes llvm#100841.

Triggers assert in compiler https://lab.llvm.org/buildbot/#/builders/51/builds/2836 ``` Instructions.cpp:1700: llvm::ShuffleVectorInst::ShuffleVectorInst(Value *, Value *, ArrayRef<int>, const Twine &, InsertPosition): Assertion `isValidOperands(V1, V2, Mask) && "Invalid shuffle vector instruction operands!"' failed. ``` This reverts commit a625435.

This commit introduces emission of DebugSource, DebugCompileUnit from NonSemantic.Shader.DebugInfo.100 and required OpString with filename. NonSemantic.Shader.DebugInfo.100 is divided, following DWARF, into two main concepts – emitting DIE and Line. In DWARF, the .debug_abbrev and .debug_info sections are responsible for emitting a tree with information (DIEs) about e.g. types and the compilation unit. Correspondingly, NonSemantic.Shader.DebugInfo.100 has instructions like DebugSource, DebugCompileUnit etc. which perform the same role in a SPIR-V file. The difference is that in SPIR-V there are no sections but a logical layout, which forces the order of instruction emission. NonSemantic.Shader.DebugInfo.100 requires this type of global information to be emitted after OpTypeXXX and OpConstantXXX instructions. One of the goals was to minimize changes to and interaction with SPIRVModuleAnalysis as much as possible, which the current commit achieves by emitting its instructions directly into MachineFunction. The possibility of duplicates is mitigated by a guard inside the pass which emits the global information only once, in one function. With that method, duplicates have no chance of being emitted. From that point, adding new debug global instructions should be straightforward.

…d friends. API clients may want to use things other than paths as the buffer identifiers. No testcase -- I haven't thought of a good way to expose this via the regression testing tools. rdar://133536831

Revert was wrong. The bot is still broken https://lab.llvm.org/buildbot/#/builders/51/builds/2838 Reverts llvm#105771

The last use was removed by: commit 89fe570 Author: Philip Reames <listmail@philipreames.com> Date: Tue May 12 23:39:23 2015 +0000

…ine (llvm#101882) Related issues that have requested this feature: llvm#51833 llvm#23796 llvm#53190 This only partially solves them, since the request is for both arguments and parameters.

This patch turns type alias ImportMapTy into a proper class to provide a more intuitive interface like: ImportList.addDefinition(...) as opposed to: FunctionImporter::addDefinition(ImportList, ...) Also, this patch requires all non-const accesses to go through addDefinition, maybeAddDeclaration, and addGUID while providing const accesses via: const ImportMapTyImpl &getImportMap() const { return ImportMap; } I realize ImportMapTy may not be the best name as a class (maybe OK as a type alias). I am not renaming ImportMapTy in this patch at least because there are 47 mentions of ImportMapTy under llvm/.

…ing" (llvm#105780) with "[Vectorize] Fix warnings" It introduced compiler crashes, see llvm#104144. This reverts commit 69332bb and 351f4a5.

…05635) It is possible to have a subview with a fully static size and a type that matches the source type, but a dynamic offset that may be different. However, currently the memref dialect folds:

```mlir
func.func @subview_of_static_full_size(
    %arg0: memref<16x4xf32, strided<[4, 1], offset: ?>>,
    %idx: index) -> memref<16x4xf32, strided<[4, 1], offset: ?>> {
  %0 = memref.subview %arg0[%idx, 0][16, 4][1, 1]
      : memref<16x4xf32, strided<[4, 1], offset: ?>>
        to memref<16x4xf32, strided<[4, 1], offset: ?>>
  return %0 : memref<16x4xf32, strided<[4, 1], offset: ?>>
}
```

To:

```mlir
func.func @subview_of_static_full_size(
    %arg0: memref<16x4xf32, strided<[4, 1], offset: ?>>,
    %arg1: index) -> memref<16x4xf32, strided<[4, 1], offset: ?>> {
  return %arg0 : memref<16x4xf32, strided<[4, 1], offset: ?>>
}
```

Which drops the dynamic offset from the `subview` op.

The o32 ABI specifies: > Each relocation type of R_MIPS_HI16 must have an associated R_MIPS_LO16 entry immediately following it in the list of relocations. [...] the addend AHL is computed as (AHI << 16) + (short)ALO In practice, the high-part and low-part relocations may not be adjacent in assembly files, requiring the assembler to reorder relocations. http://reviews.llvm.org/D19718 performed the reordering, but did not optimize for the common case where a %lo immediately follows its matching %hi. The quadratic time complexity could make sections with many relocations very slow to process. This patch implements the fast path, simplifies the code, and makes the behavior more similar to GNU assembler (for the .rel.mips_hilo_8b test). We also remove `OriginalSymbol`, removing overhead for other targets. Fix llvm#104562 Pull Request: llvm#104723
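The addend rule quoted from the o32 ABI above, `(AHI << 16) + (short)ALO`, can be sketched in isolation. This is only an illustration of the arithmetic, not code from the patch; the function name `combineHiLoAddend` is hypothetical, and the result is returned as a 32-bit two's-complement bit pattern.

```cpp
#include <cstdint>

// Hypothetical helper: combine the 16-bit addend fields of a paired
// R_MIPS_HI16 / R_MIPS_LO16 relocation into the full addend AHL.
inline uint32_t combineHiLoAddend(uint16_t ahi, uint16_t alo) {
  // "(short)ALO": sign-extend the low half. XOR-then-subtract does this
  // without relying on implementation-defined narrowing conversions.
  int32_t lo = (static_cast<int32_t>(alo) ^ 0x8000) - 0x8000;
  // "AHI << 16": the high half occupies the upper 16 bits.
  uint32_t hi = static_cast<uint32_t>(ahi) << 16;
  // Unsigned addition wraps modulo 2^32, matching two's-complement addition.
  return hi + static_cast<uint32_t>(lo);
}
```

Because the low half is sign-extended, a %lo field of `0xFFFF` subtracts one from the value contributed by the %hi field, which is why each %hi must be matched with the correct %lo.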

… multiple module units for explicit specialization Relax the case for duplicated declaration in multiple module units for explicit specialization and refactor the implementation of checkMultipleDefinitionInNamedModules a little bit. This is intended to not affect any end users since it only relaxes the condition to emit an error.

…of setCompleteDefinition(false) for CXXRecordDecl When we merge the definition for CXXRecordDecl, we would use setCompleteDefinition(false) to mark the merged definition. But this was not the correct interface: afterwards, we can't tell that the merged definition was ever a definition. And we already provide an interface for this: demoteThisDefinitionToDeclaration. So this patch uses the correct API. This was found during downstream development. This is not strictly NFC, but it is intended to be NFC for all end users.

Fixes llvm#105785.

The generic subtarget has neither of these features. Rather than forcing HasMovrel on, it is simpler to expand dynamic vector indexing to a sequence of compare/select instructions. NFC for real subtargets.

…lvm#103044) As per [1] the indices for a matrix element access operator shall have integral or unscoped enumeration types and be non-negative. At the moment, the index expression is converted to SizeType irrespective of the signedness of the index expression. This causes implicit sign conversion warnings if any of the indices is signed. As per the spec, using signed types as indices is allowed and should not cause any warnings. If the index expression is signed, extend to SignedSizeType to avoid the warning. [1] https://clang.llvm.org/docs/MatrixTypes.html#matrix-type-element-access-operator PR: llvm#103044

…m#105550) When a loop contains a VMEM load whose result is only used outside the loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for vmcnt will be required inside the loop anyway, because VMEM instructions can write their VGPR results out of order.

…(REAPPLIED) This doesn't match uops.info yet - but it matches the existing vpscatterdq/vscatterqpd entries like uops.info says it should Reapplied with codegen fix for scatter-schedule.ll Fixes llvm#105675

…lvm#105799) It is a long-standing issue that duplicated declarations in multiple module units slow down compilation, and there have been many questions and issue reports about it. So I think it is better to add a warning for it. And given that this is not because the users' code violates the language specification or any best practices, the warning is disabled by default even if `-Wall` is specified. Users need to specify the warning explicitly or use `-Weverything`. The documentation will be added separately.

This mirrors what we do for SDAG, scalarizing i128 vectors with add/sub/mul/and/or/xor operators.

…lvm#105804) This check was removed a while ago from visit(), remove it from delegate() as well.

This reverts c79d1fa and 125aa10 Instead, use the previous approach but allow void-typed InitListExprs with 0 initializers.

This allows clients to check buffers that they don't own. rdar://133536831

…zer (llvm#105811)

…lvm#105691) First patch to fix a BIND(C) ABI issue (llvm#102113). I need to keep track of BIND(C) in more locations (fir.dispatch and func.func operations), and I need to fix a few passes that are dropping the attribute on the floor. Since I expect more procedure attributes that cannot be reflected in mlir::FunctionType will be needed for ABI, optimizations, or debug info, this NFC patch adds a new enum attribute to keep track of procedure attributes in the IR. This patch is not updating lowering to lower more attributes, this will be done in a separate patch to keep the test changes low here. Adding the attribute on fir.dispatch and func.func will also be done in separate patches.

- Make `EmitString` const by not mutating `AggregateString`. - Use C++17 structured bindings in `GetOrAddStringOffset`. - Use StringExtras version of isDigit instead of std::isdigit.
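Two of the bullets above lend themselves to a small standalone sketch: a locale-independent digit test, and a get-or-add lookup written with C++17 structured bindings. The names `isAsciiDigit`, `StringTable`, and `getOrAddOffset` are hypothetical stand-ins, not the identifiers touched by the patch, and the offset scheme (string length plus a terminator) is an assumption made for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Locale-independent digit test. Unlike std::isdigit, the result never
// depends on the global C locale, and passing a negative char is safe.
constexpr bool isAsciiDigit(char c) { return c >= '0' && c <= '9'; }

// Hypothetical string table: each distinct string gets the offset at
// which it would start in one concatenated, NUL-separated buffer.
struct StringTable {
  std::map<std::string, std::size_t> offsets;
  std::size_t nextOffset = 0;

  std::size_t getOrAddOffset(const std::string &s) {
    // try_emplace inserts only if the key is new; the structured binding
    // names the iterator and the "was inserted" flag directly.
    auto [it, inserted] = offsets.try_emplace(s, nextOffset);
    if (inserted)
      nextOffset += s.size() + 1; // +1 for the assumed NUL terminator
    return it->second;
  }
};
```

The structured binding replaces the older pattern of naming a `std::pair` result and reading `.first` and `.second` from it.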

This patch fixes: clang/lib/Serialization/ASTReader.cpp:9978:27: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]

…05819)

Add a unique suffix to .sbss/.sdata if -fdata-sections is given. Without assigning a unique .sbss/.sdata section to each symbol, a linker may not be able to remove the unused parts during section garbage collection, since used and unused symbols are all mixed in the same .sbss/.sdata section. I believe this also matches the behavior of gcc.

…lvm#104910) Mem2Reg assumes SSA dependencies but did not check for graph regions. This fixes it. --------- Co-authored-by: Christian Ulmann <christianulmann@gmail.com>

I found the current stable hash is not deterministic across multiple runs on a specific platform. This is because it uses `hash_combine` instead of `stable_hash_combine`.

…lvm#105812) Common up handling of intrinsics that are a no-op on uniform arguments. This catches a couple of new cases: readlane (readlane x, y), z -> readlane x, y (for any z, does not have to equal y). permlane64 (readfirstlane x) -> readfirstlane x (and likewise for any other uniform argument to permlane64).

The SLP vectorizer has an estimation for gather/buildvector nodes that contain some scalar loads. It performs a broadly similar (but large in SLOC) estimation, which is not always correct. Instead, this patch implements clustering analysis and actual node allocation with the full analysis for the vectorized clustered scalars (not only loads, but also some other instructions) with the correct cost estimation and vector insert instructions. Improves overall vectorization quality and simplifies analysis/estimations. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: llvm#104144

Currently vector tuple registers don't have the specified names, the default name is, for example: `VRN3M2` -> `V8M2_V10M2_V12M2`, however it's equivalent to `v8` in the assembly.

…04717)" This reverts commit 7597e09. It caused several buildbot failures due to stack overflows with the parser test.

…eSVECmpNE (llvm#102472)

…ally used pointer. If the strided node is reversed, we need to check the last instruction, not the first one in the list of scalars, when checking if the root pointer must be extracted.

…cover regression from llvm#101751. (llvm#104114)" This caused an assert to fire: llvm/include/llvm/Support/Casting.h:566: decltype(auto) llvm::cast(const From &) [To = llvm::ConstantSDNode, From = llvm::SDValue]: Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed. see comment on the PR. > If c1 is a shifted mask with c3 leading zeros and c4 trailing zeros. If > c2 is greater than c3, we can use (srli (srai y, c2 - c3), c3 + c4) > followed by a SHXADD with c4 as the X amount. > > Without Zba we can use (slli (srli (srai y, c2 - c3), c3 + c4), c4). > Alive2: https://alive2.llvm.org/ce/z/AwhheR This reverts commit 5144817.

…05810) `x86_64-sie-ps5` is the triple we share with PS5 toolchain users who have reason to care about such things. The vast majority of PS5 checks and tests already use this variant. Quashing the handful of stragglers will help prevent future copy+paste of the discouraged variant.

Don't consider the cost of branches marked to be skipped in VPlan cost pre-computation. Those aren't included in the legacy cost, so they should not be included in the VPlan cost.

…vm#104404)" (llvm#105601)" that change still breaks SanitizerCommon-asan-x86_64-Darwin :: Darwin/print-stack-trace-in-code-loaded-after-fork.cpp > This reverts commit 2704b80 > and relands llvm#104404. > > The Darwin should not fail after llvm#105599. This reverts commit 8c6f8c2.

) This reverts commit a1e9b7e This relands commit d010ec6 No modifications from the original patch. It was determined that the ubsan build failure was happening even after the revert, some examples: https://lab.llvm.org/buildbot/#/builders/159/builds/4477 https://lab.llvm.org/buildbot/#/builders/159/builds/4478 https://lab.llvm.org/buildbot/#/builders/159/builds/4479

WG14 N2401 was removed from the list because it was library-only changes that don't impact the compiler. Everything having to do with decimal floating-point types was changed to No because we do not currently have any support for those. WG14 N2314 remains Unknown because it has changes to Annex F for binary floating-point types.

…ternal shell (llvm#105720) This patch addresses compatibility issues with the lit internal shell by removing the use of subshell execution (parentheses and subshell syntax) in the `BOLT` tests. The lit internal shell does not support parentheses, so the tests have been refactored to use separate command invocations, with outputs redirected to temporary files where necessary. This change is relevant for enabling the lit internal shell by default, as outlined in [[RFC] Enabling the Lit Internal Shell by Default](https://discourse.llvm.org/t/rfc-enabling-the-lit-internal-shell-by-default/80179) fixes: llvm#102401

…may escape out of the loop (llvm#105755) Previously the values in the peeled prologue that weren't treated with the `predicateFn` were passed to the loop body without any other predication. If those values are later used outside of the loop body, they may be incorrect if the num iterations is smaller than num stages - 1. We need similar masking for those, as is done in the main loop body, using already existing predicates.

The implementation follows the resolution of CWG2922

…tern. NFC.

Instead of tracking those using our static CSV files, I created lists of subtasks in their respective issues (llvm#99939 and llvm#105169) to track the work that is still left.

llvm#104671) Move the implementation of `ReconcileUnrealizedCasts` to `DialectConversion.cpp`, so that it can be called from there in a future commit. This commit is in preparation of decoupling argument/source/target materializations from the dialect conversion framework. The existing logic around unresolved materializations that predicts IR changes to decide if a cast op can be folded/erased will become obsolete, as `ReconcileUnrealizedCasts` will perform these kind of foldings on fully materialized IR. --------- Co-authored-by: Markus Böck <markus.boeck02@gmail.com>

…nalysis (llvm#104906)" (llvm#105838) Reland without the `EnableLifetimeWarnings` removal. I will remove the EnableLifetimeWarnings in a follow-up patch. I have added a test to prevent regression.

This reverts commit 19d3f34.

…recover regression from llvm#101751. (llvm#104114)" Fixed an incorrect cast. Original message: If c1 is a shifted mask with c3 leading zeros and c4 trailing zeros. If c2 is greater than c3, we can use (srli (srai y, c2 - c3), c3 + c4) followed by a SHXADD with c4 as the X amount. Without Zba we can use (slli (srli (srai y, c2 - c3), c3 + c4), c4). Alive2: https://alive2.llvm.org/ce/z/AwhheR
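The shift sequence in the message above can be checked with plain 64-bit arithmetic. The commit title is truncated here, so the pattern being matched is an assumption: this sketch assumes it is an AND of `y` arithmetically shifted right by `c2` with the shifted mask `c1`. All function names are hypothetical, and only the non-Zba form `(slli (srli (srai y, c2 - c3), c3 + c4), c4)` is modelled.

```cpp
#include <cstdint>

// Arithmetic shift right on a 64-bit pattern, written without relying on
// the behavior of >> for negative signed values (n must be below 64).
inline uint64_t srai(uint64_t x, unsigned n) {
  return (x >> 63) ? ~(~x >> n) : (x >> n);
}

// Mask with c3 leading zeros and c4 trailing zeros (requires c3 + c4 < 64).
inline uint64_t shiftedMask(unsigned c3, unsigned c4) {
  return (~uint64_t{0} >> (c3 + c4)) << c4;
}

// Assumed original pattern: and(sra(y, c2), c1).
inline uint64_t maskedShift(uint64_t y, unsigned c2, unsigned c3, unsigned c4) {
  return srai(y, c2) & shiftedMask(c3, c4);
}

// Sequence from the message: slli(srli(srai(y, c2 - c3), c3 + c4), c4).
// Requires c2 >= c3 so that the first shift amount is not negative.
inline uint64_t shiftSequence(uint64_t y, unsigned c2, unsigned c3, unsigned c4) {
  return (srai(y, c2 - c3) >> (c3 + c4)) << c4;
}
```

Under that assumption both functions select the same bits of `y`: the logical right shift clears the `c3` leading positions and the final left shift clears the `c4` trailing positions, so no explicit mask constant is needed.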

…ng (llvm#100821)

…::Delegate (llvm#105725) The main difference is that it's possible for multiple change observers to be installed at the same time whereas there can only be one MachineFunction delegate installed. This allows downstream targets to continue to use observers to recursively select. The target in question was selecting a gMIR instruction to a machine instruction plus some gMIR around it and relying on observers to ensure it correctly selected any gMIR it created before returning to the main loop.

the visit order depended on hashing because we iterated over a SmallPtrSet
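A common way to remove this kind of nondeterminism is to keep insertion order alongside the membership set, which is the idea behind containers such as LLVM's `SetVector`. The sketch below uses only the standard library; the class name `InsertionOrderedSet` is hypothetical, and it is not known from the message whether the actual fix took this approach.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical insertion-ordered set: membership checks go through a hash
// set, but iteration walks a vector, so the visit order depends only on
// the order of insertion and never on hash values or pointer addresses.
template <typename T> class InsertionOrderedSet {
  std::unordered_set<T> seen;
  std::vector<T> order;

public:
  // Returns true if the value was newly inserted.
  bool insert(const T &v) {
    if (!seen.insert(v).second)
      return false;
    order.push_back(v);
    return true;
  }

  // Elements in the order they were first inserted.
  const std::vector<T> &items() const { return order; }
};
```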

…tion We can add additional tests in the future, but this is an initial placeholder Inspired by llvm#105775

The comment talks about left-associative operators twice, when the latter mention is actually describing right-associative operators.

…05731) We don't need that name variable for contextual instrumentation, we just use the function to get its GUID which we pass to the runtime, and rely on metadata to capture it through the various optimization passes. This change removes the need for the name global variable.

Similar to what was already done for static initializers, we need to unlock the state mutex when calling out to libobjc to run +load methods in case they cause us to reenter the runtime, which was previously deadlocking. No test for now, because we don't have any code paths in llvm-jitlink itself that could lead to this deadlock. If we interpose calls to dlopen to go back to the JIT in the future then calling dlopen from a +load is the easiest way to reproduce this. rdar://133430490

…ture (llvm#97103) There are currently no diagnostics being emitted for when a resource is bound to a register with an incorrect binding type prefix. For example, a CBuffer type resource should be bound with a binding type prefix of 'b', but if instead the prefix is 'u', no errors will be emitted. This PR implements such diagnostics. The focus of this PR is to implement both the flag setting and diagnostic emission steps specified in the relevant spec: microsoft/hlsl-specs#230 The relevant issue is: llvm#57886 This is a continuation / refresh of this PR: llvm#87578

… scf.while/for. (llvm#105565)

…ared allocations for assumed shape/size descriptor types (llvm#97855) This PR aims to unify the map argument generation behavior across both the implicit capture (captured in a target region) and the explicit capture (process map). Currently the varPtr field of the MapInfo for the same variable will be different depending on how it's captured. This PR tries to align that across the generations of MapInfoOp in the OpenMP lowering. Currently, I have opted to utilise the rawInput (input memref to a HLFIR DeclareInfoOp) as opposed to the addr field, which includes more information. The side effect of this is that we have to deal with BoxTypes less often, which will result in simpler maps in these cases. The negative side effect of this is that we don't have access to the bounds information through the resulting value; however, I believe the bounds information we require in our case is still appropriately stored in the map bounds, and this seems to be the case from testing so far. The other fix is for cases where we end up with a BoxType argument into a function (certain assumed shape and size cases do this) that has no fir.ref wrapping it. As we need the Box to be a reference type to actually utilise the operation to access the base address stored inside and create the correct mappings, we currently generate an intermediate allocation in these cases, store into it, and utilise this as the map argument, as opposed to the original. However, as we were not sharing the same intermediate allocation across all of the maps for a variable, this resulted in errors in certain cases when detaching/attaching the data, e.g. via enter and exit. This PR adjusts this for such cases. Currently we only maintain tracking of all intermediate allocations for the current function scope, as opposed to module scope, primarily because the only case I am aware of where this is required is when we pass certain types of arguments to functions (so I opted to minimize the overhead of the pass for now). It could likely be extended to module scope if we find other cases where it's applicable and causing issues.

This reverts commit fd7904a.

…acktraces (llvm#104523)"" This reverts commit 547917a.

…"" This reverts commit aa70f83.

…tatement"" This reverts commit 7323e7e.

- Replace use of std::isalnum/ispunct with StringExtras version to avoid possibly locale-dependent behavior. - Remove `static` from printChar (so it's deduplicated when linking). - Use range-based for loops and structured bindings. - No need to use `llvm::` for code in the llvm namespace.

…lementation. (llvm#105566)

…inition(const EnumType*) (llvm#105556) This commit adds an assert to check for a non-null enum definition in CGDebugInfo::CreateTypeDefinition(const EnumType*), ensuring precondition validity. Previous discussion on llvm#97105

Allow a runtime build to disable SELECTED_REAL_KIND from returning kind 3 (16-bit truncated form of 32-bit IEEE-754 floating point, a/k/a "brain float" or bfloat16).

This is part of CHPE metadata containing a sorted list of x86_64 export thunks RVAs and RVAs of ARM64EC functions associated with them. It's stored in a dedicated .a64xrm section.

…5794)

…th empty mapping. (llvm#105793) Current folding of one-trip count loop does not kick in with an empty mapping. Enable this for empty mapping. Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>

If a resource is used multiple times, we should only have one resource record for it. This comes up most prominently with arrays of resources like so:

```hlsl
RWBuffer<float4> BufferArray[10] : register(u0, space4);
RWBuffer<float4> B1 = BufferArray[0];
RWBuffer<float4> B2 = BufferArray[SomeIndex];
RWBuffer<float4> B3 = BufferArray[3];
```

In this case, there's only one resource, but we'll generate 3 different `dx.handle.fromBinding` calls to access different slices. Note that this adds some API that won't be used until llvm#104447 later in the stack. Trying to avoid that results in unnecessary churn. Fixes llvm#105143 Pull Request: llvm#105602

This is part of CHPE metadata containing a sorted list of x86_64 export thunks RVAs and sizes.

…m#105821) The new warning flag is `-Winvalid-gnu-asm-cast`, which is enabled by default and is a downgradable diagnostic which defaults to an error. This language dialect flag only controls whether a single diagnostic is emitted as a warning or as an error, and has never been expanded to include other behaviors. Given the rather pejorative name, it's better for us to just expose a diagnostic flag for the one warning in question and let the user elect to do `-Wno-error=` if they need to. There's not a lot of use of the language dialect flag in the wild, but there is some use of it. For the time being, this aliases the -f flag to `-Wno-error=invalid-gnu-asm-cast`, but the -f flag can eventually be removed.

The `@llvm.dx.handle.fromBinding` intrinsic is lowered either to the `CreateHandle` op or a pair of `CreateHandleFromBinding` and `AnnotateHandle` ops, depending on the DXIL version. Regardless of the DXIL version we need to emit metadata about the binding, but that's left to a separate change. These DXIL ops all need to return the `%dx.types.Handle` type, but the llvm intrinsic returns a target extension type. To facilitate changing the type of the operation and all of its users, we introduce `%llvm.dx.cast.handle`, which can cast between the two handle representations. Pull Request: llvm#104251

…lvm#105873) The existing algorithm was performing the following comparisons for an `aaa,bbb,ccc,ddd`: aaa\0bbb,ccc,ddd == "stack" aaa\0bbb\0ccc,ddd == "stack" aaa\0bbb\0ccc\0ddd == "stack" Which wouldn't work. This commit just dispatches to a known algorithm implementation.
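For comparison, a correct check splits the list into separate tokens and compares each one on its own, instead of comparing a buffer that still contains the rest of the list. The sketch below is a minimal standalone version of that idea; it is not the implementation the commit dispatches to, and the names `splitOnComma` and `containsToken` are hypothetical.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Split a comma-separated list into views of its individual tokens.
inline std::vector<std::string_view> splitOnComma(std::string_view s) {
  std::vector<std::string_view> parts;
  while (true) {
    std::size_t pos = s.find(',');
    // substr(0, npos) yields the whole remaining string for the last token.
    parts.push_back(s.substr(0, pos));
    if (pos == std::string_view::npos)
      break;
    s.remove_prefix(pos + 1);
  }
  return parts;
}

// True if any token in the list is exactly equal to `token`.
inline bool containsToken(std::string_view list, std::string_view token) {
  for (std::string_view part : splitOnComma(list))
    if (part == token)
      return true;
  return false;
}
```

With this shape, `aaa,bbb,ccc,ddd` is compared as `aaa`, `bbb`, `ccc`, and `ddd`, so a token such as `stack` is found wherever it appears in the list.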

…pertiesAnalysis (llvm#104867) We need the dominator tree analysis for loop info analysis, which we need to get features like most nested loop and number of top level loops. Invalidating and recomputing these from scratch after each successful inlining can sometimes lead to lengthy compile times. We don't need to recompute from scratch, though, since we have some boundary information about where the changes to the CFG happen; moreover, for dom tree, the API supports incrementally updating the analysis result. This change addresses the dom tree part. The loop info is still recomputed from scratch. This does reduce the compile time quite significantly already, though (~5x in a specific case) The loop info change might be more involved and would follow in a subsequent PR.

…lvm#105749) It is better to do the replacement in the caller. This avoids the footgun if the caller needs the original operation. Instead return the produced operation and replacement values. Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>

…llvm#104668) This commit makes source/target/argument materializations (via the `TypeConverter` API) optional. By default (`ConversionConfig::buildMaterializations = true`), the dialect conversion infrastructure tries to legalize all unresolved materializations right after the main transformation process has succeeded. If at least one unresolved materialization fails to resolve, the dialect conversion fails. (With an error message such as `failed to legalize unresolved materialization ...`.) Automatic materializations through the `TypeConverter` API can now be deactivated. In that case, every unresolved materialization will show up as a `builtin.unrealized_conversion_cast` op in the output IR. There used to be a complex and error-prone analysis in the dialect conversion that predicted the future uses of unresolved materializations. Based on that logic, some casts (that were deemed unnecessary) were folded. This analysis was needed because folding happened at a point in time when some IR changes (e.g., op replacements) had not materialized yet. This commit removes that analysis. Any folding of cast ops now happens after all other IR changes have been materialized and the uses can directly be queried from the IR. This simplifies the analysis significantly. And certain helper data structures such as `inverseMapping` are no longer needed for the analysis. The folding itself is done by `reconcileUnrealizedCasts` (which also exists as a standalone pass). After casts have been folded, the remaining casts are materialized through the `TypeConverter`, as usual. This last step can be deactivated in the `ConversionConfig`. `ConversionConfig::buildMaterializations = false` can be used to debug error messages such as `failed to legalize unresolved materialization ...`. (It is also useful in case automatic materializations are not needed.) The materializations that failed to resolve can then be seen as `builtin.unrealized_conversion_cast` ops in the resulting IR. (This is better than running with `-debug`, because `-debug` shows IR where some IR changes have not been materialized yet.)

…04733) It is undefined behavior to lock or unlock an uninitialized lock, or to unlock a lock which isn't locked. Introduce a fixture to set up and tear down the locks where appropriate, and separate them into two tests (realtime death and non-realtime survival) so each test is guaranteed a fresh lock.

DefOrUseGUIDs is used only for membership checking purposes. We don't need std::set's strengths like iterators staying valid or the ability to traverse in a sorted order. While I am at it, this patch replaces count with contains for slightly increased readability.

…xt(x < y)` to `ucmp/scmp(x, y)` (llvm#105272) This patch expands already existing functionality to include these two additional folds, which are nearly identical to the ones already implemented. Proofs: https://alive2.llvm.org/ce/z/Xy7s4j

Add support for nan detection. llvm#100305

…operations. (llvm#105567)

…llvm#105889) The declaration in SPIRV.h had this returning a `MachineFunctionPass *`, but the implementation returned a `FunctionPass *`. This showed up as a build error on windows, but it was clearly a mistake regardless. I also updated the pass to include SPIRV.h rather than using its own declarations for pass initialization, as this results in better errors for this kind of typo. Fixes a build break after llvm#97558

…)" (llvm#105895) Reland llvm#104404. In addition to llvm#104404 it raises required verbosity for stack tracing on global registration. It confuses a symbolizer test on Darwin. This reverts commit 6a8f738.

…m#105892) This adds implementations for the two TilingInterface methods required for fusion to `tensor.pad`: `getIterationDomainTileFromResultTile` and `generateResultTileValue`, allowing fusion of pad with a tiled consumer.

An assert was left over after addressing feedback. In the process of fixing, realized the way I addressed the feedback was also incomplete.

This patch introduces type alias ModuleToSummariesForIndexTy. I'm planning to change the type slightly to allow heterogeneous lookup (that is, std::map<K, V, std::less<>>) in a subsequent patch. The problem is that changing the type affects many places. Using a type alias reduces the impact.
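As background for the planned follow-up, here is a minimal sketch of what heterogeneous lookup with `std::less<>` buys. The alias `SummaryMap`, the `int` value type, and the helper `lookupOr` are hypothetical placeholders, not the real `ModuleToSummariesForIndexTy` definition.

```cpp
#include <functional>
#include <map>
#include <string>
#include <string_view>

// Hypothetical alias: std::less<> is a transparent comparator, so find()
// accepts any key type that is comparable with std::string.
using SummaryMap = std::map<std::string, int, std::less<>>;

// Look up by string_view without constructing a temporary std::string.
inline int lookupOr(const SummaryMap &m, std::string_view key, int fallback) {
  auto it = m.find(key);
  return it == m.end() ? fallback : it->second;
}
```

With the default `std::less<std::string>` comparator, the same `find` call would not compile for a `std::string_view` argument, because `std::string_view` does not convert implicitly to `std::string`. Hiding the map type behind an alias keeps such a comparator change confined to one declaration.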

Reverts llvm#101531 Fails https://lab.llvm.org/buildbot/#/builders/66/builds/3051

…m#105844) In practice most of these expressions just resolve to the implicitly provided `operator new`, and the standard says it's not necessary to include `<new>` for that. Hence this results in a lot of churn in cases where inclusion of `<new>` doesn't matter, and might even be undesired by the developer. By switching to an ambiguous reference we try to find a middle ground here, ensuring that we don't drop providers of `operator new` when the developer explicitly listed them in the includes, while assuming it's the implicitly provided `operator new` and not inserting an include in other cases.

ccae7b4 improved handling for nested calls, but this resulted in a lot of changes near `new` expressions. This patch tries to restore previous behavior around new expressions, by treating them as simple functions, which seem to align with the concept. Fixes llvm#105133.

This patch implements sandboxir::CleanupReturnInst mirroring llvm::CleanupReturnInst.

This is a follow-up to address a suggestion from llvm#105619. The main goal of this change is to efficiently implement stable hash functions using the xxh3 64bits API. `stable_hash_combine_range` and `stable_hash_combine_array` functions are removed and consolidated into a more general `stable_hash_combine` function that takes an `ArrayRef<stable_hash>` as input.

Mistakenly used markdown style rather than rst in llvm#104499.

This depends on signed-ness.

…lvm#104404)"" (llvm#105926) Reverts llvm#105895 Still breaks the test https://green.lab.llvm.org/job/llvm.org/job/clang-stage1-RA/1864/

@vitalybuka

…cs (llvm#105709) From @vitalybuka's review on llvm#104889: - [x] remove unused variable in tests - [x] rename `post-decr-while` --> `unsigned-post-decr-while` - [x] split `add-overflow-test` into `add-unsigned-overflow-test` and `add-signed-overflow-test` - [x] be more clear about defaults within docs - [x] add table to docs Here's a screenshot of the rendered table so you don't have to build the html docs yourself to inspect the layout: ![image](https://github.com/user-attachments/assets/5d3497c4-5f5a-4579-b29b-96a0fd192faa) CCs: @vitalybuka --------- Signed-off-by: Justin Stitt <justinstitt@google.com> Co-authored-by: Vitaly Buka <vitalybuka@google.com>

Since this must be true, add an assertion instead of just documenting it via the comment.

…and has the `nuw` or `nsw` property. (llvm#105914) This patch updates the select operand when the cond has the nuw or nsw property. Considering the semantics of the nuw and nsw flag, if there is no poison value in this expression, this code assumes that X can only be 0, 1 or -1. close: llvm#96765 alive2: https://alive2.llvm.org/ce/z/3n3n2Q

The intent is that the tests should not be running on PowerPC as the fp128 type will differ. This attempts to fix the bots by using __powerpc__ instead, which appears to be defined in godbolt.

…ephole (llvm#105792) Currently we move the source down to where the vmv.v.v is to make sure that the new passthru dominates, but we do this even if it already does. This adds a simple local dominance check (taken from X86FastPreTileConfig.cpp) and avoids doing the move if it can. It also modifies the move to only move it to just past the passthru definition, and not all the way down to the vmv.v.v. This allows folding to succeed in some edge cases, which prevents regressions in an upcoming patch.

Only used for an assertion.

TLI might not be valid for all contexts that constant folding is performed. Add a quick guard that it is not null.

On macOS the dynamic loader prunes dyld specific environment variables such as `DYLD_INSERT_LIBRARIES`, `DYLD_LIBRARY_PATH`, etc. If these are set in the lit config it's safe to assume that the user actually wanted their subprocesses to run with these variables, versus the python interpreter that gets executed with them before they are pruned. This change exports all known variables in the shell script instead of relying on them being passed through.

Followup to llvm#90109. In Microsoft, our automated scans are warning that LLVM has vulnerable dependencies. Specifically: * [CVE-2024-35195](https://nvd.nist.gov/vuln/detail/CVE-2024-35195) was fixed in `requests` 2.32.0. * [CVE-2024-37891](https://nvd.nist.gov/vuln/detail/CVE-2024-37891) was fixed in `urllib3` 2.2.2. I've updated LLVM's dependencies by running the following commands in `llvm/utils/git`:

```
pip-compile --upgrade --generate-hashes --output-file=requirements.txt requirements.txt.in
pip-compile --upgrade --generate-hashes --output-file=requirements_formatting.txt requirements_formatting.txt.in
```

Note that for `requirements_formatting.txt` this adds `--generate-hashes` (according to my vague understanding, it's highly desirable and was already used for `requirements.txt`) and was locally run within `llvm/utils/git` (changing the recorded command, which apparently was originally run from the repo root - again, `requirements.txt` was already being regenerated with a locally run command, so this increases consistency). I observe that this has updated the relevant components to pick up the CVE fixes. Note that I am largely clueless in this area, so I hope that (like llvm#90109) no other changes will be necessary.

Followup to llvm#99570. * `TEST_COMPILER_MSVC` must be tested for `defined`ness, as it is everywhere else. + Definition: https://github.com/llvm/llvm-project/blob/52a7116f5c6ada234f47f7794aaf501a3692b997/libcxx/test/support/test_macros.h#L71-L72 + Example usage: https://github.com/llvm/llvm-project/blob/52a7116f5c6ada234f47f7794aaf501a3692b997/libcxx/test/std/utilities/function.objects/func.not_fn/not_fn.pass.cpp#L248 + Fixes: `llvm-project\libcxx\test\support\atomic_helpers.h(33): fatal error C1017: invalid integer constant expression` * Fix bogus return type: `msvc_is_lock_free_macro_value()` returns `2` or `0`, so it needs to return `int`. + Fixes: `llvm-project\libcxx\test\support\atomic_helpers.h(41): warning C4305: 'return': truncation from 'int' to 'bool'` * Clarity improvement: also add parens when mixing bitwise with arithmetic operators.

Fix bug introduced in llvm#105730.

The bug is in how the batch RAUW is implemented. If we have

```
%0 = mov %src
%1 = mov %0
use %0
use %1
```

the use of `%1` is rewritten to `%0`, not `%src`. This PR just looks for a replacement when it maps to the src register, which should transitively propagate the replacements.
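The transitive propagation described above can be sketched in isolation. This is a minimal illustration, not the code from the PR: `ReplacementMap`, `resolve`, and `recordCopy` are hypothetical names, and registers are modeled as plain strings rather than LLVM register objects.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical replacement table: copy destination -> register to use instead.
using ReplacementMap = std::unordered_map<std::string, std::string>;

// Follow replacements until reaching a register that has none.
// Assumes the chain is acyclic, as it is for a chain of copies.
std::string resolve(const ReplacementMap &repl, std::string reg) {
  for (auto it = repl.find(reg); it != repl.end(); it = repl.find(reg))
    reg = it->second;
  return reg;
}

// Record "dst = mov src". If src already has a replacement, map dst to
// that final replacement so a single lookup is enough when rewriting uses.
void recordCopy(ReplacementMap &repl, const std::string &dst,
                const std::string &src) {
  repl[dst] = resolve(repl, src);
}
```

With the two copies from the example recorded in order, `%1` maps directly to `%src`, so a batch rewrite that performs one lookup per use still lands on the original source.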

…tions (llvm#105840) This is a follow up to llvm#105455 which updates the VPIntrinsic mappings for the fadd and fmul cases, and supports both ordered and unordered reductions. This allows the use of a single wider operation with a restricted EVL instead of padding the vector with the neutral element. This has all the same tradeoffs as the previous patch.
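To illustrate why a restricted EVL can stand in for padding, here is a scalar model of an ordered fadd reduction. It is only a sketch under assumptions: the function names are hypothetical, lanes are modeled as a `std::vector<double>`, and `-0.0` is used as the fadd neutral element (adding `-0.0` leaves any value unchanged, including signed zeros).

```cpp
#include <cstddef>
#include <vector>

// Ordered fadd reduction that only reads the first `evl` lanes of a wide
// vector; lanes at index >= evl are ignored, whatever they contain.
double reduceFAddEVL(double start, const std::vector<double> &wide,
                     std::size_t evl) {
  double acc = start;
  for (std::size_t i = 0; i < evl && i < wide.size(); ++i)
    acc += wide[i];
  return acc;
}

// Ordered fadd reduction over a narrow vector padded out to `width` lanes
// with the neutral element -0.0. Assumes width >= narrow.size().
double reduceFAddPadded(double start, const std::vector<double> &narrow,
                        std::size_t width) {
  std::vector<double> padded(narrow);
  padded.resize(width, -0.0);
  double acc = start;
  for (double x : padded)
    acc += x;
  return acc;
}
```

Both functions produce the same result for the same active lanes; the EVL form simply never touches the inactive ones, so no padding values have to be materialized.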

… mangling for MSVC 1920+ / VS2019+ (llvm#104722) Reapply llvm#102848. The description in this PR will detail the changes from the reverted original PR above.

For `auto&&` return types that can partake in reference collapsing we weren't properly handling the mangling that can arise. When collapsing occurs an inner reference is created with the collapsed reference type. If we return `int&` from such a function then an inner reference of `int&` is created within the `auto&&` return type. `getPointeeType` on a reference type goes through all inner references before returning the pointee type, which ends up being a builtin type, `int`, which is unexpected. We can use `getPointeeTypeAsWritten` to get the `AutoType` as expected; however, for the instantiated template declaration reference collapsing already occurred on the return type. This means `auto&&` is turned into `auto&` in our example above. We end up mangling an lvalue reference type. This is unintended as MSVC mangles on the declaration of the return type, `auto&&` in this case, which is treated as an rvalue reference.

```
template<class T>
auto&& AutoReferenceCollapseT(int& x) { return static_cast<int&>(x); }

void test() {
    int x = 1;
    auto&& rref = AutoReferenceCollapseT<void>(x); // "??$AutoReferenceCollapseT@X@@YA$$QEA_PAEAH@Z"
    // Mangled as an rvalue reference to auto
}
```

If we are mangling a template with a placeholder return type we want to get the first template declaration and use its return type to do the mangling of any instantiations. This fixes the bug reported in the original PR that caused the revert with libcxx `std::variant`. I also tested locally with libcxx and the following test code, which fails in the original PR but now works in this PR.

```
#include <variant>

void test() {
    std::variant<int> v{ 1 };
    int& r = std::get<0>(v);
    (void)r;
}
```
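The reference collapsing that the description relies on can be checked at compile time with standard C++14 alone. The sketch below reuses the function from the example and only demonstrates the language rule (the declared `auto&&` becomes `int&` after deduction); it says nothing about how Clang or MSVC mangle the name.

```cpp
#include <type_traits>
#include <utility>

template <class T>
auto&& AutoReferenceCollapseT(int& x) { return static_cast<int&>(x); }

// The declared return type is auto&&, but auto deduces to int& from the
// lvalue return expression, and int& && collapses to int&.
static_assert(
    std::is_same<decltype(AutoReferenceCollapseT<void>(std::declval<int&>())),
                 int&>::value,
    "instantiated return type is an lvalue reference");
```

This is the mismatch the patch has to work around: the instantiation carries `int&`, while the declaration that MSVC mangles from still says `auto&&`.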

) Currently, the process of replacing bitwise operations consisting of `LSR`/`LSL` with `AND` is performed by `DAGCombiner`. However, in certain cases, the `AND` generated by this process can be removed. Consider the following case:

```
lsr x8, x8, #56
and x8, x8, #0xfc
ldr w0, [x2, x8]
ret
```

In this case, we can remove the `AND` by changing the target of `LDR` to `[X2, X8, LSL #2]` and changing the right-shift amount from 56 to 58. After the change:

```
lsr x8, x8, #58
ldr w0, [x2, x8, lsl #2]
ret
```

This patch checks whether the `SHIFTING` + `AND` operation on the load target can be optimized, and optimizes it if it can.
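The two instruction sequences compute the same load offset, which can be confirmed with plain 64-bit integer arithmetic. The helper names below are hypothetical and only model the offset computation, not the load itself.

```cpp
#include <cstdint>

// Offset before the change: lsr #56 followed by and #0xfc.
constexpr std::uint64_t offsetBefore(std::uint64_t x) {
  return (x >> 56) & 0xfc;
}

// Offset after the change: lsr #58, then the lsl #2 scaling that the
// load's addressing mode applies to the index register.
constexpr std::uint64_t offsetAfter(std::uint64_t x) {
  return (x >> 58) << 2;
}

static_assert(offsetBefore(0xFFFFFFFFFFFFFFFFull) ==
                  offsetAfter(0xFFFFFFFFFFFFFFFFull),
              "all-ones input gives the same offset");
static_assert(offsetBefore(0x1234567890ABCDEFull) ==
                  offsetAfter(0x1234567890ABCDEFull),
              "arbitrary input gives the same offset");
```

The mask `0xfc` clears the low two bits of the 8-bit shifted value, which is exactly what shifting right by two more bits and then left by two does.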

v16i8 VECTOR_REG_CAST (v16i8 Op) can use v16i8 Op directly, as the VECTOR_REG_CAST is a noop.

We assign I->getNumOperands() to J and immediately print that out as a debug message. We don't need to keep J across iterations.

…uble (llvm#104929)" ConstantFolding behaves differently depending on host's `HAS_IEE754_FLOAT128`. LLVM should not change the behavior depending on host configurations. This reverts commit 14c7e4a. (llvmorg-20-init-3262-g14c7e4a18449 and llvmorg-20-init-3498-g001e423ac626)

…llvm#105921) Fixes llvm#105880.

…lvm#105941) Fixes llvm#105877.

…105973)

The VCIX ISD opcodes are currently placed after `FIRST_TARGET_STRICTFP_OPCODE`, which is not expected; they should be in the normal opcode area.

…iveIn/removeLiveIn. NFC We already used it for addLiveIn.

If we fail to initialize the ASTContext builtins, LLDB may crash in non-obvious ways down the line, e.g., when it tries to call `ASTContext::getTypeSize` on a builtin like `ast.UnsignedCharTy`, which would dereference a `null` `QualType`. The initialization can fail if we either didn't set the `TypeSystemClang` target triple, or if the embedded clang isn't enabled for a certain target. This patch attempts to help pinpoint the failure case post-mortem by adding a log message here that prints the triple. rdar://134260837

) This reverts commit 1f89cd4.


Commits on Aug 22, 2024

Commits on Aug 23, 2024

Commits on Aug 24, 2024

Commits on Aug 25, 2024

Commits on Sep 20, 2024
