[AutoBump] Merge with 23f8fac7 (May 14) (44) #303

…tCFInstrCost implementations. We were using the default implementations instead of the CRTP versions.

…r each i1 mask element These can nearly always be folded into the existing cost of the branch, which brings the throughput costs of the scalarised gather/scatter code much closer to the llvm-mca/uica estimates

…1581)

Being able to add custom dialects is one of the big missing pieces of the C API. This change should make it achievable via IRDL. Hopefully this should open custom dialect definition to non-C++ users of MLIR.

Previously, isRoot() would return true for pointers with a base of sizeof(InlineDescriptor), even if the actual metadata size of the pointee was 0.

…lvm#91844) I'm planning to remove StringRef::equals in favor of StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of 24 under clang/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for !Long.Expression.equals("str") vs Long.Expression != "str".
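
As a rough illustration of the readability argument, the sketch below uses the standard `std::string_view` rather than `llvm::StringRef`. The helper names `isFoo` and `isNotFoo` are made up for this example and do not come from the patch.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helpers, for illustration only.
// std::string_view has operator== but no equals() member, so a
// comparison reads like an ordinary value comparison.
bool isFoo(std::string_view S) { return S == "foo"; }

// The negated form avoids the `!Long.Expression.equals("str")` shape.
bool isNotFoo(std::string_view S) { return S != "foo"; }
```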

…trieve source location in `ConvertConstructorToDeductionGuideTransform`. The commit fec4716 was reverted by accident in 7415524. Reland it with a testcase.

…ypes (llvm#87716)

…llvm#91738) There is a follow-up commit for llvm#90319. The Windows test was disabled in that commit, but it should pass on this operating system. Therefore, it would be beneficial to have it enabled for MS Windows.

This effectively reverts 5cd2804 and changes to QualifierFixerTest.cpp from e62ce1f. Failed buildbots: https://lab.llvm.org/buildbot/#/builders/236/builds/11223 https://lab.llvm.org/buildbot/#/builders/239/builds/6968

Avoid using bitfield in dxbc::ProgramHeader. It could potentially be read incorrectly on any host depending on the compiler. From [C++17's [class.bit]](https://timsong-cpp.github.io/cppwp/n4659/class.bit#1) > Bit-fields are packed into some addressable allocation unit. [ Note: Bit-fields straddle allocation units on some machines and not on others. Bit-fields are assigned right-to-left on some machines, left-to-right on others. — end note ] For llvm#91793
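
A minimal sketch of the general technique, not the actual dxbc::ProgramHeader change: keep the packed value in a plain byte and extract fields with shifts and masks, so the layout no longer depends on how a compiler allocates bit-fields. The type `VersionByte`, its accessors, and the choice of the high nibble for the major version are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical packed byte holding two 4-bit fields.
// Assumption for this sketch: the major version lives in the high nibble.
struct VersionByte {
  uint8_t Raw;

  static VersionByte make(uint8_t Major, uint8_t Minor) {
    return {static_cast<uint8_t>(((Major & 0xF) << 4) | (Minor & 0xF))};
  }
  // Explicit shifts and masks give the same result on every host.
  uint8_t getMajor() const { return static_cast<uint8_t>(Raw >> 4); }
  uint8_t getMinor() const { return static_cast<uint8_t>(Raw & 0xF); }
};
```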

…nters"" This reverts commit fb1c2db.

Fix race condition in internal NFC test.

PR llvm#87090 amended `accumulateBitfields` to do the correct clipping. The scissor is no longer necessary and `checkBitfieldClipping` can compute its location directly when needed.

…OpCost directly The generic getCommonMaskedMemoryOpCost now gives the same cost estimates for scalarized gather/scatter.

getGSVectorCost has supported other TargetCostKind since a551272

…#91807) The pass runs a `DataFlowSolver` and collects state information on the input IR. Then, the rewrite driver and folding are applied. During pattern application and folding it can happen that an Op from the input IR is deleted and a new Op is created at the same address. When the newly created Op is looked up in the `DataFlowSolver` state memory, the state of the original Op is returned. This patch adds a method to `DataFlowSolver` which removes all state related to a `ProgramPoint`. It also adds a listener to the Pass which clears the state information of deleted Ops from the `DataFlowSolver`. Fix llvm#81228

…ggregate initialization using a default member initializer (llvm#87933) This PR completes [DR1815](https://wg21.link/CWG1815) under the guidance of `FIXME` comments and reuses the `CXXDefaultInitExpr` rewrite machinery to clone the initializer expression on each use that would lifetime extend its temporaries. --------- Signed-off-by: yronglin <yronglin777@gmail.com>

) This is probably the most involved addition, as it tries to make use of isTriviallyVectorizable with isVectorIntrinsicWithScalarOpAtArg to handle a number of different intrinsics that are all lane-wise. Additional tests have been added for some of the different intrinsics from isVectorIntrinsicWithScalarOpAtArg / isVectorIntrinsicWithOverloadTypeAtArg.

… YAML Fix an issue where the profile for all branches that have a BRANCHENTRY is dropped. If the branch has an entry in BAT, it will be translated to its input offset. We used to only permit the basic block offset as a branch source. Perform a lookup of containing basic block instead. Test Plan: Updated bolt-address-translation-yaml.test Reviewers: maksfb, dcci, rafaelauler, ayermolo Reviewed By: maksfb Pull Request: llvm#91273

…91846) This is how MSVC handles it. https://godbolt.org/z/fG386bjnf

This reverts commit 0869204, which caused a buildbot failure: https://lab.llvm.org/buildbot/#/builders/5/builds/43322

…st (llvm#89170) This patch makes the following changes: 1. Support ISD FDIV/UDIV/SDIV/UREM/SREM 2. Classify instructions which cost the same

Fix the following buildbot failures by making LangOpts in the unit test static: https://lab.llvm.org/buildbot/#/builders/236/builds/11223 https://lab.llvm.org/buildbot/#/builders/239/builds/6968

…m#90995) GVNSink used to order instructions based on their pointer values and was prone to non-determinism because of that. This patch ensures all the values stored are using a deterministic order. I have also added a verifier (`ModelledPHI::verifyModelledPHI`) to assert when ordering isn't preserved. Additionally, I have added a test case (mirror graph image of an existing test) that would have failed before this patch. Fixes: llvm#77852

…cInstrCost (llvm#89170)" This reverts commit ed16e7a.

…icInstrCost (llvm#89170) Insert a break to fix the implicit-fallthrough caught by sanitizer. Original commit message: This patch makes the following changes: 1. Support ISD FDIV/UDIV/SDIV/UREM/SREM 2. Classify instructions which cost the same

…#91190) Follow up patch to llvm#89217, before we make changes to atomic optimizer.

"Value" is still used afterwards in the return value. In this case, this doesn't actually make a difference because a move for a primitive type is the same as a copy, so there is no actual misbehavior. Still drop the std::move to make the code less confusing.
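
The point about primitives can be sketched as follows. The function `moveThenRead` is a made-up example and is not the code touched by the patch.

```cpp
#include <cassert>
#include <utility>

// Hypothetical example: for a primitive type, std::move is only a cast and
// the "move" is an ordinary copy, so the source keeps its value.
int moveThenRead(int Value) {
  int Copy = std::move(Value); // behaves exactly like `int Copy = Value;`
  return Copy + Value;         // `Value` is still intact here
}
```

The code is well-defined, but the `std::move` suggests `Value` is dead afterwards, which is why dropping it reads more clearly.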

…1464) This commit fixes Mem2Reg's multi-slot allocator handling and extends the test dialect to test this. Additionally, this modifies Mem2Reg's API to always attempt a full promotion on all the passed in "allocators". This ensures that the pass does not require unnecessary walks over the regions and improves caching benefits.

This patch fixes: llvm/lib/Transforms/Scalar/GVNSink.cpp:270:33: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture] While I am at it, this patch replaces llvm::for_each with a range-based for loop.

In ASTBitCodes.h, there are two type aliases for the ID type of Identifiers with the same underlying type, which is confusing. This patch merges `IdentID` into `IdentifierID` to remove that confusion.

) I'm planning to remove StringRef::equals in favor of StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of 276 under llvm-project/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for !Long.Expression.equals("str") vs Long.Expression != "str".

…#91712)" This reverts commits e62ce1f, 5cd2804, and de641e2 due to buildbot failures.

See the following case:
```
define i32 @SRC1(i32 %x) {
  %dec = sub nuw i32 -2, %x
  %ctlz = tail call i32 @llvm.ctlz.i32(i32 %dec, i1 false)
  %sub = sub nsw i32 32, %ctlz
  %shl = shl i32 1, %sub
  %ugt = icmp ult i32 %x, -2
  %sel = select i1 %ugt, i32 %shl, i32 1
  ret i32 %sel
}
define i32 @tgt1(i32 %x) {
  %dec = sub nuw i32 -2, %x
  %ctlz = tail call i32 @llvm.ctlz.i32(i32 %dec, i1 false)
  %sub = sub nsw i32 32, %ctlz
  %and = and i32 %sub, 31
  %shl = shl nuw i32 1, %and
  ret i32 %shl
}
```
`nuw` in `%dec` should be dropped after the select instruction is eliminated.
Alive2: https://alive2.llvm.org/ce/z/7S9529
Fixes llvm#91691.

…1851) This PR fixes the warning caused by the non-ISO-standard use of `__FUNCTION__`:
```
/home/lewuathe/llvm-project/mlir/test/CAPI/transform_interpreter.c: In function ‘testApplyNamedSequence’:
/home/lewuathe/llvm-project/mlir/test/CAPI/transform_interpreter.c:21:27: warning: ISO C does not support ‘__FUNCTION__’ predefined identifier [-Wpedantic]
   21 |   fprintf(stderr, "%s\n", __FUNCTION__);
      |
```
`__FUNCTION__` is another name for `__func__`, and `__func__` conforms to the specification, so we should be able to use `__func__` here.
Ref: https://stackoverflow.com/questions/52962812/how-to-silence-gcc-pedantic-wpedantic-warning-regarding-function
Compiler:
```
Ubuntu clang version 18.1.3 (1)
Target: x86_64-pc-linux-gnu
```

Fix an integer value in the prose to match the rest of the content.

Pretty much all logic that we have today for lowering vector.transpose assumes fixed-length vectors (it is done via vector.shuffle, which doesn't support scalable vectors). This patch updates related tests and patterns to capture and document this limitation more explicitly. Note that `vector.transpose` is a valid operation in the context of scalable vectors, but we are yet to implement the missing lowerings. Summary of changes:
* `@transpose_nx8x2xf32` is renamed as `@transpose_scalable` and moved near other tests using `lowering_strategy = "shuffle_1d"` (to avoid duplicating TD sequences)
* tests specific to X86 (`avx2_lowering_strategy = true`) are moved to a dedicated file (to separate generic tests from target-specific tests)
* `@transpose10_nx4xnx1xf32` duplicated `@transpose10_4xnx1xf32` and was deleted (the latter is renamed as `@transpose10_4x1xf32_scalable` to match its fixed-width counterpart: `@transpose10_4x1xf32`)

llvm#90961) In the clang AST, constraint nodes are deliberately not instantiated unless they are actively being evaluated. Consequently, occurrences of template parameters in the require-clause expression have a subtle "depth" difference compared to normal occurrences in places, such as function parameters. When transforming the require-clause, we must take this distinction into account. The existing implementation overlooks this consideration. This patch is to rewrite the implementation of the require-clause transformation to address this issue. Fixes llvm#90177

This helps it produce a single instruction for the saturate, as opposed to having to scalarize.

This commit creates an expansion pattern to lower math.rsqrt(x) into fdiv(1, sqrt(x)).

…#91893) Added a check for an unexpanded parameter pack in the attribute [[assume]]. Tested it with expected-error statements from the clang frontend. This fixes llvm#91232.

…. NFC Should hopefully help with llvm#91854

…TableGen (llvm#89932)

Hotfix for "[mlir][math] lower rsqrt to sqrt + fdiv (llvm#91344)"

…or RTTI (llvm#91466) rdar://127732562

…lvm#87994) Before this patch, the value of DW_AT_bit_offset, used for bitfields before DWARF version 4, was always emitted as an unsigned integer using the form DW_FORM_data<n>. If the value was originally a signed integer, for instance in the case of negative offsets, it was up to debug information consumers to re-cast it to a signed integer.
This is problematic since the burden of deciding if the value should be read as signed or unsigned was put onto the debug info consumers: the DWARF specification doesn't define DW_AT_bit_offset's underlying type. If a debugger decided to interpret this attribute in the form data<n> as unsigned, then negative offsets would be completely broken.
The DWARF specification version 3 mentions in the Data Representation section, page 127:
> If one of the DW_FORM_data<n> forms is used to represent a signed or unsigned integer, it can be hard for a consumer to discover the context necessary to determine which interpretation is intended. Producers are therefore strongly encouraged to use DW_FORM_sdata or DW_FORM_udata for signed and unsigned integers respectively, rather than DW_FORM_data<n>.
Therefore, the proposal is to use DW_FORM_sdata, which is explicitly signed. This is an indication to consumers that the offset must be parsed unambiguously as a signed integer.
Finally, gcc already uses DW_FORM_sdata for negative offsets, fixing the potential ambiguity altogether. This patch mimics gcc's behaviour by emitting negative values of DW_AT_bit_offset using the DW_FORM_sdata form. This eliminates any potential misinterpretation.
One could argue that all values should use DW_FORM_sdata, but for the sake of parity with gcc, it is safe to restrict the change to negative values.
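
To make the ambiguity concrete, here is a small standalone sketch. It is not LLVM's emitter: `encodeSLEB128` below is a textbook signed LEB128 encoder (the encoding DW_FORM_sdata uses), and the two `readData1...` helpers are hypothetical stand-ins for a consumer interpreting a one-byte DW_FORM_data1 value.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Signed LEB128: 7 payload bits per byte, high bit set on all but the last.
std::vector<uint8_t> encodeSLEB128(int64_t Value) {
  std::vector<uint8_t> Out;
  bool More = true;
  while (More) {
    uint8_t Byte = static_cast<uint8_t>(Value & 0x7f);
    Value >>= 7; // arithmetic shift (guaranteed since C++20, universal in practice)
    bool SignBitSet = (Byte & 0x40) != 0;
    // Stop once the remaining bits are pure sign extension of this byte.
    if ((Value == 0 && !SignBitSet) || (Value == -1 && SignBitSet))
      More = false;
    else
      Byte |= 0x80;
    Out.push_back(Byte);
  }
  return Out;
}

// A fixed-size data1 byte carries no signedness; the consumer has to guess.
int64_t readData1AsSigned(uint8_t B) { return static_cast<int8_t>(B); }
int64_t readData1AsUnsigned(uint8_t B) { return B; }
```

With a data1 byte of 0xFB, one consumer sees -5 and another sees 251, whereas the sdata encoding of -5 decodes only one way.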

The C locale on AIX uses `ISO-8859-1`, where `0xFB` is a valid character. Widening char(-5) succeeds and produces L'\u00fb' the same as on macOS, FreeBSD, and Windows. This patch removes `XFAIL: LIBCXX-AIX-FIXME` and uses the macOS, FreeBSD, and WIN32 code path for AIX.

The `grouping` string for locale `en_US.UTF-8` and `fr_FR.UTF-8` on AIX is `3`. This is different from Linux's `3;3` but is the same as Windows. This patch removes `XFAIL: LIBCXX-AIX-FIXME` and changes to use the `WIN32` code path.

Based on discussion from https://discourse.llvm.org/t/rfc-vectorization-support-for-histogram-count-operations/74788
Current interface is: llvm.experimental.histogram(<vecty> ptrs, <intty> inc_amount, <vecty> mask)
The integer type used by 'inc_amount' needs to match the type of the buckets in memory.
The intrinsic covers the following operations:
* Gather load
* histogram on the elements of 'ptrs'
* multiply the histogram results by 'inc_amount'
* add the result of the multiply to the values loaded by the gather
* scatter store the results of the add
Supports lowering to histcnt instructions for AArch64 targets, and scalarization for all others at present.
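
The net effect of the listed steps can be modelled by a scalar loop. This is a reference sketch under stated assumptions, not the intrinsic's lowering: bucket indices stand in for the pointer vector, `uint32_t` is an arbitrary bucket type, and the function name `histogramAdd` is invented here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model: every active lane adds IncAmount to the bucket it points at.
// Lanes that hit the same bucket accumulate, which is what a plain
// gather / add / scatter sequence would get wrong.
void histogramAdd(std::vector<uint32_t> &Buckets,
                  const std::vector<std::size_t> &Indices, // stand-in for 'ptrs'
                  uint32_t IncAmount,
                  const std::vector<bool> &Mask) {
  for (std::size_t Lane = 0; Lane < Indices.size(); ++Lane)
    if (Mask[Lane])
      Buckets[Indices[Lane]] += IncAmount;
}
```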

llvm-project/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:27379:19: error: unused variable 'CID' [-Werror,-Wunused-variable] ConstantSDNode *CID = cast<ConstantSDNode>(IntID.getNode()); ^ 1 error generated.

This reverts commit b903bad. TestInterruptBacktrace was broken on AArch64/Windows as a result of this change. see lldb-aarch64-windows buildbot here: https://lab.llvm.org/buildbot/#/builders/219/builds/11261

…nction (llvm#91321)" This reverts commit fd1bd53. TestInterruptBacktrace was broken on AArch64/Windows as a result of this change. See lldb-aarch64-windows buildbot here: https://lab.llvm.org/buildbot/#/builders/219/builds/11261

This patch addresses llvm#90034 (comment).

…ecker` (llvm#91119) Resolves llvm#89264 Values should not be stored in addresses of labels; this throws a fatal error when that happens. --------- Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>

…lvm#91646)

This avoids spurious test changes in a future commit.

Fix llvm#91814 When instructions are extracted into a new function the `DIAssignID` metadata uses and attachments need to be remapped so that the stores and assignment markers don't link to stores and assignment markers in the original function. This matches existing inlining behaviour for DIAssignIDs.

This is a follow-up fix on top of 9a7262c. It fixes delayed-definition-die-searching.test to use -gdwarf, which is required to explicitly select DWARF instead of PDB on Windows. Fixes LLDB build lldb-aarch64-windows: https://lab.llvm.org/buildbot/#/builders/219/builds/11303

…m#91915) The setting `platform.module-cache-directory` is a local path on the host. It cannot be set to a working directory from the remote target. This test failed in case of Windows host and Linux target because of the incompatible path. Use the local build dir instead.

os.path.join() uses the path separator of the host OS by default. outfile_arg will be incorrect in case of Windows host and Linux target. Use lldbutil.append_to_process_working_directory() instead.

We called getIConstantVRegVal which again queried MRI to get the VReg def. We already have the def, so just get the CImm directly. It can't fail.

…#91945) The argument order to MatchBinaryAddToConst doesn't match the comment and also is counter-intuitive (passing RHS before LHS, C2 before C1). This patch adjusts the order to be in line with the calls above, which should be equivalent, but more natural: https://alive2.llvm.org/ce/z/ZWGp-Z PR: llvm#91945

llvm::sort requires the comparator to return `false` for equal elements, otherwise it triggers `Your comparator is not a valid strict-weak ordering` assert.
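
A small sketch of the requirement, using `std::sort` instead of `llvm::sort` so it stays self-contained. The function names are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Valid comparator: `<` returns false when both sides are equal.
bool lessThan(int A, int B) { return A < B; }

// Not a strict weak ordering: `<=` returns true for equal elements,
// so it must not be passed to a sort routine.
bool lessOrEqual(int A, int B) { return A <= B; }

std::vector<int> sortedCopy(std::vector<int> V) {
  std::sort(V.begin(), V.end(), lessThan);
  return V;
}
```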

…lvm#90603) The clang-query tool has the ability to execute or pre-load queries from a file when the tool is launched, but doesn't have the ability to do the same from the interactive REPL prompt. Because the prompt also doesn't seem to allow multi-line matchers, this can make prototyping and iterating on more complicated matchers difficult. Supporting a dynamic load at REPL time allows the cost of reading the compilation database and building the AST to be imposed just once, and allows faster prototyping.

One of the constraints of an AST is that every node object must appear at most once, hence we define lambdas that create a new AST node at every use.

…tKinds Add TypeConversionCostKindTblEntry to hold the cost kinds and update the cast tables to take the existing default codesize/latency/sizelatency values (I'll update these values in future commits). I've moved AdjustCost to the end of the function to ensure we don't accidentally use it, apart from when we fall back to default cost calculations.

…lvm#90098) A compound construct with a list of clauses is broken up into individual leaf/composite constructs. Each such construct has the list of clauses that apply to it based on the OpenMP spec. Each lowering function (i.e. a function that generates MLIR ops) is now responsible for generating its body as described below. Functions that receive AST nodes extract the construct, and the clauses from the node. They then create a work queue consisting of individual constructs, and invoke a common dispatch function to process (lower) the queue. The dispatch function examines the current position in the queue, and invokes the appropriate lowering function. Each lowering function receives the queue as well, and once it needs to generate its body, it either invokes the dispatch function on the rest of the queue (if any), or processes nested evaluations if the work queue is at the end.
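
The queue-and-dispatch structure described above might look roughly like the following. This is a toy model, not flang's code: constructs are plain strings, "lowering" appends text to a string, and `ConstructQueue`, `lowerConstruct`, and `dispatch` are names invented for the sketch. A real dispatch function would select among several lowering functions based on the construct kind.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using ConstructQueue = std::vector<std::string>; // hypothetical: leaf construct names

void dispatch(const ConstructQueue &Queue, std::size_t Pos, std::string &Out);

// Stand-in for a lowering function: emits its own "op", then its body.
void lowerConstruct(const ConstructQueue &Queue, std::size_t Pos,
                    std::string &Out) {
  Out += Queue[Pos] + "{";
  if (Pos + 1 < Queue.size())
    dispatch(Queue, Pos + 1, Out); // the rest of the queue becomes the body
  else
    Out += "body";                 // end of queue: process nested evaluations
  Out += "}";
}

// Examines the current queue position and invokes a lowering function.
void dispatch(const ConstructQueue &Queue, std::size_t Pos, std::string &Out) {
  lowerConstruct(Queue, Pos, Out);
}
```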

…ons (llvm#90249) Closes llvm#89443 I added the two missing functions and respective test cases. Let me know if anything needs changing.

)

…r functions" (llvm#91966) Reverts llvm#90249 Fullbuild is broken: https://lab.llvm.org/buildbot/#/builders/163/builds/56501

…owering (llvm#90098)" It breaks some builds, e.g. https://lab.llvm.org/buildbot/#/builders/268/builds/13909 This reverts commit ca1bd59.

…arget (llvm#91923) Transfer `stdio.log` from the remote target if necessary.

…se in `omp.task` correctly (llvm#90891) This patch fixes the code generation of the if clause, specifically when the condition evaluates to false and when the task directive has the depend clause on it. When the if clause of a task construct evaluates to false, then the task is an undeferred task. This undeferred task still has to honor dependencies. Previously, the OpenMPIRBuilder didn't honor dependencies. This patch fixes that. Fixes llvm#90869

The current comparator doesn't work correctly when two identical entries with -1 are compared. The comparator returns `first` in the case when `aIndex == -1 && bIndex == -1`, but it should `continue`, as those indexes are the same.
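
The shape of the fix can be sketched with an element-wise comparison of index lists. The surrounding code is not shown in the message, so everything here is assumed: the function name `indexListLess`, the use of `std::vector<int>`, and the choice that -1 sorts after every real index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ordering: compare element by element; -1 means "no index"
// and is assumed to sort after every real index.
bool indexListLess(const std::vector<int> &A, const std::vector<int> &B) {
  for (std::size_t I = 0; I < A.size() && I < B.size(); ++I) {
    int AIndex = A[I], BIndex = B[I];
    if (AIndex == BIndex)
      continue; // includes the (-1, -1) case: equal entries decide nothing
    if (AIndex == -1)
      return false;
    if (BIndex == -1)
      return true;
    return AIndex < BIndex;
  }
  return A.size() < B.size();
}
```

Because equal entries are skipped rather than resolved in favor of one side, comparing a list with itself returns false, as a strict weak ordering requires.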

…text (llvm#91939) We did not set the correct evaluation context for the compound statement of an ``if consteval`` statement in a templated entity in TreeTransform. Fixes llvm#91509

Allow non-constants in the `sizes` clause such as
```
#pragma omp tile sizes(a)
for (int i = 0; i < n; ++i)
```
This is permitted since tile was introduced in [OpenMP 5.1](https://www.openmp.org/spec-html/5.1/openmpsu53.html#x78-860002.11.9). It is possible to sneak in negative numbers at runtime as in
```
int a = -1;
#pragma omp tile sizes(a)
```
Even though it is not well-formed, it should still result in every loop iteration being executed exactly once, an invariant of the tile construct that we should ensure. `ParseOpenMPExprListClause` is extracted to be reused by the `permutation` clause of the `interchange` construct. Some care was put into ensuring correct behavior in template contexts.
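
One way to preserve the "every iteration exactly once" invariant with a runtime tile size is sketched below as a hand-written tiled loop. The message does not say how Clang handles non-positive sizes, so clamping to 1 is purely an assumption of this example, as is the function name `tiledOrder`.

```cpp
#include <cassert>
#include <vector>

// Visits 0..N-1 in tiles of a runtime-chosen size and records the order.
// Assumption for this sketch: non-positive sizes are treated as 1.
std::vector<int> tiledOrder(int N, int TileSize) {
  int Size = TileSize > 0 ? TileSize : 1;
  std::vector<int> Visited;
  for (int Floor = 0; Floor < N; Floor += Size)         // loop over tiles
    for (int I = Floor; I < N && I < Floor + Size; ++I) // loop within a tile
      Visited.push_back(I);
  return Visited;
}
```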

…m#89932) G_ICMP NE => XOR(G_ICMP EQ, -1) moved to Legalizer to allow for combines if they come up in following passes.

…lvm#91867) This script takes IR before and after canonicalization, translates it into LLVM IR and converts it to a format suitable for Alive2 https://alive2.llvm.org/ce/ This is primarily for verifying arith canonicalizations, but technically it can be adapted for any dialect translatable to LLVM. Usage: `python verify_canon.py canonicalize.mlir -f func1 func2 ...` Example output: https://alive2.llvm.org/ce/z/KhQs4J Initial discussion: llvm#91646 (review)

…r options category (llvm#91932) The `CTUPhase1InliningMode` option was originally placed under Unsigned analyzer options, but its value is a string. This move aligns the option with its actual type.

llvm#91821) … (llvm#90885)" This reverts commit eea81aa.

…eal 10 on supported platforms (llvm#91629) The real 10 tests fail on `AIX` on `PPC`, only check them on `x86_64` Co-authored-by: Mark Danial <mark.danial@ibm.com>

…lvm#90054)

device_type, also spelled as dtype, specifies the applicability of the clauses following it, and takes a series of identifiers representing the architectures it applies to. As we don't have a source for the valid architectures yet, this patch just accepts all. Semantically, this also limits the list of clauses that can be applied after the device_type, so this implements that as well.

Don't create a new local SDLoc and then take a reference to it, just create the SDLoc directly.

…#91943) Updates tests in "vector-transfer-permutation-lowering.mlir" to make a clearer split into cases for:
* xfer_read vs xfer_write
* fixed-width vs scalable tests
A new test case is added for fixed-width vectors for vector.transfer_read. This is to complement an existing test for scalable vectors. This is in preparation for llvm#90835 and also for adding more tests for scalable vectors.

I wasn't able to reproduce the test crash, but I believe this might be a different definition of 'assert' on some platforms, so I believe this patch should fix it (and fixes the suggested warning).

…lvm#90098) A compound construct with a list of clauses is broken up into individual leaf/composite constructs. Each such construct has the list of clauses that apply to it based on the OpenMP spec. Each lowering function (i.e. a function that generates MLIR ops) is now responsible for generating its body as described below. Functions that receive AST nodes extract the construct, and the clauses from the node. They then create a work queue consisting of individual constructs, and invoke a common dispatch function to process (lower) the queue. The dispatch function examines the current position in the queue, and invokes the appropriate lowering function. Each lowering function receives the queue as well, and once it needs to generate its body, it either invokes the dispatch function on the rest of the queue (if any), or processes nested evaluations if the work queue is at the end. Re-application of ca1bd59 with fixes for compilation errors.

Extends the computation of the matching distance in the generic resolution to support options described in the table: https://docs.nvidia.com/hpc-sdk/archive/24.3/compilers/cuda-fortran-prog-guide/index.html#cfref-var-attr-unified-data Options are added as language features in the `SemanticsContext` and a flag is added in bbc for testing purpose.

This reverts commit c4a9a37. This and the followup patch keep hitting an assert I wrote on the build bots in a way that isn't clear. Reverting so I can fix it without a rush.

This fixes https://lab.llvm.org/buildbot/#/builders/268/builds/13925, which somehow doesn't show in any of my local builds.

This was left out from llvm@257013e

…m#87672) Android supports per thread stack protectors that are individually managed and initialized, which can provide stronger protections than using the global stack protector cookie. This patch matches the convention for other architectures targeting Android platforms.

…tDefEmitter (llvm#91941) getAllDerivedDefinitions produces a fatal error if there are no definitions. In practice this isn't much of a problem for llvm/lib/Target/RISCV/*.td where it's hard to imagine not having at least one of the required definitions. But it limits our ability to structure and maintain tests (which is how I came across this issue). This commit moves to using getAllDerivedDefinitionsIfDefined and aims to skip emission of data structures that make no sense if no definitions were found.

Pull Request: llvm#90670

…nstantiation (llvm#91972) llvm#90152 introduced a bug that occurs when typo-correction attempts to fix a reference to a non-existent member of the current instantiation (even though `operator->` may return a different type than the object type). This patch fixes it by simply considering the object expression to be of type `ASTContext::DependentTy` when the arrow operator is used with a dependent non-pointer non-function operand (after any implicit conversions).

This patch fixes: flang/lib/Lower/OpenMP/OpenMP.cpp:2346:14: error: unused variable 'origDirective' [-Werror,-Wunused-variable]

…en merged (llvm#91246) Remove redundant debug instructions after blocks have been merged into the predecessor. This can reduce compile time in some cases. This change only fixes the situation of loop unrolling; other situations are not considered. "RemoveRedundantDbgInstrs" seems to be very time-consuming, so we only call it here after the "Dest" has been merged into the "Fold", which may be a more targeted solution. Fixes: llvm#89073

…91137) In summary:
- `Monomial` -> `MonomialBase` with two inheriting `IntMonomial` and `FloatMonomial` for the different coefficient types
- `Polynomial` -> `PolynomialBase` with `IntPolynomial` and `FloatPolynomial` inheriting
- `PolynomialAttr` -> `IntPolynomialAttr`, and new `FloatPolynomialAttr` attribute, both of which may be input to `polynomial.constant`
- Refactoring common parts of attribute parsers.
--------- Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>

Doing so avoids negative interactions with other combines which don't know the shl_add is a single instruction. From the commit log, we've had several combine loops already. This was originally posted as part of llvm#88791, where a bug was pointed out. That bug was fixed by llvm#89789 which hits the same issue from another angle. To confirm the fix, I included the reduced test case here.

…vm#91817) A general question is: is it possible to support hooks here to infer the encoding? E.g., when the extracted tensor slice is rank-reduced, the encoding needs to be updated accordingly as well.

llvm#91137)" (llvm#92001) This reverts commit 91a14db. Not sure how to fix the build error this introduced, so reverting until I can figure it out https://lab.llvm.org/buildbot/#/builders/264/builds/10468 Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>

…le in VectorToSPIRV (llvm#91800) Context: iree-org/iree#17346. Test IREE integrate showing it's fixing the problem it's intended to fix, i.e. it allows IREE to drop its local revert of llvm#89131: iree-org/iree#17359 This is added to VectorToSPIRV because SPIRV doesn't currently handle `vector.interleave` (see motivating context above). This is limited to 1D, non-scalable vectors.

…91737) When an overflow happens during shift left, i.e. the last sign bit or the most significant data bit gets shifted out, the current approach of inferring the range of results does not work anymore. This patch checks for possible overflow and returns the max range in that case. Fix llvm#82158
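
The overflow check can be illustrated with a simplified unsigned 32-bit version. This is a sketch of the idea only: the patch also has to deal with signed ranges, and `URange` and `inferShl` are names made up here rather than the analysis's API.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

struct URange { uint32_t Min, Max; }; // hypothetical closed interval [Min, Max]

// Infers the range of (L << S) for unsigned 32-bit values.
URange inferShl(URange L, URange S) {
  const uint32_t Full = std::numeric_limits<uint32_t>::max();
  // If the largest shift could push a set bit of L.Max out of the word,
  // the usual [Min << Min, Max << Max] bounds no longer hold: give up.
  if (S.Max >= 32 || L.Max > (Full >> S.Max))
    return {0, Full};
  return {L.Min << S.Min, L.Max << S.Max};
}
```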

…template parameters (llvm#91833) When partial ordering alias templates against template template parameters, allow pack expansions when the alias has a fixed-size parameter list. These expansions were generally disallowed by proposed resolution for CWG1430. By previously diagnosing these when checking template template parameters, we would be too strict in trying to prevent any potential invalid use. This flows against the more general idea that template template parameters are weakly typed, that we would rather allow an argument that might be possibly misused, and only diagnose the actual misuses during instantiation. Since this interaction between P0522R0 and CWG1430 is also a backwards-compat breaking change, we implement provisional wording to allow these. Fixes llvm#62529

device_type, also spelled as dtype, specifies the applicability of the clauses following it, and takes a series of identifiers representing the architectures it applies to. As we don't have a source for the valid architectures yet, this patch just accepts all. Semantically, this also limits the list of clauses that can be applied after the device_type, so this implements that as well. This reverts commit 06f04b2. This reapplies commit c4a9a37. The build failures were caused by the patch depending on the order of evaluation of arguments to a function. This reapplication separates out the capture of one of the values.
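
The class of bug mentioned at the end (depending on argument evaluation order) and the "capture one value first" style of fix can be sketched generically. This is not the patched Clang code; `takeLast` and `sizeBeforeAndLast` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper with a side effect: pops and returns the last element.
int takeLast(std::vector<int> &V) {
  int X = V.back();
  V.pop_back();
  return X;
}

std::pair<std::size_t, int> sizeBeforeAndLast(std::vector<int> &V) {
  // Writing std::make_pair(V.size(), takeLast(V)) would leave it unspecified
  // whether size() is read before or after the pop, so compilers may differ.
  std::size_t SizeBefore = V.size(); // captured first, in its own statement
  int Last = takeLast(V);
  return {SizeBefore, Last};
}
```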

When enabled, input sections that would otherwise overflow a memory region are instead spilled to the next matching output section. This feature parallels the one in GNU LD, but there are some differences from its documented behavior:
- /DISCARD/ only matches previously-unmatched sections (i.e., the flag does not affect it).
- If a section fails to fit at any of its matches, the link fails instead of discarding the section.
- The flag --enable-non-contiguous-regions-warnings is not implemented, as it exists to warn about such occurrences.
The implementation places stubs at possible spill locations, and replaces them with the original input section when effecting spills. Spilling decisions occur after address assignment. Sections are spilled in reverse order of assignment, with each spill naively decreasing the size of the affected memory regions. This continues until the memory regions are brought back under size. Spilling anything causes another pass of address assignment, and this continues to fixed point.
Spilling after rather than during assignment allows the algorithm to consider the size effects of unspillable input sections that appear later in the assignment. Otherwise, such sections (e.g. thunks) may force an overflow, even if spilling something earlier could have avoided it.
A few notable feature interactions occur:
- Stubs affect alignment, ONLY_IF_RO, etc, broadly as if a copy of the input section were actually placed there.
- SHF_MERGE synthetic sections use the spill list of their first contained input section (the one that gives the section its name).
- ICF occurs oblivious to spill sections; spill lists for merged-away sections become inert and are removed after assignment.
- SHF_LINK_ORDER and .ARM.exidx are ordered according to the final section ordering, after all spilling has completed.
- INSERT BEFORE/AFTER and OVERWRITE_SECTIONS are explicitly disallowed.

Reverts llvm#90007 Broke in merging I think.

This ensures that we log pointers as lower-case hex. E.g., instead of:
```
LayoutRecordType on (ASTContext*)0x000000010E78D600 'scratch ASTContext' for (RecordDecl*)0x000000010E797
```
we now log:
```
LayoutRecordType on (ASTContext*)0x000000010e78d600 'scratch ASTContext' for (RecordDecl*)0x000000010e797
```
This is consistent with how the AST dump gets emitted into the log. It makes it easier to correlate pointers we log from LLDB with pointers that are part of any AST dumps in the same `expr` log.
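
For reference, lower-case, zero-padded hex output of this shape can be produced with a standard printf-style format. The sketch below is not LLDB's logging code; `formatAddress` is a name chosen for the example.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats an address as "0x" followed by 16 lower-case hex digits.
std::string formatAddress(uint64_t Addr) {
  char Buf[19]; // "0x" + 16 digits + terminating NUL
  std::snprintf(Buf, sizeof(Buf), "0x%016" PRIx64, Addr); // PRIx64: lower case
  return Buf;
}
```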

…ion (llvm#91811)" (llvm#91837) With blocking issues fixed, re-enable relaxed template template argument matching by reverting these commits. This reverts commit 4198aeb. This reverts commit 2d5634a.

…rn, enable in VectorToSPIRV" (llvm#92006) Reverts llvm#91800 Reason: https://lab.llvm.org/buildbot/#/builders/268/builds/13935

This patch calculates knownbits from fp instructions/dominating fcmp conditions. It will enable more optimizations with signbit idioms.
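To illustrate the kind of fact involved (a sketch of the underlying IEEE-754 property, not the patch's analysis or the exact set of conditions it handles): if an ordered comparison `x < 0.0` holds, the sign bit of `x` must be 1. The converse does not hold, since `-0.0` has its sign bit set but does not compare less than zero. The helper `sign_bit` is hypothetical.

```python
import struct


def sign_bit(x: float) -> int:
    # Reinterpret the double as a 64-bit integer and take the top bit.
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return bits >> 63


for x in (-1.5, -1e-300, float("-inf")):
    # Whenever x < 0.0 is true, the sign bit is known to be set.
    assert x < 0.0 and sign_bit(x) == 1

# -0.0 carries the sign bit without satisfying x < 0.0.
assert sign_bit(-0.0) == 1 and not (-0.0 < 0.0)
```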

…cation.ll. NFC The test we had didn't match its description. Now we have one test with a large offset that requires a virtual base register and a test with a smaller offset that should not. There is currently a bug that causes the offset to be double counted, leading to the small case also using a virtual base register.

…sFrameBaseReg. It's already added in isFrameOffsetLegal so adding it in needsFrameBaseReg causes it to be double counted.

When enabled, input sections that would otherwise overflow a memory region are instead spilled to the next matching output section.

This feature parallels the one in GNU LD, but there are some differences from its documented behavior:

- /DISCARD/ only matches previously-unmatched sections (i.e., the flag does not affect it).
- If a section fails to fit at any of its matches, the link fails instead of discarding the section.
- The flag --enable-non-contiguous-regions-warnings is not implemented, as it exists to warn about such occurrences.

The implementation places stubs at possible spill locations, and replaces them with the original input section when effecting spills. Spilling decisions occur after address assignment. Sections are spilled in reverse order of assignment, with each spill naively decreasing the size of the affected memory regions. This continues until the memory regions are brought back under size. Spilling anything causes another pass of address assignment, and this continues to fixed point.

Spilling after rather than during assignment allows the algorithm to consider the size effects of unspillable input sections that appear later in the assignment. Otherwise, such sections (e.g. thunks) may force an overflow, even if spilling something earlier could have avoided it.

A few notable feature interactions occur:

- Stubs affect alignment, ONLY_IF_RO, etc, broadly as if a copy of the input section were actually placed there.
- SHF_MERGE synthetic sections use the spill list of their first contained input section (the one that gives the section its name).
- ICF occurs oblivious to spill sections; spill lists for merged-away sections become inert and are removed after assignment.
- SHF_LINK_ORDER and .ARM.exidx are ordered according to the final section ordering, after all spilling has completed.
- INSERT BEFORE/AFTER and OVERWRITE_SECTIONS are explicitly disallowed.

…expression log (llvm#91985) We emit `ASTContext` and `TypeSystem` pointers into the `expr` log but there is no easy way (that I know of) to correlate the pointer value back to an easily readable form. This patch simply logs the name of the `TypeSystem` and the associated `ASTContext` into the `expr` channel whenever we create a new `TypeSystemClang`.

The following is an example of the new log entries:

```
$ grep Created /tmp/lldb.log
Created new TypeSystem for (ASTContext*)0x0000000101a2e200 'ASTContext for '/Users/michaelbuch/a.out''
Created new TypeSystem for (ASTContext*)0x0000000102512a00 'scratch ASTContext'
Created new TypeSystem for (ASTContext*)0x0000000102116a00 'ClangModulesDeclVendor ASTContext'
Created new TypeSystem for (ASTContext*)0x00000001022e8c00 'Expression ASTContext for '<user expression 0>''
Created new TypeSystem for (ASTContext*)0x00000001103e7200 'AppleObjCTypeEncodingParser ASTContext'
Created new TypeSystem for (ASTContext*)0x00000001103f7000 'AppleObjCDeclVendor AST'
Created new TypeSystem for (ASTContext*)0x00000001104bfe00 'Expression ASTContext for '<clang expression>''
Created new TypeSystem for (ASTContext*)0x0000000101f01000 'Expression ASTContext for '<clang expression>''
Created new TypeSystem for (ASTContext*)0x00000001025d3c00 'Expression ASTContext for '<clang expression>''
Created new TypeSystem for (ASTContext*)0x0000000110422400 'Expression ASTContext for '<clang expression>''
Created new TypeSystem for (ASTContext*)0x000000011602c200 'Expression ASTContext for '<user expression 1>''
Created new TypeSystem for (ASTContext*)0x0000000110641600 'Expression ASTContext for '<clang expression>''
Created new TypeSystem for (ASTContext*)0x0000000110617400 'Expression ASTContext for '<clang expression>''
```

Co-authored-by: Leon Clark <leoclark@amd.com>

…nversion (llvm#90410) Ignore optionals in unevaluated context, like static_assert or decltype. Closes llvm#89593

Add a new clang-tidy check that converts absl::StrFormat (and similar functions) to std::format (and similar functions.) Split the configuration of FormatStringConverter out to a separate Configuration class so that we don't risk confusion by passing two boolean configuration parameters into the constructor. Add AllowTrailingNewlineRemoval option since we never want to remove trailing newlines in this check.

…lvm#91988) This commit allows `inferFragType` to see through all arith.ext ops and other elementwise users before reaching the contract op when figuring out the fragment type.

Offset annotation was missed when optimizing an unconditional branch to a tail call. Test Plan: update bb-with-two-tail-calls.s

llvm#91686) If you change the generation script and re-run ninja (or whatever drives your build), it currently will not regenerate SBLanguages.h. With dependency tracking, it should re-run when the script changes.

…le in VectorToSPIRV (llvm#92012) This is the second attempt at merging llvm#91800, which bounced due to a linker error apparently caused by an undeclared dependency. `MLIRVectorToSPIRV` needed to depend on `MLIRVectorTransforms`. In fact that was a preexisting issue already flagged by the tool in https://discourse.llvm.org/t/ninja-can-now-check-for-missing-cmake-dependencies-on-generated-files/74344. Context: iree-org/iree#17346. Test IREE integrate showing it's fixing the problem it's intended to fix, i.e. it allows IREE to drop its local revert of llvm#89131: iree-org/iree#17359 This is added to VectorToSPIRV because SPIRV doesn't currently handle `vector.interleave` (see motivating context above). This is limited to 1D, non-scalable vectors.

This is a continuation of efforts to split `Sema` up, following the example of OpenMP, OpenACC, etc. Context can be found in llvm#82217 and llvm#84184. I split formatting changes into a separate commit to help with reviewing the actual changes.

…on. (llvm#91974)

This avoids 'Permission denied' when PWD is read-only. While here, change the triple from a Linux one to a generic ELF one.

…s in RISCVRegisterInfo::needsFrameBaseReg. Instead of using getReservedRegs, just check the subtarget reserved list. getReservedRegs considers the frame pointer to be reserved when it is being used, but we do need to save/restore it so it should be counted as a callee saved register. AArch64 hardcodes their callee saved size, but the comment mentions the Frame Pointer being counted.

…:needsFrameBaseReg The vector callee saved registers shouldn't affect the frame pointer offset so we don't want to consider them. I've listed the GPR, FPR32, and FPR64 register classes explicitly because getMinimalPhysRegClass is slow and this function is called frequently. So explicitly listing the interesting classes should be a compile time improvement.

The testing we have for vector ptradd was a bit lacking. In adding tests this patch found a couple of issues mostly with the way v3 vectors of ptrs were sometimes legalized via i64, and with non-i64 additions. It does not attempt to fix the issue with mergevalues from returning vector ptrs.

…2016) This is a proof of concept recognition of the most basic forms of ReLu operations, used to show-case sparsification of end-to-end PyTorch models. In the long run, we must avoid lowering such constructs too early (with this need for raising them back). See discussion at https://discourse.llvm.org/t/min-max-abs-relu-recognition-starter-project/78918

Switch from FuncBranchData intermediate maps (Intra/InterIndex) to aggregated Data, same as one used by DataReader: https://github.com/llvm/llvm-project/blob/e62ce1f8842cca36eb14126d79dcca0a85bf6d36/bolt/lib/Profile/DataReader.cpp#L385-L389 This aligns the order of the output between YAMLProfileWriter and writeBATYAML. Test Plan: updated bolt-address-translation-yaml.test Reviewers: rafaelauler, dcci, ayermolo, maksfb Reviewed By: ayermolo, maksfb Pull Request: llvm#91289

There is nothing specific here and it is not different from i16 or f16.

## Why

Currently, the system header `errno.h` is included in `libc_errno.h`, which is supposed to be consumed by internal implementations only. As unit and hermetic tests should never use `#include <errno.h>` but instead use `#include "src/errno/libc_errno.h"`, we do not want to implicitly include `errno.h`. In order to have a clear separation between those two, we want to pull out the definitions of errno numbers from `errno.h`.

## What

* Extract the definitions of errno numbers from [include/errno.h.def](https://github.com/llvm/llvm-project/pull/91150/files#diff-ed38ed463ed50571b498a5b69039cab58dc9d145da7f751a24da9d77f07781cd) and place them under [include/llvm-libc-macros/linux/error-number-macros.h](https://github.com/llvm/llvm-project/pull/91150/files#diff-d6192866629690ebb7cefa1f0a90b6675073e9642f3279df08a04dcdb05fd892)
* Provide mips-specific errno numbers in [include/llvm-libc-macros/linux/mips/error-number-macros.h](https://github.com/llvm/llvm-project/pull/91150/files#diff-3fd35a4c94e0cc359933e497b10311d857857b2e173e8afebc421b04b7527743)
* Find definition of mips errno numbers in glibc [here](https://github.com/bminor/glibc/blob/ea73eb5f581ef5931fd67005aa0c526ba43366c9/sysdeps/unix/sysv/linux/mips/bits/errno.h#L32-L50) (equally defined in the Linux kernel)
* Provide sparc-specific errno numbers in [include/llvm-libc-macros/linux/sparc/error-number-macros.h](https://github.com/llvm/llvm-project/pull/91150/files#diff-5f16ffb2a51a6f72ebd4403aca7e1edea48289c99dd5978a1c84385bec4f226b)
* Find definition of sparc errno numbers in glibc [here](https://github.com/bminor/glibc/blob/ea73eb5f581ef5931fd67005aa0c526ba43366c9/sysdeps/unix/sysv/linux/sparc/bits/errno.h#L33-L51) (equally defined in the Linux kernel)
* Include proxy header `errno_macros.h` instead of the system header `errno.h` in `libc_errno.h`/`libc_errno.cpp`

Closes llvm#80172

…ack ops. (llvm#90641) A Windows build of `mlir` with Visual Studio (19.36.32538 for x64) using the following command:

`cmake.exe -GNinja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS=mlir -DLLVM_ENABLE_EH=ON -DLLVM_ENABLE_RTTI=1 -DLLVM_TARGETS_TO_BUILD=host ../llvm`

crashes when calling canonicalization on `tensor.pack`/`tensor.unpack` ops with `mlir-opt --canonicalize input.mlir`, where `input.mlir` is as follows (this is taken from one of the filecheck tests for `tensor.pack`):

```
func.func @pack_unpack(%arg0: tensor<128x256xf32>) -> tensor<128x256xf32> {
  %pack_dest = tensor.empty() : tensor<8x16x8x32xf32>
  %unpack_dest = tensor.empty() : tensor<128x256xf32>
  %tp = tensor.pack %arg0 outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [8, 32] into %pack_dest : tensor<128x256xf32> -> tensor<8x16x8x32xf32>
  %tup = tensor.unpack %tp outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [8, 32] into %unpack_dest : tensor<8x16x8x32xf32> -> tensor<128x256xf32>
  return %tup : tensor<128x256xf32>
}
```

The crash seemingly comes from an invalid memory access while iterating over `innerDimsPos` within `getPackOpResultTypeShape`. This crash also causes the following tests to fail:

```
MLIR :: Dialect/Linalg/canonicalize.mlir
MLIR :: Dialect/Linalg/data-layout-propagation.mlir
MLIR :: Dialect/Linalg/generalize-tensor-pack-tile.mlir
MLIR :: Dialect/Linalg/generalize-tensor-pack.mlir
MLIR :: Dialect/Linalg/generalize-tensor-unpack-tile.mlir
MLIR :: Dialect/Linalg/generalize-tensor-unpack.mlir
MLIR :: Dialect/Linalg/transform-lower-pack.mlir
MLIR :: Dialect/Linalg/transform-op-fuse.mlir
MLIR :: Dialect/Linalg/transform-op-pack.mlir
MLIR :: Dialect/Linalg/transform-pack-greedily.mlir
MLIR :: Dialect/Tensor/canonicalize.mlir
MLIR :: Dialect/Tensor/fold-into-pack-and-unpack.mlir
MLIR :: Dialect/Tensor/invalid.mlir
MLIR :: Dialect/Tensor/ops.mlir
MLIR :: Dialect/Tensor/simplify-pack-unpack.mlir
MLIR :: Dialect/Tensor/tiling.mlir
```
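For readers unfamiliar with the shapes in the reproducer, the packed result type can be derived by hand. The sketch below only mirrors the shape arithmetic implied by the example (tile the dimensions named in `inner_dims_pos`, permute the outer dimensions, then append the tile sizes); it is not the body of `getPackOpResultTypeShape`, and `pack_result_shape` is a hypothetical name.

```python
def pack_result_shape(src_shape, inner_dims_pos, inner_tiles, outer_dims_perm=None):
    outer = list(src_shape)
    # Each tiled dimension shrinks by its tile size (rounded up).
    for pos, tile in zip(inner_dims_pos, inner_tiles):
        outer[pos] = -(-outer[pos] // tile)  # ceiling division
    # Optionally reorder the outer dimensions.
    if outer_dims_perm is not None:
        outer = [outer[p] for p in outer_dims_perm]
    # The tile sizes become the innermost dimensions.
    return outer + list(inner_tiles)


# tensor<128x256xf32> packed as in the reproducer above.
print(pack_result_shape([128, 256], [0, 1], [8, 32], [1, 0]))
```

For the reproducer this yields 128/8 = 16 and 256/32 = 8 as outer sizes, swapped by the permutation to 8x16, followed by the 8x32 tiles, which agrees with `tensor<8x16x8x32xf32>`.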

…lvm#92041)

This fixes the new test linkerscript/enable-non-contiguous-regions.test from llvm#90007 in -stdlib=libc++ -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_DEBUG builds. adjustOutputSections does not discard the output section .potential_a because it contained .a (which would be spilled to .actual_a). .potential_a and .bc have the same address and will cause an assertion failure.

This reapplies llvm@195d8ac [DirectX] Fix DXIL part header version encoding. The endian issue was fixed by llvm@f42117c. Move MinorVersion to be the lower 8 bits. Set DXIL version in DXContainerObjectWriter::writeObject. Fixes llvm#89952

…on Windows This marks delayed-definition-die-searching.test as unsupported on Windows. Clang uses link.exe as the default linker unless it is explicitly told to use lld. When used with link.exe, clang produces PDB format debug info even when -gdwarf is specified. This test will be unsupported until we make the lldb-aarch64-windows buildbot use lld.

Instead of hardcoding all of the register name strings.

This PR: - Make `clock_gettime` a header-only library - Add `clock_conversion` header library to allow conversion between clocks relative to the time of call - Add `timeout` header library to manage the absolute timeout used in POSIX's timed locking/waiting APIs
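The idea of converting between clocks "relative to the time of call" can be sketched as follows. This is an assumption about the general technique, not the libc implementation: an absolute deadline on one clock is turned into a remaining duration using that clock's current reading, then re-anchored on the other clock. The function name and parameters are hypothetical.

```python
import time


def convert_deadline(deadline_ns: int, now_src_ns: int, now_dst_ns: int) -> int:
    # Remaining time on the source clock, re-anchored on the destination clock.
    return now_dst_ns + (deadline_ns - now_src_ns)


# Example: a wall-clock deadline 5 seconds from now, expressed on the
# monotonic clock so that later wall-clock adjustments do not move it.
realtime_deadline = time.time_ns() + 5_000_000_000
monotonic_deadline = convert_deadline(
    realtime_deadline, time.time_ns(), time.monotonic_ns()
)
print(monotonic_deadline - time.monotonic_ns())  # roughly 5e9 ns remaining
```

The conversion is only as accurate as the two "now" readings are simultaneous, which is why it is tied to the time of the call.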

) This change improves the matching algorithm by using a diff algorithm. The current matching algorithm only processes callsites grouped by function name; it doesn't consider the order relationships between functions with different names, so it sometimes fails to handle ambiguous anchors. For example (`foo:1` denotes a callsite, written as `callee_name:callsite_location`):

```
IR : foo:1 bar:2 foo:4 bar:5
Profile : bar:3 foo:5 bar:6
```

Here `foo:1` is matched to `foo:5`; using a diff algorithm (finding the longest common subsequence) helps with this issue.

One well-known diff algorithm is the Myers diff algorithm (paper: "An O(ND) Difference Algorithm and Its Variations", Eugene W. Myers). Its variations have been implemented and used in many well-known tools, such as GNU diff and git diff. It provides an efficient way to find the longest common subsequence or the shortest edit script through graph searching. There are several variations/refinements of the algorithm, but in our case the number of function callsites is usually very small, so we implemented the basic greedy version in this change, which should be good enough.

We observed better matchings and a positive perf improvement on our internal services.
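To make the example concrete, here is a sketch that aligns the two anchor lists by longest common subsequence over callee names. Note the swap: it uses the textbook dynamic-programming LCS rather than the greedy Myers variant the change implements, because the DP form is shorter to read; for a given input both find a common subsequence of maximal length. The function name `match_anchors` is hypothetical.

```python
def match_anchors(ir, profile):
    """ir, profile: lists of (callee_name, location) in source order.
    Returns {ir_location: profile_location} for the LCS alignment."""
    n, m = len(ir), len(profile)
    # lcs[i][j] = LCS length of ir[i:] and profile[j:], compared by name.
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if ir[i][0] == profile[j][0]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table from the start, recording matched pairs.
    matches = {}
    i = j = 0
    while i < n and j < m:
        if ir[i][0] == profile[j][0]:
            matches[ir[i][1]] = profile[j][1]
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            i += 1  # skip an unmatched IR anchor
        else:
            j += 1  # skip an unmatched profile anchor
    return matches


ir = [("foo", 1), ("bar", 2), ("foo", 4), ("bar", 5)]
profile = [("bar", 3), ("foo", 5), ("bar", 6)]
print(match_anchors(ir, profile))
```

On the example from the commit message the alignment leaves `foo:1` unmatched and pairs `bar:2` with `bar:3`, `foo:4` with `foo:5`, and `bar:5` with `bar:6`, which preserves the relative order of differently named callees.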

… V/Zve is not enabled. We can't save vector registers without V/Zve.

Patch llvm#91150 added a proxy header for errno macros. This patch fixes the bazel build since it needs to be added as a dependency.

… PRs (llvm#91826) We have been collecting release notes from the PRs for most of the 18.1.x releases and this just helps automate the process.

…ult in LLVM (llvm#89799)"" This reverts commit 91446e2 and a unittest followup 1530f31 (llvm#90476). In a stage-2 -flto=thin -gsplit-dwarf -g -fdebug-info-for-profiling -fprofile-sample-use= build of clang, a ThinLTO backend compile has assertion failures: Global is external, but doesn't have external or weak linkage! ptr @_ZN5clang12ast_matchers8internal18makeAllOfCompositeINS_8QualTypeEEENS1_15BindableMatcherIT_EEN4llvm8ArrayRefIPKNS1_7MatcherIS5_EEEE function declaration may only have a unique !dbg attachment ptr @_ZN5clang12ast_matchers8internal18makeAllOfCompositeINS_8QualTypeEEENS1_15BindableMatcherIT_EEN4llvm8ArrayRefIPKNS1_7MatcherIS5_EEEE The failures somehow go away if -fprofile-sample-use= is removed.


Commits on May 11, 2024

Commits on May 12, 2024

Commits on May 13, 2024

Commits on Aug 23, 2024