[AutoBump] Merge with 894d3eeb (Aug 15) (4) #357

…, fsqrt(, l, f128) to math.yaml. (llvm#103494) Added auto function hdrgen specification for functions: totalordermag(,f, l, f128), dsqrt(l, f128), fsqrt(, l, f128)

Also combine the GlobalISel tests into the SelectionDAG ones.

This commit adds three matchers that, unlike the m_NonZero matcher, match not only constants but also operations that implement the InferIntRangeInterface. These matchers can then match a non-zero value, or a value that is not minus one, based on the inferred range. Additionally, the commit uses the new matchers in the getSpeculatability functions of Arith's signed and unsigned integer divisions. At the moment, the matchers only look at the defining operation to avoid expensive IR walks. These range-based matchers can be useful when hoisting divisions out of a loop, which requires knowing that the divisor is non-zero and, for signed divisions, not minus one. Just checking for a constant divisor may not be sufficient if the divisor is, for example, the result of an operation that returns the number of threads in a team of threads.
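The speculatability reasoning above can be sketched with a plain integer range. This is a minimal illustration, not the MLIR matcher code: `SignedRange`, `isNonZero`, `isNotMinusOne`, and `signedDivIsSpeculatable` are hypothetical names, and the range stands in for what InferIntRangeInterface would infer for the divisor.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an inferred signed range [min, max] of a value.
struct SignedRange {
  int64_t min, max;
  bool contains(int64_t v) const { return min <= v && v <= max; }
};

// Analogue of a range-based "non-zero" matcher.
bool isNonZero(const SignedRange &r) { return !r.contains(0); }

// Analogue of a range-based "not minus one" matcher.
bool isNotMinusOne(const SignedRange &r) { return !r.contains(-1); }

// A signed division can be speculated (e.g. hoisted out of a loop) only if
// the divisor can be neither zero (division by zero) nor -1 (INT_MIN / -1
// overflows). An unsigned division would only need isNonZero.
bool signedDivIsSpeculatable(const SignedRange &divisor) {
  return isNonZero(divisor) && isNotMinusOne(divisor);
}
```

For instance, a thread-count divisor with an inferred range of `[1, 1024]` passes both checks even though it is not a constant. The check ignores the dividend, so excluding -1 is conservative: `INT_MIN / -1` is the only overflowing signed case.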

Allow subvector extraction as long as at least one operand extraction is free. Refactor existing cases into a switch statement to allow easier reuse + future expansion.

) It seems that the parameters can be passed through the class members.

…03767)

...because it is too noisy to be useful right now, and its architecture is terrible, so it can't act as a starting point for future development. The main problem with this checker is that it tries to do (or at least fake) path-sensitive analysis without actually using the established path-sensitive analysis engine. Instead of actually tracking the symbolic values and the known constraints on them, this checker blindly gropes the AST and uses heuristics like "this variable was seen in a comparison operator expression that is not a loop condition, so it's probably not too large" (which was improved in a separate commit to at least ignore comparison operators that appear after the actual `malloc()` call). This might have been acceptable in 2011 (when this checker was added), but since then we have developed a significantly better standard approach for analysis, and this old relic doesn't deserve to remain in the codebase. Needless to say, this primitive approach causes lots of false positives (and presumably false negatives as well), which ensures that this alpha checker won't be missed by its users. Moreover, the goals of this checker would be questionable even if it had a perfect implementation. It's very aggressive to assume by default that the argument of malloc can overflow (unless the checker sees a bounds check), and this produces too many false positives, perhaps even for an optin checker. It may be possible to eventually create a useful (and properly path-sensitive) optin checker for these kinds of suspicious code, but this is a very low-priority goal. Also note that we already have `alpha.security.TaintedAlloc`, which provides more practical heuristics for detecting somewhat similar "argument of malloc may be too large" vulnerabilities.

…sions There's some coverage in RISCVISAInfoTest, but it's worth adding a quick test to ensure nothing happens to the frontend handling of this option.

When instantiating a delayed template, the recorded token stream is passed to `Parser::ParseLateTemplatedFuncDef` which will append the current token "so it doesn't get lost". With incremental extensions enabled, this is `repl_input_end` which subsequently needs support for (de)serialization.

… order (llvm#102844) Put the newest standards first, same as for the [C++ status page](https://clang.llvm.org/cxx_status.html). The diff is pretty busted, but I swear I copy & pasted faithfully 😅 The only change beyond shuffling sections around is unfolding the sections for C99/C11 (6dbce28), which isn't necessary anymore now that they're safely tucked away towards the end of the page.

…file (llvm#103004)" This reverts commit 2d53f0a. This causes warnings when building with MSVC.

getRawData exposes some internal details of APInt. The code was iterating over the uint64_t pieces and then breaking each of them into four uint16_t pieces. This patch changes the code to extract 16-bit pieces directly from the APInt without using getRawData.
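A sketch of the index arithmetic behind extracting 16-bit pieces directly, assuming a least-significant-first array of 64-bit words as a stand-in for the integer (the helper `get16` is hypothetical). Inside LLVM, `APInt::extractBitsAsZExtValue(16, 16 * Idx)` offers this kind of access without `getRawData`, though the patch may use a different API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the idx-th 16-bit piece of a multi-word integer whose 64-bit words
// are stored least-significant first (piece 0 = bits [0, 16)).
uint16_t get16(const std::vector<uint64_t> &words, unsigned idx) {
  uint64_t word = words[idx / 4];  // four 16-bit pieces per 64-bit word
  return static_cast<uint16_t>(word >> (16 * (idx % 4)));
}
```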

…te global data (llvm#101224) This patch aims to reduce TOC usage by merging internal and private global data. Moreover, we also add the GlobalMerge pass within the PPCTargetMachine pipeline, which is disabled by default. This transformation can be enabled by -ppc-global-merge.

…nstant into a signed comparison (llvm#103480) Given an unsigned integer comparison of `add nsw X, C1` with some constant `C2` we can fold it into a signed comparison of `X` and `C2 - C1` under the following conditions: * There's a `nsw` flag on the addition * `C2` is non-negative * `X + C1` is non-negative * `C2 - C1` is non-negative
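The fold can be sanity-checked exhaustively at 8 bits. The sketch below assumes the `ult` predicate (folded to `slt`) and interprets each listed condition over i8 values; `foldHoldsForI8` is a hypothetical name, and this is not the InstCombine implementation.

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively checks, over all i8 values, that
//   icmp ult (add nsw X, C1), C2  ==  icmp slt X, (C2 - C1)
// whenever the listed preconditions hold. Values are held in `int` so the
// i8 overflow checks are explicit.
bool foldHoldsForI8() {
  for (int x = -128; x <= 127; ++x) {
    for (int c1 = -128; c1 <= 127; ++c1) {
      int sum = x + c1;
      if (sum < -128 || sum > 127)
        continue;  // the add would not be nsw
      if (sum < 0)
        continue;  // requires X + C1 to be non-negative
      for (int c2 = 0; c2 <= 127; ++c2) {  // requires C2 non-negative
        int diff = c2 - c1;
        if (diff < 0 || diff > 127)
          continue;  // C2 - C1 must be non-negative as an i8 (>127 would wrap)
        bool ult = static_cast<uint8_t>(sum) < static_cast<uint8_t>(c2);
        bool slt = x < diff;
        if (ult != slt)
          return false;
      }
    }
  }
  return true;
}
```

The reasoning it confirms: when both `X + C1` and `C2` are non-negative, unsigned and signed comparison agree, and with no signed overflow `X + C1 < C2` is equivalent to `X < C2 - C1`.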

…lvm#103392) ... wherever we have the Decl for it, and even when we don't, keep its SourceLocation aimed at the call site. Fixes: llvm#102983

llvm#103935)

In preparation for upcoming patches, this just moves the call to the proper place; NFC for now.

…p_atomics (llvm#103732) This commit adds plumbing for the amdgpu-unsafe-fp-atomics attribute by introducing `rocdl.unsafe_fp_atomics`. It also adds the missing translation for the amdgpu-waves-per-eu attribute.

…vm#103927) This commit changes the LLVM dialect's inliner interface to no longer be registered at dialect initialization. Instead, it is now a promised interface that needs to be registered explicitly. This change is desired to avoid pulling a lot of dependencies into the `MLIRLLVMDialect` library, especially considering future patches that plan to extend it further with strong IR analysis.

…rs, NFC GatheredScalars is a full copy of E->Scalars in these places and can be safely used for now. This unifies the code across the function.

…to combine (srl (sra X, C1), ShAmt) -> sra(X, C1+ShAmt) (llvm#101751) This applies if the upper bits of the shr aren't demanded. This helps with cases where the outer srl was originally an sra and was converted to an srl by SimplifyDemandedBits before it had a chance to combine with the inner sra. This can occur when the inner sra was part of a sign_extend_inreg expansion. There are some regressions in ARM and Thumb2.
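The bit-level fact behind this combine can be checked directly. The two forms agree on the low `32 - ShAmt` bits, and differ only in the top `ShAmt` bits (zeros versus copies of the sign bit), which is why those bits must be undemanded. A minimal 32-bit sketch with hypothetical helper names, assuming `C1 + ShAmt < 32`:

```cpp
#include <cassert>
#include <cstdint>

// Arithmetic shift right of a 32-bit pattern, written with unsigned ops so
// the sign fill is explicit (valid for s in [0, 31]).
uint32_t ashr32(uint32_t x, unsigned s) {
  if (s == 0)
    return x;
  uint32_t r = x >> s;
  if (x & 0x80000000u)
    r |= ~(UINT32_MAX >> s);  // replicate the sign bit into the top s bits
  return r;
}

// True if (srl (sra x, c1), shAmt) and (sra x, c1 + shAmt) agree on the low
// 32 - shAmt bits, i.e. on every bit that may still be demanded.
// Requires c1 + shAmt < 32.
bool lowBitsMatch(uint32_t x, unsigned c1, unsigned shAmt) {
  uint32_t lhs = ashr32(x, c1) >> shAmt;    // srl of the inner sra
  uint32_t rhs = ashr32(x, c1 + shAmt);     // combined sra
  uint32_t demanded = UINT32_MAX >> shAmt;  // low 32 - shAmt bits
  return (lhs & demanded) == (rhs & demanded);
}
```

For a negative input such as `0x80000000` with `C1 = 1` and `ShAmt = 1`, the full results differ (`0x60000000` versus `0xE0000000`) while the demanded low 31 bits match.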

…lvm#102952) This PR addresses the issue detailed in iree-org/iree#17948. The problem occurs when distributed types are set to NULL, leading to compilation crashes. --------- Signed-off-by: Bangtian Liu <liubangtian@gmail.com>


…eded for explicit symbol visibility (llvm#103900) In multiple source files, function definitions never see their declarations in a header because the header is never included, causing linker errors when explicit symbol visibility macros/dllexport are added to the declarations. Most of these were originally found by @tstellar in llvm#67502.
- TargetRegistry.h is needed in MCExternalSymbolizer.cpp for createMCSymbolizer
- Analysis/Passes.h is needed in LazyValueInfo.cpp and RegionInfo.cpp for createLazyValueInfoPass and createRegionInfoPass
- Transforms/Scalar.h is needed in SpeculativeExecution.cpp for createSpeculativeExecutionPass

And declare it to take an MCRegister. Also rename related entities and remove a comment on the function that, depending on its purpose, is either irrelevant or misleading.

Suggested in llvm#95308 (comment)

…m#100749) Fixes llvm#65072. This allows binary ops of splats to be scalarized if the operation isn't legal on the element type but is legal on the type it will be legalized to. I assume that if an op is legal for both the scalar and the vector type, choosing the scalar version should always be better, no matter what the type is. There are some cases that my approach can't scalarize, for example:

```llvm
; test/CodeGen/RISCV/rvv/select-int.ll
define <vscale x 4 x i64> @select_nxv4i64(i1 zeroext %c, <vscale x 4 x i64> %a, <vscale x 4 x i64> %b) {
  %v = select i1 %c, <vscale x 4 x i64> %a, <vscale x 4 x i64> %b
  ret <vscale x 4 x i64> %v
}
```

https://godbolt.org/z/xzqrKrxvK `xor (splat i1, splat i1)` is generated from the select in a late step, after LegalizeType. I didn't figure out how to make `xor i1, i1` legal at this point. --------- Co-authored-by: Luke Lau <luke@igalia.com>

…04045) Reverts llvm#101409. This caused some memory usage regressions, and it has a known bug in page releasing.

llvm#102987)

This intrinsic supports [P2674R1](https://wg21.link/p2674r1) "A trait for implicit lifetime types". Resolves llvm#98627 --------- Co-authored-by: Timm Baeder <tbaeder@redhat.com>

- LWG2308 was voted into C++14 in the 2014 Issaquah meeting
- LWG2682 was voted into C++20 in San Diego 2018, not C++17 in Issaquah
- LWG2769 was voted into C++17 in Kona 2017, not Issaquah 2016

…/n) (llvm#102286) Adds tests for scalable vectors in:
* sink-vector-broadcast.mlir

This test file exercises patterns grouped under `populateSinkVectorBroadcastPatterns`, which includes:
* `ReorderElementwiseOpsOnBroadcast`,
* `ReorderCastOpsOnBroadcast`.

Right now there are only tests for the former. However, I've noticed that "vector-reduce-to-contract.mlir" contains tests for the latter, and I've left a few TODOs to group these tests back together in one file. Additionally, added some helpful `notifyMatchFailure` messages in `ReorderElementwiseOpsOnBroadcast`.

LLDB on OSX looks for its supporting executables in a `bin` directory that is a sibling of the `lib` directory containing liblldb. This works well for CMake; however, for other build systems like Bazel, it's not that easy to produce that build structure, so it's much easier to also use the `lib` directory as a fallback when `bin` is absent. This shouldn't break anything; instead, it should make it a bit easier for LLDB to work with different build systems and folder structures.
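A minimal sketch of the lookup order described above, using `std::filesystem`. The function name and the injected `isDir` predicate (used so the logic can be tested without touching the disk) are hypothetical; LLDB's actual implementation may be structured differently.

```cpp
#include <cassert>
#include <filesystem>
#include <functional>

namespace fs = std::filesystem;

// Given the directory containing liblldb (e.g. ".../lib"), prefer the
// sibling "bin" directory for supporting executables, and fall back to the
// "lib" directory itself when no "bin" sibling exists.
fs::path supportExeDir(const fs::path &libDir,
                       const std::function<bool(const fs::path &)> &isDir) {
  fs::path bin = libDir.parent_path() / "bin";
  if (isDir(bin))
    return bin;
  return libDir;
}
```

Passing `[](const fs::path &p) { return fs::is_directory(p); }` as the predicate would make it consult the real file system.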

Make sure Cxx14Papers.csv has the same columns as the other CSV files. Somehow this was missed in my previous passes to standardize this.

This specific callback should now be at parity with the old pass manager version. There are still some missing IR passes before this point. Also I don't understand the need for the RequiresAnalysisPass at the end. SelectionDAG should just be using the uncached getResult?

Tested w/ bazel locally.

Summary: I am attempting to get the GPU to build and support libc++. One issue I've encountered is that it will look for `timeval` unless this macro is set. We can support `CLOCK_MONOTONIC` on the GPU fairly easily, as we have access to a fixed-frequency clock via the `__builtin_readsteadycounter` intrinsic with a known frequency. This also requires `CLOCK_REALTIME`, which we can't support, but we provide it anyway from the GPU `libc` to keep libc++ happy. It will return an error, so at least the failure will be obvious. I may need a more consistent configuration for this in the future. Maybe I should put a common macro, just `__GPU__`, in a different common header? I don't know where I would put such a thing, however.
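Implementing `CLOCK_MONOTONIC` on top of a fixed-frequency counter comes down to converting ticks into a `timespec`. The sketch below assumes the tick count and the frequency (in Hz) are already available; on the GPU these would come from `__builtin_readsteadycounter` and its known frequency, which the sketch does not call. `ticksToTimespec` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>

// Convert a tick count from a fixed-frequency counter into a timespec.
// Splitting into whole seconds and a sub-second remainder keeps the
// nanosecond multiplication from overflowing; this assumes freqHz is below
// roughly 18 GHz so that rem * 1e9 fits in 64 bits.
std::timespec ticksToTimespec(uint64_t ticks, uint64_t freqHz) {
  std::timespec ts{};
  ts.tv_sec = static_cast<std::time_t>(ticks / freqHz);
  uint64_t rem = ticks % freqHz;  // rem < freqHz
  ts.tv_nsec = static_cast<long>(rem * 1000000000ull / freqHz);
  return ts;
}
```

With a 100 MHz counter, 250,000,000 ticks map to 2 s and 500,000,000 ns.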

…as free Allows further reductions in instruction vector widths

…),m3,concat(y0,u)) Reference the lowest subvector if higher subvectors match - this often occurs in length changing shuffles. Fixes llvm#103564

This is a continuation of the __ptr32 support added here llvm@135fecd

…e attribute to struct fields (llvm#101585) Extend the unsafe_buffer_usage attribute so that it can also be applied to struct fields. This causes the compiler to warn about the unsafe field at its access sites. Co-authored-by: MalavikaSamak <malavika2@apple.com>
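A small illustration of the new placement, assuming Clang's `[[clang::unsafe_buffer_usage]]` spelling; `Buffer` and `firstElement` are hypothetical. With this change, accesses to `data` are expected to be diagnosed under `-Wunsafe-buffer-usage`. Compilers that don't know the attribute ignore it (possibly with a warning), so the code still builds.

```cpp
#include <cassert>
#include <cstddef>

struct Buffer {
  // Raw pointer without bounds information, marked as unsafe to use directly.
  [[clang::unsafe_buffer_usage]] int *data;
  std::size_t size;
};

int firstElement(const Buffer &b) {
  // Access site of the attributed field: the kind of use the new warning
  // is meant to point at.
  return b.size ? b.data[0] : 0;
}
```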


Fixes a typo found by @mikaelholmen here: llvm#98551 (comment)

We cannot promote this case unless we know the value is only observed through flat operations, and we cannot analyze this through a call. PointerMayBeCaptured was an imprecise check for this: a callee with a nocapture attribute may still cast to private and observe the address space, so we really need a different notion of nocapture. I doubt this was of any use anyway; the promotable cases should have had the addrspacecast optimized out before reaching this point. Fixes llvm#66669 Fixes llvm#104035

This patch implements sandboxir::Instruction flags.

Per [basic.scope], the locus of a concept is immediately after the introduction of its name. This lets us provide better diagnostics for attempts to define recursive concepts. Note that recursive concepts are not supported per https://eel.is/c++draft/basic#scope.pdecl-note-3, but there is no normative wording for that restriction. This is a known defect introduced by [p1787r6](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1787r6.html). Fixes llvm#55875

When introducing the address space predicates, move and mutate the original instruction, and clone for the shared case.

llvm#103409) Fixes llvm#101960

…it a new spellable attribute. (llvm#102414) Much like llvm#98193, this PR takes some more data out of the resource attribute, specifically the ROV data. This PR introduces a new attribute called HLSLROVAttr, which contains data on whether or not the decl the attribute applies to is an ROV. Tests were added to ensure the attribute is found on the AST. This attribute may take any boolean condition as an argument. If the condition is true, then the object the attribute applies to "is" an ROV. Fixes llvm#102392

…m#102992) Updated version of llvm#102686. The issue was that in some rebox cases the addendum presence flag should be updated rather than always taken from the "from" box. This is the case, for example, when reboxing a fir.class to a fir.box that doesn't require an addendum. Opened a new review since there is a bit of additional code in the CodeGen part.

Each LSP server type (mlir-lsp-server, pdll-lsp-server and tblgen-lsp-server) should have a different "additional_server_args" entry in the config for passing arguments to the server such as `--log=verbose`.

Summary: The 'omp_alloc' function should be callable from a target region. This patch implements it by simply calling `malloc` for every non-default trait value allocator. All the special access modifiers are unimplemented and return null. The null allocator returns null, as the spec states it should not be usable from the target.

…vm#104056) Summary: This caused issues with https://gitlab.e4s.io/uo-public/llvm-openmp-offloading/-/jobs/301520 because adding `-flto` caused it to pass `-plugin` sometimes, which isn't supported.

…m#102005) Summary: This utility function gets a temp file to use for tests. It either uses WIN32 or POSIX to create it. Some targets only follow the C standard, and this test case will fail. This patch simply adds a fallback that uses the `tmpnam` function from standard C. This function isn't ideal, but it is good enough for our use-case. --------- Co-authored-by: Mark de Wever <zar-rpg@xs4all.nl>
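A minimal sketch of a C-standard-only fallback built on `std::tmpnam`; the function name is hypothetical, and the real test utility may be structured differently. `tmpnam` only generates a name (another process could create that file first), which is why it is reasonable only as a last-resort fallback for tests.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical fallback for targets without WIN32 or POSIX temp-file APIs:
// ask the C standard library for a fresh temporary file name.
std::string tempFilePathFallback() {
  char buf[L_tmpnam];
  if (std::tmpnam(buf) == nullptr)
    return {};  // no unique name could be generated
  return std::string(buf);
}
```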

) Summary: The logic for this `__is_function_overridden` check requires accessing a runtime array normally created by the linker. The NVPTX target is an `__ELF__` target, however it does not support emitting the `__start/__stop` symbols for C-identifier named sections. This needs to be disabled explicitly so that the user can compile this with anything.

Local data is referenced in Objective-C metadata via section + offset relocations on x86-64 rather than via symbols. Without this change, we would crash on incorrect casts of the referents to `Defined`. A basic test based on the existing `objc-relative-method-lists-simple.s`, adapted to x86-64, is added.

…vm#101747) Summary: These assembly constraints are illegal / invalid on the AMDGPU target. The `r` constraint is only valid on inputs and the `m` constraint isn't accepted at all. The NVPTX target can handle them because it uses a more permissive virtual machine (PTX is an IR). Simply add exceptions on the target to make these work.

The warning has been active for a few releases now, first only in user code, later in system headers, and finally as an error by default. Therefore, we believe it is now time to transition into a hard error, as required by the C++ Standard. The main affected C++ projects have by now fixed the error, or there's a pending patch for review that does it. Fixes llvm#59036

…iners, NFC


CUDA Fortran is meant to be an equivalent to the runtime API. Therefore, it makes more sense to use the cuda rt API in the allocators for CUF. @bdudleback

In most cases when an instruction or function call clobbers the fp and/or bp register, we can fix it by saving/restoring the clobbered register. But we still can't handle the case where an invoked function clobbers fp and/or bp according to its calling convention. This patch detects this case and reports an error instead of silently generating wrong code.

Add support for `llvm.nvvm.idp2a` and `llvm.nvvm.idp4a` which correspond directly to `dp2a` and `dp4a` PTX instructions.

Part of making s32 not legal for RV64. Unfortunately, generic widening/narrowing is not implemented for this operation, so I had to remove all tests. I don't think clang uses G_VAARG on RISC-V, so this shouldn't be a big deal in practice.

We can only track existing LWG issues because we need a valid LWG issue number for all issues. I'll create another GH issue to track creating that LWG issue instead.

Reverts llvm#103488


Refactored @Max191's PR llvm#94637 to move it to `Tensor`.

From the original PR:

> This PR adds fusion by expansion patterns to push a tensor.expand_shape up through a tensor.collapse_shape with non-intersecting reassociations. Sometimes parallel collapse_shape ops like this can block propagation of expand_shape ops, so this allows them to pass through each other. I'm not sure if I put the code/tests in the right places, so let me know where those go if they aren't. cc @MaheshRavishankar @hanhanW

--------- Co-authored-by: Max Dawkins <max.dawkins@gmail.com>

Broke this out into its own commit to make the next one easier to review. Pull Request: llvm#100700

…utor (llvm#100952)" This reverts commit 36467bf.

This implements the DXILResourceAnalysis pass for `dx.TypedBuffer` and `dx.RawBuffer` types. This should be sufficient to lower `dx.handle.fromBinding` for this set of types, but it leaves a number of TODOs around for other resource types. This also includes a straightforward `print` method in `ResourceInfo` to make the analysis testable. This is deliberately different from the printer in `lib/Target/DirectX/DXILResource.cpp`, which attempts to print bindings in a format compatible with the comments `dxc` prints. We will eventually want to make that functionality driven by this analysis pass, but it isn't sufficient for testing, so we need both. Pull Request: llvm#100699

…ert.ll. NFC This shows that Zfinx generates a sext.w instruction on RV64. The fadd.s should have filled the upper bits of the GPR with sign bits so this is unnecessary. Proving it is unnecessary might be difficult though.

…file (llvm#102366) This patch separates the lit tests that check for the functionality of lit's built-in cat command into its own test file and folder. This is a prerequisite for llvm#101530.

…lvm#101590) When using the lit internal shell with the command:

```
LIT_USE_INTERNAL_SHELL=1 ninja check-compiler-rt
```

the following error is encountered:

```
File "TestRunner.py", line 770, in _executeShCmd
    inproc_builtin = inproc_builtins.get(args[0], None)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: unhashable type: 'GlobItem'
```

This error is in a compiler-rt file:

```
TestCases/Linux/long-object-path.cpp
```

This error occurs because `args[0]` is of type `GlobItem`, which is not hashable, leading to a `TypeError` when it is passed to `inproc_builtins.get()`. To resolve this issue, I have updated the implementation to ensure that `args[0]` is hashable before it is used in `inproc_builtins`. fixes: llvm#102389 [link to RFC](https://discourse.llvm.org/t/rfc-enabling-the-lit-internal-shell-by-default/80179)

…math (llvm#98234) Summary: Currently we replace all math calls with vendor-specific ones. This patch introduces a macro `__CLANG_GPU_DISABLE_MATH_WRAPPERS` that, when defined, will disable this. I went this route instead of a flag for two reasons. First, I think we have too many flags as is, and we already have `-nogpuinc` to cover disabling these wrappers entirely, so this would be a really specific subset of that. Second, these math headers aren't easily decoupled by simply not including a single header from the clang driver layer; there's the cmath and the regular math forward declares it would disable as well. Note, this currently causes errors because the GPU `libm` doesn't have `powi`; that's an NVIDIA extension I'll add to LLVM libm.

CUDA Fortran is meant to be an equivalent to the runtime API. Therefore, it makes more sense to use the cuda rt API in the allocators for CUF.

…m#103489) This PR removes an unneeded extract element instruction from codegen, along with the variable that captured that instruction's return value.

With Zfh and Zfhmin, this combine creates an fmv_x_signexth node so we can remember that the result is sign extended. This becomes an fmv.x.h instruction, which sign extends its result. With Zhinx, fmv_x_signexth becomes a COPY_TO_REGCLASS. For this to guarantee that the result is properly sign extended, we need all producers of a GPRF16 register class to guarantee that the rest of the GPR is sign extended. I don't think we've done that; bitcasts from i16 to f16 definitely don't do it. The safest thing to do is to not do this combine, so the sign_extend_inreg will emit a shift pair. This is also consistent with the code generated for Zfinx on RV64, where we don't assume the upper 32 bits are sign extended.

As discussed here llvm#99296 (comment) Fixes llvm#99296 Fixes llvm#50294


Introduce "-fsanitize-overflow-pattern-exclusion=", which can be used to disable sanitizer instrumentation for common overflow-dependent code patterns.

For a wide selection of projects, proper overflow sanitization could help catch bugs and solve security vulnerabilities. Unfortunately, in some cases the integer overflow sanitizers are too noisy for their users and are often left disabled. Providing users with a method to disable sanitizer instrumentation of common patterns could mean more projects actually utilize the sanitizers in the first place.

One such project that has opted not to use integer overflow (or truncation) sanitizers is the Linux Kernel. There has been some discussion[1] recently concerning mitigation strategies for unexpected arithmetic overflow. This discussion is still ongoing, and a succinct article[2] accurately sums it up. In summary, many Kernel developers do not want to introduce more arithmetic wrappers when most developers understand the code patterns as they are. Patterns like `if (base + offset < base) { ... }`, `while (i--) { ... }`, or `#define SOME -1UL` are extremely common in a code base like the Linux Kernel. It is perhaps too much to ask of kernel developers to use arithmetic wrappers in these cases. For example, `while (wrapping_post_dec(i)) { ... }`, which wraps some builtin, would not fly. This would incur too many changes to existing code; the code churn would be too much, at least too much to justify turning on overflow sanitizers.

Currently, this commit tackles three pervasive idioms:

1. "if (a + b < a)" or some logically-equivalent re-ordering like "if (a > b + a)"
2. "while (i--)" (for unsigned): the final post-decrement always overflows
3. "-1UL, -2UL, etc": negation of unsigned constants always overflows

The patterns that are excluded can be chosen from the following list:

- add-overflow-test
- post-decr-while
- negated-unsigned-const

These can be enabled with a comma-separated list: `-fsanitize-overflow-pattern-exclusion=add-overflow-test,negated-unsigned-const`. "all" or "none" may also be used to specify that all patterns should be excluded or that none should be.

[1] https://lore.kernel.org/all/202404291502.612E0A10@keescook/
[2] https://lwn.net/Articles/979747/

CCs: @efriedma-quic @kees @jyknight @fmayer @vitalybuka Signed-off-by: Justin Stitt <justinstitt@google.com> Co-authored-by: Bill Wendling <morbo@google.com>

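The three excluded idioms can be seen in plain C++, where unsigned wraparound is well-defined; these are the kinds of expressions the integer overflow sanitizers may otherwise flag. The helper names are hypothetical, and the sketch only demonstrates the arithmetic, not the sanitizer or the new flag.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// add-overflow-test: the sum deliberately wraps, and the comparison
// detects that it did.
bool addWouldWrap(uint32_t base, uint32_t offset) {
  return base + offset < base;
}

// post-decr-while: the final `i--` yields 0 (ending the loop) and then
// wraps i around to UINT32_MAX.
unsigned countIterations(uint32_t i) {
  unsigned n = 0;
  while (i--)
    ++n;
  return n;
}

// negated-unsigned-const: negating an unsigned constant wraps by
// definition, producing the all-ones value.
constexpr unsigned long kAllOnes = -1UL;
```
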
…expansion (llvm#102228)
- Add support for memory_space_cast to strided metadata expansion and narrow type emulation
- Add support for expand_shape to narrow type emulation (like collapse_shape, it's a no-op after linearization) and to expand-strided-metadata (mirroring the collapse_shape pattern)
- Add support for memref.dealloc to narrow type emulation (it is a trivial rewrite) and for memref.copy (which is unsupported when it is used for a layout change but a trivial rewrite otherwise)

This script linkifies (i.e. makes clickable in the terminal) text that appears to be a pull request or issue reference (e.g. llvm#12345 or PR12345) or a 40-character commit hash (e.g. abc123). You can configure git to automatically send the output of commands that pipe their output through a pager, such as `git log` and `git show`, through this script by running this command from within your LLVM checkout: `git config core.pager 'llvm/utils/git/linkify | pager'`. The pager command is run from the root of the repository even if the git command is run from a subdirectory, so the relative path should always work. It requires OSC 8 support in the terminal. For a list of compatible terminals, see https://github.com/Alhadis/OSC8-Adoption Reviewers: MaskRay Reviewed By: MaskRay Pull Request: llvm#103496
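A sketch of the core transformation using `std::regex`, assuming only `llvm#NNNN` references and the llvm-project GitHub issues URL; the actual script's patterns, URLs, and implementation language may differ, and `linkifyRefs` is a hypothetical name. Each match is wrapped in an OSC 8 hyperlink: `ESC ] 8 ; ; URI` followed by the string terminator `ESC \`, then the visible text, then `ESC ] 8 ; ;` and another terminator.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Wrap each "llvm#NNNN" reference in an OSC 8 terminal hyperlink.
// "$1" is the captured number and "$&" is the whole match (ECMAScript
// format syntax used by std::regex_replace).
std::string linkifyRefs(const std::string &text) {
  static const std::regex ref(R"(llvm#(\d+))");
  const std::string fmt =
      "\x1b]8;;https://github.com/llvm/llvm-project/issues/$1"  // open link
      "\x1b\\"                                                   // terminator
      "$&"                                                       // link text
      "\x1b]8;;\x1b\\";                                          // close link
  return std::regex_replace(text, ref, fmt);
}
```

Text without references passes through unchanged, so the filter is safe to apply to all pager output.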

The nullness check is unreachable.
* For the main thread and threads created by pthread_create, the `*Allocate` functions must be called after `*_current_thread` is set.
* For threads created by Linux's `clone`, static TLS is either reused or set to a new value (CLONE_SETTLS).

Make this change for asan/msan and possibly extend the change to other sanitizers. (asan supports many platforms and I am not 100% certain that all platforms have this property.) Pull Request: llvm#102828

Continuing from llvm#102084, which introduced the analysis, we now populate it with info about functions contained in the module. When we update the profile due to, e.g., inlined callsites, we'll ingest the callee's counters and callsites into the caller. We'll move those to the caller's respective index space (counter and callers), so we need to know and maintain where those currently end. We also don't need to keep profiles not pertinent to this module. This patch also introduces an arguably much simpler way to track the GUID of a function from the frontend compilation, through ThinLTO, and into the post-thinlink compilation step, which doesn't rely on keeping names around. A separate RFC and patches will discuss extending this to the current PGO (instrumented and sampled) and other consumers as an infrastructural component.

llvm#104228) This was done as an afterthought in c3c9e45 without justification. Nothing relies on it being a specific kind of section, and downstream in CHERI LLVM we pass a non-GotSection to this function. Thus revert this overly-restrictive change and allow downstreams to pass other section types again. This partially reverts commit c3c9e45.

…lvm#104081) A crash was happening when both ObjC Category Merging and Relative method lists were enabled. ObjC Category Merging creates new data sections and adds them by calling `addInputSection`. `addInputSection` uses the symbols within the added section to determine which container to actually add the section to. The issue is that ObjC Category Merging calls `addInputSection` before actually adding the relevant symbols to the added section. This causes `addInputSection` to add the `InputSection` to the wrong container, eventually resulting in a crash. To fix this, we ensure that ObjC Category Merging calls `addInputSection` only after the symbols have been added to the `InputSection`.

…r RISCV (llvm#102560) This optimization helps reduce repeated calculations of base addresses by extracting type extensions when the same base address is accessed multiple times but its offset is a constant.

…m#104398) Singular warning I noticed when compiling lldb. Co-authored-by: Daniel <d.wedzicha@efg.gg>


…+ / VS2019+ (llvm#102848) Partial fix for llvm#92204. This PR just fixes VS2019+, since that is the suite of compilers that I require link compatibility with at the moment. I still intend to fix VS2017 and to update llvm-undname in future PRs. Once those are also finished and merged, I'll close out llvm#92204. I am hoping to get the llvm-undname PR up in a couple of weeks to be able to demangle the VS2019+ name mangling.

MSVC 1920+ mangles placeholder return types for non-templated functions with "@". For example, `auto foo() { return 0; }` is mangled as `?foo@@YA@XZ`.

MSVC 1920+ mangles placeholder return types for templated functions as the qualifiers of the AutoType followed by "_P" for `auto` and "_T" for `decltype(auto)`. For example, `template<class T> auto foo() { return 0; }` is mangled as `??$foo@H@@YA?A_PXZ` when `foo` is instantiated as `foo<int>()`.

Lambdas with placeholder return types are still mangled with clang's custom mangling, since MSVC lambda mangling hasn't been deciphered yet. Similarly, any pointers in the return type with an address space are mangled with clang's custom mangling, since that is a clang extension.

We cannot augment `mangleType` to support this mangling scheme, as the mangling schemes for variables and functions differ: auto variables are encoded with the fully deduced type, whereas auto return types are not. The following two functions with a static variable are mangled the same:

```
template<class T>
int test()
{
  static int i = 0; // "?i@?1???$test@H@@YAHXZ@4HA"
  return i;
}

template<class T>
int test()
{
  static auto i = 0; // "?i@?1???$test@H@@YAHXZ@4HA"
  return i;
}
```

Inside `mangleType`, once we get to mangling the `AutoType`, we have no context for whether we came from a variable encoding or some other encoding. Therefore it was easier to handle any special casing for `AutoType` return types with a separate function instead of using the `mangleType` infrastructure.

FindCountedByField can be used in more places than CodeGen. Move it into FieldDecl to avoid layering issues.

llvm#96649) C23 introduced the new functions fminimum_num and fmaximum_num, which follow minimumNumber and maximumNumber of IEEE754-2019. Let's introduce new intrinsics to support them. This patch only introduces support for scalar values. Support for vectors (vp, vp.reduce, vector.reduce) and experimental.constrained will be added in future patches. With this patch, MIPSr6 and LoongArch work out of the box with fcanonical and fmax/fmin. AArch64/PowerPC64 can use the same logic as MIPSr6 and LoongArch, although they have no fcanonical support yet; I will add it in future patches. The FMIN/FMAX RISC-V instructions follow minimumNumber/maximumNumber of IEEE754-2019, so we can just add them in a future patch. Background: https://discourse.llvm.org/t/rfc-fix-llvm-min-f-and-llvm-max-f-intrinsics/79735 Currently we have fminnum/fmaxnum, which behave differently on different platforms for NUM vs sNaN: 1) Fallback to fmin(3)/fmax(3): return qNaN. 2) ARM64/ARM32+Neon: same as libc. 3) MIPSr6/LoongArch/RISC-V: return NUM. The fix to make fminnum/fmaxnum follow minNUM/maxNUM of IEEE754-2008 will be submitted as separate patches.
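For reference, the IEEE 754-2019 minimumNumber semantics that the new intrinsic follows can be sketched for `double` as below. This is an illustration only: it does not model signaling-NaN quieting or floating-point exceptions, and `minimumNumber` is a hypothetical name rather than the LLVM lowering.

```cpp
#include <cassert>
#include <cmath>

// minimumNumber sketch: a NaN operand is ignored in favor of the other
// operand, and -0.0 is treated as less than +0.0.
double minimumNumber(double a, double b) {
  if (std::isnan(a))
    return b;  // if b is also NaN, the result is NaN
  if (std::isnan(b))
    return a;
  if (a == b)  // equal values, including -0.0 == +0.0
    return std::signbit(a) ? a : b;
  return a < b ? a : b;
}
```

maximumNumber is the mirror image: it also ignores a NaN operand, but prefers +0.0 over -0.0 and the larger value otherwise.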

To align with gas's latest changes. Related gas patch: https://sourceware.org/pipermail/binutils/2024-May/134360.html

This patch adds a verifier to `tosa.table` which fixes a crash. Fix llvm#103086.

Move VPWidenStoreRecipe::execute to VPlanRecipes.cpp in line with other ::execute implementations that don't depend on anything defined in LoopVectorization.cpp

) For now, the testcases are grouped in a single TEST. I'll sort them out and add more testcases in follow-up commits.

3 MLIR tests `FAIL` on SPARC, both Solaris/sparcv9 and Linux/sparc64:

```
MLIR :: Conversion/ArithToSPIRV/arith-to-spirv-le-specific.mlir
MLIR :: IR/elements-attr-interface.mlir
MLIR :: Target/LLVMIR/llvmir-le-specific.mlir
```

The issue is always the same: the tests in question are little-endian-only currently, so this patch `XFAIL`s them on `sparc*` as is already done for `s390x`. Tested on `sparcv9-sun-solaris2.11`, `sparc64-unknown-linux-gnu`, `amd64-pc-solaris2.11`, and `x86_64-pc-linux-gnu`.

…lvm#103722) `Flang :: Lower/default-initialization-globals.f90` `FAIL`s on SPARC, both Solaris/sparcv9 and Linux/sparc64. The failure mode is the same as on AIX/PowerPC; since both targets are big-endian, this patch treats them the same. Tested on `sparcv9-sun-solaris2.11`, `sparc64-unknown-linux-gnu`, `amd64-pc-solaris2.11`, and `x86_64-pc-linux-gnu`.

…C) (llvm#103723) This makes `LayoutAlignElem` / `PointerAlignElem` and `AlignTypeEnum` inner types of `DataLayout`. The types are also renamed to match their meaning (LangRef refers to them as "specification" and "specifier"). Pull Request: llvm#103723

Removing them simplifies the content and means we don't confuse anyone who joined after the Phabricator shutdown. You could use them for review archaeology but this is only a subset of the names you'd encounter there anyway. So I don't think this is a good reason to keep them here. With a couple of exceptions the Phabricator/GitHub names are the same and/or related to their full name anyway.

…lvm#103730) `Flang :: Driver/fveclib-codegen.f90` currently `FAIL`s on SPARC, both Solaris/sparcv9 and Linux/sparc64: ``` bin/flang-new -S -Ofast -fveclib=LIBMVEC -o - /vol/llvm/src/llvm-project/local/flang/test/Driver/fveclib-codegen.f90 flang/test/Driver/fveclib-codegen.f90:11:10: error: CHECK: expected string not found in input ! CHECK: _ZGVbN4vv_powf ^ ``` The code in question only contains calls to `powf`. Given that `glibc` only supports `libmvec` on `aarch64` and `x86_64`, this test targets only those if possible. Tested on `sparcv9-sun-solaris2.11`, `sparc64-unknown-linux-gnu`, `amd64-pc-solaris2.11`, and `x86_64-pc-linux-gnu`.
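
The expected symbol `_ZGVbN4vv_powf` is a libmvec vector variant of `powf` named according to the x86_64 vector function ABI: `b` selects the SSE ISA class, `N` means unmasked, `4` is the vector length, and `vv` marks two vector parameters. Since `b` names an x86 ISA class, the expectation only makes sense on targets where libmvec provides such variants. Below is a hypothetical Python decoder for this simple subset of the mangling; it deliberately ignores linear/uniform parameter encodings and other suffixes.

```python
import re

# Hypothetical helper covering only: _ZGV<isa><mask><vlen><params>_<scalar name>
_VFABI = re.compile(
    r"^_ZGV(?P<isa>[a-z])(?P<mask>[MN])(?P<vlen>\d+)(?P<params>[a-z]+)_(?P<name>\w+)$"
)


def decode_vfabi(mangled: str) -> dict:
    """Split a simple vector-function-ABI name into its components."""
    m = _VFABI.match(mangled)
    if m is None:
        raise ValueError(f"not a simple vector ABI name: {mangled}")
    return {
        "isa": m["isa"],               # e.g. 'b' = SSE on x86_64
        "masked": m["mask"] == "M",    # 'N' = unmasked, 'M' = masked
        "vlen": int(m["vlen"]),        # number of lanes
        "params": list(m["params"]),   # one letter per parameter, 'v' = vector
        "scalar": m["name"],           # the scalar function being vectorized
    }


print(decode_vfabi("_ZGVbN4vv_powf"))
```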

Until llvm#103056 lands or another more appropriate check can be found. This test fails on Ubuntu Focal, where zdump is built with a 32-bit time_t, but passes on Ubuntu Jammy, where zdump is built with a 64-bit time_t. Marking it unsupported means Linaro can upgrade its bots to Ubuntu Jammy without getting an unexpected pass.
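
One likely source of the Focal/Jammy difference is the range of a signed 32-bit time_t, which cannot represent any instant after 2038-01-19 03:14:07 UTC. A small Python sketch of that boundary (the helper name is illustrative, and it models typical two's-complement wraparound rather than any specific zdump behavior):

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit time_t can hold


def to_int32(t: int) -> int:
    """Wrap an integer to signed 32 bits (two's-complement wraparound)."""
    return ((t + 2**31) % 2**32) - 2**31


# Last instant a signed 32-bit time_t can represent.
last = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)
print(last)  # 2038-01-19 03:14:07+00:00

# One second later no longer fits and wraps to a large negative value.
print(to_int32(INT32_MAX + 1))  # -2147483648
```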

This commit introduces a slicing utility that can be used to walk arbitrary IR slices. It additionally ships logic to determine control flow predecessors, which allows users to walk backward slices without dealing with both `RegionBranchOpInterface` and `BranchOpInterface`. This utility is used to improve the `noalias` propagation in the LLVM dialect's inliner interface. Before this change, the propagation broke down as soon as pointers were passed through region control flow operations.
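
Conceptually, a backward slice follows two kinds of edges: data-flow edges from a value to the operands of its defining operation, and control-flow edges from a block argument to the values forwarded into it by branches or region control flow. A language-agnostic Python sketch over plain dictionaries (all names are hypothetical; this is not MLIR's API), where the second map stands in for the predecessor logic the commit adds:

```python
from collections import deque
from typing import Dict, List, Set


def backward_slice(root: str,
                   operands_of: Dict[str, List[str]],
                   cf_predecessors: Dict[str, List[str]]) -> Set[str]:
    """Collect every value that can flow into `root`.

    operands_of: value -> operands of its defining op (data-flow edges)
    cf_predecessors: block argument -> values forwarded to it by
        branch or region-control-flow ops (control-flow edges)
    """
    seen: Set[str] = set()
    work = deque([root])
    while work:
        v = work.popleft()
        if v in seen:
            continue  # handles loop-carried cycles
        seen.add(v)
        work.extend(operands_of.get(v, []))
        work.extend(cf_predecessors.get(v, []))
    return seen


# %p is a region argument fed by %a or %b; %a is derived from %base.
print(sorted(backward_slice("%p", {"%a": ["%base"]}, {"%p": ["%a", "%b"]})))
```

Without the control-flow edges, the walk would stop at `%p` and never reach `%base`, which mirrors the `noalias` propagation limitation the commit describes.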

…#98586) `cast_or_null` is deprecated. https://github.com/llvm/llvm-project/blob/062844615db5e141da118c1ad780bf102537f40a/llvm/include/llvm/Support/Casting.h#L717-L722

Adds m_FPToUI/m_FPToSI matchers for ISD::FP_TO_UINT/ISD::FP_TO_SINT in SDPatternMatch.h with suitable test coverage. Fixes llvm#103872
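
The matchers compose like the other SDPatternMatch helpers: an opcode check wrapped around a matcher for the operand. As a rough analog only (this is Python over a toy node type, not LLVM's C++ API; `Node` and `unary_op` are hypothetical), the composition looks like this:

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Toy stand-in for a SelectionDAG node."""
    opcode: str
    operands: List["Node"] = field(default_factory=list)


Matcher = Callable[[Node], bool]


def m_Value() -> Matcher:
    """Match any node."""
    return lambda n: True


def unary_op(opcode: str, operand: Matcher) -> Matcher:
    """Match a node with `opcode` whose single operand matches `operand`."""
    def match(n: Node) -> bool:
        return n.opcode == opcode and len(n.operands) == 1 and operand(n.operands[0])
    return match


def m_FPToUI(operand: Matcher) -> Matcher:
    return unary_op("FP_TO_UINT", operand)


def m_FPToSI(operand: Matcher) -> Matcher:
    return unary_op("FP_TO_SINT", operand)


x = Node("CopyFromReg")
print(m_FPToUI(m_Value())(Node("FP_TO_UINT", [x])))  # True
```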

…llvm#104037) The target needs to be initialized in order to compute the correct target triple from the command line. Without initialized targets the OS component of the triple might not reflect what would be computed by the driver for an actual compiler invocation. Fixes llvm#61762

…pNestOp (llvm#103731) This patch adds an assert to `genLoopNestClauses` to ensure that the list of symbols and the corresponding loop wrapper entry block arguments have the same size. This is already checked by some of the callers, but it makes more sense to move the check into the function itself and avoid having to replicate it.

This updates the "dxil-metadata-emit" pass flag to be spelled "dxil-translate-metadata" to better match the pass name. Pull Request: llvm#104249

[AutoBump] Merge with 894d3eeb (Aug 15) (4) #357

Commits on Aug 14, 2024

Commits on Aug 15, 2024

Commits on Sep 20, 2024