[AutoBump] Merge with ce7c828e (Aug 30) (16) #369

mgehre-amd · 2024-09-25T07:30:32Z

No description provided.

- Use formatv() and raw string literals to simplify emission code.
- Use range based for loops and structured bindings to simplify loops.
- Use const pointers to Records.
- Rename `ComputeFixedEncoding` to `ComputeTypeSignature` to reflect what the function actually does, and change it to return a vector.
- Use reverse() and range based for loop to pack 8 nibbles into 32-bits.
- Rename some variables to follow LLVM coding standards.
- For function memory effects, print human readable effects in comment.
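The nibble-packing item in the list above can be illustrated with a small standalone sketch. This is not the emitter's code: `packNibbles` is a hypothetical name, and the bit order (first nibble in the lowest 4 bits) is an assumption, since the commit message does not state it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not the emitter's actual code: pack up to 8 nibbles
// (4-bit values) into one 32-bit word.
// Assumption: Nibbles[0] ends up in the least significant 4 bits.
uint32_t packNibbles(const std::vector<uint8_t> &Nibbles) {
  assert(Nibbles.size() <= 8 && "at most 8 nibbles fit in 32 bits");
  uint32_t Packed = 0;
  // Iterate in reverse so that the first nibble is shifted in last and
  // therefore stays in the low bits. The commit mentions reverse() with a
  // range based for loop; reverse iterators keep this sketch free of LLVM
  // headers.
  for (auto It = Nibbles.rbegin(); It != Nibbles.rend(); ++It) {
    assert(*It < 16 && "value does not fit in a nibble");
    Packed = (Packed << 4) | *It;
  }
  return Packed;
}
```

Under these assumptions, `packNibbles({1, 2, 3})` yields `0x321`.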

…03702) Use LLSC or cmpxchg in the same cases as for the unsupported integer operations. This required some fixups to the LLSC implementation to deal with the fp128 case. The comment about floating-point exceptions was wrong, because floating-point exceptions are not really exceptions at all.

fneg/fabs are not supposed to canonicalize nans. Promoting to f32 will go through an fp_extend which will canonicalize. The generic Promote handler needs to be removed from LegalizeDAG. We need to use integer bit manip to clear the bit instead. Unfortunately, this is going through the stack due to i16 not being a legal type. Fixing that will require custom legalization or some other generic SelectionDAG change.
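To make the "integer bit manip" point concrete, here is a minimal sketch of fabs/fneg on the raw 16-bit pattern of an IEEE-754 half. The function names are hypothetical and this is plain C++, not the SelectionDAG lowering itself; it only shows why touching the sign bit alone leaves a NaN payload intact.

```cpp
#include <cassert>
#include <cstdint>

// IEEE-754 binary16 keeps the sign in bit 15.
constexpr uint16_t HalfSignBit = 0x8000;

// fabs on raw f16 bits: clear the sign bit and leave exponent and mantissa
// (including any NaN payload) untouched.
inline uint16_t fabsHalfBits(uint16_t Bits) {
  return static_cast<uint16_t>(Bits & ~HalfSignBit);
}

// fneg on raw f16 bits: flip the sign bit only.
inline uint16_t fnegHalfBits(uint16_t Bits) {
  return static_cast<uint16_t>(Bits ^ HalfSignBit);
}

// Usage: 0xFD01 is a negative signaling NaN (quiet bit 9 is clear). Neither
// operation sets the quiet bit, so the NaN is not canonicalized.
inline void checkNaNPayloadPreserved() {
  assert(fabsHalfBits(0xFD01) == 0x7D01);
  assert(fnegHalfBits(0x7D01) == 0xFD01);
}
```

An f32 round trip, by contrast, goes through an extend that may quiet the NaN, which is the behaviour the message wants to avoid.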

The new class implements a deduplication table to convert import list elements: {SourceModule, GUID, Definition/Declaration} into 32-bit integers, and vice versa. This patch adds a unit test but does not add a use yet. To be precise, the deduplication table holds {SourceModule, GUID} pairs. We use the bottom one bit of the 32-bit integers to indicate whether we have a definition or declaration. A subsequent patch will collapse the import list hierarchy -- FunctionsToImportTy holding many instances of FunctionsToImportTy -- down to DenseSet<uint32_t> with each element indexing into the deduplication table above. This will address multiple sources of space inefficiency.
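A rough sketch of the encoding described above, under stated assumptions: the class name `DedupTable`, the method names, and the choice of `1` for "definition" in the low bit are all hypothetical, and a `std::map` stands in for whatever container the real class uses.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Assumption: which value of the low bit means "definition" is not stated in
// the commit message; 1 is chosen here for illustration.
enum class ImportKind : uint32_t { Declaration = 0, Definition = 1 };

// Hypothetical stand-in for the deduplication table.
class DedupTable {
  using Key = std::pair<std::string, uint64_t>; // {SourceModule, GUID}
  std::map<Key, uint32_t> Index; // pair -> table index
  std::vector<Key> Keys;         // table index -> pair

public:
  struct Entry {
    std::string SourceModule;
    uint64_t GUID;
    ImportKind Kind;
  };

  // Returns (table index << 1) | kind bit; equal pairs share one index.
  uint32_t encode(const std::string &SourceModule, uint64_t GUID,
                  ImportKind Kind) {
    Key K{SourceModule, GUID};
    auto [It, Inserted] =
        Index.try_emplace(K, static_cast<uint32_t>(Keys.size()));
    if (Inserted)
      Keys.push_back(K);
    return (It->second << 1) | static_cast<uint32_t>(Kind);
  }

  // Inverse of encode().
  Entry decode(uint32_t ID) const {
    assert((ID >> 1) < Keys.size() && "ID was not produced by this table");
    const Key &K = Keys[ID >> 1];
    return {K.first, K.second, static_cast<ImportKind>(ID & 1)};
  }
};
```

With this layout, the definition and declaration of the same {SourceModule, GUID} pair differ only in the lowest bit, which is what lets a flat set of 32-bit integers replace the nested import list.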

…06440) We'd previously just deferred to the base implementation, but that more or less always returns 1. This underestimates the cost of the insert/extract, biases the SLP vectorizer towards forming illegally typed vectors, and underestimates the cost of scalarized operations (like unaligned scatter/gather).

This is a partial revert of c66e1d6. Even though that allowed us to declare v9.2-a support without picking up SVE2 in both the backend and the driver, the frontend itself still enabled SVE via the arch version's default extensions. Avoid that by reverting back to v8.7-a while we look into longer-term solutions.

llvm#86149) This patch is part of a set of patches that add an `-fextend-lifetimes` flag to clang, which extends the lifetimes of local variables and parameters for improved debuggability. In addition to that flag, the patch series adds a pragma to selectively disable `-fextend-lifetimes`, and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes` for `this` pointers only. All changes and tests in these patches were written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer) has handled review and merging. The extend lifetimes flag is intended to eventually be set on by `-Og`, as discussed in the RFC here: https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850

This patch implements a new intrinsic instruction in LLVM, `llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand and has no effect other than "using" its operand, to ensure that its operand remains live until after the fake use. This patch does not emit fake uses anywhere; the next patch in this sequence causes them to be emitted from the clang frontend, such that for each variable (or `this`) a fake.use operand is inserted at the end of that variable's scope, using that variable's value. This patch covers everything post-frontend, which is largely just the basic plumbing for a new intrinsic/instruction, along with a few steps to preserve the fake uses through optimizations (such as moving them ahead of a tail call or translating them through SROA).

Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>

) These lists are quite static and several of the parameters are actually constant across all users. Heavy use of macros is undesirable, and not idiomatic in LLVM, so let's just use the naive switch cases. I'll probably continue with removing the other property macros. These two just happened to be the two I actually had to figure out for an unrelated change.

…and (llvm#98414) This patch addresses an issue with lit's internal shell: when `env` is run without any arguments, it fails with exit code 127 because `env` requires a subcommand. The patch fixes this by making the command properly return the environment variables even when no arguments are provided. The error occurred when running the command `LIT_USE_INTERNAL_SHELL=1 ninja check-llvm`. Fixes: llvm#102383. This is part of the test cleanups proposed in the RFC: [[RFC] Enabling the Lit Internal Shell by Default](https://discourse.llvm.org/t/rfc-enabling-the-lit-internal-shell-by-default/80179)

Previously, functions named "main" got the NoRecurse attribute consistent with the behavior of C++, which HLSL largely follows. However, standard recursion is not allowed in HLSL, so all functions should really have this attribute. This doesn't prevent recursion, but rather signals that these functions aren't expected to recurse. Practically, this was done so that entry point functions named "main" would all have the same attributes as otherwise identical entry points with other names. This required small changes to the `this` assignment tests because they no longer generate so many attribute sets since more of them match. Related to llvm#105244 but done to simplify testing for llvm#89806.

…llvm#106437) These tests have been failing since db279c7 "[HLSL] Change default linkage of HLSL functions to internal (llvm#95331)". This presumably went unnoticed because they're not run by default since they rely on an external tool (dxil-dis).

llvm#105574) These lists are quite static. Heavy use of macros is undesirable, and not idiomatic in LLVM, so let's just use the naive switch cases. Note that the first two fields in the CONSTRAINEDFP property were utterly unused (aside from a C++ test). In the same vein as llvm#105551. Once both changes have landed, we'll be left with _BINARYOP which needs a bit of additional untangling, and the actual opcode mappings.

armv7a and armv8a are common names for the application subarch for arm. These names in particular are used in ChromeOS, Android, and a few other known applications. In ChromeOS, we encountered a bug where armv7a arch was not recognised and segfaulted when starting an executable on an arm32 device. Google Issue Tracker: https://issuetracker.google.com/361414339

Several tests for the new fake use intrinsic are failing on NVPTX buildbots due to relying on behaviour for their expected triple; this commit adds that triple to each of them to prevent failures. Fixes commit 3d08ade (llvm#86149). Example buildbot failures: https://lab.llvm.org/buildbot/#/builders/160/builds/4175 https://lab.llvm.org/buildbot/#/builders/180/builds/4173

…106582) This isn't quite just code motion as the four different versions we had of this routine differed in whether they ignored the "size" marker used to represent undef. I doubt this matters in practice, but it is a functional change. --------- Co-authored-by: Alexey Bataev <a.bataev@gmx.com>

The interceptor types are supposed to match size_t (and the non-Windows ssize_t) exactly, but on 32-bit Windows `size_t` uses `unsigned int` whereas `SIZE_T` is `unsigned long`. The current definition results in `uptr` not matching `uintptr_t` since we otherwise get typedef redefinition errors. Work around this by using a #define instead of a typedef when defining SIZE_T. It would probably be cleaner to stop using these uppercase types, but that is a rather invasive change and this one is the minimal change to allow uptr to match uintptr_t on Windows. To ensure this compiles on Windows, we also remove the interceptor.h defines of uptr (that do not always match __sanitizer::uptr) and rely on __sanitizer::uptr instead. The interceptor types most likely predate those other types so clean up the unnecessary definition while here. This also reverts commit 18e06e3 and commit bb27dd8. Reviewed By: mstorsjo, vitalybuka Pull Request: llvm#106311

…lvm#106517) In llvm#106110 we had to mark v[f]slide1down.vx as ActiveElementsAffectResult since the elements in the body depend on VL. However it doesn't depend on the mask, so this was overly conservative and broke the vmerge peephole. We can recover this by splitting up ActiveElementsAffectResult into VL and Mask bits, so we can more accurately model v[f]slide1down.vx and re-enable the peephole.
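The split into separate VL and Mask bits can be pictured with the following sketch. All names here (`DependsOnVL`, `DependsOnMask`, `PseudoInfo`, `maskDoesNotAffectResult`) are hypothetical and the real backend encodes this differently; the point is only that a mask-related peephole needs to consult the mask bit alone.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flags replacing a single "active elements affect result" bit.
constexpr uint8_t DependsOnVL = 1 << 0;   // result elements change with VL
constexpr uint8_t DependsOnMask = 1 << 1; // result elements change with the mask

// Hypothetical per-instruction record.
struct PseudoInfo {
  uint8_t ActiveElementsAffectResult;
};

// A mask-folding peephole only has to know that the mask is irrelevant.
// (A real peephole would check further conditions; this models one of them.)
constexpr bool maskDoesNotAffectResult(const PseudoInfo &I) {
  return (I.ActiveElementsAffectResult & DependsOnMask) == 0;
}

constexpr bool vlDoesNotAffectResult(const PseudoInfo &I) {
  return (I.ActiveElementsAffectResult & DependsOnVL) == 0;
}

// Usage: an instruction modelled like v[f]slide1down.vx depends on VL only,
// so the mask query passes while the VL query still fails.
inline void checkSlide1DownModel() {
  constexpr PseudoInfo Slide1Down{DependsOnVL};
  assert(maskDoesNotAffectResult(Slide1Down));
  assert(!vlDoesNotAffectResult(Slide1Down));
}
```

With a single combined bit, the same instruction would have failed both queries, which is the over-conservatism the message describes.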

This patch implements sandboxir::Type, a thin wrapper of llvm::Type. This is designed very similarly to sandboxir::Value. Context owns all sandboxir::Type objects and maintains a map between llvm::Type and sandboxir::Type. There are a couple of reasons for migrating from llvm::Type to sandboxir::Type:
- Creating an llvm::Type from within SandboxIR-only code doesn't work well because it requires you to pass llvm::Context to functions like llvm::Type::getInt32Ty(C), but you wouldn't normally have access to llvm::Context C. In unit tests this is not such a big deal because you have access to both, but it will become an issue in SandboxIR-only code.
- Not being able to get the sandboxir::Context from llvm::Type results in awkward sandboxir APIs with additional sandboxir::Context arguments.
- llvm::Type::getContext() can basically give you access to the whole LLVM IR, which we should try to avoid.
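The ownership scheme described above (a context that owns the wrappers and maps each wrapped type to exactly one wrapper) can be sketched as follows. Everything in the `sketch` namespace is a stand-in: `LLVMType` replaces llvm::Type, and the member names are assumptions rather than the actual sandboxir API.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>

namespace sketch {

// Stand-in for llvm::Type so the example has no LLVM dependency.
struct LLVMType {
  unsigned BitWidth;
};

class Context;

// Thin wrapper: holds the wrapped type plus a back-reference to its owner.
class Type {
  LLVMType *LLVMTy;
  Context &Ctx;
  friend class Context; // only the context may create wrappers
  Type(LLVMType *T, Context &C) : LLVMTy(T), Ctx(C) {}

public:
  Context &getContext() const { return Ctx; }
  unsigned getBitWidth() const { return LLVMTy->BitWidth; }
};

// Owns every wrapper and hands out one wrapper per wrapped type.
class Context {
  std::unordered_map<LLVMType *, std::unique_ptr<Type>> Types;

public:
  Type *getType(LLVMType *T) {
    assert(T && "cannot wrap a null type");
    std::unique_ptr<Type> &Slot = Types[T];
    if (!Slot)
      Slot.reset(new Type(T, *this)); // created lazily on first request
    return Slot.get();
  }
};

} // namespace sketch
```

Because the wrapper carries its own context reference, callers in this sketch never need a separate context argument, which mirrors the second bullet above.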

This works for MinGW, but the MSVC linker apparently doesn't pull in those symbols. Reverting for now since I won't be able to reproduce it today. https://lab.llvm.org/buildbot/#/builders/107/builds/2337 This reverts commit 9df92cb.

When [lowering return values](https://github.com/llvm/llvm-project/blob/99a10f1fe8a7e4b0fdb4c6dd5e7f24f87e0d3695/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp#L3422) from LLVM IR to SelectionDAG, we check that [the number of values `SelectionDAG` tells us to return is equal to the number of values that `ComputePTXValueVTs()` tells us to return](https://github.com/llvm/llvm-project/blob/99a10f1fe8a7e4b0fdb4c6dd5e7f24f87e0d3695/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp#L3441). However, this check can fail on valid IR. For example:

```
define <6 x half> @foo() {
  ret <6 x half> zeroinitializer
}
```

`ComputePTXValueVTs()` tells us to return ***3*** `v2f16` values, while `SelectionDAG` tells us to return ***6*** `f16` values. Thus, the compiler will crash.

`ComputePTXValueVTs()` [supports all `half` element vectors with an even number of elements](https://github.com/llvm/llvm-project/blob/99a10f1fe8a7e4b0fdb4c6dd5e7f24f87e0d3695/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp#L213). Whereas `SelectionDAG` [only supports power-of-2 sized vectors](https://github.com/llvm/llvm-project/blob/4e078e3797098daa40d254447c499bcf61415308/llvm/lib/CodeGen/TargetLoweringBase.cpp#L1580). This is the root of the discrepancy.

Assuming that the developers who added the code to `ComputePTXValueVTs()` overlooked this, I've restricted `ComputePTXValueVTs()` to compute the same number of return values as `SelectionDAG`, instead of extending `SelectionDAG` to support non-power-of-2 sized vectors.
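The mismatch can be modelled with a few lines of arithmetic. This is only a counting model of the description above, not the NVPTX code: both function names are hypothetical, and the assumption is that a `half` vector is either packed into `v2f16` pairs or fully scalarized.

```cpp
#include <cassert>
#include <cstddef>

constexpr bool isPowerOf2(std::size_t N) { return N != 0 && (N & (N - 1)) == 0; }

// Model of the old counting: any even element count is packed into v2f16
// pairs, everything else is returned element by element.
constexpr std::size_t numReturnPartsEvenOnly(std::size_t NumElts) {
  return NumElts % 2 == 0 ? NumElts / 2 : NumElts;
}

// Model of the restricted counting: pack into pairs only when the element
// count is also a power of two; otherwise fall back to scalars.
constexpr std::size_t numReturnPartsPow2Only(std::size_t NumElts) {
  return (NumElts % 2 == 0 && isPowerOf2(NumElts)) ? NumElts / 2 : NumElts;
}

// Usage: for <6 x half> the old model gives 3 parts, the restricted one 6,
// matching the two numbers quoted in the description.
inline void checkSixHalfExample() {
  assert(numReturnPartsEvenOnly(6) == 3);
  assert(numReturnPartsPow2Only(6) == 6);
}
```

For power-of-2 sizes such as `<4 x half>`, both models agree, so only the previously crashing shapes change.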

…nctionPropertiesAnalysis (llvm#104867) (llvm#106309) Reverts c992690. The problem is that if there is a sequence "{delete A->B} {delete A->B} {insert A->B}" the net result is "{delete A->B}", which is not what we want. Duplicate successors may happen in cases like switch statements (as shown in the unit test). The second problem was that in `invoke` cases, some edges we speculate may get deleted don't, but are also not reachable from the inlined call site's basic block. We just need to check which edges are actually not present anymore. The fix is to sanitize the list of deletes, just like we do for inserts.
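A minimal sketch of what "sanitizing the list of deletes" could look like, assuming edges are plain (from, to) name pairs and the current CFG is available as a set of edges. `sanitizeDeletes` is a hypothetical name and the real analysis works on basic blocks and dominator tree updates, not strings.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>; // {from block, to block}

// Keep each speculated deletion at most once, and only if the edge is really
// gone from the current CFG. This handles both problems described above:
// duplicate successors (e.g. from a switch) and edges that were expected to
// disappear but did not.
std::vector<Edge> sanitizeDeletes(const std::vector<Edge> &Speculated,
                                  const std::set<Edge> &CurrentEdges) {
  std::set<Edge> Seen;
  std::vector<Edge> Result;
  for (const Edge &E : Speculated) {
    if (CurrentEdges.count(E))
      continue; // edge still exists, so it was not deleted after all
    if (Seen.insert(E).second)
      Result.push_back(E); // first time we see this deleted edge
  }
  return Result;
}

// Usage: A->B is listed twice and A->C still exists, so one delete remains.
inline void checkDuplicateDelete() {
  std::vector<Edge> Out =
      sanitizeDeletes({{"A", "B"}, {"A", "B"}, {"A", "C"}}, {{"A", "C"}});
  assert(Out.size() == 1 && Out[0] == Edge("A", "B"));
}
```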

Introducing `HLSLAttributedResourceType` - a new type that is similar to `AttributedType` but with additional data specific to HLSL resources. `AttributedType` currently only stores an attribute kind and no additional data from the type attribute parameters. This does not really work for HLSL resources since its type attributes contain non-boolean values that need to be retained as well. For example:

```
template <typename T> class RWBuffer {
  __hlsl_resource_t [[hlsl::resource_class(uav)]] [[hlsl::is_rov]] handle;
};
```

The data `HLSLAttributedResourceType` needs to eventually store are:
- resource class (SRV, UAV, CBuffer, Sampler)
- texture dimension(1-3)
- flags is_rov, is_array, is_feedback and is_multisample
- contained type

All of these values except contained type will be stored in `HLSLAttributedResourceType::Attributes` struct and accessed individually via the fields. There is also `Data` alias that covers all of these values as a `unsigned` which is used for hashing and the AST type serialization.

During type attribute processing all HLSL type attributes will be validated and collected by SemaHLSL (by `SemaHLSL::handleResourceTypeAttr`) and in the end combined into a single `HLSLAttributedResourceType` instance (in `SemaHLSL::ProcessResourceTypeAttributes`). `SemaHLSL` will also need to short-term store the `TypeLoc` information for the new type that will be grabbed by `TypeSpecLocFiller` soon after the type is created.

Part 1/2 of llvm#104861
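As an illustration of "fields accessed individually plus one `unsigned` covering all of them", here is a sketch with explicit pack/unpack helpers. The struct name, the helper names and the bit layout are assumptions made for this example; the actual `Attributes` struct and its `Data` alias may be laid out differently.

```cpp
#include <cassert>
#include <cstdint>

enum class ResourceClass : uint8_t { SRV = 0, UAV = 1, CBuffer = 2, Sampler = 3 };

// Hypothetical mirror of the per-resource attributes listed above
// (everything except the contained type).
struct ResourceAttributes {
  ResourceClass Class = ResourceClass::SRV;
  uint8_t TextureDimension = 0; // 1-3; 0 means "unset" in this sketch
  bool IsROV = false;
  bool IsArray = false;
  bool IsFeedback = false;
  bool IsMultisample = false;
};

// Assumed layout: bits 0-1 class, bits 2-3 dimension, bits 4-7 flags.
inline uint32_t packAttributes(const ResourceAttributes &A) {
  assert(A.TextureDimension <= 3 && "dimension must fit in two bits");
  return static_cast<uint32_t>(A.Class) |
         (static_cast<uint32_t>(A.TextureDimension) << 2) |
         (static_cast<uint32_t>(A.IsROV) << 4) |
         (static_cast<uint32_t>(A.IsArray) << 5) |
         (static_cast<uint32_t>(A.IsFeedback) << 6) |
         (static_cast<uint32_t>(A.IsMultisample) << 7);
}

// Inverse of packAttributes(), e.g. for deserialization.
inline ResourceAttributes unpackAttributes(uint32_t Data) {
  ResourceAttributes A;
  A.Class = static_cast<ResourceClass>(Data & 0x3);
  A.TextureDimension = static_cast<uint8_t>((Data >> 2) & 0x3);
  A.IsROV = ((Data >> 4) & 1) != 0;
  A.IsArray = ((Data >> 5) & 1) != 0;
  A.IsFeedback = ((Data >> 6) & 1) != 0;
  A.IsMultisample = ((Data >> 7) & 1) != 0;
  return A;
}
```

A single integer like this is convenient as a hash key and as a serialized field, which is the role the description assigns to `Data`.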

…106628) ALLOCATE and DEALLOCATE statements can be inlined in device functions. This patch updates the condition that determines whether to inline these actions in lowering. This avoids runtime calls in device function code and can speed up execution. Also move `isCudaDeviceContext` from `Bridge.cpp` so it can be used elsewhere.

Allow some interaction between the LLVM and FIR dialects by allowing conversion between FIR memory types and the llvm.ptr type. This is meant to help experimentation where the FIR and LLVM dialects coexist, and is useful to deal with cases where an LLVM type makes it early into the MLIR produced by flang, like when inserting the LLVM stack intrinsic here: https://github.com/llvm/llvm-project/blob/0a00d32c5f88fce89006dcde6e235bc77d7b495e/flang/lib/Optimizer/Transforms/StackReclaim.cpp#L57

The current pattern was failing OpenACC semantics in acc parse tree canonicalization:

```
!$acc loop
!$dir vector aligned
do i=1,n
...
```

Fix it by moving the directive before the OpenACC construct node. Note that I think it could make sense to propagate the $dir info to the acc.loop; at least with classic flang, the $dir seems to make a difference. This is not done here since few directives are supported anyway.

…oDate function This patch extracts the ModuleFile class from StandalonePrerequisiteModules so that we can reuse it further. We also implement the IsModuleFileUpToDate function to implement StandalonePrerequisiteModules::CanReuse. Both changes aim to ease future improvements to the support of modules in clangd, and both should be NFC.

…#106564) shortloop is a non-standard OpenACC extension (https://docs.nvidia.com/hpc-sdk/pgi-compilers/2015/pgirn157.pdf) that can be found on loop directives. The f18 parser was choking when seeing it. Since it can be found in existing apps and is mainly an optimization hint, parse it on loop directives and ignore it with a warning. For the record, here is the meaning of shortloop according to the manual linked above: "If the shortloop clause appears on a loop directive with the vector clause, it tells the compiler that the loop trip count is less than or equal to the number of vector lanes created for that loop. This means the value of the vector() clause on the loop directive in a kernels region, or the value of the vector_length() clause on the parallel directive in a parallel region will be greater than or equal to the loop trip count. This allows the compiler to generate more efficient code for the loop"

Similarly to the existing range attribute inference, also infer the nonnull attribute on function return values. I think in practice FunctionAttrs will handle nearly all cases; the main exception, I think, is cases involving branch conditions. But as we already have the information here, we may as well materialize it.

…intrinsic (llvm#103037) Adds support for wider-than-legal vector types for the histogram intrinsic (llvm.experimental.vector.histogram.add) by splitting the vector. Also adds integer promotion for the Inc operand.

…-bundles`. NFC. (llvm#106661) With `-view-edge-bundles`, before the change, the dot file output looks like

```dot
digraph {
	"%bb.0" [ shape=box ]
	0 -> "%bb.0"
	"%bb.0" -> 1
	"%bb.0" -> "%bb.1" [ color=lightgray ]
	"%bb.0" -> "%bb.6" [ color=lightgray ]
	"%bb.1" [ shape=box ]
	1 -> "%bb.1"
	"%bb.1" -> 1
	"%bb.1" -> "%bb.2" [ color=lightgray ]
	"%bb.1" -> "%bb.6" [ color=lightgray ]
	"%bb.2" [ shape=box ]
	1 -> "%bb.2"
	"%bb.2" -> 1
	"%bb.2" -> "%bb.3" [ color=lightgray ]
	"%bb.3" [ shape=box ]
	1 -> "%bb.3"
	"%bb.3" -> 2
	"%bb.3" -> "%bb.4" [ color=lightgray ]
	"%bb.4" [ shape=box ]
	2 -> "%bb.4"
	"%bb.4" -> 2
	"%bb.4" -> "%bb.4" [ color=lightgray ]
	"%bb.4" -> "%bb.5" [ color=lightgray ]
	"%bb.5" [ shape=box ]
	2 -> "%bb.5"
	"%bb.5" -> 1
	"%bb.5" -> "%bb.6" [ color=lightgray ]
	"%bb.5" -> "%bb.3" [ color=lightgray ]
	"%bb.6" [ shape=box ]
	1 -> "%bb.6"
	"%bb.6" -> 3
}
```

However, the graph output by graphviz is

![t](https://github.com/user-attachments/assets/24056c0a-3ba9-49c3-a5da-269f3140e619)

The node name corresponding to the MBB is incorrect. After the change, the node name is consistent with MBB's name.

![s](https://github.com/user-attachments/assets/38c649d1-7222-4de1-971c-56f7721ab64c)

philnik777 and others added 30 commits August 29, 2024 17:05

[libc++][NFC] Remove unused struct in <string> (llvm#106527)

025f03f

[libc++][NFC] Remove __constexpr_is{nan,finite} (llvm#106205)

a705e8c

They're never used in `constexpr` functions, so we can simply use `std::isnan` and `std::isfinite` instead.

AArch64: Add tests for atomicrmw fp operations (llvm#103701)

4ee2ad2

There were only codegen tests for the fadd vector case, so round out the test coverage for the scalar cases and all the other operations.

[Support] Delete FormatVariadicTest Validate sub-test (llvm#106570)

5048fab

- The subtest, if enabled correctly, will fail with assert in Debug builds and validation is disabled in Release builds.
- Hence deleting the test to fix test failures in CI.

[SLP] Early return in getReorderingData [nfc]

b5a1b45

AArch64: Delete tests of fp128 atomicrmw fmin/fmax

e05c224

These are getting different output on some build hosts for some reason. The stack offsets of temporaries are different.

[mlir][scf] Allow unrolling loops with integer-typed IV. (llvm#106164)

c08c6a7

SCF loops now can operate on integer-typed IV, thus I'm changing the loop unroller correspondingly.

[NFC][Support] Eliminate ',' at end of MemoryEffects print (llvm#106545)

115b876

- Eliminate comma at end of a MemoryEffects print.
- Added basic unit test to validate that.

[LoopVectorize][X86] amdlibm-calls.ll - add 2/4/8/16 vector widths te…

81acc84

…st checks for fallback to llvm intrinsics Check for cases where there isn't an amdlib call but the math call is still vectorised

Fix MSVC "not all control paths return a value" warning. NFC.

a777a93

Revert "[Analysis] Guard logf128 cst folding"

eed135f

This reverts commit 42d3ccc which caused a test failure.

Revert "[Support] Validate number of arguments passed to formatv()" (l…

ed37b5f

…lvm#106589) Reverts llvm#105745 Some bots are broken apparently.

libcxx: [NFC] relax error expectation for clang diagnostics (llvm#106591

67ffd14

) This is a split-off from llvm#96023, where this change has already been reviewed by libcxx maintainers. This will prevent that PR from triggering libcxx-ci from now on.

Revert "Revert "[Support] Validate number of arguments passed to form…

9ce4af5

…atv()"" (llvm#106592) Reverts llvm#106589 The fix for bot failures caused by the reverted commit was committed already, so this revert is not needed.

[DirectX] add enum for PSV resource type/kind/flag. (llvm#106227)

fd0dbc7

Add ResourceType, ResourceKind and ResourceFlag enum class for PSV resource. This is for llvm#103275

arichardson and others added 29 commits August 29, 2024 15:59

[compiler-rt] Remove duplicates of sanitizer_common functions

9df92cb

These functions in interception_win.cpp already exist in sanitizer_common. Use those instead. Reviewed By: mstorsjo Pull Request: llvm#106488

[clang-format] Correctly identify token-pasted record names (llvm#106484

7579787

) See llvm#89706 (comment).

[RISCV][TTI] Add legality check of vector of address for gather/scatt…

e29c5f3

…er. (llvm#106481) This patch adds a legality check in `isLegalMaskedGatherScatter()` that checks whether the target machine supports vectors of addresses.

Reapply "[HWASan] remove incorrectly inferred attributes" (llvm#106622)…

12b0257

… (llvm#106624) This reverts commit 66927fb. Fixed clang tests

[C++20] [Modules] Skip checking ODR for merged context in GMF

ca2351d

Solves clangd/clangd#2094. Because clangd enables PCH automatically, the previous mechanism to skip the ODR check in GMF may be invalid. This patch fixes this for one case.

[HWASan] add OptimizationRemark for alloca safety (llvm#105872)

ddaf2e2

[NVPTX][AA] Traverse use-def chain to find non-generic addrspace (llv…

e004566

…m#106477) Address space information may be encoded anywhere along the use-def chain. Take advantage of this by traversing the chain until we find a non-generic addrspace.
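The traversal idea can be sketched on a toy value chain. This is not the alias analysis code: `PtrValue`, `findNonGenericAddrSpace` and the `MaxDepth` limit are assumptions for illustration, with address space 0 standing for the generic space.

```cpp
#include <cassert>

// Assumption: 0 denotes the generic address space in this sketch.
constexpr unsigned GenericAddrSpace = 0;

// Toy stand-in for a pointer value: its own address space plus the value it
// was derived from (for example through a cast or an address computation).
struct PtrValue {
  unsigned AddrSpace;
  const PtrValue *Source; // nullptr at the root of the chain
};

// Walk up the chain until a non-generic address space is found. MaxDepth is
// a hypothetical safety bound; the commit message does not mention one.
inline unsigned findNonGenericAddrSpace(const PtrValue *V,
                                        unsigned MaxDepth = 8) {
  for (unsigned Depth = 0; V != nullptr && Depth < MaxDepth; ++Depth) {
    if (V->AddrSpace != GenericAddrSpace)
      return V->AddrSpace;
    V = V->Source;
  }
  return GenericAddrSpace; // nothing more specific was found
}

// Usage: a generic pointer derived, two steps back, from address space 1.
inline void checkChainLookup() {
  PtrValue Root{1, nullptr};
  PtrValue Cast{GenericAddrSpace, &Root};
  PtrValue Derived{GenericAddrSpace, &Cast};
  assert(findNonGenericAddrSpace(&Derived) == 1);
}
```

Looking only at `Derived` itself would report the generic space; following the chain recovers the more specific one, which is the improvement the message describes.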

[ARM] Use MCRegister instead of unsigned for RegisterReqs in ARMAsmPa…

24e791b

…rser.

[compiler-rt][AArch64][Android] Use getauxval on Android. (llvm#102979)

cd634f5

__getauxval is a libgcc function that doesn't exist on Android. On Linux, also use getauxval, since it is already used in other places in compiler-rt.

[NFC] [clangd] [Modules] Change the argument type of IsModuleFileUpTo…

d68059b

…Date to reference It is better to use references instead of pointers as the argument type of IsModuleFileUpToDate, since the PrerequisiteModules is always expected to exist.

[NFC] Add explicit #include llvm-config.h where its macros are used. (l…

89e6a28

…lvm#106621) Without these explicit includes, removing other headers that implicitly include llvm-config.h may have non-trivial side effects.

__noop not marked as constexpr llvm#102064 (llvm#105983)

e0fa2f1

Fixes llvm#102064

[lldb][AArch64] Do not crash if NT_ARM_TLS is missing (llvm#106478)

ce7c828

[D156118](https://reviews.llvm.org/D156118) states that this note is always present, but it is better to check it explicitly, as otherwise `lldb` may crash when trying to read registers.

[AutoBump] Merge with ce7c828 (Aug 30)

27fa658

cferry-AMD approved these changes Sep 30, 2024
