forked from llvm/llvm-project
-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AutoBump] Merge with ce7c828e (Aug 30) (16) #369
Open
mgehre-amd
wants to merge
92
commits into
bump_to_fc110202
Choose a base branch
from
bump_to_ce7c828e
base: bump_to_fc110202
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
They're never used in `constexpr` functions, so we can simply use `std::isnan` and `std::isfinite` instead.
- Use formatv() and raw string literals to simplify emission code. - Use range based for loops and structured bindings to simplify loops. - Use const Pointers to Records. - Rename `ComputeFixedEncoding` to `ComputeTypeSignature` to reflect what the function actually does, cnd change it to return a vector. - Use reverse() and range based for loop to pack 8 nibbles into 32-bits. - Rename some variables to follow LLVM coding standards. - For function memory effects, print human readable effects in comment.
There were only codegen tests for the fadd vector case, so round out the test coverage for the scalar cases and all the other operations.
- The subtest, if enabled correctly, will fail with assert in Debug builds and validation is disabled in Release builds. - Hence deleting the test to fix test failures in CI.
…03702) Use LLSC or cmpxchg in the same cases as for the unsupported integer operations. This required some fixups to the LLSC implementatation to deal with the fp128 case. The comment about floating-point exceptions was wrong, because floating-point exceptions are not really exceptions at all.
fneg/fabs are not supposed to canonicalize nans. Promoting to f32 will go through an fp_extend which will canonicalize. The generic Promote handler needs to be removed from LegalizeDAG. We need to use integer bit manip to clear the bit instead. Unfortunately, this is going through the stack due to i16 not being a legal type. Fixing that will require custom legalization or some other generic SelectionDAG change.
These are getting different output on some build hosts for some reason. The stack offsets of temporaries are different.
SCF loops now can operate on integer-typed IV, thus I'm changing the loop unroller correspondingly.
- Eliminate comma at end of a MemoryEffects print. - Added basic unit test to validate that.
…st checks for fallback to llvm intrinsics Check for cases where there isn't a amdlib call but it still vectorises the math call
The new class implements a deduplication table to convert import list elements: {SourceModule, GUID, Definition/Declaration} into 32-bit integers, and vice versa. This patch adds a unit test but does not add a use yet. To be precise, the deduplication table holds {SourceModule, GUID} pairs. We use the bottom one bit of the 32-bit integers to indicate whether we have a definition or declaration. A subsequent patch will collapse the import list hierarchy -- FunctionsToImportTy holding many instances of FunctionsToImportTy -- down to DenseSet<uint32_t> with each element indexing into the deduplication table above. This will address multiple sources of space inefficiency.
…06440) We'd previously just deferred to the base implementation, but that more or less always returns 1. This underestimates the cost of the insert/extract, biases the SLP vectorizer towards forming illegally typed vectors, and underestimates the cost of scalarized operations (like unaligned scatter/gather).
This is a partial revert of c66e1d6. Even though that allowed us to declare v9.2-a support without picking up SVE2 in both the backend and the driver, the frontend itself still enabled SVE via the arch version's default extensions. Avoid that by reverting back to v8.7-a while we look into longer-term solutions.
llvm#86149) This patch is part of a set of patches that add an `-fextend-lifetimes` flag to clang, which extends the lifetimes of local variables and parameters for improved debuggability. In addition to that flag, the patch series adds a pragma to selectively disable `-fextend-lifetimes`, and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes` for this pointers only. All changes and tests in these patches were written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer) has handled review and merging. The extend lifetimes flag is intended to eventually be set on by `-Og`, as discussed in the RFC here: https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850 This patch implements a new intrinsic instruction in LLVM, `llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand and has no effect other than "using" its operand, to ensure that its operand remains live until after the fake use. This patch does not emit fake uses anywhere; the next patch in this sequence causes them to be emitted from the clang frontend, such that for each variable (or this) a fake.use operand is inserted at the end of that variable's scope, using that variable's value. This patch covers everything post-frontend, which is largely just the basic plumbing for a new intrinsic/instruction, along with a few steps to preserve the fake uses through optimizations (such as moving them ahead of a tail call or translating them through SROA). Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
) These lists are quite static and several of the parameters are actually constant across all users. Heavy use of macros is undesirable, and not idiomatic in LLVM, so let's just use the naive switch cases. I'll probably continue with removing the other property macros. These two just happened to be the two I actually had to figure out for an unrelated change.
This reverts commit 42d3ccc which caused a test failure.
…and (llvm#98414) This patch addresses an issue with lit's internal shell when env is without any arguments, it fails with exit code 127 because `env` requires a subcommand. This patch addresses the issue by encoding the command to properly return environment variables even when no arguments are provided. The error occurred when running the command ` LIT_USE_INTERNAL_SHELL=1 ninja check-llvm`. fixes: llvm#102383 This is part of the test cleanups proposed in the RFC: [[RFC] Enabling the Lit Internal Shell by Default](https://discourse.llvm.org/t/rfc-enabling-the-lit-internal-shell-by-default/80179)
Previously, functions named "main" got the NoRecurse attribute consistent with the behavior of C++, which HLSL largely follows. However, standard recursion is not allowed in HLSL, so all functions should really have this attribute. This doesn't prevent recursion, but rather signals that these functions aren't expected to recurse. Practically, this was done so that entry point functions named "main" would have all have the same attributes as otherwise identical entry points with other names. This required small changes to the this assignment tests because they no longer generate so many attribute sets since more of them match. related to llvm#105244 but done to simplify testing for llvm#89806
…llvm#106437) These tests have been failing since db279c7 "[HLSL] Change default linkage of HLSL functions to internal (llvm#95331)". This presumably went unnoticed because they're not run by default since they rely on an external tool (dxil-dis).
llvm#105574) These lists are quite static. Heavy use of macros is undesirable, and not idiomatic in LLVM, so let's just use the naive switch cases. Note that the first two fields in the CONSTRAINEDFP property were utterly unused (aside from a C++ test). In the same vien as llvm#105551. Once both changes have landed, we'll be left with _BINARYOP which needs a bit of additional untangling, and the actual opcode mappings.
armv7a and armv8a are common names for the application subarch for arm. These names in particular are used in ChromeOS, Android, and a few other known applications. In ChromeOS, we encountered a bug where armv7a arch was not recognised and segfaulted when starting an executable on an arm32 device. Google Issue Tracker: https://issuetracker.google.com/361414339
…lvm#106589) Reverts llvm#105745 Some bots are broken apparently.
) This is a split-off from llvm#96023, where this change has already been reviewed by libcxx maintainers. This will prevent that PR from triggering libcxx-ci from now on.
…atv()"" (llvm#106592) Reverts llvm#106589 The fix for bot failures caused by the reverted commit was committed already, so this revert is not needed.
Several tests for the new fake use intrinsic are failing on NVPTX buildbots due to relying on behaviour for their expected triple; this commit adds that triple to each of them to prevent failures. Fixes commit 3d08ade (llvm#86149). Example buildbot failures: https://lab.llvm.org/buildbot/#/builders/160/builds/4175 https://lab.llvm.org/buildbot/#/builders/180/builds/4173
…106582) This isn't quite just code motion as the four different versions we had of this routine differed in whether they ignored the "size" marker used to represent undef. I doubt this matters in practice, but it is a functional change. --------- Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
Add ResourceType, ResourceKind and ResourceFlag enum class for PSV resource. This is for llvm#103275
The interceptor types are supposed to match size_t (and the non-Windows ssize_t) exactly, but on 32-bit Windows `size_t` uses `unsigned int` whereas `SIZE_T` is `unsigned long`. The current definition results in `uptr` not matching `uintptr_t` since we otherwise get typedef redefinition errors. Work around this by using a #define instead of a typedef when defining SIZE_T. It would probably be cleaner to stop using these uppercase types, but that is a rather invasive change and this one is the minimal change to allow uptr to match uintptr_t on Windows. To ensure this compiles on Windows, we also remove the interceptor.h defines of uptr (that do not always match __sanitizer::uptr) and rely on __sanitizer::uptr instead. The interceptor types most likely predate those other types so clean up the unnecessary definition while here. This also reverts commit 18e06e3 and commit bb27dd8. Reviewed By: mstorsjo, vitalybuka Pull Request: llvm#106311
These functions in interception_win.cpp already exist in sanitizer_common. Use those instead. Reviewed By: mstorsjo Pull Request: llvm#106488
…lvm#106517) In llvm#106110 we had to mark v[f]slide1down.vx as ActiveElementsAffectResult since the elements in the body depend on VL. However it doesn't depend on the mask, so this was overly conservative and broke the vmerge peephole. We can recover this by splitting up ActiveElementsAffectResult into VL and Mask bits, so we can more accurately model v[f]slide1down.vx and re-enable the peephole.
This patch implements sandboxir::Type, a thin wrapper of llvm::Type. This is designed very similarly to sandbox::Value. Context owns all sandboxir::Type objects and maintains a map between llvm::Type and sandboxir::Type. There are a couple of reasons for migrating from llvm::Type to sandboxir::Type: - Creating an llvm::Type from within SandboxIR-only code doesn't work well because it requires you to pass llvm::Context to functions like llvm::Type::getInt32Ty(C), but you wouldn't normally have access to llvm::Context C. In unit tests this is not such a big deal because you have access to both, but it will become an issue in SandboxIR-only code. - Not being able to get the sandboxir::Context from llvm::Type results in awkward sandboir APIs with additional sandboxir::Context arguments. - llvm::Type::getContext() can basically give you access to the whole LLVM IR, which we should try to avoid.
This works for MinGW, but the MSVC linker apparently doens't pull in those symbols. Reverting for now since I won't be able to reproduce it today. https://lab.llvm.org/buildbot/#/builders/107/builds/2337 This reverts commit 9df92cb.
When [lowering return values](https://github.com/llvm/llvm-project/blob/99a10f1fe8a7e4b0fdb4c6dd5e7f24f87e0d3695/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp#L3422) from LLVM IR to SelectionDAG, we check that [the number of values `SelectionDAG` tells us to return is equal to the number of values that `ComputePTXValueVTs()` tells us to return](https://github.com/llvm/llvm-project/blob/99a10f1fe8a7e4b0fdb4c6dd5e7f24f87e0d3695/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp#L3441). However, this check can fail on valid IR. For example: ``` define <6 x half> @foo() { ret <6 x half> zeroinitializer } ``` `ComputePTXValueVTs()` tells us to return ***3*** `v2f16` values, while `SelectionDAG` tells us to return ***6*** `f16` values. Thus, the compiler will crash. `ComputePTXValueVTs()` [supports all `half` element vectors with an even number of elements](https://github.com/llvm/llvm-project/blob/99a10f1fe8a7e4b0fdb4c6dd5e7f24f87e0d3695/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp#L213). Whereas `SelectionDAG` [only supports power-of-2 sized vectors](https://github.com/llvm/llvm-project/blob/4e078e3797098daa40d254447c499bcf61415308/llvm/lib/CodeGen/TargetLoweringBase.cpp#L1580). This is the root of the discrepancy. Assuming that the developers who added the code to `ComputePTXValueVTs()` overlooked this, I've restricted `ComputePTXValueVTs()` to compute the same number of return values as `SelectionDAG`, instead of extending `SelectionDAG` to support non-power-of-2 sized vectors.
…nctionPropertiesAnalysis (llvm#104867) (llvm#106309) Reverts c992690. The problem is that if there is a sequence "{delete A->B} {delete A->B} {insert A->B}" the net result is "{delete A->B}", which is not what we want. Duplicate successors may happen in cases like switch statements (as shown in the unit test). The second problem was that in `invoke` cases, some edges we speculate may get deleted don't, but are also not reachable from the inlined call site's basic block. We just need to check which edges are actually not present anymore. The fix is to sanitize the list of deletes, just like we do for inserts.
…er. (llvm#106481) This patch add a legality check that checks if target machine support vector of address in `isLegalMaskedGatherScatter()`.
… (llvm#106624) This reverts commit 66927fb. Fixed clang tests
Solve clangd/clangd#2094 Due clangd will enable PCH automatically, the previous mechanism to skip ODR check in GMF may be invalid. This patch fixes this for a case.
…m#106477) Address space information may be encoded anywhere along the use-def chain. Take advantage of this by traversing the chain until we find a non-generic addrspace.
Introducing `HLSLAttributedResourceType` - a new type that is similar to `AttributedType` but with additional data specific to HLSL resources. `AttributeType` currently only stores an attribute kind and no additional data from the type attribute parameters. This does not really work for HLSL resources since its type attributes contain non-boolean values that need to be retained as well. For example: ``` template <typename T> class RWBuffer { __hlsl_resource_t [[hlsl::resource_class(uav)]] [[hlsl::is_rov]] handle; }; ``` The data `HLSLAttributedResourceType` needs to eventually store are: - resource class (SRV, UAV, CBuffer, Sampler) - texture dimension(1-3) - flags is_rov, is_array, is_feedback and is_multisample - contained type All of these values except contained type will be stored in `HLSLAttributedResourceType::Attributes` struct and accessed individually via the fields. There is also `Data` alias that covers all of these values as a `unsigned` which is used for hashing and the AST type serialization. During type attribute processing all HLSL type attributes will be validated and collected by SemaHLSL (by `SemaHLSL::handleResourceTypeAttr`) and in the end combined into a single `HLSLAttributedResourceType` instance (in `SemaHLSL::ProcessResourceTypeAttributes`). `SemaHLSL` will also need to short-term store the `TypeLoc` information for the new type that will be grabbed by `TypeSpecLocFiller` soon after the type is created. Part 1/2 of llvm#104861
…106628) ALLOCATE and DEALLOCATE statements can be inlined in device function. This patch updates the condition that determined to inline these actions in lowering. This avoid runtime calls in device function code and can speed up the execution. Also move `isCudaDeviceContext` from `Bridge.cpp` so it can be used elsewhere.
Allow some interaction between LLVM and FIR dialect by allowing conversion between FIR memory types and llvm.ptr type. This is meant to help experimentation where FIR and LLVM dialect coexists, and is useful to deal with cases where LLVM type makes it early into the MLIR produced by flang, like when inserting LLVM stack intrinsic here: https://github.com/llvm/llvm-project/blob/0a00d32c5f88fce89006dcde6e235bc77d7b495e/flang/lib/Optimizer/Transforms/StackReclaim.cpp#L57
The current pattern was failing OpenACC semantics in acc parse tree canonicalization: ``` !acc loop !dir vector aligned do i=1,n ... ``` Fix it by moving the directive before the OpenACC construct node. Note that I think it could make sense to propagate the $dir info to the acc.loop, at least with classic flang, the $dir seems to make a difference. This is not done here since few directives are supported anyway.
…oDate function This patch extracts ModuleFile class from StandalonePrerequisiteModules so that we can reuse it further. And also we implement IsModuleFileUpToDate function to implement StandalonePrerequisiteModules::CanReuse. Both of them aims to ease the future improvements to the support of modules in clangd. And both of them should be NFC.
__getauxval is a libgcc function that doesn't exist on Android. Also on Linux let's use getauxval as it is anyway used other places in compiler-rt.
…Date to reference It is better to use references instead of pointers as the argument type of IsModuleFileUpToDate. Since the PrerequisiteModules is always expected to exist.
…#106564) shortloop is a non standard OpenACC extension (https://docs.nvidia.com/hpc-sdk/pgi-compilers/2015/pgirn157.pdf) that can be found on loop directives. f18 parser was choking when seeing it. Since it can be found in existing apps and is mainly an optimization hint, parse it on loop directives and ignore it with a warning. For the records, here is shortloop meaning according to the manual linked above: "If the shortloop clause appears on a loop directive with the vector clause, it tells the compiler that the loop trip count is less than or equal to the number of vector lanes created for that loop. This means the value of the vector() clause on the loop directive in a kernels region, or the value of the vector_length() clause on the parallel directive in a parallel region will be greater than or equal to the loop trip count. This allows the compiler to generate more efficient code for the loop"
…lvm#106621) Without these explicit includes, removing other headers, who implicitly include llvm-config.h, may have non-trivial side effects.
Similarly to the existing range attribute inference, also infer the nonnull attribute on function return values. I think in practice FunctionAttrs will handle nearly all cases, the main one I think it doesn't is cases involving branch conditions. But as we already have the information here, we may as well materialize it.
…intrinsic (llvm#103037) Adds support for wider-than-legal vector types for the histogram intrinsic (llvm.experimental.vector.histogram.add) by splitting the vector. Also adds integer promotion for the Inc operand.
…-bundles`. NFC. (llvm#106661) With `-view-edge-bundles`, before the change, the dot file output is kinda like ```dot digraph { "%bb.0" [ shape=box ] 0 -> "%bb.0" "%bb.0" -> 1 "%bb.0" -> "%bb.1" [ color=lightgray ] "%bb.0" -> "%bb.6" [ color=lightgray ] "%bb.1" [ shape=box ] 1 -> "%bb.1" "%bb.1" -> 1 "%bb.1" -> "%bb.2" [ color=lightgray ] "%bb.1" -> "%bb.6" [ color=lightgray ] "%bb.2" [ shape=box ] 1 -> "%bb.2" "%bb.2" -> 1 "%bb.2" -> "%bb.3" [ color=lightgray ] "%bb.3" [ shape=box ] 1 -> "%bb.3" "%bb.3" -> 2 "%bb.3" -> "%bb.4" [ color=lightgray ] "%bb.4" [ shape=box ] 2 -> "%bb.4" "%bb.4" -> 2 "%bb.4" -> "%bb.4" [ color=lightgray ] "%bb.4" -> "%bb.5" [ color=lightgray ] "%bb.5" [ shape=box ] 2 -> "%bb.5" "%bb.5" -> 1 "%bb.5" -> "%bb.6" [ color=lightgray ] "%bb.5" -> "%bb.3" [ color=lightgray ] "%bb.6" [ shape=box ] 1 -> "%bb.6" "%bb.6" -> 3 } ``` However, the graph output by graphviz is ![t](https://github.com/user-attachments/assets/24056c0a-3ba9-49c3-a5da-269f3140e619) The node name corresponding to the MBB is incorrect. After the change, the node name is consistent with MBB's name. ![s](https://github.com/user-attachments/assets/38c649d1-7222-4de1-971c-56f7721ab64c)
[D156118](https://reviews.llvm.org/D156118) states that this note is always present, but it is better to check it explicitly, as otherwise `lldb` may crash when trying to read registers.
cferry-AMD
approved these changes
Sep 30, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.