[AutoBump] Merge with 83891777 (May 16) (48) #307

Merged
merged 295 commits into from
Sep 3, 2024

Conversation

mgehre-amd
Collaborator

No description provided.

hiraditya and others added 30 commits May 14, 2024 06:13
As mentioned in llvm#68882 and
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699

GEP arithmetic isn't consistent across different types. GVNSink didn't
realize this and sank all GEPs as long as their operands could be wired
via PHIs in a post-dominator.

Fixes: llvm#85333
Reapply: llvm#88440 after fixing the non-determinism issues in llvm#90995
Adds the LLVM vector.deinterleave2 intrinsic to the MLIR LLVM dialect. The
deinterleave2 intrinsic takes a vector and returns two vectors, the first
holding the even-indexed elements of the input and the second the
odd-indexed elements. It is the inverse of vector.interleave2.
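
As a rough illustration of the semantics described above (not the MLIR or LLVM implementation), the split and its inverse can be modelled on plain Python lists. The function names `deinterleave2` and `interleave2` are hypothetical helpers for this sketch only.

```python
def deinterleave2(vec):
    """Split an even-length sequence into (even-indexed, odd-indexed) parts."""
    if len(vec) % 2 != 0:
        raise ValueError("input length must be even")
    return vec[0::2], vec[1::2]


def interleave2(evens, odds):
    """Inverse of deinterleave2: alternate elements of the two inputs."""
    if len(evens) != len(odds):
        raise ValueError("inputs must have the same length")
    out = []
    for e, o in zip(evens, odds):
        out.extend((e, o))
    return out


if __name__ == "__main__":
    v = [10, 11, 12, 13, 14, 15]
    evens, odds = deinterleave2(v)
    print(evens, odds)
    print(interleave2(evens, odds) == v)
```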
MCOperand has a constructor that permits a nullptr MCInst, and BOLT makes use of that. Adjust MCOperand's dumper to permit such use.
Remove 'Valid' local boolean that has a single use, and return directly instead.
…#91933)

Given `foo...[idx]` if idx is value dependent, the expression is type
dependent.

Fixes llvm#91885
Fixes llvm#91884
Remove excess parentheses and use `boolean ? true-case : false-case` idiom.
When writing the test for this I seemingly forgot to put 'CHECK' on the
lines, so I didn't notice that I was printing the identifiers as
pointers rather than their names.  This patch corrects the tests and the
print behavior.
…0448)

This patch rewrites the ArmSME tile allocator to use liveness
information to make better tile allocation decisions and improve the
correctness of the ArmSME dialect. The algorithm used here is a linear
scan over live ranges, where live ranges are assigned to tiles as they
appear in the program (chronologically). Live ranges release their
assigned tile ID when the current program point is past their end.
This is a greedy algorithm, chosen mainly to keep the implementation
relatively straightforward and because it seems to be sufficient for
most kernels (e.g. matmuls) that use ArmSME. The general steps are
roughly those from
https://link.springer.com/content/pdf/10.1007/3-540-45937-5_17.pdf,
though there have been a few simplifications and assumptions made for
our use case.
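
The greedy linear scan described above can be sketched roughly as follows. This is a simplified model, not the code from the patch: `LiveRange`, `allocate_tiles`, and the choice to raise an error when no tile is free are all assumptions made for illustration, and the sketch treats every tile as interchangeable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LiveRange:
    name: str
    start: int  # first program point where the value is live
    end: int    # last program point where the value is live (inclusive)
    tile: Optional[int] = None


def allocate_tiles(ranges, num_tiles):
    """Greedy linear scan: visit ranges by start point, reuse freed tiles."""
    free = list(range(num_tiles))
    active = []
    for lr in sorted(ranges, key=lambda r: r.start):
        # Release tiles held by ranges that ended before this range starts.
        still_active = []
        for a in active:
            if a.end < lr.start:
                free.append(a.tile)
            else:
                still_active.append(a)
        active = still_active
        free.sort()
        if not free:
            # Assumption: this sketch just gives up when tiles run out.
            raise RuntimeError(f"no free tile for live range {lr.name}")
        lr.tile = free.pop(0)
        active.append(lr)
    return {lr.name: lr.tile for lr in ranges}


if __name__ == "__main__":
    demo = [LiveRange("a", 0, 4), LiveRange("b", 1, 2), LiveRange("c", 3, 6)]
    print(allocate_tiles(demo, num_tiles=2))
```

In the demo, `c` reuses the tile released by `b`, since `b` ends before `c` starts.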

Hopefully, the only changes needed for a user of the ArmSME dialect are
that:

- `-allocate-arm-sme-tiles` will no longer be a standalone pass
  - `-test-arm-sme-tile-allocation` is only for unit tests
- `-convert-arm-sme-to-llvm` must happen after `-convert-scf-to-cf`
  - SME tile allocation is now part of the LLVM conversion

By integrating this into the `ArmSME -> LLVM` conversion we can allow
high-level (value-based) ArmSME operations to be side-effect-free, as we
can guarantee nothing will rearrange ArmSME operations before we emit
intrinsics (which could invalidate the tile allocation).

The hope is for ArmSME operations to have no hidden state/side effects
and to allow easily lowering dialects such as `vector` and `arith` to SME,
without making assumptions about how the input IR looks, as the
semantics of the operations will be the same. That is, there are no (new)
side effects and the IR follows the rules of SSA (a value will never change).

The aim is correctness, so we have a base for working on optimizations.
A buildbot with expensive checks enabled flagged some problems with my patch. There was also a post-commit nit on the langref changes.
…m#92004)

This makes the `vc-rev-enabled` feature unsupported if we fail to
retrieve the git revision for any reason, such as if git is not
installed.
…straints (llvm#92104)

Clangd uses it to determine whether the argument is within the selection
range.

Fixes clangd/clangd#2033
PR llvm#80680 added bits in the codegen to lazily add convergence intrinsics
when required. This logic relied on the LoopStack. The issue is when
parsing the condition, the loopstack doesn't yet reflect the correct
values, as expected since we are not yet in the loop.

However, convergence tokens should sometimes already be available. The
solution which seemed the simplest is to greedily generate the tokens
when we generate SPIR-V.

Fixes llvm#88144

---------

Signed-off-by: Nathan Gauër <brioche@google.com>
Now that we've got (minus some issues around datatypes and invariant
loads) working lowerings for address space 7, update the table in the
AMDGPU usage guide to properly indicate the nature of these address
spaces.
…llvm#92067)

cm.push can't save X26 without also saving X27. This removes two other
checks for this case.

This causes CFI to be emitted since X27 is now explicitly a callee saved
register.

The affected tests use inline assembly to clobber X26 rather than the
whole range of s0-s10.
Allow mixing objects with/without signed class ro data and category
class properties as long as it happens before we register the metadata.
These combinations are a warning in ld, not a hard error. The only case
that is ABI-breaking is if we already registered with the feature
enabled but later try to load an object that doesn't support it.

rdar://127336061
tryToCreateDiffCheck has one caller, and exits early if CanUseDiffCheck
is false. Hence, we can get/set CanUseDiffCheck in the caller to avoid
wastefully calling tryToCreateDiffCheck. This patch is an NFC
simplification of program logic.
The target combine is no longer required because InstCombine will
transform the DIV by a power of 2 into a multiply, so in practice
this case will never trigger.

Additionally, the generated code would have been incorrect for
streaming(-compatible) functions, because it assumed NEON was available.
…lvm#92086)

self.wait_for_running_event(process) is always called after
self.runCmd("continue"). It is strange to expect eStateConnected here.
This test failed in case of a remote target. The correct state is
eStateRunning. Removed incorrect checking.
)

The cost of `experimental.cttz.elts` on RISC-V equals the cost of
vfirst when the zero_is_poison argument is true. Otherwise, we add the
additional cost of a cmp + select to convert the -1 result from vfirst to
EVL.
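
To make the extra cmp + select concrete, here is a small behavioural model on Python lists. It is only a sketch: `vfirst` and `cttz_elts` are hypothetical stand-ins, and the length of the mask is used in place of EVL.

```python
def vfirst(mask):
    """Index of the first set element, or -1 if no element is set."""
    for i, bit in enumerate(mask):
        if bit:
            return i
    return -1


def cttz_elts(mask, zero_is_poison):
    """Count trailing zero elements, modelled on top of vfirst."""
    idx = vfirst(mask)
    if zero_is_poison:
        # An all-zero input is poison, so the -1 case needs no fix-up.
        return idx
    # cmp + select: map the -1 "not found" result to the element count.
    return len(mask) if idx < 0 else idx


if __name__ == "__main__":
    print(cttz_elts([0, 0, 1, 0], zero_is_poison=False))
    print(cttz_elts([0, 0, 0, 0], zero_is_poison=False))
```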
…#90500)

Currently, clang postpones all semantic analysis of unary operators with
operands of pointer/pointer to member/array/function type until
instantiation whenever that type is dependent (e.g. `T*` where `T` is a
type template parameter). Consequently, the uninstantiated AST nodes all
have the type `ASTContext::DependentTy`, which, for the purposes of
llvm#90152, is undesirable because that type may be the current
instantiation (e.g. `*this`).

This patch moves the point at which we perform semantic analysis for
such expressions to be prior to instantiation.
llvm#91137 reverted in llvm#92001

A build error fix added in 28d5ece

---------

Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>
Most diagnostics obey
https://llvm.org/docs/CodingStandards.html#error-and-warning-messages
but some diverge. Fix them.

While here, adjust some diagnostics.

Pull Request: llvm#92024
Currently, only those global variables which are at compile unit scope
are added to the 'globals' list of the DICompileUnit. This does not work
for languages which support modules (e.g. Fortran), where the hierarchy
can be variable -> module -> compile unit.

To fix this, if a variable's scope points to a module, we walk one level
up and see if the module is in the compile unit scope.
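
The one-level walk can be pictured with a toy scope tree. This is a minimal sketch, not the MLIR code: the `Scope` class, its `kind` strings, and `belongs_in_cu_globals` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scope:
    kind: str                       # e.g. "compile_unit", "module", "subprogram"
    parent: Optional["Scope"] = None


def belongs_in_cu_globals(var_scope):
    """True if a variable with this scope should be listed in the CU globals."""
    if var_scope.kind == "compile_unit":
        return True
    if var_scope.kind == "module":
        # Walk exactly one level up and check the module's own scope.
        parent = var_scope.parent
        return parent is not None and parent.kind == "compile_unit"
    return False


if __name__ == "__main__":
    cu = Scope("compile_unit")
    mod = Scope("module", parent=cu)
    print(belongs_in_cu_globals(cu), belongs_in_cu_globals(mod))
```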

This was initially part of llvm#91582 which adds debug information for
Fortran module variables. @kiranchandramohan pointed out that MLIR
changes should go in separate PRs.
slydiman and others added 24 commits May 16, 2024 07:44
Use `os.devnull` instead of `/dev/null`.
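
For context, a minimal sketch of why `os.devnull` is the portable choice: it names the platform's null device, so the same code works where `/dev/null` does not exist. The helper `run_quietly` is a hypothetical example, not code from the commit.

```python
import os
import subprocess
import sys


def run_quietly(argv):
    """Run a command with stdout and stderr discarded; return its exit code."""
    # os.devnull is the platform's null device path, so this avoids
    # hard-coding a POSIX-only path.
    with open(os.devnull, "w") as sink:
        return subprocess.call(argv, stdout=sink, stderr=sink)


if __name__ == "__main__":
    code = run_quietly([sys.executable, "-c", "print('discarded')"])
    print("exit code:", code)
```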
…#91579)

This patch adds the nsw flag to the increment of do-variables when a new
option is enabled.
NOTE 11.10 in the Fortran 2018 standard says they never overflow.

See also the discussion in llvm#74709 and the following Discourse post:
https://discourse.llvm.org/t/rfc-add-nsw-flags-to-arithmetic-integer-operations-using-the-option-fno-wrapv/77584/5
The error checking is only for .macro directives. Move it to the .macro
parser to remove one parameter.
…m#90578)

This patch adds support for the GNU extension intrinsic ETIME
(llvm#84205). Some usage info and an
example have been added to `flang/docs/Intrinsics.md`. The patch contains
both the lowering and the runtime code and works on both Windows and
Linux.

| System  | Implementation  |
|---------|-----------------|
| Windows | GetProcessTimes |
| Linux   | times           |
…andled by LegalizeVectorOps. (llvm#92332)

The expand code is present, but we were missing the type query code
so the nodes would be ignored until LegalizeDAG.
…erleavedMemoryOpCost. (llvm#91825)

isLegalInterleavedAccessType expects the subvector type, but
getInterleavedMemoryOpCost is called with the full vector type. So we
need to divide by Factor.
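
The arithmetic behind the fix is small but easy to get wrong, so here is a sketch of it. The helper name `subvector_num_elements` is hypothetical, and the sketch assumes a fixed-length vector whose element count is a multiple of the interleave factor.

```python
def subvector_num_elements(full_num_elements, factor):
    """Element count of one de-interleaved subvector of the full vector."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    if full_num_elements % factor != 0:
        raise ValueError("element count must be a multiple of factor")
    return full_num_elements // factor


if __name__ == "__main__":
    # A 16-element access with interleave factor 4 has 4-element subvectors.
    print(subvector_num_elements(16, 4))
```

The legality query would then be made on the subvector size rather than on the full vector size.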
…ter (llvm#92303)

As noted in
llvm#91440 (comment),
if the pass pipeline stops early because of -stop-after, any allocated
passes added with insertPass will not be freed if they haven't already
been added.

This was showing up as a failure on the address sanitizer buildbots. We
can fix it by passing the pass ID instead so that allocation is
deferred.
I built it and confirmed this fixes the issue locally.

Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>
Currently the irdl dialect page has no content beyond the header.

By referring to the Ops.td in the CMake config, it pulls in all the
types, attributes, etc., so that the doc generation can include them all
in the page.

Rendered locally to confirm it fixes the issue


![image](https://github.com/llvm/llvm-project/assets/2467754/8758f324-6bc3-4f0e-8fa9-8962cdb0177f)

Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>
…ariables before consuming it (llvm#92218)"

This reverts commit 3a4c1b9.

This breaks a bot on clang-s390x-linux
This field is present in LLVM, but was missing from the MLIR wrapper
type. This addition allows MLIR languages to add proper DWARF info for
GPU programs.
In .macro, \+ expands to the per-macro invocation count.
https://sourceware.org/pipermail/binutils/2024-May/134009.html

\+ counts from 0 for .irp/.irpc/.rept.

Note: We currently print \q for `.print "\q"` while gas doesn't. This
patch does not change this behavior.
If there is only one non-terminator operation in the update region then
the update operation can be found and we can try to generate an
atomicrmw instruction. Otherwise use the cmpxchg loop.

Fixes llvm#91929
Addresses old TODO about the exp10 intrinsic not existing.
)

Unsupported ops on tile types can become dead after
`-convert-arm-sme-to-llvm` resulting in incorrect results. Verify such
operations don't exist post-conversion and fail if they do.

Based on discussion from
https://discourse.llvm.org/t/on-improving-arm-sme-lowering-resilience-in-mlir/78543
Base automatically changed from bump_to_ecce5ccd to feature/fused-ops September 3, 2024 20:08
An error occurred while trying to automatically change base from bump_to_ecce5ccd to feature/fused-ops September 3, 2024 20:08
@mgehre-amd mgehre-amd merged commit b8d108f into feature/fused-ops Sep 3, 2024
11 checks passed
@mgehre-amd mgehre-amd deleted the bump_to_83891777 branch September 3, 2024 20:08