[AutoBump] Merge with 83891777 (May 16) (48) #307

mgehre-amd · 2024-08-23T22:17:03Z

No description provided.

As mentioned in llvm#68882 and https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699, GEP arithmetic isn't consistent across different types. GVNSink didn't account for this and sank all GEPs as long as their operands could be wired via PHIs in a post-dominator. Fixes: llvm#85333. Reapply: llvm#88440 after fixing the non-determinism issues in llvm#90995.
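To illustrate why the source element type matters, here is a minimal Python sketch of GEP offset arithmetic. The `ELEMENT_SIZE` table and the `gep_byte_offset` helper are hypothetical names used only for illustration; the sizes assume a typical data layout and are not taken from the patch.

```python
# Assumed element sizes in bytes for a typical data layout (illustrative only).
ELEMENT_SIZE = {"i8": 1, "i32": 4, "i64": 8}


def gep_byte_offset(source_type: str, index: int) -> int:
    """Byte offset added by a single-index GEP over `source_type`.

    Models `getelementptr <source_type>, ptr %p, i64 <index>`: the index is
    scaled by the size of the source element type.
    """
    return ELEMENT_SIZE[source_type] * index


# The same index yields different addresses for different element types,
# so two such GEPs are not interchangeable even if their operands match.
offset_i32 = gep_byte_offset("i32", 1)
offset_i64 = gep_byte_offset("i64", 1)
```

With the same pointer and index operands, the `i32` GEP advances by 4 bytes and the `i64` GEP by 8, which is why merging them behind a PHI of the operands alone would change the computed address.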

Adds the LLVM vector.deinterleave2 intrinsic to the MLIR LLVM dialect. The deinterleave2 intrinsic takes a vector and returns two vectors, the first holding the even-indexed elements and the second the odd-indexed elements of the input vector. It is the inverse of vector.interleave2.
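The element selection described above can be modeled on plain Python lists. This is only a sketch of the semantics; `deinterleave2` and `interleave2` here are ordinary Python functions written for illustration, not bindings to the intrinsics.

```python
def deinterleave2(vec):
    """Split `vec` into (even-indexed elements, odd-indexed elements)."""
    return vec[0::2], vec[1::2]


def interleave2(even, odd):
    """Inverse operation: alternate elements from `even` and `odd`."""
    out = []
    for a, b in zip(even, odd):
        out.extend((a, b))
    return out


evens, odds = deinterleave2([10, 11, 12, 13, 14, 15])
roundtrip = interleave2(evens, odds)
```

Here `evens` holds the elements at indices 0, 2, 4 and `odds` those at indices 1, 3, 5; interleaving them again recovers the original list.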

…0448) This patch rewrites the ArmSME tile allocator to use liveness information to make better tile allocation decisions and improve the correctness of the ArmSME dialect. The algorithm used here is a linear scan over live ranges, where live ranges are assigned to tiles as they appear in the program (chronologically). Live ranges release their assigned tile ID when the current program point is past their end. This is a greedy algorithm, chosen mainly to keep the implementation relatively straightforward and because it seems to be sufficient for most kernels (e.g. matmuls) that use ArmSME. The general steps are roughly from https://link.springer.com/content/pdf/10.1007/3-540-45937-5_17.pdf, though there have been a few simplifications and assumptions made for our use case.

Hopefully, the only changes needed for a user of the ArmSME dialect are that:

- `-allocate-arm-sme-tiles` will no longer be a standalone pass
- `-test-arm-sme-tile-allocation` is only for unit tests
- `-convert-arm-sme-to-llvm` must happen after `-convert-scf-to-cf`
- SME tile allocation is now part of the LLVM conversion

By integrating this into the `ArmSME -> LLVM` conversion we can allow high-level (value-based) ArmSME operations to be side-effect-free, as we can guarantee nothing will rearrange ArmSME operations before we emit intrinsics (which could invalidate the tile allocation). The hope is for ArmSME operations to have no hidden state/side effects and to allow easily lowering dialects such as `vector` and `arith` to SME, without making assumptions about how the input IR looks, as the semantics of the operations will be the same. That is, there are no (new) side effects and the IR follows the rules of SSA (a value will never change). The aim is correctness, so we have a base for working on optimizations.
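The greedy linear scan described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the allocator from the patch: live ranges are hypothetical `(start, end)` pairs with an inclusive end, tiles are plain integer IDs, and running out of tiles simply raises an error (how the real pass handles that case is not modeled here).

```python
import heapq


def allocate_tiles(live_ranges, num_tiles):
    """Greedy linear scan: map each live range name to a tile ID.

    live_ranges: dict of name -> (start, end), with `end` inclusive.
    num_tiles:   number of available tile IDs (0 .. num_tiles - 1).
    """
    free = list(range(num_tiles))
    heapq.heapify(free)          # lowest free tile ID first
    active = []                  # min-heap of (end, tile_id) for live ranges
    assignment = {}

    # Visit live ranges in order of their start point (chronologically).
    for name, (start, end) in sorted(live_ranges.items(), key=lambda kv: kv[1][0]):
        # Release tiles of ranges that ended before the current program point.
        while active and active[0][0] < start:
            _, tile = heapq.heappop(active)
            heapq.heappush(free, tile)
        if not free:
            raise RuntimeError(f"no free tile for live range {name!r}")
        tile = heapq.heappop(free)
        assignment[name] = tile
        heapq.heappush(active, (end, tile))
    return assignment


ranges = {"a": (0, 4), "b": (1, 2), "c": (3, 6), "d": (5, 7)}
tiles = allocate_tiles(ranges, num_tiles=2)
```

In this example `b` ends before `c` starts, so `c` reuses the tile that `b` held, and `d` reuses the tile released by `a`; two tiles are enough even though there are four live ranges.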

PR llvm#80680 added bits in the codegen to lazily add convergence intrinsics when required. This logic relied on the LoopStack. The issue is when parsing the condition, the loopstack doesn't yet reflect the correct values, as expected since we are not yet in the loop. However, convergence tokens should sometimes already be available. The solution which seemed the simplest is to greedily generate the tokens when we generate SPIR-V. Fixes llvm#88144 --------- Signed-off-by: Nathan Gauër <brioche@google.com>

…llvm#92067) cm.push can't save X26 without also saving X27. This removes two other checks for this case. This causes CFI to be emitted since X27 is now explicitly a callee saved register. The affected tests use inline assembly to clobber X26 rather than the whole range of s0-s10.

Allow mixing objects with/without signed class ro data and category class properties as long as it happens before we register the metadata. These combinations are a warning in ld, not a hard error. The only case that is ABI-breaking is if we already registered with the feature enabled but later try to load an object that doesn't support it. rdar://127336061

tryToCreateDiffCheck has one caller, and exits early if CanUseDiffCheck is false. Hence, we can get/set CanUseDiffCheck in the caller to avoid wastefully calling tryToCreateDiffCheck. This patch is an NFC simplification of program logic.

The target combine is no longer required because InstCombine will transform the DIV by a power of 2 into a multiply, so in practice this case will never trigger. Additionally, the generated code would have been incorrect for streaming(-compatible) functions, because it assumed NEON was available.

…lvm#92086) self.wait_for_running_event(process) is always called after self.runCmd("continue"). It is strange to expect eStateConnected here. This test failed in case of a remote target. The correct state is eStateRunning. Removed incorrect checking.

) The cost of `experimental.cttz.elts` in RISC-V equals the cost of vfirst when the zero_is_poison argument is true. Otherwise, we add the additional cost of cmp + select to convert the -1 result from vfirst to EVL.

…#90500) Currently, clang postpones all semantic analysis of unary operators with operands of pointer/pointer to member/array/function type until instantiation whenever that type is dependent (e.g. `T*` where `T` is a type template parameter). Consequently, the uninstantiated AST nodes all have the type `ASTContext::DependentTy` (which, for the purposes of llvm#90152, is undesirable as that type may be the current instantiation! (e.g. `*this`)) This patch moves the point at which we perform semantic analysis for such expression to be prior to instantiation.

@kiranchandramohan

Currently, only those global variables which are at compile unit scope are added to the 'globals' list of the DICompileUnit. This does not work for languages which support modules (e.g. Fortran), where the hierarchy can be variable -> module -> compile unit. To fix this, if a variable's scope points to a module, we walk one level up and check whether the module is in the compile unit scope. This was initially part of llvm#91582, which adds debug information for Fortran module variables. @kiranchandramohan pointed out that MLIR changes should go in separate PRs.
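The one-level scope walk described above can be sketched as follows. The `Scope` class and `owning_compile_unit` function are hypothetical stand-ins for the debug-info scope hierarchy, written only to illustrate the lookup; they are not the MLIR or LLVM API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scope:
    """Toy debug-info scope: a kind plus an optional parent scope."""
    kind: str                      # e.g. "compile_unit", "module", "subprogram"
    parent: Optional["Scope"] = None


def owning_compile_unit(var_scope: Scope) -> Optional[Scope]:
    """Return the compile unit whose 'globals' list should hold the variable.

    A variable directly at compile unit scope qualifies, and so does a
    variable in a module that itself sits directly in a compile unit.
    """
    if var_scope.kind == "compile_unit":
        return var_scope
    if (
        var_scope.kind == "module"
        and var_scope.parent is not None
        and var_scope.parent.kind == "compile_unit"
    ):
        return var_scope.parent   # walk exactly one level up
    return None


cu = Scope("compile_unit")
fortran_module = Scope("module", parent=cu)
found = owning_compile_unit(fortran_module)
```

A variable scoped to `fortran_module` resolves to `cu`, while a variable in any other kind of scope yields `None` and is left out of the list in this model.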

…#91579) This patch adds nsw flag to the increment of do-variables when a new option is enabled. NOTE 11.10 in the Fortran 2018 standard says they never overflow. See also the discussion in llvm#74709 and the following discourse post. https://discourse.llvm.org/t/rfc-add-nsw-flags-to-arithmetic-integer-operations-using-the-option-fno-wrapv/77584/5

…m#90578) This patch adds support for the GNU extension intrinsic ETIME (llvm#84205). Usage information and an example have been added to `flang/docs/Intrinsics.md`. The patch contains both the lowering and the runtime code and works on both Windows and Linux.

| System  | Implementation  |
|---------|-----------------|
| Windows | GetProcessTimes |
| Linux   | times           |

…ter (llvm#92303) As noted in llvm#91440 (comment), if the pass pipeline stops early because of -stop-after, any allocated passes added with insertPass will not be freed if they haven't already been added. This was showing up as a failure on the address sanitizer buildbots. We can fix it by passing the pass ID instead, so that allocation is deferred.

Currently the irdl dialect page has no content beyond the header. By referring to the Ops.td in the CMake config, it pulls in all the types, attributes, etc., so that the doc generation can include them all in the page. Rendered locally to confirm it fixes the issue ![image](https://github.com/llvm/llvm-project/assets/2467754/8758f324-6bc3-4f0e-8fa9-8962cdb0177f) Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>

In .macro, \+ expands to the per-macro invocation count. https://sourceware.org/pipermail/binutils/2024-May/134009.html \+ counts from 0 for .irp/.irpc/.rept. Note: We currently print \q for `.print "\q"` while gas doesn't. This patch does not change this behavior.

) Unsupported ops on tile types can become dead after `-convert-arm-sme-to-llvm` resulting in incorrect results. Verify such operations don't exist post-conversion and fail if they do. Based on discussion from https://discourse.llvm.org/t/on-improving-arm-sme-lowering-resilience-in-mlir/78543

hiraditya and others added 30 commits May 14, 2024 06:13

Cope with MCOperand null Insts (llvm#91794)

cfa0947

MCOperand has a constructor that permits a nullptr MCInst, and BOLT makes use of that. Adjust MCOperand's dumper to permit such use.

[BOLT][NFC] Simplify CFG validation (llvm#91977)

725014d

Remove 'Valid' local boolean that has a single use, and return directly instead.

[Clang] Fix dependency computation for pack indexing expression (llvm…

312f83f

…#91933) Given `foo...[idx]` if idx is value dependent, the expression is type dependent. Fixes llvm#91885 Fixes llvm#91884

[BOLT][NFC] Simplify successor check (llvm#91980)

1aff294

Remove excess parentheses and use `boolean ? true-case : false-case` idiom.

[OpenACC] Fix ast-print of device_type clause

03eba20

When writing the test for this I seemingly forgot to put 'CHECK' on the lines, so I didn't notice that I was printing the identifiers as pointers rather than their names. This patch corrects the tests and the print behavior.

[libclc] Clarify condition expression (NFC)

e60b83a

Closes llvm#91188

[GlobalIsel][AArch64] fix out of range access in regbankselect (llvm#…

d422e90

…92072) Fixes llvm#92062

[AArch64] Postcommit fixes for histogram intrinsic (llvm#92095)

2b15c4a

A buildbot with expensive checks enabled flagged some problems with my patch. There was also a post-commit nit on the langref changes.

[PowerPC][test] Catch any exception when retrieving git revision (llv…

d9db266

…m#92004) This makes the `vc-rev-enabled` feature unsupported if we fail to retrieve the git revision for any reason, such as if git is not installed.

[Clang] Retain the angle loci for invented template parameters of con…

8070b2d

…straints (llvm#92104) Clangd uses it to determine whether the argument is within the selection range. Fixes clangd/clangd#2033

[Support] Add option to print SMDiagnostic into a buffer without the …

a4accdf

…filename and location info (llvm#92050)

Update documentation for buffer fat pointers (llvm#92034)

ac0d415

Now that we've got (minus some issues around datatypes and invariant loads) working lowerings for address space 7, update the table in the AMDGPU usage guide to properly indicate the nature of these address spaces.

[LoongArch] Add test cases for div/mod to cover various extended comb…

82434c7

…inations of 32-bit integers. NFC

[ARM] iabs.ll - regenerate test checks

b2c5e9b

Restore llvm#91137 (llvm#92003)

2ff43ce

llvm#91137 reverted in llvm#92001 A build error fix added in 28d5ece --------- Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>

[libc][bazel] Updates for 292b300

344c73e

RISCVAsmParser: Make diagnostics more conventional

5f7477a

Most diagnostics obey https://llvm.org/docs/CodingStandards.html#error-and-warning-messages but some diverge. Fix them. While here, adjust some diagnostics. Pull Request: llvm#92024

Fix Bazel Build (llvm#92139)

1355dcb

slydiman and others added 24 commits May 16, 2024 07:44

[lldb] Fixed the TestFdLeak test (llvm#92273)

ce961c5

Use `os.devnull` instead of `/dev/null`.
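As a minimal sketch of the portability point, the snippet below writes to the platform's null device through `os.devnull` rather than a hard-coded `/dev/null` path. The `write_to_null` helper is a hypothetical name for illustration and is not taken from the test itself.

```python
import os


def write_to_null(data: bytes) -> int:
    """Discard `data` by writing it to the platform's null device.

    os.devnull is "/dev/null" on POSIX systems and "nul" on Windows,
    so the same code works on both.
    """
    with open(os.devnull, "wb") as sink:
        return sink.write(data)


written = write_to_null(b"discarded output")
```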

[clang-format][NFC] Reformat with 18.1.5

b11a660

[MCAsmParser] Simplify expandMacro

3cc445a

The error checking is only for .macro directives. Move it to the .macro parser to remove one parameter.

[LegalizeVectorOps][X86] Add ISD::ABDS/ABSDU to the list of opcodes h…

f2d7400

…andled by LegalizeVectorOps. (llvm#92332) The expand code is present, but we were missing the type query code so the nodes would be ignored until LegalizeDAG.

[RISCV] Pass subvector type to isLegalInterleavedAccessType in getInt…

487b43c

…erleavedMemoryOpCost. (llvm#91825) isLegalInterleavedAccessType expects the subvector type, but getInterleavedMemoryOpCost is called with the full vector type. So we need to divide by Factor.
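The "divide by Factor" step amounts to the following arithmetic, shown here as a small Python sketch. The function name `subvector_num_elements` is hypothetical, and rejecting element counts that are not a multiple of the factor is an assumption of this sketch rather than something stated in the commit.

```python
def subvector_num_elements(full_num_elements: int, factor: int) -> int:
    """Element count of one interleaved subvector.

    An interleaved access with the given factor over a full vector of
    `full_num_elements` elements is made of `factor` subvectors of equal size.
    """
    if factor <= 0 or full_num_elements % factor != 0:
        raise ValueError("element count must be a positive multiple of factor")
    return full_num_elements // factor


# A full vector of 8 elements with interleave factor 2 has 4-element subvectors.
sub = subvector_num_elements(8, 2)
```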

[ORC] Support visionOS in LC_BUILD_VERSIONs for JITDylibs.

6bf1859

rdar://127846581

[clang] NFC: Add a few more interesting test cases for CWG2398

70a926c

[mlir] fix polynomial docs for MLIR website (llvm#92348)

5bd8091

I built it and confirmed this fixes the issue locally. Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>

Revert "[Serialization] Read the initializer for interesting static v…

3c2638d

…ariables before consuming it (llvm#92218)" This reverts commit 3a4c1b9. This breaks a bot on clang-s390x-linux

[MLIR][LLVM] add dwarfAddressSpace to DIDerivedType (llvm#92043)

5c35b63

This field is present in LLVM, but was missing from the MLIR wrapper type. This addition allows MLIR languages to add proper DWARF info for GPU programs.

Revert "[flang] Add ETIME runtime and lowering intrinsics implementat…

6706aeb

…ion" (llvm#92354) Reverts llvm#90578 This broke the premerge linux buildbot.

[clang][Interp] Implement __builtin_shufflevector

45cc6bd

[github] Add keith back to bazel codeowners

e27f9bb

Wrongly removed in 45cc6bd.

[Flang][OpenMP] Fix update operation not found issue (llvm#92165)

89ee3ae

If there is only one non-terminator operation in the update region then the update operation can be found and we can try to generate an atomicrmw instruction. Otherwise use the cmpxchg loop. Fixes llvm#91929

[lld][AArch64][ELF][PAC] Support .relr.auth.dyn section (llvm#87635)

ca1f0d4

Support `R_AARCH64_AUTH_RELATIVE` relocation compression as described in https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst#relocation-compression

InstCombine: Try to use exp10 intrinsic instead of libcall (llvm#92287)

ce1ce5d

Addresses old TODO about the exp10 intrinsic not existing.

SimplifyLibCalls: Use IRBuilder helpers for creating intrinsics (llvm…

8389177

…#92288)

[AutoBump] Merge with 8389177 (May 16)

5f25d08

mgehre-amd requested a review from cferry-AMD August 26, 2024 09:08

cferry-AMD approved these changes Aug 26, 2024

Base automatically changed from bump_to_ecce5ccd to feature/fused-ops September 3, 2024 20:08

An error occurred while trying to automatically change base from bump_to_ecce5ccd to feature/fused-ops September 3, 2024 20:08

mgehre-amd merged commit b8d108f into feature/fused-ops Sep 3, 2024
11 checks passed

mgehre-amd deleted the bump_to_83891777 branch September 3, 2024 20:08
