Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AutoBump] Merge with e55d6f5e (Sep 11) (22) #376

Open
wants to merge 858 commits into
base: bump_to_37263b6c
Choose a base branch
from

Conversation

mgehre-amd
Copy link
Collaborator

No description provided.

alexey-bataev and others added 30 commits September 9, 2024 12:33
)

Migrate CodeGenHWModes to use const RecordKeeper and const Record
pointers.

This is a part of effort to have better const correctness in TableGen
backends:


https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
The `@llvm.dx.typedBufferLoad` intrinsic is lowered to `@dx.op.bufferLoad`.
There's some complexity here in translating to scalarized IR, which I've
abstracted out into a function that should be useful for samples, gathers, and
CBuffer loads.

I've also updated the DXILResources.rst docs to match what I'm doing here and
the proposal in llvm/wg-hlsl#59. I've removed the content about stores and raw
buffers for now with the expectation that it will be added along with the work.

Note that this change includes a bit of a hack in how it deals with
`getOverloadKind` for the `dx.ResRet` types - we need to adjust how we deal
with operation overloads to generate a table directly rather than proxy through
the OverloadKind enum, but that's left for a later change here.

Part of llvm#91367

Pull Request: llvm#104252
The inconsistency surfaced in
llvm#95305. Split off the reduce
the diff.
…lvm#107899)

This reapplies 8fa66c6 ([asan][windows]
Eliminate the static asan runtime on windows) for a second time.

That PR bounced off the tests because it caused failures in the other
sanitizer runtimes, these have been fixed by only building interception,
sanitizer_common, and asan with /MD, and continuing to build the rest of
the runtimes with /MT. This does mean that any usage of the static
ubsan/fuzzer/etc runtimes will mean you're mixing different runtime
library linkages in the same app, the interception, sanitizer_common,
and asan runtimes are designed for this, however it does result in some
linker warnings.

Additionally, it turns out when building in release-mode with
LLVM_ENABLE_PDBs the build system forced /OPT:ICF. This totally breaks
asan's "new" method of doing "weak" functions on windows, and so
/OPT:NOICF was explicitly added to asan's link flags.

---------

Co-authored-by: Amy Wishnousky <amyw@microsoft.com>
Fills in many missing functions from VectorType
This information helps with tuning the heuristic of selecting memory
groups to release the unused pages.
Fix a bug that `lto_runtime_lib_symbols_list` is returning the address
of a local variable that will be freed when getting out of scope. This
is a regression from llvm#98512 that rewrites the runtime libcall function
lists into a SmallVector.

rdar://135559037
partially fixes llvm#70078

### Changes
- Added `int_spv_sign` intrinsic in `IntrinsicsSPIRV.td`
- Added lowering and map to `int_spv_sign in
`SPIRVInstructionSelector.cpp`
- Added SPIR-V backend test case in
`llvm/test/CodeGen/SPIRV/hlsl-intrinsics/sign.ll`

### Related PRs
- llvm#101988
- llvm#101989
…lvm#107858)

Change CGIOperandList::OperandInfo::Rec and CGIOperandList::TheDef to
const pointer.

This is a part of effort to have better const correctness in TableGen
backends:


https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
This patch implements the Pass base class and the FunctionPass sub-class
that operate on Sandbox IR.
…07919)

Fixes generation of invalid loads leading to misaligned access errors.
The bug got exposed by SLP vectorizer change ec360d6 which allowed SLP
to produce `v16i8` vectors.

Also updated the tests to use automatic check generator.
shifts are the same as sub where rhs == 0 is identity.
and is the inverted case where:
    `SELECT (AND(X,1) == 0), (AND Y, Z), Y`
        -> `(AND Y, (OR NEG(AND(X, 1)), Z))`
With -1 as the identity.

Closes llvm#107910
…m#107498)

Apparently, there are two almost identical implementations: one for
MachO and another one for ELF. The ELF bits somehow slipped while
llvm#84573 was reviewed.

The particular implementation is identical to MachO case.
This patch implements sandboxir::UndefValue mirroring llvm::UndefValue.
Lower `fcopysign` SDNodes into `copysign` PTX instructions where
possible. See [PTX ISA: 9.7.3.2. Floating Point Instructions: copysign]
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#floating-point-instructions-copysign).
…vm#107499)

This patch enables experimenting with the contextual profile. ICP is currently disabled in this case - will reenable it subsequently. Also subsequently the inline cost model / decision making would be updated to be context-aware. Right now, this just achieves "complete use" of the profile, in that it's ingested, maintained, and sunk to a flat profile when not needed anymore.

Issue [llvm#89287](llvm#89287)
This patch drop redundant rankReductionStrategy in
`populateFoldUnitExtentDimsViaSlicesPatterns` and fixes comment typos.
…lvm#107432)

After llvm#92205, LoongArch ISel
selects `div.w` for `trunc i64 (sdiv i64 3202030857, (sext i32 X to
i64)) to i32`. It is incorrect since `3202030857` is not a signed 32-bit
constant. It will produce wrong result when `X == 2`:
https://alive2.llvm.org/ce/z/pzfGZZ

This patch adds additional `sexti32` checks to operands of
`PatGprGpr_32`.
Alive2 proof: https://alive2.llvm.org/ce/z/AkH5Mp

Fix llvm#107414.
…#107909)

Make constructors that take const Record * implicit, allowing us to
simplify some range based loops to use that class instance as the loop
variable.

Change remaining constructor calls to use () instead of {} to construct
objects.
Fixes: llvm#107355

Reviewed By: SixWeining

Pull Request: llvm#107523
After 2773719

e.g.

```
external/llvm-project/libc/test/src/math/smoke/NextTowardTest.h:12:10: error: module llvm-project//libc/test/src/math/smoke:nexttowardf_test does not depend on a module exporting 'src/__support/CPP/bit.h'
```
…lvm#107897)

After landing llvm#99285 we found
that the call graph update was causing the following crash when
expensive checks are turned on
```
llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:982: LazyCallGraph::SCC &updateCGAndAnalysisManagerForPass(LazyCallGraph &, LazyCallGraph::SCC &, LazyCallGraph::Node &, CGSCCAnalysisManager &, CGSCCUpdateResult &, FunctionAnalysisManager &, bool): Assertion `(RC == &TargetRC || RC->isAncestorOf(Targe
tRC)) && "New call edge is not trivial!"' failed.                                                                                                                                                                                                                                                                               
```
I have to admit I believe that the call graph update process I did for
that patch could be wrong.

After reading the code in `CGSCCToFunctionPassAdaptor`, I am convinced
that `CoroAnnotationElidePass` can be a FunctionPass and rely on the
adaptor to update the call graph for us, so long as we properly
invalidate the caller's analyses.

After this patch,
`llvm/test/Transforms/Coroutines/coro-transform-must-elide.ll` no longer
fails under expensive checks.
We shouldn't assume that we're using system zlib installation.
This patch tries to infer is-power-of-2 from assumptions. I don't see
that this kind of assumption exists in my dataset.
Related issue: rust-lang/rust#129795

Close llvm#58996.
Data type conversion between fp16 and bf16 will generate fptrunc and
fpextend nodes, but they are actually bitcast nodes.
partially fixes llvm#70078

### Changes
- Implemented `sign` clang builtin
- Linked `sign` clang builtin with `hlsl_intrinsics.h`
- Added sema checks for `sign` to `CheckHLSLBuiltinFunctionCall` in
`SemaChecking.cpp`
- Add codegen for `sign` to `EmitHLSLBuiltinExpr` in `CGBuiltin.cpp`
- Add codegen tests to `clang/test/CodeGenHLSL/builtins/sign.hlsl`
- Add sema tests to `clang/test/SemaHLSL/BuiltIns/sign-errors.hlsl`

### Related PRs
- llvm#101987
- llvm#101988

### Discussion
- Should there be a `usign` intrinsic that handles the unsigned cases?
kazutakahirata and others added 29 commits September 11, 2024 06:40
Handle the case of same pointer used as both inputs to the
`CompareOptionRecords`, to avoid emitting errors for equivalent options.

Follow-up to llvm#107696.
…lvm#108207)

We will deref<>() it later, so this is the right check.
…lvm#53045

Add tests for "sub(select(icmp(a,b),a,b),select(icmp(a,b),b,a)) -> abd(a,b)" patterns that still fail to match to abd nodes

This will hopefully be helped by llvm#108218
Vector cases are broken, so leave those for later.
Check cost of all instructions in an interleave group, to prepare for
follow-up changes.
…uilding (llvm#108076)

* Split buildCoroutineFrame into code related to normalization and code
related to actually building the coroutine frame.
* This will enable future specialization of buildCoroutineFrame for
different ABIs while the normalization can be done by splitCoroutine
prior to calling buildCoroutineFrame.

See RFC for more info:
https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057
This is a three deep expression which is deeper than we've otherwise
gone for multiple expansions, but I think it's reasonable to do so. This
covers mul by 50, 100, and 200 which are reasonably common naturally
arising numbers.
…lvm#107785)"

This reverts commit 15106c2.  Commit does
not pass check-flang on x86 host.
This is a large patch includes the MC level support for V_CVT_F16_F32,
V_CVT_F32_F16 and V_LDEXP_F16 in true16 format.

This patch includes the asm/disasm changes to encode/decode the 16bit
vsrc, vdst and src modifieres for vop and dpp format. This patch is a
dependency for many 16 bit instructions while only three instructions
are updated to make it easier to review.

There will be another patch to support these three instructions in the
codeGen level, this patch just replaces these two instructions with its
fake16 format.
This return is dead code as the return just above will always be taken.
There was a mistake in a comment regarding dyn_cast_or_null deprication.
It was suggested to use cast_if_present instead of dyn_cast_or_null, but
that was probably a copy paste mistake, and dyn_cast_if_present is the
function that should be used instead of dyn_cast_or_null.

Authored-by: Ofri Frishman <ofri.frishman@mobileye.com>
…#108215)

Right now `describe()`ing a `FunctionDecl` dups the whole code of the
function. Dump only its name.
…108020)

The check for IV increments in collectUsersInEntryBlock currently
triggers for exit-block PHIs which use the IV start value, resulting in
us failing to add the input value for the middle block to these PHIs.

Fix this by amending the check for IV increments to only include
incoming values that are instructions inside the loop.

Fixes llvm#108004
…7921)

Change CodeGenInstruction::{TheDef, InfereredFrom} to const pointers.

This is a part of effort to have better const correctness in TableGen
backends:


https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
Print a warning when the debugger detects a mismatch between the MD5
checksum in the DWARF 5 line table and the file on disk. The warning is
printed only once per file.
…lvm#108013)

Change SubtargetFeatureInfo to use const Record pointers.

This is a part of effort to have better const correctness in TableGen
backends:


https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
This is a preparation for upcoming changes to Dense[Map|Set] regarding
hardening against OOM scenarios (see [this
RFC](https://discourse.llvm.org/t/rfc-malfunction-safe-densemap-denseset/81036/7)).
We have changed a lot of code inside Dense[Map|Set] and this preparation
change helps to isolate the relevant parts from pure formatting stuff.
This makes it slightly easier to see what's different between the two.
…lvm#107889)

Always generate v_cndmask_b32 instead of modifying exec around
v_mov_b32. This is expected to be faster because
modifying exec generally causes pipeline stalls.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment