[AutoBump] Merge with 894d3eeb (Aug 15) (4) #357

Open
wants to merge 132 commits into
base: feature/fused-ops

mgehre-amd
Collaborator

No description provided.

wldfngrs and others added 30 commits August 14, 2024 08:22
…, fsqrt(, l, f128) to math.yaml. (llvm#103494)

Added auto function hdrgen specification for functions:  totalordermag(,f, l, f128), dsqrt(l, f128), fsqrt(, l, f128)
Also combine the GlobalISel tests into the SelectionDAG ones.
This commit adds three matchers that unlike the m_NonZero matcher
not only match constants, but also operations that implement the
InferIntRangeInterface. These matchers can then match a non-zero value
or a value that is not minus one based on the inferred range. Additionally,
the commit uses the new matchers in the getSpeculatability functions of
Arith's signed and unsigned integer divisions. At the moment, the
matchers only look at the defining operation to avoid expensive IR walks.

These range-based matchers can be useful when hoisting divisions out of
a loop, which requires knowing the divisor is non-zero and not minus one
for signed divisions. Just checking for a constant divisor may not be
sufficient, if the divisor is, for example, the result of an operation that
returns the number of threads of a team of threads.
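
As a rough illustration of the range-based check described above, the sketch below models an inferred signed range as a closed interval and asks whether it rules out zero and minus one. `IntRange`, `excludesZero`, `excludesMinusOne`, and `isSignedDivSpeculatable` are hypothetical names chosen for this sketch; they are not the MLIR matchers or the `InferIntRangeInterface` API.

```cpp
#include <cstdint>

// Hypothetical stand-in for an inferred signed integer range [lo, hi].
struct IntRange {
  int64_t lo;
  int64_t hi;
};

// True if no value in the range can be zero.
constexpr bool excludesZero(IntRange r) { return r.lo > 0 || r.hi < 0; }

// True if no value in the range can be minus one.
constexpr bool excludesMinusOne(IntRange r) { return r.lo > -1 || r.hi < -1; }

// A signed division can be hoisted only if the divisor is known to be
// neither zero (division by zero) nor minus one (overflow for MIN / -1).
constexpr bool isSignedDivSpeculatable(IntRange divisor) {
  return excludesZero(divisor) && excludesMinusOne(divisor);
}
```

Under these assumptions, a divisor such as a thread count with range `[1, 1024]` passes the check even though it is not a constant, while a range such as `[-4, 4]` does not.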
Allow subvector extraction as long as at least one operand extraction is free.

Refactor existing cases into a switch statement to allow easier reuse + future expansion.
)

It seems that the parameters can be passed through the class members.
...because it is too noisy to be useful right now, and its architecture
is terrible, so it can't act as a starting point for future development.

The main problem with this checker is that it tries to do (or at least
fake) path-sensitive analysis without actually using the established
path-sensitive analysis engine.

Instead of actually tracking the symbolic values and the known
constraints on them, this checker blindly gropes the AST and uses
heuristics like "this variable was seen in a comparison operator
expression that is not a loop condition, so it's probably not too large"
(which was improved in a separate commit to at least ignore comparison
operators that appear after the actual `malloc()` call).

This might have been acceptable in 2011 (when this checker was added),
but since then we developed a significantly better standard approach for
analysis and this old relic doesn't deserve to remain in the codebase.

Needless to say, this primitive approach causes lots of false positives
(and presumably false negatives as well), which ensures that this alpha
checker won't be missed by the users.

Moreover, the goals of this checker would be questionable even if it had
a perfect implementation. It's very aggressive to assume that the
argument of malloc can overflow by default (unless the checker sees a
bounds check); and this produces too many false positives -- perhaps
even for an optin checker. It may be possible to eventually create a
useful (and properly path-sensitive) optin checker for these kinds of
suspicious code, but this is a very low priority goal.

Also note that we already have `alpha.security.TaintedAlloc` which
provides more practical heuristics for detecting somewhat similar
"argument of malloc may be too large" vulnerabilities.
…sions

There's some coverage in RISCVISAInfoTest, but it's worth adding a quick
test to ensure nothing happens to the frontend handling of this option.
When instantiating a delayed template, the recorded token stream is
passed to `Parser::ParseLateTemplatedFuncDef` which will append the
current token "so it doesn't get lost". With incremental extensions
enabled, this is `repl_input_end` which subsequently needs support for
(de)serialization.
… order (llvm#102844)

Put the newest standards first, same as for the [C++ status
page](https://clang.llvm.org/cxx_status.html).

The diff is pretty busted, but I swear I copy & pasted faithfully 😅 

The only change beyond shuffling sections around is unfolding the
sections for C99/C11 (6dbce28), which
isn't necessary anymore now that they're safely tucked away towards the
end of the page.
…file (llvm#103004)"

This reverts commit 2d53f0a.

This causes warnings when building with MSVC.
getRawData exposes some internal details of APInt.

The code was iterating over the uint64_t pieces and then breaking each
of them into 4 uint16_t pieces.

This patch changes the code to extract 16-bit pieces directly from the
APInt without using getRawData.
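
A minimal sketch of the idea, assuming a toy wide-integer type rather than `llvm::APInt`: the caller asks for each 16-bit piece by bit position and never touches the underlying 64-bit words. `WideInt`, `extract16`, and `split16` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for an arbitrary-precision integer; NOT llvm::APInt.
class WideInt {
public:
  WideInt(unsigned bitWidth, std::vector<uint64_t> words)
      : bitWidth_(bitWidth), words_(std::move(words)) {
    assert(words_.size() * 64 >= bitWidth_ && "not enough storage words");
  }

  unsigned bitWidth() const { return bitWidth_; }

  // Returns the 16 bits starting at bitPos without exposing the word layout.
  // Because 16 divides 64, an aligned piece never straddles two words.
  uint16_t extract16(unsigned bitPos) const {
    assert(bitPos % 16 == 0 && bitPos + 16 <= bitWidth_);
    return static_cast<uint16_t>(words_[bitPos / 64] >> (bitPos % 64));
  }

private:
  unsigned bitWidth_;
  std::vector<uint64_t> words_; // least-significant word first
};

// Collects the 16-bit pieces from least to most significant.
std::vector<uint16_t> split16(const WideInt &value) {
  std::vector<uint16_t> pieces;
  for (unsigned pos = 0; pos + 16 <= value.bitWidth(); pos += 16)
    pieces.push_back(value.extract16(pos));
  return pieces;
}
```

With this shape, the piece loop depends only on the bit width, so the storage representation can change without affecting callers.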
…te global data (llvm#101224)

This patch aims to reduce TOC usage by merging internal and private
global data.

Moreover, we also add the GlobalMerge pass within the PPCTargetMachine
pipeline, which is disabled by default. This transformation can be
enabled by -ppc-global-merge.
…nstant into a signed comparison (llvm#103480)

Given an unsigned integer comparison of `add nsw X, C1` with some
constant `C2` we can fold it into a signed comparison of `X` and `C2 -
C1` under the following conditions:
  * There's a `nsw` flag on the addition
  * `C2` is non-negative
  * `X + C1` is non-negative
  * `C2 - C1` is non-negative
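
The conditions above can be sanity-checked by brute force on 8-bit values. The sketch below is not the InstCombine implementation; it only covers the `ult` predicate, and `foldApplies`, `unsignedCompare`, `signedCompare`, and `foldHoldsForAllI8` are hypothetical helper names. "Non-negative" for `C2 - C1` is modelled as the difference fitting in `[0, INT8_MAX]`, i.e. non-negative when viewed as an i8.

```cpp
#include <cassert>
#include <cstdint>

// Checks the four preconditions for 8-bit operands. Arithmetic is done in
// int so that wrap-around can be detected explicitly.
bool foldApplies(int8_t x, int8_t c1, int8_t c2) {
  const int sum = int(x) + int(c1);
  const int diff = int(c2) - int(c1);
  const bool nsw = sum >= INT8_MIN && sum <= INT8_MAX; // add does not wrap
  return nsw && c2 >= 0 && sum >= 0 && diff >= 0 && diff <= INT8_MAX;
}

// Original form: icmp ult (add nsw x, c1), c2
bool unsignedCompare(int8_t x, int8_t c1, int8_t c2) {
  const uint8_t sum = static_cast<uint8_t>(x + c1); // reduced modulo 2^8
  return sum < static_cast<uint8_t>(c2);
}

// Folded form: icmp slt x, (c2 - c1). Only meaningful when the fold applies.
bool signedCompare(int8_t x, int8_t c1, int8_t c2) {
  assert(foldApplies(x, c1, c2) && "preconditions must hold");
  return int(x) < int(c2) - int(c1);
}

// Exhaustively compares both forms over every i8 triple where the fold applies.
bool foldHoldsForAllI8() {
  for (int x = INT8_MIN; x <= INT8_MAX; ++x)
    for (int c1 = INT8_MIN; c1 <= INT8_MAX; ++c1)
      for (int c2 = INT8_MIN; c2 <= INT8_MAX; ++c2) {
        const int8_t X = static_cast<int8_t>(x);
        const int8_t C1 = static_cast<int8_t>(c1);
        const int8_t C2 = static_cast<int8_t>(c2);
        if (foldApplies(X, C1, C2) &&
            unsignedCompare(X, C1, C2) != signedCompare(X, C1, C2))
          return false;
      }
  return true;
}
```

The intuition is that once both sides of the unsigned comparison are known to be non-negative, the unsigned and signed orderings coincide, and the `nsw` flag allows `C1` to be moved to the other side without wrapping.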
…lvm#103392)

... wherever we have the Decl for it, and even when we don't keep the
SourceLocation of it aimed at the call site.

Fixes: llvm#102983
In preparation for upcoming patches, this just moves the call to
the proper place, which is NFC for now.
…p_atomics (llvm#103732)

This commit adds support for amdgpu-unsafe-fp-atomics attr plumbing
via introduction of `rocdl.unsafe_fp_atomics`.

This adds the missing translation for amdgpu-waves-per-eu attr.
…vm#103927)

This commit changes the LLVM dialect's inliner interface to no longer be
registered at dialect initialization. Instead, it is now a promised
interface, that needs to be registered explicitly. This change is
desired to avoid pulling in a lot of dependencies into the
`MLIRLLVMDialect` library, especially considering future patches that
plan to extend it further with strong IR analysis.
…rs, NFC

GatheredScalars is a full copy of the E->Scalars in these places and can
be safely used for now. Unifies the code across the function.
…to combine (srl (sra X, C1), ShAmt) -> sra(X, C1+ShAmt) (llvm#101751)

If the upper bits of the shr aren't demanded.

This helps with cases where the outer srl was originally an sra and was
converted to a srl by SimplifyDemandedBits before it had a chance to
combine with the inner sra. This can occur when the inner sra was part
of a sign_extend_inreg expansion.
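
To see why the combine is only valid when the upper bits are not demanded, the sketch below compares both forms on 32-bit values. It is an illustration, not the DAGCombiner code; `srl`, `sra`, `demandedMask`, and `combineAgrees` are hypothetical helpers, and the sketch assumes `c1 + shAmt < 32`.

```cpp
#include <cstdint>

// Logical shift right on 32-bit values (the srl node in the text above).
constexpr uint32_t srl(uint32_t x, unsigned n) { return x >> n; }

// Arithmetic shift right (the sra node), built from logical shifts so the
// result does not depend on how the compiler shifts negative signed values.
constexpr uint32_t sra(uint32_t x, unsigned n) {
  return (x & 0x80000000u) ? ~(~x >> n) : (x >> n);
}

// Mask of the bits that remain demanded when the upper shAmt bits are not.
constexpr uint32_t demandedMask(unsigned shAmt) { return 0xFFFFFFFFu >> shAmt; }

// True if srl(sra(x, c1), shAmt) and sra(x, c1 + shAmt) agree on every
// demanded bit. Assumes c1 + shAmt < 32.
constexpr bool combineAgrees(uint32_t x, unsigned c1, unsigned shAmt) {
  return (srl(sra(x, c1), shAmt) & demandedMask(shAmt)) ==
         (sra(x, c1 + shAmt) & demandedMask(shAmt));
}
```

For `x = 0xF0000000`, `c1 = 4`, `shAmt = 8`, the two forms produce `0x00FF0000` and `0xFFFF0000`: they differ only in the upper 8 bits, which is exactly the part the combine requires to be undemanded.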

There are some regressions in ARM and Thumb2.
…lvm#102952)

This PR addresses the issue detailed in
iree-org/iree#17948.

The problem occurs when distributed types are set to NULL, leading to
compilation crashes.

---------

Signed-off-by: Bangtian Liu <liubangtian@gmail.com>
…eded for explicit symbol visibility (llvm#103900)

In multiple source files, function definitions never see their
declaration in a header because it's never included, causing linker errors
when explicit symbol visibility macros/dllexport are added to the
declarations.

Most of these were originally found by @tstellar in
llvm#67502

TargetRegistry.h is needed in MCExternalSymbolizer.cpp for
createMCSymbolizer
Analysis/Passes.h is needed in LazyValueInfo.cpp and RegionInfo.cpp for
createLazyValueInfoPass and createRegionInfoPass
Transforms/Scalar.h is needed in SpeculativeExecution.cpp for
createSpeculativeExecutionPass
MaxEW707 and others added 27 commits August 14, 2024 21:51
…+ / VS2019+ (llvm#102848)

Partial fix for llvm#92204.
This PR just fixes VS2019+ since that is the suite of compilers that I
require link compatibility with at the moment.
I still intend to fix VS2017 and to update llvm-undname in future PRs.
Once those are also finished and merged I'll close out
llvm#92204.
I am hoping to get the llvm-undname PR up in a couple of weeks to be
able to demangle the VS2019+ name mangling.

MSVC 1920+ mangles placeholder return types for non-templated functions
with "@".
For example `auto foo() { return 0; }` is mangled as `?foo@@YA@XZ`.

MSVC 1920+ mangles placeholder return types for templated functions as
the qualifiers of the AutoType followed by "_P" for `auto` and "_T" for
`decltype(auto)`.
For example `template<class T> auto foo() { return 0; }` is mangled as
`??$foo@H@@YA?A_PXZ` when `foo` is instantiated as follows `foo<int>()`.

Lambdas with placeholder return types are still mangled with clang's
custom mangling since MSVC lambda mangling hasn't been deciphered yet.
Similarly any pointers in the return type with an address space are
mangled with clang's custom mangling since that is a clang extension.

We cannot augment `mangleType` to support this mangling scheme as the
mangling schemes for variables and functions differ.
auto variables are encoded with the fully deduced type, whereas auto return
types are not.
The static variables in the following two functions are mangled the same:
```
template<class T>
int test()
{
    static int i = 0; // "?i@?1???$test@H@@YAHXZ@4HA"
    return i;
}

template<class T>
int test()
{
    static auto i = 0; // "?i@?1???$test@H@@YAHXZ@4HA"
    return i;
}
```
Inside `mangleType` once we get to mangling the `AutoType` we have no
context if we are from a variable encoding or some other encoding.
Therefore it was easier to handle any special casing for `AutoType`
return types with a separate function instead of using the `mangleType`
infrastructure.
FindCountedByField can be used in more places than CodeGen. Move it into
FieldDecl to avoid layering issues.
llvm#96649)

C23 introduced new functions fminimum_num and fmaximum_num, and they
follow the minimumNumber and maximumNumber of IEEE754-2019. Let's
introduce new intrinsics to support them.

This patch introduces support only for scalar values. The
support of
  vector (vp, vp.reduce, vector.reduce),
  experimental.constrained
will be added in future patches.

With this patch, MIPSr6 and LoongArch can work out of the box with
fcanonical and fmax/fmin.

Aarch64/PowerPC64 can use the same logic as MIPSr6 and LoongArch, but
they have no fcanonical support yet.
I will add it in future patches.

The FMIN/FMAX instructions of RISC-V follow the
minimumNumber/maximumNumber of IEEE754-2019. We can just add them in a
future patch.

Background

https://discourse.llvm.org/t/rfc-fix-llvm-min-f-and-llvm-max-f-intrinsics/79735
Currently we have fminnum/fmaxnum, which behave differently on
different platforms for NUM vs sNaN:
   1) Fallback to fmin(3)/fmax(3): return qNaN.
   2) ARM64/ARM32+Neon: same as libc.
   3) MIPSr6/LoongArch/RISC-V: return NUM.

The fix of fminnum/fmaxnum to follow minNUM/maxNUM of IEEE754-2008
will be submitted as separate patches.
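
For reference, the minimumNumber behaviour that `fminimum_num` follows can be sketched in plain C++ as below. This is only an illustration of the semantics (a number wins over a NaN of either kind, and -0 is treated as smaller than +0); it ignores floating-point exception flags, and `minimumNumber` and `checkMinimumNumber` are hypothetical names, not the new intrinsics or any libc function.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Sketch of IEEE 754-2019 minimumNumber for doubles (exception flags ignored).
double minimumNumber(double a, double b) {
  if (std::isnan(a)) // a is NaN: return b, or a quiet NaN if both are NaN
    return std::isnan(b) ? std::numeric_limits<double>::quiet_NaN() : b;
  if (std::isnan(b)) // only b is NaN: the number wins
    return a;
  if (a == b) // covers -0.0 vs +0.0, which compare equal
    return std::signbit(a) ? a : b;
  return a < b ? a : b;
}

// Usage: the cases that distinguish this operation from a plain `a < b ? a : b`.
void checkMinimumNumber() {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  assert(minimumNumber(nan, 1.0) == 1.0);         // NUM beats NaN
  assert(std::signbit(minimumNumber(0.0, -0.0))); // -0 is the smaller zero
}
```

A maximumNumber sketch would mirror this with the comparison and the sign test reversed.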
This patch adds a verifier to `tosa.table` which fixes a crash. Fix
llvm#103086.
Move VPWidenStoreRecipe::execute to VPlanRecipes.cpp in line with
other ::execute implementations that don't depend on anything
defined in LoopVectorization.cpp
)

For now, the testcases are grouped in a single TEST. I'll sort them out
and add more testcases in follow-up commits.
3 MLIR tests `FAIL` on SPARC, both Solaris/sparcv9 and Linux/sparc64:
```
  MLIR :: Conversion/ArithToSPIRV/arith-to-spirv-le-specific.mlir
  MLIR :: IR/elements-attr-interface.mlir
  MLIR :: Target/LLVMIR/llvmir-le-specific.mlir
```
The issue is always the same: the tests in question are
little-endian-only currently, so this patch `XFAIL`s them on `sparc*` as
is already done for `s390x`.

Tested on `sparcv9-sun-solaris2.11`, `sparc64-unknown-linux-gnu`,
`amd64-pc-solaris2.11`, and `x86_64-pc-linux-gnu`.
…lvm#103722)

`Flang :: Lower/default-initialization-globals.f90` `FAIL`s on SPARC,
both Solaris/sparcv9 and Linux/sparc64.

The failure mode is the same as on AIX/PowerPC, so both targets being
big-endian, this patch treats them the same.

Tested on `sparcv9-sun-solaris2.11`, `sparc64-unknown-linux-gnu`,
`amd64-pc-solaris2.11`, and `x86_64-pc-linux-gnu`.
…C) (llvm#103723)

This makes `LayoutAlignElem` / `PointerAlignElem` and `AlignTypeEnum`
inner types of `DataLayout`. The types are also renamed to match their
meaning (LangRef refers to them as "specification" and "specifier").

Pull Request: llvm#103723
Removing them simplifies the content and means we don't confuse anyone
who joined after the Phabricator shutdown.

You could use them for review archaeology but this is only a subset of
the names you'd encounter there anyway. So I don't think this is a good
reason to keep them here. With a couple of exceptions the
Phabricator/GitHub names are the same and/or related to their full name
anyway.
…lvm#103730)

`Flang :: Driver/fveclib-codegen.f90` currently `FAIL`s on SPARC, both
Solaris/sparcv9 and Linux/sparc64:
```
bin/flang-new -S -Ofast -fveclib=LIBMVEC -o - /vol/llvm/src/llvm-project/local/flang/test/Driver/fveclib-codegen.f90

flang/test/Driver/fveclib-codegen.f90:11:10: error: CHECK: expected string not found in input
! CHECK: _ZGVbN4vv_powf
         ^
```
The code in question only contains calls to `powf`. Given that `glibc`
only supports `libmvec` on `aarch64` and `x86_64`, this test targets
only those if possible.

Tested on `sparcv9-sun-solaris2.11`, `sparc64-unknown-linux-gnu`,
`amd64-pc-solaris2.11`, and `x86_64-pc-linux-gnu`.
Until llvm#103056 lands
or another more appropriate check can be found.

This test fails on Ubuntu Focal where zdump is built with 32 bit time_t
but passes on Ubuntu Jammy where zdump is built with 64 bit time_t.

Marking it unsupported means Linaro can upgrade its bots to Ubuntu
Jammy without getting an unexpected pass.
This commit introduces a slicing utility that can be used to walk
arbitrary IR slices. It additionally ships logic to determine control
flow predecessors, which allows users to walk backward slices without
dealing with both `RegionBranchOpInterface` and `BranchOpInterface`.

This utility is used to improve the `noalias` propagation in the LLVM
dialect's inliner interface. Before this change, it broke down as soon
as pointers were passed through region control flow operations.
Adds m_FPToUI/m_FPToSI matchers for ISD::FP_TO_UINT/ISD::FP_TO_SINT in SDPatternMatch.h with suitable test coverage.

Fixes llvm#103872
…llvm#104037)

The target needs to be initialized in order to compute the correct
target triple from the command line. Without initialized targets the OS
component of the triple might not reflect what would be computed by the
driver for an actual compiler invocation.

Fixes llvm#61762
…pNestOp (llvm#103731)

This patch adds an assert to `genLoopNestClauses` to ensure that the
symbols and the corresponding loop wrapper entry block arguments have the
same size. This is checked by some of the callers, but it makes more
sense to move the check into the function itself and avoid having to
replicate it.
This updates the "dxil-metadata-emit" pass flag to be spelled
"dxil-translate-metadata" to better match the pass name.

Pull Request: llvm#104249
Base automatically changed from bump_to_98119718 to feature/fused-ops October 4, 2024 14:33
An error occurred while trying to automatically change base from bump_to_98119718 to feature/fused-ops October 4, 2024 14:33