[AutoBump] Merge with 4ab73549 (Jun 04) (60) #321

Merged: 109 commits merged on Sep 11, 2024


mgehre-amd
Collaborator

No description provided.

atetubou and others added 30 commits June 3, 2024 07:30
…ead of std::vector by value.

Avoid std::vector copies as setDefaultProperties just iterates across the Records

Fixes llvm#89207
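The cost difference can be illustrated with a minimal sketch. It assumes a hypothetical `Record` struct and two hypothetical functions, `sumValuesByValue` and `sumValuesByRef`; these are not TableGen's actual types or the real `setDefaultProperties` signature. A by-value parameter copies the vector and every element on each call, while a const reference reads the caller's storage directly.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a TableGen record; the real llvm::Record differs.
struct Record {
  std::string Name;
  int Value = 0;
};

// Before: taken by value, so every call copies the whole vector (including
// each std::string) even though the function only reads it.
int sumValuesByValue(std::vector<Record> Records) {
  int Sum = 0;
  for (const Record &R : Records)
    Sum += R.Value;
  return Sum;
}

// After: a const reference iterates over the caller's vector with no copy.
int sumValuesByRef(const std::vector<Record> &Records) {
  int Sum = 0;
  for (const Record &R : Records)
    Sum += R.Value;
  return Sum;
}
```

Both functions compute the same result. Only the by-value version copies its argument.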
…#93815)

After recent improvements (llvm#80029) and testing on open-source projects,
the checker is ready to move out of the alpha package.
…#91472)

SCEVLoopGuardRewriter only replaces operands with equivalent values, so
we should be able to transfer the flags from the original expression.

PR: llvm#91472
Similar to the change previously made for binops, make m_Trunc()
only match instructions, not constant expressions. This is more
likely to cause a crash than do something useful.

Fixes crash reported at:
llvm#92885 (comment)
llvm#94203)

In llvm#94167 I found out that `cwg28xx.cpp` has been running without
`-pedantic-errors` and fixed that. This patch fixes that for the rest of
the test suite. Only one test was affected with a trivial fix (warning
was escalated to an error).

I'm intentionally leaving out the test for CWG2390, because it requires
major surgery. It's addressed in llvm#94206.
Also mark the test as nounwind. The unwinding information does
not appear to be pertinent to the original intent of the test.
Make sure this test is preserved when icmp constant expressions
are removed.
…lvm#69704)

Moving the body of member functions out-of-line makes sense for classes
defined in implementation files too.
This PR fixes legalize info for G_BITREVERSE.
To make sure these are preserved when icmp constant expressions
are removed.
We add a feature that prevents the GlobalMerge pass from considering
data smaller than a minimum size in bytes for merging.

The MinSize is set in 3 ways:
1. If global-merge-min-data-size is explicitly set, then it uses that
value.
2. If SmallDataLimit is set and non-zero, then SmallDataLimit + 1 is
used.
3. Otherwise, 0 is used, which means all sizes are considered for
merging.
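The three rules above can be sketched as a small function. The names `computeMinDataSize` and `isMergeCandidate` are hypothetical and do not come from the GlobalMerge pass. The sketch assumes the explicit option is modelled as a `std::optional` and that a missing small-data limit is represented as 0.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper mirroring the three rules; not the GlobalMerge code.
//   ExplicitMinSize: value of global-merge-min-data-size, if set by the user.
//   SmallDataLimit:  the target's small-data limit (0 when unset).
uint64_t computeMinDataSize(std::optional<uint64_t> ExplicitMinSize,
                            uint64_t SmallDataLimit) {
  // Rule 1: an explicitly set option always wins.
  if (ExplicitMinSize)
    return *ExplicitMinSize;
  // Rule 2: skip data small enough to go into the small-data section.
  if (SmallDataLimit != 0)
    return SmallDataLimit + 1;
  // Rule 3: 0 means every size is considered for merging.
  return 0;
}

// Data smaller than MinSize bytes is not considered for merging.
bool isMergeCandidate(uint64_t SizeInBytes, uint64_t MinSize) {
  return SizeInBytes >= MinSize;
}
```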

We found that this feature allowed us to see the benefit of the
GlobalMerge pass while eliminating some merging that was not beneficial.
This feature allowed us to enable the GlobalMerge pass on RISC-V in our
downstream by default because it led to improvements on multiple
benchmark suites.

I plan to post a separate patch to propose enabling this by default on
RISC-V. But I do not want that discussion to be part of the discussion
of adding this feature, so I am keeping the patches separate.
Noticed while triaging the failures on llvm#93673 - the attributor pass doesn't emit any range metadata in these tests
Check whether parsing of the argument failed before attempting
to build the expression. 

Fixes llvm#80474.
Removed foo-registered-target constraints from a bunch of tests, because
the driver mostly doesn't need a target to be available. I ran
check-clang-driver using a build with only the XCore target, and these
all passed.

There are ~50 tests that still have foo-registered-target, and it looks
like most of them are either doing codegen when they don't need to, or
don't really belong in the Driver tests. But that's a task for another
day.
When FMV was added to AArch64, it added a dependency expansion step
after the -cc1 command line was parsed but before Sema, in
AArch64TargetInfo::initFeatureMap. One effect of this is that
-target-features specified on the -cc1 command line had some level
of incomplete and broken dependency expansion. Since then, many tests
have been added which depend on this behaviour.

The dependency expansion can be considered broken at this stage because
dependency expansion is already performed by the driver to generate the
-target-feature flags using an ExtensionSet. This class does
dependency evaluation and then generates a flattened representation of
the dependency graph in the form of -target-features, which are passed
to -cc1 in an arbitrary order (determined by the order of bits in the
bitset). Any dependency expansion done after -cc1 will be inherently
contradictory. It is impossible to accurately treat negative features
once the dependency graph has been flattened and the order randomised.

This patch fixes a large number of those tests, specifically ones where
only a dependent feature (e.g. -target-feature +sme2p1) was added to
the test -cc1 command, and not the necessary dependencies (e.g.
-target-feature +sme).

See PR llvm#93695 for further details.
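The following minimal sketch of driver-style dependency expansion shows why a -cc1 test that passes only `+sme2p1` is incomplete. The dependency table is a hypothetical, simplified subset, assuming `sme2p1` requires `sme2` and `sme2` requires `sme`. `flattenedTargetFeatures` is an illustrative helper, not the `ExtensionSet` API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified dependency table (illustrative subset only).
const std::map<std::string, std::vector<std::string>> &dependencies() {
  static const std::map<std::string, std::vector<std::string>> Deps = {
      {"sme2p1", {"sme2"}},
      {"sme2", {"sme"}},
      {"sme", {}},
  };
  return Deps;
}

// Add F and everything it transitively requires to Out.
void expand(const std::string &F, std::set<std::string> &Out) {
  if (!Out.insert(F).second)
    return; // already expanded
  auto It = dependencies().find(F);
  if (It == dependencies().end())
    return;
  for (const std::string &Dep : It->second)
    expand(Dep, Out);
}

// Flatten the expanded graph into "+feature" flags, as the driver would pass
// to -cc1. The result no longer records which feature implied which.
std::vector<std::string>
flattenedTargetFeatures(const std::vector<std::string> &Requested) {
  std::set<std::string> Enabled;
  for (const std::string &F : Requested)
    expand(F, Enabled);
  std::vector<std::string> Flags;
  for (const std::string &F : Enabled)
    Flags.push_back("+" + F);
  return Flags;
}
```

Expanding `sme2p1` yields `+sme`, `+sme2` and `+sme2p1`, which is the full set a -cc1 test command needs. Because the output is a flat list, a later stage cannot tell which flags were implied by others. This is why negative features cannot be resolved correctly after this point.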
…or.shuffle` (llvm#93858)

This PR tries to reland llvm#93595 which was reverted in llvm#93732 due to some
issues. The original PR:
- Add integration test for `vector.shuffle` and `vector.interleave`
- Add `VectorToSPIRV` patterns to `GPUToSPIRVPass`

Description of the issue:
- llvm#93595 (comment)
- Using either `vector.load` or `vector.store` in the kernel function
will cause the validation layer to report an error
- Trying to bypass the issue by using `memref.load` and `memref.store`
to load/store individual elements from/to the vectors, and populate the
vectors using `vector.insertelement` and `vector.extractelement`
instead.
…llvm#94204)

Since llvm#80801 clang requires a
template argument list after the use of the template keyword.

https://lab.llvm.org/buildbot/#/builders/176/builds/10230

error: a template argument list is expected after a name prefixed by the
template keyword [-Wmissing-template-arg-list-after-template-kw]

This fixes the instances found by the AArch64 Linux builds.
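The required form can be shown with a minimal illustration. `Widget` and `intSize` are made-up names, not the code that was fixed. Inside a template, a dependent member template named with the `template` keyword is followed by an explicit argument list.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative type with a static member function template.
struct Widget {
  template <typename T> static std::size_t sizeOf() { return sizeof(T); }
};

// H::sizeOf is a dependent name, so `template` is needed for `<` to be
// parsed as the start of a template argument list. Per the commit message,
// clang now expects such an argument list to follow a name prefixed by
// `template`; otherwise it reports
// -Wmissing-template-arg-list-after-template-kw.
template <typename H> std::size_t intSize() {
  return H::template sizeOf<int>();
}
```

Calling `intSize<Widget>()` returns `sizeof(int)`.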
Ensure that FormatStringConverter's constructor fails with a sensible
error message rather than asserting if the format string is not a narrow
string literal.

Also, ensure that we don't even get that far in modernize-use-std-print
and modernize-use-std-format by checking that the format string
parameter is a char pointer.

Fixes llvm#92896
Fix typos: AGGRESIVE -> AGGRESSIVE and WAYAGGRESIVE -> WAYAGGRESSIVE.

This also exposed an issue: the WAYAGGRESSIVE run removed a block entirely, so the LABEL check was silently failing.

Noticed while triaging the failures on llvm#93673
…ndidates() (llvm#90260)

This reduces the time complexity of the main loop of the `findCandidates()`
method from $O(n^2)$ to $O(n \log n)$.

For small $n$, the modification does not regress the build time, but it
helps significantly when $n$ is large.

For one application, this reduces the runtime of the main loop from 120
seconds to 28 seconds.
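Sorting can remove a quadratic overlap check in greedy selection of non-overlapping occurrences. The sketch below is a generic illustration of that technique, not the MachineOutliner patch itself. It assumes each occurrence is described by a start index and that all occurrences share the same length `Len`. `pickNonOverlapping` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Greedily keep non-overlapping occurrences of a sequence of length Len.
// A naive version compares each occurrence with every kept one: O(n^2).
// After an O(n log n) sort by start index, only the most recently kept
// occurrence can overlap the current one, so the scan itself is O(n).
std::vector<unsigned> pickNonOverlapping(std::vector<unsigned> Starts,
                                         unsigned Len) {
  std::sort(Starts.begin(), Starts.end());
  std::vector<unsigned> Kept;
  for (unsigned S : Starts) {
    if (!Kept.empty() && S < Kept.back() + Len)
      continue; // overlaps the previous kept occurrence
    Kept.push_back(S);
  }
  return Kept;
}
```

For example, with start indices `{10, 0, 2, 5, 7}` and `Len = 3`, the sorted scan keeps `0`, `5` and `10`.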

This is the first commit for an enhanced version of machine outliner --
see
[RFC](https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-1-fulllto-part-2-thinlto-nolto-to-come/78732).
As noted in
llvm#93796 (comment),
a better way to teach RISCVInsertVSETVLI to work without LiveIntervals
is to set VNInfo to nullptr and teach the various methods to handle it.
We should try that approach first, so we no longer need this pre-commit
patch.

This reverts commit 4b4d366.
The std::min behaves like 'a<b?a:b', which does not match the
libstdc++/libc++ behavior 'b<a?b:a' when an input is NaN.

Make it consistent with libstdc++/libc++.

Fixes: llvm#93962

Fixes: ROCm/HIP#3502
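The two formulations in the commit message above differ because every comparison involving NaN is false, so each returns whichever operand its false branch selects. `minLessFirst` and `minStdLike` are illustrative names for them.

```cpp
#include <cassert>
#include <cmath> // std::nan, std::isnan for exercising the NaN cases

// 'a<b?a:b': the formulation the commit replaces.
double minLessFirst(double a, double b) { return a < b ? a : b; }

// 'b<a?b:a': the libstdc++/libc++ formulation.
double minStdLike(double a, double b) { return b < a ? b : a; }
```

With a NaN first argument, `minStdLike` returns the NaN while `minLessFirst` returns the other operand. With a NaN second argument, the results swap.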
Reorganize PPCInstrP10.td based on the comment at
llvm#92543 (comment)

Instructions and patterns defined under the same predicates are currently
placed in several different locations. This patch reorganizes them into
groups based on those predicates.
…lvm#79875)

The particular example that led to this is a very long chain of
`UsingShadowDecl`s that we hit in generated code in our codebase.

To avoid that, check for stack exhaustion when deserializing the
declaration. At that point, we can point to the source location of the
particular declaration that is being deserialized.
Replace argmemonly readonly with memory(argmem: read).
FirstCand is a reference to RepeatedSequenceLocs[0]. However, that
vector is being modified a lot throughout the function, including one
place that reassigns the whole vector. I'm not sure whether this can
really happen in practice, but it doesn't seem unlikely that this could
lead to a use-after-free.

Avoid this by directly using RepeatedSequenceLocs[0] at the start of the
function (as a lot of other places already do) and only creating
FirstCand at the end where no more modifications take place.
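The hazard and the fix can be shown in a minimal, self-contained illustration. It assumes a hypothetical `Candidate` struct and a hypothetical `firstStartAfterPruning` function, not the MachineOutliner code. A reference to `Locs[0]` taken before the vector is reassigned may dangle, so the reference is bound only after the last modification.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct Candidate {
  unsigned StartIdx;
  unsigned Len;
};

// Hypothetical function: drop short candidates, then report the first
// survivor's start index.
unsigned firstStartAfterPruning(std::vector<Candidate> Locs, unsigned MinLen) {
  // Unsafe pattern:
  //   const Candidate &First = Locs[0];
  //   Locs = std::move(Filtered); // old buffer may be freed
  //   return First.StartIdx;      // potential use-after-free
  std::vector<Candidate> Filtered;
  for (const Candidate &C : Locs)
    if (C.Len >= MinLen)
      Filtered.push_back(C);
  Locs = std::move(Filtered); // whole-vector reassignment
  assert(!Locs.empty() && "expected at least one surviving candidate");
  // Safe: bind the reference only after the last modification.
  const Candidate &First = Locs[0];
  return First.StartIdx;
}
```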
…complete codegen line

Noticed while triaging the failures on llvm#93673
hanhanW and others added 26 commits June 3, 2024 16:39
)

The revision unrolls vector.bitcast like:

```mlir
%0 = vector.bitcast %arg0 : vector<2x4xi32> to vector<2x2xi64>
```

to

```mlir
%cst = arith.constant dense<0> : vector<2x2xi64>
%0 = vector.extract %arg0[0] : vector<4xi32> from vector<2x4xi32>
%1 = vector.bitcast %0 : vector<4xi32> to vector<2xi64>
%2 = vector.insert %1, %cst [0] : vector<2xi64> into vector<2x2xi64>
%3 = vector.extract %arg0[1] : vector<4xi32> from vector<2x4xi32>
%4 = vector.bitcast %3 : vector<4xi32> to vector<2xi64>
%5 = vector.insert %4, %2 [1] : vector<2xi64> into vector<2x2xi64>
```

Scalable vectors are not supported because of a limitation of
`vector::createUnrollIterator`: the targetRank could mismatch the final
rank during unrolling, and there is no direct way to query the final
rank from the object.
…llvm#94149)

- Fix build with `EXPENSIVE_CHECKS`
- Remove unused `PassName::ID` to resolve warning
- Mark `~SelectionDAGISel` virtual so AArch64 backend can work properly
…bcxx/libcxxabi/libunwind.

-fvisibility-global-new-delete-hidden is deprecated and clang was warning
about it on every build command. These libraries are always built using
a stage2 compiler, so we can use the new build flag unconditionally.

Reviewers: aeubanks

Reviewed By: aeubanks

Pull Request: llvm#88459
…achine block address taken. (llvm#94296)

These blocks usually show up in the form of branches within inline
assembly. Since it's hard to rewire them, we fully omit paths with such
blocks from path cloning.
…m#93775)

To support the third parameter of the alignment directive, R_LARCH_ALIGN
relocations need a non-zero symbol index.
In many cases we don't need the third parameter and can set the symbol
index to 0.
This patch will remove a lot of .Lla-relax-align* symbols and mitigate
the size regression due to
llvm#72962.

Co-authored-by: Jinyang He <hejinyang@loongson.cn>
Co-authored-by: Weining Lu <luweining@loongson.cn>
Previously this assumed that `LLVM_ENABLE_ABI_BREAKING_CHECKS` would
always be enabled in this case; if it's not, `TTI` does not exist.

Introduced in 7652a59
It should preserve more analysis results, but it happens immediately
after instruction selection.
The MI is generated in `PPCDAGToDAGISel::Select` so the match pattern isn't used and can be removed.
* Improve the condition type requirement description ('scalar' ->
signless i1), to match what is actually verified.
* Use the `I1` type predicate instead of `AnyBooleanTypeMatch`.

Related discussion:
llvm#93351 (comment).
…m#94304)

This manifests as `AddressSanitizer: stack-use-after-return` w/o this
change. The `~CheckFEnv()` method of checking fenv seems to only work
for test fixtures.
…()` (llvm#93927)

Fixes a crash uncovered by
[pr89651](https://github.com/llvm/llvm-test-suite/blob/main/Fortran/gfortran/regression/gomp/pr89651.f90)
in the test suite.

Fixes a crash caused by missing handling of `omp.private` ops in
`FirOpBuilder::getAllocaBlock()`.
…ing (llvm#94285)

A cycle profile of a thin link showed a lot of time spent in sort called
from the BitcodeWriter, which was being used to compute the unique
references to stack ids in the summaries emitted for each backend in a
distributed thinlto build. We were also frequently invoking lower_bound
to locate stack id indices in the resulting vector when writing out the
referencing memprof records.

Change this to use a map to uniquify the references, and to hold the
index of the corresponding stack id in the StackIds vector, which is
now populated at the same time.

This reduced the time of a large thin link by about 10%.
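The idea of replacing sort plus `lower_bound` with a map that also records each id's index can be sketched as follows. `StackIdTable` is a hypothetical name. A `std::unordered_map` is assumed here, and the actual patch may use a different map type. Ids are appended to `StackIds` in first-seen order, and their positions are recorded at the same time.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: uniquify stack ids and remember each one's index.
struct StackIdTable {
  std::vector<uint64_t> StackIds;                   // unique ids, first-seen order
  std::unordered_map<uint64_t, unsigned> StackIdIndex; // id -> position

  // Return the index of Id, appending it the first time it is seen.
  // Later lookups are a single hash lookup instead of a lower_bound.
  unsigned getOrAddIndex(uint64_t Id) {
    auto [It, Inserted] =
        StackIdIndex.try_emplace(Id, static_cast<unsigned>(StackIds.size()));
    if (Inserted)
      StackIds.push_back(Id);
    return It->second;
  }
};
```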
Also sink vscale calls when induction variables are not widened
(-indvars-widen-indvars=false).
…lable vector type. (llvm#93406)

FunctionStackPoisoner does not handle an `AllocaInst` of scalable vector
type, but it did not filter out struct types containing scalable vectors,
which were introduced by c8eb535.
The old use of must-be-executed-context (MBEC) did propagate
through calls even if that was not allowed. We now only propagate from
call site arguments. If there are calls/intrinsics that allow
propagation, we need to add them explicitly.

Fixes: llvm#78507

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
A cycle profile showed that we were spending a lot of time invoking
MapVector::erase. According to
https://llvm.org/docs/ProgrammersManual.html#llvm-adt-mapvector-h,
erasing elements one at a time is very inefficient for MapVector and it
is better to use remove_if.

This change resulted in around 7% time reduction on a large thin link.

While here remove an unused function that also invokes erase on
MapVectors.
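`llvm::MapVector` keeps its entries in a vector alongside a key-to-index map. Erasing one entry therefore shifts the later entries and re-indexes them, so erasing k entries one at a time costs roughly O(k·n). The sketch below illustrates the batched alternative with standard containers. `OrderedMap` and `removeIf` are hypothetical stand-ins, not LLVM's MapVector API. A single `std::remove_if` compaction is followed by one index rebuild.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Simplified insertion-ordered map (hypothetical, not llvm::MapVector).
class OrderedMap {
public:
  void insert(const std::string &Key, int Value) {
    if (Index.count(Key))
      return;
    Index[Key] = Entries.size();
    Entries.emplace_back(Key, Value);
  }

  // Batched removal: one compaction pass plus one index rebuild, O(n) in
  // total, instead of an O(n) shift-and-reindex per erased entry.
  template <typename Pred> void removeIf(Pred P) {
    Entries.erase(std::remove_if(Entries.begin(), Entries.end(), P),
                  Entries.end());
    Index.clear();
    for (std::size_t I = 0; I < Entries.size(); ++I)
      Index[Entries[I].first] = I;
  }

  const int *lookup(const std::string &Key) const {
    auto It = Index.find(Key);
    return It == Index.end() ? nullptr : &Entries[It->second].second;
  }

  std::size_t size() const { return Entries.size(); }

private:
  std::vector<std::pair<std::string, int>> Entries;
  std::unordered_map<std::string, std::size_t> Index;
};
```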
With the change in 2fa0591 we can now
use a range for loop.
Move functionality for patching build ID into a separate rewriter class
and change the way we do the patching. Support build ID in different
note sections in order to update the build ID in the Linux kernel binary,
which puts it into the ".notes" section instead of ".note.gnu.build-id".
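Locating a build ID independently of the section name can be sketched by scanning the raw contents of any note section. An ELF note record starts with three 32-bit words (name size, descriptor size, type), followed by the name and the descriptor, each padded to 4 bytes. A GNU build ID is the note named "GNU" with type `NT_GNU_BUILD_ID` (3). The sketch assumes little-endian data and 4-byte note alignment. `findBuildId` is a hypothetical helper, not BOLT's rewriter API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr uint32_t NoteTypeGnuBuildId = 3; // NT_GNU_BUILD_ID

// Read a little-endian 32-bit word (assumes a little-endian ELF file).
static uint32_t readU32LE(const std::vector<uint8_t> &Buf, std::size_t Off) {
  return uint32_t(Buf[Off]) | uint32_t(Buf[Off + 1]) << 8 |
         uint32_t(Buf[Off + 2]) << 16 | uint32_t(Buf[Off + 3]) << 24;
}

static std::size_t alignTo4(std::size_t N) { return (N + 3) & ~std::size_t(3); }

// Scan a note section's bytes (".notes", ".note.gnu.build-id", ...) and
// return the GNU build ID descriptor if one is present.
std::optional<std::vector<uint8_t>>
findBuildId(const std::vector<uint8_t> &Notes) {
  std::size_t Off = 0;
  while (Off + 12 <= Notes.size()) {
    uint32_t NameSz = readU32LE(Notes, Off);
    uint32_t DescSz = readU32LE(Notes, Off + 4);
    uint32_t Type = readU32LE(Notes, Off + 8);
    std::size_t NameOff = Off + 12;
    std::size_t DescOff = NameOff + alignTo4(NameSz);
    std::size_t Next = DescOff + alignTo4(DescSz);
    if (Next > Notes.size())
      break; // truncated or malformed note
    bool IsGnu = NameSz == 4 && Notes[NameOff] == 'G' &&
                 Notes[NameOff + 1] == 'N' && Notes[NameOff + 2] == 'U' &&
                 Notes[NameOff + 3] == '\0';
    if (IsGnu && Type == NoteTypeGnuBuildId)
      return std::vector<uint8_t>(Notes.begin() + DescOff,
                                  Notes.begin() + DescOff + DescSz);
    Off = Next;
  }
  return std::nullopt;
}
```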
…93206)

This patch picks up llvm#78598 with the hope that we can address such
crashes in `tryCaptureVariable()` for unevaluated lambdas.

In addition to `tryCaptureVariable()`, this also contains several other
fixes on e.g. lambda parsing/dependencies.

Fixes llvm#63845
Fixes llvm#67260
Fixes llvm#69307
Fixes llvm#88081
Fixes llvm#89496
Fixes llvm#90669
Fixes llvm#91633
…llvm#94045)

In some cases (see iree-org/iree#16285),
`memref.subview` ops can't be folded into transfer ops and sub-byte type
emulation fails. This issue has been blocking a few things, including
the enablement of vector flattening transformations
(iree-org/iree#16456). This PR extends the
existing sub-byte type emulation support of `memref.subview` to handle
multi-dimensional subviews with dynamic offsets and addresses the issues
for some of the `memref.subview` cases that can't be folded.

Co-authored-by: Diego Caballero <diegocaballero@google.com>
Base automatically changed from bump_to_12fcca0a to feature/fused-ops September 11, 2024 12:08
mgehre-amd merged commit 75dd1f5 into feature/fused-ops Sep 11, 2024
5 checks passed
mgehre-amd deleted the bump_to_4ab73549 branch September 11, 2024 12:08