Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AutoBump] Merge with fixes of d5863721 (Jun 11) (69) #333

Merged
merged 1,488 commits into from
Sep 11, 2024

Conversation

mgehre-amd
Copy link
Collaborator

No description provided.

koachan and others added 30 commits June 9, 2024 22:16
This adds the alternate mnemonics for movrz and movrnz.

Reviewers: s-barannikov, jrtc27, brad0, rorth

Reviewed By: s-barannikov

Pull Request: llvm#94252
Implements parts of:
- P0355 Extending chrono to Calendars and Time Zones
We can implicitly convert RemainingVDs to an ArrayRef.  Note that
RemainingVDs is of type SmallVector<InstrProfValueData, 24>.
…td. NFC

Remove unneeded parameters or sync into class if they are only
ever used with one value.
Vectors are supported for fp operations now, so remove the assert. The
supported type/operation combinations are best left for the verifier.
Avoids regression in future commit that starts treating some vector
cases as legal.
…interfaces (llvm#94908)

Fully qualify all namespaces appearing in `GPUTargetAttrInterface` and
`OffloadingLLVMTranslationAttrInterface`. If they're not fully qualified
then out-of-tree dialects might encounter name resolution errors.
…ScopeDirectiveClass case (llvm#77535) (llvm#84135)

Fix llvm#77535, Change unstable assertion into compilation error, and add a
test for it.
…C) (llvm#94859)

VTableNamePtr and CompressedVTableNamesLen are always used together to
create a StringRef in getSymtab.

We can create the StringRef ahead of time in readHeader.  This way,
IndexedInstrProfReader becomes a tiny bit simpler with fewer member
variables.  Also, StringRef default-constructs itself with its Data
and Length set to nullptr and 0, respectively, which is exactly what
we need.
This implements the throwing overload and the exception classes throw by
this overload.

Implements parts of:
- P0355 Extending chrono to Calendars and Time Zones
…94918)

Make `sym_name` an inherent attr in GPUModuleOp so that it doesn't show
in the discardable attributes.
The change is safe as the attribute is always expected to be present.
MCInst is primarily used in local variables and MCRelaxableFragment
(mostly JMP/JCC for x86). Reducing the inline element count can make
MCRelaxableFragment smaller, potentially leading to a lower peak RSS.

When compiling sqlite3.c, x86-64 has the largest maximum numOperands.

aarch64: 5; ppc64: 6; riscv64: 3; s390x: 6; x86-64: 8

Here is the frequency table for x86-64:

max getNumOperands: 8
0: 676
1: 37892
2: 84046
3: 26767
4: 1640
5: 1222
6: 80794
7: 768
8: 22

Pull Request: llvm#94913
This PR fixes attribute registration for `SI8Attr` and `UI8Attr` in
`ir.py`.
…4922)

BinaryIdsStart and BinaryIdsSize in IndexedInstrProfReader are always
used together, so this patch packages them into an ArrayRef<uint8_t>.

For now, readBinaryIdsInternal immediately unpacks ArrayRef into its
constituents to avoid touching the rest of readBinaryIdsInternal.
relax-recompute-align.s might change when we change the fragment
relaxation approach.
Lazy relaxation caused hash table lookups (`getFragmentOffset`) and
complex use/compute interdependencies. Some expressions involding
forward declared symbols (e.g. `subsection-if.s`) cannot be computed.
Recursion detection requires complex `IsBeingLaidOut`
(https://reviews.llvm.org/D79570).

D76114's `invalidateFragmentsFrom` makes lazy relaxation even less
useful.

Switch to eager relaxation to greatly simplify code and resolve these
issues. This change also removes a `getPrevNode` use, which makes it
more feasible to replace the fragment representation, which might yield
a large peak RSS win.

Minor downsides: The number of section relaxations may increase (offset
by avoiding the hash table lookup). For relax-recompute-align.s, the
computed layout is not optimal.
…4896)

This commit fixes a crash in the ownership-based buffer deallocation
pass when indirectly calling a function via SSA value. Such functions
must be conservatively assumed to be public.

Fixes llvm#94780.
This implements the overload with the choose argument and adds this
enum.

Implements parts of:
- P0355 Extending chrono to Calendars and Time Zones
This makes codegen for array initialization simpler in two ways:
1. Drop the zero-index GEP at the start, which is no longer needed with
opaque pointers.
2. Emit GEPs directly to the correct element, instead of having a long
chain of +1 GEPs. This is more canonical, and also avoids regressions in
unoptimized builds from llvm#93823.
Refactor the pass to only support `IntrinsicInst` calls.

`ReplaceWithVecLib` used to support instructions, as AArch64 was using
this pass to replace a vectorized frem instruction to the fmod vector
library call (through TLI).

As this replacement is now done by the codegen (llvm#83859), there is no
need for this pass to support instructions.

Additionally, removed 'frem' tests from:
- AArch64/replace-with-veclib-armpl.ll
- AArch64/replace-with-veclib-sleef-scalable.ll
- AArch64/replace-with-veclib-sleef.ll

Such testing is done at codegen level:
- llvm#83859
…lvm#94754)

Prior to this patch VisualStudio._get_step_info incorrectly identifies
the reason the debugger has stopped. e.g., stepping through a program
would be reported as a StopReason.Breakpoint rather than
StopReason.Step.

Fix. No test added as there are no VisualStudio tests (tested locally).
RKSimon and others added 24 commits June 11, 2024 14:35
…translation (llvm#95098)"

Also reverts "[MLIR][Flang][DebugInfo] Convert debug format in MLIR translators"

The patch above introduces behaviour controlled by an LLVM flag into the
Flang driver, which is incorrect behaviour.

This reverts commits:
  3cc2710.
  460408f.
This makes sure we try to process declaration DIEs that are erroneously
present in the index. Until bd5c636, clang was emitting index
entries for declaration DIEs with DW_AT_signature attributes. This makes
sure to avoid returning those DIEs as the definitions of a type, but
also makes sure to pass through DIEs referring to static constexpr
member variables, which is a (probably nonconforming) extension used by
dsymutil.

It adds test cases for both of the scenarios. It is essentially a
recommit of llvm#91808.
…sn't drop below threshold

Upcoming SimplifyDemandedBits support for CMOV will simplify the code and reduce the critical path below the threshold if we stick with i32 multiplies
…ling

Add basic pass through handling - we could extend this to truncate CMOVQ to CMOVL in a future patch
Handling parallel region RaW conflicts should usually be the
responsibility of the source program, rather than bufferization
analysis. However, to preserve current functionality, checks on parallel
regions is put behind a bufferization in this PR, which is on by
default. Default functionality will not change, but this PR enables the
option to leave parallelism checks out of the bufferization analysis.
@mgehre-amd mgehre-amd changed the base branch from bump_to_be6248a4 to feature/fused-ops September 11, 2024 11:27
@mgehre-amd mgehre-amd merged commit 1f8fafc into feature/fused-ops Sep 11, 2024
4 checks passed
@mgehre-amd mgehre-amd deleted the bump_to_d5863721 branch September 11, 2024 12:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.