forked from llvm/llvm-project
[AutoBump] Merge with 02654f73 (Aug 30) (18) #371
Open
mgehre-amd
wants to merge
99
commits into
bump_to_54916e57
from
bump_to_02654f73
…m#105617)

Follow-up on 8ac140f.

The test `SemaTemplate/default-parm-init.cpp` was introduced with the fix llvm#80288 and mainly did the following things:

- Ensure the default arguments are properly substituted inside both the primary template and its explicit / out-of-line specializations.
- Ensure the strategy doesn't mess up the substitution of a lambda expression as a default argument.

The first covers the bug in llvm#68490, yet it does some redundant work: each of the member functions is duplicated for the `sizeof` and `alignof` operators, respectively, and the principles under the hood are essentially the same. So this patch removes the duplication and reduces the 8 functions to 4 functions that reveal the same thing.

The second presumably tests that the fix in llvm#80288 doesn't impact a complicated substitution. However, that seems unnecessary and unrelated to the original issue. More importantly, we have never had any problems with it. Hence, this patch removes that test.

The test for default arguments is merged into `SemaTemplate/default-arguments.cpp` under a new namespace, which hopefully reduces the entropy of our test cases.
… BSTRINS_D Reviewed By: SixWeining Pull Request: llvm#106331
…S_D instruction Reviewed By: heiher, SixWeining Pull Request: llvm#106332
precommit f16 test for llvm#87506 fp-int conversion
The PR llvm#105996 broke taking the address of a vector:

**compound-literal.c**
```C
typedef int v4i32 __attribute((vector_size(16)));
v4i32 *y = &(v4i32){1,2,3,4};
```

That is because the current interpreter handles vector unary operators as a fallback when the generic code path fails, but the new interpreter did not. We need to handle `UO_AddrOf` in `Compiler<Emitter>::VisitVectorUnaryOperator`.

Signed-off-by: yronglin <yronglin777@gmail.com>
CompilerInstance can re-use the same SourceManager across multiple frontend actions. During this process it calls `SourceManager::clearIDTables` to reset any caches based on FileIDs. It didn't reset IncludeLocMap, resulting in wrong include locations for workflows that triggered multiple frontend actions through the same CompilerInstance.
…nh/tanh intrinsics to support llvm#106584
This patch adds a check for the `multiples` of `tosa.tile`. The `multiples` in `tosa.tile` indicates how many times the tensor should be replicated along each dimension. Zero and negative values are invalid, except for -1, which represents a dynamic value. Therefore, each element of `multiples` should be a positive integer or -1. Fix llvm#106167.
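The rule described above can be sketched as a small standalone check. This is only an illustration, not the MLIR verifier code: `kDynamic`, `areTileMultiplesValid`, and `tiledShape` are hypothetical names, and treating a dynamic input dimension as -1 is likewise an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sentinel for a dynamic value (assumption for this sketch).
constexpr int64_t kDynamic = -1;

// Returns true when every multiple is strictly positive or the dynamic marker.
bool areTileMultiplesValid(const std::vector<int64_t> &multiples) {
  for (int64_t m : multiples) {
    if (m != kDynamic && m <= 0)
      return false; // zero and other negative values are rejected
  }
  return true;
}

// Hypothetical helper: each output dimension is the input dimension
// replicated `multiple` times; a dynamic operand keeps the result dynamic.
std::vector<int64_t> tiledShape(const std::vector<int64_t> &shape,
                                const std::vector<int64_t> &multiples) {
  assert(shape.size() == multiples.size() && "rank mismatch");
  assert(areTileMultiplesValid(multiples) && "invalid multiples");
  std::vector<int64_t> result;
  result.reserve(shape.size());
  for (std::size_t i = 0; i < shape.size(); ++i) {
    bool isDynamic = shape[i] == kDynamic || multiples[i] == kDynamic;
    result.push_back(isDynamic ? kDynamic : shape[i] * multiples[i]);
  }
  return result;
}
```

For instance, under these assumptions `areTileMultiplesValid({2, -1, 3})` holds, while `{2, 0, 3}` and `{-2}` are rejected.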
An optimizable cast can also be removed by VPlan simplifications. Remove the restriction from planContainsAdditionalSimplifications, as this causes it to miss relevant simplifications, triggering false positives for the cost decision verification. Also adds debug output for printing additional cost-precomputations. Fixes llvm#106641.
…ectorsCombine. (llvm#104774) UZP2 requires both operands to match the result type, but the combine tries to replace a truncate by passing the pre-truncated operands directly to a UZP2 with the truncated result type. This patch nop-casts the operands to keep the DAG consistent. There should be no changes to the generated code, which is fine as is. This patch also enables more target-specific getNode() validation for fixed-length vector types.
…ecks (llvm#104478) The CMake docs state that `check_c_source_compiles()` checks whether the supplied code "can be compiled as a C source file and linked as an executable (so it must contain at least a `main()` function)." https://cmake.org/cmake/help/v3.30/module/CheckCSourceCompiles.html In practice, this command is a wrapper around `try_compile()`: - https://gitlab.kitware.com/cmake/cmake/blob/2904ce00d2ed6ad5dac6d3459af62d8223e06ce0/Modules/CheckCSourceCompiles.cmake#L54 - https://gitlab.kitware.com/cmake/cmake/blob/2904ce00d2ed6ad5dac6d3459af62d8223e06ce0/Modules/Internal/CheckSourceCompiles.cmake#L101 When `CMAKE_SOURCE_DIR` is compiler-rt/lib/builtins/, `CMAKE_TRY_COMPILE_TARGET_TYPE` is set to `STATIC_LIBRARY`, so the checks for `float16` and `bfloat16` support work as intended in a Clang + compiler-rt runtime build for example, as it runs CMake recursively from that directory. However, when using llvm/ or compiler-rt/ as CMake source directory, as `CMAKE_TRY_COMPILE_TARGET_TYPE` defaults to `EXECUTABLE`, these checks will indeed fail if the code doesn't have a `main()` function. This results in LLVM using x86 SIMD registers when generating calls to builtins that, with Arch Linux's compiler-rt package for example, actually use a GPR for their argument or return value as they use `uint16_t` instead of `_Float16`. This had been caught in post-commit review: https://reviews.llvm.org/D145237#4521152. Use of the internal `CMAKE_C_COMPILER_WORKS` variable is not what hides the issue, however. PR llvm#69842 tried to fix this by unconditionally setting `CMAKE_TRY_COMPILE_TARGET_TYPE` to `STATIC_LIBRARY`, but it apparently caused other issues, so it was reverted. This PR just adds a `main()` function in the checks, as per the CMake docs.
…lvm#106707) * Revert "Fix MSVC "not all control paths return a value" warning. NFC." Dep to revert c9b6e01 * Revert "[AMDGPU] Graph-based Module Splitting Rewrite (llvm#104763)" Breaks tests.
Make dsymutil return a non-zero exit code when crashing during linking.
The FinalShuffle function must be used for all vectorized results to correctly produce the vectorized value. Fixes llvm#106655
By choosing an initial value whose mask is only used by the blend we can remove the need for the mask entirely.
Recursion here causes stack overflow on large inputs. Fixing by unrolling via a stack.
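The commit text does not say which traversal was unrolled, so the following is only a generic sketch of the technique: a recursive walk replaced by a loop over an explicit worklist, so that depth is bounded by heap memory rather than the call stack. `Node` and `countNodes` are hypothetical names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tree node used only for this illustration.
struct Node {
  std::vector<Node *> children;
};

// Counts the nodes reachable from `root` without recursion: pending nodes
// live in a heap-allocated worklist instead of in call frames.
std::size_t countNodes(const Node *root) {
  if (!root)
    return 0;
  std::size_t count = 0;
  std::vector<const Node *> worklist{root};
  while (!worklist.empty()) {
    const Node *node = worklist.back();
    worklist.pop_back();
    ++count;
    for (const Node *child : node->children) {
      assert(child && "children are expected to be non-null");
      worklist.push_back(child);
    }
  }
  return count;
}
```

A chain of a few hundred thousand nodes, which could exhaust the call stack in a naive recursive version, is handled by this loop with no deep call nesting.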
…intrinsics to support llvm#106584
Code lowering always generates fir.if else blocks for source level if statements, whether needed or not. Change this to only generate else blocks that are needed.
Trivially extend dd0cf23 ([LICM] Reassociate & hoist sub expressions) to handle unsigned predicates as well. Alive2 proofs: https://alive2.llvm.org/ce/z/GdDBtT.
Ever since 6859685 (or, precisely, 84428da), relative jumps emitted by the AVR codegen have been off by two bytes. This pull request fixes that.

## Abstract

As compared to absolute jumps, relative jumps - such as rjmp, rcall or brsh - have an implied `pc+2` behavior; that is, `jmp 100` is `pc = 100`, but `rjmp 100` gets understood as `pc = pc + 100 + 2`.

This is not reflected in the AVR codegen:

https://github.com/llvm/llvm-project/blob/f95026dbf66e353128a3a3d7b55f3e52d5985535/llvm/lib/Target/AVR/MCTargetDesc/AVRAsmBackend.cpp#L89

... which always emits relative jumps that are two bytes too far - or rather it _would_ emit such jumps if not for this check:

https://github.com/llvm/llvm-project/blob/f95026dbf66e353128a3a3d7b55f3e52d5985535/llvm/lib/Target/AVR/MCTargetDesc/AVRAsmBackend.cpp#L517

... which causes most of the relative jumps to be actually resolved late, by the linker, which applies the offsetting logic on its own, hiding the issue within LLVM.

[Some time ago](llvm@697a162) we had a similar "jumps are off" problem that got solved by touching `shouldForceRelocation()`, but I think that worked only by accident. It exploited the fact that absolute vs relative jumps in the parsed assembly can be distinguished through a "side channel" check relying on the existence of labels (i.e. absolute jumps happen to named labels, but relative jumps are anonymous, so to say). This was an alright idea back then, but it got broken by 6859685.

I propose a different approach:

- when emitting relative jumps, offset them by `-2` (well, `-1`, strictly speaking, because those instructions rely on a right-shifted offset),
- when parsing relative jumps, treat `.` as `+2` and read `rjmp .+1234` as `rjmp (1234 + 2)`.

This approach seems to be sound, and we now generate the same assembly as avr-gcc, which can be confirmed with:

```cpp
// avr-gcc test.c -O3 && avr-objdump -d a.out
int main() {
  asm(
    " foo:\n\t"
    " rjmp .+2\n\t"
    " rjmp .-2\n\t"
    " rjmp foo\n\t"
    " rjmp .+8\n\t"
    " rjmp end\n\t"
    " rjmp .+0\n\t"
    " end:\n\t"
    " rjmp .-4\n\t"
    " rjmp .-6\n\t"
    " x:\n\t"
    " rjmp x\n\t"
    " .short 0xc00f\n\t"
  );
}
```

avr-gcc is also how I got the opcodes for all new tests like `inst-brbc.s`, so we should be good.
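The offset arithmetic can be illustrated with a small standalone sketch. It is not the code in `AVRAsmBackend.cpp`: the function names are hypothetical, and the `RJMP` bit layout (opcode `0xC000` with a 12-bit signed word offset) is taken from the AVR instruction set manual rather than from this pull request.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: resolves the assembly operand `.+n` to a byte address,
// treating `.` as the address of the instruction plus 2.
int32_t resolveDotRelative(int32_t insnAddr, int32_t n) {
  return insnAddr + 2 + n;
}

// Hypothetical helper: encodes RJMP from the byte addresses of the
// instruction and of its target.
uint16_t encodeRjmp(int32_t insnAddr, int32_t targetAddr) {
  // The hardware adds an implied +2 bytes, so subtract it before encoding.
  int32_t byteDelta = targetAddr - insnAddr - 2;
  assert(byteDelta % 2 == 0 && "AVR instructions are 2-byte aligned");
  // The instruction stores a word offset, i.e. the byte delta divided by 2.
  int32_t wordOffset = byteDelta / 2;
  assert(wordOffset >= -2048 && wordOffset <= 2047 && "out of RJMP range");
  return static_cast<uint16_t>(0xC000u |
                               (static_cast<uint32_t>(wordOffset) & 0x0FFFu));
}
```

Under these assumptions `rjmp .+0` (jump to the next instruction) encodes as `0xC000`, and a jump to the instruction's own address encodes as `0xCFFF`, a word offset of -1.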
…san_disable (llvm#106727) This better matches lsan_enable and disable, which we are trying to emulate.
Fixes issue found here llvm#106691 (comment) The issue wasn't in the code change itself, just the unittest; the trailing marker wasn't properly cleaned up.
llvm#100692 changes clang template deduction, and an error is now emitted when building flang with top-of-tree clang when mapping std::pow in intrinsics-library.cpp for constant folding: `error: address of overloaded function 'pow' is ambiguous`. See https://lab.llvm.org/buildbot/#/builders/4/builds/1670. I am not expert enough to understand whether the new error is justified here, but it is easy to help the compiler with explicit wrappers to fix the builds.
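The idea of an explicit wrapper can be sketched as follows. This is a minimal illustration, not the actual change to `intrinsics-library.cpp`: `powWrapper`, `BinaryFn`, and `applyBinary` are hypothetical names. The point is that a wrapper instantiated for one type is a single function, so taking its address does not involve picking among the `std::pow` overloads.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical wrapper: one non-overloaded function per type T, so
// `&powWrapper<double>` names exactly one function.
template <typename T> T powWrapper(T base, T exponent) {
  return std::pow(base, exponent);
}

// Hypothetical consumer that stores a plain function pointer, standing in
// for a table that maps intrinsic names to host functions.
using BinaryFn = double (*)(double, double);

double applyBinary(BinaryFn fn, double a, double b) {
  assert(fn != nullptr && "expected a callable");
  return fn(a, b);
}
```

A call such as `applyBinary(&powWrapper<double>, 2.0, 10.0)` then compiles without any overload-resolution question at the point where the address is taken.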
… of for loop. NFC.
…lob expansion in lit's internal shell" (llvm#106763) Reverts llvm#106325 Broke some Buildbots.
If the operand node has the same scalars as one of the vectorized nodes, the compiler could miss this and incorrectly request minbitwidth data for the wrong node. It may lead to a compiler crash, because the vectorized node might have different minbw result. Fixes llvm#106667
Noticed in clang-formatting of llvm#106750
The worst possible case for a double literal goes like:

```
mov ...
movk ..., lsl #16
movk ..., lsl #32
movk ..., lsl #48
fmov ...
```

The limit of 5 in the code gives the impression that `Insn` includes all instructions including the `fmov`, but that's not true. It only counts the integer moves. This led me astray on some other work in this area.
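To make the counting concrete, here is a deliberately naive sketch that counts one integer move per non-zero 16-bit chunk of the literal's bit pattern. It is an assumption-laden simplification, not LLVM's immediate expansion (which can also use other encodings and may need fewer instructions), and `countNaiveIntegerMoves` and `doubleBits` are hypothetical names. The count never exceeds four and excludes the final `fmov`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Naive model: one mov/movk per non-zero 16-bit chunk, at least one move.
unsigned countNaiveIntegerMoves(uint64_t bits) {
  unsigned count = 0;
  for (unsigned shift = 0; shift < 64; shift += 16) {
    if ((bits >> shift) & 0xFFFFu)
      ++count;
  }
  if (count == 0)
    count = 1; // a single mov is still needed to materialize zero
  assert(count >= 1 && count <= 4 && "at most four 16-bit chunks");
  return count;
}

// Reinterprets a double as its 64-bit pattern without undefined behavior.
uint64_t doubleBits(double value) {
  static_assert(sizeof(double) == sizeof(uint64_t), "expects 64-bit double");
  uint64_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}
```

In this model `1.0` (bit pattern `0x3FF0000000000000`) needs one integer move, while `0.1` needs all four, so the full sequence including `fmov` is five instructions.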
…106753) Similar to what we do in foldVMV_V_V with the passthru, if we end up changing the Src's VL in tryReduceVL we need to make sure it dominates. Fixes llvm#106735
To support detecting MD5 checksum mismatches, deal with SupportFiles rather than plain FileSpecs in the SourceManager.
This patch implements sandboxir::ConstantFP mirroring llvm::ConstantFP.
There is no need to support Python 2.7 anymore, Python 3.3+ has `subprocess.DEVNULL`. This is good practice and also prevents file handles from staying open unnecessarily. Also remove a couple unused or unneeded `__future__` imports.
After landing support for actual vectorization of the "clustered" loads, we need a better estimate of the cost between the masked gather and clustered loads. This includes estimation of the address calculation and better estimation of the gathered loads. Also, this estimation now relies on the SLPCostThreshold option, allowing users to modify the behavior of the compiler. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: llvm#105858
Argument is another possible starting point for the pointer traversal, and PtrUseVisitor should be able to handle it.
…mber of elements in operands. Patch adds basic support for non-power-of-2 number of elements in operands. The patch still requires that this number addresses whole registers. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: llvm#106449
Consolidating code so that we have one copy instead of multiple reasoning about identity element. Note that we're (deliberately) not passing the FMF flags to common utility to preserve behavior in this change.
…optional" (llvm#106778) Reverts llvm#104668. This commit triggers an edge case that can cause circular `unrealized_conversion_cast` ops. llvm#106760 may fix it, but it has other issues. Reverting this PR for now, until I find a solution for that problem.
Primary goal is having one way of doing this, to ensure that we don't end up with accidental divergence.
This patch updates the source cache dump command to print both the actual (on-disk) checksum and the expected (line table) checksum. To achieve that we now read and store the on-disk checksum in the cached object. The same information will be used in a future patch to print a warning when the checksums differ.
Follow up fix to llvm#106332 `LoongArchMatInt.cpp:96:33: runtime error: shift exponent 64 is too large for 64-bit type` https://lab.llvm.org/buildbot/#/builders/169/builds/2681
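The sanitizer message refers to a general C++ rule: shifting a 64-bit value by 64 or more is undefined behavior. The sketch below shows one common way to guard such a shift when building a bit mask. It is not the actual change to `LoongArchMatInt.cpp`, and `lowBitsMask` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Returns a mask with the low `width` bits set, for width in [0, 64].
// The full-width case is handled separately because `1 << 64` is undefined.
uint64_t lowBitsMask(unsigned width) {
  assert(width <= 64 && "width exceeds the 64-bit type");
  if (width == 64)
    return ~uint64_t{0};
  return (uint64_t{1} << width) - 1;
}
```

With the guard in place, `lowBitsMask(64)` yields all ones instead of evaluating an out-of-range shift.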
…#106792) Reverts llvm#103371. There is a `heap-use-after-free`, as commented on 206b5af. Maybe `if (Next == E || BB != Next->getParent()) {` is enough, but I'm not sure what the intent was there.
…m#98214) We split up all the headers into top-level modules when we broke up cycles with the C compatibility headers. However, this resulted in a large number of small modules, which is awkward and clearly against the philosophy of Clang modules. This was necessary to make things work. This patch regroups a few headers from two leaf modules: stop_token and pstl. It should be pretty uncontroversial that grouping these headers into a single module doesn't introduce any cyclic dependency, yet it's a first step towards reducing the number of top-level modules we have in our modulemap.
…lvm#106494) This patch contains two parts:

- first, revert the patch llvm#101428;
- second, remove the `atomic_fetch_and_*()` to `atomic_<op>()` conversion (when the return value is not used), but preserve the `__sync_fetch_and_add()` to locked insn conversion with cpu v1/v2.
This updates the expected differences document to capture the difference in multi-argument overload resolution between Clang and DXC. Fixes llvm#99530
cferry-AMD
approved these changes
Sep 30, 2024
No description provided.