[AutoBump] Merge with 9edd998e (Aug 29) (14) #367

mgehre-amd · 2024-09-24T16:48:09Z

No description provided.

This patch removes all of the Set.* methods from Status. This cleanup is part of a series of patches that make it harder to use the anti-pattern of keeping a long-lived Status object around and updating it while dropping any errors it contains on the floor.

This patch is largely NFC. The more interesting next steps it enables are to:

1. remove Status.Clear()
2. assert that Status::operator=() never overwrites an error
3. remove Status::operator=()

Note that step (2) will bring 90% of the benefits for users, and step (3) will dramatically clean up the error handling code in various places. In the end my goal is to convert all APIs that are of the form ` ResultTy DoFoo(Status& error) ` to ` llvm::Expected<ResultTy> DoFoo() `.

How to read this patch? The interesting changes are in Status.h and Status.cpp; all other changes are mostly ` perl -pi -e 's/\.SetErrorString/ = Status::FromErrorString/g' $(git grep -l SetErrorString lldb/source) ` plus the occasional manual cleanup.
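The direction described above can be illustrated with a small standalone sketch. It does not use the real lldb or LLVM classes: `MiniStatus`, `PortOrError`, `ParsePort` and `ParsePortLegacy` are hypothetical names invented for this example, and `std::variant` only stands in for `llvm::Expected`. It shows an error being created by a factory instead of being set on a long-lived object, next to the older out-parameter shape.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-in for lldb's Status; NOT the real class.
class MiniStatus {
public:
  MiniStatus() = default; // default-constructed means success

  // Factory in the spirit of Status::FromErrorString: the error is
  // created in one step instead of being set on an existing object.
  static MiniStatus FromErrorString(std::string Msg) {
    MiniStatus S;
    S.Message = std::move(Msg);
    return S;
  }

  bool Fail() const { return Message.has_value(); }
  std::string GetMessage() const { return Message.value_or(""); }

private:
  std::optional<std::string> Message;
};

// std::variant is used here only as a stand-in for llvm::Expected<int>.
using PortOrError = std::variant<int, MiniStatus>;

// New shape: the result or the error is returned, no out-parameter.
PortOrError ParsePort(const std::string &Text) {
  if (Text.empty())
    return MiniStatus::FromErrorString("empty port");
  int Value = 0;
  for (char C : Text) {
    if (C < '0' || C > '9')
      return MiniStatus::FromErrorString("port is not a number");
    Value = Value * 10 + (C - '0');
    if (Value > 65535)
      return MiniStatus::FromErrorString("port out of range");
  }
  return Value;
}

// Old shape: ResultTy DoFoo(Status &error). The assignment replaces what
// used to be a setter call on the caller's status object.
int ParsePortLegacy(const std::string &Text, MiniStatus &Error) {
  PortOrError Result = ParsePort(Text);
  if (auto *Status = std::get_if<MiniStatus>(&Result)) {
    Error = *Status;
    return -1;
  }
  return std::get<int>(Result);
}

// Small self-check exercising both shapes.
void DemoParsePort() {
  assert(std::get<int>(ParsePort("8080")) == 8080);
  MiniStatus Error;
  assert(ParsePortLegacy("80a", Error) == -1 && Error.Fail());
}
```

With the returning shape, a caller cannot reach the value without first deciding what to do about the error alternative, which is the property the series is working toward.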

…#102860) This patch switches most of the uses of intptr_t to uintptr_t within llvm-exegesis for the subprocess memory support. In the vast majority of cases we do not want a signed component of the address, hence making intptr_t undesirable. intptr_t is left for error handling, for example when making syscalls and we need to see if the syscall returned -1.
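As a rough illustration of the reasoning above (not code from llvm-exegesis), the sketch below shows why an unsigned type suits address arithmetic while a signed type remains convenient for checking a `-1` return value. `InRange`, `SyscallFailed` and `HighAddress` are hypothetical helpers made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if Addr lies in [Base, Base + Size).
// With unsigned arithmetic, Addr < Base wraps to a huge value and
// therefore also fails the comparison.
bool InRange(std::uintptr_t Addr, std::uintptr_t Base, std::uintptr_t Size) {
  return Addr - Base < Size;
}

// Hypothetical stand-in for checking a raw syscall return value, where a
// signed type is still the natural choice.
bool SyscallFailed(std::intptr_t Ret) { return Ret == -1; }

// An address near the top of the address space (assumed for illustration).
std::uintptr_t HighAddress() { return ~std::uintptr_t{0} - 0xFFF; }

// Small self-check of the helpers above.
void DemoAddressTypes() {
  const std::uintptr_t High = HighAddress();
  assert(InRange(High + 0x10, High, 0x100));
  assert(!InRange(0x1000, High, 0x100));
  // Viewed as signed, the same high address becomes negative, which is
  // the "signed component" that is usually unwanted for addresses.
  assert(static_cast<std::intptr_t>(High) < 0);
  assert(SyscallFailed(-1) && !SyscallFailed(0));
}
```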

We were using a `_LIBCPP_ASSERT_FOO` macro without including `<__assert>`. rdar://134425695

…lvm#102940) Problem: On AIX, functions registered by atexit in a shared library are not run when the library is dlclosed, but instead run (and fail because the function pointer is no longer valid) during main program exit. The profile-rt registers some functions with atexit: 1. writeFileWithoutReturn that writes out the profile file 2. llvm_delete_reset_function_list that does some cleanup in the gcov instrumentation library (not sure) And so right now, we get an "Illegal instruction (core dumped)" when an instrumented shared object is dlopen'ed and dlclosed. Solution: When a shared library is dlclose'd, destructors from the library are called. So create a destructor function that iterates over all known functions that profile-rt registers with atexit, and unregister the ones that have been registered and execute them. Scenarios tested: (0) gcov dlopen/dlclose (AIX/gcov-dlopen-dlclose.test) (1) multiple dlopen/dlclose of the same lib and multiple libs (instrprof-dlopen-dlclose.test) (2) dlopen but no dlclose (exists: Posix/instrprof-dlopen.test) (3) a simple fork testcase with dlopen/dlclose (instrprof-dlopen-dlclose.test) (4) dlopen/dlclose by multiple threads. (instrprof-dlopen-dlclose.test) (5) regular dynamic-linking of instrumented shared libs (exists: AIX/shared-bexpall-pgo.c) (6) a simple fork testcase produces correct profile (instrprof-fork.c) --------- Co-authored-by: Hubert Tong <hstong@ca.ibm.com>

Move handling of all internal calls into the designated pass. Preserve NOPs and mark functions as non-simple on non-X86 platforms.

This patch implements sandboxir::VAArgInst mirroring llvm::VAArgInst.

…d))`; NFC

…use-count Added folds:
- `(add (sub X, Y), (sub Z, X))` -> `(sub Z, Y)`
- `(sub (add X, Y), (add X, Z))` -> `(sub Y, Z)`

The fold is typically handled in the `Reassociate` pass, but it fails if the inner `sub`/`add` are multi-use. Less importantly, Reassociate doesn't propagate flags correctly. This patch adds the fold explicitly to InstCombine. Proofs: https://alive2.llvm.org/ce/z/p6JyRP Closes llvm#105866
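The two identities behind these folds can be spot-checked with plain wrapping integer arithmetic. This is only an arithmetic sanity check under the assumption of 32-bit unsigned wraparound; it says nothing about how InstCombine implements the fold or how it handles flags, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// (add (sub X, Y), (sub Z, X)) and its folded form (sub Z, Y).
std::uint32_t AddOfSubs(std::uint32_t X, std::uint32_t Y, std::uint32_t Z) {
  return (X - Y) + (Z - X);
}
std::uint32_t FoldedAddOfSubs(std::uint32_t Y, std::uint32_t Z) {
  return Z - Y;
}

// (sub (add X, Y), (add X, Z)) and its folded form (sub Y, Z).
std::uint32_t SubOfAdds(std::uint32_t X, std::uint32_t Y, std::uint32_t Z) {
  return (X + Y) - (X + Z);
}
std::uint32_t FoldedSubOfAdds(std::uint32_t Y, std::uint32_t Z) {
  return Y - Z;
}

// True if both original expressions match their folded forms.
bool FoldsAgree(std::uint32_t X, std::uint32_t Y, std::uint32_t Z) {
  return AddOfSubs(X, Y, Z) == FoldedAddOfSubs(Y, Z) &&
         SubOfAdds(X, Y, Z) == FoldedSubOfAdds(Y, Z);
}

// Small self-check, including values that wrap around.
void DemoFolds() {
  assert(FoldsAgree(5, 3, 9));
  assert(FoldsAgree(0xFFFFFFFFu, 1, 0));
}
```

Because X cancels in both expressions under modular arithmetic, the folded forms agree for every input, including those where the intermediate results wrap.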

…parable with function count for each candidate (llvm#106260) The current cost-benefit analysis between vtable comparison and function comparison requires the indirect fallback branch to be cold. This is too conservative. This change allows vtable comparison as long as the vtable count is comparable with the function count for each function candidate, and removes the cold indirect fallback requirement. Tested: 1. Testing this on benchmarks uplifts the measurable performance wins. Counting the (possibly duplicated) remarks (because of linkonce_odr functions and cross-module import of functions) shows the number of vtable remarks increases from ~30k to ~50k. 2. https://gcc.godbolt.org/z/sbGK7Pacn shows vtable comparison doesn't happen today (using the same IR input)

…profiles for given functions (llvm#104654) Currently, in the extended binary format, the sample reader only reads profiles for functions that are in the current module at initialization time. This extends the support to read arbitrary profiles for given input functions at a later stage. It's used for llvm#101053.

We recently added various CPU_SUBTYPE_ARM64E values, notably including CPU_SUBTYPE_ARM64E_VERSIONED_PTRAUTH_ABI_MASK, which is 0x80000000U. The enum is better off as a uint32_t to accommodate that. This also hopefully helps silence GCC warnings reported on a ternary in CPU_SUBTYPE_ARM64E_WITH_PTRAUTH_VERSION. The subtype is already generally treated as a uint32_t elsewhere, so while there, change the new helpers to explicitly pass/return the subtype as uint32_t, and the individual narrower components as either bool or unsigned.

…#106035) In the clobbered FP/BP range, we can't use it as a normal FP/BP to access the stack. So if there are stack accesses due to register spills, scheduling or other back end optimizations, we should report an error instead of silently generating wrong code. Also try to minimize the save/restore range of the clobbered FP/BP if the FrameSetup doesn't change the stack size.

Build on the -slp-vectorize-non-power-of-2 experimental option, and support vectorizing reductions with 2^N-1 sized vectors. Specifically, two related changes: 1) When searching for a profitable VL, start with the 2^N-1 reduction width. If the cost model does not select that VL, return to power-of-two boundaries when halving the search VL. The latter is mostly for simplicity. 2) Reduce the minimum reduction width from 4 to 3 when supporting non-power-of-two vectors. This is required to support <3 x Ty> cases. One thing which isn't directly related to this change, but which I want to note for clarity, is that the non-power-of-two vectorization appears to be sensitive to the operand order of the reduction. I haven't yet fully figured out why, but I suspect this is non-power-of-two specific.

This patch fixes: llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp:845:12: error: variable 'RemainingVTableCount' set but not used [-Werror,-Wunused-but-set-variable] llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp:306:23: error: private field 'PSI' is not used [-Werror,-Wunused-private-field] Here are a couple of domino effects: - Once I remove PSI, I need to update the constructor and its caller. - Once I remove RemainingVTableCount, I don't need TotalCount, so I am updating the caller as well.

…NFC) (llvm#106251) This patch forward ports the heterogeneous std::map::operator[]() from C++26 so that we can look up the map without allocating an instance of std::string when the key-value pair exists in the map. The background is as follows. I'm planning to reduce the memory footprint of ThinLTO indexing by changing ImportMapTy, the data structure used for an import list. The new list will be a hash set of tuples (SourceModule, GUID, ImportType) represented in a space-efficient manner. That means that as we iterate over the hash set, we encounter SourceModule as many times as GUID. We don't want to create a temporary instance of std::string every time we look up ModuleToSummariesForIndex like: auto &SummariesForIndex = ModuleToSummariesForIndex[std::string(ILI.first)]; This patch removes the need to create the temporaries by enabling the heterogeneous lookup with std::map<K, V, std::less<>> and forward porting std::map::operator[]() from C++26.
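A minimal sketch of the idea, assuming C++17 and a transparent comparator: with `std::less<>`, `lower_bound` accepts a `std::string_view` directly, so a `std::string` is only constructed when a new entry has to be inserted. The helper name `LookupOrInsert` and the `int` value type are hypothetical and are not taken from the patch, which may be implemented differently.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <string_view>

// Transparent comparator (std::less<>) enables heterogeneous lookup.
// The int value type is a placeholder chosen for this sketch.
using SummaryMap = std::map<std::string, int, std::less<>>;

// Hypothetical helper emulating a heterogeneous operator[]: looks up Key
// without building a std::string, and only allocates one on insertion.
int &LookupOrInsert(SummaryMap &M, std::string_view Key) {
  auto It = M.lower_bound(Key); // compares std::string with string_view
  if (It == M.end() || It->first != Key)
    It = M.emplace_hint(It, std::string(Key), 0);
  return It->second;
}

// Small self-check: an existing key is found, a missing key is inserted.
void DemoLookupOrInsert() {
  SummaryMap M;
  M["a.o"] = 1;
  assert(LookupOrInsert(M, "a.o") == 1 && M.size() == 1);
  LookupOrInsert(M, "b.o") = 7;
  assert(M.size() == 2);
}
```

Using `lower_bound` first and passing its result to `emplace_hint` avoids a second tree search on the insertion path.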

llvm#105478) Currently, `getStackAlignment` asserts if the stack alignment wasn't specified. This makes it inconvenient to use and complicates testing. This change also makes `exceedsNaturalStackAlignment` method redundant.

Make some minor tweaks to AMDGPU tests to ensure they still work as intended after llvm#97762. These tests can be radically simplified after bitcast aware fpclass deduction.

…6238) This code has been unchanged for two years; let's simplify the code and remove configurability which makes the code harder to follow.

…m#105832) llvm#78086 provided the trait we want to use for this: `__libcpp_integer`. In some `libcxx/containers/views/mdspan` tests, improper uses of `char` are replaced with `signed char`. Fixes llvm#73715

New dep needed for 2bf2468

Works towards P0619R4/llvm#99985. - std::uncaught_exception was not previously deprecated. This patch deprecates it since C++17 as per N4259. std::uncaught_exceptions is used instead as libc++ unconditionally provides this function. - _LIBCPP_ENABLE_CXX20_REMOVED_UNCAUGHT_EXCEPTION restores std::uncaught_exception. - As a drive-by, this patch updates the C++20 status page to explain that D.11 is already done, since it was done in 578d09c.

Certain intrinsics map to builtins that require an immediate (literal) argument; make sure we report non-literal arguments. This has been kicking around downstream for a while, and the recent removal of the MMX builtins caused me to notice it again.

…pes. Need to use original cmp type i1 when estimating the cost for the buildvector node, not its operand types to prevent compiler crash upon TTI cost estimation.

Fixes failure on the llvm-clang-aarch64-darwin buildbot: https://lab.llvm.org/buildbot/#/builders/190/builds/4660/ The test mentioned does not rely on any unique property of X86, but does rely on the layout of the basic blocks produced by llc, which varies between targets. Although the test could be duplicated for other targets, it seems unnecessary since the behaviour being tested is not target-specific.

Improve operand analysis using SCEV for cost purposes. This fixes a divergence between legacy and VPlan-based cost-modeling after 533e6bb. Fixes llvm#106248.

…n in BB (llvm#105524)" Reverted (along with the NFC followup fix) due to buildbot failure: https://lab.llvm.org/buildbot/#/builders/160/builds/4142 This reverts commit 3ef37e2, and commit 616f7d3.

@mstorsjo

The underlying issue was discovered by an assert added in a800533 by a test case provided by @mstorsjo.

If the global variable is constant (but not constexpr), we need to diagnose, but keep evaluating.

… values VPERMILPS lower bits0-3 (to index per-lane i32/f32 0-3) VPERMILPD uses bit1 (to index per-lane i64/f64 0-1) Use SimplifyDemandedBits to ignore anything touching the remaining bits. Part of llvm#106413

) When including all targets, some files become too large for the NSIS installer to handle. Fixes llvm#101994

Add Windows include equivalents for includes and shell command.

Link restored from the original policy outlined here https://discourse.llvm.org/t/code-of-conduct-changes-related-to-llvm-project-policy-changes/64197

Currently when `LIBC_COPT_MEMCPY_X86_USE_SOFTWARE_PREFETCHING` is set we prefetch memory for read on the source buffer. This patch adds prefetch for write on the destination buffer.

…it (llvm#106430) We were reporting ambiguous references from using declarations, as the user can be depending on different overloads of a function just because they are visible in the TU. This doesn't apply to records or primary templates, as the declaration being referenced in such cases is unambiguous; the ambiguity applies to specializations, though. Hence this patch returns an explicit reference to record decls and primary templates of those.

This follows Solaris behavior of allowing both mnemonics all the time. Fixes llvm#105639.

Fix llvm#105571 which demonstrates an end() iterator dereference when performing a non-empty splice to end() from a region that ends at Src::end(). Rather than calling Instruction::adoptDbgRecords from Dest, create a marker (which takes an iterator) and absorbDebugValues onto that. The "absorb" variant doesn't clean up the source marker, which in this case we know is a trailing marker, so we have to do that manually.

…106382) Many tests were easy to update, but these are quite big and I think it's better to autogenerate them to see the difference well.

This requires a bit of restructuring of ctor calls when checking for a potential constant expression.

…/16 vector widths This cleans up the existing tests and shows the gaps in the test checks (for instance we're often testing VF4 + VF16 but not VF8 even though amdlibm supports it).

…r widths test checks This should cover most amdlibm functions, but still not added every VF combo (e.g. 2f32/16f64 often vectorises to the llvm intrinsic for that vector type)

These few worked without changes.

LLVM has a CMake variable to control whether to consider logf128 constant folding which libAnalysis ignores. This patch changes the logf128 check to rely on the global LLVM_HAS_LOGF128 setting made in config-ix.cmake.

…on in BB (llvm#105524)" Fixes the previous buildbot error by adding an explicit triple to the test, ensuring that llc can produce a valid object file. This reverts commit 926f097.

Reverts llvm#102147 It seems some systems which should support F128 are wrongly detected as not supporting. This might be due to checking `LDBL_MANT_DIG` instead of `__LDBL_MANT_DIG__`. I will investigate.

adrian-prantl and others added 30 commits August 27, 2024 10:59

[lldb] Add transitional backwards-compatible API to Status

d1d8edf

[lldb] Add missing namespace

c349ded

[libc++] Add missing include to three_way_comp_ref_type.h

0df7812

We were using a `_LIBCPP_ASSERT_FOO` macro without including `<__assert>`. rdar://134425695

[BOLT] Handle internal calls in ValidateInternalCalls (llvm#105736)

abd69b3

Move handling of all internal calls into the designated pass. Preserve NOPs and mark functions as non-simple on non-X86 platforms.

[SandboxIR] Implement VAArgInst (llvm#106247)

ff81f9f

This patch implements sandboxir::VAArgInst mirroring llvm::VAArgInst.

[InstCombine] Add tests for reassosiating `(add/sub (sub/add) (sub/ad…

155e3aa

…d))`; NFC

[libc++] Add missing newline and remove unintended escape sequence

b2dd840

[lldb] Update Windows test to new Status API

5e64520

[SLP] Use early-return in canVectorizeLoads [nfc]

6a74b0e

[lldb] Update ProcessLauncherWinows to new Status API

b24ffa6

[AMDGPU] adjust tests to prevent fpclass bitcast folding (llvm#106268)

4c4908c

Make some minor tweaks to AMDGPU tests to ensure they still work as intended after llvm#97762. These tests can be radically simplified after bitcast aware fpclass deduction.

[SLP] Remove -slp-optimize-identity-hor-reduction-ops option (llvm#10…

ee764a2

…6238) This code has been unchanged for two years; let's simplify the code and remove configurability which makes the code harder to follow.

[libc++] Disallow character types being index types of extents (llv…

74e70ba

…m#105832) llvm#78086 provided the trait we want to use for this: `__libcpp_integer`. In some `libcxx/containers/views/mdspan` tests, improper uses of `char` are replaced with `signed char`. Fixes llvm#73715

[bazel][mlir] Add ConvertToSPIRV dep to mlir-vulkan-runner (llvm#106285)

2a3d735

New dep needed for 2bf2468

[MachO] Silence GCC warning on enum/non-enum in ternary. NFC.

bcb6e27

[lldb] Fix test expectation in TestFrameRecognizer.py (llvm#106281)

fc51797

alexey-bataev and others added 29 commits August 29, 2024 03:53

[SLP]Fix a crash when requestin the cost for buildvector cmp nodes ty…

fdf72c9

…pes. Need to use original cmp type i1 when estimating the cost for the buildvector node, not its operand types to prevent compiler crash upon TTI cost estimation.

Fix MSVC "not all control paths return a value" warning. NFC.

c3cb273

[LV] Use SCEV to analyze second operand for cost query.

0a272d3

Improve operand analysis using SCEV for cost purposes. This fixes a divergence between legacy and VPlan-based cost-modeling after 533e6bb. Fixes llvm#106248.

Revert "[DebugInfo][DWARF] Set is_stmt on first non-line-0 instructio…

926f097

…n in BB (llvm#105524)" Reverted (along with the NFC followup fix) due to buildbot failure: https://lab.llvm.org/buildbot/#/builders/160/builds/4142 This reverts commit 3ef37e2, and commit 616f7d3.

[LAA] Add test cases where evaluating AddRecs at symbolic max BTC wraps.

606a934

The underlying issue was discovered by an assert added in a800533 by a test case provided by @mstorsjo.

[SLP][NFC]Format canVectorizeLoads after previous NFC patches.

50515db

[SLP] Fix REQUIRES line for failing tests (llvm#106531)

9167667

[clang][bytecode] Properly diagnose non-const reads (llvm#106514)

cb608cc

If the global variable is constant (but not constexpr), we need to diagnose, but keep evaluating.

[InstCombine][X86] Add vpermilpd/vpermilps test coverage for llvm#106413

25c9410

[InstCombine][X86] Only demand used bits for VPERMILPD/VPERMILPS mask…

d57c046

… values VPERMILPS lower bits0-3 (to index per-lane i32/f32 0-3) VPERMILPD uses bit1 (to index per-lane i64/f64 0-1) Use SimplifyDemandedBits to ignore anything touching the remaining bits. Part of llvm#106413

Restrict LLVM_TARGETS_TO_BUILD in Windows release packaging (llvm#106059

2a28df6

) When including all targets, some files become too large for the NSIS installer to handle. Fixes llvm#101994

[lldb][lldb-dap][test] Enable Launch tests

b2a820f

Add Windows include equivalents for includes and shell command.

Restore missing link in CodeOfConduct.rst (llvm#106385)

0a48482

Link restored from the original policy outlined here https://discourse.llvm.org/t/code-of-conduct-changes-related-to-llvm-project-policy-changes/64197

[libc][x86] Use prefetch for write for memcpy (llvm#90450)

73ef397

Currently when `LIBC_COPT_MEMCPY_X86_USE_SOFTWARE_PREFETCHING` is set we prefetch memory for read on the source buffer. This patch adds prefetch for write on the destination buffer.

[SPARC][IAS] Add illtrap alias for unimp (llvm#105928)

7955760

This follows Solaris behavior of allowing both mnemonics all the time. Fixes llvm#105639.

[IPSCCP] Add test for returning nonnull pointer (NFC)

ba52a09

[NFC][AMDGPU] Autogenerate tests for uniform i32 promo in ISel (llvm#…

1f8f2ed

…106382) Many tests were easy to update, but these are quite big and I think it's better to autogenerate them to see the difference well.

[clang][bytecode] Diagnose member calls on deleted blocks (llvm#106529)

df11ee2

This requires a bit of restructuring of ctor calls when checking for a potential constant expression.

[LoopVectorize][X86] amdlibm-calls.ll - cleanup test checks for 2/4/8…

c57abc6

…/16 vector widths This cleans up the existing tests and shows the gaps in the test checks (for instance we're often testing VF4 + VF16 but not VF8 even though amdlibm supports it).

[LoopVectorize][X86] amdlibm-calls.ll - add additional 2/4/8/16 vecto…

2f95298

…r widths test checks This should cover most amdlibm functions, but still not added every VF combo (e.g. 2f32/16f64 often vectorises to the llvm intrinsic for that vector type)

[lldb][lldb-dap] Enable more tests on Windows

f7d6dfa

These few worked without changes.

[Analysis] Guard logf128 cst folding (llvm#106543)

56152fa

LLVM has a CMake variable to control whether to consider logf128 constant folding which libAnalysis ignores. This patch changes the logf128 check to rely on the global LLVM_HAS_LOGF128 setting made in config-ix.cmake.

Reapply "[DebugInfo][DWARF] Set is_stmt on first non-line-0 instructi…

5fef40c

…on in BB (llvm#105524)" Fixes the previous buildbot error by adding an explicit triple to the test, ensuring that llc can produce a valid object file. This reverts commit 926f097.

Revert "[flang] Warn when F128 is unsupported" (llvm#106561)

8ae877a

Reverts llvm#102147 It seems some systems which should support F128 are wrongly detected as not supporting. This might be due to checking `LDBL_MANT_DIG` instead of `__LDBL_MANT_DIG__`. I will investigate.

[LoopUnroll] Add test for llvm#53205 (NFC)

9edd998

[AutoBump] Merge with 9edd998 (Aug 29)

e9c77eb

cferry-AMD approved these changes Sep 30, 2024
