[AutoBump] Merge with 9edd998e (Aug 29) (14) #367

This patch removes all of the Set.* methods from Status. This cleanup is part of a series of patches that make it harder to use the anti-pattern of keeping a long-lived Status object around and updating it while dropping any errors it contains on the floor. This patch is largely NFC; the more interesting next steps it enables are to: 1. remove Status.Clear() 2. assert that Status::operator=() never overwrites an error 3. remove Status::operator=(). Note that step (2) will bring 90% of the benefits for users, and step (3) will dramatically clean up the error handling code in various places. In the end my goal is to convert all APIs that are of the form ` ResultTy DoFoo(Status& error) ` to ` llvm::Expected<ResultTy> DoFoo() `. How to read this patch? The interesting changes are in Status.h and Status.cpp; all other changes are mostly ` perl -pi -e 's/\.SetErrorString/ = Status::FromErrorString/g' $(git grep -l SetErrorString lldb/source) ` plus the occasional manual cleanup.
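The difference between the two API shapes can be sketched in isolation. This is a minimal illustration only: `Result`, `Error`, `ParseDigitOld` and `ParseDigit` are hypothetical names, and `Result` is a toy stand-in, not `llvm::Expected` or the real `Status` class.

```cpp
#include <string>
#include <utility>
#include <variant>

// Hypothetical error payload used only in this sketch.
struct Error {
  std::string message;
};

// Toy value-or-error type (NOT llvm::Expected): it holds exactly one of the two.
template <typename T> class Result {
public:
  Result(T value) : storage_(std::move(value)) {}
  Result(Error error) : storage_(std::move(error)) {}

  bool ok() const { return std::holds_alternative<T>(storage_); }
  const T &value() const { return std::get<T>(storage_); }
  const std::string &error() const { return std::get<Error>(storage_).message; }

private:
  std::variant<T, Error> storage_;
};

// Old shape: the caller passes in a long-lived error object that every call
// may overwrite, so an earlier error is easy to lose.
int ParseDigitOld(char c, std::string &error) {
  if (c < '0' || c > '9') {
    error = "not a digit";
    return 0;
  }
  return c - '0';
}

// New shape: the error travels with the return value of this one call.
Result<int> ParseDigit(char c) {
  if (c < '0' || c > '9')
    return Error{"not a digit"};
  return c - '0';
}
```

With the second shape there is no shared error object to overwrite; each call site has to look at its own result.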

…#102860) This patch switches most of the uses of intptr_t to uintptr_t within llvm-exegesis for the subprocess memory support. In the vast majority of cases we do not want a signed component of the address, hence making intptr_t undesirable. intptr_t is left for error handling, for example when making syscalls and we need to see if the syscall returned -1.
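A small sketch of the distinction, assuming hypothetical helpers `AlignDown` and `SyscallFailed` that are not part of llvm-exegesis: address arithmetic stays unsigned, while a signed value is kept only where a -1 sentinel has to be detected.

```cpp
#include <cstdint>

// Addresses are kept unsigned so that values in the upper half of the
// address space never appear negative during arithmetic.
inline std::uintptr_t AlignDown(std::uintptr_t address,
                                std::uintptr_t alignment) {
  return address - (address % alignment);
}

// A signed type is still convenient for syscall-style results, where -1
// signals failure.
inline bool SyscallFailed(std::intptr_t result) { return result == -1; }
```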

We were using a `_LIBCPP_ASSERT_FOO` macro without including `<__assert>`. rdar://134425695

…lvm#102940) Problem: On AIX, functions registered by atexit in a shared library are not run when the library is dlclosed, but instead run (and fail because the function pointer is no longer valid) during main program exit. The profile-rt registers some functions with atexit: 1. writeFileWithoutReturn that writes out the profile file 2. llvm_delete_reset_function_list that does some cleanup in the gcov instrumentation library (not sure) And so right now, we get an "Illegal instruction (core dumped)" when an instrumented shared object is dlopen'ed and dlclosed. Solution: When a shared library is dlclose'd, destructors from the library are called. So create a destructor function that iterates over all known functions that profile-rt registers with atexit, and unregister the ones that have been registered and execute them. Scenarios tested: (0) gcov dlopen/dlclose (AIX/gcov-dlopen-dlclose.test) (1) multiple dlopen/dlclose of the same lib and multiple libs (instrprof-dlopen-dlclose.test) (2) dlopen but no dlclose (exists: Posix/instrprof-dlopen.test) (3) a simple fork testcase with dlopen/dlclose (instrprof-dlopen-dlclose.test) (4) dlopen/dlclose by multiple threads. (instrprof-dlopen-dlclose.test) (5) regular dynamic-linking of instrumented shared libs (exists: AIX/shared-bexpall-pgo.c) (6) a simple fork testcase produces correct profile (instrprof-fork.c) --------- Co-authored-by: Hubert Tong <hstong@ca.ibm.com>

Move handling of all internal calls into the designated pass. Preserve NOPs and mark functions as non-simple on non-X86 platforms.

This patch implements sandboxir::VAArgInst mirroring llvm::VAArgInst.

…d))`; NFC

…use-count Added folds: - `(add (sub X, Y), (sub Z, X))` -> `(sub Z, Y)` - `(sub (add X, Y), (add X, Z))` -> `(sub Y, Z)` The fold is typically handled in the `Reassociate` pass, but it fails if the inner `sub`/`add` are multi-use. Less importantly, Reassociate doesn't propagate flags correctly. This patch adds the fold explicitly to InstCombine. Proofs: https://alive2.llvm.org/ce/z/p6JyRP Closes llvm#105866
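The two identities can be checked at the source level with wrapping 32-bit unsigned arithmetic. This is only a sanity check of the algebra with hypothetical function names; it says nothing about flag propagation, which the Alive2 proofs cover.

```cpp
#include <cstdint>

// (add (sub X, Y), (sub Z, X)): expected to equal (sub Z, Y) modulo 2^32.
inline uint32_t AddOfSubs(uint32_t X, uint32_t Y, uint32_t Z) {
  return (X - Y) + (Z - X);
}

// (sub (add X, Y), (add X, Z)): expected to equal (sub Y, Z) modulo 2^32.
inline uint32_t SubOfAdds(uint32_t X, uint32_t Y, uint32_t Z) {
  return (X + Y) - (X + Z);
}
```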

…parable with function count for each candidate (llvm#106260) The current cost-benefit analysis between vtable comparison and function comparison requires the indirect fallback branch to be cold. This is too conservative. This change allows vtable-comparison as long as vtable count is comparable with function count for each function candidate and removes the cold indirect fallback requirement. Tested: 1. Testing this on benchmarks uplifts the measurable performance wins. Counting the (possibly-duplicated) remarks (because of linkonce_odr functions, cross-module import of functions) shows the number of vtable remarks increases from ~30k-ish to 50k-ish. 2. https://gcc.godbolt.org/z/sbGK7Pacn shows vtable-comparison doesn't happen today (using the same IR input)

…profiles for given functions (llvm#104654) Currently, in the extended binary format, the sample reader only reads the profiles when the functions are in the current module at initialization time. This extends the support to read arbitrary profiles for given input functions at a later stage. It's used for llvm#101053.

We recently added various CPU_SUBTYPE_ARM64E values, notably including CPU_SUBTYPE_ARM64E_VERSIONED_PTRAUTH_ABI_MASK, which is 0x80000000U. The enum is better off as a uint32_t to accommodate that. This also hopefully helps silence GCC warnings reported on a ternary in CPU_SUBTYPE_ARM64E_WITH_PTRAUTH_VERSION. The subtype is already generally treated as a uint32_t elsewhere, so while there, change the new helpers to explicitly pass/return the subtype as uint32_t, and the individual narrower components as either bool or unsigned.
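A minimal sketch of an enum with a fixed `uint32_t` underlying type. The enumerator and function names are hypothetical; only the 0x80000000U mask value is taken from the description above.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical enumerators; fixing the underlying type makes the width and
// signedness explicit instead of leaving them to the implementation.
enum ExampleSubtype : uint32_t {
  EXAMPLE_SUBTYPE_BASE = 2,
  EXAMPLE_VERSIONED_ABI_MASK = 0x80000000U,
};

static_assert(
    std::is_same_v<std::underlying_type_t<ExampleSubtype>, uint32_t>);

// Helpers take and return the subtype as a plain uint32_t.
constexpr bool HasVersionedABI(uint32_t subtype) {
  return (subtype & EXAMPLE_VERSIONED_ABI_MASK) != 0;
}
```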

…#106035) In the clobbered FP/BP range, we can't use it as a normal FP/BP to access the stack. So if there are stack accesses due to register spill, scheduling or other back end optimization, we should report an error instead of silently generating wrong code. Also try to minimize the save/restore range of the clobbered FP/BP if the FrameSetup doesn't change stack size.

Build on the -slp-vectorize-non-power-of-2 experimental option, and support vectorizing reductions with 2^N-1 sized vectors. Specifically, two related changes: 1) When searching for a profitable VL, start with the 2^N-1 reduction width. If the cost model does not select that VL, return to power of two boundaries when halving the search VL. The latter is mostly for simplicity. 2) Reduce the minimum reduction width from 4 to 3 when supporting non-power of two vectors. This is required to support <3 x Ty> cases. One thing which isn't directly related to this change, but I want to note for clarity, is that the non-power-of-two vectorization appears to be sensitive to operand order of reduction. I haven't yet fully figured out why, but I suspect this is non-power-of-two specific.

This patch fixes: llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp:845:12: error: variable 'RemainingVTableCount' set but not used [-Werror,-Wunused-but-set-variable] llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp:306:23: error: private field 'PSI' is not used [-Werror,-Wunused-private-field] Here are a couple of domino effects: - Once I remove PSI, I need to update the constructor and its caller. - Once I remove RemainingVTableCount, I don't need TotalCount, so I am updating the caller as well.

…NFC) (llvm#106251) This patch forward ports the heterogeneous std::map::operator[]() from C++26 so that we can look up the map without allocating an instance of std::string when the key-value pair exists in the map. The background is as follows. I'm planning to reduce the memory footprint of ThinLTO indexing by changing ImportMapTy, the data structure used for an import list. The new list will be a hash set of tuples (SourceModule, GUID, ImportType) represented in a space efficient manner. That means that as we iterate over the hash set, we encounter SourceModule as many times as GUID. We don't want to create a temporary instance of std::string every time we look up ModuleToSummariesForIndex like: auto &SummariesForIndex = ModuleToSummariesForIndex[std::string(ILI.first)]; This patch removes the need to create the temporaries by enabling the heterogeneous lookup with std::map<K, V, std::less<>> and forward porting std::map::operator[]() from C++26.
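The idea can be sketched with standard C++17 facilities. `SummaryMap` and `LookupOrInsert` are hypothetical names, the mapped type is reduced to `int`, and this is not the forward-ported `operator[]` from the patch; it only shows a lookup that builds a `std::string` key when an insertion is actually needed.

```cpp
#include <functional>
#include <map>
#include <string>
#include <string_view>

// std::less<> is a transparent comparator, which enables heterogeneous
// lookup: keys can be compared against std::string_view directly.
using SummaryMap = std::map<std::string, int, std::less<>>;

// Returns a reference to the mapped value for `key`. A std::string is only
// constructed when the key is missing and has to be inserted.
inline int &LookupOrInsert(SummaryMap &map, std::string_view key) {
  auto it = map.lower_bound(key); // no temporary std::string here
  if (it != map.end() && it->first == key)
    return it->second;
  return map.emplace_hint(it, std::string(key), 0)->second;
}
```

A repeated lookup of an existing key then goes through `lower_bound` alone, which is the case the description above is concerned with.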

llvm#105478) Currently, `getStackAlignment` asserts if the stack alignment wasn't specified. This makes it inconvenient to use and complicates testing. This change also makes the `exceedsNaturalStackAlignment` method redundant.

Make some minor tweaks to AMDGPU tests to ensure they still work as intended after llvm#97762. These tests can be radically simplified after bitcast aware fpclass deduction.

…6238) This code has been unchanged for two years; let's simplify the code and remove configurability which makes the code harder to follow.

…m#105832) llvm#78086 provided the trait we want to use for this: `__libcpp_integer`. In some `libcxx/containers/views/mdspan` tests, improper uses of `char` are replaced with `signed char`. Fixes llvm#73715

New dep needed for 2bf2468

Works towards P0619R4/llvm#99985. - std::uncaught_exception was not previously deprecated. This patch deprecates it since C++17 as per N4259. std::uncaught_exceptions is used instead as libc++ unconditionally provides this function. - _LIBCPP_ENABLE_CXX20_REMOVED_UNCAUGHT_EXCEPTION restores std::uncaught_exception. - As a drive-by, this patch updates the C++20 status page to explain that D.11 is already done, since it was done in 578d09c.
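For readers migrating away from the deprecated function, a short sketch of the usual `std::uncaught_exceptions` idiom follows. `UnwindDetector` and `RunAndDetect` are hypothetical names used only for illustration.

```cpp
#include <exception>
#include <stdexcept>

// Records whether its destructor ran as part of stack unwinding by comparing
// the number of in-flight exceptions at construction and at destruction.
class UnwindDetector {
public:
  explicit UnwindDetector(bool &unwinding)
      : unwinding_(unwinding), count_at_entry_(std::uncaught_exceptions()) {}
  ~UnwindDetector() {
    unwinding_ = std::uncaught_exceptions() > count_at_entry_;
  }

private:
  bool &unwinding_;
  int count_at_entry_;
};

// Returns true if the detector was destroyed while an exception propagated.
inline bool RunAndDetect(bool should_throw) {
  bool unwinding = false;
  try {
    UnwindDetector detector(unwinding);
    if (should_throw)
      throw std::runtime_error("example");
  } catch (const std::runtime_error &) {
    // Swallowed on purpose; only the flag matters in this sketch.
  }
  return unwinding;
}
```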

Certain intrinsics map to builtins that require an immediate (literal) argument; make sure we report non-literal arguments. This has been kicking around downstream for a while, and the recent removal of the MMX builtins caused me to notice it again.

…irect-list-initialization (llvm#102581) When initializing structured bindings from an array with direct-list-initialization, array copy will be performed, which is a special case not following list-initialization. This PR adds support for this case. Fixes llvm#31813.

…lvm#106269) This helps with Constant::classof().

llvm#106293) Reverts llvm#105442. Reverted due to `TestSkinnyCoreFailing`; root-causing the failure will likely take longer than EOD.

This patch decouples macOS CI testing from BuildKite, which makes the maintenance of macOS CI easier and more accessible to all contributors. Right now, the macOS CI is running entirely on machines owned by the LLVM Foundation with only a small set of contributors having direct access to them. In particular, updating these machines is currently a very time-consuming manual process that requires taking the machines offline, and using Github-provided instances makes that an order of magnitude easier. The story for performing back-deployment testing still needs to be figured out, so for now we are retaining some jobs under BuildKite.

This patch prepares the NFC groundwork for global outlining using CGData, which will follow llvm#90074. - The `MinRepeats` parameter is now explicitly passed to the `getOutliningCandidateInfo` function, rather than relying on a default value of 2. For local outlining, the minimum number of repetitions is typically 2, but for the global outlining (mentioned above), we will optimistically create a single `Candidate` for each `OutlinedFunction` if stable hashes match a specific code sequence. This parameter is adjusted accordingly in global outlining scenarios. - I have also implemented `unique_ptr` for `OutlinedFunction` to ensure safe and efficient memory management within `FunctionList`, avoiding unnecessary implicit copies. This depends on llvm#101461. This is a patch for https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.

This reverts commit d1d8edf.

…lvm#102036) We use lgamma_r for the random normal distribution support. In this code we redeclare it, which causes issues with the LLVM C library as this function is marked noexcept in LLVM libc. This patch ensures that we don't redeclare that function when targeting LLVM libc.

For supported architectures, lldb will do a static scan of the assembly instructions of a function to detect stack/frame pointer changes, register stores and loads, so we can retrieve register values for the caller stack frames. We trust that the function address range reflects the actual function range, but in a stripped binary or other unusual environment, we can end up scanning all of the text as a single "function" which is (1) incorrect and useless, but more importantly (2) slow. Cap the max size we will profile to 10MB of instructions. There will surely be functions longer than this with no unwind info, and we will miss the final epilogue or mid-function epilogues past the first 10MB, but I think this will be unusual, and the failure mode for missing the epilogue is that the user will need to step out an extra time or two as the StackID is not correctly calculated mid-epilogue. I think this is a good tradeoff of behaviors. rdar://134391577
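The cap itself amounts to a clamp on the number of bytes scanned. A rough sketch, with hypothetical names and assuming that "10MB" means 10 * 1024 * 1024 bytes (the description does not say which definition lldb uses):

```cpp
#include <algorithm>
#include <cstdint>

// Assumed interpretation of the 10MB limit from the description above.
constexpr uint64_t kMaxScanBytes = 10ull * 1024 * 1024;

// Number of instruction bytes to scan for a function of the given size:
// the whole function when it is small, otherwise only the first 10MB.
constexpr uint64_t BytesToScan(uint64_t function_size) {
  return std::min(function_size, kMaxScanBytes);
}
```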

…lvm#106156) LLVM often extends global names by adding suffixes to distinguish unique identities. However, these suffixes are not always stable across different runs and build environments. To address this issue, I implemented `get_stable_name` to ignore such suffixes and obtain the original name. This approach is not new, as PGO or Bolt already handle this issue similarly. Using the stable name obtained from `get_stable_name`, I implemented `stable_hash_name` while utilizing the same underlying `xxh3_64bit` algorithm as before.
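As an illustration of the approach, the sketch below strips a suffix before hashing. Everything here is an assumption made for the example: the function names, the choice of `.llvm.` and `.__uniq.` as suffix markers, and `std::hash` as a stand-in for `xxh3_64bit`. It is not the code from the patch.

```cpp
#include <cstddef>
#include <functional>
#include <string_view>

// Drops everything from the last occurrence of `marker` onward, if present.
inline std::string_view StripSuffix(std::string_view name,
                                    std::string_view marker) {
  std::size_t pos = name.rfind(marker);
  return pos == std::string_view::npos ? name : name.substr(0, pos);
}

// Assumed suffix markers, chosen only for this sketch.
inline std::string_view StableName(std::string_view name) {
  return StripSuffix(StripSuffix(name, ".llvm."), ".__uniq.");
}

// std::hash stands in for the real hash function; the point is only that the
// hash is computed over the stable part of the name.
inline std::size_t StableHashName(std::string_view name) {
  return std::hash<std::string_view>{}(StableName(name));
}
```

Two names that differ only in the stripped suffix then hash to the same value, which is the property the description is after.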

Fix Windows after llvm#102940

32-bit Windows uses `unsigned int` for uintptr_t and size_t. Commit 18e06e3 changed uptr to unsigned long, so it no longer matches the real size_t/uintptr_t and therefore the current definition of usize results in: `error C2821: first formal parameter to 'operator new' must be 'size_t'` However, the real problem is that uptr is wrong to work around the fact that we have local SIZE_T and SSIZE_T typedefs that trample on the basetsd.h definitions of the same name and therefore need to match exactly. Unlike size_t/ssize_t the uppercase ones always use unsigned long (even on 32-bit). This commit works around the build breakage by keeping the existing definitions of uptr/sptr and just changing usize. A follow-up change will attempt to fix this properly. Fixes: llvm#101998 Reviewed By: mstorsjo Pull Request: llvm#106151

… test. (llvm#106129) Making the synthesis of a contextual profile file from a JSON descriptor more reusable, for unittest authoring purposes. The functionality round-trips through the binary format - no reason, currently, to support other ways of loading contextual profiles.

…U objects (llvm#95292) This patch adds the `#gpu.kernel_metadata` and `#gpu.kernel_table` attributes. The `#gpu.kernel_metadata` attribute allows storing metadata related to a compiled kernel, for example, the number of scalar registers used by the kernel. The attribute only has 2 required parameters, the name and function type. It also has 2 optional parameters, the argument attributes and a generic dictionary for storing all other metadata. The `#gpu.kernel_table` stores a table of `#gpu.kernel_metadata`, mapping the name of the kernel to the metadata. Finally, the function `ROCDL::getAMDHSAKernelsELFMetadata` was added to collect ELF metadata from a binary, and to test the class methods in both attributes. Example:
```mlir
gpu.binary @binary [#gpu.object<#rocdl.target<chip = "gfx900">, kernels = #gpu.kernel_table<[
    #gpu.kernel_metadata<"kernel0", (i32) -> (), metadata = {sgpr_count = 255}>,
    #gpu.kernel_metadata<"kernel1", (i32, f32) -> (), arg_attrs = [{llvm.read_only}, {}]>
  ]> , bin = "BLOB">]
```
The motivation behind these attributes is to provide useful information for things like tuning. --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>

An overload of `llvm::promoteCallWithIfThenElse` that updates the contextual profile. High-level, this is very simple: after creating the `if... then (direct call) else (indirect call)` structure, we instrument the new callsites and BBs (the instrumentation will help with tracking for other IPO transformations, and, ultimately, to match counter values before flattening to `MD_prof`). In more detail: - move the callsite instrumentation of the indirect call to the `else` BB, before the indirect call - create a new callsite instrumentation for the direct call - create instrumentation for both the `then` and `else` BBs - we could instrument just one (MST-style) but we're not running the binary with this instrumentation, and at most this would save some space (less counters tracked). For simplicity instrumenting both at this point - update each context belonging to the caller by updating the counters, and moving the indirect callee to the new, direct callsite ID Issue llvm#89287
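At the source level, the `if... then (direct call) else (indirect call)` structure looks roughly like the sketch below. All names are hypothetical, and the plain counters are only a loose analogy for the callsite and BB instrumentation described above; the real transformation operates on LLVM IR and contextual profile data.

```cpp
using Callee = int (*)(int);

inline int LikelyTarget(int x) { return x + 1; }
inline int OtherTarget(int x) { return x * 2; }

// Stand-ins for the counters attached to the `then` and `else` blocks.
struct Counters {
  unsigned direct = 0;
  unsigned indirect = 0;
};

// Promoted form of `return fp(arg);` when LikelyTarget is the hot callee.
inline int PromotedCall(Callee fp, int arg, Counters &counters) {
  if (fp == &LikelyTarget) {
    ++counters.direct; // `then` block: direct call, new callsite
    return LikelyTarget(arg);
  }
  ++counters.indirect; // `else` block: original indirect call
  return fp(arg);
}
```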

…ng (llvm#104642) No Functional Changes * Fix comments in several places * Instead of using BB-getName() (in dump methods) use getBasicBlockLabel. This fixes the poor output of the dumped info that resulted in missing BB labels. * Use RPO when dumping SuspendCrossingInfo. Without this the dump order is determined by the ptr addresses and so is not consistent from run to run making IR diffs difficult to read. * Inference -> Interference * Pull the logic that determines insertion location out of insertSpills and into getSpillInsertionPt, to differentiate between these two operations. * Use Shape getters for CoroId instead of getting it manually. --------- Co-authored-by: tnowicki <tnowicki.nowicki@amd.com>

Fix docs modified by llvm#94910 by adding information about the `module` argument in `gpu::TargetAttrInterface::createObject`. --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>

…alASTSourceWrapper This is an oversight from llvm#104817 where the intention was to hoist the ExternalASTSourceWrapper construction out of the conditional so it can be set on both the `SemaSourceWithPriorities` and be added as an external source to Sema. But the inner `ExternalASTSourceWrapper` allocation wasn't actually removed. This currently all works fine because all these AST sources are refcounted and point to the same underlying AST sources. But this patch cleans this up regardless.

Tail duplication will generate a redundant move before return. This is because MachineCopyPropagation can't recognize a COPY after post-RA pseudoExpand. This patch makes MachineCopyPropagation recognize `%0 = ADDI %1, 0` as a COPY.

Add a missing dependency that sometimes makes the build fail with ninja.

llvm#106120 Simplify the data transfer when possible by using the reference and a shape. This bypasses the declare op. In order to keep the declare op around, use the second result of the declare op, which achieves the same.

Fix strsep interceptor. For strsep description see https://www.man7.org/linux/man-pages/man3/strsep.3.html

For 016e1eb

For 73c3b73

`JITDylibSearchOrderResolver` local variable can be destroyed before completion of all callbacks. Capture it together with `Deps` in `OnEmitted` callback. Original error:
```
==2035==ERROR: AddressSanitizer: stack-use-after-return on address 0x7bebfa155b70 at pc 0x7ff2a9a88b4a bp 0x7bec08d51980 sp 0x7bec08d51978
READ of size 8 at 0x7bebfa155b70 thread T87 (tf_xla-cpu-llvm)
    #0 0x7ff2a9a88b49 in operator() llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:55:58
    #1 0x7ff2a9a88b49 in __invoke<(lambda at llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:55:9) &, const llvm::DenseMap<llvm::orc::JITDylib *, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void> >, llvm::DenseMapInfo<llvm::orc::JITDylib *, void>, llvm::detail::DenseMapPair<llvm::orc::JITDylib *, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void> > > > &> libcxx/include/__type_traits/invoke.h:149:25
    #2 0x7ff2a9a88b49 in __call<(lambda at llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:55:9) &, const llvm::DenseMap<llvm::orc::JITDylib *, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void> >, llvm::DenseMapInfo<llvm::orc::JITDylib *, void>, llvm::detail::DenseMapPair<llvm::orc::JITDylib *, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void> > > > &> libcxx/include/__type_traits/invoke.h:224:5
    #3 0x7ff2a9a88b49 in operator() libcxx/include/__functional/function.h:210:12
    #4 0x7ff2a9a88b49 in void std::__u::__function::__policy_invoker<void (llvm::DenseMap<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr,
```
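The general shape of this kind of fix can be sketched as follows. The names are hypothetical and shared ownership via `std::shared_ptr` is an assumption made for the example; the description above only says the resolver is captured in the callback, not how.

```cpp
#include <functional>
#include <memory>
#include <string>

// Hypothetical object that a deferred callback still needs after the function
// that created it has returned.
struct Resolver {
  std::string name;
};

// Returning a callback that referenced a stack-local Resolver would leave it
// dangling. Capturing an owning pointer by value keeps the object alive for
// as long as the callback exists.
inline std::function<std::string()> MakeDeferredCallback() {
  auto resolver = std::make_shared<Resolver>(Resolver{"search-order"});
  return [resolver] { return resolver->name; };
}
```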

Fix the message part of bugfix commit `2ef3dcf`.

…ang subproject" (llvm#106267) Reverts llvm#102647 I am reverting this change because the `readfile` doesn't actually perform any useful operation, and yet, for some reason, the test still passed. This indicates that the modification was unnecessary and could lead to confusion or unexpected behavior in the future.

…106013) Fixes llvm#105898.

…lvm#106175) It doesn't make sense to remove the space between new/delete and a C-style cast when SpaceBeforeParensOptions.AfterPlacementOperator is set to false. Fixes llvm#105628.

I am interested in helping contribute macOS binaries since we're generally sporadic with uploading these. Fixes llvm#106016

…ve support. This allows us to rewrite part of StaticLibraryDefinitionGenerator in terms of loadLinkableFile. It's also useful for clients who may not know (either from file extensions or context) whether a given path will be an object file, an archive, or a universal binary. rdar://134638070

…Op pass (llvm#106229) This is currently not controllable by the user and always set to `DIEmissionKind::LineTablesOnly`. The added option allows setting it to the other values accepted by LLVM (`None`, `Full`, and `DebugDirectivesOnly`). --------- Co-authored-by: jingzec <jingzec@nvidia.com>

These methods already returned a uniquely owned object, this just makes them self-documenting.

…)" This reverts commit 3c5ab5a while I investigate bot failures (e.g. https://lab.llvm.org/buildbot/#/builders/163/builds/4286).

Fix the ERROR: UndefinedBehaviorSanitizer, reproduced by BUILDBOT_REVISION=43ffe2eed llvm-zorg/zorg/buildbot/builders/sanitizers/buildbot_bootstrap_ubsan.sh It might be also related to llvm#76202

The list of load.x instructions refers to canFoldIntoAddrMode in D152828. Also support LDRSroX, which was missed in canFoldIntoAddrMode.

…t globals 3/3 (llvm#105648) Fix some false negatives of StackAddrEscapeChecker:
- Output parameters
```
void top(int **out) {
  int local = 42;
  *out = &local; // Noncompliant
}
```
- Indirect global pointers
```
int **global;
void top() {
  int local = 42;
  *global = &local; // Noncompliant
}
```
Note that now StackAddrEscapeChecker produces a diagnostic if a function with an output parameter is analyzed as top-level or as a callee. I took special care to make sure the reports point to the same primary location and, in many cases, feature the same primary message. That is the motivation to modify Core/BugReporter.cpp and Core/ExplodedGraph.cpp. To avoid false positive reports when a global indirect pointer is assigned a local address, invalidated, and then reset, I rely on the fact that the invalidation symbol will be a DerivedSymbol of a ConjuredSymbol that refers to the same memory region. The checker still has a false negative for non-trivial escaping via a returned value. It requires a more sophisticated traversal akin to scanReachableSymbols, which is out of the scope of this change. CPP-4734 --------- This is the last of the 3 stacked PRs, it must not be merged before llvm#105652 and llvm#105653

…lvm#106338)

…vm#106233) Currently, `llvm-cxxfilt` will strip the leading underscore of its input on macOS. Historically MachO symbols were prefixed with an extra underscore and this is why this default exists. However, nowadays, the `ItaniumDemangler` supports all of the following mangling prefixes: `_Z`, `__Z`, `___Z`, `____Z`. So really `llvm-cxxfilt` can simply forward the mangled name to the demangler and let the library decide whether it's a valid encoding. Compiling C++ on macOS nowadays will generate symbols with `_Z` and `___Z` prefixes. So users trying to demangle these symbols will have to know that they need to pass the `-n` option. This routinely catches people off-guard. This patch removes the `-n` default for macOS and allows calling into the `ItaniumDemangler` with all the `_Z` prefixes that the demangler supports (1-4 underscores). rdar://132714940
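The accepted prefixes (one to four underscores followed by `Z`) can be described with a small predicate. This is a hypothetical helper written for illustration; it is not how `ItaniumDemangler` is implemented, and it checks only the prefix, not whether the rest of the name is a valid encoding.

```cpp
#include <cstddef>
#include <string_view>

// Returns true if `name` starts with 1-4 underscores followed by 'Z',
// i.e. one of `_Z`, `__Z`, `___Z`, `____Z`.
inline bool HasItaniumPrefix(std::string_view name) {
  std::size_t underscores = 0;
  while (underscores < name.size() && name[underscores] == '_')
    ++underscores;
  return underscores >= 1 && underscores <= 4 &&
         underscores < name.size() && name[underscores] == 'Z';
}
```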

Alignment and start with an upper-case letter.

S.substr(N) is simpler than S.slice(N, StringRef::npos). Also, substr is probably better recognizable than slice thanks to std::string_view::substr.

…pes for `std::isnormal` (llvm#104773)
## Why
Currently, the following does not work when compiled with clang:
```c++
#include <cmath>

struct ConvertibleToFloat {
  operator float();
};

bool test(ConvertibleToFloat x) {
  return std::isnormal(x);
}
```
See https://godbolt.org/z/5bos8v67T for differences with respect to msvc, gcc or icx. It fails for `float`, `double` and `long double` (all cv-unqualified floating-point types).
## What
Test and provide overloads as expected by the ISO C++ standard. The classification/comparison function `isnormal` is defined since C++11 until C++23 as
```c++
bool isnormal( float num );
bool isnormal( double num );
bool isnormal( long double num );
```
and since C++23 as
```c++
constexpr bool isnormal( /* floating-point-type */ num );
```
for which "the library provides overloads for all cv-unqualified floating-point types as the type of the parameter num". See §28.7.1/1 in the [ISO C++ standard](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/n4950.pdf) or check [cppreference](https://en.cppreference.com/w/cpp/numeric/math/isnormal).

…tor (llvm#103716) With this patch, clang now automatically adds ``[[clang::lifetimebound]]`` to the parameters of the `std::span` and `std::string_view` constructors. This enables Clang to capture more cases where the returned reference outlives the object. Fixes llvm#100567

…n optimizations This patch tries to salvage the debug information for the coroutine frames within optimizations by creating the helper alloca variables with optimizations too. We didn't do this when I implemented it initially. I roughly remember the reason was that we felt the additional helper alloca variables might pessimize performance, which is almost the most important thing under optimizations. But now, it looks like the newly inserted helper alloca variables can be optimized out by the following optimizations. So it looks like it is time to make this available within optimizations. Also, it looks like the following optimizations will convert the generated dbg.declare intrinsic into a dbg.value intrinsic within optimizations. In LLVM's tests, there is a slight regression in that a dbg.declare for the promise object is no longer preserved after this change. But it looks like we won't have a chance to see the dbg.declare for the promise object when we split the coroutine, as that dbg.declare will be converted into a dbg.value at an early stage. So everything looks fine.

…#106240) If a mutex interface is split in inheritance chain, e.g. struct mutex has `unlock` and inherits `lock` from __mutex_base then calls m.lock() and m.unlock() have different "this" targets: m and the __mutex_base of m, which used to confuse the `ActiveCritSections` list. Taking base region canonicalizes the region used to identify a critical section and enables search in ActiveCritSections list regardless of which class the callee is the member of. This likely fixes llvm#104241 CPP-5541

…lvm#105521) This is similar to how the C++ API supports passing `nullptr` to `setPersonalityFn` or `setInitializer`.

…p-reduction.ll. (llvm#102907)

Use ConstantFoldLoadFromConst() instead of a partial re-implementation. This makes the code slightly more generic by not depending on the exact structure of the constant.

[CWG2917](https://cplusplus.github.io/CWG/issues/2917.html) got a new proposed resolution that is different from the one the test has been written against. For [CWG2922](https://cplusplus.github.io/CWG/issues/2922.html), the initial "possible resolution" was apparently approved without changes.

Reverts llvm#105496 This patch breaks: https://lab.llvm.org/buildbot/#/builders/25/builds/1952 https://lab.llvm.org/buildbot/#/builders/52/builds/1775 Somehow output is different with sanitizers. Maybe non-determinism in the code?

We're generally not able to simplify signed pointer comparisons (because we don't have no-wrap flags that would permit it), so we shouldn't pretend that we can in the cost model. The unsigned comparison case is also not modelled correctly, as explained in the added comment. As this is a cost model inaccuracy at worst, I'm leaving it alone for now.

This API is faster than getMinusSCEV() and a SCEVConstant cast.

Just need to add powi test with llvm#105775

…sfinite}` (llvm#106224) ## Why Since llvm#98841 and llvm#98952, the constrained overloads are unused and not needed anymore as we added explicit overloads for all floating point types. I forgot to remove them in the mentioned PRs. ## What Remove them.

…lvm#105665)

In the clang user manual the build options `CLANG_CONFIG_FILE_USER_DIR` and `CLANG_CONFIG_FILE_SYSTEM_DIR` are documented, but the run time overrides `--config-user-dir` and `--config-system-dir` are not. I have updated the manual to add these run time arguments.

WideInc/WideIncExpr can be null. Previously this worked out because the comparison with WideIncExpr would fail. Now we have accesses to WideInc prior to that. Avoid the issue with an explicit check. Fixes llvm#106239.

from PEP8 (https://peps.python.org/pep-0008/#programming-recommendations): > Comparisons to singletons like None should always be done with is or is not, never the equality operators. Co-authored-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>

…lvm#106366) We have type information for them now, so we can do this.

Addresses a regression in JavaScript when formatting anonymous classes. --------- Co-authored-by: Owen Pan <owenpiano@gmail.com>

This patch moves the stepvector intrinsic out of the experimental namespace. The intrinsic has existed in LLVM for several years now and is widely used.

Summary: This test currently fails in the `amdgpu-attributor` pass. I haven't figured out anything beyond that yet as it's difficult to reduce.

This patch implements https://wg21.link/p2609r3. The test code was originally authored by JMazurkiewicz. Notes: - P2609R3 is not officially a Defect Report, but MSVC STL implements it in C++20 mode. Moreover, P2609R3 and P2997R1 touch exactly the same set of concepts, and MSVC STL and libc++ have already treated P2997R1 as a DR. - This patch also adjusted feature-test macros. + In C++20 mode, the value of __cpp_lib_ranges should be `202110L` because - `202202L` covers `range_adaptor_closure` (P2387R3), and - `202207L` covers move-only types in range adaptors (P2494R2). And all of these changes are only available since C++23 mode. + In C++23 mode, the value should be `202406L` because - `202211L` covers removing poison overloads (P2602R2), - `202302L` covers relaxing projected value types (P2609R3), and - `202406L` covers removing requirements on `iter_common_reference_t` (P2997R1). And all of these changes are already or being implemented. Fixes llvm#105253. Co-authored-by: Jakub Mazurkiewicz <mazkuba3@gmail.com>

This allows for easier re-use in additional places in the future. Also move code to VPlanAnalysis.cpp

) This unblocks a ton of work including llvm#76756 as it updates to a newer version of AppleClang.

So we can reuse the logic inside IPSCCP.

…lvm#105670) Not quite NFC, fixes splitBasicBlockBefore case when we split before an instruction with debug records (but without the headBit set, i.e., we are splitting before the instruction but after the debug records that come before it). splitBasicBlockBefore splices the instructions before the split point into a new block. Prior to this patch, the debug records would get shifted up to the front of the spliced instructions (as seen in the modified unittest - I believe the unittest was checking erroneous behaviour). We instead want to leave those debug records at the end of the spliced instructions. The functionality of the deleted `else if` branch is covered by the remaining `if` now that `DestMarker` is set to the trailing marker if `Dest` is `end()`. Previously the "===" markers were sometimes detached, now we always detach them and always reattach them. Note: `deleteTrailingDbgRecords` only "unlinks" the trailing marker from the block, it doesn't delete anything. The trailing marker is still cleaned up properly inside the final `if` body with `DestMarker->eraseFromParent();`. Part 1 of 2 needed for llvm#105571

…ibrary (llvm#96910) We always strive to test libc++ as close as possible to the way we are actually shipping it. This was approximated reasonably well by setting up the minimal driver flags when running the test suite, however we were running the test suite against the library located in the build directory. This patch improves the situation by installing the library (the headers, the built library, modules, etc) into a fake location and then running the test suite against that fake "installation root". This should open the door to getting rid of the temporary copy of the headers we make during the build process, however this is left for a future improvement. Note that this adds quite a bit of verbosity whenever running the test suite because we install the headers beforehand every time. We should be able to override this to silence it, however CMake doesn't currently give us a way to do that, see https://gitlab.kitware.com/cmake/cmake/-/issues/26085.

This patch implements https://wg21.link/P2747R2. The library changes affect direct `operator new` and `operator new[]` calls even when the core language changes are absent. The changes are not available for MS ABI because the `operator new` and `operator new[]` are from VCRuntime's `<vcruntime_new.h>`. A feature request was submitted for that [1]. As a drive-by change, the patch reformatted the whole `new.pass.cpp` and `new_array.pass.cpp` tests. Closes llvm#105427 [1]: https://developercommunity.visualstudio.com/t/constexpr-for-placement-operator-newope/10730304.

Adds a missing test for when the rank of the output tensor doesn't match the input tensor rank + number of blocking factors.

Currently, strftime is called to get the timezone for the ZONE argument. On AIX, this routine requires an environment variable set in order to return the required format. This patch is to add the time difference computation from UTC for the platform.

This patch updates `make_cxx_dr_status` script to use the same spoiler-like way to hide additional details that `cxx_status.html` uses. This gives implemented yet unresolved DRs new but very familiar look: ![s9EpO0E](https://github.com/user-attachments/assets/54852d7b-5fdd-4595-8dca-20628797f952) I also took an opportunity to fix spelling inconsistency pointed out by @zygoloid in llvm#106299 (comment). I got tired of counting `%s`s when we substitute data into HTML template, so I replaced them with an f-string (available since Python 3.6), because I had to touch this code anyway.

* Fix an OOB access
* Add comparison operators
* Add documentation
* Add unit tests

…#106377) (V)PSHUFB only uses the sign bit (for zeroing) and the lower 4 bits (to index per-lane byte 0-15) - so use SimplifyDemandedBits to ignore anything touching the remaining bits. Fixes llvm#106256
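
The demanded-bits argument can be illustrated with a small scalar model of one 16-byte lane. This is only a sketch: `pshufbLane` and `keepDemandedBits` are hypothetical helper names, not LLVM functions, and the model covers just the per-byte behaviour described above (sign bit zeroes the byte, low 4 bits pick the source byte).

```cpp
#include <array>
#include <cstdint>

using Lane = std::array<std::uint8_t, 16>;

// Scalar model of PSHUFB on a single 16-byte lane (hypothetical helper).
// If bit 7 of a control byte is set, the result byte is zero; otherwise
// the low 4 bits select one of the 16 source bytes.
Lane pshufbLane(const Lane &Src, const Lane &Ctrl) {
  Lane Out{};
  for (int I = 0; I < 16; ++I)
    Out[I] = (Ctrl[I] & 0x80) ? 0 : Src[Ctrl[I] & 0x0F];
  return Out;
}

// Clears control bits 4-6, which the model above never reads.
Lane keepDemandedBits(Lane Ctrl) {
  for (auto &B : Ctrl)
    B &= 0x8F;
  return Ctrl;
}
```

Because bits 4-6 of each control byte are never read, `pshufbLane(Src, Ctrl)` and `pshufbLane(Src, keepDemandedBits(Ctrl))` give the same result for any control vector.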

Please refer to the Github issues for details on why those are marked as resolved. Huge thanks to @frederick-vs-ja for the analysis. Closes llvm#104336 Closes llvm#100042 Closes llvm#100615

llvm#106171) - When `getOutliningCandidateInfo()` returns `std::nullopt` (meaning no `OutlinedFunction` is created), there is no need to clear the input argument, `RepeatedSequenceLocs`, as it's already being cleared in the main loop of `findCandidates()`. - Replaced `2` by `MinRepeats`, which I missed from llvm#105398

llvm#106314) We are extracting this function into the C API so we can eventually install it when a user marks a function [[clang::blocking]].

At least for our Windows on Arm machine compiling with clang-cl, it has inverted which variables get a `::` prefix. Would not surprise me if msvc does the opposite so feel free to revert if these tests fail for you.

This is a step towards further breaking up the rather large tryToBuildVPlanWithVPRecipes. It moves the logic to create interleave groups to VPlanTransforms.cpp, where similar replacements for other recipes are defined as well (e.g. EVL-based ones).

If v2i64 scalar_to_vector is made custom, llc can crash in certain legalization cases where v2i64 vectors are injected, even if they weren't otherwise present. The code generated would be fine, but that operation is not handled in ReplaceNodeResults. Add handling.

This generates `warning: REAL(KIND=16) is not an enabled type for this target` if that type is used in a build not correctly configured to support this type. Uses of `selected_real_kind(30)` return -1.

…lvm#106327)

Fixes llvm#105747 --------- Co-authored-by: v01dxyz <v01dxyz@v01d.xyz>

According to the PTX [spec](https://docs.nvidia.com/cuda/parallel-thread-execution/#half-precision-floating-point-instructions-max), max & min instructions do not support the `ftz` modifier for `bf16` & `bf16x2` types. This PR removes them from instr info, and the non-ftz legal versions will be emitted instead.

llvm#105726) Commit 0d527e5 ("GlobalIFunc: Make ifunc respect function address spaces") added support for this within LLVM, but Clang does not properly honour the target's address spaces when creating IFUNCs, crashing with RAUW and verifier assertion failures when compiling C code on a target with a non-zero program address space, so fix this.

Instead of repeating SmallVector size in multiple places.

By adding the equivalent includes.

…rands.

Ignore the multiplication overflow but report the 0 denominator.

…lvm#106421) Fixes 106412. The logic that skips the pass on already-lowered variables doesn't cover the path that increases alignment of variables. If a variable is allocated at 24 and then given 16 byte alignment, the backend notices and fatal-errors on the inconsistency.

The background is as follows. I'm planning to reduce the memory footprint of ThinLTO indexing by changing ImportMapTy, the data structure used for an import list. Once this patch lands, I'm planning to change the type slightly. The new type alias allows us to update the type without touching many places.

) This patch replaces 'tags' in the CSV status pages by inline notes that optionally describe more details about the paper/LWG issue. Tags were not really useful anymore because we have a vastly superior tagging system via Github issues, and keeping the tags up-to-date between CSV files and Github is going to be really challenging. This patch also adds support for encoding custom notes in the CSV files via Github issues. To encode a note in the CSV file, the body (initial description) of a Github issue can be edited to contain the following markers: BEGIN-RST-NOTES text that will be added as a note in the RST END-RST-NOTES Amongst other things, this solves the problem of conveying that a paper has been implemented as a DR, and it gives a unified way to add notes to the status pages from Github.

This is a follow-up for llvm#101461. This is a patch for https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.

Live-ins that are used as exit values don't need to be extracted, they can be passed through directly. This fixes a crash when trying to extract from a live-in. Fixes llvm#106257.

…6424) We need this immediate type to be consistent. This is the pre-commit for llvm#105761

Forgetting to implement an `<Instruction Subclass>::classof()` function does not cause any failures because it falls back to Instruction::classof(). This patch adds an explicit check for all instruction classes to confirm that they have a classof implementation.

Add additional test coverage for interleave groups with different insert positions.

This patch removes curly braces from a test, as lit's internal shell implementation does not support curly brace syntax. Fixes llvm#102382.

…PIC (llvm#106406) `Profile-x86_64 :: Posix/instrprof-dlopen-norpath.test` `FAILs` on Solaris/amd64 and similarly on Solaris/sparcv9: ``` RUN: at line 10: ./a.out 2>&1 | FileCheck compiler-rt/test/profile/Posix/instrprof-dlopen-norpath.test -check-prefix=CHECK-FOO + ./a.out + FileCheck compiler-rt/test/profile/Posix/instrprof-dlopen-norpath.test -check-prefix=CHECK-FOO compiler-rt/test/profile/Posix/instrprof-dlopen-norpath.test:24:12: error: CHECK-FOO: expected string not found in input CHECK-FOO: foo: ^ <stdin>:1:1: note: scanning from here unable to lookup symbol 'foo': ld.so.1: a.out: invalid handle: 0x0 ``` The problem turned out to be two-fold: `OPEN_AND_RUN` didn't check the `dlopen` return value and the objects linked into the shared objects to be `dlopen`ed aren't built as PIC. This patch fixes the latter. Tested on `amd64-pc-solaris2.11`, `sparcv9-sun-solaris2.11`, and `x86_64-pc-linux-gnu`.

… types We currently return costs which are too low for these.

Supported `__usAtomicCAS` builtin originally defined in `/usr/local/cuda/include/crt/sm_70_rt.hpp` --------- Co-authored-by: Denis Gerasimov <Denis.Gerasimov@baikalelectronics.ru> Co-authored-by: Gonzalo Brito Gadeschi <gonzalob@nvidia.com> Co-authored-by: Denis.Gerasimov <dengzmm@gmail.com>

This patch turns ImportListsTy into a class that wraps DenseMap<StringRef, ImportMapTy>. Here is the background. I'm planning to reduce the memory footprint of ThinLTO indexing. Specifically, ImportMapTy, the list of imports for a given destination module, will be a hash set of integer IDs indexing into a deduplication table of pairs (SourceModule, GUID), which is a lot like string interning. I'm planning to put this deduplication table as part of ImportListsTy and have each instance of ImportMapTy hold a reference to the deduplication table. Another reason to wrap the DenseMap is that I need to intercept operator[]() so that I can construct an instance of ImportMapTy with a reference to the deduplication table. Note that the default implementation of operator[]() would default-construct ImportMapTy, which I am going to disable.
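
A rough sketch of the wrapping idea, using standard containers instead of `DenseMap`/`StringRef`. All names here (`DedupTable`, `ImportMap`, `ImportLists`, `getOrCreateID`) are hypothetical stand-ins chosen for illustration; the real types and their members may look quite different.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical deduplication table of (SourceModule, GUID) pairs.
struct DedupTable {
  std::vector<std::pair<std::string, std::uint64_t>> Entries;

  // Returns the ID of the pair, appending it if it is not present yet.
  // Linear search keeps the sketch short; a real table would hash.
  unsigned getOrCreateID(const std::string &Mod, std::uint64_t GUID) {
    for (unsigned I = 0; I < Entries.size(); ++I)
      if (Entries[I].first == Mod && Entries[I].second == GUID)
        return I;
    Entries.emplace_back(Mod, GUID);
    return static_cast<unsigned>(Entries.size() - 1);
  }
};

// Hypothetical import list: a set of integer IDs plus a reference to the
// shared table. It deliberately has no default constructor.
class ImportMap {
  DedupTable &Table;
  std::unordered_set<unsigned> IDs;

public:
  explicit ImportMap(DedupTable &T) : Table(T) {}
  void add(const std::string &SrcMod, std::uint64_t GUID) {
    IDs.insert(Table.getOrCreateID(SrcMod, GUID));
  }
  std::size_t size() const { return IDs.size(); }
};

// Wrapper that owns the table and intercepts operator[] so that new
// entries are constructed with a reference to it.
class ImportLists {
  DedupTable Table;
  std::unordered_map<std::string, ImportMap> Lists;

public:
  ImportLists() = default;
  ImportLists(const ImportLists &) = delete; // entries refer to Table

  ImportMap &operator[](const std::string &DestMod) {
    return Lists.try_emplace(DestMod, Table).first->second;
  }
  std::size_t tableSize() const { return Table.Entries.size(); }
};
```

With this shape, `Lists["dest"].add("src", 42)` works even though `ImportMap` cannot be default-constructed, and the same (module, GUID) pair imported into two destinations is stored only once in the table.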

…mplates (llvm#100692) This makes partial ordering of function templates consistent with other entities, by implementing [temp.deduct.type]p1 in that case. Fixes llvm#18291

The previous `all_equal` implementation contained `Begin + 1`, which implicitly requires `Begin` to model the [random_access_iterator](https://en.cppreference.com/w/cpp/iterator/random_access_iterator) concept due to the usage of the `+` operator. By swapping this out with `std::next`, this method can be used with weaker iterator concepts, such as [forward_iterator](https://en.cppreference.com/w/cpp/iterator/forward_iterator). --------- Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
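
As an illustration, here is a minimal standalone version of the idea. `allEqual` is a hypothetical free function written for this example, not the actual LLVM helper; it only shows why `std::next` is enough for a forward-only container such as `std::forward_list`.

```cpp
#include <algorithm>
#include <forward_list>
#include <iterator>

// Returns true if every element of R compares equal to the first one.
// std::next only needs a forward iterator, unlike `Begin + 1`.
template <typename Range> bool allEqual(const Range &R) {
  auto Begin = std::begin(R);
  auto End = std::end(R);
  return Begin == End || std::equal(std::next(Begin), End, Begin);
}
```

Calling `allEqual(std::forward_list<int>{3, 3, 3})` compiles and returns true, whereas a `Begin + 1` formulation would not compile for this container.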

… type analysis

…ask (llvm#106095) This reverts: d8d8d65

…lvm#105930) Update all hybrid DXIL/SPIRV codegen tests to use temp variable representing interchange target Fixes: llvm#105710

llvm#106426) This PR adds an integration test for an argmax kernel with `mlir-vulkan-runner`. This test exercises the `convert-to-spirv` pass (landed in llvm#95942) and demonstrates that we can use SPIR-V ops as "intrinsics" among higher-level dialects. The support for `index` dialect in `mlir-vulkan-runner` is also added.

…vm#104532) This PR fixes another race condition in llvm#90930. The failure was found by @labath with this log: https://paste.debian.net/hidden/30235a5c/: ``` dotest_wrapper. < 15> send packet: $z0,224505,1#65 ... b-remote.async> < 22> send packet: $vCont;s:p1dcf.1dcf#4c intern-state GDBRemoteClientBase::Lock::Lock sent packet: \x03 b-remote.async> < 818> read packet: $T13thread:p1dcf.1dcf;name:a.out;threads:1dcf,1dd2;jstopinfo:5b7b226e616d65223a22612e6f7574222c22726561736f6e223a227369676e616c222c227369676e616c223a31392c22746964223a373633317d2c7b226e616d65223a22612e6f7574222c22746964223a373633347d5d;thread-pcs:0000000000224505,00007f4e4302119a;00:0000000000000000;01:0000000000000000;02:0100000000000000;03:0000000000000000;04:9084997dfc7f0000;05:a8742a0000000000;06:b084997dfc7f0000;07:6084997dfc7f0000;08:0000000000000000;09:00d7e5424e7f0000;0a:d0d9e5424e7f0000;0b:0202000000000000;0c:80cc290000000000;0d:d8cc1c434e7f0000;0e:2886997dfc7f0000;0f:0100000000000000;10:0545220000000000;11:0602000000000000;12:3300000000000000;13:0000000000000000;14:0000000000000000;15:2b00000000000000;16:80fbe5424e7f0000;17:0000000000000000;18:0000000000000000;19:0000000000000000;reason:signal;#b9 ``` It shows an async interrupt "\x03" was sent immediately after `vCont;s` single step over breakpoint at address `0x224505` (which was disabled before vCont). And the later stop was still at the original PC (0x224505) not moving forward. The investigation shows the failure happens when timeout is short and async interrupt is sent to lldb-server immediately after vCont so ptrace() resumes and then async interrupts debuggee immediately so debuggee does not get a chance to execute and move PC. So it enters stop mode immediately at original PC. `ThreadPlanStepOverBreakpoint` does not expect PC not moving and reports stop at the original place. 
To fix this, the PR prevents `ThreadPlanSingleThreadTimeout` from being created during `ThreadPlanStepOverBreakpoint` by introducing a new `SupportsResumeOthers()` method, for which `ThreadPlanStepOverBreakpoint` returns false. This makes sense because we should never resume other threads while stepping over a breakpoint anyway; otherwise other threads might miss the breakpoint. --------- Co-authored-by: jeffreytan81 <jeffreytan@fb.com>
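
The shape of the fix can be sketched with a toy class hierarchy. This is a simplified model, not the lldb code: the class layout, the `const` qualifier and the helper `shouldCreateSingleThreadTimeout` are assumptions made for illustration; only the method name `SupportsResumeOthers()` and its `false` answer for the step-over-breakpoint plan come from the description above.

```cpp
// Toy stand-in for a thread plan base class (hypothetical layout).
struct ThreadPlan {
  virtual ~ThreadPlan() = default;
  // By default a plan tolerates other threads being resumed.
  virtual bool SupportsResumeOthers() const { return true; }
};

// Stepping over a breakpoint must keep the other threads stopped.
struct StepOverBreakpointPlan : ThreadPlan {
  bool SupportsResumeOthers() const override { return false; }
};

// Hypothetical helper: only create the single-thread timeout plan when
// the current plan allows other threads to be resumed later.
bool shouldCreateSingleThreadTimeout(const ThreadPlan &Current) {
  return Current.SupportsResumeOthers();
}
```

In this model the timeout plan, and therefore the async interrupt it would send, is never set up while a `StepOverBreakpointPlan` is the current plan.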

…106451) Reverts llvm#106404 Breaks: https://lab.llvm.org/buildbot/#/builders/169/builds/2590 https://lab.llvm.org/buildbot/#/builders/164/builds/2454

llvm#106452) Also correct the suffix of the intrinsic

I'm planning to reduce the memory footprint of ThinLTO indexing by changing ImportMapTy. A look-up of the import type will involve data private to ImportMapTy, so it must be done by a member function of ImportMapTy. This patch turns getImportType into a member function so that a subsequent "real" change will just have to update the implementation of the function in place.

Thread init guards are generated for local static variables when using the Microsoft CXX ABI. This ABI is also used for HLSL generation, but DXIL doesn't need the corresponding _Init_thread_header/footer calls and doesn't really have a way to handle them in its output targets. This modifies the language ops when the target is DXIL to exclude this so that they won't be generated and an alternate guardvar method is used that is compatible with the usage. Done to facilitate testing for llvm#89806, but isn't really related

…n-runner`" (llvm#106457) Reverts llvm#106426. This caused failures on nvidia: https://lab.llvm.org/buildbot/#/builders/138/builds/2853

… ! (llvm#105996) Implement constexpr vector unary operators +, -, ~ and ! .
- Follow the current constant interpreter. All of our boolean operations on vector types should be '-1' for the 'truth' type.
- Move the following functions from `Sema` to `ASTContext`, because they are used in the new interpreter.
```C++
QualType GetSignedVectorType(QualType V);
QualType GetSignedSizelessVectorType(QualType V);
```
--------- Signed-off-by: yronglin <yronglin777@gmail.com>

…06438) The executable directives are handled earlier.

…ll. NFC We had RUN lines with +v,+f and +v,+f,+d. +v implies +f and +d so these are equivalent.

…lvm#101795) Using the flag `-split_layout` in llvm-profdata merge, the output profile can write profiles with and without inlined functions into two different extbinary sections (and their FuncOffsetTable too). The section without inlined functions is marked with `SecFlagFlat` and is skipped by ThinLTO because it provides no useful info. The split layout feature was already implemented in SampleProfWriter, but previously there was no way to use it from llvm-profdata.

Fix several uses of formatv() that would be flagged as invalid by an upcoming change that will add additional validation to formatv().

This patch also fixed `CodegenPrepare` to preserve loop metadata when merging blocks. This fixes issue llvm#102632

Flang for Windows depends on compiler-rt, so we need to enable it for the stage1 builds. This also fixes failures building the flang tests on macOS. Fixes llvm#100202.

llvm#105923) …519)" This reverts commit e00d32a and adds a test for lambda arrow SplitPenalty. Fixes llvm#105480.

…ng (Part I) (llvm#96878) This is the SimplifyCFG part of llvm#95515. In this PR, we support hoisting load/store with conditional faulting in `SimplifyCFGOpt::speculativelyExecuteBB` to eliminate conditional branches. This is for cases like
```
void test (int a, int *b) {
  if (a)
    *b = a;
}
```
In the following patches, we will support the hoist in `SimplifyCFGOpt::hoistCommonCodeFromSuccessors`. That is for cases like
```
void test (int a, int *c, int *d) {
  if (a)
    *c = a;
  else
    *d = a;
}
```

Currently, SLP uses shuffle for the external user of `InsertElementInst` and iterates through the `InsertElementInst` chain to fill the mask with constant indices. However, it may override the original Vec lane. Using the original Vec lane is sufficient.

…reprocessor (llvm#106329) Fixes llvm#99617

…the file builder (llvm#106473) This patch addresses a bug where `cs`/`fs` and other segmentation flags were being identified as having a type of `32b` and `64b` for `rflags`. In that case the register value was returning the fail value `0xF...` and this was corrupting some minidumps. Here we just read it as a 64b value and truncate it. In addition to that fix, I added a comparison of the registers from the live process against the loaded core to the generic minidump test. Previously there were only ARM register tests, which explains why this was not detected before.

… dialects on LLVM Dialect and LLVM Core in CMake build (llvm#104832)" (llvm#105703) Reapply the commit 43b5085 with additional fixes for building with BUILD_SHARED_LIBS=ON.

…06483) Close llvm#102721 Generally, the type of merged decls will be reused in ASTContext. But for lambdas, in the import-and-then-include case, we can't determine the previous decl in the imported modules, so we can't assign the previous decl before creating the type for it, since we can't decide its numbering before creating it. So we have to assign the previous decl and the canonical type for it after creating it, which is unusual and slightly hacky.

v[f]slide1down.vx uses VL to determine where the element is inserted into, so changing the VL changes the result. This fixes this by setting ActiveElementsAffectsResult, but it's overly conservative. We should relax this later by modelling that it's ok to change the mask, just not VL. Fixes llvm#106109
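
A scalar model makes the VL dependence concrete. This is a sketch under simplifying assumptions: `vslide1down` here is a hypothetical helper that ignores masking, requires `VL <= Src.size()`, and leaves tail elements undisturbed.

```cpp
#include <cstddef>
#include <vector>

// Scalar model of vslide1down.vx: elements below VL-1 take the value of
// their upper neighbour, and the scalar is written at index VL-1.
// Elements at index >= VL are left as they were (an assumption here).
std::vector<int> vslide1down(const std::vector<int> &Src, int Scalar,
                             std::size_t VL) {
  std::vector<int> Out = Src;
  if (VL == 0)
    return Out;
  for (std::size_t I = 0; I + 1 < VL; ++I)
    Out[I] = Src[I + 1];
  Out[VL - 1] = Scalar;
  return Out;
}
```

For `Src = {1, 2, 3, 4}` and scalar 9, VL=4 yields `{2, 3, 4, 9}` while VL=2 yields `{2, 9, 3, 4}`. Element 1 differs between the two even though it is active in both, so shrinking VL is not a safe transformation for this instruction.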

…vm#106359) The C intrinsic spec is ratified: riscv-non-isa/rvv-intrinsic-doc#234.

…nd `OffsetBins` out of sync (llvm#106187) The implementation of `AAPointerInfo::RangeList::set_difference` doesn't consider the case where two ranges have the same offset but different sizes. This could cause `AccessList` and `OffsetBins` to go out of sync, because a range has already been updated in `AccessList` but is missing from `ToRemove`. I do have a reproducer, but the reproducer itself is 248kb and `llvm-reduce` can't reduce it further. Not sure how I can make a smaller reproducer. Fixes: SWDEV-479757.

llvm#106286) Called workflows don't have access to secrets by default, so we need to explicitly pass secrets that we use.

This is a fix-forward for 8bf69ce. The SCF-to-ControlFlow pass has an explicit LLVMDialect dependency.

…to immediate. This dropped the upper 32 bits of the immediate, but I'm not sure it is ever non-zero.

…104574) As far as I'm aware, vrgather.vv is quadratic in LMUL on most microarchitectures today due to each output register needing to read from each input register in the group. For example, the reciprocal throughput for vrgather.vv on the spacemit-x60 is listed on https://camel-cdr.github.io/rvv-bench-results/bpi_f3 as: LMUL1: 4.0, LMUL2: 16.0, LMUL4: 64.0, LMUL8: 256.1. Vector reverses are commonly emitted by the loop vectorizer and are lowered as vrgather.vvs, but since the loop vectorizer uses LMUL 2 by default they end up being quadratic. The output registers in a reverse only need to read from one input register though, so we can decompose this into LMUL * M1 vrgather.vvs to get linear performance. This gives a 0.43% runtime improvement on 526.blender_r at rva22u64_v O3 on the Banana Pi F3.
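
The decomposition can be modelled on plain vectors. This sketch assumes every register in the group is completely full (no partial VL) and uses hypothetical names (`Reg`, `reverseGroup`); each inner `std::reverse` stands in for one M1-sized gather.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Reg = std::vector<int>; // one M1 register's worth of elements

// Reverses a register group by reversing each register on its own and
// storing it at the mirrored position in the group. Output register R
// reads from exactly one input register, so the work is linear in LMUL.
std::vector<Reg> reverseGroup(const std::vector<Reg> &In) {
  std::vector<Reg> Out(In.size());
  for (std::size_t R = 0; R < In.size(); ++R) {
    Reg Tmp = In[In.size() - 1 - R];
    std::reverse(Tmp.begin(), Tmp.end());
    Out[R] = Tmp;
  }
  return Out;
}
```

For an LMUL=2 group `{{0, 1}, {2, 3}}` this produces `{{3, 2}, {1, 0}}`, which is the same as reversing the flattened sequence `0, 1, 2, 3`.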

…ES=Off. Building with -DLLVM_ENABLE_EXPORTED_SYMBOLS_IN_EXECUTABLES=Off should not prevent use of bugpoint plugins. This fix uses the approach implemented in llvm#101741.

16-bit loads are expanded into a pair of 8-bit loads, so the maximum offset of such 16-bit loads must be 62, not 63.
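
The arithmetic behind the limit is short enough to write down. This sketch assumes that a single 8-bit load accepts displacements 0 through 63, which is what the numbers above imply; `MaxDisp` and `fitsDisplacement` are hypothetical names.

```cpp
// Assumed displacement range of a single 8-bit load: 0..63.
constexpr int MaxDisp = 63;

// An access of AccessBytes bytes at Offset is expanded into byte loads
// at Offset, Offset + 1, ...; the last of them must still be encodable.
constexpr bool fitsDisplacement(int Offset, int AccessBytes) {
  return Offset >= 0 && Offset + (AccessBytes - 1) <= MaxDisp;
}
```

A 16-bit load at offset 63 would need a second byte load at offset 64, which is out of range, so 62 is the largest usable offset.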

IPSCCP can currently return worse results than SCCP for arguments that are tracked interprocedurally, because information from attributes is not used for them. Fix this by intersecting in the attribute information when propagating lattice values from calls.

These tests "just work" on our Windows On Arm machine.

…mittingVTables (llvm#106501) Close llvm#102933 The root cause of the issue is an oversight in llvm#102287 that I didn't notice that PendingEmittingVTables should only accept classes in named modules.

…TABLES=Off. clang-repl should still work when LLVM is built with -DLLVM_ENABLE_EXPORTED_SYMBOLS_IN_EXECUTABLES=Off. This fix uses the approach implemented in llvm#101741. rdar://134910110

Close llvm#72383 The implementation rationale is, I don't want to pass `-fmodules-embed-all-files` all the time since we can't test it in lit tests (we're using `clang_cc1`). So I tried to set it in FrontendActions for modules.

…lvm#106391) This adds the `-mbranch-protection` command line option to the set of flags used by the multilib selection for ARM and AArch64 targets.

Use const reference for loop variable.

…s. (llvm#106289) Thunks themselves are the same as regular ARM64 thunks; they just need to report the correct machine type. When processing the code, we also need to use the current chunk's machine type instead of the global one: we don't want to treat x86_64 thunks as ARM64EC, and we need to report the correct machine type in hybrid binaries.

Update svn to git & virtualenv to venv

This one snuck into the previous patch. The test program needs updating if it's ever going to work on Windows.

Major rewrite of the AMDGPUSplitModule pass in order to better support it long-term. Highlights:
- Removal of the "SML" logging system in favor of just using CL options and LLVM_DEBUG, like any other pass in LLVM.
  - The SML system started from good intentions, but it was too flawed and messy to be of any real use. It was also a real pain to use and made the code more annoying to maintain.
- Graph-based module representation with DOTGraph printing support
  - The graph represents the module accurately, with bidirectional, typed edges between nodes (a node usually represents one function).
  - Nodes are assigned IDs starting from 0, which allows us to represent a set of nodes as a BitVector. This makes comparing 2 sets of nodes to find common dependencies a trivial task. Merging two clusters of nodes together is also really trivial.
- No more defaulting to "P0" for external calls
  - Roots that can reach non-copyable dependencies (such as external calls) are now grouped together in a single "cluster" that can go into any partition.
- No more defaulting to "P0" for indirect calls
- New representation for module splitting proposals that can be graded and compared.
- Graph-search algorithm that can explore multiple branches/assignments for a cluster of functions, up to a maximum depth.
  - With the default max depth of 8, we can create up to 256 propositions to try and find the best one.
  - We can still fall back to a greedy approach upon reaching max depth. That greedy approach uses almost identical heuristics to the previous version of the pass.

All of this gives us a lot of room to experiment with new heuristics or even entirely different splitting strategies if we need to. For instance, the graph representation has room for abstract nodes, e.g. if we need to represent some global variables or external constraints. We could also introduce more edge types to model other types of relations between nodes, etc.
I also designed the graph representation & the splitting strategies to be as fast as possible, and it seems to have paid off. Some quick tests showed that we spend pretty much all of our time in the CloneModule function, with the actual splitting logic being <1% of the runtime.
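
To illustrate why dense node IDs make set operations cheap, here is a sketch that uses a single 64-bit word as the node set. It is only a stand-in for a real bit vector: the 64-node cap, the names `NodeSet`, `withNode`, `common`, `merge` and `maxProposals`, and the assumption that each search level branches two ways are all illustrative choices, not details taken from the pass.

```cpp
#include <cstdint>

// A set of node IDs (0..63) stored as one bit per node.
using NodeSet = std::uint64_t;

// Returns S with node ID added.
constexpr NodeSet withNode(NodeSet S, unsigned ID) {
  return S | (NodeSet{1} << ID);
}

// Common dependencies of two sets: a single AND.
constexpr NodeSet common(NodeSet A, NodeSet B) { return A & B; }

// Merging two clusters: a single OR.
constexpr NodeSet merge(NodeSet A, NodeSet B) { return A | B; }

// With two choices per level, a search of the given depth can produce
// at most 2^Depth proposals (assumed branching factor of two).
constexpr std::uint64_t maxProposals(unsigned Depth) {
  return std::uint64_t{1} << Depth;
}
```

Under that branching assumption, `maxProposals(8)` evaluates to 256, which matches the figure quoted for the default max depth.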

This merges consecutive SME zero intrinsics within a basic block, which avoids the backend eventually emitting multiple zero instructions when it could just use one. Note: This kind of peephole optimization could be implemented in the backend too.
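
A toy version of such a peephole, assuming each zero operation carries a bitmask of the tiles it clears and that two adjacent zero operations can be combined by OR-ing their masks. The `Op` struct and `mergeConsecutiveZeros` are hypothetical; the real transform works on IR intrinsic calls rather than on a struct like this.

```cpp
#include <cstdint>
#include <vector>

// Toy instruction: either a zeroing op with a tile mask, or anything else.
struct Op {
  bool IsZero;
  std::uint32_t TileMask; // only meaningful when IsZero is true
};

// Folds each run of adjacent zeroing ops within a block into one op whose
// mask is the union of the run's masks. Other ops end the current run.
std::vector<Op> mergeConsecutiveZeros(const std::vector<Op> &Block) {
  std::vector<Op> Out;
  for (const Op &O : Block) {
    if (O.IsZero && !Out.empty() && Out.back().IsZero)
      Out.back().TileMask |= O.TileMask;
    else
      Out.push_back(O);
  }
  return Out;
}
```

Zero ops separated by any other operation are left alone in this sketch, which mirrors the "consecutive ... within a basic block" restriction.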

Test didn't have a FileCheck line and is obsolete after llvm#104763

Fixes llvm#106355

Delete the previous files if present, to ensure it won't fail if the output directory of the tests wasn't cleared.

Co-authored-by: Akshat Oke <Akshat.Oke@amd.com>

llvm#105842) There are some spots where all symbols to privatize collected by a `DataSharingProcessor` instance are expected to have corresponding entry block arguments associated regardless of whether delayed privatization was enabled. This can result in compiler crashes if a `DataSharingProcessor` instance created with `useDelayedPrivatization=false` is queried in this way. The solution proposed by this patch is to provide another public method to query specifically delayed privatization symbols, which will either be empty or point to the complete set of symbols to privatize accordingly.

…6498) The spec can be found at riscv-non-isa/riscv-c-api-doc#74. This patch updates the following symbols:
```
mVendorID -> mvendorid
mArchID -> marchid
mImplID -> mimpid
```

…5540) Unfortunately expandIS_FPCLASS is called directly in SelectionDAGBuilder depending on whether IS_FPCLASS is custom or not. This helps avoid ppc test regressions in a future patch where the custom lowering would be bypassed.

…105577) For some reason, isOperationLegalOrCustom is not the same as isOperationLegal || isOperationCustom. Unfortunately, it checks if the type is legal, which makes it useless for custom lowering on non-legal types (which is always ppcf128). Really the DAG builder shouldn't be going to expand this in the builder; it makes it difficult to work with. It's only here to work around the DAG requiring legal integer types the same size as the FP type after type legalization.

…Solver.cpp (llvm#106410) Resolves llvm#106361. Adding #include <unordered_map> to llvm/lib/Support/Z3Solver.cpp fixes compilation errors for homebrew build on macOS with Xcode 14. https://github.com/Homebrew/homebrew-core/actions/runs/10604291631/job/29390993615?pr=181351 shows that this is resolved when the include is patched in (Linux CI failure is due to unrelated timeout).

) In the legacy space, if both the 66 prefix and REX.W=1 are present, the REX.W=1 takes precedence and makes OSIZE=64b. EVEX map 4 inherits this convention, with EVEX.pp=01 and EVEX.W playing the roles of the 66 prefix and REX.W. So if EVEX.pp=00, the OSIZE can only be 64b or 32b, depending on whether EVEX.W=1 or not. But if EVEX.pp=01, then OSIZE is either 64b or 16b depending on whether EVEX.W=1 or not.
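
The rule reads naturally as a small decision function. This is a sketch with a hypothetical name (`operandSizeBits`) that models only the two `pp` encodings discussed above, 00 and 01.

```cpp
// Operand size in bits for EVEX map 4, following the rule above:
// W=1 wins and gives 64 bits; otherwise pp=01 (acting like the 66
// prefix) gives 16 bits and pp=00 gives 32 bits.
constexpr int operandSizeBits(unsigned PP, bool W) {
  if (W)
    return 64;
  return PP == 1 ? 16 : 32;
}
```

So `operandSizeBits(1, true)` is 64 rather than 16, mirroring how REX.W=1 takes precedence over the 66 prefix in the legacy space.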

Some of the tests from X86 directory can be generalized for AArch64 to improve its coverage.

…lvm#105524) Fixes: llvm#104695 This patch adds the is_stmt flag to line table entries for the first instruction with a non-0 line location in each basic block, to ensure that it will be used for stepping even if the last instruction in the previous basic block had the same line number; this is important for cases where the new BB is reachable from BBs other than the preceding block.

…#105833) This patch updates the `omp.parallel` operation according to the results of the discussion in [this RFC](https://discourse.llvm.org/t/rfc-disambiguation-between-loop-and-block-associated-omp-parallelop/79972). It is removed from the set of loop wrapper operations, changing the expected MLIR representation for composite `distribute parallel do/for` into the following:
```mlir
omp.parallel {
  ...
  omp.distribute {
    omp.wsloop {
      omp.loop_nest ... {
        ...
      }
      omp.terminator
    }
    omp.terminator
  }
  ...
  omp.terminator
}
```
MLIR verifiers for operations impacted by this representation change are updated, as well as related tests. The `LoopWrapperInterface` is also updated, since it's no longer representing an optional "role" of an operation but a mandatory set of restrictions instead.

This patch moves the creation of `DataSharingProcessor` instances for loop constructs out of `genOMPDispatch()` and into their corresponding codegen functions. This is a necessary first step to enable a proper handling of privatization on composite constructs. Some tests are updated due to a change of order between clause processing and privatization.

This patch extends llvm#73964 and optimises SVE cvt intrinsics away when predicate is zero.

This patch adds PFT to MLIR lowering support for `distribute parallel do` composite constructs.

This patch adds PFT to MLIR lowering support for `distribute parallel do simd` composite constructs.

…pes. Need to use original cmp type i1 when estimating the cost for the buildvector node, not its operand types to prevent compiler crash upon TTI cost estimation.

Fixes failure on the llvm-clang-aarch64-darwin buildbot: https://lab.llvm.org/buildbot/#/builders/190/builds/4660/ The test mentioned does not rely on any unique property of X86, but does rely on the layout of the basic blocks produced by llc, which varies between targets. Although the test could be duplicated for other targets, it seems unnecessary since the behaviour being tested is not target-specific.

Improve operand analysis using SCEV for cost purposes. This fixes a divergence between legacy and VPlan-based cost-modeling after 533e6bb. Fixes llvm#106248.

…n in BB (llvm#105524)" Reverted (along with the NFC followup fix) due to buildbot failure: https://lab.llvm.org/buildbot/#/builders/160/builds/4142 This reverts commit 3ef37e2, and commit 616f7d3.

The underlying issue was discovered by an assert added in a800533 by a test case provided by @mstorsjo.

If the global variable is constant (but not constexpr), we need to diagnose, but keep evaluating.

… values VPERMILPS uses the lower bits 0-1 (to index per-lane i32/f32 0-3). VPERMILPD uses bit 1 (to index per-lane i64/f64 0-1). Use SimplifyDemandedBits to ignore anything touching the remaining bits. Part of llvm#106413
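
A scalar model of the variable VPERMILPS case on one 128-bit lane, assuming the selector is the low two bits of each 32-bit control element (which is what indexing elements 0-3 requires). `vpermilpsLane` is a hypothetical helper written for this example.

```cpp
#include <array>
#include <cstdint>

// Scalar model of variable VPERMILPS on one lane of four floats: only
// the low two bits of each control element are read.
std::array<float, 4> vpermilpsLane(const std::array<float, 4> &Src,
                                   const std::array<std::uint32_t, 4> &Ctrl) {
  std::array<float, 4> Out{};
  for (int I = 0; I < 4; ++I)
    Out[I] = Src[Ctrl[I] & 3];
  return Out;
}
```

Control values `{0xFFFFFFF3, 4, 0x12, 1}` and `{3, 0, 2, 1}` select the same elements, so anything that only affects the upper 30 bits of the mask can be ignored.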

) When including all targets, some files become too large for the NSIS installer to handle. Fixes llvm#101994

Add Windows include equivalents for includes and shell command.

Link restored from the original policy outlined here https://discourse.llvm.org/t/code-of-conduct-changes-related-to-llvm-project-policy-changes/64197

Currently when `LIBC_COPT_MEMCPY_X86_USE_SOFTWARE_PREFETCHING` is set we prefetch memory for read on the source buffer. This patch adds prefetch for write on the destination buffer.

…it (llvm#106430) We were reporting ambiguous references from using declarations, since the user can depend on different overloads of a function just because they are visible in the TU. This doesn't apply to records or primary templates, as the declaration being referenced in such cases is unambiguous; the ambiguity applies to specializations though. Hence this patch returns an explicit reference to record decls and primary templates of those.

This follows Solaris behavior of allowing both mnemonics all the time. Fixes llvm#105639.

Fix llvm#105571 which demonstrates an end() iterator dereference when performing a non-empty splice to end() from a region that ends at Src::end(). Rather than calling Instruction::adoptDbgRecords from Dest, create a marker (which takes an iterator) and absorbDebugValues onto that. The "absorb" variant doesn't clean up the source marker, which in this case we know is a trailing marker, so we have to do that manually.

…106382) Many tests were easy to update, but these are quite big and I think it's better to autogenerate them to see the difference well.

This requires a bit of restructuring of ctor calls when checking for a potential constant expression.

…/16 vector widths This cleans up the existing tests and shows the gaps in the test checks (for instance we're often testing VF4 + VF16 but not VF8 even though amdlibm supports it).

…r widths test checks This should cover most amdlibm functions, but still not added every VF combo (e.g. 2f32/16f64 often vectorises to the llvm intrinsic for that vector type)

These few worked without changes.

LLVM has a CMake variable to control whether to consider logf128 constant folding which libAnalysis ignores. This patch changes the logf128 check to rely on the global LLVM_HAS_LOGF128 setting made in config-ix.cmake.

…on in BB (llvm#105524)" Fixes the previous buildbot error by adding an explicit triple to the test, ensuring that llc can produce a valid object file. This reverts commit 926f097.

Reverts llvm#102147 It seems some systems which should support F128 are wrongly detected as not supporting. This might be due to checking `LDBL_MANT_DIG` instead of `__LDBL_MANT_DIG__`. I will investigate.


Commits on Aug 27, 2024

Commits on Aug 28, 2024

Commits on Aug 29, 2024

Commits on Sep 24, 2024
