[kitrt] Fixes for numpy extension module #57

Merged
merged 2 commits into from
Sep 24, 2024

Conversation

Collaborator

@jsarrao jsarrao commented Sep 12, 2024

  • Renamed kitrt.c to kitrt.cpp
  • Removed system allocators from extension module
  • Fixed typo in mem_realloc method name for both cuda and hip
  • Fixed signature for enable/disable mem handler

Fixed whitespace errors.
Collaborator

@pmccormick pmccormick left a comment

LGTM.

@pmccormick pmccormick merged commit a34f7c3 into lanl:dev/18.x Sep 24, 2024
2 checks passed
tarunprabhu pushed a commit to tarunprabhu/kitsune that referenced this pull request Oct 1, 2024
* [kitrt] Fixes for numpy extension module

 - Renamed kitrt.c to kitrt.cpp
 - Removed system allocators from extension module
 - Fixed typo in mem_realloc method name for both cuda and hip
 - Fixed signature for enable/disable mem handler

* Unconditionally include kitrt.h.
Fixed whitespace errors.
tarunprabhu pushed a commit to tarunprabhu/kitsune that referenced this pull request Oct 2, 2024
tarunprabhu added a commit to tarunprabhu/kitsune that referenced this pull request Oct 21, 2024
LLVM 19.x. Credit for the work goes to the individuals listed in the commit
messages below.

commit bfd4fc3089ef5d3c0c51b112813e80ec24396294
Author: Tarun Prabhu <tarun@lanl.gov>
Date:   Thu Oct 3 12:52:03 2024 -0600

    Merge with 19.x

commit bc67a96ed1ed18eee5b88679fd54c40fe3a73073
Author: jsarrao <43554622+jsarrao@users.noreply.github.com>
Date:   Tue Sep 24 10:57:15 2024 -0700

    [kitrt] Fixes for numpy extension module (#57)

    * [kitrt] Fixes for numpy extension module

     - Renamed kitrt.c to kitrt.cpp
     - Removed system allocators from extension module
     - Fixed typo in mem_realloc method name for both cuda and hip
     - Fixed signature for enable/disable mem handler

    * Unconditionally include kitrt.h.
    Fixed whitespace errors.

commit ea97153b1b00a1ae3d974c8c6abb429882b1a96d
Author: George Stelle <stelleg@gmail.com>
Date:   Thu Aug 29 10:21:26 2024 -0600

    dev/18.x linking error fixes (#49)

    clang/lib/Frontend has a call to a Tapir << operator, which means it has
    to be linked against TapirOpts.

    There are unguarded calls to Value::dump in HipABI, which is disabled for
    release builds, so I've replaced them with LLVM_DEBUG calls. Note this
    means you don't get that output before an assertion failure in release
    builds. There are other options if that's important.
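The guard described in this commit message can be illustrated with a self-contained sketch. It does not use LLVM; `SKETCH_DEBUG`, `Node`, and `g_debug_enabled` are hypothetical stand-ins that only imitate the idea of a debug statement that is compiled out of release builds, so that it may safely call a helper that exists only in debug builds.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical runtime switch and sink for debug text.
static bool g_debug_enabled = false;
static std::ostringstream g_debug_log;

// With NDEBUG defined the statement disappears entirely, so it can refer to
// helpers that are themselves only declared in debug builds.
#ifdef NDEBUG
#define SKETCH_DEBUG(stmt) do { } while (false)
#else
#define SKETCH_DEBUG(stmt) do { if (g_debug_enabled) { stmt; } } while (false)
#endif

struct Node {
  std::string name;
#ifndef NDEBUG
  // Debug-only helper, similar in spirit to a dump() method.
  void dump() const { g_debug_log << "node: " << name << '\n'; }
#endif
};

void check_node(const Node &n) {
  // An unguarded n.dump() here would fail to compile under NDEBUG.
  SKETCH_DEBUG(n.dump());
  assert(!n.name.empty() && "node must be named");
}
```

The trade-off is the one the message names: in a release build nothing is printed ahead of a failure.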

commit ffdfdfe8a2ad1eb7e7666717790a5d62baca4976
Author: George Stelle <stelleg@gmail.com>
Date:   Thu Aug 29 10:20:22 2024 -0600

    Revert "[kitsune] Fix list of statically linked libraries" (#55)

    Reverts lanl/kitsune#51

commit df767d40267d121829c3c715b119b1889e723a80
Author: Tarun Prabhu <tarunprabhu@gmail.com>
Date:   Thu Aug 29 08:02:06 2024 -0600

    [kitsune] Fix peculiar build-time behavior (#53)

    Because of the way the build system was set up, the targets and kitrt
    would be "installed" at build time i.e. when running ninja/make as
    opposed to ninja install/make install. This fixes that behavior and only
    installs during ninja install/make install. When building, there will be
    messages that suggest that those targets are actually being installed,
    but they are being "installed" to a subdirectory within the build
    directory.

    Co-authored-by: Tarun Prabhu <tarun.prabhu@gmail.com>

commit 3a4350e4e98883d873825dd31b83c3ed25b07634
Author: Tarun Prabhu <tarunprabhu@gmail.com>
Date:   Thu Aug 29 08:00:27 2024 -0600

    [Github] Remove PULL_REQUEST_TEMPLATE since we allow PRs. (#54)

    Co-authored-by: Tarun Prabhu <tarun.prabhu@gmail.com>

commit 1dbbc9236e7c95eedeebe1077a104d110cf89139
Author: Tarun Prabhu <tarunprabhu@gmail.com>
Date:   Thu Aug 29 08:00:01 2024 -0600

    [NFC] Cleanup code (#50)

    Run clang-format on Kitsune-specific files. Remove trailing whitespace
    and excess newlines from elsewhere.

    Co-authored-by: Tarun Prabhu <tarun.prabhu@gmail.com>

commit cde8fc71ec2f425dfbaac8b0328766429b086e2e
Author: Joseph Sarrao <josephsarrao@gmail.com>
Date:   Tue Aug 27 14:18:17 2024 -0600

    [kitsune] Fix list of statically linked libraries

commit 613ee9d2b9466b32b021658dcbf238b9c921f8c1
Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
Date:   Wed Jul 24 13:58:13 2024 -0600

    This is a squash of the commits below

    First cut at some improvements to the numpy module.

    Working on AMD/HIP fixes.

    More woes and struggles with HIP builds in the 18.x refactoring/overhaul...

    Oh the woes...

    Fixing a bad merge.

    More work on HIP integration into 18.x (bitcode search paths, build fixes, etc.).

    Added matmult example from Jaeyoung.

    WIP: Merge with LLVM 18.x

    WIP: Merge with LLVM 18.x and HIP target fixes.

    Oh the woes...

    Fixing a bad merge.

    More tweaks for a bit more stability with HIP enabled... Hopefully...

    Basic experiments working for full suite of cuda+hip+opencilk.

    Tweaks for cmake configure support in the experiments.

    Missed new intrinsics file.

    Fighting darwin specific woes.

    Working on HIP support for 18.x...

            - AMDGPU targets appear to require full feature strings now to be "appropriate"
              (e.g., sramecc and xnack settings must match what is reported on command line
              via rocminfo).
            - squashed a cmake bug
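The sramecc/xnack note above is about feature settings that have to agree with what the device reports. One common spelling for such settings is the target-ID form accepted by clang's `--offload-arch`, for example `gfx90a:sramecc+:xnack-`. The sketch below only assembles that string; `Setting` and `make_target_id` are hypothetical names, and whether Kitsune builds its feature strings this way is an assumption.

```cpp
#include <cassert>
#include <string>

// Tri-state for a target feature: leave unspecified, force on, or force off.
enum class Setting { Any, On, Off };

// Builds "<processor>[:sramecc+|-][:xnack+|-]" with features in
// alphabetical order. Hypothetical helper for illustration only.
std::string make_target_id(const std::string &processor, Setting sramecc,
                           Setting xnack) {
  assert(!processor.empty());
  std::string id = processor;
  auto append = [&id](const char *feature, Setting s) {
    if (s == Setting::Any) return;  // unspecified features are omitted
    id += ':';
    id += feature;
    id += (s == Setting::On) ? '+' : '-';
  };
  append("sramecc", sramecc);
  append("xnack", xnack);
  return id;
}
```

For instance, `make_target_id("gfx90a", Setting::On, Setting::Off)` yields `gfx90a:sramecc+:xnack-`.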

    Just minor clean up.

    Updates to get runtime concepts aligned between hip and cuda (hip now
    matching cuda design wrt stream creation).

    Forgot to clean up some debugging details...

    Some build tweaking for the experiments.

    Missed a function signature update.

    More benchmark/experiment tweaks.

    runtime fixes for new stream model (hip now matching cuda)

    more hip debugging...

    more verbose support for hip runtime.

    Fixing both runtime and code gen issues around hip stream changes.

    debugging hip.

    hip, hip, not hooray

    Trying to get default threads per block value to work.

    chasing performance issues.

    working on more runtime debugging details.

    Hopefully finished the initial/final hip support for 18.x...
              - runtime updated to manage streams better (less overhead, matches cuda design)
              - hipabi changed to reflect runtime interface changes.

    A few more tweaks to the hip runtime details.

       - some code tweaks (deviation from the cuda code) due to missing hip functionality.
       - enabled auto-launch parameter settings as the default (likely far from perfect but
         better than hip provided heuristics on our examples/experiments).

    shame, shame...  missed an include <string> (some compilers happy,
    others not so much...)

    Squashed commit for hip support in dev/18.x.  High-level summary:

       - Fixed some code generation details related to rocm 6.1.2 (e.g.,
         xnack behaviors changed as did ecc details -- basically becoming
         part of the target binary vs. just an attribute).  Apparently,
         this makes binary images incompatible for a given gpu config.

       - hip runtime closer in functionality and design to cuda. This
         includes stream management (simplified) and launch parameter
         determination based on simple multi-proc load determination.
         Launch parameters still need a lot of work in concert with
         what the compiler can provide.

       - Some random bug fixes related to the details above.

       - Tweaked some of the experiment details to match the new
         kitsune build configuration for dev/18.x (llvm 18.x).

    A bit more cleanup and dropped a clang-tidy config file to exempt
    kitsune's runtime from all LLVM code base restrictions.

commit 7d125dff632e43dda3916146f04775e30f92445d
Author: Tarun Prabhu <tarun.prabhu@gmail.com>
Date:   Wed Feb 21 10:40:29 2024 -0700

    This is a squash of the commits below.

    commit 4a9db43abe38ce7a840d3f8ad830a69148af243c
    Author: Tarun Prabhu <tarun.prabhu@gmail.com>
    Date:   Thu Aug 1 17:19:51 2024 -0600

        Fix issues introduced after merge with intersect

    commit f29421607d362fd431ed3fc029cf32afce15a049
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Tue May 7 09:22:58 2024 -0600

        moved the intersect experiment to its own directory.

    commit b1aad99111f3145b8f6e1581ad0776501b31a4a6
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Mon May 6 16:30:46 2024 -0600

        Tweak makefile to match details and remove hard-coded gpu target.

    commit 9c74e29e278039a9ec4724966e1861fc1c9f45c2
    Author: Danny Shevitz <shevitz@lanl.gov>
    Date:   Mon May 6 11:32:30 2024 -0600

             cleaned up intersect

    commit 2fcd7517b0dd3aab1e1972edb60600b79c3f96da
    Author: Danny Shevitz <shevitz@lanl.gov>
    Date:   Thu Apr 11 09:56:01 2024 -0600

        prior to merge uncommented in intersect

    commit f02cc1a94be95a2521af8266616ab7398c228acd
    Author: Danny Shevitz <shevitz@lanl.gov>
    Date:   Fri Mar 29 09:38:53 2024 -0600

        trapping the multi-target cuda stream error

    commit a6e6f766db0c97336015585380eb0e634262c329
    Author: Danny Shevitz <shevitz@lanl.gov>
    Date:   Wed Mar 27 11:10:03 2024 -0600

        intersect is sort of working

    commit efbe047fa1878bed3084af8d7afdaa05f1d57c41
    Author: Danny Shevitz <shevitz@lanl.gov>
    Date:   Wed Mar 6 09:33:09 2024 -0700

        At the moment, no LTO on intersect

    commit f31f2d4c0b4a974a633da1818ca6703727c21592
    Author: Danny Shevitz <shevitz@lanl.gov>
    Date:   Tue Mar 5 10:41:39 2024 -0700

        prior to pulling, trying to get intersect working with LTO

    commit 2d8f485d641ccf04b3d3120a076f0ea534c30caf
    Author: Danny Shevitz <shevitz@lanl.gov>
    Date:   Wed Feb 21 13:30:36 2024 -0700

        modified the kokkos makefile so it finds the patched kokkos and added support for intersect by changing the recognized has kokkos flag

    commit 3e498f87a96b1fe32408cc74d54d54980462b45b
    Author: Danny Shevitz <shevitz@lanl.gov>
    Date:   Mon Feb 12 11:38:34 2024 -0700

        working on make multi-target/intersect build

    commit c0f771099c52a81cd6a41a5e7701fb1e9aa0b6b5
    Author: Danny Shevitz <shevitz@lanl.gov>
    Date:   Thu Feb 1 11:08:32 2024 -0700

        Revert "fixed a typo in the makefiles"

        This reverts commit 0404e2fc5cbcd5ed8bc20a194ed056ef6bd06521.

    commit 227d32de1adbd813d645466b1258bc69be5e5c93
    Author: Danny Shevitz <shevitz@lanl.gov>
    Date:   Thu Feb 1 10:58:43 2024 -0700

        updated cuda.mk

    commit 592a8ec6c79b2ad4f519d24d00cb229dced06b34
    Author: Danny Shevitz <shevitz@lanl.gov>
    Date:   Wed Oct 18 13:39:31 2023 -0600

        fixed a typo in the makefiles

    commit 3588c8d357a2bdff0284724f1c9eea65711cad1d
    Author: Danny Shevitz <shevitz@lanl.gov>
    Date:   Thu Feb 1 09:13:32 2024 -0700

        finished merge with 16.x

    commit 474d653e39d478ed58c66e2e73058861941be3fc
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Fri Apr 12 10:35:38 2024 -0600

        Disabled sync region optimizations (merging) due to issues with
        multi-target code.  While this could have performance implications in
        some situations it is the only way we can avoid errors with mixed
        threaded and GPU code.  More bugs may be lurking.

        Also includes updates to the runtime to deal with exposing GPU streams
        to the calling stack frame for correctly handling continuations.

    commit 35d70b993a1960e7c0399821f8f4ada89f1c25aa
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Thu Apr 4 16:30:40 2024 -0600

        First attempt at fixing stream assignment from the runtime in a manner
        that GPU streams can be better captured (e.g., opencilk continuations)
        and GPU work can be launched and sync'ed by different host threads; this
        addresses a bug (flawed assumption) in the runtime when it comes to
        multi-target support and interoperability.

    commit 9e44d1189672d0c03c5744c873da51dba64b6588
    Author: Patrick McCormick <>
    Date:   Thu Mar 28 16:43:36 2024 -0600

        fix bad context mistake -- relevant to multi-target thread-streams debugging...
        this is a temporary workaround and not a correctness guarantee for behaving
        well when opencilk and cuda targets are intermixed (it most certainly can also
        have performance implications).

    commit b74b28f5847b1d736870d7e6e50e33389518a363
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Tue Mar 26 17:10:34 2024 -0600

        extra verbose mode details on thread-stream creation.

    commit e8d59a79e712d638ae11513bb466a1674653dfe0
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Tue Mar 26 17:04:16 2024 -0600

        Quick thread-stream tweak (warning message update and context-based
        sync fallback).  A few other odds and ends of cleanup.

    commit 6edc88e43175194bcf44b2846b92d2e20a8212ce
    Author: Tarun Prabhu <tarun.prabhu@gmail.com>
    Date:   Thu Aug 1 16:44:25 2024 -0600

        Undo change introduced by cherry picking commit.

    commit 0685a47c9d8a28dd96489203a227dc79b955392b
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Mon Mar 4 14:39:32 2024 -0700

        A bit more rt feedback about libdl and some testing with rpath stuff
        in cmake.

    commit 37dbed14e0c5d04c36e6fc0fdc00d6336308602c
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Thu Feb 29 15:54:08 2024 -0700

        Small fixes build logic (for no profiling) and nvidia cuda compute
        versions at runtime.

    commit 10f30e22788d3e9de75ce8129adab22139c8dd7f
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Feb 21 10:40:29 2024 -0700

        LTO fixes (opencilk bitcode file and auto-link args for tapir opencilk targets).

        Removed pure-kokkos tests as part of the default target set from all the experiments.

        Misc. clean up w/ experiments (e.g., makefiles), added LTO test, etc.

commit 7732266f87f3efa46ce7cdbac5bbeef7a9b9c878
Author: Tarun Prabhu <tarun@lanl.gov>
Date:   Fri Feb 16 14:54:28 2024 -0700

    Merge with LLVM 18.x

commit f33cffebb4c2a85983d23480884802b12034d583
Author: Tarun Prabhu <tarun@lanl.gov>
Date:   Thu Feb 8 15:57:22 2024 -0700

    This is a squash of all Kitsune commits to date. All credit goes to the
    individuals listed in the commit messages below.

    commit 5377f48ce1adf0d5c7fe1e7c65f66c768b8be669
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Feb 14 13:32:36 2024 -0700

        Tapir target tweaks, LTO touch-ups, etc.

    commit dbfc195996db5cc7deb5232cb791cf69f9acb179
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Tue Feb 13 16:57:54 2024 -0700

        Fixes for LTO...

    commit f9094d35d3ce797ef70e7572a9aab621c444f275
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Mon Feb 12 14:31:23 2024 -0700

        Chasing a bug in the LoopSpawning pass...

    commit 4ddb9d13f799ec072d4416d17e7a3779f50bcab4
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Mon Feb 12 11:09:20 2024 -0700

        Runtime tweaks for refactoring launch parameters (for cuda).

        Tweaks to multi-file (LTO) euler3d experiment.

    commit f8ff7c53d0ebd493231ed0161f3aa753b772e8d7
    Author: Patrick McCormick <>
    Date:   Wed Feb 7 16:32:38 2024 -0700

        More launch explorations.

    commit d720cedc84e9e793836bab99e6b53991d4152288
    Author: Patrick McCormick <>
    Date:   Wed Feb 7 10:10:59 2024 -0700

        Tweaks on launch heuristics.

    commit ab61a3c9ca80361077ea69512970dfb4c7f0b3e5
    Author: Patrick McCormick <>
    Date:   Tue Feb 6 13:20:20 2024 -0700

        working on experiments for benchmarking.

    commit f510df149aee441f1b4b5eac5022e58fe79f152b
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Tue Feb 6 13:31:53 2024 -0700

        A bit more verbose output.

    commit 3eec9c0991178346a616f2ddde4a84bc34bda290
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Tue Feb 6 13:12:41 2024 -0700

        Tweaks for launch heuristics (hacks).

    commit 75895a7167f448945291cf4137a0881d15e10272
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Fri Feb 2 08:37:20 2024 -0700

        More launch and compiler related tweaks and tests.  Fix a mistake in
        the error reporting for the runtime's dylib handling...

    commit e8ee550c232c317eace239ad8b211233016af2de
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Thu Feb 1 13:01:21 2024 -0700

        Experimenting with launch details and some nvvm metadata.

    commit 87fb4e4c85e21d914b0c25d5b5d8ad82ee1c1ae2
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Mon Jan 29 11:22:56 2024 -0700

        Tweak to force environment variable to override occupancy-based
        launch parameter settings.

    commit 721f9f9fe0c922b589536187d17e118a30a82266
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Mon Jan 29 09:23:19 2024 -0700

        Tweaks for attribute support (launch parameters) and runtime
        auto-adjustment to launch parameters.

    commit 82c37acdc9e77067254a3c243bda9e354211ced6
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Tue Jan 23 11:56:22 2024 -0700

        Small touch-ups on build details in experiments.

        Still finding some issues with kokkos, latest cuda (13.x), and other
        details (e.g., host compiler).

    commit c65d80ca725fe7ea8fc168277eda148b77565463
    Merge: 1241ae086c7c acc3dfb18799
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Tue Jan 23 11:00:07 2024 -0700

        Merge remote-tracking branch 'origin/multi' into dev/16.x

    commit acc3dfb187998cd8a53eca47c75340f9b22967ff
    Author: Tarun Prabhu <tarun@lanl.gov>
    Date:   Thu May 25 10:35:36 2023 -0600

        A squash of many commits covering a broad scope:

           1. Address some bugs/details/features introduced with the 16.x merge.
              - includes some minor tweaks for 16.x testing but this needs more work.
              - clang's sema probably needs to be revisited and improved.
           2. A significant overhaul of the runtime to support:
              - binding of calling threads to unique (gpu) streams
              - removal of a lot of crufty code that was no longer being used.
              - simplified kernel launch options/interface
              - occupancy-based launch parameters (can cause performance regressions)
              - better environment variable support for tweaking behaviors and
                more flexibility for experimentation, testing, and debugging.
           3. In alignment with #2 portions of the transforms for CUDA and HIP have
              been cleaned up and simplified (in particular kernel launch details are
              much cleaner now).
           4. Some bug fixes for attempts at post-processing code w/out parallel
              constructs.  New "experiment" introduced to catch this as a regression.
           5. Some runtime building blocks for driving prefetch operations.
           6. Some new experiments/test codes.
           7. Fix for nested outlining -- assumed dead-code elimination pass cleanup
              but fails with separate host and gpu code transformation modules.  Had
              to introduce dead-code removal prior to gpu module passes (otherwise, the
              verifier pass fails).
           8. Runtime entry points for numpy allocation (e.g., calloc,
              realloc, etc.).  TODO: Potentially some room here for GPU-side operations to
              improve performance.
           9. Attribute support (e.g., target) for Kokkos 'statements'.
           10. General code cleanup -- removing warnings, unused code, etc.
           11. New support for launch parameter exploration within the experiments code
               base.
           12. Some work on -ffast-math crashes and issues.  TODO: This code needs to be
               further developed (expanded support for double-precision, additional entry
               points, etc.).  There are also some issues here in that what is specified on
               the command line can impact code on the host side but does not have a
               similar match on the GPU side of the code transformation.  TODO: ABI and
               other issues need to be further explored.
           13. Multiple target support within a code base is supported (e.g., run opencilk
               cpu threads and cuda-targeted forall loops).
           14. Fixes around multi-thread entry points within the runtime components.
           15. Testing and feature support for H100; sync'ing CUDA and PTX version info, etc.
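Item 2 in the list above mentions binding calling threads to unique GPU streams. A minimal host-only sketch of that idea follows, assuming each thread lazily claims the next free slot from a shared counter. `StreamId` is a plain integer standing in for a real stream handle, and `stream_for_calling_thread` is a hypothetical name, not the runtime's entry point.

```cpp
#include <atomic>
#include <cassert>

using StreamId = int;  // stand-in for a cudaStream_t or hipStream_t handle

// Next unclaimed stream slot, shared by all threads.
static std::atomic<int> g_next_stream{0};

StreamId stream_for_calling_thread() {
  // Initialized once per thread, on that thread's first call, so repeated
  // calls from the same thread always see the same stream.
  thread_local const StreamId id = g_next_stream.fetch_add(1);
  assert(id >= 0 && "stream counter overflowed");
  return id;
}
```

A real runtime would create the stream at the point where this sketch takes a counter value, and would also need a policy for releasing streams when threads exit; neither is shown here.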

    commit 1241ae086c7c83a3319127661af076169a8a9ca5
    Author: Patrick McCormick <>
    Date:   Fri Dec 8 16:50:13 2023 -0700

        Dealing with some crufty system libraries on Darwin... This will likely break on
        newer installs (e.g., Arch).

    commit c73442567dda8f1cdc1bce39d77cd1f7b5f4b12a
    Author: Patrick McCormick <>
    Date:   Fri Dec 8 15:23:21 2023 -0700

        Missed cleaning up some debug statements in last commit...

        TODO: -ffast-math stuff...

    commit 6d81192c1e0e28df58ee0e4b2d0978706b51aa70
    Author: Patrick McCormick <>
    Date:   Fri Dec 8 15:00:12 2023 -0700

        Some testing on H100.

    commit a7b07c0f13d450db6aec24f5a15801ef5e43aa6b
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Fri Dec 8 14:58:00 2023 -0700

        Cuda runtime tweaks for multi-target and multi-threads.  Likely still extremely
        buggy under duress...

    commit a8bbeebd763b094e0e3d6aec94a341387e1ef969
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Fri Dec 8 12:47:46 2023 -0700

        Quick memory allocation/free mutex for multi-device use cases.
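A mutex around allocation and free, as this commit describes, might look like the following sketch. It assumes one process-wide lock that serializes both operations. `guarded_alloc`, `guarded_free`, and `live_allocations` are hypothetical names, and `std::malloc`/`std::free` stand in for whatever device allocation calls the runtime actually makes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>

static std::mutex g_alloc_mutex;            // serializes alloc and free
static std::size_t g_live_allocations = 0;  // bookkeeping guarded by the mutex

void *guarded_alloc(std::size_t size) {
  std::lock_guard<std::mutex> lock(g_alloc_mutex);
  void *ptr = std::malloc(size);  // stand-in for a device allocation call
  if (ptr != nullptr) ++g_live_allocations;
  return ptr;
}

void guarded_free(void *ptr) {
  if (ptr == nullptr) return;  // freeing null is a no-op, as with std::free
  std::lock_guard<std::mutex> lock(g_alloc_mutex);
  assert(g_live_allocations > 0);
  std::free(ptr);
  --g_live_allocations;
}

std::size_t live_allocations() {
  std::lock_guard<std::mutex> lock(g_alloc_mutex);
  return g_live_allocations;
}
```

A single coarse lock is simple but serializes every caller; finer-grained schemes, such as one lock per device, are possible and not explored here.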

    commit ea7a1b897287b775d3484b5518fdae3a8c360fca
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Tue Dec 5 12:59:32 2023 -0700

        More work on regressions, fast-math mode, hip performance, etc.

    commit 40365ba12a309e9bed09b572d3bd5cfeef5e3f5b
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Tue Dec 5 13:02:21 2023 -0700

        More work on regressions, fast-math mode, hip performance, etc.

    commit 9b75e1182b49417fe33d5deeae1985dae996d126
    Author: Patrick McCormick <>
    Date:   Tue Dec 5 08:52:30 2023 -0700

        Working on some issues surrounding -ffast-math:

          1. ABI conflicts between the host stage and our module offload
             generation (e.g., host side passes generate vectorized code that is
             not supported on GPU backend(s)).

          2. Host architecture-centric tweaks occur before our GPU transform.
             That leads to addressing host architecture specific details as
             part of the transform (e.g., aarch64 and x86_64 will generate
             different calls vs. sticking with llvm intrinsics).

        A combo of ABI issues and/or the fact we're too late in the pass pipeline
        to address this with the current design means more work lies ahead...

    commit 2668be237f4c59e7b33a3e27117324db876e486f
    Author: Patrick McCormick <>
    Date:   Mon Nov 27 15:09:29 2023 -0700

        work on hip performance details.

    commit 243ff1146491b06e29484ac48641d2b3fe48b03c
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Tue Nov 28 12:52:52 2023 -0700

        Testing streams and odd stalls (UVM?).  This version seems to remove
        the stalls but also on a system with a newer kernel drop...  CUDA only
        at this point.

    commit e7d0c0985a446b33e4a84efb1dee3f4067a5e8bc
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Mon Nov 27 15:02:47 2023 -0700

        Working on some runtime tweaks and clean up.  Traced a new crash to
        the use of a ptxas whole-program optimization flag.

    commit 9512eb5fc842ca6582fe2eff885db824a5c6f728
    Author: Patrick McCormick <>
    Date:   Fri Nov 17 08:45:58 2023 -0700

        More work to set up the tests for better HIP and CUDA target flexibility,
        including some reduced complexity in the command line arg details in the
        makefile(s) (e.g., strip mining flags for GPU targets moved into the
        config files vs. being necessary in the makefile setup).

        Added better (correct) AMDGPU target attribute selection based on
        multiple target options (prior version was too hard-coded for gfx90a).

    commit afdb9c2d28d738f95c51107acaccce5c766363f2
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Thu Nov 16 12:59:11 2023 -0700

        A bit more verbose and shared cuda and hip feature management (e.g.,
        streaming modes).

    commit 47979332e0e1487b79b5334d77dfb2705db721f4
    Author: Patrick McCormick <>
    Date:   Thu Nov 16 12:58:38 2023 -0700

        Bug fixes for new prefetch feature set.

    commit 6875ccffb64df68de22272c135379efd8b9a51e3
    Author: Patrick McCormick <>
    Date:   Thu Nov 16 11:05:27 2023 -0700

        More work on HIP performance debugging...

    commit 3ac2d66f2a6009dc286a32c7b8cecf61fa34e96b
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Thu Nov 16 11:01:31 2023 -0700

        First cut at CUDA prefetch streams support.  Needs testing...

    commit 7e3a8a24519f207fd488a821a992ee44bc7d62a1
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Thu Nov 16 08:14:07 2023 -0700

        Some refactoring for HIP details, bug chasing, etc.

    commit 3f1e09e7d2210da432350fdfeb19b4a9c0df3454
    Author: Patrick McCormick <>
    Date:   Wed Nov 15 09:40:07 2023 -0700

        Some hacking for trying to debug AMD HIP code gen/runtime issues.  A few new environment variables to
        make chasing (our tails) easier...

         - KITRT_THREADS_PER_BLOCK=1024 (default 256)
         - KITRT_MAX_NUM_PREFETCH_STREAMS=2 (default 4: size of round-robin stream queue for concurrent prefetch calls)
         - KITRT_DEVICE_ID=5 (default 0: change the default GPU selection)
         - KITRT_MIN_WARPS_PER_EXEC_UNIT=1 (default 1: reducing resource usage per warp -- impacts register allocation, etc.)

        The prefetch stream queue is enabled via the command line with "-mllvm -hipabi-streams".
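The environment variables and defaults listed above could be read along the following lines. This is a sketch under assumptions: the variable names and defaults are taken from the message, while `parse_int_or`, `env_int_or`, `RuntimeConfig`, `load_config_from_env`, and `RoundRobinQueue` are hypothetical names, and falling back to the default on malformed input is this example's choice rather than documented runtime behavior.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>

// Parses a base-10 integer; returns fallback for null, empty, malformed,
// or out-of-range text.
int parse_int_or(const char *text, int fallback) {
  if (text == nullptr || *text == '\0') return fallback;
  errno = 0;
  char *end = nullptr;
  const long value = std::strtol(text, &end, 10);
  if (errno != 0 || *end != '\0' || value < INT_MIN || value > INT_MAX)
    return fallback;
  return static_cast<int>(value);
}

int env_int_or(const char *name, int fallback) {
  assert(name != nullptr);
  return parse_int_or(std::getenv(name), fallback);
}

struct RuntimeConfig {
  int threads_per_block;
  int max_prefetch_streams;
  int device_id;
  int min_warps_per_exec_unit;
};

// Defaults follow the values quoted in the commit message.
RuntimeConfig load_config_from_env() {
  RuntimeConfig cfg;
  cfg.threads_per_block = env_int_or("KITRT_THREADS_PER_BLOCK", 256);
  cfg.max_prefetch_streams = env_int_or("KITRT_MAX_NUM_PREFETCH_STREAMS", 4);
  cfg.device_id = env_int_or("KITRT_DEVICE_ID", 0);
  cfg.min_warps_per_exec_unit = env_int_or("KITRT_MIN_WARPS_PER_EXEC_UNIT", 1);
  return cfg;
}

// Hands out slot indices 0..size-1 in rotation, as a round-robin queue of
// prefetch streams might.
class RoundRobinQueue {
 public:
  explicit RoundRobinQueue(int size) : size_(size > 0 ? size : 1) {}
  int next() {
    const int slot = cursor_;
    cursor_ = (cursor_ + 1) % size_;
    return slot;
  }

 private:
  int size_;
  int cursor_ = 0;
};
```

With `KITRT_MAX_NUM_PREFETCH_STREAMS=2`, a queue built from the loaded configuration would alternate between slots 0 and 1.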

    commit d3f74a006f866895fe326c3d87824712944ebda4
    Author: Patrick McCormick <>
    Date:   Wed Nov 8 20:43:08 2023 -0700

        Some cleanup and work to try and chase down HIP target runtime variability.

    commit c2bb71e9dbee9404c7607cb2682936369bf2c657
    Author: Patrick McCormick <>
    Date:   Thu Nov 2 13:04:11 2023 -0600

        chasing build issues/warnings/errors.

    commit b4bafb426e7fe01ec25132a0df3f240086b3ac0e
    Author: Patrick McCormick <>
    Date:   Thu Nov 2 09:04:25 2023 -0600

        Chasing bugs...

    commit 92997086a8d2fafcd26cead1b7e814565fc3d3e9
    Author: Patrick McCormick <>
    Date:   Tue Oct 31 16:23:02 2023 -0600

        working on benchmarks

    commit 74cb34fc1975487810284cce3fd4eefc347b1a82
    Author: Patrick McCormick <>
    Date:   Wed Oct 25 14:31:56 2023 -0600

        Exploring full kokkos builds w/ clang.

    commit 4dc3221636d7aa8ecacb5ed29c4d0873c7c76ae2
    Author: Patrick McCormick <>
    Date:   Tue Jun 27 14:13:50 2023 -0600

        Some cleanup and small tweaks.

    commit 5136b2416af025d5e1fb38344d87e82ca9d4aa70
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Nov 8 20:27:27 2023 -0700

        Attempt at a quick multi-stream prefetch feature.

    commit 8ece574c8f8aaf4676525bb7368aae5677b5a193
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Thu Nov 2 11:48:12 2023 -0600

        small tweaks to sort out some performance details.

    commit 9b006692706e3eec247b58ee091b8a12c6f1cc7c
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Nov 1 20:38:03 2023 -0600

        Tweak in attempt to debug potential numa issues that are impacting
        consistent performance across multiple application runs.

    commit f5d53a62aa38a6ac3aeaad5556bdbf0d63b08748
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Tue Oct 31 14:46:29 2023 -0600

        A bit more cleanup and adding new tests specific to kitsune.

    commit ff078df1402067b57c3647a57b23820b1726a218
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Tue Oct 31 09:47:10 2023 -0600

        A bit more cleanup and adding some infrastructure for the multi-target
        test code (added makefile and a kokkos version).  Not all the pieces
        are in place to fully test.

    commit d884674842afe05ece0a815318c7275c1b913c93
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Tue Oct 31 08:53:36 2023 -0600

        Clean up some code cruft -- no need to duplicate else branch cases.

    commit 88bc75baea41bc83078d3a7e7dff5f3e4820868e
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Tue Oct 31 08:39:57 2023 -0600

        Forgot to save a cleaned up comment...

    commit bd7941e589309dcfd311a3a0ec5e459128ad54dc
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Mon Oct 30 20:01:09 2023 -0600

        New code to handle tapir attributes on Kokkos "statements".

        Some new code for cuda memory management details (calloc, realloc,
        etc.).  Along with some prep work for upcoming memory management and
        movement changes.

    commit 4f4585aa7f7c3d0c7029023e9504f0fd245b2914
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Oct 25 14:27:27 2023 -0600

        Tweaks for numpy allocation entry points.

    commit ce84cb404073bed4be9f0af1eb4c1106f5300a8c
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Thu Oct 5 09:31:20 2023 -0600

        Small tweaks to remove some unnecessary code.

    commit 6a7d0af3c4abca1a15a6cd90122ad7b0c8831885
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Oct 4 17:10:39 2023 -0600

        Allow cudaabi target to be selected via the environment (for JIT use cases).
        Small tweak to loop spawning code.

    commit 0b9c7f8d0da00cb368b8309092ea6e9b6bb9790c
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Tue Oct 3 09:41:37 2023 -0600

        Bug fix for the logic around invocation of post processing modules
        without parallel constructs.

    commit 5c56cb918429dc38aa225c96b51086c3afe824ac
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Mon Oct 2 13:21:44 2023 -0600

        Some minor runtime tweaks to try and capture a call path for auto
        initialization in cases where we might not have an easy path to global
        ctors.

        Files modified: kitsune/runtime/cuda/cuda.cpp,
        llvm/lib/Transforms/Tapir/CudaABI.cpp,
        llvm/lib/Transforms/Tapir/HipABI.cpp.

    commit 483af046930b981d521a9da7e4bcac2ed1b64721
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Fri Sep 15 12:23:29 2023 -0600

        A better (ABI-independent) path for avoiding calls to
        postProcessModule() on code where parallelism was not
        transformed/discovered during loop spawning.

    commit e0af8c3c8f0533f6612c664a10b942486f4d314d
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Tue Sep 12 09:33:40 2023 -0600

        Bug fix for -ftapir gpu targets when no parallel loops are encountered
        in input module (overly strict assertion replaced with kernel module
        content check prior to starting postprocessing phase).  Added a "no
        forall" chunk of code to the experiments.

        A bit of clean up over the various experiments to keep the overall
        output details identical.

    commit 258a71efb43e8e8e6e739d531d39c2f4362123ba
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Mon Jul 10 12:16:38 2023 -0600

        Some tweaks to the runtime for smarter data movement and cuda/hip
        "hints".

    commit 7d48bb1bd420eb1a6c7ed5604be2429bf2ab0cb6
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Tue Jun 27 14:02:03 2023 -0600

        Prep for new prefetch analysis functionality.

    commit 19df680b1b1bf95eee7f83e202ae24f0a8001cd1
    Author: Patrick McCormick <>
    Date:   Thu Jun 22 12:46:19 2023 -0600

        bug fix due to kernel naming issue.

    commit d143626960ffd1e2d2c42e87f2fca7580ec4d9c4
    Author: Patrick McCormick <>
    Date:   Thu Jun 22 12:45:25 2023 -0600

        Some clean up, testing, and a fix to bring our forall sema up-to-date
        w/ clang 16.x.

    commit 24e4135942d28ae056120a04ba01e1d15f1cdd47
    Author: Patrick McCormick <>
    Date:   Thu Jun 22 12:44:10 2023 -0600

        Tweaks and changes for exploring new targets supported by 16.x...

    commit 8edad4c1b42c06d4fce288866c68451fc8942ed7
    Author: Patrick McCormick <>
    Date:   Thu Jun 22 12:43:09 2023 -0600

        Some clean up, testing, and a fix to bring our forall sema up-to-date
        w/ clang 16.x.

    commit ed62938d3d75731505830649067bf79654a6badc
    Author: Patrick McCormick <>
    Date:   Thu Jun 22 12:41:00 2023 -0600

        Tweaks and changes for exploring new targets supported by 16.x...

    commit fb81e370b8bcbc73c2ed5cd070f04de0ef988ef7
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed May 24 08:59:33 2023 -0600

        Fixes for kernel module optimization levels (failed at -O0).

    commit e4435c01888970e90140247d16e22a91d495c176
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Thu Jun 22 11:36:27 2023 -0600

        only build kitsune-supported experiments by default.

        removed some verbose feedback during compilation.

    commit 3ce803ea5690e076f25c5fe4ab8c51118b5d0ba1
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Thu Jun 22 09:20:14 2023 -0600

        bug fix due to kernel naming issue.

    commit 82dcb7bd4dfe397e440d75643de7c7c89fa60a3a
    Author: Patrick McCormick <>
    Date:   Mon Jun 19 10:58:16 2023 -0600

        Tweaks and changes for exploring new targets supported by 16.x...

        Clean up accidental commit of merge conflicts...

    commit 4ec1a4b86f6a3219bbaa81dc18ff18d635cb1a42
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Jun 21 12:59:31 2023 -0600

        Some clean up, testing, and a fix to bring our forall sema up-to-date
        w/ clang 16.x.  Clean up missed conflicts in source... ????

    commit f958b8a25f432f720b62af50168667b39d25d562
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed May 24 08:59:33 2023 -0600

        Fixes for kernel module optimization levels (failed at -O0).

    commit 1c08014a03d6ccd986cb3444ea5a43cdc812b9d6
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Jun 21 14:10:34 2023 -0600

        start of some docs.

    commit 4d12699dfea25b6e80321912cd84706187445117
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Jun 21 12:59:31 2023 -0600

        Some clean up, testing, and a fix to bring our forall sema up-to-date
        w/ clang 16.x.

    commit 2a916da2885cdc7cec060399847209c492c28458
    Author: Patrick McCormick <>
    Date:   Mon Jun 19 10:58:16 2023 -0600

        Tweaks and changes for exploring new targets supported by 16.x...

    commit 33ae54c3bf4b6d576186b1984d702dd211bb5ff4
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Jun 21 12:59:31 2023 -0600

        Some clean up, testing, and a fix to bring our forall sema up-to-date
        w/ clang 16.x.

    commit 967b4cc21c319fff43413367be52415e73ced84a
    Author: Tarun Prabhu <tarun@lanl.gov>
    Date:   Thu May 25 10:35:36 2023 -0600

        Fixes to get the AArch64 target to build after merge with 16.x.

    commit 6429e61f74e0f67975880834cc1972b603fb52f1
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed May 24 08:59:33 2023 -0600

        Fixes for kernel module optimization levels (failed at -O0).

    commit cb6f5aecf6e827b008bb178845d4a68bd9bd8fe3
    Author: Tarun Prabhu <tarun@lanl.gov>
    Date:   Tue Apr 4 14:22:13 2023 -0600

        Merge with 16.x

        This includes all the work by Pat McCormick <pat@lanl.gov> to add support for
        AMDGPU. This also includes an overhaul of the kitsune/experiments directory
        and everything in it.

        There is still some work to be done - for instance, getting rid of the legacy
        pass manager for everything except backend code generation, but this should
        be in a functional state for now.

    commit 3264b97f507c0fcea3c7157311c7fd4128806903
    Author: Tarun Prabhu <tarun.prabhu@gmail.com>
    Date:   Mon Oct 17 12:03:11 2022 -0600

        Merge with 15.x

    commit 9c15cbc84e6ad99a41cfd20da2f06e3d0a213368
    Author: Alexis Perry-Holby <aperry@lanl.gov>
    Date:   Fri Sep 23 10:01:40 2022 -0600

        more minor build fixes

    commit 7d460b484d781ad943d5786bdaf4d1aafd0fce1c
    Author: Alexis Perry-Holby <aperry@lanl.gov>
    Date:   Thu Sep 22 15:59:10 2022 -0600

        minor build fixes - header file moved

    commit 4f5f1acf32d1e6f0385bd46a798ab67c2d400ae2
    Author: George Stelle <stelleg@lanl.gov>
    Date:   Thu Sep 22 14:21:51 2022 -0600

        14.x fixes

    commit 7c39485fe83abf561a5ce76fdc76da27b9feac82
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Thu Sep 15 08:39:16 2022 -0600

        Avoiding some junk files.

    commit cabb5c7011247e1d681fee2d972ebde5a9968452
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Sep 14 15:42:05 2022 -0600

        Typo fix.

    commit 8e90393e9e688be52811bd914fac7a6854696dfe
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Sep 14 15:41:41 2022 -0600

        Typo fix.

    commit 42576683abb6d21d01936e0133c90115f2c3f117
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Sep 14 15:32:44 2022 -0600

        Verbose mode addition to cmake.

    commit c4d8d446278924d2302dc589450c08af1833ff4a
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Sep 14 13:56:21 2022 -0600

        Updated docs and added missed experiment for the memory access
        attributes.

    commit fe5acc203e1f617040278692983fd603ad748a98
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Sep 14 12:40:38 2022 -0600

        Clean up some and merge in Alexis' memory access attributes.

    commit 1a971bb38f2ea9efa5301e84573cabd2d8f0701b
    Author: Patrick McCormick <>
    Date:   Mon Aug 22 14:04:23 2022 -0600

        Some code cleanup to get running on Darwin for testing.

    commit 713f9c53e732b24779fa2e6d2dc296df55885b66
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Mon Aug 22 09:57:06 2022 -0600

        Fix experiments to account for shuffling of headers in the kitsune
        runtime organization (still not happy with it but at least things
        appear to work now).

        Also fixed a commented out stream sync call in the runtime from the
        stack overflow debug-a-thon...

    commit 8564169120ab16f70170295069b81ac0561d5cd3
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Fri Aug 19 15:51:46 2022 -0600

        Bug fix in runtime (too many modules!).

    commit c8364e5f9724f37c7a36792f234ecb56ea0f9816
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Fri Aug 19 15:34:30 2022 -0600

        Fix build issues.

    commit b24f76c9001ab4ac78e2aa2134c800aa34e73d0a
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Fri Aug 19 14:55:08 2022 -0600

        More cleanup and missed some files on the last commit.

    commit b018ae9432a0912c2c5869bedf37841f95070267
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Fri Aug 19 14:50:46 2022 -0600

        Some continued reorg of the runtime structure.

        Bug fix for stack explosions...

    commit 8b3b526743971c8554c48a6d797984dd38d6d2ba
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Aug 17 12:04:35 2022 -0600

        Hide some generated data files, executables, etc.

    commit b6549fbc71fc014fb6f2217d3d1d30b2073291f4
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Aug 17 11:54:53 2022 -0600

        Restructure runtime source to provide one library (makes life easier in
        the compiler/clang code base too).  Using NVIDIA's HPC SDK seems to trip
        up some aspects of CMake's cuda package (set CUDAToolkit_ROOT to address
        this).

        Continued work on trying to track down GeForce crashes.

        Some new code for tapir target attributes.

    commit b95256c02f553f508874f403bfbf4c2700c86ea8
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Mon Aug 15 14:16:04 2022 -0600

        more work on optimization passes, multi-target code, etc.  testing across
        multiple systems trying to narrow down what appears to be a GeForce-only
        crash in kitsune-generated executables (with long running times).

    commit 1aea89636aa4a51fa7af7ecdc525a1d882ed264d
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Aug 10 14:44:21 2022 -0600

        working through more attribute code details and also fixed a command
        line bug in the cuda target transform that would allow both optimizations
        and debugging to be enabled (this is not currently supported by ptxas).

    commit 562578c2b22afa24aa4aa703b814599bbd9039e8
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Aug 10 11:08:13 2022 -0600

        Some minor tweaks to the runtime trying to find issues related to a large timestep count crash
        in the euler3d experiment.

        Tweaks to update tapir target attribute support to match some new clang features.

    commit aada2f3dd7bcaf1c62f172c25ba1181f06dd47a0
    Author: Patrick McCormick <>
    Date:   Mon Aug 15 11:47:04 2022 -0600

        Testing for issues related to geforce crashes.

    commit 5a2e207206a4b2758a83cca1cea15228b393f715
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Mon Aug 8 10:41:32 2022 -0600

        working to track down cuda crash on large time counts (time steps).
        moving to a "friendly" spot for some debugging help...

    commit 661d514d3b63b3171ae5837d2840a41f5d90e485
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Fri Aug 5 13:18:31 2022 -0600

        Missed the no-view euler3d code.

    commit 2abef21c66dca99ac68c0418f7e00cd057c5cb1c
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Fri Aug 5 13:05:52 2022 -0600

        Fix for bad codegen when forall loop iteration variable type differs
        from runtime type (e.g., trip counts not the same type).  This fix
        adds a cast when necessary...

        tweaks to build a non-view based version of the euler3d experiment.

    commit 923b68e27fdd9a9df4cb655d78e83e1dada36488
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Thu Aug 4 16:32:05 2022 -0600

        Updates for some issues related to performance and new tests/experiments
        for digging further into some UVM performance impacts.

        CUDA target transform changes:
          1. Removed vectorization pass.
          2. Ran additional inlining pass post PTX-prep transformations (as well as
             supporting passes).  More experiments needed here.
          3. Fixed some issues with ordering of PTX-level function renaming and some
             general code cleanup.

        Runtime:
          1. Poking at some additional logic to support "auto" prefetch.
          2. Playing with some runtime hints to the page system; so far they seem
             to have minimal (no?) impact.
          3. Some general code cleanup.

    commit cab0a4c47f803dc6599fc349e768cbef396ce496
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Thu Jul 28 08:54:00 2022 -0600

        Tweaks for clang builds (CUDA and GCC 12.x don't play together so well
        as CUDA headers #define noinline -- quick workaround is to tweak the
        define in CUDA host header file).

    commit 1875b778c50f7216334140899997a2bb59522b4a
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Jul 27 15:20:57 2022 -0600

        Serial version of euler3d and a test data set for running.

    commit 527244f423107d46f98dbe3f00987b7eccf330e4
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Jul 27 15:20:16 2022 -0600

        Kokkos version of the euler3d experiment.

    commit 78bc380bbfea9c5cc19ab54210f33a9f56251eb5
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Jul 27 14:59:51 2022 -0600

        Fixes for the euler3d code and addition of attributes for kitrt allocated memory buffers.

    commit 2ffea09d23373cc54f7a1b7046f70c8c1cd34eca
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Tue Jul 26 08:44:46 2022 -0600

        Fixes for transforming function calls into libdevice names (bug when we had
        multiple occurrences of the same function across kernels).  Also fixed an issue
        where we were not transforming cloned decls for functions that map into
        CUDA's libdevice library (module).

        Code still needs some cleanup, better checking, and some more test drives.
        Pushing this up as it is likely the bugs above will trip up more complex use
        cases.

    commit 3c2257180e1088e2780de0a4853ddefdb3fc27cc
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Thu Jul 21 19:05:21 2022 -0600

        Quick fix for runtime compilation issue.  Still some compiler debug
        messages in place.

    commit a310d8a6782b4f4320315a95e7c63bdbf4e1fc47
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Mon Jul 18 16:43:46 2022 -0600

        Some fixes, simplification of srad benchmark for measuring overall
        runtime (in prep for graph rt code gen and support).  New Rodinia
        benchmark (euler3d) in forall form.

    commit fc898895bc9a5b1d697ba3da93151ef1b45b11d2
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Thu Jul 14 13:05:47 2022 -0600

        Fixes for kokkos dual view code -- was missing some required sync points for correctness.

    commit df4969e51688d2a0dca2d9809c38e4d05ba01d9a
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Thu Jul 14 10:58:03 2022 -0600

        Fix for memcheck error and some cleaner code gen given we want to fix
        the grainsize to 1 for the cuda abi.

    commit 7d049aa91f511454bb3782333f2ef4b4d09859b1
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Jul 13 12:16:35 2022 -0600

        Working on a bug.

    commit e3ade8a3038d165397d370bfbd57b64246d16ff1
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Jul 13 12:16:03 2022 -0600

        Clean up some vscode stuff so it won't stomp on others' settings.

    commit 2bda5f1f99b388f569429cac58db8250d23ad1c8
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Tue Jul 12 10:41:23 2022 -0600

        Still working on the srad benchmark -- trying to determine if there is
        a runtime prefetching impact.

    commit 7f08e2d80a16e0acca21ac532aa7cf86dd00f4cb
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Mon Jul 11 16:36:14 2022 -0600

        Working on the srad benchmark.

    commit db1a00cc91e55f4c52a5edfc47afae3c0865fa0f
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Thu Jun 23 14:15:44 2022 -0600

        More detailed timing reports for the srad example.

    commit a4bf6eefd524d24241d00d3da040d5092ff47759
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Thu Jun 23 14:02:39 2022 -0600

        Fixed a mistaken benchmark name in the readme, working on trying to
        add similar infrastructure for the srad example/experiment.

    commit 51fb636551864c0644b406361834268e747b6b12
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Thu Jun 23 10:34:20 2022 -0600

        Missed the new files.

    commit df62e2f80799748a0fa959d0f9ee3e0c2b559c1d
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Thu Jun 23 10:32:34 2022 -0600

        More updates to experiments/examples for generating and tracking
        results and a bit more documentation.

    commit a00c99a5e07f4e9fad4a46ce4338920a6e9eb811
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Jun 22 16:19:50 2022 -0600

        Cleaned up some of the benchmarking bits with an eye towards CI and
        regression checks on the performance front.

    commit 077d6e9b59242e32a232677c8e73ebae3d434f6c
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Jun 22 08:36:21 2022 -0600

        forall and kokkos versions of the Rodinia srad benchmark.

        TODO: The forall and serial version of the code have similar final
              results but Kokkos looks to be much further off.  Could be a
              missed sync between the host-device dual view data but have
              not tracked it down yet...

    commit 49d6f38c167db5a66c58c06862d999137a99bb38
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Tue Apr 26 16:40:13 2022 -0600

        Overhaul of the CudaABI transform and portions of the Tapir
        infrastructure.  This includes some bug fixes as well as an approach
        for post-processing the transformed code at the module level.

        Handles const global variables (creates device side and issues
        host-to-device memcpy of values -- something you can't do in
        CUDA/Kokkos without using __managed__).  There are restrictions on
        this capability but checks are not yet in place to enforce those
        restrictions.

        Performance appears to be on target w/ previous version (i.e., up to
        2-4x faster than Kokkos at best and on par with both CUDA and Kokkos
        for simpler code bases).  This currently relies on UVM memory
        allocation and prefetch call code generation prior to kernel launches
        (default behavior now).

    commit 27ab5791380f0e147feadc26693946b9ff03da0d
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Fri Apr 22 09:52:56 2022 -0600

        Comment out some debugging code.

    commit 3b13b65416c431163d3d478df71c896bc800a6ef
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Mon Mar 28 08:58:50 2022 -0600

        This is a squash of several updates for the CudaABI transform and the
        supporting kitsune Cuda runtime components.  There are several new
        features and capabilities for driving the transform via the command
        line (-mllvm -cuabi...), changes that should allow for support across
        modules (compilation units), code generation of prefetch via
        coordination between the compiler and runtime, various bug fixes, and
        a few experiments that include hand-coded and tests as the features of
        the transform advanced.

        TODO: More work needs to be done to update the docs and such to
        capture the full scope of the feature set but things seem stable
        enough to unleash the next phase of testing across the team.

    commit 2dec7b5144bab11ab1f510eef96957ea28c83513
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Mon Mar 28 08:58:01 2022 -0600

        Initial set of additions/updates for the new CUDA ABI transform.

    commit 1a808c84a8c690775c23f20b69fc73dc963ca049
    Author: George Stelle <stelleg@gmail.com>
    Date:   Tue Jun 21 16:52:54 2022 -0600

        Updated GPU LTO extern example

    commit ccc264b0a35f927c7ced4876331e97bcccf62def
    Author: George Stelle <stelleg@gmail.com>
    Date:   Tue Jun 21 16:11:38 2022 -0600

        Added GPU abi to opt command line handling

    commit 3b67e8324f3fd7a8072dd2da2cb3426786bf360c
    Author: George Stelle <stelleg@gmail.com>
    Date:   Tue Jun 21 16:09:21 2022 -0600

        Working link time lowering to tapir targets

    commit fd9d42f050b40d3e1154de410f22f9151ed155f5
    Author: George Stelle <stelleg@gmail.com>
    Date:   Tue Jun 21 10:20:19 2022 -0600

        Added externs example

    commit 199b4c07caf4a8065ef44013eeeee6e69840ceac
    Author: George Stelle <stelleg@gmail.com>
    Date:   Wed Jun 15 06:54:51 2022 -0600

        Added GPU abi to argument serialization

    commit dde26a27e37c93b2b0090133e5331f2136d037fe
    Author: George Stelle <stelleg@gmail.com>
    Date:   Wed Jun 8 12:29:16 2022 -0600

        Fixed libopencilk bitcode clang driver check

    commit 47176b0dc178cd871a99e533a384499267a50016
    Author: George Stelle <stelleg@gmail.com>
    Date:   Tue Mar 22 15:11:29 2022 -0600

        Merge pull request #41 from shevitz/rf

        basic Kokkos parallel_for implementation

    commit f3f89ccabaab11a6a95e7a86be91eccc5fea8746
    Author: George Stelle <stelleg@gmail.com>
    Date:   Fri Mar 18 12:48:44 2022 -0600

        [kitsune] 13.x fixes

    commit 176e85e645c1cfe39ce2135cee21754587d37b05
    Author: George Stelle <stelleg@gmail.com>
    Date:   Thu Feb 3 10:05:20 2022 -0700

        Added metadata copying for GPU kernels

    commit a29ff78aadd554e0283f95ec2986b48db5fa4149
    Author: George Stelle <stelleg@gmail.com>
    Date:   Mon Jan 10 12:27:00 2022 -0700

        Added always_inline attribute to gpu kernel callees

    commit 20448a029970b4409d66a2b5d6ec1958c462c572
    Author: George Stelle <stelleg@gmail.com>
    Date:   Mon Jan 10 11:27:44 2022 -0700

        Add stripmine-loops check to be able to disable stripmining

    commit 2c05bd12e3c994a6ebb45c15392c1dca8ed59558
    Author: Danny Shevitz <shevitz@lanl.gov>
    Date:   Fri Jan 7 14:12:37 2022 -0700

        changed the <KITSUNE> tags to descriptive comments.

    commit ca0aa787b8e711e8f5b5fb08d735aed56b4a064f
    Author: Danny Shevitz <shevitz@lanl.gov>
    Date:   Thu Jan 6 14:55:19 2022 -0700

        refactored forall range loops to use the same helper functions as forall

    commit c43e55777110718c6dfa48a31c6b3d6af8405dba
    Author: Danny Shevitz <shevitz@lanl.gov>
    Date:   Wed Jan 5 17:48:39 2022 -0700

        changed forall EmitIVLoad to shallow copy form for structs

    commit eda8520c5118a700d219cd7f77def6bb09066cb3
    Author: Danny Shevitz <shevitz@lanl.gov>
    Date:   Tue Jan 4 11:02:43 2022 -0700

        fixed the loop end bug in range based for loops

    commit 1e7a144da050c7c4df88624ad51e6826293cb986
    Author: Danny Shevitz <shevitz@lanl.gov>
    Date:   Thu Dec 2 12:30:32 2021 -0700

        unfortunately found a race in forall range, so committing before checking out master

    commit 1ccbfa922f02f4c0a01d916b413afd532f78d4e1
    Author: Danny Shevitz <shevitz@lanl.gov>
    Date:   Mon Nov 22 17:17:03 2021 -0700

        Removed the Continue JumpDest in favor of explicit Condition and Increment JumpDest's

    commit 5c6dbab3558b221f126f1e6ee86afcf53eebdda4
    Author: Danny Shevitz <shevitz@lanl.gov>
    Date:   Wed Nov 10 11:56:21 2021 -0700

        forall refactor is working, and the code is mostly cleaned up

    commit 5e99b03af7275aefb2769ed4a8f584705eb0ebcf
    Author: Danny Shevitz <shevitz@lanl.gov>
    Date:   Mon Oct 11 08:52:43 2021 -0600

        minor changes to pull pat's stuff

    commit 6897db57bc6076fed03d35ffda601d47e48e7714
    Author: Danny Shevitz <shevitz@lanl.gov>
    Date:   Thu Sep 16 11:54:05 2021 -0600

        tried to refactor forall, but it's buggy and I need to checkout release/10.x to see what's going on

    commit a9d6d8ba9291b9eae0cbe0667d7cff42a46af62b
    Author: Danny Shevitz <shevitz@lanl.gov>
    Date:   Wed Sep 8 15:26:28 2021 -0600

        first commit on new refactor branch

    commit 9df375cca94e4b4c118fde0399ff47c3f81a73d8
    Author: Danny Shevitz <shevitz@lanl.gov>
    Date:   Mon Nov 22 10:45:51 2021 -0700

        changed kitsune-dev.cmake

    commit db037f8d15eadec12446b9c425da6f88f856b712
    Author: George Stelle <stelleg@lanl.gov>
    Date:   Tue Dec 21 08:58:06 2021 -0700

        Fixed GPU codegen

    commit 001043d4067715257f6d7d8e301c9c865373cb5b
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Thu Nov 18 08:59:54 2021 -0700

        Adding back in some tidbits that now appear to work correctly with the
        issues addressed in the previous commit/push.

    commit 66a3a2bd7b5b9c578de73bebd1e90f4e92177ef6
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Nov 17 13:54:55 2021 -0700

        A few quick fixes for clang driver code for config file support
        that was lost in the merge with 11/12.x.

        A few related cmake tweaks that are potentially still a bit buggy
        but working better now with some addition for pulling from cilk v1.1
        repos.

    commit f35828658a26be6e4f4ebb1071c97c92553bb20e
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Tue Nov 16 11:34:01 2021 -0700

        Fix cmake fetch stuff -- looks like we lost the configuration and
        include lines as part of the merge with upstream.

    commit 099cab25cafcc0cd7a6d78b474691b2ca7f88fc3
    Author: George Stelle <stelleg@gmail.com>
    Date:   Tue Nov 2 12:55:25 2021 -0600

        Removed github additions

    commit 7577da9dc7d43bc6ac585428fad652e7ef33b78b
    Author: George Stelle <stelleg@gmail.com>
    Date:   Tue Nov 2 12:38:16 2021 -0600

        Added c++ condition to extern

    commit a137c5c2deb40060fc23d20e031faa89c2a06550
    Author: George Stelle <stelleg@gmail.com>
    Date:   Tue Nov 2 12:23:37 2021 -0600

        Tapir abi 12.x merge fixes

    commit 65a36adfaa9f7a3e97b517dc93214abf3be7d1b6
    Author: George Stelle <stelleg@lanl.gov>
    Date:   Wed Oct 13 10:06:35 2021 -0600

        Added bitwriter lib dependency for Tapir GPU backend

    commit 90ab9e61d257f5e3c2d1aaf90db0d520b5f99514
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Thu Oct 7 16:09:28 2021 -0600

        Added cilktools.

    commit 6fe60d06e69e58b2696420b783663e29c30a75b5
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Sep 8 15:00:40 2021 -0600

        Merged changes from George and moved new GPU library into location
        for the new build system.

    commit 6f987d1478b16e7bf088c8c86c53bbddb60736c3
    Author: George Stelle <stelleg@lanl.gov>
    Date:   Wed Sep 8 14:04:02 2021 -0600

        Added initial gpu runtime

    commit 7257f5bae9191b7ffd8c38ce1c1aaa605389fd6f
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Sep 8 14:02:44 2021 -0600

        prep to merge w/ an update from George.

    commit c04f21ebf5dd880870ec0dd097a9c11e11db279a
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Sep 8 13:50:47 2021 -0600

        New unified runtime abi for GPUs.

    commit 3d682af1c0744fcaa1d2522d3228760111528fd2
    Author: George Stelle <stelleg@gmail.com>
    Date:   Fri Sep 3 12:47:47 2021 -0600

        Codegen looks good

    commit 19227307eebdb27537e4b5fb95ad59ca906f67e3
    Author: George Stelle <stelleg@gmail.com>
    Date:   Wed Sep 1 11:12:53 2021 -0600

        Handling kernel arguments

    commit e5ee38f76c163c4467c1f6c4ed31b272a93c5fc1
    Author: George Stelle <stelleg@gmail.com>
    Date:   Tue Aug 31 15:20:00 2021 -0600

        Initial GPUABI commit

    commit 3014ee0448cf47963a83d5770a6e14764e1f664e
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Tue Sep 7 10:47:28 2021 -0600

        A few last-minute fixes before release:

           1. -fkokkos mode forces -fno-exceptions to avoid code gen issues
              around parallelism code gen and internal exception mechanisms.
           2. following #1, added a patch to #ifdef out exceptions (try and
              catch blocks) in Kokkos memory spaces.
           3. fixed a bug in the realm ABI that only showed up for some
              examples -- would crash the compiler via an assertion on the
              types of a binary operator.
           4. tweaked some of the kokkos examples to actually behave like a
              good kokkos program should...

    commit e63a4d5bcfe979e2832b7f970540671dc3c7517a
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Tue Sep 7 10:25:17 2021 -0600

        Fixes for cuda abi.

    commit 668ceed82908c56ea6e390feafcdbf2998642b8d
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Sep 1 16:58:46 2021 -0600

        Playing with kernel approaches.

    commit 25f3dbf5d2e83f3bf1a46806ccd413a800d655c3
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Wed Sep 1 16:12:31 2021 -0600

        Some clean up to remove the cudakit target (towards unification across
        the toolchain between tapir, kitsune, opencilk, etc.).

        New cuda abi runtime target code to simplify code gen -- needs to
        merge design ideas with George's current multi-architecture approach.

    commit 8f21b471808e468a5a00356d8883bd647ac2b9c4
    Author: Patrick McCormick <pat@darwin-fe3.lanl.gov>
    Date:   Thu Aug 26 13:21:46 2021 -0600

        Yet another attempt to address build issues that are clearly a race condition
        on build order that can vary with parallel job size as well as platform (e.g.,
        builds on faster systems are more likely to fail).  This hopefully stems from
        a dependency issue but could also be impacted by bugs in either ninja or cmake.
        Would suggest updating to use the latest ninja (1.10.2) and cmake -- both were
        used on the final pass of testing before this commit.

    commit 16eef860f853122f748c88ff00692deab00cdc93
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Mon Aug 23 08:59:41 2021 -0600

        Fixed bug for kitsune.h and install target.

    commit 1ba0104908dbb2b7c89ba69dec29dc43a4af9a3f
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Mon Aug 16 12:30:52 2021 -0600

        More work to sort through adding the Realm ABI target into the build
        system w/ realm as a FetchContent component.  This means some dependency
        and target work within cmake, some ordering details about where the
        FetchContent occurs, and also some fixes to the code base that were lost
        for realm support (e.g., missing switch entries).

        There is some point where the overall cmake config becomes more difficult
        to reason about than the llvm+clang+... source code.  ;-)

        A few additional fixes for the config file (.cfg) settings and supporting
        cmake pieces.  The default config files are autogenerated to handle both
        the in-tree and install use cases.  It is probably worth tailoring them for
        your own use cases at some point by adding them to your user directory.

        TODO: there is currently a "collision" between the use of a cmake option
        and the ABI libraries to build within kitsune.  At present you will have
        to both enable and add the abi library to the configuration (see the cache
        file under kitsune/cmake/caches/kitsune-dev.cmake).

    commit fbde04b2b248b84d06bac8e624916f397dd3764b
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Mon Aug 16 11:16:16 2021 -0600

        Fixed merge to support LD_LIBRARY_PATH search for the opencilk bitcode file. A change from TB post-1.0.

    commit 171fd764cdf4743a24da18c47007e280e1412151
    Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com>
    Date:   Mon Aug 16 10:05:52 2021 -0600

        Tweaks to address some issues with the -fkokkos option and the addition of
        command line arguments to align with the full toolchain build. One issue
        is that link libraries that appe…
2 participants