diff --git a/BUILDING.md b/BUILDING.md index 148c968..2013061 100644 --- a/BUILDING.md +++ b/BUILDING.md @@ -1,40 +1,4 @@ -UMT requires CMake to build. It is recommended that your are familiar with CMake, in order to enable/disable the supported build options for UMT. An example shell script 'build_and_run_umt.sh' is included in the UMT that will compile UMT, and its required libraries, on a typical Linux distribution. The Linux distribution must have an MPI installation available. - -A list of common CMake options for UMT, and their default values, are: - -Choose the type of build, options are: None Debug Release RelWithDebInfo MinSizeRel -**CMAKE_BUILD_TYPE:STRING=** - -Link and enable the use of the optional CALIPER performance measurement library. -**ENABLE_CALIPER:BOOL=OFF** - -Enable compiling CUDA code -**ENABLE_CUDA:BOOL=OFF** - -Enables CMake's Find MPI support (Turn off when compiling with the mpi wrapper directly) -**ENABLE_FIND_MPI:BOOL=ON** - -Enable OpenMP pragmas -**ENABLE_OPENMP:BOOL=OFF** - -Enable OpenMP target offload pragmas -**ENABLE_OPENMP_OFFLOAD:BOOL=OFF** - -Link and enable the use of the optional UMPIRE memory library. Umpire provides memory pools on both CPU and GPU devices, and CPU memory pools in page-locked memory. -**ENABLE_UMPIRE:BOOL=OFF** - -Compiler OpenMP runtime implementation includes Fortran interface -**OPENMP_HAS_FORTRAN_INTERFACE:BOOL=OFF** - -Compiler OpenMP implementation includes use_device_addr pragma. This is usually true for OpenMP 5.x implementations, but not for OpenMP 4.x. -**OPENMP_HAS_USE_DEVICE_ADDR:BOOL=OFF** - -Use preprocessor macros that strictly conform to FPP. -**STRICT_FPP_MODE:BOOL=OFF** - - - - +UMT requires CMake to build. It is recommended that your are familiar with CMake, in order to enable/disable the supported build options for UMT. An example shell script is included in the UMT that will compile UMT, and its required libraries, on a typical Linux distribution. The Linux distribution must have an MPI installation available. UMT also provides a Spack package and can be optionally built using that package manager. For more information on Spack see https://github.com/spack/spack. diff --git a/DEPENDENCIES.md b/DEPENDENCIES.md index f955b07..b007d37 100644 --- a/DEPENDENCIES.md +++ b/DEPENDENCIES.md @@ -2,32 +2,8 @@ System libraries ----------------------- UMT Requires MPI to be installed on your system to provide a C++ and Fortran compiler. -To build UMT with LLVM Clang/Flang, you must also use them to build MPI so that -Fortran mod files are compatible. For example, you might build OpenMPI as -follows: - -``` -$ CC=clang CXX=clang++ FC=flang-new \ - CFLAGS=-O3 CXXFLAGS=-O3 FCFLAGS=-O3 \ - ../configure --prefix=$PWD/../install-flang \ - --without-knem --without-ucx --enable-pretty-print-stacktrace \ - --enable-orterun-prefix-by-default --enable-mpi1-compatibility -$ make -j -$ make install -``` - -Reference: - Third party libraries ----------------------- -The code depends on several libraries. - Required libraries are: - Conduit, a io interchange library https://github.com/LLNL/conduit - -Optional libraries: -- MFEM, a finite element methods library - https://github.com/mfem/mfem - -MFEM requires the additional libraries METIS and HYPRE. See the MFEM github website for more information. diff --git a/README.md b/README.md index 13bf26e..0e38787 100644 --- a/README.md +++ b/README.md @@ -39,8 +39,7 @@ UMT can run different test problems, depending on how it was configured and buil Blueprint test problem ------------------------ UMT can dynamically generate an unstructured 2D or 3D mesh when built against Conduit -version 0.8.9 or later, or their develop branch as of 1/1/2024 or later. This mode is -selected using the -B command line argument. +version 0.8.9 or later. This mode is selected using the -B command line argument. Each MPI rank in UMT will construct a mesh domain by repeating a Blueprint tile topology into a larger, repeated pattern arranged as a square tiling of the tile topology. By default, UMT will use the tile pattern that is built into Conduit, although other @@ -71,7 +70,7 @@ high aspect ratio domains, then adjust the number of tiles in x,y,z using the `- command line argument. To run this problem: -1. Build UMT (MFEM is not needed) +1. Build UMT 2. Run the test driver in 3D. In the below example, the problem will run for 10 cycles. The `-d 10,10,10` argument will create a mesh domain with 10x10x10 = 1000 tiles, resulting in ~24000 zones/domain. Overall, there would be 8 times that number of zones due to running on 8 MPI ranks. The domains will be arranged in 2x2x2 layout. Run 'test_driver -h' for more info on the arguments. ``` srun -n 8 /path/to/install/bin/test_driver -c 10 -B -d 10,10,10 @@ -81,23 +80,6 @@ srun -n 8 /path/to/install/bin/test_driver -c 10 -B -d 10,10,10 srun -n 4 /path/to/install/bin/test_driver -c 10 -B -d 10,10,0 ``` -MFEM test problem -------------------- -UMT includes an unstructured mesh 3d test problem using a [MFEM](https://mfem.org/) mesh -This mesh can be refined using MFEM at run time to provide larger problems. This problem -requires UMT to have been built with MFEM support. - -To run this problem: -1. Build UMT. This will produce a test_driver and makeUnstructuredBox executable. -2. Run the makeUnstructuredBox to produce the 3d test mesh. -``` -srun -n1 /path/to/install/bin/makeUnstructuredBox -``` -3. Run the test driver. In the below example the problem will run for 10 cycles and the mesh will be refined. Run 'test_driver -h' for more info on the arguments. -``` -srun -n2 /path/to/install/bin/test_driver -i ./unstructBox3D.mesh -c 10 -r 1 -R 6 -``` - References ============== * Nemanic, M K, and Nowak, P. "Radiation transport calculations on unstructured diff --git a/build_and_run_umt.sh b/build_and_run_umt.sh index e81e98b..473051c 100755 --- a/build_and_run_umt.sh +++ b/build_and_run_umt.sh @@ -1,9 +1,4 @@ -#!/bin/bash -xe -# This script will compile a basic Release build of UMT. Additional CMake options can be added to the command line args of this -# script, and they will be picked up and added to the UMT CMake command at the bottom of this script. -# For a list of supported CMake options, run 'ccmake /path/to/umt/src'. -# Do not copy this script out of the UMT repo directory, it assumes it is located next to the UMT source files in order to work. - +#!/bin/sh -x # If you have a UMT tarball, untar it. Otherwise, git clone it from github. # git clone https://github.com/LLNL/UMT.git @@ -17,26 +12,17 @@ CC=gcc CXX=g++ FC=gfortran -# Set to 1 to optionally build UMT with UMPIRE support. -# For more information on UMPIRE CMake options, please see: -# https://umpire.readthedocs.io/en/develop/sphinx/advanced_configuration.html -USE_UMPIRE=0 - FFLAGS=-fallow-argument-mismatch + # Intel example -#CC=icx -#CXX=icpx -#FC=ifx +# CC=icx +# CXX=icpx +# FC=ifx + -# LLVM Clang/Flang build -# -# See DEPENDENCIES.md for a compatible MPI. -# -# So far this works only for single-threaded CPU execution. -#CC=clang -#CXX=clang++ -#FC=flang-new -#FFLAGS=-flang-experimental-polymorphism +# This script will compile a basic Release build of UMT. Additional CMake options can be added to the command line args of this script, and they will be picked up and added to the UMT CMake command at the bottom of this script. +# For a list of supported CMake options, run 'ccmake /path/to/umt/src'. +# Do not copy this script out of the UMT repo directory, it assumes it is located next to the UMT source files in order to work. # Get directory this script is located in. This is assumed to be the UMT repo location. SOURCE="${BASH_SOURCE[0]}" @@ -57,34 +43,16 @@ cd umt_workspace git clone --recurse-submodules https://github.com/LLNL/conduit.git conduit -b v0.9.0 mkdir build_conduit cd build_conduit -cmake ${PWD}/../conduit/src -DCMAKE_Fortran_FLAGS="${FFLAGS}" -DCMAKE_INSTALL_PREFIX=${INSTALL_PATH} -DCMAKE_C_COMPILER=${CC} -DCMAKE_CXX_COMPILER=${CXX} -DCMAKE_Fortran_COMPILER=${FC} -DMPI_CXX_COMPILER=mpicxx -DMPI_Fortran_COMPILER=mpifort -DBUILD_SHARED_LIBS=OFF -DENABLE_TESTS=OFF -DENABLE_EXAMPLES=OFF -DENABLE_DOCS=OFF -DENABLE_FORTRAN=ON -DENABLE_MPI=ON -DENABLE_PYTHON=OFF +cmake ${PWD}/../conduit/src -DCMAKE_INSTALL_PREFIX=${INSTALL_PATH} -DCMAKE_C_COMPILER=${CC} -DCMAKE_CXX_COMPILER=${CXX} -DCMAKE_Fortran_COMPILER=${FC} -DMPI_CXX_COMPILER=mpicxx -DMPI_Fortran_COMPILER=mpifort -DBUILD_SHARED_LIBS=OFF -DENABLE_TESTS=OFF -DENABLE_EXAMPLES=OFF -DENABLE_DOCS=OFF -DENABLE_FORTRAN=ON -DENABLE_MPI=ON -DENABLE_PYTHON=OFF gmake -j install cd .. -UMPIRE_CMAKE_ARGS= -UMPIRE_RUNLINE_ARGS= - -if [ $USE_UMPIRE -eq 1 ]; then - echo "Enabling UMPIRE support" - # If building Umpire, enable it in the UMT CMake and provide the path to the installation. - UMPIRE_CMAKE_ARGS="-DENABLE_UMPIRE=TRUE -DUMPIRE_ROOT=${INSTALL_PATH}" - # If building Umpire, add the '-u 1' command line arg to the UMT test driver run line. - # This will tell it to use an Umpire CPU memory pool, or if a GPU run, to use a CPU pinned memory pool. - UMPIRE_RUNLINE_ARGS="-u 1" - git clone --recurse-submodules https://github.com/LLNL/Umpire.git -b v2023.06.0 - - mkdir build_umpire - cd build_umpire - cmake ${PWD}/../Umpire -DCMAKE_INSTALL_PREFIX=${INSTALL_PATH} -DCMAKE_C_COMPILER=${CC} -DCMAKE_CXX_COMPILER=${CXX} -DCMAKE_Fortran_COMPILER=${FC} -DMPI_CXX_COMPILER=mpicxx -DMPI_Fortran_COMPILER=mpifort -DBUILD_SHARED_LIBS=OFF -DENABLE_TESTS=OFF -DENABLE_EXAMPLES=OFF -DENABLE_DOCS=OFF -DENABLE_FORTRAN=ON -DENABLE_MPI=ON - gmake -j install - cd .. -fi - # Run CMake on UMT, compile, and install. -cmake ${UMT_REPO_PATH}/src -DCMAKE_Fortran_FLAGS=${FFLAGS} -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=${CXX} -DCMAKE_Fortran_COMPILER=${FC} -DCMAKE_INSTALL_PREFIX=${INSTALL_PATH} -DCONDUIT_ROOT=${INSTALL_PATH} ${UMPIRE_CMAKE_ARGS} $1 +cmake ${UMT_REPO_PATH}/src -DCMAKE_Fortran_FLAGS=${FFLAGS} -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=${CXX} -DCMAKE_Fortran_COMPILER=${FC} -DCMAKE_INSTALL_PREFIX=${INSTALL_PATH} -DCONDUIT_ROOT=${INSTALL_PATH} $1 gmake -j install cd .. -# Run two smoke tests to verify executable, one on 2D 8x8 tiles mesh and one on 3D 4x4x4 tiles mesh. -srun -n 8 ${INSTALL_PATH}/bin/test_driver -c 10 -B local -d 8,8,0 --benchmark_problem 2 ${UMPIRE_RUNLINE_ARG} -srun -n 8 ${INSTALL_PATH}/bin/test_driver -c 10 -B local -d 4,4,4 --benchmark_problem 2 ${UMPIRE_RUNLINE_ARG} +srun -n 8 ${INSTALL_PATH}/bin/test_driver -c 10 -B local -d 8,8,0 --benchmark_problem 2 +srun -n 8 ${INSTALL_PATH}/bin/test_driver -c 10 -B local -d 4,4,4 --benchmark_problem 2 + +# Test UMT on SSP1 unstructured 3d mesh problem on two mpi ranks. Refine the mesh via -r and -R arguments. diff --git a/host-configs/example.gnu.cmake b/host-configs/example.gnu.cmake index 5bec829..d923491 100644 --- a/host-configs/example.gnu.cmake +++ b/host-configs/example.gnu.cmake @@ -13,4 +13,4 @@ set(CMAKE_Fortran_FLAGS "-ffree-line-length-none" CACHE PATH "") set(ENABLE_OPENMP ON CACHE BOOL "") set(ENABLE_OPENMP_OFFLOAD OFF CACHE BOOL "") -set(CONDUIT_ROOT ${TPL_ROOT}/conduit/develop CACHE PATH "") +set(CONDUIT_ROOT ${TPL_ROOT}/conduit/0.8.2 CACHE PATH "") diff --git a/spack/umt/package.py b/spack/umt/package.py index 4d97eb1..1d9174a 100644 --- a/spack/umt/package.py +++ b/spack/umt/package.py @@ -30,7 +30,6 @@ class Umt(CachedCMakePackage, CudaPackage): variant("openmp", default=False, description="Enable OpenMP support") variant("openmp_offload", default=False, description="Enable OpenMP target offload support") variant("caliper", default=False, description="Enable Caliper performance timers") - variant("mfem", default=False, description="Enable support for reading MFEM meshes") variant("umpire", default=False, description="Enable use of Umpire memory library") variant("shared", default=False, description="Enable shared libraries") variant("silo", default=False, description="Enable silo I/O support") @@ -196,19 +195,6 @@ def initconfig_package_entries(self): entries.append(cmake_cache_path("CONDUIT_ROOT", spec["conduit"].prefix)) - if "+mfem" in spec: - entries.append(cmake_cache_option("ENABLE_MFEM", True)) - entries.append(cmake_cache_path("MFEM_ROOT", spec["mfem"].prefix)) - if "hypre" in spec: - entries.append(cmake_cache_path("HYPRE_ROOT", spec["hypre"].prefix)) - if "metis" in spec: - entries.append(cmake_cache_path("METIS_ROOT", spec["metis"].prefix)) - if ("+zlib" in spec["mfem"]): - found_zlib_dependency = True - - else: - entries.append(cmake_cache_option("ENABLE_MFEM", False)) - if "+caliper" in spec: entries.append(cmake_cache_option("ENABLE_CALIPER", True)) entries.append(cmake_cache_path("CALIPER_ROOT", spec["caliper"].prefix)) diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt index 7040317..a8ce916 100644 --- a/src/CMakeLists.txt +++ b/src/CMakeLists.txt @@ -1,17 +1,17 @@ #<<<<<=====----------------------------------------------------------=====>>>>> # CMake script for Teton library #<<<<<=====----------------------------------------------------------=====>>>>> -cmake_minimum_required( VERSION 3.14 ) +cmake_minimum_required(VERSION 3.21) set(PROJECT_NAME teton) # Update version information in this file. set(TETON_VERSION_MAJOR 5) -set(TETON_VERSION_MINOR 3) +set(TETON_VERSION_MINOR 5) set(TETON_VERSION_PATCH 0) include (cmake/Version.cmake) -# The CUDA Boltzmann Compton solver source file requires C++11. +# The CUDA Boltzmann Compton solver source file requires C++11. Conduit requires C++14. set(CMAKE_CXX_STANDARD 14) set(CMAKE_CXX_STANDARD_REQUIRED ON) set(CMAKE_CUDA_STANDARD 14) @@ -20,12 +20,23 @@ set(CMAKE_CUDA_STANDARD_REQUIRED ON) # ----- Set up Teton project source files and build targets ----- project( ${PROJECT_NAME} LANGUAGES CXX Fortran VERSION ${TETON_VERSION_MAJOR}.${TETON_VERSION_MINOR}.${TETON_VERSION_PATCH}) +# ----- Gather gpu hardware details, will be set in Teton build_info module. +include (cmake/GetGPUInfo.cmake) + # Tell CMake to not clear out the RPATH when installing an executable. SET(CMAKE_INSTALL_RPATH_USE_LINK_PATH TRUE) # Add third party FindXXX.cmake modules. set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${PROJECT_SOURCE_DIR}/cmake/thirdparty/") +option(ENABLE_FIND_PACKAGE_CONFIG_MODE "Set CMake find_package mode to CONFIG. Enables imported CMake targets from TPL installations." NO ) + +option(ENABLE_ASSERTS "Enable teton assertion code checks." NO) +if(ENABLE_ASSERTS) + message( STATUS "Enabling Teton code asserts.") + add_compile_definitions( "TETON_COMPILE_ASSERTS") +endif() + # Set up our default flags for each CMake build type. include(cmake/InitBuildTypeCompilerFlags.cmake) # Set correct linker wrapper flags. This is a workaround for HPE CCE compiler not setting this @@ -43,7 +54,7 @@ endif() # ----- # Enable mini-app code version that focuses on the transport sweep algorithm. -# This will enable the test driver, and MFEM mesh support by default. +# This will enable the test driver. # ----- option( ENABLE_MINIAPP_BUILD "Enable transport sweep mini-app." TRUE ) if(ENABLE_MINIAPP_BUILD) @@ -85,30 +96,36 @@ option( OPENMP_HAS_FORTRAN_INTERFACE "Compiler OpenMP implementation includes Fo # TODO - add check that OpenMP_Fortran_HAVE_OMPLIB_MODULE is TRUE, since we use the omp_lib module. if(ENABLE_OPENMP) - message( STATUS "Enable OpenMP pragmas" ) + message( STATUS "OPENMP - Enable OpenMP pragmas" ) if(ENABLE_OPENMP_OFFLOAD) - message( STATUS "Enable OpenMP target offload pragmas" ) + message( STATUS "OPENMP - Enable OpenMP target offload pragmas" ) if ( ${CMAKE_Fortran_COMPILER_ID} STREQUAL "XL" ) - message(STATUS "Detected XLF compiler, manually adding -qoffload to OpenMP flags...") + message(STATUS "OPENMP - Detected XLF compiler, manually adding -qoffload to OpenMP flags...") set(OpenMP_Fortran_FLAG "-qsmp=omp -qoffload") endif() if ( ${CMAKE_CXX_COMPILER_ID} STREQUAL "XL" OR ${CMAKE_CXX_COMPILER_ID} STREQUAL "XLClang") - message(STATUS "Detected XLC compiler, manually adding -qoffload to OpenMP flags...") + message(STATUS "OPENMP - Detected XLC compiler, manually adding -qoffload to OpenMP flags...") set(OpenMP_CXX_FLAG "-qsmp=omp -qoffload") endif() add_compile_definitions( "TETON_ENABLE_OPENMP_OFFLOAD" ) if(OPENMP_HAS_USE_DEVICE_ADDR) - message( STATUS "Enable use of OpenMP use_device_addr pragmas" ) + message( STATUS "OPENMP - Enable use of OpenMP use_device_addr pragmas" ) add_compile_definitions( "TETON_OPENMP_HAS_USE_DEVICE_ADDR" ) - endif() + endif() + + if(OPENMP_UNIFIED_MEMORY) + message( STATUS "OPENMP - Platform has unified memory, skip OpenMP host<->device maps and updates" ) + add_compile_definitions( "TETON_OPENMP_HAS_UNIFIED_MEMORY" ) + endif() + endif() - find_package(OpenMP REQUIRED COMPONENTS Fortran CXX) + find_package(OpenMP REQUIRED COMPONENTS Fortran) if(ENABLE_OPENMP_OFFLOAD) # Add custom logic here to add in the HIP backend libraries needed by Cray for openmp offload kernels. @@ -116,21 +133,30 @@ if(ENABLE_OPENMP) # If we need C++ support, then Cray needs to provide a way to distinguish amdclang vs craycc, as they both identify as 'Clang' to CMake. -- Aaron if (${CMAKE_Fortran_COMPILER_ID} STREQUAL "Cray") - message( STATUS "Detected Cray Fortran OpenMP offload support requested. Adding additional flags to pull in HIP backend needed by Cray Fortran compiler.") + message( STATUS "OPENMP - Detected Cray Fortran OpenMP offload support requested.") + message( STATUS "OPENMP - Adding additional flags to pull in HIP backend needed by Cray Fortran compiler.") + + message( STATUS "OPENMP - Adding additional flag to allocate variable size local arrays and temporaries in heap memory to avoid stack overflow issues with large GPU problems.") + set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -h heap_allocate") if (DEFINED HIP_ROOT_DIR) - message( STATUS "Using HIP installation at ${HIP_ROOT_DIR}") + message( STATUS "OPENMP - Using HIP installation at ${HIP_ROOT_DIR}") else() message( FATAL_ERROR " Must set HIP_ROOT_DIR when using Cray compiler openmp offload functionality.") endif() if (DEFINED CMAKE_HIP_ARCHITECTURES) - message( STATUS "Setting target gpu architecture to ${CMAKE_HIP_ARCHITECTURES}") + message( STATUS "OPENMP - Setting target gpu architecture to ${CMAKE_HIP_ARCHITECTURES}") else() message( FATAL_ERROR " Must set CMAKE_HIP_ARCHITECTURES when using Cray compiler openmp offload functionality.") endif() find_package(hip REQUIRED) + + if(ENABLE_CALIPER) + find_package(ROCTracer REQUIRED) + endif() + add_compile_definitions( "TETON_ENABLE_HIP" ) target_link_options( OpenMP::OpenMP_Fortran INTERFACE $<$:--rocm-path=${HIP_ROOT_DIR} -target-accel=amd_${CMAKE_HIP_ARCHITECTURES}>) @@ -141,15 +167,15 @@ if(ENABLE_OPENMP) endif() endif() - message( STATUS "Compiler supports OpenMP ${OpenMP_Fortran_VERSION}" ) - message( STATUS "Compiler supports OpenMP spec date ${OpenMP_Fortran_SPEC_DATE}") - message( STATUS "Compiler OpenMP C++ flags: ${OpenMP_CXX_FLAGS}") - message( STATUS "Compiler OpenMP Fortran flags: ${OpenMP_Fortran_FLAGS}") + message( STATUS "OPENMP - Compiler supports OpenMP ${OpenMP_Fortran_VERSION}" ) + message( STATUS "OPENMP - Compiler supports OpenMP spec date ${OpenMP_Fortran_SPEC_DATE}") + message( STATUS "OPENMP - Compiler OpenMP C++ flags: ${OpenMP_CXX_FLAGS}") + message( STATUS "OPENMP - Compiler OpenMP Fortran flags: ${OpenMP_Fortran_FLAGS}") add_compile_definitions( "TETON_ENABLE_OPENMP" ) if(OPENMP_HAS_FORTRAN_INTERFACE) - message( STATUS "OpenMP includes Fortran interface, disable creation of iso_c_bindings." ) + message( STATUS "OPENMP - OpenMP includes Fortran interface, disable creation of iso_c_bindings." ) add_compile_definitions( "TETON_OPENMP_HAS_FORTRAN_INTERFACE" ) endif() endif() @@ -164,7 +190,7 @@ if( ENABLE_CUDA ) if(${CMAKE_VERSION} VERSION_GREATER_EQUAL "3.18.0") find_package(CUDAToolkit REQUIRED) else() - message( STATUS "Older CMake version detected (pre-3.18.0). CUDA compiler flags will need to be manually added by cmake config file or on command line.") + message( STATUS "OPENMP - Older CMake version detected (pre-3.18.0). CUDA compiler flags will need to be manually added by cmake config file or on command line.") endif() endif() @@ -195,13 +221,13 @@ else() endif() -# ----- Gather gpu hardware details, will be set in Teton build_info module. -include (cmake/GetGPUInfo.cmake) - option( ENABLE_UMPIRE "Link and enable the use of the UMPIRE memory library." NO ) if(ENABLE_UMPIRE) find_package(Umpire) - if(ENABLE_CAMP) + if(FMT_ROOT) + find_package(Fmt) + endif() + if(CAMP_ROOT) find_package(Camp) endif() add_compile_definitions( "TETON_ENABLE_UMPIRE" ) @@ -209,8 +235,32 @@ endif() option( ENABLE_CALIPER "Link and enable the use of the CALIPER performance measurement library." NO ) if(ENABLE_CALIPER) - find_package(Caliper) - find_package(Adiak) + + if(ENABLE_FIND_PACKAGE_CONFIG_MODE) + # Use package config files for ADIAK and CALIPER. (experimental) + + # Must be lowercase + message(STATUS "Looking for adiak in ${ADIAK_ROOT}") + find_package(adiak + REQUIRED + CONFIG + PATHS ${ADIAK_ROOT}/lib/cmake/adiak + ) + + message(STATUS "Looking for caliper in ${CALIPER_ROOT}") + find_package(Caliper + REQUIRED + CONFIG + PATHS ${CALIPER_ROOT}/share/cmake/caliper + ) + + # The caliper module does not set some things we expect. + target_include_directories( caliper INTERFACE ${CALIPER_ROOT}/include/caliper/fortran) + else() + find_package(Caliper) + find_package(Adiak) + endif() + add_compile_definitions( "TETON_ENABLE_CALIPER" ) endif() @@ -223,32 +273,33 @@ find_package(ConduitRelayMPIIO) find_package(ConduitBlueprint) find_package(ConduitBlueprintMPI) - if(NOT ENABLE_MINIAPP_BUILD) # Need the source headers for these, regardless of whether we compile the test driver executable. find_package(PhysicsUtils) # Needed for restart SILO file support. + # The FindConduit.cmake file will set CONDUIT_REQUIRES_SILO option( ENABLE_SILO "Link and enable the use of the SILO file i/o library." YES) - if (ENABLE_SILO) + if (ENABLE_SILO OR CONDUIT_REQUIRES_SILO) find_package(Silo) endif() endif() -option( ENABLE_MFEM "Enable support for reading in mfem test meshes." NO) -if (ENABLE_MFEM) - message( STATUS "Enable mfem mesh support.") - add_compile_definitions( "TETON_ENABLE_MFEM" ) - find_package(MFEM) - find_package(Hypre) + + +# The FindConduit.cmake file will set CONDUIT_REQUIRES_PARMETIS +if (CONDUIT_REQUIRES_PARMETIS) + find_package(Parmetis) find_package(Metis) endif() -# Needed if conduit was built against HDF5. -option( ENABLE_HDF5 "Link in the HDF5 library." NO) -if (ENABLE_HDF5) +# It is assumed that SILO was built against HDF5. +# The FindConduitRelay.cmake file will set CONDUIT_REQUIRES_HDF5. +if (ENABLE_SILO OR CONDUIT_REQUIRES_HDF5) find_package(HDF5) - find_package(Z) + if (Z_ROOT) + find_package(Z) + endif() endif() add_subdirectory(teton) diff --git a/src/cmake/GetGPUInfo.cmake b/src/cmake/GetGPUInfo.cmake index 337707f..3e8c9b3 100644 --- a/src/cmake/GetGPUInfo.cmake +++ b/src/cmake/GetGPUInfo.cmake @@ -10,19 +10,44 @@ # This is the number of threads supported per thread team/ block on the target device. # OMP_TARGET_MAX_THREADS_PER_THREAD_TEAM -#NVIDIA VOLTA -if (CMAKE_Fortran_COMPILER_ID STREQUAL "XL") - set(OMP_DEVICE_NUM_PROCESSORS 80) - set(OMP_DEVICE_TEAM_THREAD_LIMIT 1024) -# AMD MI250X -elseif (CMAKE_Fortran_COMPILER_ID STREQUAL "Cray") -# Cray recommends only using 108 of the 110 CUs to leave some free for OS/runtime tasks. -# However, Teton runs faster with 110 vs. 108 - set(OMP_DEVICE_NUM_PROCESSORS 110) - set(OMP_DEVICE_TEAM_THREAD_LIMIT 1024) +if (ENABLE_OPENMP_OFFLOAD) + message(STATUS "Checking for GPU...") + if (CMAKE_CUDA_ARCHITECTURES STREQUAL 70) + message(STATUS "-- Detected NVIDIA Volta, setting device num processors to 70") + set(OMP_DEVICE_NUM_PROCESSORS 80) + set(GSET_MIN_SIZE 16) + set(MAX_NUM_HYPER_DOMAINS 16) + set(OMP_DEVICE_TEAM_THREAD_LIMIT 1024) + + elseif (CMAKE_HIP_ARCHITECTURES STREQUAL gfx90a) +# AMD MI250X - 110 CUs + message(STATUS "-- Detected AMD MI250, setting device num processors to 110") + set(OMP_DEVICE_NUM_PROCESSORS 110) + set(GSET_MIN_SIZE 16) + set(MAX_NUM_HYPER_DOMAINS 16) + set(OMP_DEVICE_TEAM_THREAD_LIMIT 1024) + +# AMD MI300 - 228 CUs + elseif (CMAKE_HIP_ARCHITECTURES STREQUAL gfx942) + message(STATUS "-- Detected AMD MI300, setting device num processors to 228") + set(OMP_DEVICE_NUM_PROCESSORS 228) + set(OPENMP_UNIFIED_MEMORY TRUE) + set(GSET_MIN_SIZE 8) + set(MAX_NUM_HYPER_DOMAINS 16) + set(OMP_DEVICE_TEAM_THREAD_LIMIT 1024) + + else() + message(ERROR "-- Unrecogized or unset value for CUDA or HIP architecture.") + endif() + else() + message(STATUS -- No GPU detected.) + # These are only used if running the GPU kernels on the CPU for testing purposes. set(OMP_DEVICE_NUM_PROCESSORS 1) set(OMP_DEVICE_TEAM_THREAD_LIMIT 1) + set(GSET_MIN_SIZE 1) + set(MAX_NUM_HYPER_DOMAINS 1) + set(OMP_DEVICE_TEAM_THREAD_LIMIT 1) endif() mark_as_advanced(OMP_DEVICE_NUM_PROCESSORS OMP_DEVICE_TEAM_THREAD_LIMIT) diff --git a/src/cmake/InitBuildTypeCompilerFlags.cmake b/src/cmake/InitBuildTypeCompilerFlags.cmake index 0d5d5b7..55418a7 100644 --- a/src/cmake/InitBuildTypeCompilerFlags.cmake +++ b/src/cmake/InitBuildTypeCompilerFlags.cmake @@ -10,19 +10,12 @@ if (NOT "${CMAKE_BUILD_TYPE}" STREQUAL "") string(TOUPPER ${CMAKE_BUILD_TYPE} BUILD_TYPE_UPPERCASE) # --- Set C++ compiler flags --- - if ("${CMAKE_CXX_FLAGS_${BUILD_TYPE_UPPERCASE}_INIT}" STREQUAL "") - message(WARNING "CMake toolchain failed to initialize CMAKE_CXX_FLAGS_${BUILD_TYPE_UPPERCASE}_INIT, please submit a ticket to vendor.") - endif() - if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "GNU") set(CMAKE_CXX_FLAGS_RELEASE "-O3 -DNDEBUG") - set(CMAKE_CXX_FLAGS_RELWITHDEBINFO "-O3 -g -DNDEBUG") - set(CMAKE_CXX_FLAGS_DEBUG "-O0 -g") + set(CMAKE_CXX_FLAGS_RELWITHDEBINFO "-O3 -g -Wall -Wextra -Wshadow -fdiagnostics-show-option -DNDEBUG") + set(CMAKE_CXX_FLAGS_DEBUG "-O0 -Wall -Wextra -Wshadow -fdiagnostics-show-option -g") elseif("${CMAKE_CXX_COMPILER_ID}" STREQUAL "Clang") - set(CMAKE_CXX_FLAGS_RELEASE "-O3 -DNDEBUG") - set(CMAKE_CXX_FLAGS_RELWITHDEBINFO "-O3 -g -DNDEBUG") - set(CMAKE_CXX_FLAGS_DEBUG "-O0 -g") elseif("${CMAKE_CXX_COMPILER_ID}" STREQUAL "XL" OR "${CMAKE_CXX_COMPILER_ID}" STREQUAL "XLClang") set(CMAKE_CXX_FLAGS_RELEASE "-O3 -qstrict -qarch=auto -qtune=auto -qmaxmem=-1 -qsuppress=1500-036") @@ -47,20 +40,12 @@ if (NOT "${CMAKE_BUILD_TYPE}" STREQUAL "") endif() # --- Set Fortran compiler flags --- - if ("${CMAKE_Fortran_FLAGS_${BUILD_TYPE_UPPERCASE}_INIT}" STREQUAL "") - message(WARNING "CMake toolchain failed to initialize CMAKE_Fortran_FLAGS_${BUILD_TYPE_UPPERCASE}_INIT, please submit a ticket to vendor.") - endif() - if("${CMAKE_Fortran_COMPILER_ID}" STREQUAL "GNU") set(CMAKE_Fortran_FLAGS_RELEASE "-O3 -DNDEBUG -ffree-line-length-none") - set(CMAKE_Fortran_FLAGS_RELWITHDEBINFO "-fcheck=all -O3 -g -DNDEBUG -ffree-line-length-none") - set(CMAKE_Fortran_FLAGS_DEBUG "-fcheck=all -O0 -g -ffree-line-length-none") + set(CMAKE_Fortran_FLAGS_RELWITHDEBINFO "-Wall -Wextra -fdiagnostics-show-option -fcheck=all -O3 -g -DNDEBUG -ffree-line-length-none") + set(CMAKE_Fortran_FLAGS_DEBUG "-Wall -Wextra -fdiagnostics-show-option -fcheck=all -O0 -g -ffree-line-length-none") elseif("${CMAKE_Fortran_COMPILER_ID}" MATCHES "Clang") # For Clang or AppleClang - elseif("${CMAKE_Fortran_COMPILER_ID}" MATCHES "LLVMFlang") - set(CMAKE_Fortran_FLAGS_RELEASE "-O3 -DNDEBUG") - set(CMAKE_Fortran_FLAGS_RELWITHDEBINFO "-O3 -g -DNDEBUG") - set(CMAKE_Fortran_FLAGS_DEBUG "-O0 -g") elseif("${CMAKE_Fortran_COMPILER_ID}" STREQUAL "XL") # Enable F2003 support via the below -qxlf2003 flag list. This behavior is the default if xlf2003 compiler is used, but not if xlf is used. @@ -77,23 +62,37 @@ if (NOT "${CMAKE_BUILD_TYPE}" STREQUAL "") elseif("${CMAKE_Fortran_COMPILER_ID}" STREQUAL "Intel") set(CMAKE_Fortran_FLAGS_RELEASE "-O2 -DNDEBUG") set(CMAKE_Fortran_FLAGS_RELWITHDEBINFO "-O2 -g -warn all,noexternal,nointerfaces -diag-enable=remark -fpe-all=0 -traceback") - set(CMAKE_Fortran_FLAGS_DEBUG "-O0 -g -warn all,noexternal,nointerfaces -diag-enable=remark -check all -fpe-all=0 -traceback") + #set(CMAKE_Fortran_FLAGS_DEBUG "-O0 -g -warn all,noexternal,nointerfaces -diag-enable=remark -check all -fpe-all=0 -traceback") + # Check all, at least in the latest LLVM-intel compiler, uses the adress sanitizer for "-check all", so the final link line needs some flags too. + set(CMAKE_Fortran_FLAGS_DEBUG "-O0 -g -warn all,noexternal,nointerfaces -diag-enable=remark -fpe-all=0 -traceback") elseif("${CMAKE_Fortran_COMPILER_ID}" STREQUAL "IntelLLVM") set(CMAKE_Fortran_FLAGS_RELEASE "-O2 -DNDEBUG") set(CMAKE_Fortran_FLAGS_RELWITHDEBINFO "-O2 -g -warn all,noexternal,nointerfaces -diag-enable=remark -fpen=0-traceback") - set(CMAKE_Fortran_FLAGS_DEBUG "-O0 -g -warn all,noexternal,nointerfaces -diag-enable=remark -check all -fpen=0 -traceback") + #set(CMAKE_Fortran_FLAGS_DEBUG "-O0 -g -warn all,noexternal,nointerfaces -diag-enable=remark -check all -fpen=0 -traceback") + # Check all, at least in the latest LLVM-intel compiler, uses the adress sanitizer for "-check all", so the final link line needs some flags too. + set(CMAKE_Fortran_FLAGS_DEBUG "-O0 -g -warn all,noexternal,nointerfaces -diag-enable=remark -fpen=0 -traceback") elseif("${CMAKE_Fortran_COMPILER_ID}" STREQUAL "PGI") # Note : Cray Fortran completely fails to provide any initial set of flags for build types. Ticket has been submitted to HPE. --black27 # For now, don't append to existing flags ( since there are none ), just set the optimization and debug symbols flag ourselves. + + # Suppress the warning about importing modules that have already been imported by other modules. + # The code has a lot of these dependencies. elseif("${CMAKE_Fortran_COMPILER_ID}" STREQUAL "Cray") - set(CMAKE_Fortran_FLAGS_RELEASE "-O2 -DNDEBUG") + set(CMAKE_Fortran_FLAGS_RELEASE "-O2 -DNDEBUG -M878") # G2 is the only level that doesn't disable OpenMP loop collapsing and still provides debug information. # A ticket has been submitted to ask HPE to update the -G# flag to be consistent with the "-g" flag in their C++ compiler. - set(CMAKE_Fortran_FLAGS_RELWITHDEBINFO "-O2 -G2 -DNDEBUG") - set(CMAKE_Fortran_FLAGS_DEBUG "-O0 -G2") + set(CMAKE_Fortran_FLAGS_RELWITHDEBINFO "-O2 -G2 -DNDEBUG -h bounds -M878 -Ktrap=fp") + set(CMAKE_Fortran_FLAGS_DEBUG "-O0 -G2 -h bounds -M878 -Ktrap=fp") + endif() + + # Add array bounds checking and asserts for non release builds. + if (NOT "${BUILD_TYPE_UPPERCASE}" STREQUAL RELEASE) + message(STATUS "Detected non-release build, enabling code asserts...") + add_compile_definitions("TETON_COMPILE_ASSERTS") + add_compile_definitions("TETON_CHECK_OUT_OF_BOUNDS_ARRAY_ACCESSES") endif() message(STATUS "Build type ${CMAKE_BUILD_TYPE} CXX flags: ${CMAKE_CXX_FLAGS_${BUILD_TYPE_UPPERCASE}}") diff --git a/src/cmake/InitLinkerWrapperFlags.cmake b/src/cmake/InitLinkerWrapperFlags.cmake index e653223..32f3942 100644 --- a/src/cmake/InitLinkerWrapperFlags.cmake +++ b/src/cmake/InitLinkerWrapperFlags.cmake @@ -24,10 +24,6 @@ if ("Fortran" IN_LIST languages AND "${CMAKE_Fortran_LINKER_WRAPPER_FLAG}" STREQ message(WARNING "Cray Fortran toolchain failed to set CMAKE_Fortran_LINKER_WRAPPER_FLAGS. Report this to vendor. Setting flag to '-Wl,' manually as workaround.") set(CMAKE_Fortran_LINKER_WRAPPER_FLAG "-Wl,") set(CMAKE_Fortran_LINKER_WRAPPER_FLAG_SEP ",") - elseif("${CMAKE_Fortran_COMPILER_ID}" STREQUAL "LLVMFlang") - message(WARNING "LLVM Flang toolchain failed to set CMAKE_Fortran_LINKER_WRAPPER_FLAGS. Report this to vendor. Setting flag to '-Wl,' manually as workaround.") - set(CMAKE_Fortran_LINKER_WRAPPER_FLAG "-Wl,") - set(CMAKE_Fortran_LINKER_WRAPPER_FLAG_SEP ",") else() message(FATAL_ERROR "Toolchain file failed to initialize CMAKE_Fortran_LINKER_WRAPPER_FLAG, please submit a ticket to vendor.") endif() diff --git a/src/cmake/thirdparty/FindCaliper.cmake b/src/cmake/thirdparty/FindCaliper.cmake index 7145a3f..7086a56 100644 --- a/src/cmake/thirdparty/FindCaliper.cmake +++ b/src/cmake/thirdparty/FindCaliper.cmake @@ -37,4 +37,6 @@ find_package_handle_standard_args( DEFAULT_MSG CALIPER_LIBRARIES CALIPER_INCLUDE_DIR) +file(STRINGS ${CALIPER_INCLUDE_DIR}/caliper/caliper-config.h CALIPER_REQUIRES_ADIAK REGEX "^#define CALIPER_HAVE_ADIAK") + mark_as_advanced(CALIPER_LIBRARIES CALIPER_INCLUDE_DIR) diff --git a/src/cmake/thirdparty/FindConduit.cmake b/src/cmake/thirdparty/FindConduit.cmake index 39079c9..7c85db1 100644 --- a/src/cmake/thirdparty/FindConduit.cmake +++ b/src/cmake/thirdparty/FindConduit.cmake @@ -35,5 +35,7 @@ find_package_handle_standard_args( DEFAULT_MSG CONDUIT_LIBRARIES CONDUIT_INCLUDE_DIR) +file(STRINGS ${CONDUIT_INCLUDE_DIR}/conduit/conduit_config.h CONDUIT_REQUIRES_PARMETIS REGEX "^#define CONDUIT_USE_PARMETIS") + set(CONDUIT_FORTRAN_MODULES_DIR ${CONDUIT_INCLUDE_DIR}/conduit) mark_as_advanced(CONDUIT_LIBRARIES CONDUIT_INCLUDE_DIR CONDUIT_FORTRAN_MODULES_DIR) diff --git a/src/cmake/thirdparty/FindConduitBlueprintMPI.cmake b/src/cmake/thirdparty/FindConduitBlueprintMPI.cmake index 7b76323..8c30f26 100644 --- a/src/cmake/thirdparty/FindConduitBlueprintMPI.cmake +++ b/src/cmake/thirdparty/FindConduitBlueprintMPI.cmake @@ -6,33 +6,46 @@ include(FindPackageHandleStandardArgs) -find_path( - CONDUITBLUEPRINTMPI_INCLUDE_DIR - NAMES conduit/conduit_blueprint_mpi.hpp - PATHS ${CONDUIT_ROOT} - PATH_SUFFIXES include - NO_DEFAULT_PATH - NO_CMAKE_ENVIRONMENT_PATH - NO_CMAKE_PATH - NO_SYSTEM_ENVIRONMENT_PATH - NO_CMAKE_SYSTEM_PATH -) +if(TETON_BUILDING_WITH_PARMETIS) + # Find Conduit using the Conduit installed package so we can get any library + # dependencies for the conduit_blueprint_mpi library. + unset(CONDUIT_FOUND) + find_package(Conduit PATHS ${CONDUIT_ROOT}) + set(CONDUIT_FOUND TRUE) + get_target_property(deps conduit_blueprint_mpi INTERFACE_LINK_LIBRARIES) + set(CONDUITBLUEPRINTMPI_LIBRARIES conduit_blueprint_mpi;${deps}) + message(STATUS "CONDUITBLUEPRINTMPI_LIBRARIES=${CONDUITBLUEPRINTMPI_LIBRARIES}") -find_library( - CONDUITBLUEPRINTMPI_LIBRARIES - NAMES conduit_blueprint_mpi - PATHS ${CONDUIT_ROOT} - PATH_SUFFIXES lib - NO_DEFAULT_PATH - NO_CMAKE_ENVIRONMENT_PATH - NO_CMAKE_PATH - NO_SYSTEM_ENVIRONMENT_PATH - NO_CMAKE_SYSTEM_PATH -) +else() + find_path( + CONDUITBLUEPRINTMPI_INCLUDE_DIR + NAMES conduit/conduit_blueprint_mpi.hpp + PATHS ${CONDUIT_ROOT} + PATH_SUFFIXES include + NO_DEFAULT_PATH + NO_CMAKE_ENVIRONMENT_PATH + NO_CMAKE_PATH + NO_SYSTEM_ENVIRONMENT_PATH + NO_CMAKE_SYSTEM_PATH + ) -find_package_handle_standard_args( - ConduitBlueprintMPI - DEFAULT_MSG - CONDUITBLUEPRINTMPI_LIBRARIES CONDUITBLUEPRINTMPI_INCLUDE_DIR) + find_library( + CONDUITBLUEPRINTMPI_LIBRARIES + NAMES conduit_blueprint_mpi + PATHS ${CONDUIT_ROOT} + PATH_SUFFIXES lib + NO_DEFAULT_PATH + NO_CMAKE_ENVIRONMENT_PATH + NO_CMAKE_PATH + NO_SYSTEM_ENVIRONMENT_PATH + NO_CMAKE_SYSTEM_PATH + ) -mark_as_advanced(CONDUITBLUEPRINTMPI_LIBRARIES CONDUITBLUEPRINTMPI_INCLUDE_DIR) + find_package_handle_standard_args( + ConduitBlueprintMPI + DEFAULT_MSG + CONDUITBLUEPRINTMPI_LIBRARIES CONDUITBLUEPRINTMPI_INCLUDE_DIR) + + mark_as_advanced(CONDUITBLUEPRINTMPI_LIBRARIES CONDUITBLUEPRINTMPI_INCLUDE_DIR) + +endif() diff --git a/src/cmake/thirdparty/FindConduitRelay.cmake b/src/cmake/thirdparty/FindConduitRelay.cmake index 1db5bda..ad69b47 100644 --- a/src/cmake/thirdparty/FindConduitRelay.cmake +++ b/src/cmake/thirdparty/FindConduitRelay.cmake @@ -36,4 +36,7 @@ find_package_handle_standard_args( DEFAULT_MSG CONDUITRELAY_LIBRARIES CONDUITRELAY_INCLUDE_DIR) +file(STRINGS ${CONDUIT_INCLUDE_DIR}/conduit/conduit_relay_config.h CONDUIT_REQUIRES_HDF5 REGEX "^#define CONDUIT_RELAY_IO_HDF5_ENABLED") +file(STRINGS ${CONDUIT_INCLUDE_DIR}/conduit/conduit_relay_config.h CONDUIT_REQUIRES_SILO REGEX "^#define CONDUIT_RELAY_IO_SILO_ENABLED") + mark_as_advanced(CONDUITRELAY_LIBRARIES CONDUITRELAY_INCLUDE_DIR) diff --git a/src/cmake/thirdparty/FindHypre.cmake b/src/cmake/thirdparty/FindFmt.cmake similarity index 51% rename from src/cmake/thirdparty/FindHypre.cmake rename to src/cmake/thirdparty/FindFmt.cmake index 8f88b1c..7b8a4bb 100644 --- a/src/cmake/thirdparty/FindHypre.cmake +++ b/src/cmake/thirdparty/FindFmt.cmake @@ -1,15 +1,15 @@ -# Once done, this will define +# The FMT library. Currently used by newer Umpire versions. # -# HYPRE_FOUND - system has hypre -# HYPRE_INCLUDE_DIR - hypre include directory -# HYPRE_LIBRARIES - hypre library +# FMT_FOUND - system has fmt +# FMT_INCLUDE_DIR - fmt include directory +# FMT_LIBRARIES - fmt library include(FindPackageHandleStandardArgs) find_path( - HYPRE_INCLUDE_DIR - NAMES HYPRE.h - PATHS ${HYPRE_ROOT} + FMT_INCLUDE_DIR + NAMES fmt/format.h + PATHS ${FMT_ROOT} PATH_SUFFIXES include NO_DEFAULT_PATH NO_CMAKE_ENVIRONMENT_PATH @@ -19,9 +19,9 @@ find_path( ) find_library( - HYPRE_LIBRARIES - NAMES HYPRE - PATHS ${HYPRE_ROOT} + FMT_LIBRARIES + NAMES fmt + PATHS ${FMT_ROOT} PATH_SUFFIXES lib NO_DEFAULT_PATH NO_CMAKE_ENVIRONMENT_PATH @@ -30,9 +30,10 @@ find_library( NO_CMAKE_SYSTEM_PATH ) + find_package_handle_standard_args( - Hypre + Fmt DEFAULT_MSG - HYPRE_LIBRARIES HYPRE_INCLUDE_DIR) + FMT_LIBRARIES FMT_INCLUDE_DIR) -mark_as_advanced(HYPRE_LIBRARIES HYPRE_INCLUDE_DIR) +mark_as_advanced(FMT_LIBRARIES FMT_INCLUDE_DIR) diff --git a/src/cmake/thirdparty/FindMFEM.cmake b/src/cmake/thirdparty/FindParmetis.cmake similarity index 50% rename from src/cmake/thirdparty/FindMFEM.cmake rename to src/cmake/thirdparty/FindParmetis.cmake index 4f0c493..f9be45b 100644 --- a/src/cmake/thirdparty/FindMFEM.cmake +++ b/src/cmake/thirdparty/FindParmetis.cmake @@ -1,15 +1,15 @@ # Once done, this will define # -# MFEM_FOUND - system has hypre -# MFEM_INCLUDE_DIR - hypre include directory -# MFEM_LIBRARIES - hypre library +# PARMETIS_FOUND - system has parmetis +# PARMETIS_INCLUDE_DIR - parmetis include directory +# PARMETIS_LIBRARIES - parmetis library include(FindPackageHandleStandardArgs) find_path( - MFEM_INCLUDE_DIR - NAMES mfem.hpp - PATHS ${MFEM_ROOT} + PARMETIS_INCLUDE_DIR + NAMES parmetis.h + PATHS ${PARMETIS_ROOT} PATH_SUFFIXES include NO_DEFAULT_PATH NO_CMAKE_ENVIRONMENT_PATH @@ -19,9 +19,9 @@ find_path( ) find_library( - MFEM_LIBRARIES - NAMES mfem - PATHS ${MFEM_ROOT} + PARMETIS_LIBRARIES + NAMES parmetis + PATHS ${PARMETIS_ROOT} PATH_SUFFIXES lib NO_DEFAULT_PATH NO_CMAKE_ENVIRONMENT_PATH @@ -31,8 +31,9 @@ find_library( ) find_package_handle_standard_args( - MFEM + Parmetis DEFAULT_MSG - MFEM_LIBRARIES MFEM_INCLUDE_DIR) + PARMETIS_LIBRARIES PARMETIS_INCLUDE_DIR) -mark_as_advanced(MFEM_LIBRARIES MFEM_INCLUDE_DIR) +mark_as_advanced(PARMETIS_LIBRARIES PARMETIS_INCLUDE_DIR) +mark_as_advanced(PARMETIS_LIBRARIES) diff --git a/src/cmake/thirdparty/FindROCTracer.cmake b/src/cmake/thirdparty/FindROCTracer.cmake new file mode 100644 index 0000000..3fba58f --- /dev/null +++ b/src/cmake/thirdparty/FindROCTracer.cmake @@ -0,0 +1,60 @@ +# Try to find roctracer library +# Once done, this will define +# +# ROCTRACER_FOUND - system has roctracer +# ROCTRACER_INCLUDE_DIR - roctracer include directory +# ROCTRACER_LIBRARIES - roctracer library +# +# AMD does not provided an exported roctracer target. See issue +# https://rzlc.llnl.gov/jira/browse/ELCAP-578 + +include(FindPackageHandleStandardArgs) + +find_path( + ROCTRACER_INCLUDE_DIR + NAMES roctracer/roctracer.h + PATHS ${HIP_ROOT_DIR} + PATH_SUFFIXES include + NO_DEFAULT_PATH + NO_CMAKE_ENVIRONMENT_PATH + NO_CMAKE_PATH + NO_SYSTEM_ENVIRONMENT_PATH + NO_CMAKE_SYSTEM_PATH +) + +find_library( + ROCTRACER_LIBRARY + NAMES roctracer64 + PATHS ${HIP_ROOT_DIR} + PATH_SUFFIXES lib lib64 + NO_DEFAULT_PATH + NO_CMAKE_ENVIRONMENT_PATH + NO_CMAKE_PATH + NO_SYSTEM_ENVIRONMENT_PATH + NO_CMAKE_SYSTEM_PATH +) + + +find_library( + ROCTX_LIBRARY + NAMES roctx64 + PATHS ${HIP_ROOT_DIR} + PATH_SUFFIXES lib lib64 + NO_DEFAULT_PATH + NO_CMAKE_ENVIRONMENT_PATH + NO_CMAKE_PATH + NO_SYSTEM_ENVIRONMENT_PATH + NO_CMAKE_SYSTEM_PATH +) + +if(ROCTRACER_LIBRARY AND ROCTX_LIBRARY) + set(ROCTRACER_LIBRARIES ${ROCTRACER_LIBRARY} ${ROCTX_LIBRARY}) + set(ROCTRACER_FOUND TRUE) + message(STATUS "Found ROCtracer: ${ROCTRACER_LIBRARIES}") + mark_as_advanced(ROCTRACER_LIBRARIES ROCTRACER_LIBRARY ROCTX_LIBRARY ROCTRACER_INCLUDE_DIR) +endif() + +find_package_handle_standard_args( + ROCTracer + DEFAULT_MSG + ROCTRACER_LIBRARIES ROCTRACER_INCLUDE_DIR) diff --git a/src/teton/CMakeLists.txt b/src/teton/CMakeLists.txt index 08e0683..7106e66 100644 --- a/src/teton/CMakeLists.txt +++ b/src/teton/CMakeLists.txt @@ -28,7 +28,6 @@ if( ENABLE_CUDA ) endif() if( ENABLE_OPENMP ) - target_link_libraries( teton PUBLIC OpenMP::OpenMP_CXX) target_link_libraries( teton PUBLIC OpenMP::OpenMP_Fortran) endif() @@ -51,10 +50,14 @@ endif() # Caliper installs its Fortran modules down in include/caliper/fortran, # so we must add that additional include path if (ENABLE_CALIPER) - target_include_directories( teton PRIVATE include - ${CALIPER_INCLUDE_DIR} - ${CALIPER_INCLUDE_DIR}/caliper/fortran - ) + if (ENABLE_FIND_PACKAGE_CONFIG_MODE) + target_link_libraries( teton PUBLIC caliper ) + else() + target_include_directories( teton PRIVATE include + ${CALIPER_INCLUDE_DIR} + ${CALIPER_INCLUDE_DIR}/caliper/fortran + ) + endif() endif() if (ENABLE_BLUEPRINT_INTERFACE) @@ -62,15 +65,13 @@ if (ENABLE_BLUEPRINT_INTERFACE) target_sources( teton INTERFACE include/TetonBlueprint.hh ) target_sources( teton INTERFACE include/TetonSurfaceTallies.hh ) target_sources( teton INTERFACE include/TetonConduitInterface.hh ) + target_sources( teton INTERFACE include/TetonNDAccessor.hh ) + target_sources( teton INTERFACE include/TetonTesting.hh ) + target_sources( teton INTERFACE include/TetonUtilities.hh ) endif() -# Note: -# The MFEM conduit data collection class expects to have an include -# path to the internal 'include/conduit' directory, so add that -# additional include path until MFEM can fix that behavior. target_include_directories( teton PRIVATE include ${CONDUIT_INCLUDE_DIR} - ${CONDUIT_INCLUDE_DIR}/conduit ${CONDUIT_FORTRAN_MODULES_DIR}) # Note: @@ -80,9 +81,16 @@ if (NOT ENABLE_MINIAPP_BUILD) target_include_directories( teton PRIVATE include ${PHYSICSUTILS_INCLUDE_DIR} ${PHYSICSUTILS_INCLUDE_DIR}/PhysicsUtils - ${PHYSICSUTILS_INCLUDE_DIR}/fortran ${SILO_INCLUDE_DIR} ) + + # There was short lived bug in physicsutils when it was putting the fortran modules into 'fortran'. + IF(EXISTS ${PHYSICSUTILS_INCLUDE_DIR}/fortran) + target_include_directories( teton PRIVATE include + ${PHYSICSUTILS_INCLUDE_DIR}/fortran + ) + endif() + endif() configure_file (include/TetonVersion.hh.in include/TetonVersion.hh ) @@ -124,11 +132,6 @@ if(ENABLE_TESTS) add_executable( test_driver driver/test_driver.cc ) - if("${CMAKE_Fortran_COMPILER_ID}" STREQUAL "LLVMFlang") - set_target_properties( test_driver PROPERTIES LINK_FLAGS -fno-fortran-main) - set(OpenMP_Fortran_FLAGS "${OpenMP_Fortran_FLAGS} -fno-fortran-main") - endif() - if (ENABLE_FIND_MPI) target_link_libraries( test_driver PUBLIC MPI::MPI_Fortran MPI::MPI_CXX ) endif() @@ -172,7 +175,7 @@ if(ENABLE_TESTS) endif() if( ENABLE_OPENMP ) - target_link_libraries( test_driver PUBLIC OpenMP::OpenMP_Fortran OpenMP::OpenMP_CXX) + target_link_libraries( test_driver PUBLIC OpenMP::OpenMP_Fortran) # The target_link_options command has trouble with adding flags if there is # a space, it keeps putting quotes around the whole line. Use the older @@ -190,10 +193,9 @@ if(ENABLE_TESTS) # This is only enabled if one of the TPLs require HDF5, so it can be added to the link line. # Our code does not have any direct dependencies on HDF5. - if (ENABLE_HDF5) + if (ENABLE_HDF5 OR CONDUIT_REQUIRES_HDF5) target_link_libraries( test_driver PUBLIC ${HDF5_LIBRARIES} - ${Z_LIBRARIES} dl ) endif() @@ -204,22 +206,51 @@ if(ENABLE_TESTS) if (ENABLE_UMPIRE) target_include_directories( test_driver PUBLIC ${UMPIRE_INCLUDE_DIR} - ${UMPIRE_FORTRAN_MODULES_DIR}) - target_link_libraries( test_driver PUBLIC ${UMPIRE_LIBRARIES} ) - if (ENABLE_CAMP) + ${UMPIRE_FORTRAN_MODULES_DIR}) + target_link_libraries( test_driver PUBLIC ${UMPIRE_LIBRARIES}) + + if (FMT_ROOT) + target_include_directories( test_driver PUBLIC ${FMT_INCLUDE_DIR}) + target_link_libraries( test_driver PUBLIC ${FMT_LIBRARIES} ) + endif() + + if (CAMP_ROOT) target_include_directories( test_driver PUBLIC ${CAMP_INCLUDE_DIR}) target_link_libraries( test_driver PUBLIC ${CAMP_LIBRARIES} ) endif() endif() if (ENABLE_CALIPER) - target_include_directories( test_driver PUBLIC ${CALIPER_INCLUDE_DIR} ${ADIAK_INCLUDE_DIR}) - target_link_libraries( test_driver PUBLIC ${CALIPER_LIBRARIES} ${ADIAK_LIBRARIES} rt) + if (ENABLE_FIND_PACKAGE_CONFIG_MODE) + target_link_libraries( test_driver PUBLIC caliper ) + target_link_libraries( test_driver PUBLIC adiak::adiak ) + else() + target_include_directories( test_driver PUBLIC ${CALIPER_INCLUDE_DIR} ${ADIAK_INCLUDE_DIR}) + target_link_libraries( test_driver PUBLIC ${CALIPER_LIBRARIES} ${ADIAK_LIBRARIES} rt) + endif() + + if (ENABLE_HIP) + target_link_libraries( test_driver PUBLIC ${ROCTRACER_LIBRARIES}) + endif() + endif() - if (ENABLE_MFEM) - target_include_directories( test_driver PUBLIC ${MFEM_INCLUDE_DIR} ${HYPRE_INCLUDE_DIR}) - target_link_libraries( test_driver PUBLIC ${MFEM_LIBRARIES} ${HYPRE_LIBRARIES} ${METIS_LIBRARIES}) + if (CONDUIT_REQUIRES_PARMETIS) + target_link_libraries( test_driver PUBLIC ${PARMETIS_LIBRARIES} ${METIS_LIBRARIES}) + endif() + + if (Z_ROOT) + target_link_libraries( test_driver PUBLIC + ${Z_LIBRARIES} + ) + endif() + + if (TETON_TEST_DRIVER_LIBRARIES) + target_link_libraries( test_driver PUBLIC ${TETON_TEST_DRIVER_LIBRARIES}) + endif() + + if (TETON_TEST_DRIVER_LIBRARIES) + target_link_libraries( test_driver PUBLIC ${TETON_TEST_DRIVER_LIBRARIES}) endif() target_link_options(test_driver PUBLIC "LINKER:${TETON_LINK_OPTIONS}") @@ -227,34 +258,4 @@ if(ENABLE_TESTS) install( TARGETS test_driver RUNTIME DESTINATION bin ) -# Executable for creating unstructured box mesh. - if (ENABLE_MFEM) - add_executable( makeUnstructuredBox - driver/makeUnstructuredBox.cc ) - if (ENABLE_FIND_MPI) - target_link_libraries( makeUnstructuredBox PUBLIC MPI::MPI_CXX ) - endif() - if (ENABLE_HDF5) - target_link_libraries( makeUnstructuredBox PUBLIC - ${HDF5_LIBRARIES} - ${Z_LIBRARIES} - dl - ) - endif() - -# Note: -# The MFEM conduit data collection class expects to have an include -# path to the internal 'include/conduit' directory, so add that -# additional include path until MFEM can fix that behavior. - target_include_directories( makeUnstructuredBox PUBLIC ${CONDUIT_INCLUDE_DIR} - ${CONDUIT_INCLUDE_DIR}/conduit - ${MFEM_INCLUDE_DIR} - ${HYPRE_INCLUDE_DIR} - ) - target_link_options( makeUnstructuredBox PUBLIC "LINKER:${TETON_LINK_OPTIONS}") - target_link_libraries( makeUnstructuredBox PUBLIC ${MFEM_LIBRARIES} ${HYPRE_LIBRARIES} ${METIS_LIBRARIES}) - install( TARGETS makeUnstructuredBox - RUNTIME DESTINATION bin ) - endif() - endif() diff --git a/src/teton/aux/ConstructPhaseSpaceSets.F90 b/src/teton/aux/ConstructPhaseSpaceSets.F90 index a1c24a1..d36e84e 100644 --- a/src/teton/aux/ConstructPhaseSpaceSets.F90 +++ b/src/teton/aux/ConstructPhaseSpaceSets.F90 @@ -1,3 +1,4 @@ +#include "macros.h" !*********************************************************************** ! Last Update: 07/2017, TSH * ! * @@ -15,7 +16,8 @@ subroutine ConstructPhaseSpaceSets(fromRestart) & use, intrinsic :: ISO_C_BINDING use, intrinsic:: iso_fortran_env, only: stdout=>output_unit - use cmake_defines_mod, only: omp_device_num_processors + use cmake_defines_mod, only: omp_device_num_processors, & + min_groupset_size, max_num_hyperdomains use kind_mod use constant_mod @@ -49,13 +51,15 @@ subroutine ConstructPhaseSpaceSets(fromRestart) & type(CommSet), pointer :: CSet integer :: nSets - integer :: nBalancedSets integer :: nSetsMax + integer :: nSetsMaxUser integer :: nGTASets integer :: nAngleSets integer :: nGroupSets + integer :: nGroupSetsMax integer :: nCommSets integer :: nZoneSets + integer :: nHyperDomains integer :: setID integer :: QuadID integer :: groupSetID @@ -82,9 +86,7 @@ subroutine ConstructPhaseSpaceSets(fromRestart) & integer :: nAngles integer :: nReflecting - integer :: NEW_COMM_GROUP - integer :: new_group - integer :: new_comm + integer :: new_comm integer :: ierror integer :: reflID @@ -112,21 +114,32 @@ subroutine ConstructPhaseSpaceSets(fromRestart) & ! Construct Set Data - nReflecting = getNumberOfReflecting(RadBoundary) - nSets = getNumberOfSets(Quad) - nGroupSets = 1 - nZoneSets = getNumberOfZoneSets(Quad) + nReflecting = getNumberOfReflecting(RadBoundary) + nSetsMaxUser = getNumberOfSets(Quad) + nGroupSets = 1 + nZoneSets = getNumberOfZoneSets(Quad) - verbose = nSets > 1 .AND. Options%isRankVerbose() > 0 + verbose = nSetsMaxUser > 1 .AND. Options%isRankVerbose() > 0 ! Decompose angle sets (this finds the maximum number of angle sets ! allowed respecting angular dependencies). Decomposition in angle ! minimizes run time and gives the best thread scaling so we do this first. + call decomposeAngleSets +! Determine the number of "hyper-domains" to increase parallelism +! in the high-order sweeps and "new" GTA. We need to be +! careful for very small zone counts so we estimate a +! maximum number based on the number of zones. We also limit +! the maximum # based on performance observations. This value is set +! in cmake/GetGPUInfo to 12 currently. This number +! could change in the future. PFN 03/29/2024 + + nHypDomMax = int( sqrt( real(Size%nzones) )/2 ) + nHypDomMax = min( nHypDomMax, max_num_hyperdomains ) + nHypDomMax = max( nHypDomMax, 1 ) + ! Determine maximum number of phase-space sets problem will support. -! Until we add "zone sets", the maximum number of sets we can use for the -! sweep is the number of angle sets multiplied by the number of groups QuadSet => getQuadrature(Quad, 1) nAngleSets = QuadSet% maxAngleSets @@ -137,21 +150,27 @@ subroutine ConstructPhaseSpaceSets(fromRestart) & print "(A, I0, A, I0, A, I0, A)", "Teton: Quadrature set ", QuadID, " supports sweeping up to ", QuadSet%maxAngleSets, " sweep directions concurrently and has ", QuadSet%Groups, " energy group bins." endif - allocate( setGroups(nSets) ) - allocate( setAngles(nSets) ) - allocate( setGroup0(nSets) ) - allocate( setGroupID(nSets) ) - allocate( setAngle0(nSets) ) - ! Reduce nSets if it exceeds what problem will support. - if (nSets > nSetsMax) then + if (nSetsMaxUser > nSetsMax) then if (verbose) then - print "(A, I0, A, I0, A)", "Teton: This problem lacks enough parallelism to create ", nSets, " phase-space sets. The maximum available (", nSetsMax, ") will be created." + print "(A, I0, A, I0, A)", "Teton: This problem lacks enough parallelism to create ", nSetsMaxUser, " phase-space sets. The maximum available (up to ", nSetsMax, ") will be created." endif nSets = nSetsMax + else + if (verbose) then + print "(A, I0, A)", "Teton: Will create up to ", nSetsMaxUser, " phase-space sets (limit requested by user)." + endif + + nSets = nSetsMaxUser endif + allocate( setGroups(nSets) ) + allocate( setAngles(nSets) ) + allocate( setGroup0(nSets) ) + allocate( setGroupID(nSets) ) + allocate( setAngle0(nSets) ) + totalSets = 0 ! Create only one phase-space set @@ -191,23 +210,21 @@ subroutine ConstructPhaseSpaceSets(fromRestart) & ! If the number of sets desired is greater than the number of angle sets, ! decompose further in energy and distribute the work as balanced as possible -! - - ! Keep doubling the sets until we get the closest we can to nSets - ! without exceeding it. The assumption here is that each group set - ! contains the same number of groups. This should be relaxed in the - ! future. - - nBalancedSets = nAngleSets - nGroupSets = 1 - do while ( (nBalancedSets * 2 <= nSets) .AND. (nGroupSets *2 <= QuadSet%maxGroupSets)) - nGroupSets = nGroupSets * 2 - nBalancedSets = nBalancedSets * 2 - enddo +! Now that we have spatial parallelism for the transport sweeps it is +! advantageous to have at least 'min_groupset_size' groups per groups set. This +! number is set in cmake/GetGPUInfo.cmake and is currently '16' for our +! supported GPU platforms. + + if ( QuadSet% Groups <= min_groupset_size ) then + nGroupSets = 1 + else + nGroupSets = int( QuadSet% Groups/ min_groupset_size ) + nGroupSetsMax = int ( nSets/nAngleSets ) + nGroupSets = min( nGroupSets, nGroupSetsMax ) + endif -! Here nSets = nAngleSets*nGroupSets - nSets = nBalancedSets + nSets = nAngleSets*nGroupSets ! The following code block handles the case where the groups sets are ! unbalanced (i.e. not all group sets contain the same number of groups). @@ -316,12 +333,12 @@ subroutine ConstructPhaseSpaceSets(fromRestart) & ! Remove after support is added for phase-space sets with different numbers of ! angles and groups and we have tests exercising this in the suite. if ( setID > 1) then - if ( setGroups(setID) /= setGroups(setID-1) ) then - call f90fatal("Teton: Unable to evenly distribute energy groups across phase-space sets. This is currently a requirement. Contact the Teton team for tips on adjusting your energy groups to allow even distribution over the phase-space sets.") - endif if ( setAngles(setID) /= setAngles(setID-1) ) then call f90fatal("Teton: Unable to evenly distribute angles across phase-space sets. This is currently a requirement. Contact the Teton team for tips on adjusting your angle setup to allow even distribution over the phase-space sets.") endif + if ( setGroups(setID) /= setGroups(setID-1) ) then + call f90fatal("Teton: Unable to evenly distribute groups across phase-space sets. This is currently a requirement. Contact the Teton team for tips on adjusting your angle setup to allow even distribution over the phase-space sets.") + endif endif enddo @@ -340,9 +357,44 @@ subroutine ConstructPhaseSpaceSets(fromRestart) & call constructSetPointers(Quad, nSets, nAngleSets, nGroupSets, & nCommSets, nGTASets) +! Note that the use of "hyper-domains" will be deprecated once +! we support sub-meshes per MPI rank. Also, hyper-domains are +! not used on the CPU. PFN 02/14/2023 + + if (Size% useGPU) then +! High-order sweep +! Set number of hyper-domains automatically. (default) + if (Options% getSweepNumHyperDomains() == 0) then + nHypDomMin = int( min(omp_device_num_processors,nSetsMaxUser)/max(nSets,1) ) + nHypDomMin = max( nHypDomMin, 1 ) + + Quad% nHyperDomains(1) = min(nHypDomMax, nHypDomMin) + else if (Options% getSweepNumHyperDomains() >= 1) then +! Specified from user + Quad% nHyperDomains(1) = Options% getSweepNumHyperDomains() + endif + +! GTA Sweep +! Set number of hyper-domains automatically. (default) + if (Options% getGTANumHyperDomains() == 0) then + nHypDomMin = int( min(omp_device_num_processors,nSetsMaxUser)/max(nGTASets,1) ) + nHypDomMin = max( nHypDomMin, 1 ) + + Quad% nHyperDomains(2) = min(nHypDomMax, nHypDomMin) + else if (Options% getGTANumHyperDomains() >= 1) then +! Specified from user + Quad% nHyperDomains(2) = Options% getGTANumHyperDomains() + endif + + else + Quad% nHyperDomains(1) = 1 + Quad% nHyperDomains(2) = 1 + endif + ! Construct the phase-space sets - GTASet = .FALSE. + GTASet = .FALSE. + nHyperDomains = Quad% nHyperDomains(1) SetLoop: do setID=1,nSets @@ -376,9 +428,9 @@ subroutine ConstructPhaseSpaceSets(fromRestart) & Quad% commID(setID) = commSetID ! Construct the set - call Set%construct(setID, groupSetID, angleSetID, QuadID, & + call Set%construct(setID, groupSetID, angleSetID, QuadID, & Groups, NumAngles, g0, angle0, nZones, nCorner, & - QuadSet, GTASet, fromRestart) + nHyperDomains, QuadSet, GTASet, fromRestart) ! Construct group sets, but only for the first angle set if (angle0 == 0) then @@ -397,13 +449,7 @@ subroutine ConstructPhaseSpaceSets(fromRestart) & if (groupSetID == 1 .or. Size% ndim == 1) then ! duplicate the existing communicator - call MPI_COMM_DUP(MY_COMM_GROUP, NEW_COMM_GROUP, ierror) - -! extract the original group handle - call MPI_COMM_GROUP(NEW_COMM_GROUP, new_group, ierror) - -! create new communicator - call MPI_COMM_CREATE(NEW_COMM_GROUP, new_group, new_comm, ierror) + call MPI_COMM_DUP(MY_COMM_GROUP, new_comm, ierror) if (ierror /= MPI_SUCCESS) then call f90fatal("MPI COMM Create Failed") @@ -468,10 +514,11 @@ subroutine ConstructPhaseSpaceSets(fromRestart) & if (Size% ndim > 1) then - GTASet = .TRUE. - angle0 = 0 - angleSetID = nAngleSets - groupSetID = 1 + GTASet = .TRUE. + nHyperDomains = Quad% nHyperDomains(2) + angle0 = 0 + angleSetID = nAngleSets + groupSetID = 1 if (verbose) then print "(A)", "Teton: Angle and energy group distribution breakdown (grey acceleration sweep):" @@ -490,21 +537,21 @@ subroutine ConstructPhaseSpaceSets(fromRestart) & Quad% angleID(nSets+setID) = angleSetID Quad% commID(nSets+setID) = commSetID - QuadID = 2 - Groups = 1 - NumAngles = QuadSet% angleSetSize(setID) - g0 = 0 - nZones = Size% nZones - nCorner = Size% ncornr + QuadID = 2 + Groups = 1 + NumAngles = QuadSet% angleSetSize(setID) + g0 = 0 + nZones = Size% nZones + nCorner = Size% ncornr if (verbose) then write(stdout,100) setID,QuadID,NumAngles,angle0+1,angle0+NumAngles,Groups,g0+1,g0+Groups endif ! construct the GTA set - call Set%construct(setID, groupSetID, angleSetID, QuadID, & - Groups, NumAngles, g0, angle0, nZones, & - nCorner, QuadSet, GTASet, fromRestart) + call Set%construct(setID, groupSetID, angleSetID, QuadID, & + Groups, NumAngles, g0, angle0, nZones, nCorner, & + nHyperDomains, QuadSet, GTASet, fromRestart) ! construct an angle set for every GTA set call construct(ASet, NumAngles, angle0, nZones, & @@ -513,13 +560,7 @@ subroutine ConstructPhaseSpaceSets(fromRestart) & ! construct a communication set for every GTA set ! duplicate the existing communicator - call MPI_COMM_DUP(MY_COMM_GROUP, NEW_COMM_GROUP, ierror) - -! extract the original group handle - call MPI_COMM_GROUP(NEW_COMM_GROUP, new_group, ierror) - -! create new communicator - call MPI_COMM_CREATE(NEW_COMM_GROUP, new_group, new_comm, ierror) + call MPI_COMM_DUP(MY_COMM_GROUP, new_comm, ierror) cSet1 = nSets + setID cSet2 = nSets + setID @@ -530,6 +571,12 @@ subroutine ConstructPhaseSpaceSets(fromRestart) & enddo + if (verbose) then + write(stdout, 300) + write(stdout, 200) Size% myRankInGroup,Quad% nHyperDomains(1),Quad% nHyperDomains(2) + write(stdout, 300) + endif + ! Construct and incident test on shared boundaries call initFindExit(nAngleSets, nGTASets) @@ -538,6 +585,9 @@ subroutine ConstructPhaseSpaceSets(fromRestart) & 100 format(" Phase-Angle Set ID =",i3,2x," | Quadrature Set ID =",i2,2x, " | # Angles = ",i3," | Angle IDs =",i3," -",i3, " | # Groups =",i3," | Group IDs = ",i3," -",i3) + 200 format( "hyper-domains for rank = ",i4,": high-order = ",i4,", GTA = ",i4) + 300 format(" ") + ! Grey Acceleration Module ! Moving this constructor here because (in the near future) ! we will taylor an acceleration method for each set @@ -560,23 +610,6 @@ subroutine ConstructPhaseSpaceSets(fromRestart) & call construct(ZSet, nZoneSets) -! Determine the number of "hyper-domains" to increase parallelism -! for GTA. We need to be careful for very small zone counts so we -! estimate a minimum number based on the number of zones. - -! Note that the use of "hyper-domains" will be deprecated once -! we support sub-meshes per MPI rank. PFN 09/22/2022 - - nHypDomMin = int( 2*sqrt( real(Size%nzones) ) - 1 ) - nHypDomMin = max( nHypDomMin, 1 ) - nHypDomMax = int( omp_device_num_processors/max(nGTASets,1) ) - nHypDomMax = min( nHypDomMax, 20 ) - - if (Size% useGPU) then - Quad% nHyperDomains = min(nHypDomMax, nHypDomMin) - else - Quad% nHyperDomains = min(nSets, nHypDomMin) - endif ! Release memory diff --git a/src/teton/aux/DestructMeshData.F90 b/src/teton/aux/DestructMeshData.F90 index de8df7f..27cf992 100644 --- a/src/teton/aux/DestructMeshData.F90 +++ b/src/teton/aux/DestructMeshData.F90 @@ -35,6 +35,7 @@ subroutine DestructMeshData(nonLTE) BIND(C,NAME="teton_destructmeshdata") use CommSet_mod use SetData_mod use ZoneSet_mod + use DataStore_mod implicit none @@ -74,6 +75,7 @@ subroutine DestructMeshData(nonLTE) BIND(C,NAME="teton_destructmeshdata") call destruct(Geom) call destruct(Rad) + deallocate(Rad) call Mat%destruct(nonLTE) ! Deallocate Phase-Spaces Set data @@ -126,6 +128,7 @@ subroutine DestructMeshData(nonLTE) BIND(C,NAME="teton_destructmeshdata") ! Zone Sets call destruct(ZSet) + deallocate(ZSet) ! Communication Sets @@ -170,6 +173,8 @@ subroutine DestructMeshData(nonLTE) BIND(C,NAME="teton_destructmeshdata") deallocate( Size ) + call theDatastore%root%reset() + return end subroutine DestructMeshData diff --git a/src/teton/aux/SurfaceEdit.F90 b/src/teton/aux/SurfaceEdit.F90 index 3ad1eee..a267f9a 100644 --- a/src/teton/aux/SurfaceEdit.F90 +++ b/src/teton/aux/SurfaceEdit.F90 @@ -129,10 +129,10 @@ subroutine SurfaceEdit(nCornerFaces, labFrame, & ! Check some inputs: if (numGroups /= 1) then - tetonAssert(Size% ngr == numGroups, "numGroups in teton_surfaceedit must be either 1 or the # of Teton groups") + TETON_ASSERT(Size% ngr == numGroups, "numGroups in teton_surfaceedit must be either 1 or the # of Teton groups") endif - tetonAssert(numTimeBins > 0, "Number of time bins must be positive.") - tetonAssert(minval(cornerList) > 0, "corner indices must be greater than 0") + TETON_ASSERT(numTimeBins > 0, "Number of time bins must be positive.") + TETON_ASSERT(minval(cornerList) > 0, "corner indices must be greater than 0") ! Constants @@ -141,8 +141,10 @@ subroutine SurfaceEdit(nCornerFaces, labFrame, & dtrad = getRadTimeStep(DtControls) geometryFactorTimesDt = getGeometryFactor(Size)*dtrad - allocate( cFaceList(nCornerFaces) ) - cFaceList(:) = -1 + if (nCornerFaces > 0) then + allocate( cFaceList(nCornerFaces) ) + cFaceList(:) = -1 + endif ! Get the starting time @@ -152,7 +154,7 @@ subroutine SurfaceEdit(nCornerFaces, labFrame, & ! For the input corner list and opposite corners, find the correct ! corner-face index - if (timeShift) then + if (timeShift .and. nCornerFaces > 0) then allocate(deltasFromCenter(Size% nDim, nCornerFaces)) allocate(sqDistsFromCenter(nCornerFaces)) endif @@ -181,7 +183,7 @@ subroutine SurfaceEdit(nCornerFaces, labFrame, & endif enddo CFaceLoop - tetonAssert(cFaceList(iCornerFace) > 0, "Could not find corner face from corner "//char(c)//" and zface"//char(zface)) + TETON_ASSERT(cFaceList(iCornerFace) > 0, "Could not find corner face from corner "//char(c)//" and zface"//char(zface)) enddo ! Initialize temporary arrays: @@ -220,7 +222,7 @@ subroutine SurfaceEdit(nCornerFaces, labFrame, & endif if (numAngleBins /= 1) then - tetonAssert(ASet% nPolarAngles == numAngleBins, "numAngleBins must either be 1 or # of Teton polar angles") + TETON_ASSERT(ASet% nPolarAngles == numAngleBins, "numAngleBins must either be 1 or # of Teton polar angles") else polarAngle = 1 endif @@ -306,7 +308,7 @@ subroutine SurfaceEdit(nCornerFaces, labFrame, & enddo timeBinDistribution(timeBinFinal) = (shiftedRadTimes(2)-timeBinBoundaries(timeBinFinal))/dtrad endif - tetonAssert(abs(sum(timeBinDistribution) - one) < 1.e-14_adqt, "timeBinDistribution must sum to one.") + TETON_ASSERT(abs(sum(timeBinDistribution) - one) < 1.e-12_adqt, "timeBinDistribution must sum to one.") ! offsets for timeBin and timeBinFinal: timeBin0 = (timeBin-1)*numGroupAngleBins+polarAngle0 @@ -331,7 +333,7 @@ subroutine SurfaceEdit(nCornerFaces, labFrame, & elseif (Size% igeom == geometry_sphere) then factor = weight*geometryFactorTimesDt*lambdaD3*Geom% Radius(c)*Geom% Radius(c) else - tetonAssert(.false., "Unknown geometry type in Teton's SurfaceEdit.F90") + TETON_ASSERT(.false., "Unknown geometry type in Teton's SurfaceEdit.F90") endif if (angdota > zero) then @@ -362,8 +364,8 @@ subroutine SurfaceEdit(nCornerFaces, labFrame, & ! Note that we only ever reach this part of the code if ! computeIncident = true - tetonAssert(computeIncident, "Should not try to compute incident power in SurfaceEdit if computeIncident is .false.") - tetonAssert(cOpp > 0, "cOpp must be a positive index") + TETON_ASSERT(computeIncident, "Should not try to compute incident power in SurfaceEdit if computeIncident is .false.") + TETON_ASSERT(cOpp > 0, "cOpp must be a positive index") timeBinPlusNBins = timeBin + numTimeBins timeBinFinalPlusNBins = timeBinFinal + numTimeBins @@ -421,7 +423,9 @@ subroutine SurfaceEdit(nCornerFaces, labFrame, & endif - deallocate(cFaceList) + if (nCornerFaces > 0) then + deallocate(cFaceList) + endif if (calcErrorMetricsConfirmed) then @@ -447,7 +451,7 @@ subroutine SurfaceEdit(nCornerFaces, labFrame, & deallocate( tempErrEstSrcSize ) endif - if (timeShift) then + if (timeShift .and. nCornerFaces > 0) then deallocate( deltasFromCenter ) deallocate( sqDistsFromCenter ) endif diff --git a/src/teton/aux/getAngleBins.F90 b/src/teton/aux/getAngleBins.F90 index 47ad5f2..170dbf3 100644 --- a/src/teton/aux/getAngleBins.F90 +++ b/src/teton/aux/getAngleBins.F90 @@ -43,7 +43,7 @@ subroutine getAngleBins(numAngleBins, & ASet => getAngleSetData(Quad, angleSetID) - tetonAssert(ASet% nPolarAngles == numAngleBins, "numAngleBins must be # of Teton polar angles in teton_getanglebins") + TETON_ASSERT(ASet% nPolarAngles == numAngleBins, "numAngleBins must be # of Teton polar angles in teton_getanglebins") NumAngles = ASet% NumAngles @@ -55,7 +55,7 @@ subroutine getAngleBins(numAngleBins, & enddo - tetonAssert(abs(sum(angleBinBoundaries) - one/Size%wtiso) < 1.e-14_adqt, "Error in getAngleBins.F90: sum(angle weights) != 1") + TETON_ASSERT(abs(sum(angleBinBoundaries)*Size%wtiso - one) < 1.e-12_adqt, "Error in getAngleBins.F90: sum(angle weights) != 1") angleBinBoundaries(:) = angleBinBoundaries(:)*Size%wtiso*2 angleBinBoundaries(1) = -one @@ -66,7 +66,7 @@ subroutine getAngleBins(numAngleBins, & enddo ! Check that the final weight is close enough to one: - tetonAssert((angleBinBoundaries(numAngleBins+1) - one) < 1.e-14_adqt, "Error in getAngleBins.F90: Last bin boundary != 1") + TETON_ASSERT((angleBinBoundaries(numAngleBins+1) - one) < 1.e-14_adqt, "Error in getAngleBins.F90: Last bin boundary != 1") ! Then set it to one explicitly: angleBinBoundaries(numAngleBins+1) = one diff --git a/src/teton/aux/getDtMessage.F90 b/src/teton/aux/getDtMessage.F90 index b8b23dc..43e6ee3 100644 --- a/src/teton/aux/getDtMessage.F90 +++ b/src/teton/aux/getDtMessage.F90 @@ -29,8 +29,6 @@ subroutine getDtMessage(dtMessage) & dtString = trim(DtControls% dtMessage)//C_NULL_CHAR dtMessage = C_LOC( dtString ) - return end subroutine getDtMessage - diff --git a/src/teton/aux/getEdits.F90 b/src/teton/aux/getEdits.F90 index 17e34b0..7224ad9 100644 --- a/src/teton/aux/getEdits.F90 +++ b/src/teton/aux/getEdits.F90 @@ -73,7 +73,7 @@ subroutine getEdits(noutrt, ninrt, ngdart, nNLIters, & real(C_DOUBLE) :: timerad real(C_DOUBLE) :: outerTempRelTol, outerEDRelTol, greyRelTol, incidentFluxRelTol, innerNLRelTol - character(len=34), parameter :: Cformat = "(1X,A6,i8,A12,F18.10,A10,1pe18.10)" + character(len=38), parameter :: Cformat = "(1X,A10,i8,A12,F18.10,A10,1pe18.10)" character(len=25), parameter :: Iformat = "(1X,A13,i6,A15,i6,A15,i6)" character(len=18), parameter :: Jformat = "(1X,A20,i6,A21,i6)" character(len=30), parameter :: Tformat = "(1X,A7,1X,F18.10,A9,i7,A12,i5)" @@ -134,7 +134,7 @@ subroutine getEdits(noutrt, ninrt, ngdart, nNLIters, & if ( Options%isRankVerbose() > 0 ) then print *, "" print *, ">>>>>>>>>>>>>>> End Radiation Step <<<<<<<<<<<<<<<" - print Cformat, "CYCLE ", ncycle," timerad = ", timerad," dtrad = ", dtused + print Cformat, "TIME STEP ", ncycle," timerad = ", timerad," dtrad = ", dtused print *, "" print Iformat,"TempIters = ", noutrt, " FluxIters = ", ninrt, " GTASweeps = ",ngdart print Jformat,"AveNonLinearIters = ", nNLIters, " MaxNonLinearIters = ",maxNLIters diff --git a/src/teton/aux/getRunStats.F90 b/src/teton/aux/getRunStats.F90 index 841133b..10197e1 100644 --- a/src/teton/aux/getRunStats.F90 +++ b/src/teton/aux/getRunStats.F90 @@ -77,6 +77,7 @@ subroutine getRunStats(MatCoupTimeTotal, SweepTimeTotal, & integer :: numOmpCPUThreads integer :: nZoneSets integer :: nSets + integer :: nHyperDomains character(len=26), parameter :: Tformat = "(1X,A16,1X,F14.8,5X,F14.8)" character(len=14), parameter :: Sformat = "(A21,1pe18.11)" @@ -114,9 +115,10 @@ subroutine getRunStats(MatCoupTimeTotal, SweepTimeTotal, & type(IterControl) , pointer :: intensityControl => NULL() ! Threading information - numOmpCPUThreads = Options%getNumOmpMaxThreads() - nZoneSets = getNumberOfZoneSets(Quad) - nSets = getNumberOfSets(Quad) + numOmpCPUThreads = Options%getNumOmpMaxThreads() + nZoneSets = getNumberOfZoneSets(Quad) + nSets = getNumberOfSets(Quad) + nHyperDomains = getNumberOfHyperDomains(Quad,1) ! Iteration Controls @@ -246,18 +248,17 @@ subroutine getRunStats(MatCoupTimeTotal, SweepTimeTotal, & ncycle = getRadCycle(DtControls) - print '(A24,i5,A24)', " >>>>>>>>> TETON Cycle ",ncycle," Statistics <<<<<<<<<" #if defined(TETON_ENABLE_OPENMP) print *,"***************** Threading ****************" - print '(A,i5)', " # threads per rank, cpu = ", numOmpCPUThreads + print '(A,i5)', " # threads per rank, cpu = ", numOmpCPUThreads #if defined(TETON_ENABLE_OPENMP_OFFLOAD) if (Size%useGPU) then ! Number of thread teams used for kernels iterating over zone sets. - print '(A,i5)', " # thread teams over zone sets = ", nZoneSets + print '(A,i5)', " # thread teams over zone sets = ", nZoneSets ! Number of thread teams used for kernels iterating over phase-angle ! sets. May comment this line out later, as Paul intends to migrate all ! kernels to be over zone sets. - print '(A,i5)', " # thread teams over phase-angle sets = ", nSets + print '(A,i5)', " # thread teams over sweep sets = ", nSets*nHyperDomains endif print *," " #endif diff --git a/src/teton/aux/getZonalPsi.F90 b/src/teton/aux/getZonalPsi.F90 index fc06781..4798f68 100644 --- a/src/teton/aux/getZonalPsi.F90 +++ b/src/teton/aux/getZonalPsi.F90 @@ -52,7 +52,7 @@ subroutine getZonalPsi(numAngles, Psi) & ASet => getAngleSetData(Quad,angleSetID) numAnglesInternal = numAnglesInternal + ASet%NumAngles enddo - tetonAssert(numAngles == numAnglesInternal, "numAngles given to getZonalPsi does not match Teton's internal total number of angles") + TETON_ASSERT(numAngles == numAnglesInternal, "numAngles given to getZonalPsi does not match Teton's internal total number of angles") numSets = getNumberOfSets(Quad) do setID = 1,numSets diff --git a/src/teton/aux/publishEdits.F90 b/src/teton/aux/publishEdits.F90 index 639448e..1c2ca06 100644 --- a/src/teton/aux/publishEdits.F90 +++ b/src/teton/aux/publishEdits.F90 @@ -69,7 +69,7 @@ subroutine publishEdits(dtrad) BIND(C,NAME="teton_publishedits") real(C_DOUBLE) :: timerad real(C_DOUBLE) :: outerTempRelTol, outerEDRelTol, greyRelTol, incidentFluxRelTol, innerNLRelTol - character(len=34), parameter :: Cformat = "(1X,A6,i8,A12,F18.10,A10,1pe18.10)" + character(len=38), parameter :: Cformat = "(1X,A10,i8,A12,F18.10,A10,1pe18.10)" character(len=25), parameter :: Iformat = "(1X,A13,i6,A15,i6,A15,i6)" character(len=18), parameter :: Jformat = "(1X,A20,i6,A21,i6)" character(len=30), parameter :: Tformat = "(1X,A7,1X,F18.10,A9,i7,A12,i5)" @@ -151,11 +151,15 @@ subroutine publishEdits(dtrad) BIND(C,NAME="teton_publishedits") if ( Options%isRankVerbose() > 0 ) then print *, "" - print *, ">>>>>>>>>>>>>>> End Radiation Step <<<<<<<<<<<<<<<" - print Cformat, "CYCLE ", ncycle," timerad = ", timerad," dtrad = ", dtused + print *, ">>>>>>>>>>>>>>> End of Radiation Step Report <<<<<<<<<<<<<<<" + print Cformat, "TIME STEP ", ncycle," timerad = ", timerad," dtrad = ", dtused print *, "" +#if !defined(TETON_ENABLE_MINIAPP_BUILD) print Iformat,"TempIters = ", noutrt, " FluxIters = ", ninrt, " GTASweeps = ",ngdart print Jformat,"AveNonLinearIters = ", nNLIters, " MaxNonLinearIters = ",maxNLIters +#else + print *, "FluxIters = ", ninrt +#endif if( Options%isRankVerbose() > 1 ) then print *, " *** max outer iterations = ", outerMaxIts print *, " *** max outer temperature rel tol = ", outerTempRelTol diff --git a/src/teton/aux/setTetonZone.F90 b/src/teton/aux/setTetonZone.F90 index fb78174..64303a7 100644 --- a/src/teton/aux/setTetonZone.F90 +++ b/src/teton/aux/setTetonZone.F90 @@ -50,10 +50,6 @@ subroutine setTetonZone(zoneID, corner0, zoneFaces, cornerFaces, & integer :: c, c1, c2, cCWLast integer :: bcID, b0, bdyelem - integer, parameter :: fp=1 - integer, parameter :: ez=2 - integer, parameter :: cc=3 - integer :: cFaceID(Size%maxCorner) integer :: cCW(Size%maxcf,Size%maxCorner) integer :: cCCW(Size%maxcf,Size%maxCorner) diff --git a/src/teton/control/constructDynMemory.F90 b/src/teton/control/constructDynMemory.F90 index 8bccbc8..49de74b 100644 --- a/src/teton/control/constructDynMemory.F90 +++ b/src/teton/control/constructDynMemory.F90 @@ -1,7 +1,7 @@ !======================================================================= ! construct dynamic memory !======================================================================= - subroutine constructDynMemory(setID, maxZonesPerPlane) + subroutine constructDynMemory(setID, maxPerPlane) use kind_mod use Size_mod @@ -9,6 +9,7 @@ subroutine constructDynMemory(setID, maxZonesPerPlane) use QuadratureList_mod use SetData_mod use AngleSet_mod + use Options_mod use MemoryAllocator_mod implicit none @@ -16,14 +17,18 @@ subroutine constructDynMemory(setID, maxZonesPerPlane) ! Arguments integer, intent(in) :: setID - integer, intent(in) :: maxZonesPerPlane + integer, intent(in) :: maxPerPlane ! Local type(SetData), pointer :: Set type(AngleSet), pointer :: ASet + type(SweepSet), pointer :: Swp integer :: totalCycles + integer :: dom + integer :: nHyperDomains + integer :: sweepVersion ! Allocate Memory @@ -34,17 +39,34 @@ subroutine constructDynMemory(setID, maxZonesPerPlane) Set => getSetData(Quad, setID) ASet => getAngleSetFromSetID(Quad, setID) - totalCycles = max(ASet% totalCycles, 1) + totalCycles = max(ASet% totalCycles, 1) + nHyperDomains = getNumberOfHyperDomains(Quad,1) + sweepVersion = Options% getSweepVersion() call Allocator%allocate(Size%usePinnedMemory, Set%label, "cyclePsi", Set% cyclePsi, Set% Groups, totalCycles) ! These are onlu used in the GPU sweep if (Size% useGPU) then - call Allocator%allocate(Size%usePinnedMemory, Set%label, "Q", Set% Q,Set% Groups, Size% maxCorner, maxZonesPerPlane) - call Allocator%allocate(Size%usePinnedMemory, Set%label, "S", Set% S,Set% Groups, Size% maxCorner, maxZonesPerPlane) - Set% Q(:,:,:) = zero - Set% S(:,:,:) = zero + if ( sweepVersion == 0 ) then + + do dom=1,nHyperDomains + Swp => Set% SweepPtr(dom) + call Allocator%allocate(Size%usePinnedMemory, Set%label, "Q", Swp% Q, Set% Groups, Size% maxCorner, maxPerPlane) + call Allocator%allocate(Size%usePinnedMemory, Set%label, "S", Swp% S, Set% Groups, Size% maxCorner, maxPerPlane) + enddo + + do dom=1,nHyperDomains + Swp => Set% SweepPtr(dom) + Swp% Q(:,:,:) = zero + Swp% S(:,:,:) = zero + enddo + + endif + + call Allocator%allocate(Size%usePinnedMemory, Set%label, "PsiInt", Set% PsiInt, Set% Groups, ASet% maxInterface, Set% numAngles) + + Set% PsiInt(:,:,:) = zero endif diff --git a/src/teton/control/finalizeSets.F90 b/src/teton/control/finalizeSets.F90 index 5b1bfad..7aada8d 100644 --- a/src/teton/control/finalizeSets.F90 +++ b/src/teton/control/finalizeSets.F90 @@ -50,23 +50,27 @@ subroutine finalizeSets integer :: aSetID integer :: gSetID integer :: cSetID - integer :: commID integer :: nSets integer :: nCommSets integer :: nAngleSets integer :: nGroupSets integer :: nGTASets + integer :: nHyperDomains integer :: angle + integer :: sweepVersion logical(kind=1) :: useBoltzmannCompton logical(kind=1) :: startCycle ! Constants - nSets = getNumberOfSets(Quad) - nCommSets = getNumberOfCommSets(Quad) - nAngleSets = getNumberOfAngleSets(Quad) - nGroupSets = getNumberOfGroupSets(Quad) - nGTASets = getNumberOfGTASets(Quad) + nSets = getNumberOfSets(Quad) + nCommSets = getNumberOfCommSets(Quad) + nAngleSets = getNumberOfAngleSets(Quad) + nGroupSets = getNumberOfGroupSets(Quad) + nGTASets = getNumberOfGTASets(Quad) + nHyperDomains = getNumberOfHyperDomains(Quad, 1) + sweepVersion = Options% getSweepVersion() + #if !defined(TETON_ENABLE_MINIAPP_BUILD) useBoltzmannCompton = getUseBoltzmann(Compton) #else @@ -78,182 +82,193 @@ subroutine finalizeSets ! Update PhiTotal and edits on the CPU - TOMP(target update from(Rad% PhiTotal)) + TOMP_UPDATE(target update from(Rad% PhiTotal)) if ( useBoltzmannCompton .and. Size%useCUDASolver .and. Size%ngr >= 16) then ! In this case these edits are already on the CPU else - TOMP(target update from(Mat% denec)) - TOMP(target update from(Mat% nonLinearIterations)) - TOMP(target update from(Mat% PowerEmitted)) - TOMP(target update from(Mat% PowerCompton)) + TOMP_UPDATE(target update from(Mat% denec)) + TOMP_UPDATE(target update from(Mat% nonLinearIterations)) + TOMP_UPDATE(target update from(Mat% PowerEmitted)) + TOMP_UPDATE(target update from(Mat% PowerCompton)) endif ! Unmap zone sets UMPIRE_DEVICE_POOL_FREE(ZSet% nCornerSet) - TOMP(target exit data map(always,release:ZSet% nCornerSet)) + TOMP_MAP(target exit data map(always,release:ZSet% nCornerSet)) UMPIRE_DEVICE_POOL_FREE(ZSet% nCornerBatch) - TOMP(target exit data map(always,release:ZSet% nCornerBatch)) + TOMP_MAP(target exit data map(always,release:ZSet% nCornerBatch)) UMPIRE_DEVICE_POOL_FREE(ZSet% offset) - TOMP(target exit data map(always,release:ZSet% offset)) + TOMP_MAP(target exit data map(always,release:ZSet% offset)) UMPIRE_DEVICE_POOL_FREE(ZSet% cornerList) - TOMP(target exit data map(always,release:ZSet% cornerList)) + TOMP_MAP(target exit data map(always,release:ZSet% cornerList)) UMPIRE_DEVICE_POOL_FREE(ZSet% cornerMap) - TOMP(target exit data map(always,release:ZSet% cornerMap)) + TOMP_MAP(target exit data map(always,release:ZSet% cornerMap)) UMPIRE_DEVICE_POOL_FREE(ZSet% zoneList) - TOMP(target exit data map(always,release:ZSet% zoneList)) + TOMP_MAP(target exit data map(always,release:ZSet% zoneList)) UMPIRE_DEVICE_POOL_FREE(ZSet% cornerConverged) - TOMP(target exit data map(always,release:ZSet% cornerConverged)) + TOMP_MAP(target exit data map(always,release:ZSet% cornerConverged)) UMPIRE_DEVICE_POOL_FREE(ZSet% Te) - TOMP(target exit data map(always,release:ZSet% Te)) + TOMP_MAP(target exit data map(always,release:ZSet% Te)) UMPIRE_DEVICE_POOL_FREE(ZSet% TeOld) - TOMP(target exit data map(always,release:ZSet% TeOld)) + TOMP_MAP(target exit data map(always,release:ZSet% TeOld)) UMPIRE_DEVICE_POOL_FREE(ZSet% delta) - TOMP(target exit data map(always,release:ZSet% delta)) + TOMP_MAP(target exit data map(always,release:ZSet% delta)) UMPIRE_DEVICE_POOL_FREE(ZSet% sumT) - TOMP(target exit data map(always,release:ZSet% sumT)) + TOMP_MAP(target exit data map(always,release:ZSet% sumT)) UMPIRE_DEVICE_POOL_FREE(ZSet% netRate) - TOMP(target exit data map(always,release:ZSet% netRate)) + TOMP_MAP(target exit data map(always,release:ZSet% netRate)) UMPIRE_DEVICE_POOL_FREE(ZSet% dTCompton) - TOMP(target exit data map(always,release:ZSet% dTCompton)) + TOMP_MAP(target exit data map(always,release:ZSet% dTCompton)) UMPIRE_DEVICE_POOL_FREE(ZSet% B) - TOMP(target exit data map(always,release:ZSet% B)) + TOMP_MAP(target exit data map(always,release:ZSet% B)) UMPIRE_DEVICE_POOL_FREE(ZSet% dBdT) - TOMP(target exit data map(always,release:ZSet% dBdT)) + TOMP_MAP(target exit data map(always,release:ZSet% dBdT)) UMPIRE_DEVICE_POOL_FREE(ZSet% Snu0) - TOMP(target exit data map(always,release:ZSet% Snu0)) + TOMP_MAP(target exit data map(always,release:ZSet% Snu0)) UMPIRE_DEVICE_POOL_FREE(ZSet% dSnu0dT) - TOMP(target exit data map(always,release:ZSet% dSnu0dT)) + TOMP_MAP(target exit data map(always,release:ZSet% dSnu0dT)) UMPIRE_DEVICE_POOL_FREE(ZSet% AD) - TOMP(target exit data map(always,release:ZSet% AD)) + TOMP_MAP(target exit data map(always,release:ZSet% AD)) UMPIRE_DEVICE_POOL_FREE(ZSet% z) - TOMP(target exit data map(always,release:ZSet% z)) + TOMP_MAP(target exit data map(always,release:ZSet% z)) UMPIRE_DEVICE_POOL_FREE(ZSet% fk2) - TOMP(target exit data map(always,release:ZSet% fk2)) + TOMP_MAP(target exit data map(always,release:ZSet% fk2)) UMPIRE_DEVICE_POOL_FREE(ZSet% nI) - TOMP(target exit data map(always,release:ZSet% nI)) + TOMP_MAP(target exit data map(always,release:ZSet% nI)) UMPIRE_DEVICE_POOL_FREE(ZSet% nS) - TOMP(target exit data map(always,release:ZSet% nS)) + TOMP_MAP(target exit data map(always,release:ZSet% nS)) UMPIRE_DEVICE_POOL_FREE(ZSet% ex) - TOMP(target exit data map(always,release:ZSet% ex)) + TOMP_MAP(target exit data map(always,release:ZSet% ex)) UMPIRE_DEVICE_POOL_FREE(ZSet% expPH) - TOMP(target exit data map(always,release:ZSet% expPH)) + TOMP_MAP(target exit data map(always,release:ZSet% expPH)) UMPIRE_DEVICE_POOL_FREE(ZSet% comptonDeltaEr) - TOMP(target exit data map(always,release:ZSet% comptonDeltaEr)) + TOMP_MAP(target exit data map(always,release:ZSet% comptonDeltaEr)) UMPIRE_DEVICE_POOL_FREE(ZSet% dComptonDT) - TOMP(target exit data map(always,release:ZSet% dComptonDT)) + TOMP_MAP(target exit data map(always,release:ZSet% dComptonDT)) UMPIRE_DEVICE_POOL_FREE(ZSet% comptonSe) - TOMP(target exit data map(always,release:ZSet% comptonSe)) + TOMP_MAP(target exit data map(always,release:ZSet% comptonSe)) UMPIRE_DEVICE_POOL_FREE(ZSet% AU) - TOMP(target exit data map(always,release:ZSet% AU)) + TOMP_MAP(target exit data map(always,release:ZSet% AU)) UMPIRE_DEVICE_POOL_FREE(ZSet% AL) - TOMP(target exit data map(always,release:ZSet% AL)) + TOMP_MAP(target exit data map(always,release:ZSet% AL)) - TOMP(target exit data map(release: ZSet)) + TOMP_MAP(target exit data map(release: ZSet)) ! Unmap group sets do gSetID=1,nGroupSets UMPIRE_DEVICE_POOL_FREE(Quad% GrpSetPtr(gSetID)% STotal) - TOMP(target exit data map(always,release:Quad% GrpSetPtr(gSetID)% STotal)) + TOMP_MAP(target exit data map(always,release:Quad% GrpSetPtr(gSetID)% STotal)) UMPIRE_DEVICE_POOL_FREE(Quad% GrpSetPtr(gSetID)% Sigt) - TOMP(target exit data map(always,release:Quad% GrpSetPtr(gSetID)% Sigt)) + TOMP_MAP(target exit data map(always,release:Quad% GrpSetPtr(gSetID)% Sigt)) enddo do aSetID=1,nAngleSets+nGTASets UMPIRE_DEVICE_POOL_FREE(Quad% AngSetPtr(aSetID)% nextZ) - TOMP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% nextZ)) + TOMP_MAP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% nextZ)) UMPIRE_DEVICE_POOL_FREE(Quad% AngSetPtr(aSetID)% nextC) - TOMP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% nextC)) + TOMP_MAP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% nextC)) UMPIRE_DEVICE_POOL_FREE(Quad% AngSetPtr(aSetID)% StartingDirection) - TOMP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% StartingDirection)) + TOMP_MAP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% StartingDirection)) UMPIRE_DEVICE_POOL_FREE(Quad% AngSetPtr(aSetID)% FinishingDirection) - TOMP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% FinishingDirection)) + TOMP_MAP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% FinishingDirection)) UMPIRE_DEVICE_POOL_FREE(Quad% AngSetPtr(aSetID)% Omega) - TOMP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% Omega)) + TOMP_MAP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% Omega)) UMPIRE_DEVICE_POOL_FREE(Quad% AngSetPtr(aSetID)% Weight) - TOMP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% Weight)) + TOMP_MAP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% Weight)) UMPIRE_DEVICE_POOL_FREE(Quad% AngSetPtr(aSetID)% numCycles) - TOMP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% numCycles)) + TOMP_MAP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% numCycles)) UMPIRE_DEVICE_POOL_FREE(Quad% AngSetPtr(aSetID)% cycleOffSet) - TOMP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% cycleOffSet)) + TOMP_MAP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% cycleOffSet)) UMPIRE_DEVICE_POOL_FREE(Quad% AngSetPtr(aSetID)% cycleList) - TOMP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% cycleList)) + TOMP_MAP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% cycleList)) UMPIRE_DEVICE_POOL_FREE(Quad% AngSetPtr(aSetID)% nHyperPlanes) - TOMP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% nHyperPlanes)) + TOMP_MAP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% nHyperPlanes)) ! This loop unmaps internal components of HypPlanePtr and BdyExitPtr. ! Delay unmapping these until this loop is done. do angle=1,Quad%AngSetPtr(aSetID)% numAngles ! Unable to map this to UMPIRE device pool, causes segfault. - TOMP(target exit data map(release:Quad% AngSetPtr(aSetID)% BdyExitPtr(angle)%bdyList)) + TOMP_MAP(target exit data map(release:Quad% AngSetPtr(aSetID)% BdyExitPtr(angle)%bdyList)) if ( .not. Quad%AngSetPtr(aSetID)% FinishingDirection(angle) ) then ! Unable to map these to UMPIRE device pool, causes segfault or wrong answers. - TOMP(target exit data map(release:Quad% AngSetPtr(aSetID)% HypPlanePtr(angle)% zonesInPlane)) + + if (aSetID > nAngleSets) then + TOMP(target exit data map(release:Quad% AngSetPtr(aSetID)% HypPlanePtr(angle)% zonesInPlane)) + else + if ( sweepVersion == 0 ) then + TOMP(target exit data map(release:Quad% AngSetPtr(aSetID)% HypPlanePtr(angle)% zonesInPlane)) + else + TOMP(target exit data map(release:Quad% AngSetPtr(aSetID)% HypPlanePtr(angle)% cornersInPlane)) + endif + endif + TOMP(target exit data map(release:Quad% AngSetPtr(aSetID)% HypPlanePtr(angle)% hplane1)) TOMP(target exit data map(release:Quad% AngSetPtr(aSetID)% HypPlanePtr(angle)% hplane2)) TOMP(target exit data map(release:Quad% AngSetPtr(aSetID)% HypPlanePtr(angle)% ndone)) + TOMP(target exit data map(release:Quad% AngSetPtr(aSetID)% HypPlanePtr(angle)% interfaceList)) endif enddo UMPIRE_DEVICE_POOL_FREE(Quad% AngSetPtr(aSetID)% HypPlanePtr) - TOMP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% HypPlanePtr)) + TOMP_MAP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% HypPlanePtr)) UMPIRE_DEVICE_POOL_FREE(Quad% AngSetPtr(aSetID)% BdyExitPtr) - TOMP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% BdyExitPtr)) + TOMP_MAP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% BdyExitPtr)) if ( aSetID <= nAngleSets ) then UMPIRE_DEVICE_POOL_FREE(Quad% AngSetPtr(aSetID)% AfpNorm) - TOMP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% AfpNorm)) + TOMP_MAP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% AfpNorm)) UMPIRE_DEVICE_POOL_FREE(Quad% AngSetPtr(aSetID)% AezNorm) - TOMP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% AezNorm)) + TOMP_MAP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% AezNorm)) UMPIRE_DEVICE_POOL_FREE(Quad% AngSetPtr(aSetID)% ANormSum) - TOMP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% ANormSum)) + TOMP_MAP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% ANormSum)) endif @@ -261,13 +276,13 @@ subroutine finalizeSets if (Size% ndim == 2) then UMPIRE_DEVICE_POOL_FREE(Quad% AngSetPtr(aSetID)% angDerivFac) - TOMP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% angDerivFac)) + TOMP_MAP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% angDerivFac)) UMPIRE_DEVICE_POOL_FREE(Quad% AngSetPtr(aSetID)% quadTauW1) - TOMP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% quadTauW1)) + TOMP_MAP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% quadTauW1)) UMPIRE_DEVICE_POOL_FREE(Quad% AngSetPtr(aSetID)% quadTauW2) - TOMP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% quadTauW2)) + TOMP_MAP(target exit data map(always,release:Quad% AngSetPtr(aSetID)% quadTauW2)) endif @@ -276,154 +291,157 @@ subroutine finalizeSets ! Geometry UMPIRE_DEVICE_POOL_FREE(Geom% Volume) - TOMP(target exit data map(always,release:Geom% Volume)) + TOMP_MAP(target exit data map(always,release:Geom% Volume)) UMPIRE_DEVICE_POOL_FREE(Geom% VolumeOld) - TOMP(target exit data map(always,release:Geom% VolumeOld)) + TOMP_MAP(target exit data map(always,release:Geom% VolumeOld)) UMPIRE_DEVICE_POOL_FREE(Geom% VolumeZone) - TOMP(target exit data map(always,release:Geom% VolumeZone)) + TOMP_MAP(target exit data map(always,release:Geom% VolumeZone)) UMPIRE_DEVICE_POOL_FREE(Geom% cOffSet) - TOMP(target exit data map(always,release:Geom% cOffSet)) + TOMP_MAP(target exit data map(always,release:Geom% cOffSet)) UMPIRE_DEVICE_POOL_FREE(Geom% numCorner) - TOMP(target exit data map(always,release:Geom% numCorner)) + TOMP_MAP(target exit data map(always,release:Geom% numCorner)) UMPIRE_DEVICE_POOL_FREE(Geom% CToZone) - TOMP(target exit data map(always,release:Geom% CToZone)) + TOMP_MAP(target exit data map(always,release:Geom% CToZone)) UMPIRE_DEVICE_POOL_FREE(Geom% corner1) - TOMP(target exit data map(always,release:Geom% corner1)) + TOMP_MAP(target exit data map(always,release:Geom% corner1)) UMPIRE_DEVICE_POOL_FREE(Geom% corner2) - TOMP(target exit data map(always,release:Geom% corner2)) + TOMP_MAP(target exit data map(always,release:Geom% corner2)) UMPIRE_DEVICE_POOL_FREE(Geom% zone1) - TOMP(target exit data map(always,release:Geom% zone1)) + TOMP_MAP(target exit data map(always,release:Geom% zone1)) UMPIRE_DEVICE_POOL_FREE(Geom% zone2) - TOMP(target exit data map(always,release:Geom% zone2)) + TOMP_MAP(target exit data map(always,release:Geom% zone2)) UMPIRE_DEVICE_POOL_FREE(Geom% cEZ) - TOMP(target exit data map(always,release:Geom% cEZ)) + TOMP_MAP(target exit data map(always,release:Geom% cEZ)) UMPIRE_DEVICE_POOL_FREE(Geom% cFP) - TOMP(target exit data map(always,release:Geom% cFP)) + TOMP_MAP(target exit data map(always,release:Geom% cFP)) UMPIRE_DEVICE_POOL_FREE(Geom% A_ez) - TOMP(target exit data map(always,release:Geom% A_ez)) + TOMP_MAP(target exit data map(always,release:Geom% A_ez)) UMPIRE_DEVICE_POOL_FREE(Geom% A_fp) - TOMP(target exit data map(always,release:Geom% A_fp)) + TOMP_MAP(target exit data map(always,release:Geom% A_fp)) if (Size% ndim == 2) then UMPIRE_DEVICE_POOL_FREE(Geom% Area) - TOMP(target exit data map(always,release:Geom% Area)) + TOMP_MAP(target exit data map(always,release:Geom% Area)) UMPIRE_DEVICE_POOL_FREE(Geom% RadiusEZ) - TOMP(target exit data map(always,release:Geom% RadiusEZ)) + TOMP_MAP(target exit data map(always,release:Geom% RadiusEZ)) UMPIRE_DEVICE_POOL_FREE(Geom% RadiusFP) - TOMP(target exit data map(always,release:Geom% RadiusFP)) + TOMP_MAP(target exit data map(always,release:Geom% RadiusFP)) elseif (Size% ndim == 3) then UMPIRE_DEVICE_POOL_FREE(Geom% nCFacesArray) - TOMP(target exit data map(always,release:Geom% nCFacesArray)) + TOMP_MAP(target exit data map(always,release:Geom% nCFacesArray)) endif - TOMP(target exit data map(release:Geom)) + TOMP_MAP(target exit data map(release:Geom)) ! Radiation Intensity UMPIRE_DEVICE_POOL_FREE(Rad% PhiTotal) - TOMP(target exit data map(always,release:Rad% PhiTotal)) + TOMP_MAP(target exit data map(always,release:Rad% PhiTotal)) UMPIRE_DEVICE_POOL_FREE(Rad% radEnergy) - TOMP(target exit data map(always,release:Rad% radEnergy)) + TOMP_MAP(target exit data map(always,release:Rad% radEnergy)) - TOMP(target exit data map(release:Rad)) + TOMP_MAP(target exit data map(release:Rad)) ! GTA if (Size%useNewGTASolver) then UMPIRE_DEVICE_POOL_FREE(GTA% TT) - TOMP(target exit data map(always,release:GTA% TT)) + TOMP_MAP(target exit data map(always,release:GTA% TT)) UMPIRE_DEVICE_POOL_FREE(GTA% Pvv) - TOMP(target exit data map(always,release:GTA% Pvv)) + TOMP_MAP(target exit data map(always,release:GTA% Pvv)) UMPIRE_DEVICE_POOL_FREE(GTA% GreySigTotal) - TOMP(target exit data map(always,release:GTA% GreySigTotal)) + TOMP_MAP(target exit data map(always,release:GTA% GreySigTotal)) UMPIRE_DEVICE_POOL_FREE(GTA% GreySigScat) - TOMP(target exit data map(always,release:GTA% GreySigScat)) + TOMP_MAP(target exit data map(always,release:GTA% GreySigScat)) UMPIRE_DEVICE_POOL_FREE(GTA% GreySigScatVol) - TOMP(target exit data map(always,release:GTA% GreySigScatVol)) + TOMP_MAP(target exit data map(always,release:GTA% GreySigScatVol)) UMPIRE_DEVICE_POOL_FREE(GTA% GreySigtInv) - TOMP(target exit data map(always,release:GTA% GreySigtInv)) + TOMP_MAP(target exit data map(always,release:GTA% GreySigtInv)) UMPIRE_DEVICE_POOL_FREE(GTA% PhiInc) - TOMP(target exit data map(always,release:GTA% PhiInc)) + TOMP_MAP(target exit data map(always,release:GTA% PhiInc)) + + UMPIRE_DEVICE_POOL_FREE(GTA% Sscat) + TOMP_MAP(target exit data map(always,release:GTA% Sscat)) UMPIRE_DEVICE_POOL_FREE(GTA% Q) - TOMP(target exit data map(always,release:GTA% Q)) + TOMP_MAP(target exit data map(always,release:GTA% Q)) UMPIRE_DEVICE_POOL_FREE(GTA% TsaSource) - TOMP(target exit data map(always,release:GTA% TsaSource)) + TOMP_MAP(target exit data map(always,release:GTA% TsaSource)) UMPIRE_DEVICE_POOL_FREE(GTA% AfpNorm) - TOMP(target exit data map(always,release:GTA% AfpNorm)) + TOMP_MAP(target exit data map(always,release:GTA% AfpNorm)) UMPIRE_DEVICE_POOL_FREE(GTA% AezNorm) - TOMP(target exit data map(always,release:GTA% AezNorm)) + TOMP_MAP(target exit data map(always,release:GTA% AezNorm)) UMPIRE_DEVICE_POOL_FREE(GTA% ANormSum) - TOMP(target exit data map(always,release:GTA% ANormSum)) + TOMP_MAP(target exit data map(always,release:GTA% ANormSum)) if (Size% ndim == 2) then UMPIRE_DEVICE_POOL_FREE(GTA% Tvv) - TOMP(target exit data map(always,release:GTA% Tvv)) + TOMP_MAP(target exit data map(always,release:GTA% Tvv)) endif endif UMPIRE_DEVICE_POOL_FREE(GTA% GreySource) - TOMP(target exit data map(always,release:GTA% GreySource)) + TOMP_MAP(target exit data map(always,release:GTA% GreySource)) UMPIRE_DEVICE_POOL_FREE(GTA% GreyCorrection) - TOMP(target exit data map(always,release:GTA% GreyCorrection)) + TOMP_MAP(target exit data map(always,release:GTA% GreyCorrection)) UMPIRE_DEVICE_POOL_FREE(GTA% Chi) - TOMP(target exit data map(always,release:GTA% Chi)) + TOMP_MAP(target exit data map(always,release:GTA% Chi)) - TOMP(target exit data map(release:GTA)) + TOMP_MAP(target exit data map(release:GTA)) do setID=nSets+1,nSets+nGTASets UMPIRE_DEVICE_POOL_FREE(Quad% SetDataPtr(setID)% AngleOrder) - TOMP(target exit data map(always,release:Quad% SetDataPtr(setID)% AngleOrder)) + TOMP_MAP(target exit data map(always,release:Quad% SetDataPtr(setID)% AngleOrder)) UMPIRE_DEVICE_POOL_FREE(Quad% SetDataPtr(setID)% tPsi) - TOMP(target exit data map(always,release:Quad% SetDataPtr(setID)% tPsi)) + TOMP_MAP(target exit data map(always,release:Quad% SetDataPtr(setID)% tPsi)) UMPIRE_DEVICE_POOL_FREE(Quad% SetDataPtr(setID)% pInc) - TOMP(target exit data map(always,release:Quad% SetDataPtr(setID)% pInc)) + TOMP_MAP(target exit data map(always,release:Quad% SetDataPtr(setID)% pInc)) UMPIRE_DEVICE_POOL_FREE(Quad% SetDataPtr(setID)% src) - TOMP(target exit data map(always,release:Quad% SetDataPtr(setID)% src)) + TOMP_MAP(target exit data map(always,release:Quad% SetDataPtr(setID)% src)) if (Size% ndim == 2) then UMPIRE_DEVICE_POOL_FREE(Quad% SetDataPtr(setID)% tPsiM) - TOMP(target exit data map(always,release:Quad% SetDataPtr(setID)% tPsiM)) + TOMP_MAP(target exit data map(always,release:Quad% SetDataPtr(setID)% tPsiM)) UMPIRE_DEVICE_POOL_FREE(Quad% SetDataPtr(setID)% tInc) - TOMP(target exit data map(always,release:Quad% SetDataPtr(setID)% tInc)) + TOMP_MAP(target exit data map(always,release:Quad% SetDataPtr(setID)% tInc)) endif enddo @@ -431,70 +449,70 @@ subroutine finalizeSets ! Material UMPIRE_DEVICE_POOL_FREE(Mat% Tec) - TOMP(target exit data map(always,release:Mat% Tec)) + TOMP_MAP(target exit data map(always,release:Mat% Tec)) UMPIRE_DEVICE_POOL_FREE(Mat% Tecn) - TOMP(target exit data map(always,release:Mat% Tecn)) + TOMP_MAP(target exit data map(always,release:Mat% Tecn)) UMPIRE_DEVICE_POOL_FREE(Mat% denec) - TOMP(target exit data map(always,release:Mat% denec)) + TOMP_MAP(target exit data map(always,release:Mat% denec)) UMPIRE_DEVICE_POOL_FREE(Mat% cve) - TOMP(target exit data map(always,release:Mat% cve)) + TOMP_MAP(target exit data map(always,release:Mat% cve)) UMPIRE_DEVICE_POOL_FREE(Mat% rho) - TOMP(target exit data map(always,release:Mat% rho)) + TOMP_MAP(target exit data map(always,release:Mat% rho)) UMPIRE_DEVICE_POOL_FREE(Mat% nez) - TOMP(target exit data map(always,release:Mat% nez)) + TOMP_MAP(target exit data map(always,release:Mat% nez)) UMPIRE_DEVICE_POOL_FREE(Mat% stimComptonMult) - TOMP(target exit data map(always,release:Mat% stimComptonMult)) + TOMP_MAP(target exit data map(always,release:Mat% stimComptonMult)) UMPIRE_DEVICE_POOL_FREE(Mat% Siga) - TOMP(target exit data map(always,release:Mat% Siga)) + TOMP_MAP(target exit data map(always,release:Mat% Siga)) UMPIRE_DEVICE_POOL_FREE(Mat% Sigs) - TOMP(target exit data map(always,release:Mat% Sigs)) + TOMP_MAP(target exit data map(always,release:Mat% Sigs)) UMPIRE_DEVICE_POOL_FREE(Mat% Eta) - TOMP(target exit data map(always,release:Mat% Eta)) + TOMP_MAP(target exit data map(always,release:Mat% Eta)) UMPIRE_DEVICE_POOL_FREE(Mat% EmissionRate) - TOMP(target exit data map(always,release:Mat% EmissionRate)) + TOMP_MAP(target exit data map(always,release:Mat% EmissionRate)) UMPIRE_DEVICE_POOL_FREE(Mat% SMatEff) - TOMP(target exit data map(always,release:Mat% SMatEff)) + TOMP_MAP(target exit data map(always,release:Mat% SMatEff)) UMPIRE_DEVICE_POOL_FREE(Mat% PowerEmitted) - TOMP(target exit data map(always,release:Mat% PowerEmitted)) + TOMP_MAP(target exit data map(always,release:Mat% PowerEmitted)) UMPIRE_DEVICE_POOL_FREE(Mat% PowerCompton) - TOMP(target exit data map(always,release:Mat% PowerCompton)) + TOMP_MAP(target exit data map(always,release:Mat% PowerCompton)) UMPIRE_DEVICE_POOL_FREE(Mat% nonLinearIterations) - TOMP(target exit data map(always,release:Mat% nonLinearIterations)) + TOMP_MAP(target exit data map(always,release:Mat% nonLinearIterations)) - TOMP(target exit data map(release:Mat)) + TOMP_MAP(target exit data map(release:Mat)) ! IF THESE ARE UNALLOCATED WILL CRASH? #if !defined(TETON_ENABLE_MINIAPP_BUILD) if (getComptonFlag(Compton) /= comptonType_None) then UMPIRE_DEVICE_POOL_FREE(Compton% gamMean) - TOMP(target exit data map(always,release:Compton% gamMean)) + TOMP_MAP(target exit data map(always,release:Compton% gamMean)) UMPIRE_DEVICE_POOL_FREE(Compton% gamSqdDGam) - TOMP(target exit data map(always,release:Compton% gamSqdDGam)) + TOMP_MAP(target exit data map(always,release:Compton% gamSqdDGam)) UMPIRE_DEVICE_POOL_FREE(Compton% gamCubedDGam) - TOMP(target exit data map(always,release:Compton% gamCubedDGam)) + TOMP_MAP(target exit data map(always,release:Compton% gamCubedDGam)) UMPIRE_DEVICE_POOL_FREE(Compton% gamD) - TOMP(target exit data map(always,release:Compton% gamD)) + TOMP_MAP(target exit data map(always,release:Compton% gamD)) endif - TOMP(target exit data map(release:Compton)) + TOMP_MAP(target exit data map(release:Compton)) #endif endif !endif useGPU @@ -512,18 +530,15 @@ subroutine finalizeSets ! Update Psi on the host and Release GPU Memory if ( Size% useGPU ) then -#if defined(TETON_ENABLE_OPENMP_OFFLOAD) call finalizeGPUMemory(setID) UMPIRE_DEVICE_POOL_FREE(Quad% SetDataPtr(setID)% AngleOrder) - TOMP(target exit data map(always,release:Quad% SetDataPtr(setID)% AngleOrder)) - -#endif + TOMP_MAP(target exit data map(always,release:Quad% SetDataPtr(setID)% AngleOrder)) endif ! Release Dynamic Memory allocated at the beginning of the time step if (Size% ndim > 1) then - call Set%destructDynMemory() + call Set%destructDynMemory(nHyperDomains) endif enddo SetLoop @@ -548,15 +563,15 @@ subroutine finalizeSets UMPIRE_DEVICE_POOL_FREE(Quad%AngSetPtr) - TOMP(target exit data map(always,release:Quad%AngSetPtr)) + TOMP_MAP(target exit data map(always,release:Quad%AngSetPtr)) UMPIRE_DEVICE_POOL_FREE(Quad%GrpSetPtr) - TOMP(target exit data map(always,release:Quad%GrpSetPtr)) + TOMP_MAP(target exit data map(always,release:Quad%GrpSetPtr)) UMPIRE_DEVICE_POOL_FREE(Quad%SetDataPtr) - TOMP(target exit data map(always,release:Quad%SetDataPtr)) + TOMP_MAP(target exit data map(always,release:Quad%SetDataPtr)) - TOMP(target exit data map(release:Quad)) + TOMP_MAP(target exit data map(release:Quad)) endif @@ -571,7 +586,11 @@ subroutine finalizeSets AngleSetLoop: do setID=1,nAngleSets+nGTASets ASet => getAngleSetData(Quad, setID) - call destructHyperPlane(ASet) + if (setID > nAngleSets) then + sweepVersion = 0 + endif + + call destructHyperPlane(ASet, sweepVersion) call destructBdyExitList(ASet) call destructCycleList(ASet) enddo AngleSetLoop diff --git a/src/teton/control/getPhiTotal_OMPOL.F90 b/src/teton/control/getPhiTotal_OMPOL.F90 index 97d23f6..892e21d 100644 --- a/src/teton/control/getPhiTotal_OMPOL.F90 +++ b/src/teton/control/getPhiTotal_OMPOL.F90 @@ -51,18 +51,16 @@ subroutine getPhiTotal(sendIndex) if (Size%useGPU) then -#ifdef TETON_ENABLE_OPENACC - !$acc data copyin(nSets, ngr, sendIndex) + TOMP_MAP(target enter data map(to: nSets, ngr, sendIndex)) - !$acc parallel loop gang num_gangs(nZoneSets) & - !$acc& vector_length(omp_device_team_thread_limit) & - !$acc& private(Set, ASet, g0, Groups, Angle, quadwt) +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) & + !$acc& vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, g0, Groups, Angle, quadwt) #else - TOMP(target enter data map(to: nSets, ngr, sendIndex)) - TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none)&) TOMPC(shared(Geom, Rad, ngr, nZoneSets, sendIndex, nSets, Quad)&) - TOMPC(private(Set, ASet, g0, Groups, Angle, quadwt)) + TOMPC(private(setID, Set, ASet, g0, Groups, Angle, quadwt)) #endif ZoneSetLoop1: do zSetID=1,nZoneSets @@ -124,13 +122,13 @@ subroutine getPhiTotal(sendIndex) enddo ZoneSetLoop1 #ifdef TETON_ENABLE_OPENACC -!$acc end parallel loop -!$acc end data + !$acc end parallel loop #else -TOMP(end target teams distribute) -TOMP(target exit data map(release: nSets, ngr, sendIndex)) + TOMP(end target teams distribute) #endif + TOMP_MAP(target exit data map(release: nSets, ngr, sendIndex)) + else Rad% PhiTotal(:,:) = zero diff --git a/src/teton/control/initPhiTotal_OMPOL.F90 b/src/teton/control/initPhiTotal_OMPOL.F90 index eed049c..bed5978 100644 --- a/src/teton/control/initPhiTotal_OMPOL.F90 +++ b/src/teton/control/initPhiTotal_OMPOL.F90 @@ -52,24 +52,22 @@ subroutine initPhiTotal if ( Size% useGPU ) then -#ifdef TETON_ENABLE_OPENACC - !$acc data copyin(nSets, ngr) + TOMP_MAP(target enter data map(to: nSets, ngr)) - !$acc parallel loop gang num_gangs(nZoneSets) & - !$acc& vector_length(omp_device_team_thread_limit) & - !$acc& private(Set, ASet, g0, Groups, NumAngles, quadwt, volRatio) +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) & + !$acc& vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, g0, Groups, NumAngles, quadwt, volRatio, Angle, setID) #else - TOMP(target enter data map(to: nSets, ngr)) - TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) TOMPC(shared(nZoneSets, Geom, Rad, ngr, Quad, nSets )&) - TOMPC(private(Set, ASet, g0, Groups, NumAngles, quadwt, volRatio)) + TOMPC(private(Set, ASet, g0, Groups, NumAngles, quadwt, volRatio, Angle, setID)) #endif ZoneSetLoop1: do zSetID=1,nZoneSets #ifdef TETON_ENABLE_OPENACC - !$acc loop vector collapse(2) + !$acc loop vector collapse(2) #else !$omp parallel do collapse(2) default(none) & !$omp& shared(Geom, Rad, ngr, zSetID) @@ -116,7 +114,7 @@ subroutine initPhiTotal enddo #ifndef TETON_ENABLE_OPENACC - !$omp end parallel do + !$omp end parallel do #endif enddo AngleLoop @@ -126,13 +124,13 @@ subroutine initPhiTotal enddo ZoneSetLoop1 #ifdef TETON_ENABLE_OPENACC -!$acc end parallel loop -!$acc end data + !$acc end parallel loop #else TOMP(end target teams distribute) - TOMP(target exit data map(release: nSets, ngr)) #endif + TOMP_MAP(target exit data map(release: nSets, ngr)) + else Rad% PhiTotal(:,:) = zero diff --git a/src/teton/control/initializeRadiationField_OMPOL.F90 b/src/teton/control/initializeRadiationField_OMPOL.F90 index 8a52a11..eb8e455 100644 --- a/src/teton/control/initializeRadiationField_OMPOL.F90 +++ b/src/teton/control/initializeRadiationField_OMPOL.F90 @@ -29,6 +29,7 @@ subroutine initializeRadiationField type(SetData), pointer :: Set type(AngleSet), pointer :: ASet type(BdyExit), pointer :: BdyExitPtr + type(HypPlane), pointer :: HypPlanePtr integer :: setID integer :: nSets @@ -62,11 +63,17 @@ subroutine initializeRadiationField ! Update Boundary data +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nSets) & + !$acc& vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, BdyExitPtr, HypPlanePtr, offSet, Groups, NumAngles, c, b, angle) +#else TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) default(none) &) TOMPC(shared(nSets, Quad)&) - TOMPC(private(Set, ASet, BdyExitPtr, offSet, Groups, NumAngles, c, b)) + TOMPC(private(Set, ASet, BdyExitPtr, HypPlanePtr, offSet, Groups, NumAngles, c, b, angle)) +#endif - do setID=1,nSets + SetLoop: do setID=1,nSets Set => Quad% SetDataPtr(setID) ASet => Quad% AngSetPtr(Set% angleSetID) @@ -76,9 +83,14 @@ subroutine initializeRadiationField do angle=1,NumAngles BdyExitPtr => ASet% BdyExitPtr(angle) +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) & + !$acc& private(b,c) +#else !$omp parallel do collapse(2) default(none) & !$omp& shared(Set, BdyExitPtr, angle, Groups) & !$omp& private(b,c) +#endif do i=1,BdyExitPtr% nxBdy do g=1,Groups b = BdyExitPtr% bdyList(1,i) @@ -87,7 +99,9 @@ subroutine initializeRadiationField Set% PsiB(g,b,angle) = Set% Psi(g,c,angle) enddo enddo +#ifndef TETON_ENABLE_OPENACC !$omp end parallel do +#endif enddo @@ -96,20 +110,56 @@ subroutine initializeRadiationField do angle=1,NumAngles offSet = ASet% cycleOffSet(angle) +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) & + !$acc& private(c) +#else !$omp parallel do collapse(2) default(none) & !$omp& shared(Set, ASet, angle, offSet, Groups) & !$omp& private(c) +#endif do mCycle=1,ASet% numCycles(angle) do g=1,Groups c = ASet% cycleList(offSet+mCycle) Set% cyclePsi(g,offSet+mCycle) = Set% Psi(g,c,angle) enddo enddo +#ifndef TETON_ENABLE_OPENACC !$omp end parallel do +#endif enddo - enddo +! Update the Psi saved at hyper-domain interfaces + + do angle=1,NumAngles + HypPlanePtr => ASet% HypPlanePtr(angle) + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) & + !$acc& private(c) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, HypPlanePtr, angle, Groups) & + !$omp& private(c) +#endif + do i=1,HypPlanePtr% interfaceLen + do g=1,Groups + c = HypPlanePtr% interfaceList(i) + Set% PsiInt(g,i,angle) = Set% Psi(g,c,angle) + enddo + enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + enddo + + enddo SetLoop + +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else TOMP(end target teams distribute) +#endif else diff --git a/src/teton/control/initializeSets.F90 b/src/teton/control/initializeSets.F90 index 71b29c7..1aa8cb2 100644 --- a/src/teton/control/initializeSets.F90 +++ b/src/teton/control/initializeSets.F90 @@ -11,7 +11,6 @@ subroutine initializeSets - use kind_mod use constant_mod use radconstant_mod @@ -55,25 +54,27 @@ subroutine initializeSets integer :: aSetID integer :: gSetID integer :: cSetID - integer :: commID integer :: nSets integer :: nAngleSets integer :: nGroupSets integer :: nGTASets integer :: nCommSets integer :: angle + integer :: sweepVersion logical(kind=1) :: useBoltzmannCompton real(adqt) :: dtrad ! Constants - dtrad = getRadTimeStep(DtControls) - nSets = getNumberOfSets(Quad) - nAngleSets = getNumberOfAngleSets(Quad) - nGroupSets = getNumberOfGroupSets(Quad) - nGTASets = getNumberOfGTASets(Quad) - nCommSets = getNumberOfCommSets(Quad) + dtrad = getRadTimeStep(DtControls) + nSets = getNumberOfSets(Quad) + nAngleSets = getNumberOfAngleSets(Quad) + nGroupSets = getNumberOfGroupSets(Quad) + nGTASets = getNumberOfGTASets(Quad) + nCommSets = getNumberOfCommSets(Quad) + sweepVersion = Options% getSweepVersion() + #if !defined(TETON_ENABLE_MINIAPP_BUILD) useBoltzmannCompton = getUseBoltzmann(Compton) #endif @@ -96,10 +97,10 @@ subroutine initializeSets ! Find reflected angles on all symmetry boundaries call findReflectedAngles(aSetID) - ! Establish angle order for transport sweeps (rtorder) and create - ! a list of exiting boundary elements by angle (findexit) + ! Obtain a directed graph of zones or corners depending on the transport sweep selected; + ! create a list of exiting boundary elements by angle (findexit) if (Size% ndim >= 2) then - call rtorder(aSetID) + call getDirectedGraph(aSetID) call findexit(aSetID) endif enddo AngleSetLoop @@ -113,6 +114,16 @@ subroutine initializeSets enddo endif +! Find the maximum number of hyper-elements (high-order and GTA) + + do aSetID=1,nAngleSets + Quad% nHyperElements(1) = max( Quad% nHyperElements(1), Quad% AngSetPtr(aSetID)% maxInterface ) + enddo + + do aSetID=nAngleSets+1,nAngleSets+nGTASets + Quad% nHyperElements(2) = max( Quad% nHyperElements(2), Quad% AngSetPtr(aSetID)% maxInterface ) + enddo + ! If we are using the GPU, we need to map some data before the set loop if ( Size% useGPU ) then @@ -126,118 +137,118 @@ subroutine initializeSets #endif ! Map Quadrature List - TOMP(target enter data map(to:Quad)) + TOMP_MAP(target enter data map(to:Quad)) UMPIRE_DEVICE_POOL_ALLOC(Quad%SetDataPtr) - TOMP(target enter data map(always,to:Quad%SetDataPtr)) + TOMP_MAP(target enter data map(always,to:Quad%SetDataPtr)) UMPIRE_DEVICE_POOL_ALLOC(Quad%GrpSetPtr) - TOMP(target enter data map(always,to:Quad%GrpSetPtr)) + TOMP_MAP(target enter data map(always,to:Quad%GrpSetPtr)) UMPIRE_DEVICE_POOL_ALLOC(Quad%AngSetPtr) - TOMP(target enter data map(always,to:Quad%AngSetPtr)) + TOMP_MAP(target enter data map(always,to:Quad%AngSetPtr)) ! Map Group Sets do gSetID=1,nGroupSets UMPIRE_DEVICE_POOL_ALLOC(Quad% GrpSetPtr(gSetID)% STotal) - TOMP(target enter data map(always,to:Quad% GrpSetPtr(gSetID)% STotal)) + TOMP_MAP(target enter data map(always,to:Quad% GrpSetPtr(gSetID)% STotal)) UMPIRE_DEVICE_POOL_ALLOC(Quad% GrpSetPtr(gSetID)% Sigt) - TOMP(target enter data map(always,to:Quad% GrpSetPtr(gSetID)% Sigt)) + TOMP_MAP(target enter data map(always,to:Quad% GrpSetPtr(gSetID)% Sigt)) enddo ! Map ZoneSets - TOMP(target enter data map(to:ZSet)) + TOMP_MAP(target enter data map(to:ZSet)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% AL) - TOMP(target enter data map(always,to:ZSet% AL)) + TOMP_MAP(target enter data map(always,to:ZSet% AL)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% AU) - TOMP(target enter data map(always,to:ZSet% AU)) + TOMP_MAP(target enter data map(always,to:ZSet% AU)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% nCornerSet) - TOMP(target enter data map(always,to:ZSet% nCornerSet)) + TOMP_MAP(target enter data map(always,to:ZSet% nCornerSet)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% nCornerBatch) - TOMP(target enter data map(always,to:ZSet% nCornerBatch)) + TOMP_MAP(target enter data map(always,to:ZSet% nCornerBatch)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% offset) - TOMP(target enter data map(always,to:ZSet% offset)) + TOMP_MAP(target enter data map(always,to:ZSet% offset)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% cornerList) - TOMP(target enter data map(always,to:ZSet% cornerList)) + TOMP_MAP(target enter data map(always,to:ZSet% cornerList)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% cornerMap) - TOMP(target enter data map(always,to:ZSet% cornerMap)) + TOMP_MAP(target enter data map(always,to:ZSet% cornerMap)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% zoneList) - TOMP(target enter data map(always,to:ZSet% zoneList)) + TOMP_MAP(target enter data map(always,to:ZSet% zoneList)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% cornerConverged) - TOMP(target enter data map(always,to:ZSet% cornerConverged)) + TOMP_MAP(target enter data map(always,to:ZSet% cornerConverged)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% Te) - TOMP(target enter data map(always,to:ZSet% Te)) + TOMP_MAP(target enter data map(always,to:ZSet% Te)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% TeOld) - TOMP(target enter data map(always,to:ZSet% TeOld)) + TOMP_MAP(target enter data map(always,to:ZSet% TeOld)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% delta) - TOMP(target enter data map(always,to:ZSet% delta)) + TOMP_MAP(target enter data map(always,to:ZSet% delta)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% sumT) - TOMP(target enter data map(always,to:ZSet% sumT)) + TOMP_MAP(target enter data map(always,to:ZSet% sumT)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% netRate) - TOMP(target enter data map(always,to:ZSet% netRate)) + TOMP_MAP(target enter data map(always,to:ZSet% netRate)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% dTCompton) - TOMP(target enter data map(always,to:ZSet% dTCompton)) + TOMP_MAP(target enter data map(always,to:ZSet% dTCompton)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% B) - TOMP(target enter data map(always,to:ZSet% B)) + TOMP_MAP(target enter data map(always,to:ZSet% B)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% dBdT) - TOMP(target enter data map(always,to:ZSet% dBdT)) + TOMP_MAP(target enter data map(always,to:ZSet% dBdT)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% Snu0) - TOMP(target enter data map(always,to:ZSet% Snu0)) + TOMP_MAP(target enter data map(always,to:ZSet% Snu0)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% dSnu0dT) - TOMP(target enter data map(always,to:ZSet% dSnu0dT)) + TOMP_MAP(target enter data map(always,to:ZSet% dSnu0dT)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% AD) - TOMP(target enter data map(always,to:ZSet% AD)) + TOMP_MAP(target enter data map(always,to:ZSet% AD)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% z) - TOMP(target enter data map(always,to:ZSet% z)) + TOMP_MAP(target enter data map(always,to:ZSet% z)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% fk2) - TOMP(target enter data map(always,to:ZSet% fk2)) + TOMP_MAP(target enter data map(always,to:ZSet% fk2)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% nI) - TOMP(target enter data map(always,to:ZSet% nI)) + TOMP_MAP(target enter data map(always,to:ZSet% nI)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% nS) - TOMP(target enter data map(always,to:ZSet% nS)) + TOMP_MAP(target enter data map(always,to:ZSet% nS)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% ex) - TOMP(target enter data map(always,to:ZSet% ex)) + TOMP_MAP(target enter data map(always,to:ZSet% ex)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% expPH) - TOMP(target enter data map(always,to:ZSet% expPH)) + TOMP_MAP(target enter data map(always,to:ZSet% expPH)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% comptonDeltaEr) - TOMP(target enter data map(always,to:ZSet% comptonDeltaEr)) + TOMP_MAP(target enter data map(always,to:ZSet% comptonDeltaEr)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% dComptonDT) - TOMP(target enter data map(always,to:ZSet% dComptonDT)) + TOMP_MAP(target enter data map(always,to:ZSet% dComptonDT)) UMPIRE_DEVICE_POOL_ALLOC(ZSet% comptonSe) - TOMP(target enter data map(always,to:ZSet% comptonSe)) + TOMP_MAP(target enter data map(always,to:ZSet% comptonSe)) ! Map Angle Sets @@ -245,80 +256,90 @@ subroutine initializeSets UMPIRE_DEVICE_POOL_ALLOC(Quad% AngSetPtr(aSetID)% nextZ) - TOMP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% nextZ)) + TOMP_MAP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% nextZ)) UMPIRE_DEVICE_POOL_ALLOC(Quad% AngSetPtr(aSetID)% nextC) - TOMP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% nextC)) + TOMP_MAP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% nextC)) UMPIRE_DEVICE_POOL_ALLOC(Quad% AngSetPtr(aSetID)% StartingDirection) - TOMP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% StartingDirection)) + TOMP_MAP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% StartingDirection)) UMPIRE_DEVICE_POOL_ALLOC(Quad% AngSetPtr(aSetID)% FinishingDirection) - TOMP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% FinishingDirection)) + TOMP_MAP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% FinishingDirection)) UMPIRE_DEVICE_POOL_ALLOC(Quad% AngSetPtr(aSetID)% Omega) - TOMP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% Omega)) + TOMP_MAP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% Omega)) UMPIRE_DEVICE_POOL_ALLOC(Quad% AngSetPtr(aSetID)% Weight) - TOMP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% Weight)) + TOMP_MAP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% Weight)) UMPIRE_DEVICE_POOL_ALLOC(Quad% AngSetPtr(aSetID)% numCycles) - TOMP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% numCycles)) + TOMP_MAP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% numCycles)) UMPIRE_DEVICE_POOL_ALLOC(Quad% AngSetPtr(aSetID)% cycleOffSet) - TOMP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% cycleOffSet)) + TOMP_MAP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% cycleOffSet)) UMPIRE_DEVICE_POOL_ALLOC(Quad% AngSetPtr(aSetID)% cycleList) - TOMP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% cycleList)) + TOMP_MAP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% cycleList)) UMPIRE_DEVICE_POOL_ALLOC(Quad% AngSetPtr(aSetID)% nHyperPlanes) - TOMP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% nHyperPlanes)) + TOMP_MAP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% nHyperPlanes)) UMPIRE_DEVICE_POOL_ALLOC(Quad% AngSetPtr(aSetID)% HypPlanePtr) - - TOMP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% HypPlanePtr)) + TOMP_MAP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% HypPlanePtr)) UMPIRE_DEVICE_POOL_ALLOC(Quad% AngSetPtr(aSetID)% BdyExitPtr) - TOMP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% BdyExitPtr)) + TOMP_MAP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% BdyExitPtr)) if ( aSetID <= nAngleSets ) then UMPIRE_DEVICE_POOL_ALLOC(Quad% AngSetPtr(aSetID)% AfpNorm) - TOMP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% AfpNorm)) + TOMP_MAP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% AfpNorm)) UMPIRE_DEVICE_POOL_ALLOC(Quad% AngSetPtr(aSetID)% AezNorm) - TOMP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% AezNorm)) + TOMP_MAP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% AezNorm)) UMPIRE_DEVICE_POOL_ALLOC(Quad% AngSetPtr(aSetID)% ANormSum) - TOMP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% ANormSum)) + TOMP_MAP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% ANormSum)) endif do angle=1,Quad% AngSetPtr(aSetID)% numAngles ! Unable to map this to UMPIRE device pool, causes a segfault. - TOMP(target enter data map(to: Quad% AngSetPtr(aSetID)% BdyExitPtr(angle)% bdyList)) + TOMP_MAP(target enter data map(to: Quad% AngSetPtr(aSetID)% BdyExitPtr(angle)% bdyList)) if ( .not. Quad% AngSetPtr(aSetID)% FinishingDirection(angle) ) then ! Unable to map these to UMPIRE device pool, causes a segfault or wrong answers. - TOMP(target enter data map(to:Quad% AngSetPtr(aSetID)% HypPlanePtr(angle)% zonesInPlane)) + + if (aSetID > nAngleSets) then + TOMP(target enter data map(to:Quad% AngSetPtr(aSetID)% HypPlanePtr(angle)% zonesInPlane)) + else + if ( sweepVersion == 0 ) then + TOMP(target enter data map(to:Quad% AngSetPtr(aSetID)% HypPlanePtr(angle)% zonesInPlane)) + else + TOMP(target enter data map(to:Quad% AngSetPtr(aSetID)% HypPlanePtr(angle)% cornersInPlane)) + endif + endif + TOMP(target enter data map(to:Quad% AngSetPtr(aSetID)% HypPlanePtr(angle)% hplane1)) TOMP(target enter data map(to:Quad% AngSetPtr(aSetID)% HypPlanePtr(angle)% hplane2)) TOMP(target enter data map(to:Quad% AngSetPtr(aSetID)% HypPlanePtr(angle)% ndone)) + TOMP(target enter data map(to:Quad% AngSetPtr(aSetID)% HypPlanePtr(angle)% interfaceList)) endif enddo if (Size% ndim == 2) then UMPIRE_DEVICE_POOL_ALLOC(Quad% AngSetPtr(aSetID)% angDerivFac) - TOMP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% angDerivFac)) + TOMP_MAP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% angDerivFac)) UMPIRE_DEVICE_POOL_ALLOC(Quad% AngSetPtr(aSetID)% quadTauW1) - TOMP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% quadTauW1)) + TOMP_MAP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% quadTauW1)) UMPIRE_DEVICE_POOL_ALLOC(Quad% AngSetPtr(aSetID)% quadTauW2) - TOMP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% quadTauW2)) + TOMP_MAP(target enter data map(always,to:Quad% AngSetPtr(aSetID)% quadTauW2)) endif @@ -326,149 +347,152 @@ subroutine initializeSets ! Geometry - TOMP(target enter data map(to:Geom)) + TOMP_MAP(target enter data map(to:Geom)) UMPIRE_DEVICE_POOL_ALLOC(Geom% Volume) - TOMP(target enter data map(always,to:Geom% Volume)) + TOMP_MAP(target enter data map(always,to:Geom% Volume)) UMPIRE_DEVICE_POOL_ALLOC(Geom% VolumeOld) - TOMP(target enter data map(always,to:Geom% VolumeOld)) + TOMP_MAP(target enter data map(always,to:Geom% VolumeOld)) UMPIRE_DEVICE_POOL_ALLOC(Geom% VolumeZone) - TOMP(target enter data map(always,to:Geom% VolumeZone)) + TOMP_MAP(target enter data map(always,to:Geom% VolumeZone)) UMPIRE_DEVICE_POOL_ALLOC(Geom% cOffSet) - TOMP(target enter data map(always,to:Geom% cOffSet)) + TOMP_MAP(target enter data map(always,to:Geom% cOffSet)) UMPIRE_DEVICE_POOL_ALLOC(Geom% numCorner) - TOMP(target enter data map(always,to:Geom% numCorner)) + TOMP_MAP(target enter data map(always,to:Geom% numCorner)) UMPIRE_DEVICE_POOL_ALLOC(Geom% CToZone) - TOMP(target enter data map(always,to:Geom% CToZone)) + TOMP_MAP(target enter data map(always,to:Geom% CToZone)) UMPIRE_DEVICE_POOL_ALLOC(Geom% corner1) - TOMP(target enter data map(always,to:Geom% corner1)) + TOMP_MAP(target enter data map(always,to:Geom% corner1)) UMPIRE_DEVICE_POOL_ALLOC(Geom% corner2) - TOMP(target enter data map(always,to:Geom% corner2)) + TOMP_MAP(target enter data map(always,to:Geom% corner2)) UMPIRE_DEVICE_POOL_ALLOC(Geom% zone1) - TOMP(target enter data map(always,to:Geom% zone1)) + TOMP_MAP(target enter data map(always,to:Geom% zone1)) UMPIRE_DEVICE_POOL_ALLOC(Geom% zone2) - TOMP(target enter data map(always,to:Geom% zone2)) + TOMP_MAP(target enter data map(always,to:Geom% zone2)) UMPIRE_DEVICE_POOL_ALLOC(Geom% cEZ) - TOMP(target enter data map(always,to:Geom% cEZ)) + TOMP_MAP(target enter data map(always,to:Geom% cEZ)) UMPIRE_DEVICE_POOL_ALLOC(Geom% cFP) - TOMP(target enter data map(always,to:Geom% cFP)) + TOMP_MAP(target enter data map(always,to:Geom% cFP)) UMPIRE_DEVICE_POOL_ALLOC(Geom% A_ez) - TOMP(target enter data map(always,to:Geom% A_ez)) + TOMP_MAP(target enter data map(always,to:Geom% A_ez)) UMPIRE_DEVICE_POOL_ALLOC(Geom% A_fp) - TOMP(target enter data map(always,to:Geom% A_fp)) + TOMP_MAP(target enter data map(always,to:Geom% A_fp)) if (Size% ndim == 2) then UMPIRE_DEVICE_POOL_ALLOC(Geom% Area) - TOMP(target enter data map(always,to:Geom% Area)) + TOMP_MAP(target enter data map(always,to:Geom% Area)) UMPIRE_DEVICE_POOL_ALLOC(Geom% RadiusEZ) - TOMP(target enter data map(always,to:Geom% RadiusEZ)) + TOMP_MAP(target enter data map(always,to:Geom% RadiusEZ)) UMPIRE_DEVICE_POOL_ALLOC(Geom% RadiusFP) - TOMP(target enter data map(always,to:Geom% RadiusFP)) + TOMP_MAP(target enter data map(always,to:Geom% RadiusFP)) elseif (Size% ndim == 3) then UMPIRE_DEVICE_POOL_ALLOC(Geom% nCFacesArray) - TOMP(target enter data map(always,to:Geom% nCFacesArray)) + TOMP_MAP(target enter data map(always,to:Geom% nCFacesArray)) endif ! Radiation Intensity - TOMP(target enter data map(to:Rad)) + TOMP_MAP(target enter data map(to:Rad)) UMPIRE_DEVICE_POOL_ALLOC(Rad% PhiTotal) - TOMP(target enter data map(always,to:Rad% PhiTotal)) + TOMP_MAP(target enter data map(always,to:Rad% PhiTotal)) UMPIRE_DEVICE_POOL_ALLOC(Rad% radEnergy) - TOMP(target enter data map(always,to:Rad% radEnergy)) + TOMP_MAP(target enter data map(always,to:Rad% radEnergy)) #if !defined(TETON_ENABLE_MINIAPP_BUILD) - TOMP(target enter data map(to:Compton)) + TOMP_MAP(target enter data map(to:Compton)) if (getComptonFlag(Compton) /= comptonType_None) then UMPIRE_DEVICE_POOL_ALLOC(Compton% gamMean) - TOMP(target enter data map(always,to:Compton% gamMean)) + TOMP_MAP(target enter data map(always,to:Compton% gamMean)) UMPIRE_DEVICE_POOL_ALLOC(Compton% gamSqdDGam) - TOMP(target enter data map(always,to:Compton% gamSqdDGam)) + TOMP_MAP(target enter data map(always,to:Compton% gamSqdDGam)) UMPIRE_DEVICE_POOL_ALLOC(Compton% gamCubedDGam) - TOMP(target enter data map(always,to:Compton% gamCubedDGam)) + TOMP_MAP(target enter data map(always,to:Compton% gamCubedDGam)) UMPIRE_DEVICE_POOL_ALLOC(Compton% gamD) - TOMP(target enter data map(always,to:Compton% gamD)) + TOMP_MAP(target enter data map(always,to:Compton% gamD)) endif #endif ! GTA - TOMP(target enter data map(to:GTA)) + TOMP_MAP(target enter data map(to:GTA)) UMPIRE_DEVICE_POOL_ALLOC(GTA% GreySource) - TOMP(target enter data map(always,to:GTA% GreySource)) + TOMP_MAP(target enter data map(always,to:GTA% GreySource)) UMPIRE_DEVICE_POOL_ALLOC(GTA% GreyCorrection) - TOMP(target enter data map(always,to:GTA% GreyCorrection)) + TOMP_MAP(target enter data map(always,to:GTA% GreyCorrection)) UMPIRE_DEVICE_POOL_ALLOC(GTA% Chi) - TOMP(target enter data map(always,to:GTA% Chi)) + TOMP_MAP(target enter data map(always,to:GTA% Chi)) if (Size%useNewGTASolver) then UMPIRE_DEVICE_POOL_ALLOC(GTA% TT) - TOMP(target enter data map(always,to:GTA% TT)) + TOMP_MAP(target enter data map(always,to:GTA% TT)) UMPIRE_DEVICE_POOL_ALLOC(GTA% Pvv) - TOMP(target enter data map(always,to:GTA% Pvv)) + TOMP_MAP(target enter data map(always,to:GTA% Pvv)) UMPIRE_DEVICE_POOL_ALLOC(GTA% GreySigTotal) - TOMP(target enter data map(always,to:GTA% GreySigTotal)) + TOMP_MAP(target enter data map(always,to:GTA% GreySigTotal)) UMPIRE_DEVICE_POOL_ALLOC(GTA% GreySigScat) - TOMP(target enter data map(always,to:GTA% GreySigScat)) + TOMP_MAP(target enter data map(always,to:GTA% GreySigScat)) UMPIRE_DEVICE_POOL_ALLOC(GTA% GreySigScatVol) - TOMP(target enter data map(always,to:GTA% GreySigScatVol)) + TOMP_MAP(target enter data map(always,to:GTA% GreySigScatVol)) UMPIRE_DEVICE_POOL_ALLOC(GTA% GreySigtInv) - TOMP(target enter data map(always,to:GTA% GreySigtInv)) + TOMP_MAP(target enter data map(always,to:GTA% GreySigtInv)) UMPIRE_DEVICE_POOL_ALLOC(GTA% PhiInc) - TOMP(target enter data map(always,to:GTA% PhiInc)) + TOMP_MAP(target enter data map(always,to:GTA% PhiInc)) + + UMPIRE_DEVICE_POOL_ALLOC(GTA% Sscat) + TOMP_MAP(target enter data map(always,to:GTA% Sscat)) UMPIRE_DEVICE_POOL_ALLOC(GTA% Q) - TOMP(target enter data map(always,to:GTA% Q)) + TOMP_MAP(target enter data map(always,to:GTA% Q)) UMPIRE_DEVICE_POOL_ALLOC(GTA% TsaSource) - TOMP(target enter data map(always,to:GTA% TsaSource)) + TOMP_MAP(target enter data map(always,to:GTA% TsaSource)) UMPIRE_DEVICE_POOL_ALLOC(GTA% AfpNorm) - TOMP(target enter data map(always,to:GTA% AfpNorm)) + TOMP_MAP(target enter data map(always,to:GTA% AfpNorm)) UMPIRE_DEVICE_POOL_ALLOC(GTA% AezNorm) - TOMP(target enter data map(always,to:GTA% AezNorm)) + TOMP_MAP(target enter data map(always,to:GTA% AezNorm)) UMPIRE_DEVICE_POOL_ALLOC(GTA% ANormSum) - TOMP(target enter data map(always,to:GTA% ANormSum)) + TOMP_MAP(target enter data map(always,to:GTA% ANormSum)) if (Size% ndim == 2) then UMPIRE_DEVICE_POOL_ALLOC(GTA% Tvv) - TOMP(target enter data map(always,to:GTA% Tvv)) + TOMP_MAP(target enter data map(always,to:GTA% Tvv)) endif endif @@ -476,10 +500,8 @@ subroutine initializeSets ! Initialize communication handles for persistent communicators -! QUESTION - We're passing in angle set IDs, but inside the initcomm the -! parameter is 'cSetID'. Should this be a loop over comm sets or angle sets?? -black27 - do aSetID=1,nAngleSets+nGTASets - call initcomm(aSetID) + do cSetID=1,nCommSets+nGTASets + call initcomm(cSetID) enddo ! Begin Initialize Phase @@ -503,11 +525,8 @@ subroutine initializeSets ! Map PsiB back to the CPU if (Size%useGPU) then - do cSetID=1,nCommSets - CSet => getCommSetData(Quad, cSetID) - do setID=CSet% set1,CSet% set2 - TOMP(target update from(Quad% SetDataPtr(setID)% PsiB)) - enddo + do setID=1,nSets + TOMP_UPDATE(target update from(Quad% SetDataPtr(setID)% PsiB)) enddo endif @@ -526,7 +545,7 @@ subroutine initializeSets if (Size%useGPU) then do setID=1,nSets UMPIRE_DEVICE_POOL_ALLOC(Quad% SetDataPtr(setID)% AngleOrder) - TOMP(target enter data map(always,to:Quad% SetDataPtr(setID)% AngleOrder)) + TOMP_MAP(target enter data map(always,to:Quad% SetDataPtr(setID)% AngleOrder)) enddo endif @@ -547,51 +566,51 @@ subroutine initializeSets ! Material if ( Size% useGPU ) then - TOMP(target enter data map(to:Mat)) + TOMP_MAP(target enter data map(to:Mat)) UMPIRE_DEVICE_POOL_ALLOC(Mat% Tec) - TOMP(target enter data map(always,to:Mat% Tec)) + TOMP_MAP(target enter data map(always,to:Mat% Tec)) UMPIRE_DEVICE_POOL_ALLOC(Mat% Tecn) - TOMP(target enter data map(always,to:Mat% Tecn)) + TOMP_MAP(target enter data map(always,to:Mat% Tecn)) UMPIRE_DEVICE_POOL_ALLOC(Mat% denec) - TOMP(target enter data map(always,to:Mat% denec)) + TOMP_MAP(target enter data map(always,to:Mat% denec)) UMPIRE_DEVICE_POOL_ALLOC(Mat% cve) - TOMP(target enter data map(always,to:Mat% cve)) + TOMP_MAP(target enter data map(always,to:Mat% cve)) UMPIRE_DEVICE_POOL_ALLOC(Mat% rho) - TOMP(target enter data map(always,to:Mat% rho)) + TOMP_MAP(target enter data map(always,to:Mat% rho)) UMPIRE_DEVICE_POOL_ALLOC(Mat% nez) - TOMP(target enter data map(always,to:Mat% nez)) + TOMP_MAP(target enter data map(always,to:Mat% nez)) UMPIRE_DEVICE_POOL_ALLOC(Mat% stimComptonMult) - TOMP(target enter data map(always,to:Mat% stimComptonMult)) + TOMP_MAP(target enter data map(always,to:Mat% stimComptonMult)) UMPIRE_DEVICE_POOL_ALLOC(Mat% Siga) - TOMP(target enter data map(always,to:Mat% Siga)) + TOMP_MAP(target enter data map(always,to:Mat% Siga)) UMPIRE_DEVICE_POOL_ALLOC(Mat% Sigs) - TOMP(target enter data map(always,to:Mat% Sigs)) + TOMP_MAP(target enter data map(always,to:Mat% Sigs)) UMPIRE_DEVICE_POOL_ALLOC(Mat% Eta) - TOMP(target enter data map(always,to:Mat% Eta)) + TOMP_MAP(target enter data map(always,to:Mat% Eta)) UMPIRE_DEVICE_POOL_ALLOC(Mat% EmissionRate) - TOMP(target enter data map(always,to:Mat% EmissionRate)) + TOMP_MAP(target enter data map(always,to:Mat% EmissionRate)) UMPIRE_DEVICE_POOL_ALLOC(Mat% SMatEff) - TOMP(target enter data map(always,to:Mat% SMatEff)) + TOMP_MAP(target enter data map(always,to:Mat% SMatEff)) UMPIRE_DEVICE_POOL_ALLOC(Mat% PowerEmitted) - TOMP(target enter data map(always,to:Mat% PowerEmitted)) + TOMP_MAP(target enter data map(always,to:Mat% PowerEmitted)) UMPIRE_DEVICE_POOL_ALLOC(Mat% PowerCompton) - TOMP(target enter data map(always,to:Mat% PowerCompton)) + TOMP_MAP(target enter data map(always,to:Mat% PowerCompton)) UMPIRE_DEVICE_POOL_ALLOC(Mat% nonLinearIterations) - TOMP(target enter data map(always,to:Mat% nonLinearIterations)) + TOMP_MAP(target enter data map(always,to:Mat% nonLinearIterations)) endif @@ -601,24 +620,24 @@ subroutine initializeSets do setID=nSets+1,nSets+nGTASets UMPIRE_DEVICE_POOL_ALLOC(Quad% SetDataPtr(setID)% AngleOrder) - TOMP(target enter data map(always,to:Quad% SetDataPtr(setID)% AngleOrder)) + TOMP_MAP(target enter data map(always,to:Quad% SetDataPtr(setID)% AngleOrder)) UMPIRE_DEVICE_POOL_ALLOC(Quad% SetDataPtr(setID)% tPsi) - TOMP(target enter data map(always,to:Quad% SetDataPtr(setID)% tPsi)) + TOMP_MAP(target enter data map(always,to:Quad% SetDataPtr(setID)% tPsi)) UMPIRE_DEVICE_POOL_ALLOC(Quad% SetDataPtr(setID)% pInc) - TOMP(target enter data map(always,to:Quad% SetDataPtr(setID)% pInc)) + TOMP_MAP(target enter data map(always,to:Quad% SetDataPtr(setID)% pInc)) UMPIRE_DEVICE_POOL_ALLOC(Quad% SetDataPtr(setID)% src) - TOMP(target enter data map(always,to:Quad% SetDataPtr(setID)% src)) + TOMP_MAP(target enter data map(always,to:Quad% SetDataPtr(setID)% src)) if (Size% ndim == 2) then UMPIRE_DEVICE_POOL_ALLOC(Quad% SetDataPtr(setID)% tPsiM) - TOMP(target enter data map(always,to:Quad% SetDataPtr(setID)% tPsiM)) + TOMP_MAP(target enter data map(always,to:Quad% SetDataPtr(setID)% tPsiM)) UMPIRE_DEVICE_POOL_ALLOC(Quad% SetDataPtr(setID)% tInc) - TOMP(target enter data map(always,to:Quad% SetDataPtr(setID)% tInc)) + TOMP_MAP(target enter data map(always,to:Quad% SetDataPtr(setID)% tInc)) endif enddo diff --git a/src/teton/control/initializeZones_OMPOL.F90 b/src/teton/control/initializeZones_OMPOL.F90 index e59bc5a..ef72693 100644 --- a/src/teton/control/initializeZones_OMPOL.F90 +++ b/src/teton/control/initializeZones_OMPOL.F90 @@ -53,33 +53,58 @@ subroutine initializeZones if ( Size% useGPU ) then +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) & + !$acc& vector_length(omp_device_team_thread_limit) +#else TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) TOMPC(shared(nZoneSets, ZSet, Geom, Rad)) +#endif do zSetID=1,nZoneSets +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector +#else !$omp parallel do default(none) schedule(dynamic) & !$omp& shared(zSetID, ZSet, Geom, Rad) +#endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) ZSet% sumT(c) = Geom% Volume(c)*sum( Rad% PhiTotal(:,c) ) enddo +#ifndef TETON_ENABLE_OPENACC !$omp end parallel do +#endif enddo +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else TOMP(end target teams distribute) +#endif +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) & + !$acc& vector_length(omp_device_team_thread_limit) & + !$acc& private(nCorner, c0) +#else TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) TOMPC(shared(nZoneSets, Geom, Rad, ZSet)&) TOMPC(private(nCorner, c0)) +#endif do zSetID=1,nZoneSets +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector private(nCorner, c0) +#else !$omp parallel do default(none) schedule(dynamic) & !$omp& shared(zSetID, Geom, Rad, ZSet) & !$omp& private(c0, nCorner) +#endif do zone=Geom% zone1(zSetID),Geom% zone2(zSetID) nCorner = Geom% numCorner(zone) @@ -90,11 +115,17 @@ subroutine initializeZones Rad% radEnergy(zone) = Rad% radEnergy(zone) + ZSet% sumT(c0+c) enddo enddo +#ifndef TETON_ENABLE_OPENACC !$omp end parallel do +#endif enddo +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else TOMP(end target teams distribute) +#endif TOMP(target update from(Rad% radEnergy)) diff --git a/src/teton/control/setBoundarySources.F90 b/src/teton/control/setBoundarySources.F90 index be1a468..7a3ac4c 100644 --- a/src/teton/control/setBoundarySources.F90 +++ b/src/teton/control/setBoundarySources.F90 @@ -57,8 +57,8 @@ subroutine setBoundarySources(setID) ! Some temporary variables used for beam boundary conditions: type(AngleSet), pointer :: ASet - integer :: polarAngleValue - real :: refPolarValue + real(adqt) :: polarAngleValue + real(adqt) :: refPolarValue ! Constants diff --git a/src/teton/driver/makeUnstructuredBox.cc b/src/teton/driver/makeUnstructuredBox.cc deleted file mode 100644 index c683482..0000000 --- a/src/teton/driver/makeUnstructuredBox.cc +++ /dev/null @@ -1,281 +0,0 @@ -#include "mfem.hpp" -#include -#include -#include - -mfem::Mesh makeUnstructBoxMesh(int zoneSplits) -{ - int red = 1; - - int top = 1; - int bottom = 2; - int left = 3; - int right = 4; - int front = 5; - int back = 6; - - /* - This makes a mesh that looks like this. - On the outside, all nodes are spaced evenly. - The g node is meant to make box zone 5 a square. - Point f is somewhere. - - - - i j k l - *------*-*-* - | |3|5| - | 2 | *-* h - | |/ g| - *------* 4 | - |e f|\ | - | | \ | - | | \| - | 0 | 1 *d - | | | - *------*---* - a b c - */ - - int dim = 3; - int num2DVert = 12; - int numVert = num2DVert * 3; - int numElem = 6 * 2; - int numBdrElem = 6 * 2; - mfem::Mesh mesh(dim, numVert, numElem, numBdrElem, dim); - - double width = 1.0; - - // TODO: Fix to be "width/2" so that this is a cube. But - // maybe it doesn't matter. - for (int i = 0; i < 3; ++i) - { - // a - mesh.AddVertex(0.0, 0.0, i * width / 3.0); - // b - mesh.AddVertex(width / 2, 0.0, i * width / 3.0); - // c - mesh.AddVertex(width, 0.0, i * width / 3.0); - // d - mesh.AddVertex(width, width / 3.0, i * width / 3.0); - // e - mesh.AddVertex(0.0, width / 2.0, i * width / 3.0); - // f - mesh.AddVertex(width / 3.0, width / 2.0, i * width / 3.0); - // g - mesh.AddVertex(2.0 * width / 3.0, 2.0 * width / 3.0, i * width / 3.0); - // h - mesh.AddVertex(width, 2.0 * width / 3.0, i * width / 3.0); - // i - mesh.AddVertex(0.0, width, i * width / 3.0); - // j - mesh.AddVertex(width / 3.0, width, i * width / 3.0); - // k - mesh.AddVertex(2.0 * width / 3.0, width, i * width / 3.0); - // l - mesh.AddVertex(width, width, i * width / 3.0); - } - - int a = 0; - int b = 1; - int c = 2; - int d = 3; - int e = 4; - int f = 5; - int g = 6; - int h = 7; - int i = 8; - int j = 9; - int k = 10; - int l = 11; - - // MFEM for hexes and quads (and a few others) shares the VTK file format. See - // https://kitware.github.io/vtk-examples/site/VTKFileFormats/ - int B, T; - B = 0 * num2DVert; // bottom layer - T = 1 * num2DVert; // top layer - mesh.AddHex(a + B, b + B, f + B, e + B, a + T, b + T, f + T, e + T, red); - mesh.AddHex(b + B, c + B, d + B, f + B, b + T, c + T, d + T, f + T, red); - mesh.AddHex(e + B, f + B, j + B, i + B, e + T, f + T, j + T, i + T, red); - mesh.AddHex(f + B, g + B, k + B, j + B, f + T, g + T, k + T, j + T, red); - mesh.AddHex(f + B, d + B, h + B, g + B, f + T, d + T, h + T, g + T, red); - mesh.AddHex(g + B, h + B, l + B, k + B, g + T, h + T, l + T, k + T, red); - B = 1 * num2DVert; - T = 2 * num2DVert; - mesh.AddHex(a + B, b + B, f + B, e + B, a + T, b + T, f + T, e + T, red); - mesh.AddHex(b + B, c + B, d + B, f + B, b + T, c + T, d + T, f + T, red); - mesh.AddHex(e + B, f + B, j + B, i + B, e + T, f + T, j + T, i + T, red); - mesh.AddHex(f + B, g + B, k + B, j + B, f + T, g + T, k + T, j + T, red); - mesh.AddHex(f + B, d + B, h + B, g + B, f + T, d + T, h + T, g + T, red); - mesh.AddHex(g + B, h + B, l + B, k + B, g + T, h + T, l + T, k + T, red); - - // Should be counter-clockwise from the outside of the mesh. - B = 0 * num2DVert; - mesh.AddBdrQuad(a + B, e + B, f + B, b + B, bottom); - mesh.AddBdrQuad(b + B, f + B, d + B, c + B, bottom); - mesh.AddBdrQuad(e + B, i + B, j + B, f + B, bottom); - mesh.AddBdrQuad(f + B, j + B, k + B, g + B, bottom); - mesh.AddBdrQuad(f + B, g + B, h + B, d + B, bottom); - mesh.AddBdrQuad(g + B, k + B, l + B, h + B, bottom); - - B = 2 * num2DVert; - mesh.AddBdrQuad(a + B, b + B, f + B, e + B, top); - mesh.AddBdrQuad(b + B, c + B, d + B, f + B, top); - mesh.AddBdrQuad(e + B, f + B, j + B, i + B, top); - mesh.AddBdrQuad(f + B, g + B, k + B, j + B, top); - mesh.AddBdrQuad(f + B, d + B, h + B, g + B, top); - mesh.AddBdrQuad(g + B, h + B, l + B, k + B, top); - - T = 1 * num2DVert; - B = 0 * num2DVert; - mesh.AddBdrQuad(a + B, a + T, e + T, e + B, left); - mesh.AddBdrQuad(e + B, e + T, i + T, i + B, left); - T = 2 * num2DVert; - B = 1 * num2DVert; - mesh.AddBdrQuad(a + B, a + T, e + T, e + B, left); - mesh.AddBdrQuad(e + B, e + T, i + T, i + B, left); - - T = 1 * num2DVert; - B = 0 * num2DVert; - mesh.AddBdrQuad(b + B, b + T, a + T, a + B, front); - mesh.AddBdrQuad(c + B, c + T, b + T, b + B, front); - T = 2 * num2DVert; - B = 1 * num2DVert; - mesh.AddBdrQuad(b + B, b + T, a + T, a + B, front); - mesh.AddBdrQuad(c + B, c + T, b + T, b + B, front); - - T = 1 * num2DVert; - B = 0 * num2DVert; - mesh.AddBdrQuad(d + B, d + T, c + T, c + B, right); - mesh.AddBdrQuad(h + B, h + T, d + T, d + B, right); - mesh.AddBdrQuad(l + B, l + T, h + T, h + B, right); - T = 2 * num2DVert; - B = 1 * num2DVert; - mesh.AddBdrQuad(d + B, d + T, c + T, c + B, right); - mesh.AddBdrQuad(h + B, h + T, d + T, d + B, right); - mesh.AddBdrQuad(l + B, l + T, h + T, h + B, right); - - T = 1 * num2DVert; - B = 0 * num2DVert; - mesh.AddBdrQuad(k + B, k + T, l + T, l + B, right); - mesh.AddBdrQuad(j + B, j + T, k + T, k + B, right); - mesh.AddBdrQuad(i + B, i + T, j + T, j + B, right); - T = 2 * num2DVert; - B = 1 * num2DVert; - mesh.AddBdrQuad(k + B, k + T, l + T, l + B, right); - mesh.AddBdrQuad(j + B, j + T, k + T, k + B, right); - mesh.AddBdrQuad(i + B, i + T, j + T, j + B, right); - - // Build internal structures - mesh.FinalizeTopology(); - - if (zoneSplits > 0) - { - // Refine the initial mesh with this many zones in each - // dimension in each element above. - int ref_type = mfem::BasisType::ClosedUniform; - mesh = mfem::Mesh::MakeRefined(mesh, zoneSplits, ref_type); - } - - if (int wrong = mesh.CheckElementOrientation(true) > 0) - { - std::cout << "There were " << wrong << " 3D mesh elements with the wrong orientation.\n"; - } - if (int wrong = mesh.CheckBdrElementOrientation(true) > 0) - { - std::cout << "There were " << wrong << " 3D mesh boundary elements with the wrong orientation.\n"; - } - - // Sort the grid for better locality - mfem::Array ordering; - mesh.GetHilbertElementOrdering(ordering); - mesh.ReorderElements(ordering); - - return mesh; -} - -void twist(const mfem::Vector &in, mfem::Vector &p) -{ - const double x = in[0] - 0.5; - const double y = in[1] - 0.5; - const double z = in[2] - 0.5; - const double r = std::hypot(x, y); - const double theta = std::atan2(y, x); - p[0] = r * std::cos(theta + 0.2 * M_PI * z) + 0.5; - p[1] = r * std::sin(theta + 0.2 * M_PI * z) + 0.5; - p[2] = z + 0.5; -} - -int main(int argc, char *argv[]) -{ - MPI_Comm comm = MPI_COMM_WORLD; - int request = MPI_THREAD_SINGLE; - int provided = 0; - int claimed = 0; - MPI_Init_thread(&argc, &argv, request, &provided); - int myRank = 0; - int mySize = 0; - - MPI_Comm_rank(comm, &myRank); - MPI_Comm_size(comm, &mySize); - - int opt; - - int zoneSplit = 0; - std::string outputFile = "unstructBox3D.mesh"; - bool dumpViz = false; - - // put ':' in the starting of the string so that program can - // distinguish between '?' and ':' - while ((opt = getopt(argc, argv, "r:o:")) != -1) - { - switch (opt) - { - case 'r': - std::cout << "Refining mesh by splitting each cell edge into " << optarg << " sections..." << std::endl; - zoneSplit = atoi(optarg); - break; - case 'o': - outputFile = std::string(optarg); - break; - case 'v': - dumpViz = true; - break; - case '?': - std::cerr << "Unknown option: " << optopt << std::endl; - return 1; - } - } - - mfem::Mesh mesh3D = makeUnstructBoxMesh(zoneSplit); - - // Make it harder. Only do once on the fully refined mesh. - //mesh3D.Transform(twist); - - if (int wrong = mesh3D.CheckElementOrientation(true) > 0) - { - std::cout << "There were " << wrong << " 3D mesh elements with the wrong orientation after reordering.\n"; - } - if (int wrong = mesh3D.CheckBdrElementOrientation(true) > 0) - { - std::cout << "There were " << wrong - << " 3D mesh boundary elements with the wrong orientation after reordering.\n"; - } - - if (dumpViz) - { - mfem::VisItDataCollection vdc(outputFile, &mesh3D); - vdc.Save(); - } - - // Save mesh to file - std::ofstream mesh_ofs(outputFile); - mesh_ofs.precision(16); - mesh3D.Print(mesh_ofs); - mesh_ofs.close(); - - // Show info about the mesh. - mesh3D.PrintInfo(); - - return 0; -} diff --git a/src/teton/driver/test_driver.cc b/src/teton/driver/test_driver.cc index 2031eac..6ad0d8b 100644 --- a/src/teton/driver/test_driver.cc +++ b/src/teton/driver/test_driver.cc @@ -21,10 +21,6 @@ extern "C" void xl__trce(int, siginfo_t *, void *); #include #endif -#if defined(TETON_ENABLE_OPENMP) -#include "omp.h" -#endif - #if defined(TETON_ENABLE_UMPIRE) #include "umpire/Umpire.hpp" #include "umpire/strategy/QuickPool.hpp" @@ -49,10 +45,6 @@ extern "C" void xl__trce(int, siginfo_t *, void *); #include "conduit/conduit_relay.hpp" #include "conduit/conduit_relay_mpi_io_blueprint.hpp" -#if defined(TETON_ENABLE_MFEM) -#include "mfem.hpp" -#endif - #if defined(TETON_ENABLE_CALIPER) #include "adiak.hpp" #include "caliper/cali-manager.h" @@ -64,11 +56,6 @@ extern "C" void xl__trce(int, siginfo_t *, void *); #define CALI_CXX_MARK_SCOPE(label) #endif -// Determine whether Conduit has the tiled function. -#if CONDUIT_VERSION_MAJOR >= 0 && CONDUIT_VERSION_MINOR >= 8 && CONDUIT_VERSION_PATCH >= 9 -#define TETON_CONDUIT_HAS_TILED_FUNCTION -#endif - // Utility function, check if string ends with another string. bool endsWith(std::string const &fullString, std::string const &ending) { @@ -167,30 +154,19 @@ class TetonDriver int processArguments(int argc, char *argv[]); int execute(); void finalize(); + void print_umpire_usage(); private: void printUsage(const std::string &argv0) const; void startCaliper(const std::string &label); - void initThreads(); - void initGPU(); - void writeStartSummary(unsigned int ndims, - unsigned long local_num_corners, - unsigned long num_corners, - unsigned long &num_unknowns, - unsigned long &local_num_unknowns) const; - int readMeshMFEM(); + void writeStartSummary(unsigned int ndims, unsigned long num_corners, unsigned long &num_unknowns) const; void initializeBlueprintFields(int nelem, int numPolar, int numAzimuthal, int numGroups); - void initializeBoundaryConditionsMFEM(); - void updateBoundaryConnectivityMFEM(); void readConduitInputs(); void setOptions(); void verifyMesh(); - void cycleLoop(double &dtrad, double &timerad, unsigned long num_unknowns); + void cycleLoop(double &dtrad, double &timerad); void buildBlueprintTiledMesh(); - void writeEndSummary(double end_time, - double start_time, - unsigned long num_unknowns, - unsigned long local_num_unknowns); + void writeEndSummary(double end_time, double start_time, unsigned long num_unknowns); void release(); @@ -198,28 +174,20 @@ class TetonDriver int return_status{0}; int myRank{0}; int mySize{0}; - unsigned int cycles{0}; + int cycles{0}; int numPhaseAngleSets{0}; int useUmpire{2}; - int numOmpMaxThreads{-1}; // Max number of CPU threads to use If -1, use value from omp_get_max_threads() + int numOmpMaxThreads{1}; // Max number of CPU threads to use. double fixedDT{0.0}; bool dumpViz{false}; - double energy_check_tolerance{1.0e-9}; + bool partition{false}; + double energy_check_tolerance{1.0e-6}; int input_sanitizer_level{1}; unsigned int benchmarkProblem{0}; int numPolarUser{-1}; int numAzimuthalUser{-1}; int numGroupsUser{0}; -#if defined(TETON_ENABLE_MFEM) - int numSerialRefinementFactor{2}; - int numParallelRefinementFactor{2}; - int numSerialRefinementLevels{0}; - int numParallelRefinementLevels{0}; - mfem::Mesh *mesh{nullptr}; - mfem::ParMesh *pmesh{nullptr}; - mfem::ConduitDataCollection *conduit_data_collec{nullptr}; -#endif // MPI MPI_Comm comm{MPI_COMM_WORLD}; @@ -229,27 +197,23 @@ class TetonDriver bool useCUDASweep{false}; int gta_kernel{1}; int sweep_kernel{-1}; + int sweep_numhyperdomains{-1}; + int gta_numhyperdomains{-1}; std::string scattering_kernel{}; ::Teton::Teton myTetonObject{}; std::string inputPath{"."}; std::string outputPath{"."}; - std::string label{}; + std::string label{"unnamed"}; std::string colorFile{}; - std::string caliper_config - { -#if defined(TETON_ENABLE_CUDA) - "runtime-report,nvprof" -#else - "runtime-report" -#endif - }; + std::string caliper_config; std::string meshOrdering{"kdtree"}; #if defined(TETON_ENABLE_CALIPER) cali::ConfigManager mgr{}; #endif int blueprintMesh{0}; int dims[3]{10, 10, 10}; //!< Number of cells in blueprint mesh. + unsigned int total_num_flux_iterations{0}; }; //--------------------------------------------------------------------------- @@ -266,6 +230,14 @@ void TetonDriver::initialize() mgr.set_default_parameter("aggregate_across_ranks", "true"); mgr.set_default_parameter("calc.inclusive", "true"); mgr.set_default_parameter("main_thread_only", "true"); + mgr.add("runtime-report"); + +#if defined(TETON_ENABLE_CUDA) + mgr.add("nvtx"); +#elif defined(TETON_ENABLE_HIP) + //mgr.add("roxtx"); +#endif + #endif #ifdef SIGSEGV @@ -346,53 +318,42 @@ int TetonDriver::processArguments(int argc, char *argv[]) while (1) { - static struct option long_options[] = { - {"apply_label", no_argument, 0, 'l'}, - {"benchmark_problem", required_argument, 0, 'b'}, - {"blueprint", required_argument, 0, 'B'}, - {"dims", required_argument, 0, 'd'}, - {"caliper", required_argument, 0, 'p'}, - {"input_sanitizer_level", required_argument, 0, 'y'}, - {"handler", no_argument, 0, 'H'}, - {"help", no_argument, 0, 'h'}, - {"input_path", required_argument, 0, 'i'}, - {"num_cycles", required_argument, 0, 'c'}, - {"dt", required_argument, 0, 'D'}, - {"num_phase_space_sets", required_argument, 0, 's'}, - {"num_threads", required_argument, 0, 't'}, - {"output_path", required_argument, 0, 'o'}, - {"umpire_mode", required_argument, 0, 'u'}, - {"mesh_ordering", required_argument, 0, 'M'}, - {"use_device_aware_mpi", no_argument, 0, 'm'}, - {"use_cuda_sweep", no_argument, 0, 'e'}, - {"use_gpu_kernels", no_argument, 0, 'g'}, - {"gta_kernel", required_argument, 0, 'n'}, - {"scattering_kernel", required_argument, 0, 'k'}, - {"sweep_kernel", required_argument, 0, 'S'}, - {"verbose", required_argument, 0, 'v'}, - {"write_viz_file", no_argument, 0, 'V'}, -#if defined(TETON_ENABLE_MFEM) - {"serial_refinement_levels", required_argument, 0, 'r'}, - {"parallel_refinement_levels", required_argument, 0, 'z'}, - {"serial_refinement_factor", required_argument, 0, 'R'}, - {"parallel_refinement_factor", required_argument, 0, 'Z'}, - {"color_file", required_argument, 0, 'C'}, -#endif - {"num_Polar", required_argument, 0, 'P'}, - {"num_Azimuthal", required_argument, 0, 'A'}, - {"num_Groups", required_argument, 0, 'G'}, - {0, 0, 0, 0} - }; + static struct option long_options[] = {{"apply_label", required_argument, 0, 'l'}, + {"benchmark_problem", required_argument, 0, 'b'}, + {"blueprint", required_argument, 0, 'B'}, + {"dims", required_argument, 0, 'd'}, + {"caliper", required_argument, 0, 'p'}, + {"input_sanitizer_level", required_argument, 0, 'y'}, + {"handler", no_argument, 0, 'H'}, + {"help", no_argument, 0, 'h'}, + {"input_path", required_argument, 0, 'i'}, + {"num_cycles", required_argument, 0, 'c'}, + {"dt", required_argument, 0, 'D'}, + {"num_phase_space_sets", required_argument, 0, 's'}, + {"num_threads", required_argument, 0, 't'}, + {"output_path", required_argument, 0, 'o'}, + {"umpire_mode", required_argument, 0, 'u'}, + {"mesh_ordering", required_argument, 0, 'M'}, + {"use_device_aware_mpi", no_argument, 0, 'm'}, + {"use_cuda_sweep", no_argument, 0, 'e'}, + {"use_gpu_kernels", no_argument, 0, 'g'}, + {"gta_kernel", required_argument, 0, 'n'}, + {"scattering_kernel", required_argument, 0, 'k'}, + {"sweep_kernel", required_argument, 0, 'S'}, + {"sweep_numhyperdomains", required_argument, 0, 'Z'}, + {"gta_numhyperdomains", required_argument, 0, 'z'}, + {"verbose", required_argument, 0, 'v'}, + {"write_viz_file", no_argument, 0, 'V'}, + {"partition", no_argument, 0, 'x'}, + {"num_Polar", required_argument, 0, 'P'}, + {"num_Azimuthal", required_argument, 0, 'A'}, + {"num_Groups", required_argument, 0, 'G'}, + {0, 0, 0, 0}}; /* getopt_long stores the option index here. */ int option_index = 0; -#if defined(TETON_ENABLE_MFEM) - auto optString = "A:B:b:c:D:d:eG:gHhi:k:l:M:mn:o:P:p:s:S:t:u:Vv:y:" // Base options - "C:R:r:z:Z:"; // MFEM-only options -#else - auto optString = "A:B:b:c:D:d:eG:gHhi:k:l:M:mn:o:P:p:s:S:t:u:Vv:y:"; // Base options -#endif + auto optString = "A:B:b:c:D:d:eG:gHhi:k:l:M:mn:o:P:p:s:S:t:u:Vv:xy:"; int opt = getopt_long(argc, argv, optString, long_options, &option_index); @@ -509,6 +470,22 @@ int TetonDriver::processArguments(int argc, char *argv[]) << ". (0=zone sweep, 1=corner sweep)" << std::endl; } break; + case 'Z': + sweep_numhyperdomains = atoi(optarg); + if (myRank == 0) + { + std::cout << "Teton driver: setting sweep number of hyper-domains " << sweep_numhyperdomains + << ". (0=automatic setting, >=1 number of sweep hyper-domains)" << std::endl; + } + break; + case 'z': + gta_numhyperdomains = atoi(optarg); + if (myRank == 0) + { + std::cout << "Teton driver: setting new GTA number of hyper-domains " << gta_numhyperdomains + << ". (0=automatic setting, >=1 number of new GTA hyper-domains)" << std::endl; + } + break; case 'o': outputPath = std::string(optarg); break; @@ -543,71 +520,26 @@ int TetonDriver::processArguments(int argc, char *argv[]) std::cout << "Teton driver: number of phase-angle sets to create: " << numPhaseAngleSets << std::endl; } break; -#if defined(TETON_ENABLE_MFEM) - case 'r': - numSerialRefinementLevels = atoi(optarg); - if (myRank == 0) - { - std::cout << "Teton driver: number of serial refinement levels: " << numSerialRefinementLevels - << std::endl; - } - break; - case 'z': - numParallelRefinementLevels = atoi(optarg); - if (myRank == 0) - { - std::cout << "Teton driver: number of parallel refinement levels: " << numParallelRefinementLevels - << std::endl; - } - break; - case 'R': - numSerialRefinementFactor = atoi(optarg); - if (myRank == 0) - { - std::cout << "Teton driver: serial refinement factor: " << numSerialRefinementFactor << std::endl; - } - break; - case 'Z': - numParallelRefinementFactor = atoi(optarg); - if (myRank == 0) - { - std::cout << "Teton driver: parallel refinement factor: " << numParallelRefinementFactor << std::endl; - } - break; - case 'C': - colorFile = std::string(optarg); - if (myRank == 0) - { - std::cout << "Teton driver: Using color file for decomposition: " << colorFile << std::endl; - } - break; -#endif case 'A': -#if defined(TETON_ENABLE_MFEM) || defined(TETON_CONDUIT_HAS_TILED_FUNCTION) numAzimuthalUser = atoi(optarg); if (myRank == 0) { std::cout << "Teton driver: number of azimuthal angles: " << numAzimuthalUser << std::endl; } -#endif break; case 'P': -#if defined(TETON_ENABLE_MFEM) || defined(TETON_CONDUIT_HAS_TILED_FUNCTION) numPolarUser = atoi(optarg); if (myRank == 0) { std::cout << "Teton driver: number of polar angles: " << numPolarUser << std::endl; } -#endif break; case 'G': -#if defined(TETON_ENABLE_MFEM) || defined(TETON_CONDUIT_HAS_TILED_FUNCTION) numGroupsUser = atoi(optarg); if (myRank == 0) { std::cout << "Teton driver: number of energy groups: " << numGroupsUser << std::endl; } -#endif break; case 't': numOmpMaxThreads = atoi(optarg); @@ -637,6 +569,14 @@ int TetonDriver::processArguments(int argc, char *argv[]) std::cout << "Teton driver: setting verbosity to " << verbose << std::endl; } break; + case 'x': + // We only partition if there are multiple ranks. + partition = mySize > 1; + if (myRank == 0) + { + std::cout << "Teton driver: partitioning enabled." << std::endl; + } + break; case 'y': input_sanitizer_level = atoi(optarg); if (myRank == 0) @@ -653,6 +593,13 @@ int TetonDriver::processArguments(int argc, char *argv[]) } } +#if defined(TETON_ENABLE_CALIPER) + // Add output path for spot dump. This should occur after the command line args are processed. + std::string spot_line("spot"); + spot_line = spot_line + "(output=" + outputPath + "/" + label + ".cali)"; + mgr.add(spot_line.c_str()); +#endif + return 0; } @@ -687,11 +634,17 @@ void TetonDriver::printUsage(const std::string &argv0) const std::cout << " -S, --sweep_kernel <0,1> Select sweep kernel version. 0=zone, 1=corner. Note: corner sweep only available as a GPU kernel, on 3D meshes." << std::endl; - std::cout << " -o, --output_path Path to generate output files. If not set, will disable output files." + std::cout + << " -Z, --sweep_numhyperdomains <0+> Set number of sweep hyper-domains. 0=automatic, or >=1 number of sweep hyper-domains." + << std::endl; + std::cout + << " -z, --gta_numhyperdomains <0+> Set number of new GTA hyper-domains. 0=automatic, or >=1 number of new GTA hyper-domains." + << std::endl; + std::cout << " -o, --output_path Path to generate output files, including any caliper spot dumps." << std::endl; #if defined(TETON_ENABLE_CALIPER) std::cout << " -p, --caliper Caliper configuration profile. Set to 'help'" - << " to get supported keywords. 'None' disabled caliper. Default is 'runtime-report'." << std::endl; + << " to get supported keywords. 'None' disabled caliper. Default is 'runtime-report,spot'." << std::endl; #endif std::cout << " -s, --num_phase_space_sets Number of phase-angle sets to construct." << std::endl; std::cout << " -t, --num_threads Max number of threads for cpu OpenMP parallel regions." << std::endl; @@ -703,25 +656,9 @@ void TetonDriver::printUsage(const std::string &argv0) const std::cout << " -y, --input_sanitizer_level 0 - don't check inputs\n 1 - print one message for each bad input category\n 2 - print one message for each bad value of each bad category" << std::endl; -#if defined(TETON_ENABLE_MFEM) - std::cout << " -r, --serial_refinement_levels Number of times to halve each edge the MFEM mesh before " - << "doing parallel decomposition. Applied after the refinement_factor. (factor of 2^((r*dim) new zones)" - << std::endl; - std::cout << " -z, --parallel_refinement_levels Number of times to halve each edge the MFEM mesh after " - << "doing parallel decomposition. Applied after the refinement_factor. (factor of 2^(z*dim) new zones)" - << std::endl; - std::cout << " -R, --serial_refinement_factor Number of subdivisions for each edge in the original MFEM " - << "mesh before doing parallel decomposition. (factor of (R+1)^dim new zones)" << std::endl; - std::cout << " -Z, --parallel_refinement_factor Number of subdivisions for each edge in the original MFEM " - << "mesh after doing parallel decomposition. (factor of (Z+1)^dim new zones)" << std::endl; - std::cout << " -C, --color_file color file for manual decomposition" << std::endl; -#endif -#if defined(TETON_ENABLE_MFEM) || defined(TETON_CONDUIT_HAS_TILED_FUNCTION) std::cout << " -A, --num_Azimuthal Number azimuthal angles in an octant" << std::endl; std::cout << " -P, --num_Polar Number polar angles in an octant" << std::endl; std::cout << " -G, --num_Groups Number energy groups" << std::endl; -#endif -#if defined(TETON_CONDUIT_HAS_TILED_FUNCTION) std::cout << " -B, --blueprint local|global Generate Blueprint tiled mesh in memory using the specified scheme." << " The \"local\" scheme creates the same sized mesh on each MPI rank, allowing for weak scaling. The " << "\"global\" scheme creates the specified mesh size globally and decomposes that size over the available" @@ -730,7 +667,6 @@ void TetonDriver::printUsage(const std::string &argv0) const << " -d, --dims i,j,k The size of the Blueprint mesh in tiles in i,j,k. k=0 builds a 2D mesh." << std::endl; std::cout << " -M, --mesh_ordering order The name of the mesh ordering to use (normal or kdtree)." << std::endl; -#endif } //--------------------------------------------------------------------------- @@ -754,14 +690,26 @@ int TetonDriver::execute() //========================================================== if (benchmarkProblem > 0) { - options["iteration/relativeTolerance"] = energy_check_tolerance / 10.0; - -// If running UMT, only the sweep kernel is active. Increase the number of -// allowed inner flux iterations to enable it to converge on its own. +// If running UMT, only the sweep kernel is active. Tighten the tolerance and +// increase the number of allowed inner flux iterations to enable it to converge. #if defined(TETON_ENABLE_MINIAPP_BUILD) + if (myRank == 0) + { + std::cerr + << "Detected UMT run, fixing temperature iterations to one and increasing max flux iterations to enable convergence." + << std::endl; + } options["iteration/incidentFluxMaxIt"] = 99; + if (!options.has_path("iteration/relativeTolerance")) + { + options["iteration/relativeTolerance"] = energy_check_tolerance / 10.0; + } + energy_check_tolerance = 1e-9; #endif - fixedDT = 1e-3; + if (fixedDT <= 0.0) + { + fixedDT = 1e-3; + } if (cycles == 0) { cycles = 5; @@ -772,39 +720,24 @@ int TetonDriver::execute() numPolarUser = 3; numAzimuthalUser = 3; numGroupsUser = 128; - label = "UMTSPP1"; } else if (benchmarkProblem == 2) { numPolarUser = 2; numAzimuthalUser = 2; numGroupsUser = 16; - label = "UMTSPP2"; } else { std::cerr << "Teton driver: Custom benchmark problem #" << benchmarkProblem << std::endl; - label = "CustomBenchmark" + std::to_string(benchmarkProblem); - } - - if (useGPU) - { - label += "_GPU"; } } // More initialization startCaliper(label); - initThreads(); - initGPU(); - - //========================================================== - // Read in conduit nodes or mfem mesh with problem input - //========================================================== //========================================================== - // Read in mesh from an mfem mesh file. We set up a uniform - // temperature problem with simple source boundary conditions. + // Read in conduit nodes or generate a default blueprint mesh. // // TODO - All this hard-coding can be moved into an input file that lives // alongside the mfem mesh file. @@ -818,41 +751,6 @@ int TetonDriver::execute() { buildBlueprintTiledMesh(); } - else if (endsWith(inputPath, ".mesh")) - { - CALI_CXX_MARK_SCOPE("Teton_Read_Mfem_Input"); - if (access(inputPath.c_str(), F_OK) != -1) - { -#if defined(TETON_ENABLE_MFEM) - int nelem = readMeshMFEM(); - - { // new scope - CALI_CXX_MARK_SCOPE("Teton_Init_BP_Fields"); - - initializeBlueprintFields(nelem, numPolarUser, numAzimuthalUser, numGroupsUser); - initializeBoundaryConditionsMFEM(); - updateBoundaryConnectivityMFEM(); - - // Needs to be re-done when blueprint interface for specifying profiles is updated. - options["sources/profile1/Values"] = 0.3; - options["sources/profile1/NumTimes"] = 1; - options["sources/profile1/NumValues"] = 1; - options["sources/profile1/Multiplier"] = 1.0; - - // Disable updating the mesh vertices each cycle. This is unnecessary, as this test problem has fixed - // vertex positions. - options["mesh_motion"] = 0; - } -#else - throw std::runtime_error( - "Unable to open mfem mesh, test driver was not configured with CMake's '-DENABLE_MFEM=ON'."); -#endif - } - else - { - throw std::runtime_error("Couldn't find mfem mesh at " + inputPath); - } - } // Assume this is an input directory with a set of conduit blueprint mesh files and problem parameter files. // Note: The parameter files currently have all the input duplicated for each rank. Look into making a // single 'global' parameter file for global input. @@ -881,20 +779,45 @@ int TetonDriver::execute() // opposed to bloating up the test driver code. // -- black27 - // Get total number of corners in problem. - const conduit::Node &corner_topology = meshBlueprint.fetch_existing("topologies/main_corner"); - unsigned long local_num_corners = conduit::blueprint::mesh::utils::topology::length(corner_topology); - unsigned long num_corners; - // int MPI_Reduce(_In_ void *sendbuf, _Out_opt_ void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm); + // Get total number of corners in problem. Use part mesh in case partitioning has occurred. + unsigned long local_num_corners = 0; + unsigned long num_corners = 0; + unsigned int ndims = 1; + const conduit::Node &part = myTetonObject.getMeshBlueprintPart(); + if (part.has_path("topologies/main_corner")) + { + const conduit::Node &corner_topology = part.fetch_existing("topologies/main_corner"); + local_num_corners = conduit::blueprint::mesh::utils::topology::length(corner_topology); + ndims = conduit::blueprint::mesh::utils::topology::dims(corner_topology); + } + else + { + int nelem = 1; + if (part.has_path("topologies/main")) + { + nelem = part.fetch_existing("topologies/main/elements/dims/i").value(); + } + else // Take the first topology + { + conduit::NodeConstIterator topologies = part.fetch_existing("topologies").children(); + if (!topologies.has_next()) + { + std::cout << "There must be at least one topology in your mesh!" << std::endl; + exit(1); + } + nelem = topologies.next().fetch_existing("elements/dims/i").value(); + } + local_num_corners = 2 * nelem; + } + int error_code = MPI_Reduce(&local_num_corners, &num_corners, 1, MPI_UNSIGNED_LONG, MPI_SUM, 0, comm); if (error_code != MPI_SUCCESS) { //TODO - error out } - unsigned long num_unknowns = 0, local_num_unknowns = 0; - unsigned int ndims = conduit::blueprint::mesh::utils::topology::dims(corner_topology); - writeStartSummary(ndims, local_num_corners, num_corners, num_unknowns, local_num_unknowns); + unsigned long num_unknowns = 0; + writeStartSummary(ndims, num_corners, num_unknowns); // If a dtrad wasn't provided in the input file, the Teton initialize() // call will populate it with a default value. @@ -907,19 +830,19 @@ int TetonDriver::execute() meshBlueprint["state/cycle"] = 0; double start_time = MPI_Wtime(); - cycleLoop(dtrad, timerad, num_unknowns); + cycleLoop(dtrad, timerad); double end_time = MPI_Wtime(); myTetonObject.dumpTallyToJson(); - writeEndSummary(end_time, start_time, num_unknowns, local_num_unknowns); + writeEndSummary(end_time, start_time, num_unknowns); } return return_status; } //--------------------------------------------------------------------------- -void TetonDriver::startCaliper(const std::string &label) +void TetonDriver::startCaliper(const std::string &label2) { //========================================================== // Start caliper @@ -932,231 +855,17 @@ void TetonDriver::startCaliper(const std::string &label) { if (myRank == 0) { - std::cout << "Teton driver: Caliper config error: " << mgr.error_msg() << std::endl; + std::cerr << "Teton driver: Caliper config error: " << mgr.error_msg() << std::endl; + exit(1); } } mgr.start(); } - if (!label.empty()) - { - adiak::value("ProblemName", label, adiak_general); - } -#endif -} - -//--------------------------------------------------------------------------- -void TetonDriver::initThreads() -{ -#if defined(TETON_ENABLE_OPENMP) - if (numOmpMaxThreads == -1) - { - numOmpMaxThreads = omp_get_max_threads(); - } - - if (myRank == 0) - { - std::cout << "Teton driver: Threading enabled, max number of threads is " << numOmpMaxThreads << std::endl; - } -#endif -} - -//--------------------------------------------------------------------------- -void TetonDriver::initGPU() -{ -//========================================================== -// Initialize environment on GPU -//========================================================== -#if defined(TETON_ENABLE_OPENMP_OFFLOAD) - if (myRank == 0) - print_gpu_mem("Teton driver: Before hello world gpu kernel run."); - -// It's necessary to run a small GPU kernel to initialize the GPU state so our timers get accurate benchmarks later. -#pragma omp target - { - printf("Teton driver: Hello World! GPU is now initialized.\n"); - } - - if (myRank == 0) - print_gpu_mem("Teton driver: After hello world gpu kernel run."); -#endif -} - -//--------------------------------------------------------------------------- -int TetonDriver::readMeshMFEM() -{ - int nelem = 0; -#if defined(TETON_ENABLE_MFEM) - conduit::Node &options = myTetonObject.getOptions(); - conduit::Node &meshBlueprint = myTetonObject.getMeshBlueprint(); - - if (myRank == 0) - { - std::cout << "Teton driver: reading mfem mesh: " << inputPath << std::endl; - } - - { // new scope - CALI_CXX_MARK_SCOPE("Teton_Refine_Serial_Mesh"); - - mesh = new mfem::Mesh(inputPath.c_str(), 1, 1); - for (int l = 0; l < numSerialRefinementLevels; ++l) - { - if (myRank == 0) - { - std::cout << "Teton driver: Uniformly refining serial mesh, iteration " << l + 1 - << ", factor = " << numSerialRefinementFactor << std::endl; - } - if (numSerialRefinementFactor == 2) - { - mesh->UniformRefinement(); - } - else - { - int ref_type = mfem::BasisType::ClosedUniform; - *mesh = mfem::Mesh::MakeRefined(*mesh, numSerialRefinementFactor, ref_type); - } - } - - if (benchmarkProblem > 0 && (numSerialRefinementLevels > 0)) - { - // Save refined mesh to file, if running benchmark problems ( for later use ). - std::ofstream mesh_ofs("refined_mesh.mesh"); - mesh_ofs.precision(16); - mesh->Print(mesh_ofs); - mesh_ofs.close(); - } - } - - { // new scope - CALI_CXX_MARK_SCOPE("Teton_Create_Par_Mesh"); - // MFEM does not support parallel refinement on NURBs meshes. - // Convert to high order mesh to enable basic refinement capability. - // Also add a grid function if we didn't have one before. - mesh->SetCurvature(1); - - if (myRank == 0) - { - std::cout << "Teton driver: decomposing serial mesh into parallel" << std::endl; - } - - if (colorFile.size() > 0) - { - if (access(colorFile.c_str(), F_OK) != -1) - { - int nelem = mesh->GetNE(); - std::vector colorData; - colorData.reserve(nelem); - - int e = 0; - std::ifstream colorFileStream(colorFile.c_str()); - while (!colorFileStream.eof() or e == nelem) - { - int c = 0; - colorFileStream >> c; - colorData.push_back(c); - ++e; - } - if (e < nelem) - { - throw std::runtime_error("Not enough colors in " + colorFile); - } - colorFileStream.close(); - - pmesh = new mfem::ParMesh(comm, *mesh, colorData.data()); - } - else - { - throw std::runtime_error("Could not open color file " + colorFile); - } - } - else - { - // Only re-order without the color file, otherwise we won't know - // order the elements are. - - // Sort the grid for better locality - mfem::Array ordering; - mesh->GetHilbertElementOrdering(ordering); - mesh->ReorderElements(ordering); - - // TODO Make optional. This will use the space-filling curve for - // partitioning. It's both nicer and more horrific than Metis, - // if that was even possible. - // mesh.EnsureNCMesh(); - - pmesh = new mfem::ParMesh(comm, *mesh); - if (int wrong = pmesh->CheckElementOrientation(true) > 0) - { - std::cout << "There were " << wrong << " 3D mesh elements with the wrong orientation after reordering.\n"; - } - if (int wrong = pmesh->CheckBdrElementOrientation(true) > 0) - { - std::cout << "There were " << wrong - << " 3D mesh boundary elements with the wrong orientation after reordering.\n"; - } - } - } - - { // new scope - CALI_CXX_MARK_SCOPE("Teton_Refine_Par_Mesh"); - for (int l = 0; l < numParallelRefinementLevels; ++l) - { - if (myRank == 0) - { - std::cout << "Teton driver: Uniformly refining parallel mesh, iteration " << l + 1 - << ", factor = " << numParallelRefinementFactor << std::endl; - } - if (numParallelRefinementFactor == 2) - { - mesh->UniformRefinement(); - } - else - { - int ref_type = mfem::BasisType::ClosedUniform; - *pmesh = mfem::ParMesh::MakeRefined(*pmesh, numParallelRefinementFactor, ref_type); - } - } - } - - if (myRank == 0) + if (!label2.empty()) { - std::cout << "Teton driver: Final parallel mesh characteristics are:" << std::endl; - } - pmesh->PrintInfo(); - - // This is local number of elements. - nelem = pmesh->GetNE(); - - { // new scope - CALI_CXX_MARK_SCOPE("Teton_Create_BP_Mesh"); - // Create a blueprint node from the mfem mesh - conduit_data_collec = new mfem::ConduitDataCollection("mfem_conduit_data_collection", pmesh); - // Note - the mesh blueprint node contains pointers back into the mfem - // mesh for some of the data. For example, the coordinates. - // *************************** - // DO NOT DELETE the mfem mesh objects until this blueprint node is no - // longer needed. - // *************************** - conduit_data_collec->MeshToBlueprintMesh(pmesh, meshBlueprint); - - // Delete extra fields we don't need. Some of these are not yet supported - // by VisIt (https://wci.llnl.gov/simulation/computer-codes/visit) - if (meshBlueprint.has_path("topologies/main/grid_function")) - { - meshBlueprint.remove("topologies/main/grid_function"); - } - if (meshBlueprint.has_path("topologies/main/boundary_topology")) - { - meshBlueprint.remove("topologies/main/boundary_topology"); - } - if (meshBlueprint.has_path("fields/mesh_nodes")) - { - meshBlueprint.remove("fields/mesh_nodes"); - } - - mesh->Clear(); + adiak::value("ProblemName", label2, adiak_general); } #endif - return nelem; } //--------------------------------------------------------------------------- @@ -1286,109 +995,6 @@ void TetonDriver::initializeBlueprintFields(int nelem, int numPolar, int numAzim electron_number_density.size()); } -//--------------------------------------------------------------------------- -void TetonDriver::initializeBoundaryConditionsMFEM() -{ -#if defined(TETON_ENABLE_MFEM) - conduit::Node &options = myTetonObject.getOptions(); - // Make all boundaries vacuum - std::map boundary_id_to_type; - for (int i = 0; i < pmesh->bdr_attributes.Size(); ++i) - { - int ba = pmesh->bdr_attributes[i]; - // Draco will tag untagged faces with 9999, even if they are - // interior. We need to skip those. - if (ba != 9999) - { - int bctype = 35; // vacuum - // Maybe if we'd run in RZ, we need to make the axis reflecting? - //if( ba == 1 or ba == 4) - //{ - // int bctype = 32; // Reflecting - //} - boundary_id_to_type[pmesh->bdr_attributes[i]] = 35; // vacuum - } - } - - std::vector keys, values; - for (std::map::iterator it = boundary_id_to_type.begin(); it != boundary_id_to_type.end(); ++it) - { - int k = it->first; - int v = it->second; - keys.push_back(k); - // There's some memory error here, which is very, very odd. - values.push_back(v); - } - options["boundary_conditions/id_to_type_map/ids"] = keys; - options["boundary_conditions/id_to_type_map/types"] = values; -#endif -} - -//--------------------------------------------------------------------------- -void TetonDriver::updateBoundaryConnectivityMFEM() -{ -#if defined(TETON_ENABLE_MFEM) - conduit::Node &options = myTetonObject.getOptions(); - conduit::Node &meshBlueprint = myTetonObject.getMeshBlueprint(); - - // Prune the boundary topology of all interior boundary elements. MFEM creates a boundary topology over - // both problem surface elements and interior shared boundaries between domains. - // Teton only wants the surface elements. Teton uses the adjacency lists to determine shared boundary elements. - conduit::int_accessor bndry_vals = meshBlueprint.fetch_existing("fields/boundary_attribute/values").value(); - conduit::int_accessor element_points = meshBlueprint.fetch_existing("topologies/boundary/elements/connectivity") - .value(); - std::string element_type = meshBlueprint.fetch_existing("topologies/boundary/elements/shape").as_string(); - - size_t num_points_in_element; - - if (element_type == "point") - { - num_points_in_element = 1; - } - else if (element_type == "line") - { - num_points_in_element = 2; - } - else if (element_type == "quad") - { - num_points_in_element = 4; - } - else - { - throw std::runtime_error("Unsupported element type of: " + element_type); - } - - std::vector new_bnd_attribs; - std::vector new_connectivity; - size_t current_element = 0; - const conduit::Node &n_keys = options.fetch_existing("boundary_conditions/id_to_type_map/ids"); - auto nkeys = n_keys.dtype().number_of_elements(); - const int *keys = n_keys.as_int_ptr(); - const int *keys_end = keys + nkeys; - for (size_t i = 0; i < bndry_vals.number_of_elements(); i++) - { - if (std::find(keys, keys_end, bndry_vals[i]) != keys_end) - { - new_bnd_attribs.push_back(bndry_vals[i]); - for (size_t j = 0; j < num_points_in_element; j++) - { - new_connectivity.push_back(element_points[current_element + j]); - } - } - - current_element += num_points_in_element; - } - - conduit::Node &attrib_node = meshBlueprint.fetch_existing("fields/boundary_attribute/values"); - attrib_node.reset(); - attrib_node.set(new_bnd_attribs); - - conduit::Node &conn_node = meshBlueprint.fetch_existing("topologies/boundary/elements/connectivity"); - conn_node.reset(); - conn_node.set(new_connectivity); -#endif -} - //--------------------------------------------------------------------------- void TetonDriver::readConduitInputs() { @@ -1483,6 +1089,18 @@ void TetonDriver::readConduitInputs() std::cout << "Teton driver: creating Teton mesh node from blueprint node " << "\n"; } + + // If certain objects exist in the blueprint now, remove them since they will be regenerated. + const std::vector removals{"topologies/main_corner", + "topologies/main_face", + "fields/face_attribute", + "adjsets/main_corner", + "adjsets/main_face"}; + for (const auto &path : removals) + { + if (meshBlueprint.has_path(path)) + meshBlueprint.remove(path); + } } //--------------------------------------------------------------------------- @@ -1544,6 +1162,16 @@ void TetonDriver::setOptions() options["sweep/kernel/version"] = sweep_kernel; } + if (sweep_numhyperdomains > -1) + { + options["sweep/sn/numhyperdomains"] = sweep_numhyperdomains; + } + + if (gta_numhyperdomains > -1) + { + options["sweep/gta/numhyperdomains"] = gta_numhyperdomains; + } + if (useCUDASweep == true) { options["size/useCUDASweep"] = true; @@ -1570,32 +1198,57 @@ void TetonDriver::setOptions() #if defined(TETON_ENABLE_UMPIRE) if (myRank == 0) { - std::cout << "Teton driver: Enabling use of Umpire CPU and GPU memory pools..." << std::endl; + if (useUmpire == 1) + { + std::cout + << "Teton driver: Enabling use of Umpire for single memory pool backed by CPU native allocator." + << std::endl; + } + else if (useUmpire == 2) + { + std::cout + << "Teton driver: Enabling use of Umpire for separate memory pools for host and accelerator (cuda or hip)." + << std::endl; + } + else if (useUmpire == 3) + { + std::cout + << "Teton driver: Enabling use of Umpire for single memory pool backed by accelerator (cuda or hip) allocator." + << std::endl; + } } auto &rm = umpire::ResourceManager::getInstance(); - - // Create umpire allocators. - auto host_pinned_pool = rm.makeAllocator("HOST_PINNED_QUICK_POOL", - rm.getAllocator("PINNED")); - auto thread_safe_host_pinned_pool = rm.makeAllocator( - "THREAD_SAFE_PINNED_QUICK_POOL", - host_pinned_pool); - - options["memory_allocator/umpire_host_allocator_id"] = thread_safe_host_pinned_pool.getId(); - if (useUmpire > 1) + if (useUmpire == 1 || useUmpire == 2) + { + // Create umpire allocators. + auto host_pinned_pool = rm.makeAllocator("HOST_PINNED_QUICK_POOL", + rm.getAllocator("PINNED")); + auto thread_safe_host_pinned_pool = rm.makeAllocator( + "THREAD_SAFE_PINNED_QUICK_POOL", + host_pinned_pool); + + options["memory_allocator/umpire_host_allocator_id"] = thread_safe_host_pinned_pool.getId(); + } + if (useUmpire == 2 || 3) { auto device_pool = rm.makeAllocator("DEVICE_QUICK_POOL", rm.getAllocator("DEVICE")); auto thread_safe_device_pool = rm.makeAllocator( "THREAD_SAFE_DEVICE_QUICK_POOL", device_pool); - options["memory_allocator/umpire_device_allocator_id"] = thread_safe_device_pool.getId(); - } - else - { - options["memory_allocator/umpire_device_allocator_id"] = -1; + + if (useUmpire == 2) + { + options["memory_allocator/umpire_device_allocator_id"] = thread_safe_device_pool.getId(); + } + else + { + options["memory_allocator/umpire_host_allocator_id"] = thread_safe_device_pool.getId(); + } } + print_umpire_usage(); + #else if (myRank == 0) { @@ -1629,6 +1282,8 @@ void TetonDriver::setOptions() { options["iteration/dtrad"] = fixedDT; } + + options["partitioning"] = partition; } //--------------------------------------------------------------------------- @@ -1636,14 +1291,6 @@ void TetonDriver::verifyMesh() { conduit::Node &meshBlueprint = myTetonObject.getMeshBlueprint(); - // Can remove this when MFEM is fixed to stop writing out empty adjacency sets. - // -- black27 - if (meshBlueprint.has_path("adjsets/main_adjset/groups") - && meshBlueprint.fetch_existing("adjsets/main_adjset/groups").number_of_children() == 0) - { - meshBlueprint.remove("adjsets"); - } - // Verify the blueprint is valid. std::string protocol = "mesh"; conduit::Node info; @@ -1651,11 +1298,12 @@ void TetonDriver::verifyMesh() } //--------------------------------------------------------------------------- -void TetonDriver::cycleLoop(double &dtrad, double &timerad, unsigned long num_unknowns) +void TetonDriver::cycleLoop(double &dtrad, double &timerad) { CALI_CXX_MARK_SCOPE("Teton_Cycle_Loop"); conduit::Node &options = myTetonObject.getOptions(); + const conduit::Node &datastore = myTetonObject.getDatastore(); for (int cycle = 1; cycle <= cycles; cycle++) { if (dumpViz) @@ -1666,12 +1314,6 @@ void TetonDriver::cycleLoop(double &dtrad, double &timerad, unsigned long num_un } myTetonObject.dump(comm, outputPath); } - if (myRank == 0) - { - std::cout << "----------" << std::endl; - std::cout << "CYCLE " << cycle << std::endl; - std::cout << "----------" << std::endl; - } timerad = timerad + dtrad; options["iteration/timerad"] = timerad; @@ -1681,8 +1323,8 @@ void TetonDriver::cycleLoop(double &dtrad, double &timerad, unsigned long num_un double inner_elapsed_time = inner_end_time - inner_start_time; if (myRank == 0) { - std::cout << "Teton driver: CYCLE WALL TIME = " << inner_elapsed_time << " seconds." << std::endl; - std::cout << "Teton driver: CYCLE UNKNOWNS/SECOND = " << num_unknowns / inner_elapsed_time << std::endl; + std::cout << "Teton driver: CPU MEM USE (rank 0): " << getCurrentRSS() / 1024.0 / 1024.0 << "MB" << std::endl; + print_umpire_usage(); } // Either setTimeStep(cycle, dtrad) can be called to update time step, or // these can be directly updated in the options. @@ -1691,6 +1333,8 @@ void TetonDriver::cycleLoop(double &dtrad, double &timerad, unsigned long num_un dtrad = fixedDT; } options["iteration/dtrad"] = dtrad; + + total_num_flux_iterations += datastore.fetch_existing("rtedits/ninrt").as_int(); } } @@ -1842,69 +1486,72 @@ void TetonDriver::buildBlueprintTiledMesh() } //--------------------------------------------------------------------------- -void TetonDriver::writeStartSummary(unsigned int ndims, - unsigned long local_num_corners, - unsigned long num_corners, - unsigned long &num_unknowns, - unsigned long &local_num_unknowns) const +void TetonDriver::writeStartSummary(unsigned int ndims, unsigned long num_corners, unsigned long &num_unknowns) const { if (myRank == 0) { const conduit::Node &options = myTetonObject.getOptions(); - unsigned int num_groups = options.fetch_existing("quadrature/num_groups").to_unsigned_int(); - unsigned int num_polar_angles = options.fetch_existing("quadrature/npolar").to_unsigned_int(); - unsigned int num_azimuthal_angles = options.fetch_existing("quadrature/nazimu").to_unsigned_int(); unsigned int num_angles = 0; - int quadrature_type = options.fetch_existing("quadrature/qtype").value(); - if (quadrature_type == 1) //level-symmetric quadrature + unsigned int num_groups = options.fetch_existing("quadrature/num_groups").to_unsigned_int(); + if (ndims > 1) { - int quadrature_order = options.fetch_existing("quadrature/qorder").value(); - // Assume RZ, as we don't support XY - if (ndims == 2) - { - num_angles = quadrature_order * (quadrature_order + 6) / 2; - } - else if (ndims == 3) + unsigned int num_polar_angles = options.fetch_existing("quadrature/npolar").to_unsigned_int(); + unsigned int num_azimuthal_angles = options.fetch_existing("quadrature/nazimu").to_unsigned_int(); + int quadrature_type = options.fetch_existing("quadrature/qtype").value(); + if (quadrature_type == 1) //level-symmetric quadrature { - num_angles = quadrature_order * (quadrature_order + 2); + int quadrature_order = options.fetch_existing("quadrature/qorder").value(); + // Assume RZ, as we don't support XY + if (ndims == 2) + { + num_angles = quadrature_order * (quadrature_order + 6) / 2; + } + else if (ndims == 3) + { + num_angles = quadrature_order * (quadrature_order + 2); + } + else + { + //TODO - Error out. + } } - else + else if (quadrature_type == 2) { - //TODO - Error out. + // Assume RZ, as we don't support XY + if (ndims == 2) + { + // 2D has four quadrants, and RZ has additional starting/finishing angles, so add one to azimuthal angles. + num_angles = num_polar_angles * (num_azimuthal_angles + 1) * 4; + } + else if (ndims == 3) + { + // 3D has eight quadrants. + num_angles = num_polar_angles * num_azimuthal_angles * 8; + } + else + { + //TODO - Error out. + } } } - else if (quadrature_type == 2) + else //ndim == 1 { - // Assume RZ, as we don't support XY - if (ndims == 2) - { - // 2D has four quadrants, and RZ has additional starting/finishing angles, so add one to azimuthal angles. - num_angles = num_polar_angles * (num_azimuthal_angles + 1) * 4; - } - else if (ndims == 3) - { - // 3D has eight quadrants. - num_angles = num_polar_angles * num_azimuthal_angles * 8; - } - else - { - //TODO - Error out. - } + int quadrature_order = options.fetch_existing("quadrature/qorder").value(); + num_angles = quadrature_order + 2; // assuming spherical } num_unknowns = num_corners * num_angles * num_groups; - local_num_unknowns = local_num_corners * num_angles * num_groups; std::cout << "=================================================================" << std::endl; std::cout << "=================================================================" << std::endl; - std::cout << "Teton starting cycling\n"; + std::cout << "Test driver starting time steps\n"; std::cout << "=================================================================" << std::endl; std::cout << "Solving for " << num_unknowns << " global unknowns." << std::endl; std::cout << "(" << num_corners << " spatial elements * " << num_angles << " directions (angles) * " << num_groups << " energy groups)" << std::endl; // TODO - could beef this up to be a global memory estimate and global memory used, if we are testing problems with unbalanced mesh partition sizes, but this is meant as a rough memory estimate. - std::cout << "CPU memory needed (rank 0) for PSI: " << local_num_unknowns * sizeof(double) / 1024.0 / 1024.0 - << "MB" << std::endl; + std::cout << "CPU memory needed per rank (average) for radiation intensity (PSI): " + << num_unknowns / mySize * sizeof(double) / 1024.0 / 1024.0 << "MB" << std::endl; std::cout << "Current CPU memory use (rank 0): " << getCurrentRSS() / 1024.0 / 1024.0 << "MB" << std::endl; if (options.has_path("iteration/relativeTolerance")) { @@ -1917,23 +1564,27 @@ void TetonDriver::writeStartSummary(unsigned int ndims, } //--------------------------------------------------------------------------- -void TetonDriver::writeEndSummary(double end_time, - double start_time, - unsigned long num_unknowns, - unsigned long local_num_unknowns) +void TetonDriver::writeEndSummary(double end_time, double start_time, unsigned long num_unknowns) { - double elapsed_time = end_time - start_time; - if (myRank == 0) { + const conduit::Node &datastore = myTetonObject.getDatastore(); double avg_unknowns_per_second = num_unknowns * cycles / (end_time - start_time); + double avg_unknowns_per_second_per_iteration = num_unknowns * total_num_flux_iterations / (end_time - start_time); std::cout << std::endl; std::cout << "=================================================================" << std::endl; std::cout << "=================================================================" << std::endl; - std::cout << "Teton finished cycling\n"; + std::cout << "Test driver finished time steps\n"; std::cout << "=================================================================" << std::endl; - std::cout << "Average number of unknowns/second solved per cycle was " << avg_unknowns_per_second << std::endl; + std::cout << "Average throughput of single iteration of iterative solver was " + << avg_unknowns_per_second_per_iteration << " unknowns calculated per second." << std::endl; + std::cout << "Throughput of iterative solver was " << avg_unknowns_per_second + << " unknowns calculated per second." << std::endl; + std::cout << "(average throughput of single iteration * # iterations for solver to produce answer" << std::endl; + std::cout << std::endl; + std::cout << "Total number of flux solver iterations for run: " << total_num_flux_iterations << std::endl; + std::cout << "Total wall time for run: " << end_time - start_time << " seconds." << std::endl; std::cout << "=================================================================" << std::endl; std::cout << std::endl; @@ -1941,8 +1592,6 @@ void TetonDriver::writeEndSummary(double end_time, // Also appends some problem state. if (benchmarkProblem > 0) { - const conduit::Node &datastore = myTetonObject.getDatastore(); - double energy_radiation = datastore.fetch_existing("rtedits/EnergyRadiation").value(); double max_electron_temp = datastore.fetch_existing("rtedits/TeMax").value(); double max_radiation_temp = datastore.fetch_existing("rtedits/TrMax").value(); @@ -1955,26 +1604,42 @@ void TetonDriver::writeEndSummary(double end_time, double energy_check = datastore.fetch_existing("rtedits/EnergyCheck").value(); - double memForPSI = local_num_unknowns * sizeof(double); + double memForPSI = num_unknowns / mySize * sizeof(double); std::ofstream outfile; outfile.precision(16); std::string filePath = outputPath + "/" + label + ".csv"; - std::string firstline; - if (access(filePath.c_str(), F_OK) == -1) - { - firstline - = "# mpi ranks, Memory required for PSI (bytes), memory actual (bytes), # solver unknowns, solver throughput (# unknowns solved per second), energy check, energy in radiation field, maximum electron temperature, maximum radiation temperature, incident power, escaping power, power absorbed, power emitted"; - } + + // Check if file exists. If not, need to add header line for .csv, otherwise just append new rows. + bool isNewFile = (access(filePath.c_str(), F_OK) == -1); + + const std::string header = "# mpi ranks, " + "Mem for PSI (kb), " + "process rss mem (kb), " + "# solver unknowns (extents of PSI), " + "total # flux iterations, " + "# time steps, " + "walltime(seconds)," + "energy check, " + "energy in radiation field, " + "maximum electron temperature, " + "maximum radiation temperature, " + "incident power, " + "escaping power, " + "power absorbed, " + "power emitted" + "\n"; + outfile.open(filePath, std::ios_base::app); - if (!firstline.empty()) + if (isNewFile) { - outfile << firstline << "\n"; + outfile << header; } - outfile << mySize << ", " << memForPSI << ", " << getCurrentRSS() << ", " << num_unknowns << ", " - << avg_unknowns_per_second << ", " << energy_check << ", " << energy_radiation << ", " - << max_electron_temp << ", " << max_radiation_temp << ", " << power_incident << ", " << power_escape - << ", " << power_absorbed << ", " << power_emitted << "\n"; + outfile << mySize << ", " << memForPSI / 1024.0 << ", " << getCurrentRSS() / 1024.0 << ", " << num_unknowns + << ", " << total_num_flux_iterations << ", " << cycles << ", " << end_time - start_time << ", " + << energy_check << ", " << energy_radiation << ", " << max_electron_temp << ", " << max_radiation_temp + << ", " << power_incident << ", " << power_escape << ", " << power_absorbed << ", " << power_emitted + << "\n"; outfile.close(); @@ -2006,42 +1671,60 @@ void TetonDriver::writeEndSummary(double end_time, } //--------------------------------------------------------------------------- -void TetonDriver::finalize() +void TetonDriver::print_umpire_usage() { #if defined(TETON_ENABLE_UMPIRE) - if (useGPU == 1 && useUmpire > 0) + if (useUmpire > 0) { - if (myRank == 0) - { - std::cout << "Teton driver: Deleting Umpire CPU and GPU memory pools..." << std::endl; - } + conduit::Node &options = myTetonObject.getOptions(); auto &rm = umpire::ResourceManager::getInstance(); - // Release memory from umpire allocators - auto thread_safe_host_pinned_pool = rm.getAllocator("THREAD_SAFE_PINNED_QUICK_POOL"); - if (myRank == 0) - print_bytes_as_gb("Teton driver: Thread safe host pinned pool size: ", - thread_safe_host_pinned_pool.getActualSize()); - thread_safe_host_pinned_pool.release(); - auto host_pinned_pool = rm.getAllocator("HOST_PINNED_QUICK_POOL"); - if (myRank == 0) - print_bytes_as_gb("Teton driver: Host pinned (parent) pool size: ", host_pinned_pool.getActualSize()); - host_pinned_pool.release(); + int host_allocator_id = options.fetch_existing("memory_allocator/umpire_host_allocator_id").value(); + int device_allocator_id = options.fetch_existing("memory_allocator/umpire_device_allocator_id").value(); - if (useUmpire > 1) + if (host_allocator_id != -1) { - auto thread_safe_device_pool = rm.getAllocator("THREAD_SAFE_DEVICE_QUICK_POOL"); + auto pool = rm.getAllocator(host_allocator_id); if (myRank == 0) - print_bytes_as_gb("Teton driver: Thread safe device pool size: ", thread_safe_device_pool.getActualSize()); - thread_safe_device_pool.release(); - auto device_pool = rm.getAllocator("DEVICE_QUICK_POOL"); + { + print_bytes_as_gb("Teton driver: Umpire pool size: ", pool.getActualSize()); + } + } + + if (device_allocator_id != -1) + { + auto pool = rm.getAllocator(device_allocator_id); if (myRank == 0) - print_bytes_as_gb("Teton driver: Device (parent) pool size: ", device_pool.getActualSize()); - device_pool.release(); + { + print_bytes_as_gb("Teton driver: Umpire device pool size: ", pool.getActualSize()); + } + } + } +#endif +} + +//--------------------------------------------------------------------------- +void TetonDriver::finalize() +{ +#if defined(TETON_ENABLE_UMPIRE) + if (useUmpire > 0) + { + auto &rm = umpire::ResourceManager::getInstance(); + conduit::Node &options = myTetonObject.getOptions(); + + int host_allocator_id = options.fetch_existing("memory_allocator/umpire_host_allocator_id").value(); + int device_allocator_id = options.fetch_existing("memory_allocator/umpire_device_allocator_id").value(); + + if (host_allocator_id != -1) + { + rm.getAllocator(host_allocator_id).release(); + } + + if (device_allocator_id != -1) + { + rm.getAllocator(host_allocator_id).release(); } - if (myRank == 0) - print_gpu_mem("Teton driver: After Umpire device pool release."); } #endif @@ -2056,23 +1739,6 @@ void TetonDriver::finalize() //--------------------------------------------------------------------------- void TetonDriver::release() { -#if defined(TETON_ENABLE_MFEM) - if (mesh != nullptr) - { - delete mesh; - mesh = nullptr; - } - if (pmesh != nullptr) - { - delete pmesh; - pmesh = nullptr; - } - if (conduit_data_collec != nullptr) - { - delete conduit_data_collec; - conduit_data_collec = nullptr; - } -#endif } //========================================================== diff --git a/src/teton/gpu/CMakeLists.txt b/src/teton/gpu/CMakeLists.txt index 1a2dc7a..141af75 100644 --- a/src/teton/gpu/CMakeLists.txt +++ b/src/teton/gpu/CMakeLists.txt @@ -1,4 +1,5 @@ target_sources( teton PRIVATE + CornerSweepUCBrz_OMPOL.F90 CornerSweepUCBxyz_OMPOL.F90 InitSweepGreyUCBrz_OMPOL.F90 InitSweepGreyUCBxyz_OMPOL.F90 diff --git a/src/teton/gpu/CornerSweepUCBrz_OMPOL.F90 b/src/teton/gpu/CornerSweepUCBrz_OMPOL.F90 new file mode 100644 index 0000000..790f64e --- /dev/null +++ b/src/teton/gpu/CornerSweepUCBrz_OMPOL.F90 @@ -0,0 +1,741 @@ +#include "macros.h" +#include "omp_wrappers.h" +!*********************************************************************** +! Last Update: 10/2016, PFN * +! * +! CornerSweepUCBrz_GPU - This routine calculates angular fluxes for a * +! single direction and multiple energy groups for * +! for an upstream corner-balance (UCB) spatial * +! in rz-geometry. The mesh is traversed corner +! by corner. * +! * +! The work is offloaded to a GPU in which each * +! computational "set" is a GPU block. The threads * +! assigned to the block compute one group in one * +! zone in a hyperplane (by definition, all of the * +! zones in a hyperplane are independent). * +! * +!*********************************************************************** + + subroutine CornerSweepUCBrz_GPU(nSets, sendIndex, savePsi) + + use, intrinsic :: iso_c_binding, only : c_int + use cmake_defines_mod, only : omp_device_team_thread_limit + use Options_mod + use kind_mod + use constant_mod + use Size_mod + use Geometry_mod + use QuadratureList_mod + use SetData_mod + use AngleSet_mod + use GroupSet_mod + use CodeChecks_mod + + implicit none + +! Arguments + + integer, intent(in) :: nSets + integer, intent(in) :: sendIndex + logical (kind=1), intent(in) :: savePsi + +! Local + + type(SetData), pointer :: Set + type(AngleSet), pointer :: ASet + type(GroupSet), pointer :: GSet + type(HypPlane), pointer :: HypPlanePtr + type(BdyExit), pointer :: BdyExitPtr + + integer :: setID + integer :: zSetID + integer :: Angle + integer :: g + integer :: Groups + + integer :: mCycle + integer :: offset + integer :: nAngleSets + integer :: nZoneSets + integer :: nHyperDomains + + integer :: nzones + integer :: ii + integer :: ndone + integer :: hyperPlane + integer :: domID + integer :: hplane1 + integer :: hplane2 + + real(adqt) :: tau + +! Local + + integer :: b + integer :: i + integer :: cface + integer :: cez + integer :: cfp + integer :: ifp + integer :: c + integer :: c0 + integer :: c1 + integer :: c2 + integer :: zone + integer :: nCorner + + real(adqt), parameter :: fouralpha=1.82d0 + + real(adqt) :: fac + real(adqt) :: sigA + real(adqt) :: sigA2 + real(adqt) :: source + real(adqt) :: SS + real(adqt) :: QQ + real(adqt) :: Qez + real(adqt) :: mult + + real(adqt) :: area + real(adqt) :: sig + real(adqt) :: sez + real(adqt) :: gnum + real(adqt) :: gden + real(adqt) :: quadTauW1 + real(adqt) :: quadTauW2 + + real(adqt) :: afp + real(adqt) :: aez + real(adqt) :: R + real(adqt) :: R2 + real(adqt) :: R_afp + real(adqt) :: R_afp2 + +! Dynamic + + integer, allocatable :: angleList(:) + +! Constants + tau = Size% tau + nAngleSets = getNumberOfAngleSets(Quad) + nZoneSets = getNumberOfZoneSets(Quad) + nHyperDomains = getNumberOfHyperDomains(Quad,1) + + allocate( angleList(nAngleSets) ) + + do setID=1,nSets + Set => Quad% SetDataPtr(setID) + angleList(Set% angleSetID) = Set% AngleOrder(sendIndex) + enddo + +! Here the maximum block size is the product of the maximum +! number of zones in a hyperplane and the number of groups; +! The maximum value is the over all teams + +! Note: num_blocks = nSets and the number of threads per +! team (a.k.a. "block") <= block_threads + + ! Verify we won't get out-of-bounds accesses below. + TETON_CHECK_BOUNDS1(Quad%AngSetPtr, nAngleSets) + TETON_CHECK_BOUNDS1(Geom%corner1, nZoneSets) + TETON_CHECK_BOUNDS1(Geom%corner2, nZoneSets) + + TOMP(target enter data map(to: tau, sendIndex, angleList)) + + +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(ASet, angle) +#else + TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) + TOMPC(shared(nZoneSets, nAngleSets,Geom, angleList, Quad)&) + TOMPC(private(ASet, angle)) +#endif + + ZoneSetLoop: do zSetID=1,nZoneSets + +! Loop over angle sets + + do setID=1,nAngleSets + + ASet => Quad% AngSetPtr(setID) + angle = angleList(setID) + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Geom, ASet, Angle, zSetID) +#endif + do c=Geom% corner1(zSetID),Geom% corner2(zSetID) + do cface=1,2 + ASet% AfpNorm(cface,c) = DOT_PRODUCT( ASet% omega(:,angle),Geom% A_fp(:,cface,c) ) + ASet% AezNorm(cface,c) = DOT_PRODUCT( ASet% omega(:,angle),Geom% A_ez(:,cface,c) ) + enddo + enddo +#ifndef TETON_ENABLE_OPENACC +!$omp end parallel do +#endif + + enddo + + enddo ZoneSetLoop + +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif + + +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(ASet, angle, fac, R_afp, R_afp2, R, R2) +#else + TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none)&) + TOMPC(shared(nZoneSets, nAngleSets, Geom, angleList, Quad)&) + TOMPC(private(ASet, angle, fac, R_afp, R_afp2, R, R2)) +#endif + + ZoneSetLoop2: do zSetID=1,nZoneSets + +! Loop over angle sets + + do setID=1,nAngleSets + + ASet => Quad% AngSetPtr(setID) + angle = angleList(setID) + fac = ASet% angDerivFac(Angle) + +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(R_afp, R_afp2, R, R2) +#else + !$omp parallel do default(none) & + !$omp& shared(Geom, ASet, angle, fac, zSetID) private(R_afp, R_afp2, R, R2) +#endif + + do c=Geom% corner1(zSetID),Geom% corner2(zSetID) + R_afp = Geom% RadiusFP(1,c) + R_afp2 = Geom% RadiusFP(2,c) + R = Geom% RadiusEZ(1,c) + R2 = Geom% RadiusEZ(2,c) + + ASet% ANormSum(c) = fac*Geom% Area(c) - half*( & + R_afp *(ASet% AfpNorm(1,c) - abs(ASet% AfpNorm(1,c))) + & + R_afp2*(ASet% AfpNorm(2,c) - abs(ASet% AfpNorm(2,c))) + & + R *(ASet% AezNorm(1,c) - abs(ASet% AezNorm(1,c))) + & + R2 *(ASet% AezNorm(2,c) - abs(ASet% AezNorm(2,c))) ) + enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + + enddo + + enddo ZoneSetLoop2 + +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif + + +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nSets) & + !$acc& vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, HypPlanePtr, Angle, Groups, offSet, c) +#else + TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) default(none) &) + TOMPC(shared(sendIndex, Quad, nSets) &) + TOMPC(private(Set, ASet, HypPlanePtr, Angle, Groups, offSet, c)) +#endif + + SetLoop0: do setID=1,nSets + + Set => Quad% SetDataPtr(setID) + ASet => Quad% AngSetPtr(Set% angleSetID) + + Groups = Set% Groups + Angle = Set% AngleOrder(sendIndex) + offSet = ASet% cycleOffSet(angle) + HypPlanePtr => ASet% HypPlanePtr(angle) + +! Initialize boundary values in Psi1 and interior values on the cycle +! list + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) & + !$acc& private(c) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Angle, Set, ASet, offSet, Groups) private(c) +#endif + do mCycle=1,ASet% numCycles(Angle) + do g=1,Groups + c = ASet% cycleList(offSet+mCycle) + Set% Psi1(g,c) = Set% cyclePsi(g,offSet+mCycle) + enddo + enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, Groups, Angle) +#endif + do b=1,Set%nbelem + do g=1,Groups + Set% Psi1(g,Set%nCorner+b) = Set% PsiB(g,b,Angle) + enddo + enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + +! Initialize values at hyper-domain interfaces + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) & + !$acc& private(c) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, HypPlanePtr, Groups, angle) private(c) +#endif + do b=1,HypPlanePtr% interfaceLen + do g=1,Groups + c = HypPlanePtr% interfaceList(b) + Set% Psi1(g,c) = Set% PsiInt(g,b,angle) + enddo + enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + + enddo SetLoop0 + +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif + + +! TODO: +! IBM XLF segfaults if 'mCycle', 'b', and 'g' are not scoped to private below. This should not +! be necessary, as these are loop control variables which the runtime should automatically scope to private. +! +! Relevant portions of OpenMP spec: +! `The loop iteration variable in any associated loop of a for, parallel for, +! taskloop, or distribute construct is private.` +! +! `A loop iteration variable for a sequential loop in a parallel or task +! generating construct is private in the innermost such construct that encloses +! the loop.` +! +! Look into reporting this bug to IBM, using UMT as a reproducer. + +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang collapse(2) num_gangs(nSets*nHyperDomains) & + !$acc& vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, GSet, HypPlanePtr, Swp, Angle, Groups, hplane1, hplane2, ndone) & + !$acc& private(hyperPlane, nzones, fac, c, c0, c1, c2, cfp, cez, zone, nCorner) & + !$acc& private(sigA, sigA2, source, area, sig, sez, SS, QQ, Qez) & + !$acc& private(gnum, gden, aez, afp, R, R_afp, ifp, mult) +#else + TOMP(target teams distribute collapse(2) num_teams(nSets*nHyperDomains) default(none) &) + TOMPC(thread_limit(omp_device_team_thread_limit) &) + TOMPC(shared(nSets, nHyperDomains, Quad, Geom, sendIndex, tau)&) + TOMPC(private(Set, ASet, GSet, HypPlanePtr, Angle, Groups, hplane1, hplane2, ndone) &) + TOMPC(private(b, g, hyperPlane, nzones, fac, c, c0, c1, c2, cfp, cez, zone, nCorner)&) + TOMPC(private(sigA, sigA2, source, area, sig, sez, SS, QQ, Qez)&) + TOMPC(private(gnum, gden, aez, afp, R, R_afp, ifp, mult)) +#endif + + SetLoop: do setID=1,nSets + DomainLoop: do domID=1,nHyperDomains + + Set => Quad% SetDataPtr(setID) + ASet => Quad% AngSetPtr(Set% angleSetID) + GSet => Quad% GrpSetPtr(Set% groupSetID) + + Groups = Set% Groups + Angle = Set% AngleOrder(sendIndex) + HypPlanePtr => ASet% HypPlanePtr(Angle) + hplane1 = HypPlanePtr% hplane1(domID) + hplane2 = HypPlanePtr% hplane2(domID) + ndone = HypPlanePtr% ndone(domID) + +! Angle Constants + + fac = ASet% angDerivFac(Angle) + + HyperPlaneLoop: do hyperPlane=hplane1,hplane2 + + nCorner = HypPlanePtr% cornersInPlane(hyperPlane) + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) & + !$acc& private(c, c0, c1, c2, cez, cfp, cface, ifp, zone, sigA, sigA2, source) & + !$acc& private(SS, QQ, Qez, area, sig, sez, gnum, gden, aez, afp, R, R_afp) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, Geom, ASet, GSet, Angle, nCorner, Groups, ndone, tau, fac) & + !$omp& private(c, c0, c1, c2, cez, cfp, cface, ifp, zone, sigA, sigA2, source, mult) & + !$omp& private(SS, QQ, Qez, area, sig, sez, gnum, gden, aez, afp, R, R_afp) +#endif + + CornerLoop: do ii=1,nCorner + GroupLoop: do g=1,Groups + +! Loop through all of the corners using the NEXT list + + c = ASet% nextC(ndone+ii,Angle) + zone = Geom% CToZone(c) + c0 = Geom% cOffSet(zone) + + sig = GSet% Sigt(g,zone) + +! Contributions from volume terms (if a starting direction add angular +! derivative) + + source = GSet% STotal(g,c) + tau*Set% Psi(g,c,Angle) + SS = Geom% Volume(c)*source + fac*Geom% Area(c)*Set% PsiM(g,c) + + do cface=1,2 + afp = ASet% AfpNorm(cface,c) + cfp = Geom% cFP(cface,c) + R_afp = Geom% RadiusFP(cface,c)*afp + + if ( afp < zero ) then + SS = SS - R_afp*Set% Psi1(g,cfp) + endif + enddo + + CornerFaceLoop: do cface=1,2 + + aez = ASet% AezNorm(cface,c) + + if (aez > zero ) then + + c1 = c + c2 = c0 + Geom% cEZ(cface,c) + QQ = source + Qez = GSet% STotal(g,c2) + tau*Set% Psi(g,c2,Angle) + mult = one + ifp = cface + + elseif (aez < zero ) then + + c2 = c + c1 = c0 + Geom% cEZ(cface,c) + QQ = GSet% STotal(g,c1) + tau*Set% Psi(g,c1,Angle) + Qez = source + mult = -one + ifp = mod(cface,2) + 1 + R = Geom% RadiusEZ(cface,c) + +! Contributions from upsteam fluxes in the same zone + SS = SS - R*aez*Set% Psi1(g,c1) + aez = -aez + + endif + + afp = ASet% AfpNorm(ifp,c1) + R = Geom% RadiusEZ(ifp,c1) + + if ( afp < zero ) then + + cfp = Geom% cFP(ifp,c1) + area = Geom% Area(c1) + sigA = sig*area + sigA2 = sigA*sigA + + gnum = aez*aez*( fouralpha*sigA2 + & + aez*(four*sigA + three*aez) ) + + gden = area*(four*sigA*sigA2 + aez*(six*sigA2 + & + two*aez*(two*sigA + aez))) + + sez = R*( area*gnum*( sig*Set% Psi1(g,cfp) - QQ ) + & + half*aez*gden*( QQ - Qez ) )/ & + ( gnum + gden*sig ) + + SS = SS + mult*sez + + else + + sez = half*R*aez*( QQ - Qez )/sig + SS = SS + mult*sez + + endif + + enddo CornerFaceLoop + +! Corner angular flux + Set% Psi1(g,c) = SS/(ASet% ANormSum(c) + sig*Geom% Volume(c)) + + enddo GroupLoop + enddo CornerLoop +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + + ndone = ndone + nCorner + + enddo HyperPlaneLoop + + enddo DomainLoop + enddo SetLoop + +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif + + +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, Angle, Groups, quadTauW1, quadTauW2) +#else + TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) default(none) &) + TOMPC(shared(nSets, Quad, sendIndex)&) + TOMPC(private(Set, ASet, Angle, Groups, quadTauW1, quadTauW2)) +#endif + + SetLoop2: do setID=1,nSets + + Set => Quad% SetDataPtr(setID) + ASet => Quad% AngSetPtr(Set% angleSetID) + + Groups = Set% Groups + Angle = Set% AngleOrder(sendIndex) + +! Set the "half-angle" angular intensity (PsiM) for the next angle + + if ( ASet% StartingDirection(Angle) ) then + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, Groups) +#endif + do c=1,Set% nCorner + do g=1,Groups + Set% PsiM(g,c) = Set% Psi1(g,c) + enddo + enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + + else + + quadTauW1 = ASet% quadTauW1(Angle) + quadTauW2 = ASet% quadTauW2(Angle) + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, Groups, quadTauW1, quadTauW2) +#endif + + do c=1,Set% nCorner + do g=1,Groups + Set% PsiM(g,c) = quadTauW1*Set% Psi1(g,c) - & + quadTauW2*Set% PsiM(g,c) + enddo + enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + + endif + + enddo SetLoop2 +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif + + +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, BdyExitPtr, HypPlanePtr, offSet, Angle, Groups, b, c) +#else + TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) default(none)&) + TOMPC(shared(nSets, Quad, sendIndex)&) + TOMPC(private(Set, ASet, BdyExitPtr, HypPlanePtr, offSet, Angle, Groups, b, c)) +#endif + + SetLoop3: do setID=1,nSets + + Set => Quad% SetDataPtr(setID) + ASet => Quad% AngSetPtr(Set% angleSetID) + Groups = Set% Groups + Angle = Set% AngleOrder(sendIndex) + offSet = ASet% cycleOffSet(angle) + BdyExitPtr => ASet% BdyExitPtr(Angle) + HypPlanePtr => ASet% HypPlanePtr(angle) + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) & + !$acc& private(b, c) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, BdyExitPtr, Angle, Groups) private(b, c) +#endif + do i=1,BdyExitPtr% nxBdy + do g=1,Groups + b = BdyExitPtr% bdyList(1,i) + c = BdyExitPtr% bdyList(2,i) + + Set% PsiB(g,b,Angle) = Set% Psi1(g,c) + enddo + enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + +! Update Interface Elements + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) & + !$acc& private(c) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, HypPlanePtr, Groups, angle) private(c) +#endif + + do i=1,HypPlanePtr% interfaceLen + do g=1,Groups + c = HypPlanePtr% interfaceList(i) + Set% PsiInt(g,i,angle) = Set% Psi1(g,c) + enddo + enddo + +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + +! Update Psi in the cycle list + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) & + !$acc& private(c) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Angle, Set, ASet, offSet, Groups) private(c) +#endif + do mCycle=1,ASet% numCycles(angle) + do g=1,Groups + c = ASet% cycleList(offSet+mCycle) + Set% cyclePsi(g,offSet+mCycle) = Set% Psi1(g,c) + enddo + enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + + + if ( ASet% FinishingDirection(Angle+1) ) then + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) & + !$acc& private(b, c) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, BdyExitPtr, Angle, Groups) private(b, c) +#endif + do i=1,BdyExitPtr% nxBdy + do g=1,Groups + b = BdyExitPtr% bdyList(1,i) + c = BdyExitPtr% bdyList(2,i) + + Set% PsiB(g,b,Angle+1) = Set% PsiM(g,c) + enddo + enddo +#ifndef TETON_ENABLE_OPENACC +!$omp end parallel do +#endif + + endif + + enddo SetLoop3 + +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif + + +! We only store Psi if this is the last transport sweep in the time step + + if ( savePsi ) then + +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, Angle, Groups) +#else + TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) default(none) &) + TOMPC(shared(nSets, sendIndex, Quad)&) + TOMPC(private(Set, ASet, Angle, Groups)) +#endif + SetLoop4: do setID=1,nSets + + Set => Quad% SetDataPtr(setID) + ASet => Quad% AngSetPtr(Set% angleSetID) + + Groups = Set% Groups + Angle = Set% AngleOrder(sendIndex) + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, ASet, Angle, Groups) +#endif + CornerLoop4: do c=1,Set% nCorner + GroupLoop4: do g=1,Groups + + Set% Psi(g,c,Angle) = Set% Psi1(g,c) + + if ( ASet% FinishingDirection(Angle+1) ) then + Set% Psi(g,c,Angle+1) = Set% PsiM(g,c) + endif + + enddo GroupLoop4 + enddo CornerLoop4 +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + + enddo SetLoop4 + +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif + + endif + + + TOMP(target exit data map(always,release: tau, sendIndex, angleList)) + + deallocate( angleList ) + + + return + end subroutine CornerSweepUCBrz_GPU + diff --git a/src/teton/gpu/CornerSweepUCBxyz_OMPOL.F90 b/src/teton/gpu/CornerSweepUCBxyz_OMPOL.F90 index 904e17f..faf0bbe 100644 --- a/src/teton/gpu/CornerSweepUCBxyz_OMPOL.F90 +++ b/src/teton/gpu/CornerSweepUCBxyz_OMPOL.F90 @@ -30,7 +30,7 @@ subroutine CornerSweepUCBxyz_GPU(nSets, sendIndex, savePsi) use SetData_mod use AngleSet_mod use GroupSet_mod - use ArrayChecks_mod + use CodeChecks_mod implicit none @@ -48,24 +48,27 @@ subroutine CornerSweepUCBxyz_GPU(nSets, sendIndex, savePsi) type(HypPlane), pointer :: HypPlanePtr type(BdyExit), pointer :: BdyExitPtr - integer :: setID - integer :: zSetID - integer :: Angle - integer :: g - integer :: Groups + integer :: setID + integer :: zSetID + integer :: Angle + integer :: g + integer :: Groups - integer :: mCycle - integer :: offSet - integer :: nAngleSets - integer :: nZoneSets + integer :: mCycle + integer :: offSet + integer :: nAngleSets + integer :: nZoneSets + integer :: nHyperDomains - integer :: nzones - integer :: ii - integer :: ndoneZ - integer :: hyperPlane - integer :: nHyperplanes + integer :: nzones + integer :: ii + integer :: ndone + integer :: hyperPlane + integer :: domID + integer :: hplane1 + integer :: hplane2 - real(adqt) :: tau + real(adqt) :: tau ! Local @@ -80,8 +83,9 @@ subroutine CornerSweepUCBxyz_GPU(nSets, sendIndex, savePsi) integer :: nCFaces integer :: zone - integer :: zone0 integer :: nCorner + integer :: c1 + integer :: c2 real(adqt), parameter :: fouralpha=1.82d0 @@ -98,6 +102,10 @@ subroutine CornerSweepUCBxyz_GPU(nSets, sendIndex, savePsi) real(adqt) :: sez real(adqt) :: psi_opp real(adqt) :: afp + real(adqt) :: Qez + real(adqt) :: QQ + real(adqt) :: SS + real(adqt) :: mult ! Dynamic @@ -105,9 +113,10 @@ subroutine CornerSweepUCBxyz_GPU(nSets, sendIndex, savePsi) ! Constants - tau = Size% tau - nAngleSets = getNumberOfAngleSets(Quad) - nZoneSets = getNumberOfZoneSets(Quad) + tau = Size% tau + nAngleSets = getNumberOfAngleSets(Quad) + nZoneSets = getNumberOfZoneSets(Quad) + nHyperDomains = getNumberOfHyperDomains(Quad,1) allocate( angleList(nAngleSets) ) @@ -130,11 +139,8 @@ subroutine CornerSweepUCBxyz_GPU(nSets, sendIndex, savePsi) TETON_CHECK_BOUNDS1(Geom%corner1, nZoneSets) TETON_CHECK_BOUNDS1(Geom%corner2, nZoneSets) -#ifdef TETON_ENABLE_OPENACC - !$acc data copyin(tau, sendIndex, angleList) -#else - TOMP(target enter data map(to: tau, sendIndex, angleList)) -#endif + + TOMP_MAP(target enter data map(to: tau, sendIndex, angleList)) #ifdef TETON_ENABLE_OPENACC !$acc parallel loop gang num_gangs(nZoneSets) & @@ -142,7 +148,7 @@ subroutine CornerSweepUCBxyz_GPU(nSets, sendIndex, savePsi) !$acc& private(ASet, setID, Angle) #else TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) - TOMPC(private(ASet, setID, Angle) &) + TOMPC(private(ASet, setID, Angle, zSetID) &) TOMPC(shared(nZoneSets, angleList, Quad, Geom, nAngleSets) ) #endif @@ -160,10 +166,10 @@ subroutine CornerSweepUCBxyz_GPU(nSets, sendIndex, savePsi) ! faster to split into two loops as below #ifdef TETON_ENABLE_OPENACC -!$acc loop vector collapse(2) + !$acc loop vector collapse(2) #else -!$omp parallel do collapse(2) default(none) & -!$omp& shared(Geom, ASet, Angle, zSetID) + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Geom, ASet, Angle, zSetID) #endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) do cface=1,3 @@ -172,14 +178,14 @@ subroutine CornerSweepUCBxyz_GPU(nSets, sendIndex, savePsi) enddo enddo #ifndef TETON_ENABLE_OPENACC -!$omp end parallel do + !$omp end parallel do #endif #ifdef TETON_ENABLE_OPENACC -!$acc loop vector + !$acc loop vector #else -!$omp parallel do default(none) & -!$omp& shared(Geom, ASet, Angle, zSetID) + !$omp parallel do default(none) & + !$omp& shared(Geom, ASet, Angle, zSetID) #endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) do cface=4,Geom% nCFacesArray(c) @@ -188,7 +194,7 @@ subroutine CornerSweepUCBxyz_GPU(nSets, sendIndex, savePsi) enddo enddo #ifndef TETON_ENABLE_OPENACC -!$omp end parallel do + !$omp end parallel do #endif enddo @@ -196,20 +202,20 @@ subroutine CornerSweepUCBxyz_GPU(nSets, sendIndex, savePsi) enddo ZoneSetLoop #ifdef TETON_ENABLE_OPENACC -!$acc end parallel loop + !$acc end parallel loop #else -TOMP(end target teams distribute) + TOMP(end target teams distribute) #endif #ifdef TETON_ENABLE_OPENACC -!$acc parallel loop gang num_gangs(nZoneSets) & -!$acc& vector_length(omp_device_team_thread_limit) & -!$acc& private(ASet) + !$acc parallel loop gang num_gangs(nZoneSets) & + !$acc& vector_length(omp_device_team_thread_limit) & + !$acc& private(ASet) #else -TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) -TOMPC(private(ASet) &) -TOMPC(shared(nZoneSets, nAngleSets, Quad, Geom)) + TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) + TOMPC(private(ASet, setID) &) + TOMPC(shared(nZoneSets, nAngleSets, Quad, Geom)) #endif ZoneSetLoop2: do zSetID=1,nZoneSets @@ -221,10 +227,10 @@ subroutine CornerSweepUCBxyz_GPU(nSets, sendIndex, savePsi) ASet => Quad% AngSetPtr(setID) #ifdef TETON_ENABLE_OPENACC -!$acc loop vector + !$acc loop vector #else -!$omp parallel do default(none) & -!$omp& shared(Geom, ASet, zSetID) + !$omp parallel do default(none) & + !$omp& shared(Geom, ASet, zSetID) #endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) @@ -237,7 +243,7 @@ subroutine CornerSweepUCBxyz_GPU(nSets, sendIndex, savePsi) enddo #ifndef TETON_ENABLE_OPENACC -!$omp end parallel do + !$omp end parallel do #endif enddo @@ -245,20 +251,20 @@ subroutine CornerSweepUCBxyz_GPU(nSets, sendIndex, savePsi) enddo ZoneSetLoop2 #ifdef TETON_ENABLE_OPENACC -!$acc end parallel loop + !$acc end parallel loop #else -TOMP(end target teams distribute) + TOMP(end target teams distribute) #endif #ifdef TETON_ENABLE_OPENACC -!$acc parallel loop gang num_gangs(nSets) & -!$acc& vector_length(omp_device_team_thread_limit) & -!$acc& private(Set, ASet, Angle, Groups, offSet) + !$acc parallel loop gang num_gangs(nSets) & + !$acc& vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, HypPlanePtr, angle, Groups, offSet, c) #else -TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) default(none) &) -TOMPC(shared(sendIndex, Quad, nSets) &) -TOMPC(private(Set, ASet, Angle, Groups, offSet)) + TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) default(none) &) + TOMPC(shared(sendIndex, Quad, nSets) &) + TOMPC(private(Set, ASet, HypPlanePtr, angle, Groups, offSet, c)) #endif @@ -268,54 +274,74 @@ subroutine CornerSweepUCBxyz_GPU(nSets, sendIndex, savePsi) ASet => Quad% AngSetPtr(Set% angleSetID) Groups = Set% Groups - Angle = Set% AngleOrder(sendIndex) + angle = Set% AngleOrder(sendIndex) offSet = ASet% cycleOffSet(angle) + HypPlanePtr => ASet% HypPlanePtr(angle) ! Initialize boundary values in Psi1 and interior values on the cycle ! list #ifdef TETON_ENABLE_OPENACC -!$acc loop vector collapse(2) & -!$acc& private(c) + !$acc loop vector collapse(2) & + !$acc& private(c) #else -!$omp parallel do collapse(2) default(none) & -!$omp& shared(Angle, Set, ASet, offSet, Groups) private(c) + !$omp parallel do collapse(2) default(none) & + !$omp& shared(angle, Set, ASet, offSet, Groups) private(c) #endif - do mCycle=1,ASet% numCycles(Angle) + do mCycle=1,ASet% numCycles(angle) do g=1,Groups c = ASet% cycleList(offSet+mCycle) Set% Psi1(g,c) = Set% cyclePsi(g,offSet+mCycle) enddo enddo #ifndef TETON_ENABLE_OPENACC -!$omp end parallel do + !$omp end parallel do #endif #ifdef TETON_ENABLE_OPENACC -!$acc loop vector collapse(2) + !$acc loop vector collapse(2) #else -!$omp parallel do collapse(2) default(none) & -!$omp& shared(Set, Groups, Angle) + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, Groups, angle) #endif - do c=1,Set%nbelem + do b=1,Set%nbelem do g=1,Groups - Set% Psi1(g,Set%nCorner+c) = Set% PsiB(g,c,Angle) + Set% Psi1(g,Set%nCorner+b) = Set% PsiB(g,b,angle) enddo enddo #ifndef TETON_ENABLE_OPENACC -!$omp end parallel do + !$omp end parallel do #endif - enddo SetLoop0 +! Initialize values at hyper-domain interfaces #ifdef TETON_ENABLE_OPENACC -!$acc end parallel loop + !$acc loop vector collapse(2) & + !$acc& private(c) #else -TOMP(end target teams distribute) + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, HypPlanePtr, Groups, angle) private(c) +#endif + do b=1,HypPlanePtr% interfaceLen + do g=1,Groups + c = HypPlanePtr% interfaceList(b) + Set% Psi1(g,c) = Set% PsiInt(g,b,angle) + enddo + enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do #endif + enddo SetLoop0 + +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif + ! TODO: ! IBM XLF segfaults if 'hyperPlane' is not scoped to private below. ! This should not be necessary, as this is a loop control variables which the runtime should automatically scope to @@ -332,17 +358,20 @@ subroutine CornerSweepUCBxyz_GPU(nSets, sendIndex, savePsi) ! Look into reporting this bug to IBM, using UMT as a reproducer. #ifdef TETON_ENABLE_OPENACC -!$acc parallel loop gang num_gangs(nSets) & -!$acc& vector_length(omp_device_team_thread_limit) & -!$acc& private(Set, ASet, GSet, HypPlanePtr, Angle, Groups) & -!$acc& private(nHyperPlanes, ndoneZ, nzones, hyperPlane) + !$acc parallel loop gang collapse(2) num_gangs(nSets*nHyperDomains) & + !$acc& vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, GSet, HypPlanePtr, Angle, Groups) & + !$acc& private(hplane1, hplane2, ndone, nCorner, hyperPlane) #else -TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) &) -TOMPC(private(Set, ASet, GSet, HypPlanePtr, Angle, Groups) &) -TOMPC(private(nHyperPlanes, ndoneZ, nzones, hyperPlane)) + TOMP(target teams distribute collapse(2) num_teams(nSets*nHyperDomains) &) + TOMPC(thread_limit(omp_device_team_thread_limit) &) + TOMPC(shared(Quad, sendIndex, nSets, nHyperDomains) &) + TOMPC(private(Set, ASet, GSet, HypPlanePtr, Angle, Groups) &) + TOMPC(private(hplane1, hplane2, ndone, nCorner, hyperPlane)) #endif SetLoop: do setID=1,nSets + DomainLoop: do domID=1,nHyperDomains Set => Quad% SetDataPtr(setID) ASet => Quad% AngSetPtr(Set% angleSetID) @@ -350,410 +379,162 @@ subroutine CornerSweepUCBxyz_GPU(nSets, sendIndex, savePsi) Groups = Set% Groups Angle = Set% AngleOrder(sendIndex) - nHyperPlanes = ASet% nHyperPlanes(Angle) - ndoneZ = 0 HypPlanePtr => ASet% HypPlanePtr(Angle) + hplane1 = HypPlanePtr% hplane1(domID) + hplane2 = HypPlanePtr% hplane2(domID) + ndone = HypPlanePtr% ndone(domID) - HyperPlaneLoop: do hyperPlane=1,nHyperPlanes - - nzones = HypPlanePtr% zonesInPlane(hyperPlane) - -! NOTE: Two loop sections that follow do not support a collapse(3), -! as its not in canonical form (all loop bounds are not predetermined); -! it's significantly faster to split into two loops as below - -#ifdef TETON_ENABLE_OPENACC -!$acc loop vector collapse(3) & -!$acc& private(c0,zone,zone0,nCorner,source,nCFaces,afp,cfp) -#else -!$omp parallel do collapse(3) default(none) & -!$omp& shared(Set, Geom, ASet, GSet, Angle, nzones, Groups) & -!$omp& shared(ndoneZ, tau) & -!$omp& private(c0,zone,zone0,source,nCFaces,afp,cfp) -#endif - - ZoneLoop0: do ii=1,nzones - CornerLoop0: do c=1,8 - GroupLoop0: do g=1,Groups - -! Loop through the zones using the NEXTZ list - - zone0 = ASet% nextZ(ndoneZ+ii,Angle) - zone = iabs( zone0 ) - c0 = Geom% cOffSet(zone) + HyperPlaneLoop: do hyperPlane=hplane1,hplane2 -! Contributions from volume terms - - source = GSet% STotal(g,c0+c) + tau*Set% Psi(g,c0+c,Angle) - Set% Q(g,c,ii) = source - Set% S(g,c,ii) = Geom% Volume(c0+c)*source - - nCFaces = Geom% nCFacesArray(c0+c) - - do cface=1,nCFaces - - afp = ASet% AfpNorm(cface,c0+c) - cfp = Geom% cFP(cface,c0+c) - - if ( afp < zero ) then - Set% S(g,c,ii) = Set% S(g,c,ii) - afp*Set% Psi1(g,cfp) - endif - enddo - - enddo GroupLoop0 - enddo CornerLoop0 - enddo ZoneLoop0 - -#ifndef TETON_ENABLE_OPENACC -!$omp end parallel do -#endif + nCorner = HypPlanePtr% cornersInPlane(hyperPlane) #ifdef TETON_ENABLE_OPENACC !$acc loop vector collapse(2) & -!$acc& private(c0,zone,zone0,nCorner,source,nCFaces,afp,cfp) +!$acc& private(c, c0, cfp, ifp, cez, zone, cface, i, nCFaces) & +!$acc& private(aez, area_opp, sig, vol, Qez, QQ, SS, source) & +!$acc& private(sigv, sigv2, sez, gnum, gden, psi_opp, afp, c1, c2, mult) #else !$omp parallel do collapse(2) default(none) & -!$omp& shared(Set, Geom, ASet, GSet, Angle, nzones, Groups) & -!$omp& shared(ndoneZ, tau) & -!$omp& private(c0,zone,zone0,source,nCFaces,afp,cfp) +!$omp& shared(Set, Geom, ASet, GSet, Angle, nCorner, Groups, ndone, tau) & +!$omp& private(c, c0, cfp, ifp, cez, zone, cface, i, nCFaces) & +!$omp& private(aez, area_opp, sig, vol, Qez, QQ, SS, source) & +!$omp& private(sigv, sigv2, sez, gnum, gden, psi_opp, afp, c1, c2, mult) #endif - ZoneLoop00: do ii=1,nzones - GroupLoop00: do g=1,Groups - -! Loop through the zones using the NEXTZ list + CornerLoop1: do ii=1,nCorner + GroupLoop1: do g=1,Groups - zone0 = ASet% nextZ(ndoneZ+ii,Angle) - zone = iabs( zone0 ) +! Loop through the corners using the NEXT list - CornerLoop00: do c=9,Geom% numCorner(zone) + c = ASet% nextC(ndone+ii,Angle) + zone = Geom% CToZone(c) + c0 = Geom% cOffSet(zone) + sig = GSet% Sigt(g,zone) - c0 = Geom% cOffSet(zone) +! Calculate Area_CornerFace dot Omega to determine the +! contributions from incident fluxes across external +! corner faces (FP faces) -! Contributions from volume terms + nCFaces = Geom% nCFacesArray(c) - source = GSet% STotal(g,c0+c) + tau*Set% Psi(g,c0+c,Angle) - Set% Q(g,c,ii) = source - Set% S(g,c,ii) = Geom% Volume(c0+c)*source +! Contributions from volume terms - nCFaces = Geom% nCFacesArray(c0+c) + source = GSet% STotal(g,c) + tau*Set% Psi(g,c,Angle) + SS = Geom% Volume(c)*source - do cface=1,nCFaces + do cface=1,nCFaces - afp = ASet% AfpNorm(cface,c0+c) - cfp = Geom% cFP(cface,c0+c) + afp = ASet% AfpNorm(cface,c) + cfp = Geom% cFP(cface,c) - if ( afp < zero ) then - Set% S(g,c,ii) = Set% S(g,c,ii) - afp*Set% Psi1(g,cfp) - endif - enddo - - enddo CornerLoop00 - enddo GroupLoop00 - enddo ZoneLoop00 - -#ifndef TETON_ENABLE_OPENACC -!$omp end parallel do -#endif - - -#ifdef TETON_ENABLE_OPENACC -!$acc loop vector collapse(3) & -!$acc& private(c0,cfp,ifp,cez,zone,zone0,nCorner,nCFaces) & -!$acc& private(aez,area_opp,sig,vol) & -!$acc& private(sigv,sigv2,sez,gnum,gden,psi_opp,afp) -#else -!$omp parallel do collapse(3) default(none) & -!$omp& shared(Set, Geom, ASet, GSet, Angle, nzones, Groups, ndoneZ) & -!$omp& private(c0,cfp,ifp,cez,zone,zone0,nCFaces) & -!$omp& private(aez,area_opp,sig,vol) & -!$omp& private(sigv,sigv2,sez,gnum,gden,psi_opp,afp) -#endif + if ( afp < zero ) then + SS = SS - afp*Set% Psi1(g,cfp) + endif + enddo - ZoneLoop1: do ii=1,nzones - CornerLoop1: do c=1,8 - GroupLoop1: do g=1,Groups +! Contributions from interior corner faces (EZ faces) -! Loop through the zones using the NEXTZ list + do cface=1,nCFaces - zone0 = ASet% nextZ(ndoneZ+ii,Angle) - zone = iabs( zone0 ) - c0 = Geom% cOffSet(zone) + aez = ASet% AezNorm(cface,c) + mult = zero - sig = GSet% Sigt(g,zone) + if (aez > zero ) then -! Calculate Area_CornerFace dot Omega to determine the -! contributions from incident fluxes across external -! corner faces (FP faces) + c1 = c + c2 = c0 + Geom% cEZ(cface,c) + ifp = mod(cface,nCFaces) + 1 + QQ = source + Qez = GSet% STotal(g,c2) + tau*Set% Psi(g,c2,Angle) + mult = one - nCFaces = Geom% nCFacesArray(c0+c) + elseif (aez < zero ) then -! Contributions from interior corner faces (EZ faces) + c2 = c + c1 = c0 + Geom% cEZ(cface,c) - do cface=1,nCFaces +! Contributions from upsteam fluxes in the same zone + SS = SS - aez*Set% Psi1(g,c1) - aez = ASet% AezNorm(cface,c0+c) - cez = Geom% cEZ(cface,c0+c) - - if (aez > zero ) then - - area_opp = zero - psi_opp = zero - - ifp = mod(cface,nCFaces) + 1 - afp = ASet% AfpNorm(ifp,c0+c) - - if ( afp < zero ) then - cfp = Geom% cFP(ifp,c0+c) - area_opp = -afp - psi_opp = -afp*Set% Psi1(g,cfp) + do i=1,nCFaces + if (Geom% cEZ(i,c1) + c0 == c2) then + ifp = mod(i,nCFaces) + 1 endif + enddo - do i=2,nCFaces-2 - ifp = mod(ifp,nCFaces) + 1 - afp = ASet% AfpNorm(ifp,c0+c) - if ( afp < zero ) then - cfp = Geom% cFP(ifp,c0+c) - area_opp = area_opp - afp - psi_opp = psi_opp - afp*Set% Psi1(g,cfp) - endif - enddo - - TestOppositeFace: if ( area_opp > zero ) then - - psi_opp = psi_opp/area_opp - - vol = Geom% Volume(c0+c) - - sigv = sig*vol - sigv2 = sigv*sigv - - gnum = aez*aez*( fouralpha*sigv2 + & - aez*(four*sigv + three*aez) ) - - gden = vol*( four*sigv*sigv2 + aez*(six*sigv2 + & - two*aez*(two*sigv + aez)) ) - - sez = ( vol*gnum*( sig*psi_opp - Set% Q(g,c,ii) ) + & - half*aez*gden*( Set% Q(g,c,ii) - Set% Q(g,cez,ii) ) )/ & - ( gnum + gden*sig) - - else - - sez = half*aez*( Set% Q(g,c,ii) - Set% Q(g,cez,ii) )/sig - - endif TestOppositeFace - - ATOMIC_UPDATE - Set% S(g,c,ii) = Set% S(g,c,ii) + sez - ATOMIC_END - - ATOMIC_UPDATE - Set% S(g,cez,ii) = Set% S(g,cez,ii) - sez - ATOMIC_END - - endif - - enddo - - enddo GroupLoop1 - enddo CornerLoop1 - enddo ZoneLoop1 - -#ifndef TETON_ENABLE_OPENACC -!$omp end parallel do -#endif - -#ifdef TETON_ENABLE_OPENACC -!$acc loop vector collapse(2) & -!$acc& private(c0,cfp,ifp,cez,zone,zone0,nCorner,nCFaces) & -!$acc& private(aez,area_opp,sig,vol) & -!$acc& private(sigv,sigv2,sez,gnum,gden,psi_opp,afp) -#else -!$omp parallel do collapse(2) default(none) & -!$omp& shared(Set, Geom, ASet, GSet, Angle, nzones, Groups, ndoneZ) & -!$omp& private(c0,cfp,ifp,cez,zone,zone0,nCFaces) & -!$omp& private(aez,area_opp,sig,vol) & -!$omp& private(sigv,sigv2,sez,gnum,gden,psi_opp,afp) -#endif - - ZoneLoop11: do ii=1,nzones - GroupLoop11: do g=1,Groups - -! Loop through the zones using the NEXTZ list - - zone0 = ASet% nextZ(ndoneZ+ii,Angle) - zone = iabs( zone0 ) - - CornerLoop11: do c=9,Geom% numCorner(zone) - - c0 = Geom% cOffSet(zone) - sig = GSet% Sigt(g,zone) - -! Calculate Area_CornerFace dot Omega to determine the -! contributions from incident fluxes across external -! corner faces (FP faces) - - nCFaces = Geom% nCFacesArray(c0+c) - -! Contributions from interior corner faces (EZ faces) - - do cface=1,nCFaces - - aez = ASet% AezNorm(cface,c0+c) - cez = Geom% cEZ(cface,c0+c) - - if (aez > zero ) then - - area_opp = zero - psi_opp = zero - - ifp = mod(cface,nCFaces) + 1 - afp = ASet% AfpNorm(ifp,c0+c) - - if ( afp < zero ) then - cfp = Geom% cFP(ifp,c0+c) - area_opp = -afp - psi_opp = -afp*Set% Psi1(g,cfp) - endif - - do i=2,nCFaces-2 - ifp = mod(ifp,nCFaces) + 1 - afp = ASet% AfpNorm(ifp,c0+c) - if ( afp < zero ) then - cfp = Geom% cFP(ifp,c0+c) - area_opp = area_opp - afp - psi_opp = psi_opp - afp*Set% Psi1(g,cfp) - endif - enddo - - TestOppositeFace1: if ( area_opp > zero ) then - - psi_opp = psi_opp/area_opp - - vol = Geom% Volume(c0+c) - - sigv = sig*vol - sigv2 = sigv*sigv - - gnum = aez*aez*( fouralpha*sigv2 + & - aez*(four*sigv + three*aez) ) - - gden = vol*( four*sigv*sigv2 + aez*(six*sigv2 + & - two*aez*(two*sigv + aez)) ) - - sez = ( vol*gnum*( sig*psi_opp - Set% Q(g,c,ii) ) + & - half*aez*gden*( Set% Q(g,c,ii) - Set% Q(g,cez,ii) ) )/ & - ( gnum + gden*sig) - - else + Qez = source + QQ = GSet% STotal(g,c1) + tau*Set% Psi(g,c1,Angle) + mult = -one + aez = -aez - sez = half*aez*( Set% Q(g,c,ii) - Set% Q(g,cez,ii) )/sig + endif - endif TestOppositeFace1 + psi_opp = zero + area_opp = zero - ATOMIC_UPDATE - Set% S(g,c,ii) = Set% S(g,c,ii) + sez - ATOMIC_END + afp = ASet% AfpNorm(ifp,c1) - ATOMIC_UPDATE - Set% S(g,cez,ii) = Set% S(g,cez,ii) - sez - ATOMIC_END + if ( afp < zero ) then + cfp = Geom% cFP(ifp,c1) + area_opp = -afp + psi_opp = -afp*Set% Psi1(g,cfp) + endif + do i=2,nCFaces-2 + ifp = mod(ifp,nCFaces) + 1 + afp = ASet% AfpNorm(ifp,c1) + if ( afp < zero ) then + cfp = Geom% cFP(ifp,c1) + area_opp = area_opp - afp + psi_opp = psi_opp - afp*Set% Psi1(g,cfp) endif - enddo - enddo CornerLoop11 - enddo GroupLoop11 - enddo ZoneLoop11 + TestOppositeFace: if ( area_opp > zero ) then -#ifndef TETON_ENABLE_OPENACC -!$omp end parallel do -#endif + psi_opp = psi_opp/area_opp + vol = Geom% Volume(c1) -#ifdef TETON_ENABLE_OPENACC -!$acc loop vector collapse(2) & -!$acc& private(c,c0,cez,zone,zone0,nCorner,nCFaces) & -!$acc& private(aez,sig) -#else -!$omp parallel do collapse(2) default(none) & -!$omp& shared(Set, GSet, Geom, ASet, Angle, nzones, Groups, ndoneZ) & -!$omp& private(c,c0,cez,zone,zone0,nCorner,nCFaces) & -!$omp& private(aez,sig) -#endif - - ZoneLoop: do ii=1,nzones - GroupLoop: do g=1,Groups + sigv = sig*vol + sigv2 = sigv*sigv -! Loop through the zones using the NEXTZ list + gnum = aez*aez*( fouralpha*sigv2 + & + aez*(four*sigv + three*aez) ) - zone0 = ASet% nextZ(ndoneZ+ii,Angle) - zone = iabs( zone0 ) - nCorner = Geom% numCorner(zone) - c0 = Geom% cOffSet(zone) + gden = vol*( four*sigv*sigv2 + aez*(six*sigv2 + & + two*aez*(two*sigv + aez)) ) - sig = GSet% Sigt(g,zone) + sez = ( vol*gnum*( sig*psi_opp - QQ ) + & + half*aez*gden*( QQ - Qez ) )/ & + ( gnum + gden*sig) + SS = SS + mult*sez - if ( zone0 > 0 ) then + else - do i=1,nCorner + sez = half*aez*( QQ - Qez )/sig + SS = SS + mult*sez - c = ASet% nextC(c0+i,angle) + endif TestOppositeFace -! Corner angular flux - Set% Psi1(g,c0+c) = Set% S(g,c,ii)/(ASet% ANormSum(c0+c) + sig*Geom% Volume(c0+c)) + enddo -! Calculate the contribution of this flux to the sources of -! downstream corners in this zone. The downstream corner index is -! "ez_exit." +! Corner angular flux + Set% Psi1(g,c) = SS/(ASet% ANormSum(c) + sig*Geom% Volume(c)) - nCFaces = Geom% nCFacesArray(c0+c) - - do cface=1,nCFaces - aez = ASet% AezNorm(cface,c0+c) - - if (aez > zero) then - cez = Geom% cEZ(cface,c0+c) - Set% S(g,cez,ii) = Set% S(g,cez,ii) + aez*Set% Psi1(g,c0+c) - endif - enddo - - enddo - - else - -! Direct Solve (non-lower triangular, use old values of Psi1) - do c=1,nCorner - do cface=1,Geom% nCFacesArray(c0+c) - aez = ASet% AezNorm(cface,c0+c) - - if (aez > zero) then - cez = Geom% cEZ(cface,c0+c) - Set% S(g,cez,ii) = Set% S(g,cez,ii) + aez*Set% Psi1(g,c0+c) - endif - enddo - enddo - - do c=1,nCorner - Set% Psi1(g,c0+c) = Set% S(g,c,ii)/(ASet% ANormSum(c0+c) + sig*Geom% Volume(c0+c)) - enddo - - endif - - enddo GroupLoop - enddo ZoneLoop + enddo GroupLoop1 + enddo CornerLoop1 #ifndef TETON_ENABLE_OPENACC !$omp end parallel do #endif - ndoneZ = ndoneZ + nzones + ndone = ndone + nCorner enddo HyperPlaneLoop + enddo DomainLoop enddo SetLoop #ifdef TETON_ENABLE_OPENACC @@ -762,34 +543,34 @@ subroutine CornerSweepUCBxyz_GPU(nSets, sendIndex, savePsi) TOMP(end target teams distribute) #endif - ! Update Boundary data #ifdef TETON_ENABLE_OPENACC -!$acc parallel loop gang num_gangs(nSets) & -!$acc& vector_length(omp_device_team_thread_limit) & -!$acc& private(Set, ASet, BdyExitPtr, offSet, Angle, Groups, b, c) + !$acc parallel loop gang num_gangs(nSets) & + !$acc& vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, BdyExitPtr, HypPlanePtr, offSet, angle, Groups, b, c) #else -TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) default(none) &) -TOMPC(shared(nSets, Quad, sendIndex)&) -TOMPC(private(Set, ASet, BdyExitPtr, offSet, Angle, Groups, b, c)) + TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) default(none) &) + TOMPC(shared(nSets, Quad, sendIndex)&) + TOMPC(private(Set, ASet, BdyExitPtr, HypPlanePtr, offSet, angle, Groups, b, c)) #endif SetLoop3: do setID=1,nSets - Set => Quad% SetDataPtr(setID) - ASet => Quad% AngSetPtr(Set% angleSetID) - Groups = Set% Groups - Angle = Set% AngleOrder(sendIndex) - offSet = ASet% cycleOffSet(angle) - BdyExitPtr => ASet% BdyExitPtr(Angle) + Set => Quad% SetDataPtr(setID) + ASet => Quad% AngSetPtr(Set% angleSetID) + Groups = Set% Groups + angle = Set% AngleOrder(sendIndex) + offSet = ASet% cycleOffSet(angle) + BdyExitPtr => ASet% BdyExitPtr(angle) + HypPlanePtr => ASet% HypPlanePtr(angle) #ifdef TETON_ENABLE_OPENACC -!$acc loop vector collapse(2) & -!$acc& private(b,c) + !$acc loop vector collapse(2) & + !$acc& private(b,c) #else -!$omp parallel do collapse(2) default(none) & -!$omp& shared(Set, BdyExitPtr, Groups, Angle) private(b,c) + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, BdyExitPtr, Groups, Angle) private(b,c) #endif do i=1,BdyExitPtr% nxBdy @@ -802,17 +583,39 @@ subroutine CornerSweepUCBxyz_GPU(nSets, sendIndex, savePsi) enddo #ifndef TETON_ENABLE_OPENACC -!$omp end parallel do + !$omp end parallel do +#endif + +! Update Interface Elements + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) & + !$acc& private(c) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, HypPlanePtr, Groups, angle) private(c) +#endif + + do i=1,HypPlanePtr% interfaceLen + do g=1,Groups + c = HypPlanePtr% interfaceList(i) + Set% PsiInt(g,i,angle) = Set% Psi1(g,c) + enddo + enddo + + +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do #endif ! Update Psi in the cycle list #ifdef TETON_ENABLE_OPENACC -!$acc loop vector collapse(2) & -!$acc& private(c) + !$acc loop vector collapse(2) & + !$acc& private(c) #else -!$omp parallel do collapse(2) default(none) & -!$omp& shared(Angle, Set, ASet, offSet, Groups) private(c) + !$omp parallel do collapse(2) default(none) & + !$omp& shared(angle, Set, ASet, offSet, Groups) private(c) #endif do mCycle=1,ASet% numCycles(angle) do g=1,Groups @@ -821,15 +624,15 @@ subroutine CornerSweepUCBxyz_GPU(nSets, sendIndex, savePsi) enddo enddo #ifndef TETON_ENABLE_OPENACC -!$omp end parallel do + !$omp end parallel do #endif enddo SetLoop3 #ifdef TETON_ENABLE_OPENACC -!$acc end parallel loop + !$acc end parallel loop #else -TOMP(end target teams distribute) + TOMP(end target teams distribute) #endif ! We only store Psi if this is the last transport sweep in the time step @@ -837,59 +640,59 @@ subroutine CornerSweepUCBxyz_GPU(nSets, sendIndex, savePsi) if ( savePsi ) then #ifdef TETON_ENABLE_OPENACC -!$acc parallel loop gang num_gangs(nSets) & -!$acc& vector_length(omp_device_team_thread_limit) & -!$acc& private(Set, Angle, Groups) + !$acc parallel loop gang num_gangs(nZoneSets) & + !$acc& vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, Angle, Groups) #else -TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) default(none)&) -TOMPC(shared(nSets, Quad, sendIndex)&) -TOMPC(private(Set, Angle, Groups)) + TOMP(target teams distribute num_teams(nZoneSets) &) + TOMPC(thread_limit(omp_device_team_thread_limit) default(none) &) + TOMPC(shared(nZoneSets, nSets, Quad, Geom, sendIndex) &) + TOMPC(private(Set, setID, Angle, Groups)) #endif - SetLoop2: do setID=1,nSets + ZoneSetLoop4: do zSetID=1,nZoneSets + SetLoop4: do setID=1,nSets - Set => Quad% SetDataPtr(setID) - Groups = Set% Groups - Angle = Set% AngleOrder(sendIndex) + Set => Quad% SetDataPtr(setID) + Groups = Set% Groups + Angle = Set% AngleOrder(sendIndex) #ifdef TETON_ENABLE_OPENACC -!$acc loop vector collapse(2) + !$acc loop vector collapse(2) #else -!$omp parallel do collapse(2) default(none) & -!$omp& shared(Set, Angle, Groups) + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, Angle, Groups, zSetID, Geom) #endif - CornerLoop2: do c=1,Set% nCorner - GroupLoop2: do g=1,Groups + CornerLoop4: do c=Geom% corner1(zSetID),Geom% corner2(zSetID) + GroupLoop4: do g=1,Groups - Set% Psi(g,c,Angle) = Set% Psi1(g,c) + Set% Psi(g,c,Angle) = Set% Psi1(g,c) - enddo GroupLoop2 - enddo CornerLoop2 + enddo GroupLoop4 + enddo CornerLoop4 #ifndef TETON_ENABLE_OPENACC -!$omp end parallel do + !$omp end parallel do #endif - enddo SetLoop2 + enddo SetLoop4 + enddo ZoneSetLoop4 #ifdef TETON_ENABLE_OPENACC -!$acc end parallel loop + !$acc end parallel loop #else -TOMP(end target teams distribute) + TOMP(end target teams distribute) #endif endif -#ifdef TETON_ENABLE_OPENACC -!$acc end data -#else -TOMP(target exit data map(release: tau, sendIndex, angleList)) -#endif + TOMP_MAP(target exit data map(release: tau, sendIndex, angleList)) deallocate( angleList ) + return end subroutine CornerSweepUCBxyz_GPU diff --git a/src/teton/gpu/InitSweepGreyUCBrz_OMPOL.F90 b/src/teton/gpu/InitSweepGreyUCBrz_OMPOL.F90 index 00efb15..c2370d5 100644 --- a/src/teton/gpu/InitSweepGreyUCBrz_OMPOL.F90 +++ b/src/teton/gpu/InitSweepGreyUCBrz_OMPOL.F90 @@ -21,7 +21,7 @@ subroutine InitGreySweepUCBrz_GPU use GreyAcceleration_mod use AngleSet_mod use OMPWrappers_mod - use ArrayChecks_mod + use CodeChecks_mod implicit none @@ -129,42 +129,67 @@ subroutine InitGreySweepUCBrz_GPU TETON_CHECK_BOUNDS1(Geom%corner1, nZoneSets) TETON_CHECK_BOUNDS1(Geom%corner2, nZoneSets) - TOMP(target enter data map(to: numAngles, angleList, omega, quadwt, fac, quadTauW1, quadTauW2, Starting)) + TOMP_MAP(target enter data map(to: numAngles, angleList, omega, quadwt, fac, quadTauW1, quadTauW2, Starting)) +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) +#else TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) - TOMPC(shared(nZoneSets, numAngles, omega, Geom, GTA)) + TOMPC(shared(nZoneSets, numAngles, omega, Geom, GTA) &) + TOMPC(private(angle)) +#endif ZoneSetLoop1: do zSetID=1,nZoneSets do angle=1,numAngles +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) +#else !$omp parallel do collapse(2) default(none) & !$omp& shared(Geom, GTA, angle, omega, zSetID) +#endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) do cface=1,2 GTA% AfpNorm(cface,c,angle) = DOT_PRODUCT( omega(:,angle),Geom% A_fp(:,cface,c) ) GTA% AezNorm(cface,c,angle) = DOT_PRODUCT( omega(:,angle),Geom% A_ez(:,cface,c) ) enddo enddo - +#ifndef TETON_ENABLE_OPENACC !$omp end parallel do +#endif enddo enddo ZoneSetLoop1 +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else TOMP(end target teams distribute) +#endif + +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(R_afp, R_afp2, R, R2) +#else TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) TOMPC(shared(nZoneSets, numAngles, GTA, Geom, fac)&) - TOMPC(private(angle, R_afp, R_afp2, R, R2)) + TOMPC(private(R_afp, R_afp2, R, R2, angle)) +#endif ZoneSetLoop2: do zSetID=1,nZoneSets do angle=1,numAngles +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector & + !$acc& private(R_afp,R_afp2,R,R2) +#else !$omp parallel do default(none) & !$omp& shared(angle, Geom, GTA, fac, zSetID) private(R_afp,R_afp2,R,R2) +#endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) R_afp = Geom% RadiusFP(1,c) @@ -178,28 +203,44 @@ subroutine InitGreySweepUCBrz_GPU R *(GTA% AezNorm(1,c,angle) - abs(GTA% AezNorm(1,c,angle))) + & R2 *(GTA% AezNorm(2,c,angle) - abs(GTA% AezNorm(2,c,angle))) ) enddo - +#ifndef TETON_ENABLE_OPENACC !$omp end parallel do +#endif enddo enddo ZoneSetLoop2 +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else TOMP(end target teams distribute) +#endif +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(aSetID,angGTA,c0,cez,nCorner,R,aez,afp) & + !$acc& private(sigA,sigA2,gnum,gtau,B0,B1,B2,dInv,Sigt,SigtEZ) +#else TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none)&) TOMPC(shared(nZoneSets, Geom, GTA, Quad, numAngles, quadwt, angleList, quadTauW1, quadTauW2, fac, Starting) &) - TOMPC(private(aSetID,angGTA,c0,cez,nCorner, R,aez,afp,sigA,sigA2,gnum,gtau,B0,B1,B2,dInv, Sigt,SigtEZ)) + TOMPC(private(aSetID,angGTA,c0,cez,nCorner,R,aez,afp,sigA,sigA2,gnum,gtau,B0,B1,B2,dInv,Sigt,SigtEZ)) +#endif ZoneSetLoop: do zSetID=1,nZoneSets +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector & + !$acc& private(aSetID,angGTA,c0,cez,nCorner,R,aez,afp) & + !$acc& private(sigA,sigA2,gnum,gtau,B0,B1,B2,dInv,Sigt,SigtEZ) +#else !$omp parallel do default(none) & - !$omp& shared(Geom, GTA, Quad, numAngles, quadwt, zSetID) & + !$omp& shared(Geom, GTA, Quad, numAngles, quadwt, zSetID) & !$omp& shared(angleList, quadTauW1, quadTauW2, fac, Starting) & - !$omp& private(aSetID,angGTA,c0,cez,nCorner) & - !$omp& private(R,aez,afp,sigA,sigA2,gnum,gtau,B0,B1,B2,dInv) & - !$omp& private(Sigt,SigtEZ) + !$omp& private(aSetID,angGTA,c0,cez,nCorner,R,aez,afp) & + !$omp& private(sigA,sigA2,gnum,gtau,B0,B1,B2,dInv,Sigt,SigtEZ, c1, c, cface) +#endif ZoneLoop: do zone=Geom% zone1(zSetID),Geom% zone2(zSetID) @@ -342,13 +383,19 @@ subroutine InitGreySweepUCBrz_GPU enddo AngleLoop enddo ZoneLoop - +#ifndef TETON_ENABLE_OPENACC !$omp end parallel do +#endif enddo ZoneSetLoop +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else TOMP(end target teams distribute) - TOMP(target exit data map(release: numAngles, angleList, omega, quadwt, fac, quadTauW1, quadTauW2, Starting)) +#endif + + TOMP_MAP(target exit data map(release: numAngles, angleList, omega, quadwt, fac, quadTauW1, quadTauW2, Starting)) deallocate( angleList ) diff --git a/src/teton/gpu/InitSweepGreyUCBxyz_OMPOL.F90 b/src/teton/gpu/InitSweepGreyUCBxyz_OMPOL.F90 index 1ea3e62..13bac3a 100644 --- a/src/teton/gpu/InitSweepGreyUCBxyz_OMPOL.F90 +++ b/src/teton/gpu/InitSweepGreyUCBxyz_OMPOL.F90 @@ -21,7 +21,7 @@ subroutine InitGreySweepUCBxyz_GPU use QuadratureList_mod use GreyAcceleration_mod use AngleSet_mod - use ArrayChecks_mod + use CodeChecks_mod implicit none @@ -105,49 +105,85 @@ subroutine InitGreySweepUCBxyz_GPU enddo enddo - TETON_CHECK_BOUNDS1(Quad%AngSetPtr, numAngles) TETON_CHECK_BOUNDS1(Geom%corner1, nZoneSets) TETON_CHECK_BOUNDS1(Geom%corner2, nZoneSets) + TOMP_MAP(target enter data map(to: numAngles, angleList, omega, quadwt)) - TOMP(target enter data map(to: numAngles, angleList, omega, quadwt)) - +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) +#else TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) - TOMPC(shared(nZoneSets, numAngles, Geom, GTA, omega)&) - TOMPC(private(angle)) + TOMPC(shared(nZoneSets, numAngles, Geom, GTA, omega, angle)) +#endif ZoneSetLoop1: do zSetID=1,nZoneSets do angle=1,numAngles - ! NOTE - This does not support a collapse(2), as it is not a canonical loop form due to the lookup in inner loop bounds. - !$omp parallel do default(none) & +! NOTE: This loop doesn't support a collapse(2), as its not a canonical form +! loop ( the inner loop bounds can not be predetermined ); it's significantly +! faster to split into two loops as below + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) +#else + !$omp parallel do collapse(2) default(none) & !$omp& shared(Geom, GTA, angle, omega, zSetID) +#endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) - do cface=1,Geom% nCFacesArray(c) + do cface=1,3 GTA% AfpNorm(cface,c,angle) = DOT_PRODUCT( omega(:,angle),Geom% A_fp(:,cface,c)) GTA% AezNorm(cface,c,angle) = DOT_PRODUCT( omega(:,angle),Geom% A_ez(:,cface,c)) enddo enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif +#ifdef TETON_ENABLE_OPENACC +!$acc loop vector +#else + !$omp parallel do default(none) & + !$omp& shared(Geom, GTA, angle, omega, zSetID) +#endif + do c=Geom% corner1(zSetID),Geom% corner2(zSetID) + do cface=4,Geom% nCFacesArray(c) + GTA% AfpNorm(cface,c,angle) = DOT_PRODUCT( omega(:,angle),Geom% A_fp(:,cface,c)) + GTA% AezNorm(cface,c,angle) = DOT_PRODUCT( omega(:,angle),Geom% A_ez(:,cface,c)) + enddo + enddo +#ifndef TETON_ENABLE_OPENACC !$omp end parallel do +#endif enddo enddo ZoneSetLoop1 +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else TOMP(end target teams distribute) +#endif +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) +#else TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none)&) - TOMPC(shared(nZoneSets, numAngles, Geom, GTA)&) - TOMPC(private(angle)) + TOMPC(shared(nZoneSets, numAngles, Geom, GTA, angle)) +#endif ZoneSetLoop2: do zSetID=1,nZoneSets do angle=1,numAngles +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector +#else !$omp parallel do default(none) & !$omp& shared(Geom, GTA, angle, zSetID) +#endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) GTA% ANormSum(c,angle) = zero @@ -157,26 +193,43 @@ subroutine InitGreySweepUCBxyz_GPU GTA% AezNorm(cface,c,angle) + abs( GTA% AezNorm(cface,c,angle) ) ) enddo enddo - +#ifndef TETON_ENABLE_OPENACC !$omp end parallel do +#endif enddo enddo ZoneSetLoop2 +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else TOMP(end target teams distribute) +#endif +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(zone,ifp,c0,cez,nCorner,nCFaces,aSetID,angGTA) & + !$acc& private(aez,afp,sigv,sigv2,gnum,gtau,B0,B1,B2,coef,dInv,Sigt,SigtEZ) +#else TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none)&) TOMPC(shared(nZoneSets, Geom, GTA, Quad, numAngles, angleList, quadwt) &) TOMPC(private(zone,ifp,c0,cez,nCorner,nCFaces,aSetID,angGTA) &) TOMPC(private(aez,afp,sigv,sigv2,gnum,gtau,B0,B1,B2,coef,dInv,Sigt,SigtEZ)) +#endif ZoneSetLoop: do zSetID=1,nZoneSets +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector & + !$acc& private(zone,ifp,c0,cez,nCorner,nCFaces,aSetID,angGTA) & + !$acc& private(aez,afp,sigv,sigv2,gnum,gtau,B0,B1,B2,coef,dInv,Sigt,SigtEZ) +#else !$omp parallel do default(none) & !$omp& shared(Geom, GTA, Quad, numAngles, angleList, quadwt, zSetID) & !$omp& private(zone,ifp,c0,cez,nCorner,nCFaces,aSetID,angGTA) & - !$omp& private(aez,afp,sigv,sigv2,gnum,gtau,B0,B1,B2,coef,dInv,Sigt,SigtEZ) + !$omp& private(aez,afp,sigv,sigv2,gnum,gtau,B0,B1,B2,coef,dInv,Sigt,SigtEZ, Angle, c1, c, cface, i) +#endif ZoneLoop: do zone=Geom% zone1(zSetID),Geom% zone2(zSetID) @@ -301,13 +354,19 @@ subroutine InitGreySweepUCBxyz_GPU enddo AngleLoop enddo ZoneLoop - +#ifndef TETON_ENABLE_OPENACC !$omp end parallel do +#endif enddo ZoneSetLoop +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else TOMP(end target teams distribute) - TOMP(target exit data map(release: numAngles, angleList, omega, quadwt)) +#endif + + TOMP_MAP(target exit data map(release: numAngles, angleList, omega, quadwt)) deallocate( angleList ) diff --git a/src/teton/gpu/OMPWrappers.F90.templates b/src/teton/gpu/OMPWrappers.F90.templates index b1806da..ac51332 100644 --- a/src/teton/gpu/OMPWrappers.F90.templates +++ b/src/teton/gpu/OMPWrappers.F90.templates @@ -106,9 +106,8 @@ TETON_VERIFY( c_associated(d_ptr), "Failed to retrieve device pointer for " // label ) - ! Disassociate the pointer -! Workaround for El Capitan EA platform. Otherwise, the runtime complains about an invalid reference count in the pointer table. -! (Kostas has put in a bug ticket to compiler team already) + ! Disassociate the pointer: adding this back so I can use CCE15 PFN 02/07/2024 + TOMP(target exit data map(release:h_ptr)) err_code = omp_target_disassociate_ptr( C_LOC(h_ptr), omp_get_default_device() ) diff --git a/src/teton/gpu/SetSweep_OMPOL.F90 b/src/teton/gpu/SetSweep_OMPOL.F90 index 735a160..a155291 100644 --- a/src/teton/gpu/SetSweep_OMPOL.F90 +++ b/src/teton/gpu/SetSweep_OMPOL.F90 @@ -82,7 +82,8 @@ subroutine SetSweep_GPU(savePsi) nCommSets = getNumberOfCommSets(Quad) ndim = Size% ndim SnSweep = .TRUE. - sweepVersion = Options%getSweepVersion() + sweepVersion = Options%getSweepVersion() + #if !defined(TETON_ENABLE_MINIAPP_BUILD) useBoltzmannCompton = getUseBoltzmann(Compton) #endif @@ -95,7 +96,7 @@ subroutine SetSweep_GPU(savePsi) if ( useBoltzmannCompton .and. Size% useCUDASolver .and. Size% ngr >= 16) then do setID=1,nGroupSets GSet => getGroupSetData(Quad, setID) - TOMP(target update to (GSet% STotal)) + TOMP_UPDATE(target update to (GSet% STotal)) enddo endif #endif @@ -108,6 +109,7 @@ subroutine SetSweep_GPU(savePsi) ! If this is the first flux iteration, initialize the communication ! order and incident flux on shared boundaries + START_RANGE("Teton_Init_Comm_Order") do cSetID=1,nCommSets CSet => getCommSetData(Quad, cSetID) @@ -115,6 +117,7 @@ subroutine SetSweep_GPU(savePsi) call restoreCommOrder(CSet) call setIncidentFlux(cSetID) enddo + END_RANGE("Teton_Init_Comm_Order") ! Begin Flux Iteration @@ -127,17 +130,21 @@ subroutine SetSweep_GPU(savePsi) ! Post receives for all data + START_RANGE("Teton_Comm_Boundary_Fluxes") do cSetID=1,nCommSets call InitExchange(cSetID) enddo + END_RANGE("Teton_Comm_Boundary_Fluxes") ! Loop over angles, solving for each in turn: AngleLoop: do sendIndex=1,NumAnglesDyn -!$omp parallel do default(none) schedule(dynamic) & -!$omp& shared(nCommSets,Quad,SnSweep, sendIndex) & -!$omp& private(CSet,Angle) + START_RANGE("Teton_Comm_Boundary_Fluxes") + +!!$omp parallel do default(none) schedule(dynamic) & +!!$omp& shared(nCommSets,Quad,SnSweep, sendIndex) & +!!$omp& private(CSet,Angle) do cSetID=1,nCommSets CSet => getCommSetData(Quad, cSetID) @@ -153,7 +160,8 @@ subroutine SetSweep_GPU(savePsi) call RecvFlux(SnSweep, cSetID, Angle) enddo -!$omp end parallel do +!!$omp end parallel do + END_RANGE("Teton_Comm_Boundary_Fluxes") do setID=1,nSets @@ -162,11 +170,15 @@ subroutine SetSweep_GPU(savePsi) ! Update incident fluxes on reflecting boundaries + START_RANGE("Teton_Update_Reflecting_Fluxes") call snreflect(SnSweep, setID, Angle) + END_RANGE("Teton_Update_Reflecting_Fluxes") ! Map the latest boundary values - TOMP(target update to( Set%PsiB(:,:,Angle) ) ) + START_RANGE("Teton_OpenMP_Updates") + TOMP_UPDATE(target update to( Set%PsiB(:,:,Angle) ) ) + END_RANGE("Teton_OpenMP_Updates") enddo @@ -178,25 +190,35 @@ subroutine SetSweep_GPU(savePsi) AngleType: if ( .not. ASet% FinishingDirection(Angle) ) then time1 = MPIWtime() - START_RANGE("Teton_Sweep_GPU") + START_RANGE("Teton_Calc_Rad_Energy_Density") + + if (sweepVersion == 0) then - if (ndim == 3) then - if (sweepVersion == 0) then + if (ndim == 3) then call SweepUCBxyz_GPU(nSets, sendIndex, savePsi) - elseif (sweepVersion == 1) then + elseif (ndim == 2) then + call SweepUCBrz_GPU(nSets, sendIndex, savePsi) + endif + + elseif (sweepVersion == 1) then + + if (ndim == 3) then call CornerSweepUCBxyz_GPU(nSets, sendIndex, savePsi) - else - TETON_FATAL("Invalid value set for Sweep kernel version to use.") + elseif (ndim == 2) then + call CornerSweepUCBrz_GPU(nSets, sendIndex, savePsi) endif - elseif (ndim == 2) then - call SweepUCBrz_GPU(nSets, sendIndex, savePsi) + + else + TETON_FATAL("Invalid value set for Sweep kernel version to use.") endif + END_RANGE("Teton_Calc_Rad_Energy_Density") ! Update the total scalar intensity on the GPU + START_RANGE("Teton_Update_Total_Scalar_Intensity") call getPhiTotal(sendIndex) + END_RANGE("Teton_Update_Total_Scalar_Intensity") - END_RANGE("Teton_Sweep_GPU") time2 = MPIWtime() dtime = (time2 - time1)/sixty Size%GPUSweepTimeCycle = Size%GPUSweepTimeCycle + dtime @@ -208,7 +230,9 @@ subroutine SetSweep_GPU(savePsi) Set => getSetData(Quad, setID) Angle = Set% AngleOrder(sendIndex) - TOMP(target update from( Set%PsiB(:,:,Angle) )) + START_RANGE("Teton_OpenMP_Updates") + TOMP_UPDATE(target update from( Set%PsiB(:,:,Angle) )) + END_RANGE("Teton_OpenMP_Updates") enddo @@ -216,6 +240,7 @@ subroutine SetSweep_GPU(savePsi) ! Test convergence of incident fluxes + START_RANGE("Teton_Sweep_Test_Conv") !$omp parallel do default(none) schedule(static) & !$omp& shared(nCommSets, FluxConverged) do cSetID=1,nCommSets @@ -230,6 +255,7 @@ subroutine SetSweep_GPU(savePsi) enddo !$omp end parallel do + END_RANGE("Teton_Sweep_Test_Conv") ! If this is the end of the radiation step and we are saving Psi do ! not perform additional sweeps diff --git a/src/teton/gpu/SweepGreyUCBrz_OMPOL.F90 b/src/teton/gpu/SweepGreyUCBrz_OMPOL.F90 index 0776a09..028f287 100644 --- a/src/teton/gpu/SweepGreyUCBrz_OMPOL.F90 +++ b/src/teton/gpu/SweepGreyUCBrz_OMPOL.F90 @@ -22,14 +22,14 @@ subroutine SweepGreyUCBrzNEW_GPU(sendIndex, PsiB) use GreyAcceleration_mod use SetData_mod use AngleSet_mod - use ArrayChecks_mod + use CodeChecks_mod implicit none ! Arguments integer, intent(in) :: sendIndex - real(adqt), intent(inout) :: PsiB(Size%nbelem,Size%nangGTA) + real(adqt), intent(inout) :: PsiB(Size%nSurfElem,Size%nangGTA) ! Local @@ -45,7 +45,6 @@ subroutine SweepGreyUCBrzNEW_GPU(sendIndex, PsiB) integer :: nGTASets integer :: nZoneSets integer :: nHyperDomains - integer :: iter integer :: c integer :: c0 @@ -94,7 +93,7 @@ subroutine SweepGreyUCBrzNEW_GPU(sendIndex, PsiB) nSets = getNumberOfSets(Quad) nGTASets = getNumberOfGTASets(Quad) nZoneSets = getNumberOfZoneSets(Quad) - nHyperDomains = Quad% nHyperDomains + nHyperDomains = getNumberOfHyperDomains(Quad,2) ! Quiet the compiler 'variable may not be initialized' warning. FinishingDirection = .FALSE. @@ -141,282 +140,404 @@ subroutine SweepGreyUCBrzNEW_GPU(sendIndex, PsiB) TETON_CHECK_BOUNDS1(Geom%corner1, nZoneSets) TETON_CHECK_BOUNDS1(Geom%corner2, nZoneSets) - TOMP(target enter data map(to: angleList, nSets, nGTASets, nHyperDomains, PsiB)) + TOMP_MAP(target enter data map(to: angleList, nSets, nGTASets, nHyperDomains, PsiB)) +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(Set) +#else TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) TOMPC(shared(nZoneSets,nGTASets,Geom,nSets,Quad )&) TOMPC(private(Set)) +#endif ZoneSetLoop0: do zSetID=1,nZoneSets do setID=1,nGTASets Set => Quad% SetDataPtr(nSets+setID) +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector +#else !$omp parallel do default(none) & !$omp& shared(Set, Geom, zSetID) +#endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) Set% tPsi(c) = zero enddo +#ifndef TETON_ENABLE_OPENACC !$omp end parallel do +#endif enddo enddo ZoneSetLoop0 +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else TOMP(end target teams distribute) +#endif +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nGTASets) vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, HypPlanePtr, angle0, angle, c) +#else TOMP(target teams distribute num_teams(nGTASets) thread_limit(omp_device_team_thread_limit) default(none) &) TOMPC(shared(nGTASets, nSets,PsiB, Quad, angleList)&) - TOMPC(private(Set, angle0, angle)) + TOMPC(private(Set, ASet, HypPlanePtr, angle0, angle, c)) +#endif GTASetLoop0: do setID=1,nGTASets - Set => Quad% SetDataPtr(nSets+setID) - angle = angleList(1,setID) - angle0 = angleList(2,setID) + Set => Quad% SetDataPtr(nSets+setID) + ASet => Quad% AngSetPtr(Set% angleSetID) + angle = angleList(1,setID) + angle0 = angleList(2,setID) + HypPlanePtr => ASet% HypPlanePtr(angle) +! Initialize Boundary Values + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector +#else !$omp parallel do default(none) & !$omp& shared(Set, PsiB, angle0, angle) +#endif do b=1,Set%nbelem Set% tPsi(Set%nCorner+b) = PsiB(b,angle0+angle) enddo +#ifndef TETON_ENABLE_OPENACC !$omp end parallel do +#endif - enddo GTASetLoop0 - - TOMP(end target teams distribute) - +! Initialize values at hyper-domain interfaces - SweepIteration: do iter=1,GTA% nGreySweepIters +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector & + !$acc& private(c) +#else + !$omp parallel do default(none) & + !$omp& shared(Set, HypPlanePtr, PsiB, angle0, angle) private(c) +#endif + do b=1,HypPlanePtr% interfaceLen + c = HypPlanePtr% interfaceList(b) + Set% tPsi(c) = PsiB(Set%nbelem+b,angle0+angle) + enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif - TOMP(target teams distribute collapse(2) num_teams(nHyperDomains*nGTASets) &) - TOMPC(thread_limit(omp_device_team_thread_limit) default(none) &) - TOMPC(shared(nGTASets, nHyperDomains, angleList, Geom, GTA, Quad, nSets)&) - TOMPC(private(Set, ASet, HypPlanePtr, setID, domID, angle, angle0, angle2, hyperPlane, hplane1, hplane2) &) - TOMPC(private(ndoneZ, nzones, fac, zone, nCorner, c0, cez, cfp, aez, afp, R, R_afp, sigA, sigA2, gnum) &) - TOMPC(private(gtau, sez, dInv)) + enddo GTASetLoop0 - GTASetLoop: do setID=1,nGTASets - DomainLoop: do domID=1,nHyperDomains +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif + + +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang collapse(2) num_gangs(nHyperDomains*nGTASets) & + !$acc& vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, HypPlanePtr, setID, domID, angle, angle0, angle2, hyperPlane) & + !$acc& private(hplane1, hplane2, ndoneZ, nzones, fac, zone, nCorner, c0, cez, cfp) & + !$acc& private(aez, afp, R, R_afp, sigA, sigA2, gnum, gtau, sez, dInv) +#else + TOMP(target teams distribute collapse(2) num_teams(nHyperDomains*nGTASets) &) + TOMPC(thread_limit(omp_device_team_thread_limit) default(none) &) + TOMPC(shared(nGTASets, nHyperDomains, angleList, Geom, GTA, Quad, nSets)&) + TOMPC(private(Set, ASet, HypPlanePtr, setID, domID, angle, angle0, angle2, hyperPlane) &) + TOMPC(private(hplane1, hplane2, ndoneZ, nzones, fac, zone, nCorner, c0, cez, cfp) &) + TOMPC(private(aez, afp, R, R_afp, sigA, sigA2, gnum, gtau, sez, dInv)) +#endif + + GTASetLoop: do setID=1,nGTASets + DomainLoop: do domID=1,nHyperDomains + + Set => Quad% SetDataPtr(nSets+setID) + ASet => Quad% AngSetPtr(Set% angleSetID) + angle = angleList(1,setID) + angle0 = angleList(2,setID) + angle2 = angleList(3,setID) + + HypPlanePtr => ASet% HypPlanePtr(angle) + hplane1 = HypPlanePtr% hplane1(domID) + hplane2 = HypPlanePtr% hplane2(domID) + fac = ASet% angDerivFac(angle) + ndoneZ = HypPlanePtr% ndone(domID) + +! Loop through all of the zones using the NEXT list + + HyperPlaneLoop: do hyperPlane=hplane1,hplane2 + + nzones = HypPlanePtr% zonesInPlane(hyperPlane) + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector & + !$acc& private(zone, nCorner, c, c0, cez, cfp, aez, afp) & + !$acc& private(R, R_afp, sigA, sigA2, gnum, gtau, sez, dInv) +#else + !$omp parallel do default(none) schedule(static) & + !$omp& shared(Geom, GTA, ASet, Set, Quad) & + !$omp& shared(ndoneZ, nzones, angle, angle2, fac) & + !$omp& private(zone, nCorner, c, c0, cez, cfp, aez, afp) & + !$omp& private(R, R_afp, sigA, sigA2, gnum, gtau, sez, dInv) +#endif + + ZoneLoop: do ii=1,nzones + + zone = iabs( ASet% nextZ(ndoneZ+ii,angle) ) + nCorner = Geom% numCorner(zone) + c0 = Geom% cOffSet(zone) + +! Contributions from volume terms and angular derivative + + do c=1,nCorner + Set% src(c0+c) = GTA%TsaSource(c0+c) + & + fac*Geom% Area(c0+c)*Set% tPsiM(c0+c) + Set% pInc(c0+c) = fac*Geom% Area(c0+c)*Set% tInc(c0+c) + enddo - Set => Quad% SetDataPtr(nSets+setID) - ASet => Quad% AngSetPtr(Set% angleSetID) - angle = angleList(1,setID) - angle0 = angleList(2,setID) - angle2 = angleList(3,setID) + CornerLoop: do c=1,nCorner - HypPlanePtr => ASet% HypPlanePtr(angle) - hplane1 = HypPlanePtr% hplane1(domID) - hplane2 = HypPlanePtr% hplane2(domID) - fac = ASet% angDerivFac(angle) - ndoneZ = HypPlanePtr% ndone(domID) + sigA = GTA%GreySigTotal(c0+c)*Geom% Area(c0+c) -! Loop through all of the zones using the NEXT list + CornerFaceLoop: do cface=1,2 - HyperPlaneLoop: do hyperPlane=hplane1,hplane2 + afp = GTA% AfpNorm(cface,c0+c,angle2) + aez = GTA% AezNorm(cface,c0+c,angle2) - nzones = HypPlanePtr% zonesInPlane(hyperPlane) + if ( afp < zero ) then + cfp = Geom% cFP(cface,c0+c) + R_afp = Geom% RadiusFP(cface,c0+c)*afp + Set% src(c0+c) = Set% src(c0+c) - R_afp*Set% tPsi(cfp) + Set% pInc(c0+c) = Set% pInc(c0+c) - R_afp*Set% tPsi(cfp) + else + cfp = -1 + endif - !$omp parallel do default(none) schedule(static) & - !$omp& shared(Geom, GTA, ASet, Set, Quad) & - !$omp& shared(ndoneZ, nzones, angle, angle2, fac) & - !$omp& private(zone, nCorner, c, c0) & - !$omp& private(cez, cfp, aez, afp, R, R_afp) & - !$omp& private(sigA, sigA2, gnum, gtau, sez, dInv) + if ( aez > zero ) then - ZoneLoop: do ii=1,nzones + R = Geom% RadiusEZ(cface,c0+c) + cez = Geom% cEZ(cface,c0+c) - zone = iabs( ASet% nextZ(ndoneZ+ii,angle) ) - nCorner = Geom% numCorner(zone) - c0 = Geom% cOffSet(zone) + if ( afp < zero ) then -! Contributions from volume terms and angular derivative + sigA2 = sigA*sigA - do c=1,nCorner - Set% src(c0+c) = GTA%TsaSource(c0+c) + & - fac*Geom% Area(c0+c)*Set% tPsiM(c0+c) - Set% pInc(c0+c) = fac*Geom% Area(c0+c)*Set% tInc(c0+c) - enddo + gnum = aez*aez*( fouralpha*sigA2 + & + aez*(four*sigA + three*aez) ) - CornerLoop: do c=1,nCorner + gtau = gnum/ & + ( gnum + four*sigA2*sigA2 + aez*sigA*(six*sigA2 + & + two*aez*(two*sigA + aez)) ) - sigA = GTA%GreySigTotal(c0+c)*Geom% Area(c0+c) + sez = R*( gtau*sigA*( Set% tPsi(cfp) - GTA% Q(c0+c) ) + & + half*aez*(one - gtau)*( GTA% Q(c0+c) - GTA% Q(c0+cez) ) ) - CornerFaceLoop: do cface=1,2 + Set% src(c0+c) = Set% src(c0+c) + sez + Set% src(c0+cez) = Set% src(c0+cez) - sez - afp = GTA% AfpNorm(cface,c0+c,angle2) - aez = GTA% AezNorm(cface,c0+c,angle2) + Set% pInc(c0+c) = Set% pInc(c0+c) + R*gtau*sigA*Set% tPsi(cfp) + Set% pInc(c0+cez) = Set% pInc(c0+cez) - R*gtau*sigA*Set% tPsi(cfp) - if ( afp < zero ) then - cfp = Geom% cFP(cface,c0+c) - R_afp = Geom% RadiusFP(cface,c0+c)*afp - Set% src(c0+c) = Set% src(c0+c) - R_afp*Set% tPsi(cfp) - Set% pInc(c0+c) = Set% pInc(c0+c) - R_afp*Set% tPsi(cfp) - else - cfp = -1 - endif + else - if ( aez > zero ) then + sez = half*R*aez*( GTA% Q(c0+c) - GTA% Q(c0+cez) ) + Set% src(c0+c) = Set% src(c0+c) + sez + Set% src(c0+cez) = Set% src(c0+cez) - sez - R = Geom% RadiusEZ(cface,c0+c) - cez = Geom% cEZ(cface,c0+c) + endif - if ( afp < zero ) then + endif - sigA2 = sigA*sigA + enddo CornerFaceLoop - gnum = aez*aez*( fouralpha*sigA2 + & - aez*(four*sigA + three*aez) ) + enddo CornerLoop - gtau = gnum/ & - ( gnum + four*sigA2*sigA2 + aez*sigA*(six*sigA2 + & - two*aez*(two*sigA + aez)) ) +! Solve the corners in the correct order - sez = R*( gtau*sigA*( Set% tPsi(cfp) - GTA% Q(c0+c) ) + & - half*aez*(one - gtau)*( GTA% Q(c0+c) - GTA% Q(c0+cez) ) ) + do i=1,nCorner - Set% src(c0+c) = Set% src(c0+c) + sez - Set% src(c0+cez) = Set% src(c0+cez) - sez + c = ASet% nextC(c0+i,angle) - Set% pInc(c0+c) = Set% pInc(c0+c) + R*gtau*sigA*Set% tPsi(cfp) - Set% pInc(c0+cez) = Set% pInc(c0+cez) - R*gtau*sigA*Set% tPsi(cfp) + dInv = one/(GTA% ANormSum(c0+c,angle2) + & + Geom% Volume(c0+c)*GTA%GreySigTotal(c0+c)) - else +! Corner angular flux + Set% tPsi(c0+c) = dInv*Set% src(c0+c) + Set% pInc(c0+c) = dInv*Set% pInc(c0+c) - sez = half*R*aez*( GTA% Q(c0+c) - GTA% Q(c0+cez) ) - Set% src(c0+c) = Set% src(c0+c) + sez - Set% src(c0+cez) = Set% src(c0+cez) - sez +! Calculate the contribution of this flux to the sources of +! downstream corners in this zone. - endif + do cface=1,2 + aez = GTA% AezNorm(cface,c0+c,angle2) - endif + if (aez > zero) then + R = Geom% RadiusEZ(cface,c0+c) + cez = Geom% cEZ(cface,c0+c) - enddo CornerFaceLoop + Set% src(c0+cez) = Set% src(c0+cez) + R*aez*Set% tPsi(c0+c) + Set% pInc(c0+cez) = Set% pInc(c0+cez) + R*aez*Set% pInc(c0+c) + endif + enddo - enddo CornerLoop + enddo -! Solve the corners in the correct order + enddo ZoneLoop +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif - do i=1,nCorner + ndoneZ = ndoneZ + nzones - c = ASet% nextC(c0+i,angle) + enddo HyperPlaneLoop - dInv = one/(GTA% ANormSum(c0+c,angle2) + & - Geom% Volume(c0+c)*GTA%GreySigTotal(c0+c)) + enddo DomainLoop + enddo GTASetLoop -! Corner angular flux - Set% tPsi(c0+c) = dInv*Set% src(c0+c) - Set% pInc(c0+c) = dInv*Set% pInc(c0+c) +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif -! Calculate the contribution of this flux to the sources of -! downstream corners in this zone. +! Update exiting boundary fluxes - do cface=1,2 - aez = GTA% AezNorm(cface,c0+c,angle2) +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nGTASets) vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, BdyExitPtr, HypPlanePtr, angle0, angle, b, c) +#else + TOMP(target teams distribute num_teams(nGTASets) thread_limit(omp_device_team_thread_limit) default(none) &) + TOMPC(shared(nGTASets, PsiB, Quad, angleList, nSets)&) + TOMPC(private(Set, ASet, BdyExitPtr, HypPlanePtr, angle0, angle, b, c)) +#endif - if (aez > zero) then - R = Geom% RadiusEZ(cface,c0+c) - cez = Geom% cEZ(cface,c0+c) + do setID=1,nGTASets - Set% src(c0+cez) = Set% src(c0+cez) + R*aez*Set% tPsi(c0+c) - Set% pInc(c0+cez) = Set% pInc(c0+cez) + R*aez*Set% pInc(c0+c) - endif - enddo + Set => Quad% SetDataPtr(nSets+setID) + ASet => Quad% AngSetPtr(Set% angleSetID) + angle = angleList(1,setID) + angle0 = angleList(2,setID) + BdyExitPtr => ASet% BdyExitPtr(angle) + HypPlanePtr => ASet% HypPlanePtr(angle) + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector & + !$acc& private(b,c) +#else + !$omp parallel do default(none) & + !$omp& shared(Set, BdyExitPtr, angle0, angle, PsiB) & + !$omp& private(b,c) +#endif - enddo + do i=1,BdyExitPtr% nxBdy + b = BdyExitPtr% bdyList(1,i) + c = BdyExitPtr% bdyList(2,i) - enddo ZoneLoop - !$omp end parallel do + PsiB(b,angle0+angle) = Set% tPsi(c) + enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif - ndoneZ = ndoneZ + nzones +! Update Interface values - enddo HyperPlaneLoop +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector & + !$acc& private(c) +#else + !$omp parallel do default(none) & + !$omp& shared(Set, HypPlanePtr, PsiB, angle0, angle) private(c) +#endif + do b=1,HypPlanePtr% interfaceLen + c = HypPlanePtr% interfaceList(b) + PsiB(Set%nbelem+b,angle0+angle) = Set% tPsi(c) + enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif - enddo DomainLoop - enddo GTASetLoop + enddo - TOMP(end target teams distribute) +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif - enddo SweepIteration -! Update exiting boundary fluxes +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, angle, quadwt, quadTauW1, quadTauW2) +#else + TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none)&) + TOMPC(shared(nZoneSets, nGTASets, Geom, GTA, Quad, angleList, nSets)&) + TOMPC(private(Set, ASet, angle, quadwt, quadTauW1, quadTauW2)) +#endif - TOMP(target teams distribute num_teams(nGTASets) thread_limit(omp_device_team_thread_limit) default(none) &) - TOMPC(shared(nGTASets, PsiB, Quad, angleList, nSets)&) - TOMPC(private(Set, ASet, BdyExitPtr, angle0, angle, b, c)) + ZoneSetLoop1: do zSetID=1,nZoneSets do setID=1,nGTASets - Set => Quad% SetDataPtr(nSets+setID) - ASet => Quad% AngSetPtr(Set% angleSetID) - angle = angleList(1,setID) - angle0 = angleList(2,setID) - BdyExitPtr => ASet% BdyExitPtr(angle) - - !$omp parallel do default(none) & - !$omp& shared(Set, BdyExitPtr, angle0, angle, PsiB) & - !$omp& private(b,c) - - do i=1,BdyExitPtr% nxBdy - b = BdyExitPtr% bdyList(1,i) - c = BdyExitPtr% bdyList(2,i) - - PsiB(b,angle0+angle) = Set% tPsi(c) - enddo - - !$omp end parallel do + Set => Quad% SetDataPtr(nSets+setID) + ASet => Quad% AngSetPtr(Set% angleSetID) + + angle = angleList(1,setID) + quadwt = ASet% weight(angle) + quadTauW1 = ASet% quadTauW1(angle) + quadTauW2 = ASet% quadTauW2(angle) + + if ( ASet% StartingDirection(angle) ) then + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector +#else + !$omp parallel do default(none) & + !$omp& shared(Geom,Set,zSetID) +#endif + do c=Geom% corner1(zSetID),Geom% corner2(zSetID) + Set% tPsiM(c) = Set% tPsi(c) + Set% tInc(c) = Set% pInc(c) + enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + + else + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector +#else + !$omp parallel do default(none) & + !$omp& shared(Geom, Set, GTA, quadwt, quadTauW1, quadTauW2, zSetID) +#endif + do c=Geom% corner1(zSetID),Geom% corner2(zSetID) + Set% tPsiM(c) = quadTauW1*Set% tPsi(c) - quadTauW2*Set% tPsiM(c) + Set% tInc(c) = quadTauW1*Set% pInc(c) - quadTauW2*Set% tInc(c) + GTA% PhiInc(c) = GTA% PhiInc(c) + quadwt*Set% pInc(c) + enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + + endif enddo - TOMP(end target teams distribute) - - - TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none)&) - TOMPC(shared(nZoneSets, nGTASets, Geom, GTA, Quad, angleList, nSets)&) - TOMPC(private(Set, ASet, angle, quadwt, quadTauW1, quadTauW2)) - - ZoneSetLoop1: do zSetID=1,nZoneSets - - do setID=1,nGTASets - Set => Quad% SetDataPtr(nSets+setID) - ASet => Quad% AngSetPtr(Set% angleSetID) + enddo ZoneSetLoop1 - angle = angleList(1,setID) - quadwt = ASet% weight(angle) - quadTauW1 = ASet% quadTauW1(angle) - quadTauW2 = ASet% quadTauW2(angle) - - if ( ASet% StartingDirection(angle) ) then - - !$omp parallel do default(none) & - !$omp& shared(Geom,Set,zSetID) - do c=Geom% corner1(zSetID),Geom% corner2(zSetID) - Set% tPsiM(c) = Set% tPsi(c) - Set% tInc(c) = Set% pInc(c) - enddo - !$omp end parallel do - - else - - !$omp parallel do default(none) & - !$omp& shared(Geom, Set, GTA, quadwt, quadTauW1, quadTauW2, zSetID) - do c=Geom% corner1(zSetID),Geom% corner2(zSetID) - Set% tPsiM(c) = quadTauW1*Set% tPsi(c) - quadTauW2*Set% tPsiM(c) - Set% tInc(c) = quadTauW1*Set% pInc(c) - quadTauW2*Set% tInc(c) - GTA% PhiInc(c) = GTA% PhiInc(c) + quadwt*Set% pInc(c) - enddo - !$omp end parallel do - - endif - - enddo - - enddo ZoneSetLoop1 +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif - TOMP(end target teams distribute) - TOMP(target exit data map(release: angleList, nSets, nGTASets, nHyperDomains)) - TOMP(target exit data map(from: PsiB)) + TOMP_MAP(target exit data map(release: angleList, nSets, nGTASets, nHyperDomains)) + TOMP_MAP(target exit data map(from: PsiB)) deallocate( angleList ) diff --git a/src/teton/gpu/SweepGreyUCBxyz_OMPOL.F90 b/src/teton/gpu/SweepGreyUCBxyz_OMPOL.F90 index d08fda4..52e3a86 100644 --- a/src/teton/gpu/SweepGreyUCBxyz_OMPOL.F90 +++ b/src/teton/gpu/SweepGreyUCBxyz_OMPOL.F90 @@ -22,14 +22,14 @@ subroutine SweepGreyUCBxyzNEW_GPU(sendIndex, PsiB) use SetData_mod use AngleSet_mod use GreyAcceleration_mod - use ArrayChecks_mod + use CodeChecks_mod implicit none ! Arguments - integer, intent(in) :: sendIndex - real(adqt), intent(inout) :: PsiB(Size%nbelem,Size%nangGTA) + integer, intent(in) :: sendIndex + real(adqt), intent(inout) :: PsiB(Size% nSurfElem,Size%nangGTA) ! Local @@ -44,7 +44,6 @@ subroutine SweepGreyUCBxyzNEW_GPU(sendIndex, PsiB) integer :: nGTASets integer :: nZoneSets integer :: nHyperDomains - integer :: iter integer :: angle0 integer :: angle @@ -96,14 +95,15 @@ subroutine SweepGreyUCBxyzNEW_GPU(sendIndex, PsiB) nSets = getNumberOfSets(Quad) nGTASets = getNumberOfGTASets(Quad) nZoneSets = getNumberOfZoneSets(Quad) - nHyperDomains = Quad% nHyperDomains + nHyperDomains = getNumberOfHyperDomains(Quad,2) + allocate( angleList(2,nGTASets) ) do setID=1,nGTASets - Set => Quad% SetDataPtr(nSets+setID) - angleList(1,setID) = Set% AngleOrder(sendIndex) - angleList(2,setID) = Set% angle0 + Set => Quad% SetDataPtr(nSets+setID) + angleList(1,setID) = Set% AngleOrder(sendIndex) + angleList(2,setID) = Set% angle0 enddo ! Verify we won't get out-of-bounds accesses below. @@ -111,61 +111,123 @@ subroutine SweepGreyUCBxyzNEW_GPU(sendIndex, PsiB) TETON_CHECK_BOUNDS1(Geom%corner1, nZoneSets) TETON_CHECK_BOUNDS1(Geom%corner2, nZoneSets) - TOMP(target enter data map(to: angleList, nSets, nGTASets, nHyperDomains, PsiB)) + + TOMP_MAP(target enter data map(to: angleList, nSets, nGTASets, nHyperDomains)) + TOMP_MAP(target enter data map(to: PsiB)) + + +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(Set) +#else TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) TOMPC(shared(nZoneSets, nGTASets, Geom, Quad, nSets)&) - TOMPC(private(Set)) + TOMPC(private(Set, setID)) +#endif ZoneSetLoop0: do zSetID=1,nZoneSets do setID=1,nGTASets Set => Quad% SetDataPtr(nSets+setID) +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector +#else !$omp parallel do default(none) & !$omp& shared(Set, Geom, zSetID) +#endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) Set% tPsi(c) = zero enddo +#ifndef TETON_ENABLE_OPENACC !$omp end parallel do +#endif enddo enddo ZoneSetLoop0 +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else TOMP(end target teams distribute) +#endif +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nGTASets) vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, HypPlanePtr, angle0, angle, c) +#else TOMP(target teams distribute num_teams(nGTASets) thread_limit(omp_device_team_thread_limit) default(none)&) TOMPC(shared(nGTASets, PsiB, Quad, nSets, angleList)&) - TOMPC(private(Set, angle0, angle)) + TOMPC(private(Set, ASet, HypPlanePtr, angle0, angle, c)) +#endif GTASetLoop0: do setID=1,nGTASets - Set => Quad% SetDataPtr(nSets+setID) - angle = angleList(1,setID) - angle0 = angleList(2,setID) + Set => Quad% SetDataPtr(nSets+setID) + ASet => Quad% AngSetPtr(Set% angleSetID) + angle = angleList(1,setID) + angle0 = angleList(2,setID) + HypPlanePtr => ASet% HypPlanePtr(angle) ! Initialize Boundary Values +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector +#else !$omp parallel do default(none) & !$omp& shared(Set, PsiB, angle0, angle) +#endif do b=1,Set%nbelem Set% tPsi(Set%nCorner+b) = PsiB(b,angle0+angle) enddo +#ifndef TETON_ENABLE_OPENACC !$omp end parallel do +#endif - enddo GTASetLoop0 - - TOMP(end target teams distribute) +! Initialize values at hyper-domain interfaces +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector & + !$acc& private(c) +#else + !$omp parallel do default(none) & + !$omp& shared(Set, HypPlanePtr, PsiB, angle0, angle) private(c) +#endif + do b=1,HypPlanePtr% interfaceLen + c = HypPlanePtr% interfaceList(b) + Set% tPsi(c) = PsiB(Set%nbelem+b,angle0+angle) + enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif - SweepIteration: do iter=1,GTA% nGreySweepIters + enddo GTASetLoop0 - TOMP(target teams distribute collapse(2) num_teams(nHyperDomains*nGTASets) &) - TOMPC(thread_limit(omp_device_team_thread_limit) default(none) &) - TOMPC(shared(nGTASets, nHyperDomains, Geom, GTA, Quad, nSets, angleList)&) - TOMPC(private(Set, ASet, HypPlanePtr, angle0, angle, angGTA, hyperPlane, hplane1, hplane2, ndoneZ, nzones, zone, nCorner) &) - TOMPC(private(c0, ifp, cez, cfp, nCFaces, aez, area_opp, sigv, sigv2, gnum, gtau, sez, psi_opp, denom, afp)) +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif + + +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang collapse(2) num_gangs(nHyperDomains*nGTASets) & + !$acc& vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, HypPlanePtr, angle0, angle, angGTA, hyperPlane) & + !$acc& private(hplane1, hplane2, ndoneZ, nzones, zone, nCorner, c0, ifp) & + !$acc& private(cez, cfp, nCFaces, aez, area_opp, sigv, sigv2, gnum, gtau) & + !$acc& private(sez, psi_opp, denom, afp) +#else + TOMP(target teams distribute collapse(2) num_teams(nHyperDomains*nGTASets) &) + TOMPC(thread_limit(omp_device_team_thread_limit) default(none) &) + TOMPC(shared(nGTASets, nHyperDomains, Geom, GTA, Quad, nSets, angleList)&) + TOMPC(private(Set, ASet, HypPlanePtr, angle0, angle, angGTA, hyperPlane) &) + TOMPC(private(hplane1, hplane2, ndoneZ, nzones, zone, nCorner, c0, ifp) &) + TOMPC(private(cez, cfp, nCFaces, aez, area_opp, sigv, sigv2, gnum, gtau) &) + TOMPC(private(sez, psi_opp, denom, afp)) +#endif GTASetLoop: do setID=1,nGTASets DomainLoop: do domID=1,nHyperDomains @@ -187,34 +249,78 @@ subroutine SweepGreyUCBxyzNEW_GPU(sendIndex, PsiB) nzones = HypPlanePtr% zonesInPlane(hyperPlane) -!$omp parallel do default(none) schedule(static) & -!$omp& shared(Geom, GTA, ASet, Set, ndoneZ, nzones, angle, angGTA) & -!$omp& private(zone, nCorner, c0, ifp, cez, cfp, nCFaces, aez, area_opp, sigv, sigv2, gnum, gtau, sez, psi_opp) & -!$omp& private(denom, afp) +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) & + !$acc& private(zone, nCorner, c0, cfp, nCFaces, afp) +#else + !$omp parallel do default(none) collapse(2) schedule(static) & + !$omp& shared(Geom, GTA, ASet, Set, ndoneZ, nzones, angle, angGTA) & + !$omp& private(zone, nCorner, c0, cfp, nCFaces, afp, cface) +#endif - ZoneLoop: do ii=1,nzones + ZoneLoop1: do ii=1,nzones + CornerLoop1: do c=1,8 - zone = iabs( ASet% nextZ(ndoneZ+ii,angle) ) - nCorner = Geom% numCorner(zone) - c0 = Geom% cOffSet(zone) + zone = iabs( ASet% nextZ(ndoneZ+ii,angle) ) + c0 = Geom% cOffSet(zone) ! Contributions from volume terms - do c=1,nCorner Set% src(c0+c) = GTA%TsaSource(c0+c) Set% pInc(c0+c) = zero - enddo - CornerLoop: do c=1,nCorner +! Contributions from external corner faces (FP faces) - sigv = Geom% Volume(c0+c)*GTA%GreySigTotal(c0+c) nCFaces = Geom% nCFacesArray(c0+c) + do cface=1,nCFaces + + afp = GTA% AfpNorm(cface,c0+c,angGTA) + cfp = Geom% cFP(cface,c0+c) + + if ( afp < zero ) then + Set% src(c0+c) = Set% src(c0+c) - afp*Set% tPsi(cfp) + Set% pInc(c0+c) = Set% pInc(c0+c) - afp*Set% tPsi(cfp) + endif + enddo + + enddo CornerLoop1 + enddo ZoneLoop1 + +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector & + !$acc& private(zone, nCorner, c0, cfp, nCFaces, afp) +#else + !$omp parallel do default(none) schedule(static) & + !$omp& shared(Geom, GTA, ASet, Set, ndoneZ, nzones, angle, angGTA) & + !$omp& private(zone, nCorner, c0, cfp, nCFaces, afp, c, cface) +#endif + + ZoneLoop2: do ii=1,nzones + + zone = iabs( ASet% nextZ(ndoneZ+ii,angle) ) + nCorner = Geom% numCorner(zone) + c0 = Geom% cOffSet(zone) + + CornerLoop2: do c=9,nCorner + +! Contributions from volume terms + + Set% src(c0+c) = GTA%TsaSource(c0+c) + Set% pInc(c0+c) = zero + ! Contributions from external corner faces (FP faces) + nCFaces = Geom% nCFacesArray(c0+c) + do cface=1,nCFaces - afp = GTA% AfpNorm(cface,c0+c,angGTA) + afp = GTA% AfpNorm(cface,c0+c,angGTA) cfp = Geom% cFP(cface,c0+c) if ( afp < zero ) then @@ -223,6 +329,34 @@ subroutine SweepGreyUCBxyzNEW_GPU(sendIndex, PsiB) endif enddo + enddo CornerLoop2 + enddo ZoneLoop2 + +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) & + !$acc& private(zone, nCorner, c0, ifp, cez, cfp, nCFaces, aez, area_opp) & + !$acc& private(sigv, sigv2, gnum, gtau, sez, psi_opp, denom, afp) +#else + !$omp parallel do collapse(2) default(none) schedule(static) & + !$omp& shared(Geom, GTA, ASet, Set, ndoneZ, nzones, angle, angGTA) & + !$omp& private(zone, nCorner, c0, ifp, cez, cfp, nCFaces, aez, area_opp) & + !$omp& private(sigv, sigv2, gnum, gtau, sez, psi_opp, denom, afp, cface, i) +#endif + + ZoneLoop3: do ii=1,nzones + CornerLoop3: do c=1,8 + + zone = iabs( ASet% nextZ(ndoneZ+ii,angle) ) + c0 = Geom% cOffSet(zone) + + sigv = Geom% Volume(c0+c)*GTA%GreySigTotal(c0+c) + nCFaces = Geom% nCFacesArray(c0+c) + ! Contributions from interior corner faces (EZ faces) do cface=1,ncfaces @@ -269,25 +403,156 @@ subroutine SweepGreyUCBxyzNEW_GPU(sendIndex, PsiB) sez = gtau*sigv*( psi_opp - GTA% Q(c0+c) ) + & half*aez*(one - gtau)*( GTA% Q(c0+c) - GTA% Q(c0+cez) ) - Set% src(c0+c) = Set% src(c0+c) + sez - Set% src(c0+cez) = Set% src(c0+cez) - sez - + ATOMIC_UPDATE Set% pInc(c0+c) = Set% pInc(c0+c) + gtau*sigv*psi_opp + ATOMIC_END + + ATOMIC_UPDATE Set% pInc(c0+cez) = Set% pInc(c0+cez) - gtau*sigv*psi_opp + ATOMIC_END else - sez = half*aez*( GTA% Q(c0+c) - GTA% Q(c0+cez) ) - Set% src(c0+c) = Set% src(c0+c) + sez - Set% src(c0+cez) = Set% src(c0+cez) - sez + sez = half*aez*( GTA% Q(c0+c) - GTA% Q(c0+cez) ) endif TestOppositeFace + ATOMIC_UPDATE + Set% src(c0+c) = Set% src(c0+c) + sez + ATOMIC_END + + ATOMIC_UPDATE + Set% src(c0+cez) = Set% src(c0+cez) - sez + ATOMIC_END + + endif + + enddo + + enddo CornerLoop3 + enddo ZoneLoop3 + +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector & + !$acc& private(zone, nCorner, c0, ifp, cez, cfp, nCFaces, aez, area_opp) & + !$acc& private(sigv, sigv2, gnum, gtau, sez, psi_opp, denom, afp) +#else + !$omp parallel do default(none) schedule(static) & + !$omp& shared(Geom, GTA, ASet, Set, ndoneZ, nzones, angle, angGTA) & + !$omp& private(zone, nCorner, c0, ifp, cez, cfp, nCFaces, aez, area_opp) & + !$omp& private(sigv, sigv2, gnum, gtau, sez, psi_opp, denom, afp, c, cface) +#endif + + ZoneLoop4: do ii=1,nzones + + zone = iabs( ASet% nextZ(ndoneZ+ii,angle) ) + nCorner = Geom% numCorner(zone) + c0 = Geom% cOffSet(zone) + + CornerLoop4: do c=9,nCorner + + sigv = Geom% Volume(c0+c)*GTA%GreySigTotal(c0+c) + nCFaces = Geom% nCFacesArray(c0+c) + +! Contributions from interior corner faces (EZ faces) + + do cface=1,ncfaces + + aez = GTA% AezNorm(cface,c0+c,angGTA) + cez = Geom% cEZ(cface,c0+c) + + if (aez > zero ) then + + psi_opp = zero + area_opp = zero + + ifp = mod(cface,nCFaces) + 1 + afp = GTA% AfpNorm(ifp,c0+c,angGTA) + + if ( afp < zero ) then + cfp = Geom% cFP(ifp,c0+c) + area_opp = -afp + psi_opp = -afp*Set% tPsi(cfp) + endif + + do i=2,nCFaces-2 + ifp = mod(ifp,nCFaces) + 1 + afp = GTA% AfpNorm(ifp,c0+c,angGTA) + if ( afp < zero ) then + cfp = Geom% cFP(ifp,c0+c) + area_opp = area_opp - afp + psi_opp = psi_opp - afp*Set% tPsi(cfp) + endif + enddo + + TestOppositeFace4: if (area_opp > zero) then + + psi_opp = psi_opp/area_opp + sigv2 = sigv*sigv + + gnum = aez*aez*( fouralpha*sigv2 + & + aez*(four*sigv + three*aez) ) + + gtau = gnum/ & + ( gnum + four*sigv2*sigv2 + aez*sigv*(six*sigv2 + & + two*aez*(two*sigv + aez)) ) + + sez = gtau*sigv*( psi_opp - GTA% Q(c0+c) ) + & + half*aez*(one - gtau)*( GTA% Q(c0+c) - GTA% Q(c0+cez) ) + + ATOMIC_UPDATE + Set% pInc(c0+c) = Set% pInc(c0+c) + gtau*sigv*psi_opp + ATOMIC_END + + ATOMIC_UPDATE + Set% pInc(c0+cez) = Set% pInc(c0+cez) - gtau*sigv*psi_opp + ATOMIC_END + + else + + sez = half*aez*( GTA% Q(c0+c) - GTA% Q(c0+cez) ) + + endif TestOppositeFace4 + + ATOMIC_UPDATE + Set% src(c0+c) = Set% src(c0+c) + sez + ATOMIC_END + + ATOMIC_UPDATE + Set% src(c0+cez) = Set% src(c0+cez) - sez + ATOMIC_END + endif enddo - enddo CornerLoop + enddo CornerLoop4 + enddo ZoneLoop4 + +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector & + !$acc& private(zone, nCorner, c0, c, cez, nCFaces, aez, denom) +#else + !$omp parallel do default(none) schedule(static) & + !$omp& shared(Geom, GTA, ASet, Set, ndoneZ, nzones, angle, angGTA) & + !$omp& private(zone, nCorner, c0, c, cez, nCFaces, aez, denom, i, cface) +#endif + + ZoneLoop5: do ii=1,nzones + + zone = iabs( ASet% nextZ(ndoneZ+ii,angle) ) + nCorner = Geom% numCorner(zone) + c0 = Geom% cOffSet(zone) do i=1,nCorner c = ASet% nextC(c0+i,angle) @@ -316,8 +581,10 @@ subroutine SweepGreyUCBxyzNEW_GPU(sendIndex, PsiB) enddo - enddo ZoneLoop -!$omp end parallel do + enddo ZoneLoop5 +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif ndoneZ = ndoneZ + nzones @@ -326,28 +593,40 @@ subroutine SweepGreyUCBxyzNEW_GPU(sendIndex, PsiB) enddo DomainLoop enddo GTASetLoop +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else TOMP(end target teams distribute) - - enddo SweepIteration - +#endif ! Update exiting boundary fluxes +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nGTASets) vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, BdyExitPtr, HypPlanePtr, angle0, angle, b, c) +#else TOMP(target teams distribute num_teams(nGTASets) thread_limit(omp_device_team_thread_limit) default(none) &) TOMPC(shared(nGTASets, nSets, PsiB, angleList, Quad)&) - TOMPC(private(Set, ASet, BdyExitPtr, angle0, angle, b, c)) + TOMPC(private(Set, ASet, BdyExitPtr, HypPlanePtr, angle0, angle, b, c)) +#endif do setID=1,nGTASets - Set => Quad% SetDataPtr(nSets+setID) - ASet => Quad% AngSetPtr(Set% angleSetID) - angle = angleList(1,setID) - angle0 = angleList(2,setID) - BdyExitPtr => ASet% BdyExitPtr(angle) - -!$omp parallel do default(none) & -!$omp& shared(Set, BdyExitPtr, angle0, angle, PsiB) & -!$omp& private(b,c) + Set => Quad% SetDataPtr(nSets+setID) + ASet => Quad% AngSetPtr(Set% angleSetID) + angle = angleList(1,setID) + angle0 = angleList(2,setID) + BdyExitPtr => ASet% BdyExitPtr(angle) + HypPlanePtr => ASet% HypPlanePtr(angle) + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector & + !$acc& private(b,c) +#else + !$omp parallel do default(none) & + !$omp& shared(Set, BdyExitPtr, angle0, angle, PsiB) & + !$omp& private(b,c) +#endif do i=1,BdyExitPtr% nxBdy b = BdyExitPtr% bdyList(1,i) @@ -355,41 +634,80 @@ subroutine SweepGreyUCBxyzNEW_GPU(sendIndex, PsiB) PsiB(b,angle0+angle) = Set% tPsi(c) enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + +! Update Interface values -!$omp end parallel do +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector & + !$acc& private(c) +#else + !$omp parallel do default(none) & + !$omp& shared(Set, HypPlanePtr, angle0, angle, PsiB) private(c) +#endif + do b=1,HypPlanePtr% interfaceLen + c = HypPlanePtr% interfaceList(b) + PsiB(Set%nbelem+b,angle0+angle) = Set% tPsi(c) + enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif enddo +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else TOMP(end target teams distribute) +#endif +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, zSetID, angle, quadwt) +#else TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) TOMPC(shared(nZoneSets, nGTASets, nSets, GTA, angleList, Quad, Geom)&) - TOMPC(private(Set, ASet, zSetID, angle, quadwt)) + TOMPC(private(Set, ASet, zSetID, angle, quadwt, setID)) +#endif - ZoneSetLoop3: do zSetID=1,nZoneSets + ZoneSetLoop3: do zSetID=1,nZoneSets - do setID=1,nGTASets - - Set => Quad% SetDataPtr(nSets+setID) - ASet => Quad% AngSetPtr(Set% angleSetID) + do setID=1,nGTASets - angle = angleList(1,setID) - quadwt = ASet% weight(angle) + Set => Quad% SetDataPtr(nSets+setID) + ASet => Quad% AngSetPtr(Set% angleSetID) -!$omp parallel do default(none) & -!$omp& shared(Geom, Set, GTA, quadwt, zSetID) - do c=Geom% corner1(zSetID),Geom% corner2(zSetID) - GTA% PhiInc(c) = GTA% PhiInc(c) + quadwt*Set% pInc(c) - enddo -!$omp end parallel do + angle = angleList(1,setID) + quadwt = ASet% weight(angle) +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector +#else + !$omp parallel do default(none) & + !$omp& shared(Geom, Set, GTA, quadwt, zSetID) +#endif + do c=Geom% corner1(zSetID),Geom% corner2(zSetID) + GTA% PhiInc(c) = GTA% PhiInc(c) + quadwt*Set% pInc(c) enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + + enddo - enddo ZoneSetLoop3 + enddo ZoneSetLoop3 +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else TOMP(end target teams distribute) - TOMP(target exit data map(release: angleList, nSets, nGTASets, nHyperDomains)) - TOMP(target exit data map(from: PsiB)) +#endif + + TOMP_MAP(target exit data map(from: PsiB)) + TOMP_MAP(target exit data map(release: angleList, nSets, nGTASets, nHyperDomains)) + deallocate( angleList ) diff --git a/src/teton/gpu/SweepUCBrz_OMPOL.F90 b/src/teton/gpu/SweepUCBrz_OMPOL.F90 index 268dea5..87df544 100644 --- a/src/teton/gpu/SweepUCBrz_OMPOL.F90 +++ b/src/teton/gpu/SweepUCBrz_OMPOL.F90 @@ -29,7 +29,7 @@ subroutine SweepUCBrz_GPU(nSets, sendIndex, savePsi) use SetData_mod use AngleSet_mod use GroupSet_mod - use ArrayChecks_mod + use CodeChecks_mod implicit none @@ -46,25 +46,29 @@ subroutine SweepUCBrz_GPU(nSets, sendIndex, savePsi) type(GroupSet), pointer :: GSet type(HypPlane), pointer :: HypPlanePtr type(BdyExit), pointer :: BdyExitPtr - - integer :: setID - integer :: zSetID - integer :: Angle - integer :: g - integer :: Groups - - integer :: mCycle - integer :: offset - integer :: nAngleSets - integer :: nZoneSets - integer :: nzones - integer :: c - integer :: ii - integer :: ndoneZ - integer :: hyperPlane - integer :: nHyperplanes - - real(adqt) :: tau + type(SweepSet), pointer :: Swp + + integer :: setID + integer :: zSetID + integer :: Angle + integer :: g + integer :: Groups + + integer :: mCycle + integer :: offset + integer :: nAngleSets + integer :: nZoneSets + integer :: nHyperDomains + + integer :: nzones + integer :: ii + integer :: ndoneZ + integer :: hyperPlane + integer :: domID + integer :: hplane1 + integer :: hplane2 + + real(adqt) :: tau ! Local @@ -73,12 +77,13 @@ subroutine SweepUCBrz_GPU(nSets, sendIndex, savePsi) integer :: cface integer :: cez integer :: cfp + integer :: c integer :: c0 - integer :: zone integer :: nCorner real(adqt), parameter :: fouralpha=1.82d0 + real(adqt) :: fac real(adqt) :: sigA real(adqt) :: sigA2 @@ -105,9 +110,10 @@ subroutine SweepUCBrz_GPU(nSets, sendIndex, savePsi) integer, allocatable :: angleList(:) ! Constants - tau = Size% tau - nAngleSets = getNumberOfAngleSets(Quad) - nZoneSets = getNumberOfZoneSets(Quad) + tau = Size% tau + nAngleSets = getNumberOfAngleSets(Quad) + nZoneSets = getNumberOfZoneSets(Quad) + nHyperDomains = getNumberOfHyperDomains(Quad,1) allocate( angleList(nAngleSets) ) @@ -128,12 +134,17 @@ subroutine SweepUCBrz_GPU(nSets, sendIndex, savePsi) TETON_CHECK_BOUNDS1(Geom%corner1, nZoneSets) TETON_CHECK_BOUNDS1(Geom%corner2, nZoneSets) - TOMP(target enter data map(to: tau, sendIndex, angleList)) + TOMP_MAP(target enter data map(to: tau, sendIndex, angleList)) +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(ASet, angle) +#else TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) TOMPC(shared(nZoneSets, nAngleSets,Geom, angleList, Quad)&) - TOMPC(private(ASet, Angle)) + TOMPC(private(ASet, angle)) +#endif ZoneSetLoop: do zSetID=1,nZoneSets @@ -144,27 +155,41 @@ subroutine SweepUCBrz_GPU(nSets, sendIndex, savePsi) ASet => Quad% AngSetPtr(setID) angle = angleList(setID) -!$omp parallel do collapse(2) default(none) & -!$omp& shared(Geom, ASet, Angle, zSetID) private(c,cface) - +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Geom, ASet, Angle, zSetID) +#endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) do cface=1,2 ASet% AfpNorm(cface,c) = DOT_PRODUCT( ASet% omega(:,angle),Geom% A_fp(:,cface,c) ) ASet% AezNorm(cface,c) = DOT_PRODUCT( ASet% omega(:,angle),Geom% A_ez(:,cface,c) ) enddo enddo - +#ifndef TETON_ENABLE_OPENACC !$omp end parallel do +#endif enddo enddo ZoneSetLoop -TOMP(end target teams distribute) +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif + -TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none)&) -TOMPC(shared(nZoneSets, nAngleSets, Geom, angleList, Quad)&) -TOMPC(private(ASet, angle, fac, R_afp,R_afp2,R,R2)) +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(ASet, angle, fac, R_afp, R_afp2, R, R2) +#else + TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none)&) + TOMPC(shared(nZoneSets, nAngleSets, Geom, angleList, Quad)&) + TOMPC(private(ASet, angle, fac, R_afp, R_afp2, R, R2)) +#endif ZoneSetLoop2: do zSetID=1,nZoneSets @@ -176,8 +201,13 @@ subroutine SweepUCBrz_GPU(nSets, sendIndex, savePsi) angle = angleList(setID) fac = ASet% angDerivFac(Angle) -!$omp parallel do default(none) & -!$omp& shared(Geom, ASet, angle, fac, zSetID) private(R_afp,R_afp2,R,R2) +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(R_afp, R_afp2, R, R2) +#else + !$omp parallel do default(none) & + !$omp& shared(Geom, ASet, angle, fac, zSetID) private(R_afp, R_afp2, R, R2) +#endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) R_afp = Geom% RadiusFP(1,c) @@ -191,14 +221,104 @@ subroutine SweepUCBrz_GPU(nSets, sendIndex, savePsi) R *(ASet% AezNorm(1,c) - abs(ASet% AezNorm(1,c))) + & R2 *(ASet% AezNorm(2,c) - abs(ASet% AezNorm(2,c))) ) enddo - -!$omp end parallel do +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif enddo enddo ZoneSetLoop2 -TOMP(end target teams distribute) +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif + + +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nSets) & + !$acc& vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, HypPlanePtr, Angle, Groups, offSet) +#else + TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) default(none) &) + TOMPC(shared(sendIndex, Quad, nSets) &) + TOMPC(private(Set, ASet, HypPlanePtr, Angle, Groups, offSet)) +#endif + + SetLoop0: do setID=1,nSets + + Set => Quad% SetDataPtr(setID) + ASet => Quad% AngSetPtr(Set% angleSetID) + + Groups = Set% Groups + Angle = Set% AngleOrder(sendIndex) + offSet = ASet% cycleOffSet(angle) + HypPlanePtr => ASet% HypPlanePtr(angle) + +! Initialize boundary values in Psi1 and interior values on the cycle +! list + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) & + !$acc& private(c) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Angle, Set, ASet, offSet, Groups) private(c) +#endif + do mCycle=1,ASet% numCycles(Angle) + do g=1,Groups + c = ASet% cycleList(offSet+mCycle) + Set% Psi1(g,c) = Set% cyclePsi(g,offSet+mCycle) + enddo + enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, Groups, Angle) +#endif + do c=1,Set%nbelem + do g=1,Groups + Set% Psi1(g,Set%nCorner+c) = Set% PsiB(g,c,Angle) + enddo + enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + +! Initialize values at hyper-domain interfaces + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) & + !$acc& private(c) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, HypPlanePtr, Groups, angle) private(c) +#endif + do b=1,HypPlanePtr% interfaceLen + do g=1,Groups + c = HypPlanePtr% interfaceList(b) + Set% Psi1(g,c) = Set% PsiInt(g,b,angle) + enddo + enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + + enddo SetLoop0 + +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif + ! TODO: ! IBM XLF segfaults if 'mCycle', 'b', and 'g' are not scoped to private below. This should not @@ -213,22 +333,36 @@ subroutine SweepUCBrz_GPU(nSets, sendIndex, savePsi) ! the loop.` ! ! Look into reporting this bug to IBM, using UMT as a reproducer. -TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) default(none) &) -TOMPC(shared(nSets, Quad, Geom, sendIndex, tau)&) -TOMPC(private(c, cfp, Set, ASet, GSet, HypPlanePtr, Angle) &) -TOMPC(private(Groups, nHyperPlanes, ndoneZ, mCycle, b, g, offset, hyperPlane, nzones, fac)&) -TOMPC(private(c0,cez,zone,nCorner, sigA,sigA2,source,area,sig,sez,gnum,gden, aez,afp,R,R_afp,denom)) + +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang collapse(2) num_gangs(nSets*nHyperDomains) & + !$acc& vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, GSet, HypPlanePtr, Swp, Angle, Groups, hplane1, hplane2, ndoneZ) & + !$acc& private(hyperPlane, nzones, fac, c, c0, cfp, cez, zone, nCorner) & + !$acc& private(sigA, sigA2, source, area, sig, sez, gnum, gden, aez, afp, R, R_afp, denom) +#else + TOMP(target teams distribute collapse(2) num_teams(nSets*nHyperDomains) default(none) &) + TOMPC(thread_limit(omp_device_team_thread_limit) &) + TOMPC(shared(nSets, nHyperDomains, Quad, Geom, sendIndex, tau)&) + TOMPC(private(Set, ASet, GSet, HypPlanePtr, Swp, Angle, Groups, hplane1, hplane2, ndoneZ) &) + TOMPC(private(b, g, hyperPlane, nzones, fac, c, c0, cfp, cez, zone, nCorner)&) + TOMPC(private(sigA, sigA2, source, area, sig, sez, gnum, gden, aez, afp, R, R_afp, denom)) +#endif SetLoop: do setID=1,nSets + DomainLoop: do domID=1,nHyperDomains Set => Quad% SetDataPtr(setID) ASet => Quad% AngSetPtr(Set% angleSetID) GSet => Quad% GrpSetPtr(Set% groupSetID) + Swp => Set% SweepPtr(domID) Groups = Set% Groups Angle = Set% AngleOrder(sendIndex) - nHyperPlanes = ASet% nHyperPlanes(Angle) - ndoneZ = 0 + HypPlanePtr => ASet% HypPlanePtr(Angle) + hplane1 = HypPlanePtr% hplane1(domID) + hplane2 = HypPlanePtr% hplane2(domID) + ndoneZ = HypPlanePtr% ndone(domID) ! Angle Constants @@ -237,37 +371,20 @@ subroutine SweepUCBrz_GPU(nSets, sendIndex, savePsi) ! Initialize variable. cfp = -1 -! Initialize boundary values in Psi1 and interior values on the cycle -! list - - HypPlanePtr => ASet% HypPlanePtr(Angle) - offSet = ASet% cycleOffSet(angle) - - do mCycle=1,ASet% numCycles(Angle) - c = ASet% cycleList(offSet+mCycle) - - do g=1,Groups - Set% Psi1(g,c) = Set% cyclePsi(g,offSet+mCycle) - enddo - enddo - -!$omp parallel do collapse(2) default(none) & -!$omp& shared(Set, Groups, Angle) - do b=1,Set%nbelem - do g=1,Groups - Set% Psi1(g,Set%nCorner+b) = Set% PsiB(g,b,Angle) - enddo - enddo -!$omp end parallel do - - HyperPlaneLoop: do hyperPlane=1,nHyperPlanes + HyperPlaneLoop: do hyperPlane=hplane1,hplane2 nzones = HypPlanePtr% zonesInPlane(hyperPlane) -!$omp parallel do collapse(2) default(none) & -!$omp& shared(Set, Geom, ASet, GSet, Angle, nzones, Groups, ndoneZ, tau, fac) & -!$omp& private(c0,cez,cfp,zone,nCorner,sigA,sigA2,source,area,sig,sez,gnum,gden) & -!$omp& private(aez,afp,R,R_afp,denom) +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) & + !$acc& private(c0, cez, cfp, zone, nCorner, sigA, sigA2, source) & + !$acc& private(area, sig, sez, gnum, gden, aez, afp, R, R_afp, denom) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, Geom, ASet, GSet, Swp, Angle, nzones, Groups, ndoneZ, tau, fac) & + !$omp& private(c0, cez, cfp, zone, nCorner, sigA, sigA2, source) & + !$omp& private(area, sig, sez, gnum, gden, aez, afp, R, R_afp, denom) +#endif ZoneLoop: do ii=1,nzones GroupLoop: do g=1,Groups @@ -284,8 +401,8 @@ subroutine SweepUCBrz_GPU(nSets, sendIndex, savePsi) do c=1,nCorner source = GSet% STotal(g,c0+c) + tau*Set% Psi(g,c0+c,Angle) - Set% Q(g,c,ii) = source - Set% S(g,c,ii) = Geom% Volume(c0+c)*source + & + Swp% Q(g,c,ii) = source + Swp% S(g,c,ii) = Geom% Volume(c0+c)*source + & fac*Geom% Area(c0+c)*Set% PsiM(g,c0+c) enddo @@ -299,7 +416,7 @@ subroutine SweepUCBrz_GPU(nSets, sendIndex, savePsi) if ( afp < zero ) then cfp = Geom% cFP(cface,c0+c) R_afp = Geom% RadiusFP(cface,c0+c)*afp - Set% S(g,c,ii) = Set% S(g,c,ii) - R_afp*Set% Psi1(g,cfp) + Swp% S(g,c,ii) = Swp% S(g,c,ii) - R_afp*Set% Psi1(g,cfp) endif if ( aez > zero ) then @@ -320,18 +437,18 @@ subroutine SweepUCBrz_GPU(nSets, sendIndex, savePsi) two*aez*(two*sigA + aez))) sez = R*( area*gnum* & - ( sig*Set% Psi1(g,cfp) - Set% Q(g,c,ii) ) + & - half*aez*gden*( Set% Q(g,c,ii) - Set% Q(g,cez,ii) ) )/ & + ( sig*Set% Psi1(g,cfp) - Swp% Q(g,c,ii) ) + & + half*aez*gden*( Swp% Q(g,c,ii) - Swp% Q(g,cez,ii) ) )/ & ( gnum + gden*sig ) - Set% S(g,c,ii) = Set% S(g,c,ii) + sez - Set% S(g,cez,ii) = Set% S(g,cez,ii) - sez + Swp% S(g,c,ii) = Swp% S(g,c,ii) + sez + Swp% S(g,cez,ii) = Swp% S(g,cez,ii) - sez else - sez = half*R*aez*( Set% Q(g,c,ii) - Set% Q(g,cez,ii) )/sig - Set% S(g,c,ii) = Set% S(g,c,ii) + sez - Set% S(g,cez,ii) = Set% S(g,cez,ii) - sez + sez = half*R*aez*( Swp% Q(g,c,ii) - Swp% Q(g,cez,ii) )/sig + Swp% S(g,c,ii) = Swp% S(g,c,ii) + sez + Swp% S(g,cez,ii) = Swp% S(g,cez,ii) - sez endif @@ -349,7 +466,7 @@ subroutine SweepUCBrz_GPU(nSets, sendIndex, savePsi) ! Corner angular flux denom = ASet% ANormSum(c0+c) + sig*Geom% Volume(c0+c) - Set% Psi1(g,c0+c) = Set% S(g,c,ii)/denom + Set% Psi1(g,c0+c) = Swp% S(g,c,ii)/denom ! Calculate the contribution of this flux to the sources of ! downstream corners in this zone. The downstream corner index is @@ -361,7 +478,7 @@ subroutine SweepUCBrz_GPU(nSets, sendIndex, savePsi) if (aez > zero) then R = Geom% RadiusEZ(cface,c0+c) cez = Geom% cEZ(cface,c0+c) - Set% S(g,cez,ii) = Set% S(g,cez,ii) + R*aez*Set% Psi1(g,c0+c) + Swp% S(g,cez,ii) = Swp% S(g,cez,ii) + R*aez*Set% Psi1(g,c0+c) endif enddo @@ -369,30 +486,32 @@ subroutine SweepUCBrz_GPU(nSets, sendIndex, savePsi) enddo GroupLoop enddo ZoneLoop - -!$omp end parallel do +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif ndoneZ = ndoneZ + nzones enddo HyperPlaneLoop -! Update Psi in the cycle list - - do mCycle=1,ASet% numCycles(angle) - c = ASet% cycleList(offSet+mCycle) - - do g=1,Groups - Set% cyclePsi(g,offSet+mCycle) = Set% Psi1(g,c) - enddo - enddo + enddo DomainLoop enddo SetLoop -TOMP(end target teams distribute) -TOMP(target exit data map(release: tau, sendIndex, angleList)) +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif + -TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) default(none) &) -TOMPC(shared(nSets, Quad, sendIndex)&) -TOMPC(private(Set, ASet, Angle, Groups, quadTauW1, quadTauW2)) +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, Angle, Groups, quadTauW1, quadTauW2) +#else + TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) default(none) &) + TOMPC(shared(nSets, Quad, sendIndex)&) + TOMPC(private(Set, ASet, Angle, Groups, quadTauW1, quadTauW2)) +#endif SetLoop2: do setID=1,nSets @@ -406,24 +525,32 @@ subroutine SweepUCBrz_GPU(nSets, sendIndex, savePsi) if ( ASet% StartingDirection(Angle) ) then -!$omp parallel do collapse(2) default(none) & -!$omp& shared(Set, Groups) - +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, Groups) +#endif do c=1,Set% nCorner do g=1,Groups Set% PsiM(g,c) = Set% Psi1(g,c) enddo enddo - -!$omp end parallel do +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif else quadTauW1 = ASet% quadTauW1(Angle) quadTauW2 = ASet% quadTauW2(Angle) -!$omp parallel do collapse(2) default(none) & -!$omp& shared(Set, Groups, quadTauW1, quadTauW2) +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, Groups, quadTauW1, quadTauW2) +#endif do c=1,Set% nCorner do g=1,Groups @@ -431,31 +558,46 @@ subroutine SweepUCBrz_GPU(nSets, sendIndex, savePsi) quadTauW2*Set% PsiM(g,c) enddo enddo - -!$omp end parallel do +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif endif enddo SetLoop2 - -TOMP(end target teams distribute) -TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) default(none)&) -TOMPC(shared(nSets, Quad, sendIndex)&) -TOMPC(private(Set, ASet, BdyExitPtr, Angle, Groups, b, c)) - -!!TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) private(setID, Set, ASet, BdyExitPtr, Angle, Groups)) +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif + + +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, BdyExitPtr, HypPlanePtr, offSet, Angle, Groups, b, c) +#else + TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) default(none)&) + TOMPC(shared(nSets, Quad, sendIndex)&) + TOMPC(private(Set, ASet, BdyExitPtr, HypPlanePtr, offSet, Angle, Groups, b, c)) +#endif SetLoop3: do setID=1,nSets - Set => Quad% SetDataPtr(setID) - ASet => Quad% AngSetPtr(Set% angleSetID) - Groups = Set% Groups - Angle = Set% AngleOrder(sendIndex) - BdyExitPtr => ASet% BdyExitPtr(Angle) - -!$omp parallel do collapse(2) default(none) & -!$omp& shared(Set, BdyExitPtr, Angle, Groups) private(b,c) - + Set => Quad% SetDataPtr(setID) + ASet => Quad% AngSetPtr(Set% angleSetID) + Groups = Set% Groups + Angle = Set% AngleOrder(sendIndex) + offSet = ASet% cycleOffSet(angle) + BdyExitPtr => ASet% BdyExitPtr(Angle) + HypPlanePtr => ASet% HypPlanePtr(angle) + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) & + !$acc& private(b, c) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, BdyExitPtr, Angle, Groups) private(b, c) +#endif do i=1,BdyExitPtr% nxBdy do g=1,Groups b = BdyExitPtr% bdyList(1,i) @@ -464,14 +606,60 @@ subroutine SweepUCBrz_GPU(nSets, sendIndex, savePsi) Set% PsiB(g,b,Angle) = Set% Psi1(g,c) enddo enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif -!$omp end parallel do +! Update Interface Elements - if ( ASet% FinishingDirection(Angle+1) ) then +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) & + !$acc& private(c) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, HypPlanePtr, Groups, angle) private(c) +#endif + + do i=1,HypPlanePtr% interfaceLen + do g=1,Groups + c = HypPlanePtr% interfaceList(i) + Set% PsiInt(g,i,angle) = Set% Psi1(g,c) + enddo + enddo + +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif -!$omp parallel do collapse(2) default(none) & -!$omp& shared(Set, BdyExitPtr, Angle, Groups) private(b,c) +! Update Psi in the cycle list +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) & + !$acc& private(c) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Angle, Set, ASet, offSet, Groups) private(c) +#endif + do mCycle=1,ASet% numCycles(angle) + do g=1,Groups + c = ASet% cycleList(offSet+mCycle) + Set% cyclePsi(g,offSet+mCycle) = Set% Psi1(g,c) + enddo + enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + + + if ( ASet% FinishingDirection(Angle+1) ) then + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) & + !$acc& private(b, c) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, BdyExitPtr, Angle, Groups) private(b, c) +#endif do i=1,BdyExitPtr% nxBdy do g=1,Groups b = BdyExitPtr% bdyList(1,i) @@ -480,24 +668,33 @@ subroutine SweepUCBrz_GPU(nSets, sendIndex, savePsi) Set% PsiB(g,b,Angle+1) = Set% PsiM(g,c) enddo enddo - +#ifndef TETON_ENABLE_OPENACC !$omp end parallel do +#endif endif enddo SetLoop3 -TOMP(end target teams distribute) +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif ! We only store Psi if this is the last transport sweep in the time step if ( savePsi ) then -TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) default(none) &) -TOMPC(shared(nSets, sendIndex, Quad)&) -TOMPC(private(Set, ASet, Angle, Groups)) - +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, Angle, Groups) +#else + TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) default(none) &) + TOMPC(shared(nSets, sendIndex, Quad)&) + TOMPC(private(Set, ASet, Angle, Groups)) +#endif SetLoop4: do setID=1,nSets Set => Quad% SetDataPtr(setID) @@ -506,9 +703,12 @@ subroutine SweepUCBrz_GPU(nSets, sendIndex, savePsi) Groups = Set% Groups Angle = Set% AngleOrder(sendIndex) -!$omp parallel do collapse(2) default(none) & -!$omp& shared(Set, ASet, Angle, Groups) - +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, ASet, Angle, Groups) +#endif CornerLoop4: do c=1,Set% nCorner GroupLoop4: do g=1,Groups @@ -520,15 +720,23 @@ subroutine SweepUCBrz_GPU(nSets, sendIndex, savePsi) enddo GroupLoop4 enddo CornerLoop4 - -!$omp end parallel do +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif enddo SetLoop4 -TOMP(end target teams distribute) +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif endif + + TOMP_MAP(target exit data map(always,release: tau, sendIndex, angleList)) + deallocate( angleList ) diff --git a/src/teton/gpu/SweepUCBxyz_OMPOL.F90 b/src/teton/gpu/SweepUCBxyz_OMPOL.F90 index a23d3de..ae4a7ed 100644 --- a/src/teton/gpu/SweepUCBxyz_OMPOL.F90 +++ b/src/teton/gpu/SweepUCBxyz_OMPOL.F90 @@ -28,7 +28,7 @@ subroutine SweepUCBxyz_GPU(nSets, sendIndex, savePsi) use SetData_mod use AngleSet_mod use GroupSet_mod - use ArrayChecks_mod + use CodeChecks_mod implicit none @@ -45,25 +45,29 @@ subroutine SweepUCBxyz_GPU(nSets, sendIndex, savePsi) type(GroupSet), pointer :: GSet type(HypPlane), pointer :: HypPlanePtr type(BdyExit), pointer :: BdyExitPtr - - integer :: setID - integer :: zSetID - integer :: Angle - integer :: g - integer :: Groups - - integer :: mCycle - integer :: offSet - integer :: nAngleSets - integer :: nZoneSets - - integer :: nzones - integer :: ii - integer :: ndoneZ - integer :: hyperPlane - integer :: nHyperplanes - - real(adqt) :: tau + type(SweepSet), pointer :: Swp + + integer :: setID + integer :: zSetID + integer :: Angle + integer :: g + integer :: Groups + + integer :: mCycle + integer :: offSet + integer :: nAngleSets + integer :: nZoneSets + integer :: nHyperDomains + + integer :: nzones + integer :: ii + integer :: ndoneZ + integer :: hyperPlane + integer :: domID + integer :: hplane1 + integer :: hplane2 + + real(adqt) :: tau ! Local @@ -105,9 +109,10 @@ subroutine SweepUCBxyz_GPU(nSets, sendIndex, savePsi) ! Constants - tau = Size% tau - nAngleSets = getNumberOfAngleSets(Quad) - nZoneSets = getNumberOfZoneSets(Quad) + tau = Size% tau + nAngleSets = getNumberOfAngleSets(Quad) + nZoneSets = getNumberOfZoneSets(Quad) + nHyperDomains = getNumberOfHyperDomains(Quad,1) allocate( angleList(nAngleSets) ) @@ -130,21 +135,17 @@ subroutine SweepUCBxyz_GPU(nSets, sendIndex, savePsi) TETON_CHECK_BOUNDS1(Geom%corner1, nZoneSets) TETON_CHECK_BOUNDS1(Geom%corner2, nZoneSets) - TOMP(target enter data map(to: tau, sendIndex, angleList)) + TOMP_MAP(target enter data map(to: tau, sendIndex, angleList)) #ifdef TETON_ENABLE_OPENACC - !$acc parallel loop gang & - !$acc& num_gangs(nZoneSets) & - !$acc& vector_length(omp_device_team_thread_limit) & - !$acc& private(ASet, setID, Angle) + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(ASet, angle) #else TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) - TOMPC(private(ASet, setID, Angle) &) + TOMPC(private(ASet, angle) &) TOMPC(shared(nZoneSets, angleList, Quad, Geom, nAngleSets) ) #endif - !TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) private(ASet, zSetID, setID, Angle)) - ZoneSetLoop: do zSetID=1,nZoneSets ! Loop over angle sets @@ -159,10 +160,10 @@ subroutine SweepUCBxyz_GPU(nSets, sendIndex, savePsi) ! faster to split into two loops as below #ifdef TETON_ENABLE_OPENACC -!$acc loop vector collapse(2) + !$acc loop vector collapse(2) #else -!$omp parallel do collapse(2) default(none) & -!$omp& shared(Geom, ASet, Angle, zSetID) + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Geom, ASet, Angle, zSetID) #endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) do cface=1,3 @@ -171,14 +172,14 @@ subroutine SweepUCBxyz_GPU(nSets, sendIndex, savePsi) enddo enddo #ifndef TETON_ENABLE_OPENACC -!$omp end parallel do + !$omp end parallel do #endif #ifdef TETON_ENABLE_OPENACC -!$acc loop vector + !$acc loop vector #else -!$omp parallel do default(none) & -!$omp& shared(Geom, ASet, Angle, zSetID) + !$omp parallel do default(none) & + !$omp& shared(Geom, ASet, Angle, zSetID) #endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) do cface=4,Geom% nCFacesArray(c) @@ -187,7 +188,7 @@ subroutine SweepUCBxyz_GPU(nSets, sendIndex, savePsi) enddo enddo #ifndef TETON_ENABLE_OPENACC -!$omp end parallel do + !$omp end parallel do #endif enddo @@ -195,24 +196,20 @@ subroutine SweepUCBxyz_GPU(nSets, sendIndex, savePsi) enddo ZoneSetLoop #ifdef TETON_ENABLE_OPENACC -!$acc end parallel loop + !$acc end parallel loop #else -TOMP(end target teams distribute) + TOMP(end target teams distribute) #endif #ifdef TETON_ENABLE_OPENACC -!$acc parallel loop gang & -!$acc& num_gangs(nZoneSets) & -!$acc& vector_length(omp_device_team_thread_limit) & -!$acc& private(ASet) + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(ASet) #else -TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) -TOMPC(private(ASet) &) -TOMPC(shared(nZoneSets, nAngleSets, Quad, Geom)) + TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) + TOMPC(shared(nZoneSets, nAngleSets, Quad, Geom) &) + TOMPC(private(ASet)) #endif -!!TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) private(ASet, zSetID, setID)) - ZoneSetLoop2: do zSetID=1,nZoneSets ! Loop over angle sets @@ -222,12 +219,11 @@ subroutine SweepUCBxyz_GPU(nSets, sendIndex, savePsi) ASet => Quad% AngSetPtr(setID) #ifdef TETON_ENABLE_OPENACC -!$acc loop vector + !$acc loop vector #else -!$omp parallel do default(none) & -!$omp& shared(Geom, ASet, zSetID) + !$omp parallel do default(none) & + !$omp& shared(Geom, ASet, zSetID) #endif - do c=Geom% corner1(zSetID),Geom% corner2(zSetID) ASet% ANormSum(c) = zero do cface=1,Geom% nCFacesArray(c) @@ -236,9 +232,8 @@ subroutine SweepUCBxyz_GPU(nSets, sendIndex, savePsi) ASet% AezNorm(cface,c) + abs( ASet% AezNorm(cface,c) ) ) enddo enddo - #ifndef TETON_ENABLE_OPENACC -!$omp end parallel do + !$omp end parallel do #endif enddo @@ -246,26 +241,21 @@ subroutine SweepUCBxyz_GPU(nSets, sendIndex, savePsi) enddo ZoneSetLoop2 #ifdef TETON_ENABLE_OPENACC -!$acc end parallel loop + !$acc end parallel loop #else -TOMP(end target teams distribute) + TOMP(end target teams distribute) #endif #ifdef TETON_ENABLE_OPENACC -!$acc parallel loop gang & -!$acc& num_gangs(nSets) & -!$acc& vector_length(omp_device_team_thread_limit) & -!$acc& private(Set, ASet, Angle, Groups, offSet) + !$acc parallel loop gang num_gangs(nSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, HypPlanePtr, Angle, Groups, offSet) #else -TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) default(none) &) -TOMPC(shared(sendIndex, Quad, nSets) &) -TOMPC(private(Set, ASet, Angle, Groups, offSet)) + TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) default(none) &) + TOMPC(shared(sendIndex, Quad, nSets) &) + TOMPC(private(Set, ASet, HypPlanePtr, Angle, Groups, offSet, c)) #endif -!TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) private(Set, ASet, setID, Angle, Groups) &) -!TOMPC(private(mCycle, c, g, offSet)) - SetLoop0: do setID=1,nSets Set => Quad% SetDataPtr(setID) @@ -274,16 +264,17 @@ subroutine SweepUCBxyz_GPU(nSets, sendIndex, savePsi) Groups = Set% Groups Angle = Set% AngleOrder(sendIndex) offSet = ASet% cycleOffSet(angle) + HypPlanePtr => ASet% HypPlanePtr(angle) ! Initialize boundary values in Psi1 and interior values on the cycle -! list +! and hyper-domain interface lists #ifdef TETON_ENABLE_OPENACC -!$acc loop vector collapse(2) & -!$acc& private(c) + !$acc loop vector collapse(2) & + !$acc& private(c) #else -!$omp parallel do collapse(2) default(none) & -!$omp& shared(Angle, Set, ASet, offSet, Groups) private(c) + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Angle, Set, ASet, offSet, Groups) private(c) #endif do mCycle=1,ASet% numCycles(Angle) do g=1,Groups @@ -292,15 +283,15 @@ subroutine SweepUCBxyz_GPU(nSets, sendIndex, savePsi) enddo enddo #ifndef TETON_ENABLE_OPENACC -!$omp end parallel do + !$omp end parallel do #endif #ifdef TETON_ENABLE_OPENACC -!$acc loop vector collapse(2) + !$acc loop vector collapse(2) #else -!$omp parallel do collapse(2) default(none) & -!$omp& shared(Set, Groups, Angle) + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, Groups, Angle) #endif do c=1,Set%nbelem do g=1,Groups @@ -308,15 +299,34 @@ subroutine SweepUCBxyz_GPU(nSets, sendIndex, savePsi) enddo enddo #ifndef TETON_ENABLE_OPENACC -!$omp end parallel do + !$omp end parallel do +#endif + +! Initialize values at hyper-domain interfaces + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) & + !$acc& private(c) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, HypPlanePtr, Groups, angle) private(c) +#endif + do b=1,HypPlanePtr% interfaceLen + do g=1,Groups + c = HypPlanePtr% interfaceList(b) + Set% Psi1(g,c) = Set% PsiInt(g,b,angle) + enddo + enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do #endif enddo SetLoop0 #ifdef TETON_ENABLE_OPENACC -!$acc end parallel loop + !$acc end parallel loop #else -TOMP(end target teams distribute) + TOMP(end target teams distribute) #endif @@ -336,47 +346,50 @@ subroutine SweepUCBxyz_GPU(nSets, sendIndex, savePsi) ! Look into reporting this bug to IBM, using UMT as a reproducer. #ifdef TETON_ENABLE_OPENACC -!$acc parallel loop gang & -!$acc& num_gangs(nSets) & -!$acc& vector_length(omp_device_team_thread_limit) & -!$acc& private(Set, ASet, GSet, HypPlanePtr, Angle, Groups) & -!$acc& private(nHyperPlanes, ndoneZ, nzones, hyperPlane) + !$acc parallel loop gang collapse(2) num_gangs(nSets*nHyperDomains) & + !$acc& vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, GSet, Swp, HypPlanePtr, Angle, Groups) & + !$acc& private(hplane1, hplane2, ndoneZ, nzones, hyperPlane) #else -TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) &) -TOMPC(private(Set, ASet, GSet, HypPlanePtr, Angle, Groups) &) -TOMPC(private(nHyperPlanes, ndoneZ, nzones, hyperPlane)) + TOMP(target teams distribute collapse(2) num_teams(nSets*nHyperDomains) &) + TOMPC(thread_limit(omp_device_team_thread_limit) &) + TOMPC(private(Set, ASet, GSet, Swp, HypPlanePtr, Angle, Groups) &) + TOMPC(private(hplane1, hplane2, ndoneZ, nzones, hyperPlane)) #endif SetLoop: do setID=1,nSets + DomainLoop: do domID=1,nHyperDomains Set => Quad% SetDataPtr(setID) ASet => Quad% AngSetPtr(Set% angleSetID) GSet => Quad% GrpSetPtr(Set% groupSetID) + Swp => Set% SweepPtr(domID) Groups = Set% Groups Angle = Set% AngleOrder(sendIndex) - nHyperPlanes = ASet% nHyperPlanes(Angle) - ndoneZ = 0 HypPlanePtr => ASet% HypPlanePtr(Angle) + hplane1 = HypPlanePtr% hplane1(domID) + hplane2 = HypPlanePtr% hplane2(domID) + ndoneZ = HypPlanePtr% ndone(domID) - HyperPlaneLoop: do hyperPlane=1,nHyperPlanes + HyperPlaneLoop: do hyperPlane=hplane1,hplane2 nzones = HypPlanePtr% zonesInPlane(hyperPlane) #ifdef TETON_ENABLE_OPENACC -!$acc loop vector collapse(2) & -!$acc& private(c0,cfp,ifp,cez,zone,zone0,nCorner,nCFaces) & -!$acc& private(aez,aez2,area_opp,source,sig,vol) & -!$acc& private(sigv,sigv2,sez,gnum,gden,psi_opp) & -!$acc& private(afp,denom) + !$acc loop vector collapse(2) & + !$acc& private(c0,cfp,ifp,cez,zone,zone0,nCorner,nCFaces) & + !$acc& private(aez,aez2,area_opp,source,sig,vol) & + !$acc& private(sigv,sigv2,sez,gnum,gden,psi_opp) & + !$acc& private(afp,denom) #else -!$omp parallel do collapse(2) default(none) & -!$omp& shared(Set, Geom, ASet, GSet, Angle, nzones, Groups) & -!$omp& shared(ndoneZ, tau) & -!$omp& private(c0,cfp,ifp,cez,zone,zone0,nCorner,nCFaces) & -!$omp& private(aez,aez2,area_opp,source,sig,vol) & -!$omp& private(sigv,sigv2,sez,gnum,gden,psi_opp) & -!$omp& private(afp,denom) + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, Geom, ASet, GSet, Swp, Angle, nzones, Groups) & + !$omp& shared(ndoneZ, tau) & + !$omp& private(c0,cfp,ifp,cez,zone,zone0,nCorner,nCFaces) & + !$omp& private(aez,aez2,area_opp,source,sig,vol) & + !$omp& private(sigv,sigv2,sez,gnum,gden,psi_opp) & + !$omp& private(afp,denom) #endif ZoneLoop: do ii=1,nzones @@ -388,16 +401,14 @@ subroutine SweepUCBxyz_GPU(nSets, sendIndex, savePsi) zone = iabs( zone0 ) nCorner = Geom% numCorner(zone) c0 = Geom% cOffSet(zone) - - psi_opp = zero sig = GSet% Sigt(g,zone) ! Contributions from volume terms do c=1,nCorner source = GSet% STotal(g,c0+c) + tau*Set% Psi(g,c0+c,Angle) - Set% Q(g,c,ii) = source - Set% S(g,c,ii) = Geom% Volume(c0+c)*source + Swp% Q(g,c,ii) = source + Swp% S(g,c,ii) = Geom% Volume(c0+c)*source enddo CornerLoop: do c=1,nCorner @@ -414,7 +425,7 @@ subroutine SweepUCBxyz_GPU(nSets, sendIndex, savePsi) cfp = Geom% cFP(cface,c0+c) if ( afp < zero ) then - Set% S(g,c,ii) = Set% S(g,c,ii) - afp*Set% Psi1(g,cfp) + Swp% S(g,c,ii) = Swp% S(g,c,ii) - afp*Set% Psi1(g,cfp) endif enddo @@ -428,6 +439,7 @@ subroutine SweepUCBxyz_GPU(nSets, sendIndex, savePsi) if (aez > zero ) then area_opp = zero + psi_opp = zero ifp = mod(cface,nCFaces) + 1 afp = ASet% AfpNorm(ifp,c0+c) @@ -464,18 +476,18 @@ subroutine SweepUCBxyz_GPU(nSets, sendIndex, savePsi) gden = vol*( four*sigv*sigv2 + aez*(six*sigv2 + & two*aez*(two*sigv + aez)) ) - sez = ( vol*gnum*( sig*psi_opp - Set% Q(g,c,ii) ) + & - half*aez*gden*( Set% Q(g,c,ii) - Set% Q(g,cez,ii) ) )/ & + sez = ( vol*gnum*( sig*psi_opp - Swp% Q(g,c,ii) ) + & + half*aez*gden*( Swp% Q(g,c,ii) - Swp% Q(g,cez,ii) ) )/ & ( gnum + gden*sig) - Set% S(g,c,ii) = Set% S(g,c,ii) + sez - Set% S(g,cez,ii) = Set% S(g,cez,ii) - sez + Swp% S(g,c,ii) = Swp% S(g,c,ii) + sez + Swp% S(g,cez,ii) = Swp% S(g,cez,ii) - sez else - sez = half*aez*( Set% Q(g,c,ii) - Set% Q(g,cez,ii) )/sig - Set% S(g,c,ii) = Set% S(g,c,ii) + sez - Set% S(g,cez,ii) = Set% S(g,cez,ii) - sez + sez = half*aez*( Swp% Q(g,c,ii) - Swp% Q(g,cez,ii) )/sig + Swp% S(g,c,ii) = Swp% S(g,c,ii) + sez + Swp% S(g,cez,ii) = Swp% S(g,cez,ii) - sez endif TestOppositeFace @@ -494,7 +506,7 @@ subroutine SweepUCBxyz_GPU(nSets, sendIndex, savePsi) ! Corner angular flux denom = ASet% ANormSum(c0+c) + sig*Geom% Volume(c0+c) - Set% Psi1(g,c0+c) = Set% S(g,c,ii)/denom + Set% Psi1(g,c0+c) = Swp% S(g,c,ii)/denom ! Calculate the contribution of this flux to the sources of ! downstream corners in this zone. The downstream corner index is @@ -507,7 +519,7 @@ subroutine SweepUCBxyz_GPU(nSets, sendIndex, savePsi) if (aez > zero) then cez = Geom% cEZ(cface,c0+c) - Set% S(g,cez,ii) = Set% S(g,cez,ii) + aez*Set% Psi1(g,c0+c) + Swp% S(g,cez,ii) = Swp% S(g,cez,ii) + aez*Set% Psi1(g,c0+c) endif enddo @@ -522,13 +534,13 @@ subroutine SweepUCBxyz_GPU(nSets, sendIndex, savePsi) if (aez > zero) then cez = Geom% cEZ(cface,c0+c) - Set% S(g,cez,ii) = Set% S(g,cez,ii) + aez*Set% Psi1(g,c0+c) + Swp% S(g,cez,ii) = Swp% S(g,cez,ii) + aez*Set% Psi1(g,c0+c) endif enddo enddo do c=1,nCorner - Set% Psi1(g,c0+c) = Set% S(g,c,ii)/(ASet% ANormSum(c0+c) + sig*Geom% Volume(c0+c)) + Set% Psi1(g,c0+c) = Swp% S(g,c,ii)/(ASet% ANormSum(c0+c) + sig*Geom% Volume(c0+c)) enddo endif @@ -537,104 +549,124 @@ subroutine SweepUCBxyz_GPU(nSets, sendIndex, savePsi) enddo ZoneLoop #ifndef TETON_ENABLE_OPENACC -!$omp end parallel do + !$omp end parallel do #endif ndoneZ = ndoneZ + nzones enddo HyperPlaneLoop + enddo DomainLoop enddo SetLoop #ifdef TETON_ENABLE_OPENACC -!$acc end parallel loop + !$acc end parallel loop #else -TOMP(end target teams distribute) + TOMP(end target teams distribute) #endif ! Update Boundary data #ifdef TETON_ENABLE_OPENACC -!$acc parallel loop gang & -!$acc& num_gangs(nSets) & -!$acc& vector_length(omp_device_team_thread_limit) & -!$acc& private(Set, ASet, BdyExitPtr, offSet, Angle, Groups, b, c) + !$acc parallel loop gang num_gangs(nSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, BdyExitPtr, HypPlanePtr, offSet, Angle, Groups, b, c) #else -TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) default(none) &) -TOMPC(shared(nSets, Quad, sendIndex)&) -TOMPC(private(Set, ASet, BdyExitPtr, offSet, Angle, Groups, b, c)) + TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) default(none) &) + TOMPC(shared(nSets, Quad, sendIndex)&) + TOMPC(private(Set, ASet, BdyExitPtr, HypPlanePtr, offSet, Angle, Groups, b, c)) #endif - SetLoop3: do setID=1,nSets + SetLoop3: do setID=1,nSets - Set => Quad% SetDataPtr(setID) - ASet => Quad% AngSetPtr(Set% angleSetID) - Groups = Set% Groups - Angle = Set% AngleOrder(sendIndex) - offSet = ASet% cycleOffSet(angle) - BdyExitPtr => ASet% BdyExitPtr(Angle) + Set => Quad% SetDataPtr(setID) + ASet => Quad% AngSetPtr(Set% angleSetID) + Groups = Set% Groups + Angle = Set% AngleOrder(sendIndex) + offSet = ASet% cycleOffSet(angle) + BdyExitPtr => ASet% BdyExitPtr(Angle) + HypPlanePtr => ASet% HypPlanePtr(angle) #ifdef TETON_ENABLE_OPENACC -!$acc loop vector collapse(2) & -!$acc& private(b,c) + !$acc loop vector collapse(2) & + !$acc& private(b,c) #else -!$omp parallel do collapse(2) default(none) & -!$omp& shared(Set, BdyExitPtr, Groups, Angle) private(b,c) + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, BdyExitPtr, Groups, Angle) private(b,c) #endif - do i=1,BdyExitPtr% nxBdy - do g=1,Groups - b = BdyExitPtr% bdyList(1,i) - c = BdyExitPtr% bdyList(2,i) + do i=1,BdyExitPtr% nxBdy + do g=1,Groups + b = BdyExitPtr% bdyList(1,i) + c = BdyExitPtr% bdyList(2,i) - Set% PsiB(g,b,Angle) = Set% Psi1(g,c) - enddo + Set% PsiB(g,b,Angle) = Set% Psi1(g,c) enddo + enddo #ifndef TETON_ENABLE_OPENACC -!$omp end parallel do + !$omp end parallel do #endif -! Update Psi in the cycle list +! Update Interface Elements #ifdef TETON_ENABLE_OPENACC -!$acc loop vector collapse(2) & -!$acc& private(c) + !$acc loop vector collapse(2) & + !$acc& private(c) #else -!$omp parallel do collapse(2) default(none) & -!$omp& shared(Angle, Set, ASet, offSet, Groups) private(c) + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, HypPlanePtr, Groups, angle) private(c) #endif - do mCycle=1,ASet% numCycles(angle) + + do i=1,HypPlanePtr% interfaceLen do g=1,Groups - c = ASet% cycleList(offSet+mCycle) - Set% cyclePsi(g,offSet+mCycle) = Set% Psi1(g,c) + c = HypPlanePtr% interfaceList(i) + Set% PsiInt(g,i,angle) = Set% Psi1(g,c) enddo enddo + #ifndef TETON_ENABLE_OPENACC -!$omp end parallel do + !$omp end parallel do #endif - enddo SetLoop3 +! Update Psi in the cycle list #ifdef TETON_ENABLE_OPENACC -!$acc end parallel loop + !$acc loop vector collapse(2) & + !$acc& private(c) #else -TOMP(end target teams distribute) + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Angle, Set, ASet, offSet, Groups) private(c) #endif + do mCycle=1,ASet% numCycles(angle) + do g=1,Groups + c = ASet% cycleList(offSet+mCycle) + Set% cyclePsi(g,offSet+mCycle) = Set% Psi1(g,c) + enddo + enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + + enddo SetLoop3 + +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif + ! We only store Psi if this is the last transport sweep in the time step if ( savePsi ) then #ifdef TETON_ENABLE_OPENACC -!$acc parallel loop gang & -!$acc& num_gangs(nSets) & -!$acc& vector_length(omp_device_team_thread_limit) & -!$acc& private(Set, Angle, Groups) + !$acc parallel loop gang num_gangs(nSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, Angle, Groups) #else -TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) default(none)&) -TOMPC(shared(nSets, Quad, sendIndex)&) -TOMP(private(Set, Angle, Groups)) + TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) default(none)&) + TOMPC(shared(nSets, Quad, sendIndex)&) + TOMP(private(Set, Angle, Groups)) #endif SetLoop2: do setID=1,nSets @@ -644,12 +676,11 @@ subroutine SweepUCBxyz_GPU(nSets, sendIndex, savePsi) Angle = Set% AngleOrder(sendIndex) #ifdef TETON_ENABLE_OPENACC -!$acc loop vector collapse(2) + !$acc loop vector collapse(2) #else -!$omp parallel do collapse(2) default(none) & -!$omp& shared(Set, ASet, Angle, Groups) + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, ASet, Angle, Groups) #endif - CornerLoop2: do c=1,Set% nCorner GroupLoop2: do g=1,Groups @@ -657,22 +688,21 @@ subroutine SweepUCBxyz_GPU(nSets, sendIndex, savePsi) enddo GroupLoop2 enddo CornerLoop2 - #ifndef TETON_ENABLE_OPENACC -!$omp end parallel do + !$omp end parallel do #endif enddo SetLoop2 #ifdef TETON_ENABLE_OPENACC -!$acc end parallel loop + !$acc end parallel loop #else -TOMP(end target teams distribute) + TOMP(end target teams distribute) #endif endif -TOMP(target exit data map(release: tau, sendIndex, angleList)) + TOMP_MAP(target exit data map(always,release: tau, sendIndex, angleList)) deallocate( angleList ) diff --git a/src/teton/gpu/UpdateScalarIntensity_OMPOL.F90 b/src/teton/gpu/UpdateScalarIntensity_OMPOL.F90 index 74c176b..291046b 100644 --- a/src/teton/gpu/UpdateScalarIntensity_OMPOL.F90 +++ b/src/teton/gpu/UpdateScalarIntensity_OMPOL.F90 @@ -10,7 +10,7 @@ ! * ! * !*********************************************************************** - subroutine ScalarIntensityDecompose_GPU(P) + subroutine ScalarIntensityDecompose_GPU use cmake_defines_mod, only : omp_device_team_thread_limit use kind_mod @@ -25,8 +25,6 @@ subroutine ScalarIntensityDecompose_GPU(P) ! Arguments - real(adqt), intent(inout) :: P(Size% ncornr) - ! Local integer :: zSetID @@ -49,16 +47,27 @@ subroutine ScalarIntensityDecompose_GPU(P) wtiso = Size% wtiso - TOMP(target enter data map(to: wtiso)) + TOMP_MAP(target enter data map(to: wtiso)) + +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(cc,c0,nCorner,diagInv,t,v) +#else TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none)&) - TOMPC(shared(nZoneSets, P, Geom, GTA, wtiso)&) + TOMPC(shared(nZoneSets, Geom, GTA, wtiso)&) TOMPC(private(cc,c0,nCorner,diagInv,t,v)) +#endif ZoneSetLoop: do zSetID=1,nZoneSets -!$omp parallel do default(none) & -!$omp& shared(P, Geom, GTA, wtiso, zSetID) & -!$omp& private(cc,c0,nCorner,diagInv,t,v) +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector & + !$acc& private(cc,c0,nCorner,diagInv,t,v) +#else + !$omp parallel do default(none) & + !$omp& shared(Geom, GTA, wtiso, zSetID) & + !$omp& private(cc,c0,nCorner,diagInv,t,v) +#endif ZoneLoop: do zone=Geom% zone1(zSetID),Geom% zone2(zSetID) @@ -68,9 +77,9 @@ subroutine ScalarIntensityDecompose_GPU(P) c0 = Geom% cOffSet(zone) do c=1,nCorner - P(c0+c) = GTA% PhiInc(c0+c) + GTA% Sscat(c0+c) = zero do cc=1,nCorner - P(c0+c) = P(c0+c) + GTA% TT(cc,c0+c)* & + GTA% Sscat(c0+c) = GTA% Sscat(c0+c) + GTA% TT(cc,c0+c)* & wtiso*GTA%GreySource(c0+cc) GTA% TT(cc,c0+c) = -wtiso*GTA%GreySigScat(c0+cc)* & GTA% TT(cc,c0+c) @@ -108,38 +117,20 @@ subroutine ScalarIntensityDecompose_GPU(P) enddo -! Solve Ly = S - - do k=2,nCorner - t = zero - do i=1,k-1 - t = t - GTA% TT(i,c0+k)*P(c0+i) - enddo - P(c0+k) = P(c0+k) + t - enddo - -! Solve Ux = y - - P(c0+nCorner) = P(c0+nCorner)/GTA% TT(nCorner,c0+nCorner) - - do k=nCorner-1,1,-1 - t = zero - - do i=k+1,nCorner - t = t + P(c0+i)*GTA% TT(i,c0+k) - enddo - - P(c0+k) = (P(c0+k) - t)/GTA% TT(k,c0+k) - enddo - enddo ZoneLoop - +#ifndef TETON_ENABLE_OPENACC !$omp end parallel do +#endif enddo ZoneSetLoop +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else TOMP(end target teams distribute) - TOMP(target exit data map(release: wtiso)) +#endif + + TOMP_MAP(target exit data map(release: wtiso)) return end subroutine ScalarIntensityDecompose_GPU @@ -179,15 +170,25 @@ subroutine ScalarIntensitySolve_GPU(P) nZoneSets = getNumberOfZoneSets(Quad) +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(c0,nCorner,t) +#else TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) TOMPC(shared(nZoneSets, P, Geom, GTA)&) TOMPC(private(c0,nCorner,t)) +#endif ZoneSetLoop: do zSetID=1,nZoneSets -!$omp parallel do default(none) & -!$omp& shared(P, Geom, GTA, zSetID) & -!$omp& private(c0,nCorner,t) +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector & + !$acc& private(c0,nCorner,t) +#else + !$omp parallel do default(none) & + !$omp& shared(P, Geom, GTA, zSetID) & + !$omp& private(c0,nCorner,t) +#endif ZoneLoop: do zone=Geom% zone1(zSetID),Geom% zone2(zSetID) @@ -226,11 +227,17 @@ subroutine ScalarIntensitySolve_GPU(P) enddo ZoneLoop +#ifndef TETON_ENABLE_OPENACC !$omp end parallel do +#endif enddo ZoneSetLoop +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else TOMP(end target teams distribute) +#endif return diff --git a/src/teton/gpu/finalizeGPUMemory.F90 b/src/teton/gpu/finalizeGPUMemory.F90 index ca21972..de27a23 100644 --- a/src/teton/gpu/finalizeGPUMemory.F90 +++ b/src/teton/gpu/finalizeGPUMemory.F90 @@ -15,6 +15,7 @@ subroutine finalizeGPUMemory(setID) use Size_mod use SetData_mod use OMPWrappers_mod + use Options_mod use MemoryAllocator_mod implicit none @@ -22,38 +23,57 @@ subroutine finalizeGPUMemory(setID) ! Arguments integer, intent(in) :: setID -! Locals - integer :: err_code +! Local + integer :: dom + integer :: nHyperDomains + integer :: sweepVersion + + nHyperDomains = getNumberOfHyperDomains(Quad,1) + sweepVersion = Options% getSweepVersion() ! Update Psi on the host before releasing it's memory on the device - TOMP(target update from(Quad% SetDataPtr(setID)% Psi) ) + TOMP_UPDATE(target update from(Quad% SetDataPtr(setID)% Psi) ) ! Delete the arrays if (Size% ndim == 2) then UMPIRE_DEVICE_POOL_FREE(Quad% SetDataPtr(setID)% PsiM) - TOMP(target exit data map(always,release:Quad% SetDataPtr(setID)% PsiM)) + TOMP_MAP(target exit data map(always,release:Quad% SetDataPtr(setID)% PsiM)) endif UMPIRE_DEVICE_POOL_FREE(Quad% SetDataPtr(setID)% Psi) - TOMP(target exit data map(always,release:Quad% SetDataPtr(setID)% Psi)) + TOMP_MAP(target exit data map(always,release:Quad% SetDataPtr(setID)% Psi)) UMPIRE_DEVICE_POOL_FREE(Quad% SetDataPtr(setID)% Psi1) - TOMP(target exit data map(always,release:Quad% SetDataPtr(setID)% Psi1)) + TOMP_MAP(target exit data map(always,release:Quad% SetDataPtr(setID)% Psi1)) UMPIRE_DEVICE_POOL_FREE(Quad% SetDataPtr(setID)% PsiB) - TOMP(target exit data map(always,release:Quad% SetDataPtr(setID)% PsiB)) + TOMP_MAP(target exit data map(always,release:Quad% SetDataPtr(setID)% PsiB)) - UMPIRE_DEVICE_POOL_FREE(Quad% SetDataPtr(setID)% Q) - TOMP(target exit data map(always,release:Quad% SetDataPtr(setID)% Q)) + UMPIRE_DEVICE_POOL_FREE(Quad% SetDataPtr(setID)% cyclePsi) + TOMP_MAP(target exit data map(always,release:Quad% SetDataPtr(setID)% cyclePsi)) - UMPIRE_DEVICE_POOL_FREE(Quad% SetDataPtr(setID)% S) - TOMP(target exit data map(always,release:Quad% SetDataPtr(setID)% S)) + UMPIRE_DEVICE_POOL_FREE(Quad% SetDataPtr(setID)% PsiInt) + TOMP_MAP(target exit data map(always,release:Quad% SetDataPtr(setID)% PsiInt)) + + if ( sweepVersion == 0 ) then + + do dom=1,nHyperDomains + UMPIRE_DEVICE_POOL_FREE(Quad% SetDataPtr(setID)% SweepPtr(dom)% Q) + TOMP(target exit data map(always,release:Quad% SetDataPtr(setID)% SweepPtr(dom)% Q)) + + UMPIRE_DEVICE_POOL_FREE(Quad% SetDataPtr(setID)% SweepPtr(dom)% S) + TOMP(target exit data map(always,release:Quad% SetDataPtr(setID)% SweepPtr(dom)% S)) + enddo + + endif + +! Both Cray and XLF throw an error on the following line PFN 02/07/2024 +! UMPIRE_DEVICE_POOL_FREE(Quad% SetDataPtr(setID)% SweepPtr) + TOMP_MAP(target exit data map(always,release:Quad% SetDataPtr(setID)% SweepPtr)) - UMPIRE_DEVICE_POOL_FREE(Quad% SetDataPtr(setID)% cyclePsi) - TOMP(target exit data map(always,release:Quad% SetDataPtr(setID)% cyclePsi)) return end subroutine finalizeGPUMemory diff --git a/src/teton/gpu/initializeGPUMemory.F90 b/src/teton/gpu/initializeGPUMemory.F90 index da985d2..00eafdd 100644 --- a/src/teton/gpu/initializeGPUMemory.F90 +++ b/src/teton/gpu/initializeGPUMemory.F90 @@ -15,45 +15,62 @@ subroutine initializeGPUMemory use Size_mod use SetData_mod use OMPWrappers_mod + use Options_mod use MemoryAllocator_mod implicit none ! Local - integer :: setID - integer :: nSets - integer :: err_code + integer :: setID + integer :: dom + integer :: nSets + integer :: nHyperDomains + integer :: sweepVersion ! Constants - nSets = getNumberOfSets(Quad) + nSets = getNumberOfSets(Quad) + nHyperDomains = getNumberOfHyperDomains(Quad,1) + sweepVersion = Options% getSweepVersion() do setID=1,nSets if (Size% ndim == 2) then UMPIRE_DEVICE_POOL_ALLOC(Quad% SetDataPtr(setID) % PsiM) - TOMP(target enter data map(always,to:Quad% SetDataPtr(setID) % PsiM)) - + TOMP_MAP(target enter data map(always,to:Quad% SetDataPtr(setID) % PsiM)) endif UMPIRE_DEVICE_POOL_ALLOC(Quad% SetDataPtr(setID)% Psi) - TOMP(target enter data map(always,to:Quad% SetDataPtr(setID)% Psi)) + TOMP_MAP(target enter data map(always,to:Quad% SetDataPtr(setID)% Psi)) UMPIRE_DEVICE_POOL_ALLOC(Quad% SetDataPtr(setID)% Psi1) - TOMP(target enter data map(always,to:Quad% SetDataPtr(setID)% Psi1)) + TOMP_MAP(target enter data map(always,to:Quad% SetDataPtr(setID)% Psi1)) UMPIRE_DEVICE_POOL_ALLOC(Quad% SetDataPtr(setID)% PsiB) - TOMP(target enter data map(always,to:Quad% SetDataPtr(setID)% PsiB)) + TOMP_MAP(target enter data map(always,to:Quad% SetDataPtr(setID)% PsiB)) + + UMPIRE_DEVICE_POOL_ALLOC(Quad% SetDataPtr(setID)% cyclePsi) + TOMP_MAP(target enter data map(always,to:Quad% SetDataPtr(setID)% cyclePsi)) - UMPIRE_DEVICE_POOL_ALLOC(Quad% SetDataPtr(setID)% Q) - TOMP(target enter data map(always,to:Quad% SetDataPtr(setID)% Q)) + UMPIRE_DEVICE_POOL_ALLOC(Quad% SetDataPtr(setID)% PsiInt) + TOMP_MAP(target enter data map(always,to:Quad% SetDataPtr(setID)% PsiInt)) - UMPIRE_DEVICE_POOL_ALLOC(Quad% SetDataPtr(setID)% S) - TOMP(target enter data map(always,to:Quad% SetDataPtr(setID)% S)) +! Both Cray and XLF throw an error on the following line PFN 02/07/2024 +! UMPIRE_DEVICE_POOL_ALLOC(Quad% SetDataPtr(setID)% SweepPtr) + TOMP_MAP(target enter data map(always,to:Quad% SetDataPtr(setID)% SweepPtr)) - UMPIRE_DEVICE_POOL_ALLOC(Quad% SetDataPtr(setID)% cyclePsi) - TOMP(target enter data map(always,to:Quad% SetDataPtr(setID)% cyclePsi)) + if ( sweepVersion == 0 ) then + + do dom=1,nHyperDomains + UMPIRE_DEVICE_POOL_ALLOC(Quad% SetDataPtr(setID)% SweepPtr(dom)% Q) + TOMP(target enter data map(always,to:Quad% SetDataPtr(setID)% SweepPtr(dom)% Q)) + + UMPIRE_DEVICE_POOL_ALLOC(Quad% SetDataPtr(setID)% SweepPtr(dom)% S) + TOMP(target enter data map(always,to:Quad% SetDataPtr(setID)% SweepPtr(dom)% S)) + enddo + + endif enddo diff --git a/src/teton/include/TetonBlueprint.hh b/src/teton/include/TetonBlueprint.hh index 1ca247c..c27001c 100644 --- a/src/teton/include/TetonBlueprint.hh +++ b/src/teton/include/TetonBlueprint.hh @@ -55,6 +55,15 @@ class TetonBlueprint ------------------------------------------------------------------------------------------------------------------ */ void ProcessSurfaceEdits(int rank); + /* ------------------------------------------------------------------------------------------------------------------ + Gets local zone face IDs, and global corner IDs for a single surface + ------------------------------------------------------------------------------------------------------------------ */ + void GetSurfaceEditZoneFacesAndCorners(int rank, + const conduit::Node &surf_face_topo, + const std::map, int> &verts_face_map, + std::vector &surf_edits_loczonefaces, + std::vector &surf_edits_corners) const; + /* ------------------------------------------------------------------------------------------------------------------ This builds several arrays containing the boundary condition information for each face. The non-shared boundary info is pulled from the face_attribute field generated by @@ -153,10 +162,6 @@ class TetonBlueprint //TODO: Investigate if some of these can be made local variables and passed between methods so they are freed when // the method exits. - conduit::int32 *m_zone_to_faces; - conduit::int32 *m_zone_to_corners; - conduit::int32 *m_face_to_zones; - std::map, int> zoneface_to_halfface; std::map, int> zoneface_to_lface; diff --git a/src/teton/include/TetonConduitInterface.hh b/src/teton/include/TetonConduitInterface.hh index 754ac4e..e6f6dc4 100644 --- a/src/teton/include/TetonConduitInterface.hh +++ b/src/teton/include/TetonConduitInterface.hh @@ -15,6 +15,7 @@ #include "TetonSources.hh" #include "conduit/conduit.hpp" +#include #include #include @@ -22,17 +23,367 @@ namespace Teton { class Teton { + // --------------------------------------------------------------------------- + // Some internal constants. + // --------------------------------------------------------------------------- + static const std::string PREFIX; + static const std::string MCARRAY_PREFIX; + static const std::string PARTITION_FIELD; + static const std::string PARTITION_FIELD_BOUNDARY; + public: - Teton() : areSourceProfilesSet(false), mIsInitialized(false) + Teton(); + + ~Teton(); + + conduit::Node &getMeshBlueprint() + { + return getDatastore()["blueprint"]; + } + const conduit::Node &getMeshBlueprint() const { + return getDatastore()["blueprint"]; } - // Teton destructor. Important note: This function requires MPI to still be - // active ( not finalized ), as it uses MPI when tearing down Teton. - ~Teton(); + conduit::Node &getDatastore(); + const conduit::Node &getDatastore() const; + + conduit::Node &getOptions() + { + return getDatastore()["options"]; + } + const conduit::Node &getOptions() const + { + return getDatastore()["options"]; + } + + /*! + * \brief Returns a node that contains the partitioned mesh. If partitioning + * is not being done then the blueprint node is returned. + * + * \return A Conduit node that contains the partitioned mesh, or the blueprint + * mesh if no partitioning is being done. + */ + conduit::Node &getMeshBlueprintPart(); + const conduit::Node &getMeshBlueprintPart() const; + + /*! + * \brief Get whether verbose output is selected. + * + * \return An integer that indicates the verbosity level. + * + * \note The value is retrieved from the options, if the value exists or from + * the environment. The environment can override the options. + */ + int getVerbose() const; void initialize(MPI_Comm communicator, bool fromRestart = false); + // Advance a radiation step, returns dt recommended by Teton for the next time step + double step(int cycle); + + // Requires the mesh blueprint node to have 'state/cycle' populated. + void dump(MPI_Comm communicator, std::string path = "."); + + void setTimeStep(int cycle, double dtrad, double timerad); + + /*! + * \brief Update the mesh positions for Teton, possibly repartitioning. + */ + void updateMeshPositions(); + + //------------------------------------------------------------------------ + // Result-getting functions + //------------------------------------------------------------------------ + + /*! + \brief Get the radiation temperature for a zone. + + \param zone A 1-origin zone id valid for the blueprint mesh. + + \note This value is obtained from the blueprint mesh's radiation_energy_density + field, which must exist in order to return a valid value. + + \return The radiation temperature. + */ + double getRadiationTemperature(int zone) const; + + /*! + \brief Get the radiation deposited for a zone. + + \param zone A 1-origin zone id valid for the blueprint mesh. + + \note This value is obtained from the blueprint mesh's electron_energy_deposited + field, which must exist in order to return a valid value. + + \return The radiation deposited. + */ + double getRadiationDeposited(int zone) const; + + /*! + \brief Get the material temperature for a zone. + + \param zone A 1-origin zone id valid for the blueprint mesh. + + \return The material temperature. + */ + double getMaterialTemperature(int zone) const; + + // TODO: remove these once all host codes swich to getting the + // force density fields from the conduit node + void getRadiationForceDensity1D(double *RadiationForceDensityX); + + /*! + \brief This method copies the radiation_force_{x,y,x} or {z,r} fields into + the supplied data arrays. The arrays should be sized the same as + the radiation_force_z field. + + \param[out] RadiationForceDensityX The array to hold x (or z data in 2D). + \param[out] RadiationForceDensityY The array to hold y (or r data in 2D). + \param[out] RadiationForceDensityZ The array to hold z data. + + \note This method simply returns the radiation force components in the + supplied output arrays. This same thing can be achieved using the + radiation_force_* fields from blueprint directly. + + \note remove once host codes swich to getting the data from Conduit node. + */ + void getRadiationForceDensity(double *RadiationForceDensityX, + double *RadiationForceDensityY, + double *RadiationForceDensityZ); + + /*! + \brief This is used for the post-ALE step of rescaling psi + based on the remapped radiation energy density. + Here the array rad_energy_density needs to be sized + to (ngroups * nzones) before being passed. + + \note The underlying teton_reconstructpsi function does bookkeeping + that involves sets and volume of the geometry it knows about. + It also sets Mat%trz for each zone. This says that this function + should be called on the partitioned mesh. + + However, the host code will be passing a rad_energy_density that + is valid for the blueprint mesh and not the partitiond mesh. This + suggests another partition step internally to send the field to + the partition mesh. + + The returned rad_energy is valid for the partitioned mesh. That + computation can't be done on the blueprint mesh without knowing + the zone volume. It might not matter though since the host code + appears to reduce(sum) over all ranks. + + \param[out] rad_energy Total rad energy. + \param[in] rad_energy_density Array sized double[ngroups][nzones] that contains + the radiation energy density (nzones varies fastest). + */ + void reconstructPsi(double *rad_energy, const double *rad_energy_density); + + /*! + \brief This is used to update the angular intensity to be consistent with + changes in the corner volumes of the mesh from the Lagrange motion. + That is, psi is rescale so that the total radiation energy in the + zone remains constant. + */ + void reconstructPsiFromdV(); + + /*! + \brief Copies the zonal psi values into the \a psi array. The psi array + should be sized double[nAngles][ngroups][nzones] (nzones varying fastest). + If the Teton operated on a partitiond mesh, this involves mapping the + partitioned psi back onto the blueprint mesh. + + \param[in] numAngles The number of angles. This must match what Teton was given previously. + \param[out] psi The zonal psi array that will contain the values. + + \note Ideally we would not pass numAngles as Teton should already know it. + */ + void getZonalPsi(int numAngles, double *psi); + + /*! + * \brief Tell Teton to compute the radiation flux and make it available to retrieve + * via getRadiationFlux. + * + * \note Makes a bunch of Conduit fields to store the different groups of the radiation + * flux and migrates these fields back to the original mesh where they are left + * as separate scalar fields, rather than being reassembled. + */ + void setRadiationFlux(); + + /*! + * \brief Reads values from Teton into the supplied array. The values are obtained + * from teton_getradiationflux. + * + * \param zone A 1-origin zone index. + * \param[out] zflux Array sized double[ngroups][ndims] + * + * \note The Conduit scalars that were produced in setRadiationFlux are sampled for + * the specified zone and data are stored into zflux. + */ + void getRadiationFlux(int zone, double *zflux) const; + + /*! + * \brief Get information about the Teton run. + * + * \param[out] noutrt, number of thermal [outer] iterations in this cycle + * \param[out] ninrt number of inner [transport] iterations in this cycle + * \param[out] ngdart number of grey diffusion iterations + * \param[out] nNLIters number of nonlinear iterations + * \param[out] maxNLIters maximum nonlinear iterations + * \param[out] TrMaxZone zone id with maximum T rad + * \param[out] TeMaxZone zone id with maximum electron temperature + * \param[out] TrMaxProcess MPI process with maximum T rad + * \param[out] TeMaxProcess MPI process with maximum electron temperature + * \param[out] dtused Teton returns the time step used in cycle just completed + * \param[out] dtrad radiation vote for next time step + * \param[out] TrMax T rad maximum value + * \param[out] TeMax T electron maximum value + * \param[out] EnergyRadiation energy contained in the radiation field + * \param[out] PowerIncident power of photons incident + * \param[out] PowerEscape power of photons escaping + * \param[out] PowerAbsorbed power of energy absorbed + * \param[out] PowerEmitted power of photons emitted + * \param[out] PowerExtSources power of photons from fixed volumetric sources + * \param[out] PowerCompton power of Compton scattering photons + * \param[out] EnergyCheck energy not accounted for this cycle. + */ + void getEdits(int &noutrt, + int &ninrt, + int &ngdart, + int &nNLIters, + int &maxNLIters, + int &TrMaxZone, + int &TeMaxZone, + int &TrMaxProcess, + int &TeMaxProcess, + double &dtused, + double &dtrad, + double &TrMax, + double &TeMax, + double &EnergyRadiation, + double &PowerIncident, + double &PowerEscape, + double &PowerAbsorbed, + double &PowerEmitted, + double &PowerExtSources, + double &PowerCompton, + double &EnergyCheck) const; + + /*! + * \brief Get iteration/dtcontrol values, taking into account possible partitioning. + * + * \param[out] flag A reason indicating a change in time step. + * \param[out] process The MPI rank of the process that is causing time step change. + * \param[out] zone The 1-origin zone number that is causing time step change. + * \param[out] message An informative message for the time step change. + */ + void getDtControls(int &flag, int &process, int &zone, std::string &message) const; + + void dumpTallyToJson() const; + + void setSourceProfiles(); + + // Do not call this unless setSourceProfiles has been set: + void resetSourceProfiles(); + + // NOTE: These functions return information for the partitioned corner mesh. + // Codes that would be calling these expect to deal with the blueprint + // mesh. Can we eliminate the reason for why a code would need to call + // these? These methods are not compatible with partitioning. + int *getCornerToVertexArray() + { + return &mCornerToVertex[0]; + } + int *getZoneToNCornersArray() + { + return &mZoneToNCorners[0]; + } + int *getZoneToCornersArray() + { + return &mZoneToCorners[0]; + } + + // --------------------------------------------------------------------------- + // Functions pertaining to checkpoints/restarts + // --------------------------------------------------------------------------- + conduit::Node &getCheckpoint(); + void checkpointPrepareForLoad(); + void checkpointPrepareForSave(); + void checkpointDataLoaded(); + void checkpointExternalDataLoaded(); + void checkpointFinished(); + + // --------------------------------------------------------------------------- + // Some relevant field and topo names. + // --------------------------------------------------------------------------- + static const std::string FIELD_ELECTRON_ENERGY_DEPOSITED; + static const std::string FIELD_RADIATION_ENERGY_DENSITY; + static const std::string FIELD_RADIATION_TEMPERATURE; + static const std::string FIELD_RADIATION_FORCE_X; + static const std::string FIELD_RADIATION_FORCE_Y; + static const std::string FIELD_RADIATION_FORCE_Z; + static const std::string FIELD_RADIATION_FORCE_R; + static const std::string FIELD_CORNER_VOLUME_SUMS; + static const std::string FIELD_RADIATION_FLUX_X; + static const std::string FIELD_RADIATION_FLUX_Y; + static const std::string FIELD_RADIATION_FLUX_Z; + static const std::string FIELD_RADIATION_FLUX_R; + static const std::string FIELD_MATERIAL_TEMPERATURE; + static const std::string TOPO_MAIN; + static const std::string TOPO_BOUNDARY; + + private: + // --------------------------------------------------------------------------- + // Internal helpers + // --------------------------------------------------------------------------- + + /*! + * \brief Get the main topology. + + * \param root The node through which we'll get the main topology. + + * \return A reference to the main topology. + */ + conduit::Node &getMainTopology(conduit::Node &root); + const conduit::Node &getMainTopology(const conduit::Node &root) const; + + /*! + \brief Creates a new zonal field it does not exist. + + \param root The root node into which we'll create the field. + \param topoName The name of the topology. + \param fieldName The name of the field to make. + \param nzones The number of zones. + */ + void createZonalField(conduit::Node &root, const std::string &topoName, const std::string &fieldName, int nzones); + + /*! + \brief Reads values from Teton into the supplied array. The values are obtained + from teton_getradiationdeposited. + + \param[out] RadEnergyDeposited The destination array that holds nzones values. + \param nzones The number of zones. + */ + void getRadEnergyDeposited(double *RadEnergyDeposited, int nzones) const; + + /*! + \brief Reads values from Teton into the supplied array. The values are obtained + from teton_getradiationtemperature. + + \param[out] RadTemp The destination array that holds nzones values. + \param nzones The number of zones. + */ + void getRadiationTemperature(double *RadTemp, int nzones) const; + + /*! + \brief Reads values from Teton into the supplied array. The values are obtained + from teton_getmaterialtemperature. + + \param[out] MatTemp The destination array that holds nzones values. + \param nzones The number of zones. + */ + void getMaterialTemperature(double *MatTemp, int nzones) const; + // This stores the needed mesh data needed to compute // forces on the vertices void storeMeshData(); @@ -53,16 +404,17 @@ class Teton void constructEdits(); + /*! + * \brief Constructs surface flux tallies. Results are output into fields that + * are present on the tally surfaces in the blueprint node (even when + * partitioning is enabled). + */ void computeGenericSurfaceFluxTally(); - void dumpTallyToJson() const; void constructSize(); void constructMemoryAllocator(); - // Advance a radiation step, returns dt recommended by Teton for the next time step - double step(int cycle); - void constructQuadrature(); // set Teton node positions @@ -74,10 +426,6 @@ class Teton // communication void setCommunication(); - void setSourceProfiles(); - // Do not call this unless setSourceProfiles has been set: - void resetSourceProfiles(); - // set up the mesh connectivity arrays used in Teton. void setMeshConnectivity(); @@ -93,97 +441,268 @@ class Teton // a member function to update opacity void updateOpacity(); + void SumSharedNodalValues(conduit::Node &root, double *nodal_field); + // updates the radiation force if the fields // "radiation_force_r" (dim == 2) or "radiation_force_x" // (dim == 3) fields are present in the conduit blueprint node void updateRadiationForce(); void updateZonalRadiationForce(); - // Requires the mesh blueprint node to have 'state/cycle' populated. - void dump(MPI_Comm communicator, std::string path = "."); - double *getCornerTemperature(); - double getMaterialTemperature(int zone); - - double getRadiationTemperature(int zone); - - double getRadiationDeposited(int zone); - - void setTimeStep(int cycle, double dtrad, double timerad); - // This updates Teton's zone vertex coordinates based on - // changes to the mesh nodes (from hydro) - void updateMeshPositions(); - - // TODO: remove these once all host codes swich to getting the - // force density fields from the conduit node - void getRadiationForceDensity1D(double *RadiationForceDensityX); - void getRadiationForceDensity(double *RadiationForceDensityX, - double *RadiationForceDensityY, - double *RadiationForceDensityZ); - - // Updates the field "fields/rad_energy_deposited/values" in the - // blueprint conduit node if it exists - void getRadEnergyDeposited(double *RadEnergyDeposited); void updateRadEnergyDeposited(); - int *getCornerToVertexArray() - { - return &mCornerToVertex[0]; - } - int *getZoneToNCornersArray() - { - return &mZoneToNCorners[0]; - } - int *getZoneToCornersArray() - { - return &mZoneToCorners[0]; - } - // This is used for the post-ALE step of rescaling psi - // based on the remapped radiation energy density. - // Here the array rad_energy_density needs to be sized - // to (ngrousps * nzones) before being passed - void reconstructPsi(double *rad_energy, double *rad_energy_density); - // This is used to update the angular intensity - // to be consistent with changes in the corner volumes - // of the mesh from the Lagrange motion. That is, - // psi is rescale so that the total radiation energy in - // the zone remains constant. - void reconstructPsiFromdV(); - - conduit::Node &getMeshBlueprint() - { - return getDatastore()["blueprint"]; - } - conduit::Node &getDatastore(); - conduit::Node &getOptions() - { - return getDatastore()["options"]; - } - // Const version of above functions: - const conduit::Node &getMeshBlueprint() const - { - return getDatastore()["blueprint"]; - } - const conduit::Node &getDatastore() const; - const conduit::Node &getOptions() const - { - return getDatastore()["options"]; - } - // --------------------------------------------------------------------------- - // Functions pertaining to checkpoints/restarts + // Internal functions pertaining to partitioning // --------------------------------------------------------------------------- - conduit::Node &getCheckpoint(); - void checkpointPrepareForLoad(); - void checkpointPrepareForSave(); - void checkpointDataLoaded(); - void checkpointExternalDataLoaded(); - void checkpointFinished(); - double mDTrad; + /*! + * \brief Consult the options and return whether partitioning is being done. + * + * \return True of partitioning is enabled; False otherwise. + */ + bool doPartitioning() const; + + /*! + * \brief Creates a mapping between boundary elements and main topology elements + * and then uses the partition field on the main topology to generate new + * partition fields for the supplied topology names. + * + * \param mesh The input mesh. + * \param topoNames A vector of topology names. + * + * \return A vector containing new partition field names. + */ + std::vector createPartitionFields(conduit::Node &mesh, const std::vector &topoNames); + + /*! + * \brief Take a partitioned secondary mesh and combine it into the partitioned + * mesh node, rewriting its connectivity so it shares coordinates with + * the partmesh instead of having its own coordset. + * + * \param partmesh The node that contains the partitioned volume mesh. + * \param topoName The name of the volume topology. + * \param secondPartmesh The partitioned secondary mesh. + * \param secondTopoName The name of the secondary topology. + * + * \note We assume that the number of domains in partmesh and bpartmesh is + * the same and the domains ids are in the same order. + */ + void assimilateTopology(conduit::Node &partmesh, + const std::string &topoName, + conduit::Node &secondPartmesh, + const std::string &secondTopoName); + + /*! + * \brief Returns whether the field (assumed to be size ngroup*nzones) needs + * interleaving or not to be represented as a Blueprint mcarray. An interleaved + * field would be sized array[nzones][ngroup] whereas an array sized + * array[ngroup][nzones] would not need interleaving. + * + * \param fieldName The name of the field we're checking. + * + * \return True if the field needs interleaving, false otherwise. + */ + bool doInterleave(const std::string &fieldName) const; + + /*! + * \brief Scan through fields in the source mesh and identify those + * which have too many elements (multiple of numGroups). Create + * alternate mcarray zero-copy representations for those fields so + * they can pass through the partitioner. + * + * \param root The root, which is usually the blueprint mesh. When mapping back, + * we can pass the part mesh. + */ + void add_mcarray_fields(conduit::Node &root); + + /*! + * \brief Remove any mcarray fields from the root node. + * + * \param root The root, which is usually the blueprint mesh. When mapping back, + * we can pass the part mesh. + */ + void remove_mcarray_fields(conduit::Node &root); + + /*! + * \brief Fetch a conduit mcarray field node given the original name of the field. + * If \a fieldName is not an mcarray created by the Teton interface, return + * the regular field node. If the node is not located, a Conduit exception + * is thrown. + * + * \param root A top-level node like the blueprint or partition node that contains + * a "fields" node. + * \param fieldName The name of the field that may have been converted to mcarray. + + * \return The field node that contains the mcarray data. + */ + const conduit::Node &fetch_mcarray(const conduit::Node &root, const std::string &fieldName) const; + + /*! + * \brief Takes the mesh from the blueprint node and partitions it and stores + * the partitiond mesh in the blueprint_partitioned node. + * + * In our scheme, the host code will use the getMeshBlueprint() method to store + * the mesh on which Teton will operate. That node will optionally go through + * the partition() method to repartition the mesh. Various methods from then on + * will access getMeshBlueprint() when they want to access the original mesh + * from the host code. The getMeshBlueprintPart() method is used when the + * partitioned mesh is desired. + * + * \param fromRestart Whether we're initializing from a restart, in which case, + * the partitioning should be redone. + */ + void partition(bool fromRestart); + + /*! + * \brief Return the names of the topologies that share a coordset with main. + * This will include the boundary topo and possibly some surface tally + * topos. + * + * \param mesh The top-level node that contains the "topologies" node. + * \return A vector containing topology names that share a coordset with main. + */ + std::vector getPartitionTopologies(const conduit::Node &mesh) const; + + /*! + * \brief Propagate fields from the blueprint mesh supplied by the host to the + * partitioned mesh. + * + * \param topoName The topology that owns the fields. + * \param fieldNames The names of the fields to repartition and put on the + * partitiond mesh. + * \param updateCoords Whether to update the coordinates on the partitioned mesh. + */ + void sendFieldsOrig2Part(const std::string &topoName, const std::vector &fieldNames, bool updateCoords); + + /*! + * \brief Propagate fields from the partitioned mesh back to the mesh + * supplied by the host to the partitioned mesh. + * + * \param topoName The topology that owns the fields. + * \param fieldNames The names of the fields to repartition and put on the + * partitiond mesh. + */ + void sendFieldsPart2Orig(const std::string &topoName, const std::vector &fieldNames); + + /*! + * \brief Optionally repartition the coordinates and then update the mesh + * positions for Teton from the part mesh. + * + * \param doPartition True if we want to do partitioning (if enabled) + */ + void updateMeshPositions(bool doPartition); + + /*! + * \brief Clean up for partitioning. This removes extra fields that were added, etc. + */ + void partitionCleanup(); + + /*! + * \brief Return the names of the radiation force density fields, taking into + * account the mesh dimension. + * + * \return A vector of field names. + */ + const std::vector &radiationForceDensityFields() const; + + /*! + * \brief Initialize the radiation force density field names. We store them in mRadiationForceDensityFields. + * + */ + void initializeRadiationForceDensityFieldNames(); + + /*! + * \brief Return a vector of double pointers that correspond do the radiation force density fields. + * + * \param root The root node in which to search for the fields. + * + * \return A vector of double pointers for the fields. + */ + std::vector radiationForceDensity(conduit::Node &root) const; + + /*! + * \brief Create the radiation force density fields on the blueprint mesh if they + * do not exist. The fields need to be there in order for getRadiationForceDensity + * to work. + * + * \param root The root node where the fields will be created if they do not exist. + * \param elementAssociation Pass true to make element-associated fields or false to make + * veretx-associated fields. + */ + void createRadiationForceDensity(conduit::Node &root, bool association); + + /*! + * \brief Create the return radiation temperature field on the blueprint mesh, + * if it does not exist. + */ + void createRadiationTemperature(); + + /*! + * \brief Create the return material temperature field on the blueprint mesh, + * if it does not exist. + */ + void createMaterialTemperature(); + + /*! + * \brief Initialize the radiation flux field names. We store them in mRadiationFluxFields. + */ + void initializeRadiationFluxFieldNames(); + + /*! + * \brief Return the radiation flux field names for the mesh dimension. + * + * \return A vector of field names that contain the radiation flux. + */ + const std::vector &getRadiationFluxFields() const; + /*! + * \brief Given a rank and zone id, for the original Teton mesh, return the + * rank and zone id in the partitioned mesh, if it exists. If the provided + * domain and rank do not exist then return {-1,-1} in the output domain, zone. + * + * \param originalDomZone Contains the domain and zone number we're looking for + * in the partitioned mesh. Component [0] contains a 0-origin + * MPI rank number. Component [1] contains a 1-origin Teton + * zone id. + * \param[out] partDomZone The rank and zone id of the zone in the partitioned mesh. + * Component [0] contains a 0-origin MPI rank number. + * Component [1] contains a 1-origin Teton zone id. + */ + void zoneLookupOrig2Part(int originalDomZone[2], int partDomZone[2]) const; + + /// Flags that are useful for the test() method. + enum + { + Test_RadiationForceDensity = 1, + Test_ZonalPsi = 2, + Test_MaterialTemperature = 4, + Test_RadiationTemperature = 8, + Test_RadiationDeposited = 16, + Test_ReconstructPsi = 32 + }; + + /*! + * \brief Capture the current Blueprint state so we can use it to test + * against a baseline. + * + * \param[out] n The node that holds the output node. + * \param datastore The Teton datastore. + * \param bp The Conduit node that contains the Teton mesh. + * \param options The Conduit node that contains the Teton options. + * \param flags A set of flags (or-ed together) to activate various tests. + * + * \return The name of the file to use for current or baseline. + */ + std::string makeTestNode(conduit::Node &n, + conduit::Node &datastore, + conduit::Node &bp, + conduit::Node &options, + int flags); private: + double mDTrad; + bool areSourceProfilesSet; // Whether or not setSourceProfiles has been called bool mIsInitialized; // Whether or not Teton::initialize has been called @@ -207,6 +726,11 @@ class Teton std::vector mZoneToNCorners; std::vector mZoneToCorners; std::vector mCornerToZone; + + std::vector mMapBackFields; //!< Vector of field names to map back during partitioning. + std::map mMCArrays; //!< Map of field names to mcarray names + std::vector mRadiationForceDensityFields; //!< Vector of field names for radiation force density + std::vector mRadiationFluxFields; //!< Vector of field names for radiation flux }; } //end namespace Teton diff --git a/src/teton/include/TetonInterface.hh b/src/teton/include/TetonInterface.hh index e8fe290..e4710ce 100644 --- a/src/teton/include/TetonInterface.hh +++ b/src/teton/include/TetonInterface.hh @@ -388,6 +388,12 @@ void teton_setverbose(const int *verbose); // 1 - corner sweep (improved parallelism) void teton_setsweepversion(const int *sweepversion); +// Set number of hyper-domains, for sweep and/or new GTA. +// 0 - Let Teton automatically decide. (default) +// >= 1 - User sets numer of hyper-domains to this number. +void teton_setsweepnumhyperdomains(const int *hyperdomains); +void teton_setgtanumhyperdomains(const int *hyperdomains); + // construct Teton memory allocator. void teton_constructmemoryallocator( int * @@ -396,6 +402,9 @@ void teton_constructmemoryallocator( int *umpire_device_pool_allocator_id); // scalar, umpire allocator to use for memory allocations on the accelerator // device (gpu). value < 0 = use native OpenMP or CUDA for memory allocation. +// destruct the Teton memory allocator +void teton_destructmemoryallocator(); + // destroy mesh data in case it has changed during the simulation void teton_destructmeshdata(const bool *nonLTE); diff --git a/src/teton/include/TetonNDAccessor.hh b/src/teton/include/TetonNDAccessor.hh new file mode 100644 index 0000000..ad209ed --- /dev/null +++ b/src/teton/include/TetonNDAccessor.hh @@ -0,0 +1,319 @@ +//--------------------------------------------------------------------------// +// TetonNDAccessor.hh +// +// This class provides an accessor for multidimensional data that can be +// expressed as mcarrays in Blueprint. +// +//--------------------------------------------------------------------------// + +#ifndef __TETON_NDACCESSOR_HH__ +#define __TETON_NDACCESSOR_HH__ + +#include "conduit/conduit_node.hpp" +#include +#include + +namespace Teton +{ + +namespace utilities +{ + +// Store a named dimension +struct NDDimension +{ + std::string name; + conduit::index_t size; +}; + +/** + \brief Iterate over dimensions such that each mcarray component is visited once. + \param dims The dimensions. + \param func A function to call on each relevant index. + */ +template void iterate_dimensions(const std::vector &dims, Functor &&func) +{ + std::vector idx(dims.size(), 0); + + if (dims.size() == 1) + { + func(0, idx); + } + else if (dims.size() == 2) + { + for (conduit::index_t i = 0; i < dims[1].size; i++) + { + idx[1] = i; + func(i, idx); + } + } + else if (dims.size() == 3) + { + conduit::index_t c = 0; + for (conduit::index_t j = 0; j < dims[2].size; j++) + { + for (conduit::index_t i = 0; i < dims[1].size; i++, c++) + { + idx[1] = i; + idx[2] = j; + func(c, idx); + } + } + } +} + +/** + \brief This class creates mcarrays for multidimensional data. + */ +class NDAccessor +{ + public: + /** + \brief Constructor + \param v The "values" node for the Blueprint field. + \param d Dimension information. + \param inter Whether the data are interleaved (x0,y0,z0,x1,y1,z1,...) + */ + NDAccessor(conduit::Node &v, const NDDimension &d, bool inter = false) : values(v), dims(), interleave(inter) + { + dims.push_back(d); + } + + /** + \brief Constructor + \param v The "values" node for the Blueprint field. + \param d A vector of dimension information. + \param inter Whether the data are interleaved (x0,y0,z0,x1,y1,z1,...) + */ + NDAccessor(conduit::Node &v, const std::vector &d, bool inter = false) + : values(v), + dims(d), + interleave(inter) + { + } + + /** + \brief Constructor + \param v The "values" node for the Blueprint field. + \param d An initialization vector of dimension information. + \param inter Whether the data are interleaved (x0,y0,z0,x1,y1,z1,...) + */ + NDAccessor(conduit::Node &v, const std::initializer_list &d, bool inter = false) + : values(v), + dims(d), + interleave(inter) + { + } + + /** + \brief Return the number of dimensions. + \return The number of dimensions. + */ + size_t number_of_dimensions() const + { + return dims.size(); + } + + /** + \brief Return the total number of elements for all components. + \return The number of elements for all components. + */ + conduit::index_t total_number_of_elements() const + { + conduit::index_t sz = 1; + for (const auto &d : dims) + sz *= static_cast(d.size); + return sz; + } + + /** + \brief Return the number of array elements in the data's fastest dimension. + \return The number of array elements. + */ + conduit::index_t number_of_elements() const + { + return (values.number_of_children() > 0) ? values[0].dtype().number_of_elements() + : values.dtype().number_of_elements(); + } + + /** + \brief Associate a data pointer with the Conduit node(s). If there are multiple + conventions or dimensions then mcarray components will be created. + + \param data A pointer to the buffer that contains the data. + */ + template void set_external(const T *data) + { + // TODO: This could be more general to handle more dimensions. We only need up to 3D for Teton. + auto data_ptr = const_cast(data); + constexpr auto elemSizeBytes = sizeof(T); + conduit::index_t size = dims[0].size; + if (dims.size() == 1) + { + values.set_external(data_ptr, size); + } + else + { + // Compute stride for the last dimensions for the interleaved case. + // We intentionally omit the first dimension. + conduit::index_t stride = 1; + for (size_t i = 1; i < dims.size(); i++) + stride *= dims[i].size; + + const auto _this = this; + iterate_dimensions(dims, + [&](conduit::index_t component, const std::vector &idx) + { + conduit::Node &comp = values[_this->index_to_name(idx)]; + if (interleave) + { + // Data are interleaved [comp0][comp1]...[comp0][comp1]... + comp.set_external(data_ptr, size, component * elemSizeBytes, stride * elemSizeBytes); + } + else + { + // Data are contiguous [all comp 0][all comp 1]... + comp.set_external(data_ptr, size, component * (size * elemSizeBytes)); + } + }); + } + } + + /** + \brief Return the data for the specified array index. + + \param idx The index of the element whose data will be returned. + + \return The data at the index \a idx. + */ + double operator()(const std::vector &idx) const + { + double retval{}; +#ifndef _NDEBUG + if (idx.size() != dims.size()) + { + throw std::invalid_argument("idx,dim size mismatch"); + } +#endif + if (values.number_of_children() > 0) + { + // Fetch the right mcarray component and get the right value out. + // TODO: this node lookup is a potential performance issue. + conduit::Node &comp = values.fetch_existing(index_to_name(idx)); + auto acc = comp.as_double_accessor(); + retval = acc[idx[0]]; + } + else + { + // All data are in a single array. Make an index. + auto size_factor = [&](int dim) + { + conduit::index_t s = 1; + for (int i = 0; i < dim; i++) + s *= dims[i].size; + return s; + }; + conduit::index_t index = 0; + if (interleave) + { + // 2D + index = idx[0] * dims[1].size + idx[1]; + } + else + { + for (size_t i = 0; i < idx.size(); i++) + index += idx[i] * size_factor(i); + } + auto acc = values.as_double_accessor(); +#ifndef _NDEBUG + if (index >= acc.number_of_elements()) + { + std::stringstream ss; + ss << "Out of bounds index " << index << " in " << values.path() + << ". number_of_elements=" << acc.number_of_elements() << " idx={"; + for (const auto ival : idx) + ss << ival << ", "; + ss << "}\n"; + throw std::range_error(ss.str()); + } +#endif + retval = acc[index]; + } + return retval; + } + + /** + \brief Turn the data, however it is organized, into a linear array in \a dest. + Data are copied so all values from component 0 are contiguous, followed + by component 1, and so on (double[ncomp][nelem]). + + \param[out] The destination array that contains the contiguous data. + */ + void to_contiguous(double *dest) const + { + if (values.number_of_children() > 0) + { + double *dptr = dest; + const auto _this = this; + iterate_dimensions(dims, + [&](conduit::index_t /*c*/, const std::vector &idx) + { + const conduit::Node &comp = values.fetch_existing(_this->index_to_name(idx)); + auto acc = comp.as_double_accessor(); + auto n = acc.number_of_elements(); + for (conduit::index_t i = 0; i < n; i++) + *dptr++ = acc[i]; + }); + } + else + { + auto acc = values.as_double_accessor(); + auto n = acc.number_of_elements(); + for (conduit::index_t i = 0; i < n; i++) + dest[i] = acc[i]; + } + } + + /** + \brief Print the dimensions to a stream. + \param os The stream to use for printing. + */ + void print(std::ostream &os) const + { + os << "{"; + for (const auto &d : dims) + os << "{" << d.name << ", " << d.size << "},"; + os << "}"; + } + + private: + /** + \brief Produces an mcarray name given a tuple of dimension indices. + \param idx A tuple of indices within the set of dimensions provided to + the object. The first dimension is assumed to vary the fastest + so it is ignored. + \return The name of the mcarray component. + */ + std::string index_to_name(const std::vector &idx) const + { + std::stringstream ss; + // Dimension 0 skipped on purpose. + for (size_t i = 1; i < dims.size(); i++) + { + if (i > 1) + ss << "_"; + ss << dims[i].name << idx[i]; + } + return ss.str(); + } + + private: + conduit::Node &values; + std::vector dims; + bool interleave; +}; + +} // namespace utilities + +} // namespace Teton +#endif diff --git a/src/teton/include/TetonSources.hh b/src/teton/include/TetonSources.hh index c69db9e..ab747db 100644 --- a/src/teton/include/TetonSources.hh +++ b/src/teton/include/TetonSources.hh @@ -134,6 +134,8 @@ class PointSource : public TetonSource public: // timevals is of size ntimes // source_profile is of size ntimes x ngroups x nangles + // + // @note The zone_index value should be -1 if the zone is not owned by the current MPI rank. PointSource(int nangles, int ngroups, int zone_index, // Which zone is the point source in? TODO, convert from coordinate to zone index diff --git a/src/teton/include/TetonTesting.hh b/src/teton/include/TetonTesting.hh new file mode 100644 index 0000000..c480056 --- /dev/null +++ b/src/teton/include/TetonTesting.hh @@ -0,0 +1,44 @@ +//--------------------------------------------------------------------------// +// TetonTesting.hh +// +// This file contains functions that are helpful for testing teton by +// comparing baselines vs current Conduit output. +// +//--------------------------------------------------------------------------// + +#ifndef __TETON_TESTING_HH__ +#define __TETON_TESTING_HH__ + +#include "conduit/conduit_node.hpp" + +#include + +#include + +namespace Teton +{ + +namespace testing +{ + +/*! + * \brief Compare current node against baseline. + * + * \param n The node that holds the data to be tested. + * \param fileBase The base name of the file to be used for baselines or current data. + * \param cycle The current cycle number. + * \param make True if we are making a baseline, false if we're comparing to a baseline. + * \param comm The MPI communicator to use. + * + * \return True if the current results are sufficiently close to baselines. + * + * There are some environment variables that affect its operation. + * TETON_TESTING_BASELINE_DIR - set path for baseline files. + * TETON_TESTING_CURRENT_DIR - set path for current files. + */ +bool test(const conduit::Node &n, const std::string &fileBase, int cycle, bool make, MPI_Comm comm); + +} // namespace testing + +} // namespace Teton +#endif diff --git a/src/teton/include/TetonUtilities.hh b/src/teton/include/TetonUtilities.hh new file mode 100644 index 0000000..868865b --- /dev/null +++ b/src/teton/include/TetonUtilities.hh @@ -0,0 +1,182 @@ +//--------------------------------------------------------------------------// +// TetonUtilities.hh +// +// This file contains functions that are helpful for implementing the Teton +// Conduit interface. +// +//--------------------------------------------------------------------------// + +#ifndef __TETON_UTILITIES_HH__ +#define __TETON_UTILITIES_HH__ + +#include "conduit/conduit_blueprint_mesh_utils.hpp" +#include "conduit/conduit_node.hpp" + +#include +#include +#include + +#include + +namespace Teton +{ + +namespace utilities +{ + +/** + \brief Convert paths to int32 storage. + + \param rank The MPI rank + \param root The root Conduit node to search for the path keys. + \param keys The vector of path keys that will be modified if they exist. + + \note This is used to ensure that some fields that conduit produces are + converted to int32 so other algorithms do not end up promoting int + types to index_t and messing up Teton's int32 assumptions. + */ +void convert_int32(int rank, conduit::Node &root, const std::vector &keys); + +/** + \brief Scans through the Conduit tree and returns the paths that have a dtype. + + \param n The node to search. + \param dtype The dtype we're looking for. + \param[out] keys The keys with the dtype we're looking for. + */ +void find_dtype(const conduit::Node &n, + const conduit::DataType &dtype, + const std::string &path, + std::vector &paths); + +/** + \brief return a vector of keys with dtype int64. + \param n The node to search. + \return A vector of paths with dtype int64 + */ +std::vector find_int64(const conduit::Node &n); + +/** + \brief Scans through field values and returns true if the values do not have errors. + \param rank The MPI rank + \param n The Conduit node containing the field values. + \ + */ +bool scan_field_values(int rank, const conduit::Node &n); + +/*! + * \brief Examines a domain and looks for duplicate points with different ids. + * + * \param domainId The id of the domain. + * \param dom The node that contains the domain. + * \param coordset The node that contains the coordset. + * \param[out] info A node to contain any findings. + * + * \return True if there are duplicate points; False otherwise. + */ +bool find_local_duplicate_points(int domainId, + const conduit::Node &dom, + const conduit::Node &coordset, + conduit::Node &info); + +/** + \brief Gather strings from all ranks and make sure all ranks get that sorted unique vector of strings. + + \param vec The vector of strings on the current MPI rank. + \param comm The MPI communicator to use. + + \return A vector of strings that includes all of the unique strings across all ranks. + */ +std::vector globalizeStringVector(const std::vector &vec, MPI_Comm comm); + +/** + \brief Examine blueprint node fields (for provided names, if given) and invoke a function + on all of the fields that look like they need to be described by an mcarray. + + \param blueprint The blueprint node that contains the topologies and fields. + \param mainTopologyName The name of the main topology. + \param options The options node that contains Teton options. + \param fieldNames A vector of field names that we want to check. If this is empty, all + fields are checked. + \param func The function to be invoked when a field looks like it needs to be an + mcarray. +*/ +template +void iterate_mcarray_candidates(const conduit::Node &blueprint, + const std::string &mainTopologyName, + const conduit::Node &options, + const std::vector &fieldNames, + Func &&func) +{ + const conduit::Node &fields = blueprint.fetch_existing("fields"); + const conduit::Node &main_topo = blueprint.fetch_existing("topologies/" + mainTopologyName); + + conduit::index_t nzones = conduit::blueprint::mesh::utils::topology::length(main_topo); + conduit::index_t ngroups = options.fetch_existing("quadrature/num_groups").to_index_t(); + + auto looks_like_mcarray = [&](const conduit::Node &f) -> bool + { + if (f["association"].as_string() == "element") + { + const conduit::Node &v = f.fetch_existing("values"); + // We look for a "scalar" that acts like an mcarray. + if (v.number_of_children() == 0) + { + conduit::index_t nvalues = v.dtype().number_of_elements(); + conduit::index_t values_per_zone = nvalues / nzones; + if (values_per_zone == ngroups && ngroups > 1 && v.dtype().is_float64()) + { + // We appear to have a single buffer that contains ngroups values per zone. + return true; + } + } + } + return false; + }; + + if (fieldNames.empty()) + { + for (conduit::index_t i = 0; i < fields.number_of_children(); i++) + { + const conduit::Node &f = fields[i]; + if (looks_like_mcarray(f)) + func(f); + } + } + else + { + for (const auto &name : fieldNames) + { + if (fields.has_path(name)) + { + const conduit::Node &f = fields.fetch_existing(name); + if (looks_like_mcarray(f)) + func(f); + } + } + } +} + +/** + * \brief This class prints a simple banner to the console using RAII pattern. + */ +class Banner +{ + public: + Banner(MPI_Comm c, const std::string &str); + ~Banner(); + + private: + void printLine(const std::string s) const; + void emit(char c, int n) const; + + MPI_Comm comm; + int rank; + std::string name; + static int level; +}; + +} // namespace utilities + +} // namespace Teton +#endif diff --git a/src/teton/include/macros.h b/src/teton/include/macros.h index 264cdd2..0f2811e 100644 --- a/src/teton/include/macros.h +++ b/src/teton/include/macros.h @@ -15,22 +15,20 @@ !----------------------------------------------------------------------------- ! Code contract checks. !----------------------------------------------------------------------------- -! - TETONASSERT - enabled in debug builds or non-performance critical builds when extra checking desired. +! - TETON_ASSERT - enabled in debug builds or non-performance critical builds when extra checking desired. ! Will conditionally emit a message and shut down code if provided logical check fails. ! Define TETON_COMPILE_ASSERTS to enable. ! -! - TETONVERIFY - always enabled, should used to ensure correct problem input or critical areas of code. +! - TETON_VERIFY - always enabled, should used to ensure correct problem input or critical areas of code. ! Will conditionally emit a message and shut down code if provided logical check fails. ! -! - TETONFATAL - always enabled, use when code state is in an unrecoverable state. +! - TETON_FATAL - always enabled, use when code state is in an unrecoverable state. ! Will unconditionally emit an error messaged and shut down code. - -! TODO - make tetonAssert all caps! Difficult to distinguish that this is a macro call in code. ! - AB #ifdef TETON_COMPILE_ASSERTS -# define tetonAssert(bool,s) call f90assert(bool,__FILE__,__LINE__,s) +# define TETON_ASSERT(bool,s) call f90assert(bool,__FILE__,__LINE__,s) #else -# define tetonAssert(bool,s) +# define TETON_ASSERT(bool,s) #endif #define TETON_VERIFY(bool,s) call f90verify(bool,__FILE__,__LINE__,s) diff --git a/src/teton/include/omp_wrappers.h b/src/teton/include/omp_wrappers.h index 748034f..519bc26 100644 --- a/src/teton/include/omp_wrappers.h +++ b/src/teton/include/omp_wrappers.h @@ -4,27 +4,34 @@ !----------------------------------------------------------------------------- !----------------------------------------------------------------------------- -! These macros provide a mechanism for enabling/disabling OpenMP target pragma lines. -! Some compilers have trouble compiling newer OpenMP target offload pragmas. +! These macros provide a mechanism for enabling/disabling OpenMP target pragma lines +! and OpenMP target functionality, such as umpire host and device pool integration. ! -! TETON_ENABLE_OPENMP_OFFLOAD - enables any lines annotated with the TOMP macros, should be set by the build system. -! -! TOMP - Macro to enable/disable OMP target offload pragma line. -! TOMPC - Macro to enable/disable continued OMP pragma line. +! If this macro logic gets any more complex, this might need to be reworked... -- black27 + +! Enable basic openmp pragmas in code if openmp offloading enabled. #if defined(TETON_ENABLE_OPENMP_OFFLOAD) # define TOMP(source) !$omp source # define TOMPC(source) !$omp& source -# if defined(TETON_ENABLE_UMPIRE) -# define UMPIRE_DEVICE_POOL_ALLOC(source) call target_alloc_and_pair_ptrs(source) -# define UMPIRE_DEVICE_POOL_FREE(source) call target_free_and_unpair_ptrs(source) -# else -# define UMPIRE_DEVICE_POOL_ALLOC(source) -# define UMPIRE_DEVICE_POOL_FREE(source) -# endif - #else # define TOMP(source) # define TOMPC(source) +#endif + +! Enable openmp data map and update pragmas, if openmp offloading enabled and not using unified cpu and gpu memory. +#if defined(TETON_ENABLE_OPENMP_OFFLOAD) && !defined(TETON_OPENMP_HAS_UNIFIED_MEMORY) +# define TOMP_MAP(source) !$omp source +# define TOMP_UPDATE(source) !$omp source +#else +# define TOMP_MAP(source) +# define TOMP_UPDATE(source) +#endif + +! Enable Umpire integration for host and device pools, if openmp offloading enabled and not using unified cpu and gpu memory. +#if defined(TETON_ENABLE_UMPIRE) && defined(TETON_ENABLE_OPENMP_OFFLOAD) && !defined(TETON_OPENMP_HAS_UNIFIED_MEMORY) +# define UMPIRE_DEVICE_POOL_ALLOC(source) call target_alloc_and_pair_ptrs(source) +# define UMPIRE_DEVICE_POOL_FREE(source) call target_free_and_unpair_ptrs(source) +#else # define UMPIRE_DEVICE_POOL_ALLOC(source) # define UMPIRE_DEVICE_POOL_FREE(source) #endif diff --git a/src/teton/interface/CMakeLists.txt b/src/teton/interface/CMakeLists.txt index 62b0a85..876a85b 100644 --- a/src/teton/interface/CMakeLists.txt +++ b/src/teton/interface/CMakeLists.txt @@ -4,5 +4,7 @@ if (ENABLE_BLUEPRINT_INTERFACE) TetonBlueprint.cc TetonConduitInterface.cc TetonSurfaceTallies.cc + TetonTesting.cc + TetonUtilities.cc ) endif() diff --git a/src/teton/interface/TetonBlueprint.cc b/src/teton/interface/TetonBlueprint.cc index be3b04e..d320814 100644 --- a/src/teton/interface/TetonBlueprint.cc +++ b/src/teton/interface/TetonBlueprint.cc @@ -10,7 +10,7 @@ #include "conduit/conduit_relay_mpi_io_blueprint.hpp" #include "dbc_macros.h" -#if defined(TETON_ENABLE_CALIPER) +#if defined(TETON_ENABLE_CALIPER) && defined(TETON_ENABLE_BLUEPRINT_GENERATION_CALIPER_TIMERS) #include "caliper/cali.h" #else #define CALI_MARK_BEGIN(label) @@ -583,27 +583,23 @@ int TetonBlueprint::GetOppositeCorner(int zone, int face, int corner, int rank) int corner1 = -1; size_t ncorners = face_to_corners2[face].size(); + const int dim = mParametersNode.fetch_existing("size/ndim").as_int32(); double node_x = corner_to_node_x[corner]; double node_y = corner_to_node_y[corner]; - double node_z; - int dim = mParametersNode.fetch_existing("size/ndim").as_int32(); - if (dim > 2) - node_z = corner_to_node_z[corner]; + double node_z = (dim > 2) ? corner_to_node_z[corner] : 0.; for (size_t c = 0; c < ncorners; ++c) { corner1 = face_to_corners2[face][c]; double node1_x = corner_to_node_x[corner1]; double node1_y = corner_to_node_y[corner1]; - double node1_z; - if (dim > 2) - node1_z = corner_to_node_z[corner1]; int zone1 = corner_to_zone[corner1]; // Compilers frequently throw warnings about floating point comparisons being unreliable. // In this case, we should have identical binary comparisons because we are just comparing positions. if (dim > 2) { + const double node1_z = corner_to_node_z[corner1]; if (node1_x == node_x && node1_y == node_y && node1_z == node_z && zone1 != zone) { return corner1; @@ -627,9 +623,7 @@ int TetonBlueprint::GetOppositeCorner(int zone, int face, int corner, int rank) } else { - std::pair zoneface; - zoneface.first = zone; - zoneface.second = face; + std::pair zoneface(zone, face); try { const int nbelem = mParametersNode.fetch_existing("size/nbelem").to_int32(); @@ -717,7 +711,7 @@ void TetonBlueprint::ComputeSharedFaces(int rank) } int sface_offset = 0; - conduit::int32 *shared_faces_array_ptr; + conduit::int32 *shared_faces_array_ptr = nullptr; int nsfaces = 0; int sfaces_array_len = mMeshNode.fetch_existing("shared_boundaries/shared_faces").dtype().number_of_elements(); if (ndim == 2) @@ -746,10 +740,8 @@ void TetonBlueprint::ComputeSharedFaces(int rank) { int zone = shared_faces_array_ptr[sface_offset + 1]; int face = shared_faces_array_ptr[sface_offset + 2]; - std::pair zoneface; - zoneface.first = zone; - zoneface.second = face; - int halfface; + std::pair zoneface(zone, face); + int halfface = 0; try { halfface = zoneface_to_halfface.at(zoneface); @@ -857,10 +849,14 @@ void TetonBlueprint::CreateTetonMeshCornerCoords() { CALI_CXX_MARK_FUNCTION; - int nzones = mParametersNode.fetch_existing("size/nzones").as_int32(); - int ndim = mParametersNode.fetch_existing("size/ndim").as_int32(); + const int nzones = mParametersNode.fetch_existing("size/nzones").as_int32(); + const int ndim = mParametersNode.fetch_existing("size/ndim").as_int32(); + + // Make an estimate for the vector size. + const size_t corners_per_zone[] = {1, 2, 4, 8}; + size_t nnodes_est = static_cast(nzones) * corners_per_zone[ndim]; std::vector zone_verts; - // std::vector zone_ncorners; + zone_verts.reserve(nnodes_est * ndim); // loop over mesh elements for (int zone = 0; zone < nzones; ++zone) @@ -1309,6 +1305,112 @@ void TetonBlueprint::CreateConduitFaceAttributes(conduit::Node &meshNode, int ra c(bndry_values, face_values, src_dest, 0); } +void TetonBlueprint::GetSurfaceEditZoneFacesAndCorners(int rank, + const conduit::Node &surf_face_topo, + const std::map, int> &verts_face_map, + std::vector &surf_edits_loczonefaces, + std::vector &surf_edits_corners) const +{ + const int *corner_to_vertex = mMeshNode.fetch_existing("arrays/corner_to_vertex").value(); + + const int surf_face_topo_length = conduit::blueprint::mesh::utils::topology::length(surf_face_topo); + for (int f = 0; f < surf_face_topo_length; ++f) + { + std::vector face_points = conduit::blueprint::mesh::utils::topology::unstructured::points( + surf_face_topo, + f); + std::set face_vertices(face_points.begin(), face_points.end()); + int face = -1; + if (verts_face_map.find(face_vertices) != verts_face_map.end()) + { + face = verts_face_map.at(face_vertices); + } + else + { + TETON_FATAL_C( + rank, + "TetonBlueprint::ProcessSurfaceEdits: surface edit face doesn't have vertices that correspond to a mesh face"); + } + + // This mesh face shares (generically) two zones---need to determine the + // correct zone based on the vertex orientation of the vertices on the face + // NOTE: this logic will not work for AMR meshes + // TODO: fix for AMR meshes + // The code below uses that face_to_corners2[face] has a list of corners + // associated with the mesh face, face. Generically this list will involves two + // zones, and the orientation will be according to the left-hand rule. Using the association between each corner and vertex (for non-AMR meshes), we want to find the zone with the same orientation + int vert1 = face_points[0]; + int vert2 = face_points[1]; + int zone1 = face_to_zones2[face][0]; + int zone_for_surf_face = zone1; + if (face_to_zones2[face].size() > 1) + { + int zone2 = face_to_zones2[face][1]; + zone_for_surf_face = zone2; + // Loop over corners associated with first zone that + // shares this face. For non-AMR meshes, each vertex + // in the zone corresponds to a corner + int nverts_for_face = face_to_corners2[face].size() / 2; + for (int c = 0; c < nverts_for_face - 1; ++c) + { + int corner1 = face_to_corners2[face][c]; + int corner2 = face_to_corners2[face][c + 1]; + int vert_tail = corner_to_vertex[corner1]; + int vert_head = corner_to_vertex[corner2]; + if ((vert1 == vert_tail) && (vert2 == vert_head)) + { + zone_for_surf_face = zone1; + continue; + } + } + if (nverts_for_face > 2) // true if dim == 3 + { + int corner1 = face_to_corners2[face][nverts_for_face - 1]; + int corner2 = face_to_corners2[face][0]; + int vert_tail = corner_to_vertex[corner1]; + int vert_head = corner_to_vertex[corner2]; + if ((vert1 == vert_tail) && (vert2 == vert_head)) + { + zone_for_surf_face = zone1; + } + } + } + + // Determine the local zone face ID associated with + // the pair (zoneID, faceID) + int nfaces_in_zone = zone_to_faces2[zone_for_surf_face].size(); + int local_face = -1; + for (int f1 = 0; f1 < nfaces_in_zone; ++f1) + { + int face_in_zone = zone_to_faces2[zone_for_surf_face][f1]; + if (face_in_zone == face) + local_face = f1; + } + if (local_face == -1) + { + TETON_FATAL_C( + rank, + "TetonBlueprint::ProcessSurfaceEdits: surface edit face doesn't have vertices that correspond to a mesh face"); + } + + // Finally, push the global corner IDs and local zone face IDs + // associated with the pair (zoneID, faceID) + int ncorner_faces = face_to_corners2[face].size(); + for (int c = 0; c < ncorner_faces; ++c) + { + int corner = face_to_corners2[face][c]; + int zone = corner_to_zone[corner]; + if (zone != zone_for_surf_face) + { + continue; + } + // Note: increment because Teton uses 1-based indexing + surf_edits_loczonefaces.push_back(local_face + 1); + surf_edits_corners.push_back(corner + 1); + } + } +} + // WORKING ON // // TODO: fix for AMR meshes // NOTE: Compare to template void Teton::makeCornerLists @@ -1318,9 +1420,6 @@ void TetonBlueprint::CreateConduitFaceAttributes(conduit::Node &meshNode, int ra void TetonBlueprint::ProcessSurfaceEdits(int rank) { - std::vector> surf_edits_loczonefaces; - std::vector> surf_edits_corners; - // Create the map {face vertices => face}. Here the face IDs are those // generated by the call to generate_faces. We need to match these face IDs // with the vertices associated with the surface edit faces @@ -1345,126 +1444,36 @@ void TetonBlueprint::ProcessSurfaceEdits(int rank) rank, "TetonBlueprint::ProcessSurfaceEdits: field arrays/corner_to_vertex has not yet been created; need to call CreateConnectivityArrays first"); } - int *corner_to_vertex = mMeshNode.fetch_existing("arrays/corner_to_vertex").value(); // Loop over surface_edit topologies - conduit::Node &topos = mMeshNode["topologies"]; - conduit::Node &surface_edits = mParametersNode["surface_edits"]; + const conduit::Node &topos = mMeshNode["topologies"]; + const conduit::Node &surface_edits = mParametersNode["surface_edits"]; conduit::NodeConstIterator surface_edits_it = surface_edits.children(); - int num_surfaces = surface_edits.number_of_children(); - surf_edits_loczonefaces.resize(num_surfaces); - surf_edits_corners.resize(num_surfaces); - int surface_id = 0; while (surface_edits_it.has_next()) { const conduit::Node ¶m_surface_edit = surface_edits_it.next(); - // get the surface topology + std::vector surf_edits_loczonefaces, surf_edits_corners; + // Check whether the surface topology exists on this rank. + // Only fill in the vectors if the topology exists on this rank. We still add + // the surface edit since the consumer of it will pass data into teton_surfaceedit, + // which contains collective communication and all ranks need to participate. std::string surface_edit_name_str = param_surface_edit["zone_face_topology_name"].as_string(); - conduit::Node &surf_face_topo = topos[surface_edit_name_str]; - const int surf_face_topo_length = conduit::blueprint::mesh::utils::topology::length(surf_face_topo); - for (int f = 0; f < surf_face_topo_length; ++f) - { - std::vector face_points = conduit::blueprint::mesh::utils::topology::unstructured::points( - surf_face_topo, - f); - std::set face_vertices(face_points.begin(), face_points.end()); - int face; - if (verts_face_map.find(face_vertices) != verts_face_map.end()) - { - face = verts_face_map.at(face_vertices); - } - else - { - TETON_FATAL_C( - rank, - "TetonBlueprint::ProcessSurfaceEdits: surface edit face doesn't have vertices that correspond to a mesh face"); - } - - // This mesh face shares (generically) two zones---need to determine the - // correct zone based on the vertex orientation of the vertices on the face - // NOTE: this logic will not work for AMR meshes - // TODO: fix for AMR meshes - // The code below uses that face_to_corners2[face] has a list of corners - // associated with the mesh face, face. Generically this list will involves two - // zones, and the orientation will be according to the left-hand rule. Using the association between each corner and vertex (for non-AMR meshes), we want to find the zone with the same orientation - int vert1 = face_points[0]; - int vert2 = face_points[1]; - int zone1 = face_to_zones2[face][0]; - int zone_for_surf_face = zone1; - if (face_to_zones2[face].size() > 1) - { - int zone2 = face_to_zones2[face][1]; - zone_for_surf_face = zone2; - // Loop over corners associated with first zone that - // shares this face. For non-AMR meshes, each vertex - // in the zone corresponds to a corner - int nverts_for_face = face_to_corners2[face].size() / 2; - for (int c = 0; c < nverts_for_face - 1; ++c) - { - int corner1 = face_to_corners2[face][c]; - int corner2 = face_to_corners2[face][c + 1]; - int vert_tail = corner_to_vertex[corner1]; - int vert_head = corner_to_vertex[corner2]; - if ((vert1 == vert_tail) && (vert2 == vert_head)) - { - zone_for_surf_face = zone1; - continue; - } - } - if (nverts_for_face > 2) // true if dim == 3 - { - int corner1 = face_to_corners2[face][nverts_for_face - 1]; - int corner2 = face_to_corners2[face][0]; - int vert_tail = corner_to_vertex[corner1]; - int vert_head = corner_to_vertex[corner2]; - if ((vert1 == vert_tail) && (vert2 == vert_head)) - { - zone_for_surf_face = zone1; - } - } - } - - // Determine the local zone face ID associated with - // the pair (zoneID, faceID) - int nfaces_in_zone = zone_to_faces2[zone_for_surf_face].size(); - int local_face = -1; - for (int f1 = 0; f1 < nfaces_in_zone; ++f1) - { - int face_in_zone = zone_to_faces2[zone_for_surf_face][f1]; - if (face_in_zone == face) - local_face = f1; - } - if (local_face == -1) - { - TETON_FATAL_C( - rank, - "TetonBlueprint::ProcessSurfaceEdits: surface edit face doesn't have vertices that correspond to a mesh face"); - } + if (topos.has_child(surface_edit_name_str)) + { + // get the surface topology + const conduit::Node &surf_face_topo = topos.fetch_existing(surface_edit_name_str); - // Finally, push the global corner IDs and local zone face IDs - // associated with the pair (zoneID, faceID) - int ncorner_faces = face_to_corners2[face].size(); - for (int c = 0; c < ncorner_faces; ++c) - { - int corner = face_to_corners2[face][c]; - int zone = corner_to_zone[corner]; - if (zone != zone_for_surf_face) - { - continue; - } - // Note: incremenet because Teton uses 1-based indexing - surf_edits_loczonefaces[surface_id].push_back(local_face + 1); - surf_edits_corners[surface_id].push_back(corner + 1); - } + // Get the local zone faces and corners. + GetSurfaceEditZoneFacesAndCorners(rank, + surf_face_topo, + verts_face_map, + surf_edits_loczonefaces, + surf_edits_corners); } - // Append to blueprint mesh node - mMeshNode["teton/surface_edits/" + surface_edit_name_str + "/corners"].set(surf_edits_corners[surface_id].data(), - surf_edits_corners[surface_id].size()); - mMeshNode["teton/surface_edits/" + surface_edit_name_str + "/local_zone_faces"].set( - surf_edits_loczonefaces[surface_id].data(), - surf_edits_corners[surface_id].size()); - surface_id += 1; + conduit::Node &surface_edit = mMeshNode["teton/surface_edits/" + surface_edit_name_str]; + surface_edit["corners"].set(surf_edits_corners); + surface_edit["local_zone_faces"].set(surf_edits_loczonefaces); } } @@ -1609,7 +1618,6 @@ void TetonBlueprint::ComputeFaceIDs(std::map> &boundaries, // First we get the max of the non-shared bc_ids so that we // can assign a unique bc_id int bc_id_max = 0; - int n_shared_corner_faces = 0; for (auto itr = boundaries.begin(); itr != boundaries.end(); ++itr) { bc_id_max = std::max(itr->first, bc_id_max); @@ -1654,19 +1662,6 @@ void TetonBlueprint::ComputeFaceIDs(std::map> &boundaries, size_t ncorners = face_to_corners2[face].size(); boundary_id_to_ncornerfaces[bc_id] += ncorners; face_to_bcid[face] = bc_id; - - if (ndim == 2) - { - n_shared_corner_faces += 1; - } - else if (ndim == 3) - { - n_shared_corner_faces += ncorners; - } - else - { - std::cout << "1D is not yet implemented! " << std::endl; - } } } } @@ -1914,7 +1909,6 @@ void TetonBlueprint::ComputeFaceIDs1D(int *boundary_connectivity, // bou while (groups_it.has_next()) { const conduit::Node &group = groups_it.next(); - int num_sfaces = group["values"].dtype().number_of_elements(); int nbr_rank = group["neighbors"].to_int(); int index = fn_counter_teton + num_nonshared_bndrs; // We want to increment this only when num_sfaces > 0 @@ -1938,9 +1932,6 @@ void TetonBlueprint::ComputeFaceIDs1D(int *boundary_connectivity, // bou mParametersNode["boundary_conditions/num_source"] = boundaries_types[2]; mParametersNode["boundary_conditions/num_comm"] = boundaries_types[3]; mParametersNode["boundary_conditions/num_total"] = num_bndrs; - - std::vector elem_ids(2); - int nelem = mMeshNode["topologies/mesh/elements/dims/i"].value(); mParametersNode["boundary_conditions/type"].set(bc_type_int.data(), 2); mParametersNode["boundary_conditions/zone_ids"].set(bc_zone_id.data(), 2); mParametersNode["boundary_conditions/neighbor_ids"].set(bc_nghbr_id.data(), 2); @@ -1955,7 +1946,6 @@ void TetonBlueprint::OutputTetonMesh(int rank, MPI_Comm comm) int nelem = mMeshNode["topologies/mesh/elements/dims/i"].value(); // TODO Move to mesh conduit Node // TODO: fix rank - int rank = 0; mParametersNode["size/ndim"] = 1; mParametersNode["size/rank"] = rank; mParametersNode["size/nzones"] = nelem; @@ -2046,7 +2036,7 @@ void TetonBlueprint::OutputTetonMesh(int rank, MPI_Comm comm) std::vector boundaries_types = {0, 0, 0, 0}; std::vector boundary_conditions; int nbelem_corner_faces = 0; - m_face_to_bcid.resize(nfaces); + m_face_to_bcid.resize(nfaces, 0); ComputeFaceIDs(boundaries, nbelem_corner_faces, boundaries_types, boundary_conditions, m_face_to_bcid, rank); @@ -2063,27 +2053,22 @@ void TetonBlueprint::OutputTetonMesh(int rank, MPI_Comm comm) { const conduit::Node &fn_face_group = groups_it.next(); const conduit::Node fn_corner_group = mMeshNode["adjsets/main_corner/groups"][fn_face_group.name()]; - const conduit::Node &fn_faces = fn_face_group["values"]; - const conduit::Node &fn_corners = fn_corner_group["values"]; // Store the index of each corner in the corner adjacency list. // Precondition: The order of the corners in the corner mesh adjacency set from // conduit must match with the order in the neighboring domain's list. std::map cpoint_to_gindex; - for (int cp = 0; cp < fn_corners.dtype().number_of_elements(); cp++) + const auto corner_data = fn_corner_group["values"].as_int_accessor(); + for (int cp = 0; cp < corner_data.number_of_elements(); cp++) { - conduit::Node corner_data(conduit::DataType(fn_corners.dtype().id(), 1), - (void *) fn_corners.element_ptr(cp), - true); - cpoint_to_gindex[corner_data.to_int()] = cp; + int cpoint = corner_data[cp]; + cpoint_to_gindex[cpoint] = cp; } - for (int f = 0; f < fn_faces.dtype().number_of_elements(); f++) + const auto face_data = fn_face_group["values"].as_int_accessor(); + for (int f = 0; f < face_data.number_of_elements(); f++) { - conduit::Node face_data(conduit::DataType(fn_faces.dtype().id(), 1), - (void *) fn_faces.element_ptr(f), - true); - const int face = face_data.to_int(); + const int face = face_data[f]; // Makes sure a shared face is really a shared face if (face_to_zones2[face].size() > 1) @@ -2091,15 +2076,10 @@ void TetonBlueprint::OutputTetonMesh(int rank, MPI_Comm comm) const int bcid = m_face_to_bcid[face]; const int zone = face_to_zones2[face].front(); // only 1 b/c on boundary - const std::vector corners = face_to_corners2[face]; - - const std::vector - face_points = conduit::blueprint::mesh::utils::topology::unstructured::points(face_topology, face); std::map fpoint_to_corner; - for (size_t c = 0; c < corners.size(); c++) + for (const auto corner : face_to_corners2[face]) { - const int corner = corners[c]; const std::vector corner_points = conduit::blueprint::mesh::utils::topology::unstructured::points(corner_topology, corner); @@ -2108,10 +2088,11 @@ void TetonBlueprint::OutputTetonMesh(int rank, MPI_Comm comm) fpoint_to_corner[corner_points.front()] = corner; } + const std::vector + face_points = conduit::blueprint::mesh::utils::topology::unstructured::points(face_topology, face); std::vector> fpoint_list; - for (size_t fp = 0; fp < face_points.size(); fp++) + for (const auto face_point : face_points) { - const int face_point = face_points[fp]; fpoint_list.emplace_back(cpoint_to_gindex[face_point], face_point); } std::sort(fpoint_list.begin(), fpoint_list.end()); diff --git a/src/teton/interface/TetonConduitInterface.cc b/src/teton/interface/TetonConduitInterface.cc index 8c2f9c3..15067ae 100644 --- a/src/teton/interface/TetonConduitInterface.cc +++ b/src/teton/interface/TetonConduitInterface.cc @@ -1,3 +1,4 @@ +#include #include #include #include @@ -7,14 +8,29 @@ #include #include "conduit/conduit_blueprint.hpp" +#include "conduit/conduit_blueprint_mesh.hpp" +#include "conduit/conduit_blueprint_mesh_utils.hpp" +#include "conduit/conduit_blueprint_mpi_mesh.hpp" +#include "conduit/conduit_config.h" #include "conduit/conduit_relay.hpp" #include "conduit/conduit_relay_config.h" +#include "conduit/conduit_relay_mpi.hpp" #include "conduit/conduit_relay_mpi_io_blueprint.hpp" +#if defined(CONDUIT_USE_PARMETIS) +#include "conduit/conduit_blueprint_mesh_topology_metadata.hpp" +// We can only enable partitioning right now if Conduit includes Parmetis. +#include "conduit/conduit_blueprint_mpi_mesh_parmetis.hpp" +#pragma message "Teton built with partitioning support." +#define TETON_PARTITIONING +#endif #include "TetonBlueprint.hh" #include "TetonConduitInterface.hh" #include "TetonInterface.hh" +#include "TetonNDAccessor.hh" #include "TetonSurfaceTallies.hh" +#include "TetonTesting.hh" +#include "TetonUtilities.hh" #include "dbc_macros.h" #if defined(TETON_USE_CUDA) @@ -23,6 +39,7 @@ #endif #if defined(TETON_ENABLE_CALIPER) +#pragma message "Teton built with Caliper support." #include "caliper/cali.h" #else #define CALI_MARK_BEGIN(label) @@ -31,6 +48,9 @@ #define CALI_CXX_MARK_FUNCTION #endif +// Uncomment to enable partition debugging console output. +// #define PARTITION_DEBUG + extern "C" { extern conduit::Node *teton_get_datastore_cptr(); @@ -46,6 +66,67 @@ extern void teton_conduitcheckpoint_teardown(); namespace Teton { +std::string field_path(const std::string &fieldName) +{ + return "fields/" + fieldName; +} + +std::string field_values(const std::string &fieldName) +{ + return field_path(fieldName) + "/values"; +} + +//--------------------------------------------------------------------------- +// Teton +const std::string Teton::PREFIX("__teton__"); +const std::string Teton::MCARRAY_PREFIX(Teton::PREFIX + "mcarray_"); +const std::string Teton::PARTITION_FIELD(Teton::PREFIX + "parmetis_result"); +const std::string Teton::PARTITION_FIELD_BOUNDARY(Teton::PREFIX + "parmetis_result_boundary"); + +const std::string Teton::FIELD_ELECTRON_ENERGY_DEPOSITED("electron_energy_deposited"); +const std::string Teton::FIELD_RADIATION_ENERGY_DENSITY("radiation_energy_density"); +// We gather radiation_temperature results into this field so values may be queried by +// getRadiationTemperature(). Prepend the prefix so we don't disturb the radiation_temperature +// field if the host code happens to provide one. We take this approach for some of the +// other fields below as well. +const std::string Teton::FIELD_RADIATION_TEMPERATURE(Teton::PREFIX + "radiation_temperature"); + +const std::string Teton::FIELD_RADIATION_FORCE_X("radiation_force_x"); +const std::string Teton::FIELD_RADIATION_FORCE_Y("radiation_force_y"); +const std::string Teton::FIELD_RADIATION_FORCE_Z("radiation_force_z"); +const std::string Teton::FIELD_RADIATION_FORCE_R("radiation_force_r"); +const std::string Teton::FIELD_CORNER_VOLUME_SUMS(Teton::PREFIX + "cornerVolumeSums"); + +const std::string Teton::FIELD_RADIATION_FLUX_X(Teton::PREFIX + "radiation_flux_x"); +const std::string Teton::FIELD_RADIATION_FLUX_Y(Teton::PREFIX + "radiation_flux_y"); +const std::string Teton::FIELD_RADIATION_FLUX_Z(Teton::PREFIX + "radiation_flux_z"); +const std::string Teton::FIELD_RADIATION_FLUX_R(Teton::PREFIX + "radiation_flux_r"); + +// This field is handled similiar to FIELD_RADIATION_TEMPERATURE. +const std::string Teton::FIELD_MATERIAL_TEMPERATURE(Teton::PREFIX + "material_temperature"); + +const std::string Teton::TOPO_MAIN("main"); +const std::string Teton::TOPO_BOUNDARY("boundary"); + +Teton::Teton() + : mDTrad(0.), + areSourceProfilesSet(false), + mIsInitialized(false), + mGTAorder(2), + mInternalComptonFlag((int) tetonComptonFlag::none), + mCommunicator(MPI_COMM_WORLD), + mRank(0), + mCornerToVertex(), + mZoneToNCorners(), + mZoneToCorners(), + mCornerToZone(), + mMapBackFields(), + mMCArrays(), + mRadiationForceDensityFields(), + mRadiationFluxFields() +{ +} + Teton::~Teton() { if (mIsInitialized) @@ -60,7 +141,31 @@ Teton::~Teton() } } teton_destructmeshdata(&enableNLTE); + + teton_destructmemoryallocator(); + } +} + +int Teton::getVerbose() const +{ + const conduit::Node &options = getOptions(); + int verbose = 0; + if (options.has_path("verbose")) + { + verbose = options.fetch_existing("verbose").value(); + // Initialize from the environment if TETON_VERBOSE is set. + if (getenv("TETON_VERBOSE") != nullptr) + verbose = atoi(getenv("TETON_VERBOSE")); + } + else + { + // Initialize from the environment if TETON_VERBOSE is set. + if (getenv("TETON_VERBOSE") != nullptr) + { + verbose = atoi(getenv("TETON_VERBOSE")); + } } + return verbose; } void Teton::initialize(MPI_Comm communicator, bool fromRestart) @@ -76,41 +181,63 @@ void Teton::initialize(MPI_Comm communicator, bool fromRestart) conduit::Node &datastore = getDatastore(); conduit::Node &options = getOptions(); conduit::Node &blueprint = getMeshBlueprint(); + conduit::Node &part = getMeshBlueprintPart(); int verbose = 0; - if (options.has_path("verbose")) - { - verbose = options.fetch_existing("verbose").value(); - if (verbose && mRank == 0) - std::cout << "Teton: setting verbosity to " << verbose << std::endl; - } - else - { - options["verbose"] = verbose; - } + options["verbose"] = verbose = getVerbose(); + if (verbose && mRank == 0) + std::cout << "Teton: setting verbosity to " << verbose << std::endl; if (verbose >= 2) { + // Save parameters. if (mRank == 0) { std::cerr << "Teton: Dump copy of input..." << std::endl; } conduit::relay::io::save(options, "parameters_input_" + std::to_string(mRank) + ".conduit_json", "conduit_json"); conduit::relay::io::save(options, "parameters_input_" + std::to_string(mRank) + ".json", "json"); - conduit::blueprint::mesh::paint_adjset("main_adjset", "main_adjset", blueprint); + + // Save mesh. + if (verbose >= 3) + conduit::blueprint::mesh::paint_adjset("main_adjset", "main_adjset", blueprint); conduit::relay::io::save(blueprint, "mesh_input_" + std::to_string(mRank) + ".conduit_json", "conduit_json"); conduit::relay::io::save(blueprint, "mesh_input_" + std::to_string(mRank) + ".json", "json"); +#if defined(PARTITION_DEBUG) && defined(CONDUIT_RELAY_IO_HDF5_ENABLED) +#pragma message "Saving blueprint_initialize." + if (mRank == 0) + { + std::cerr << "Teton: Save mesh..." << std::endl; + } + MPI_Barrier(communicator); + // Save to a plottable file. + conduit::relay::mpi::io::blueprint::save_mesh(blueprint, "blueprint_initialize", "hdf5", communicator); +#endif } - // Create secondary (corner) mesh topology and connectivity arrays. +#if !defined(TETON_ENABLE_MINIAPP_BUILD) + // Make sure that the blueprint mesh has temperature fields on it because + // we need to map these values back. + createRadiationTemperature(); + createMaterialTemperature(); +#endif + initializeRadiationFluxFieldNames(); + initializeRadiationForceDensityFieldNames(); + + // Partition the mesh, if necessary. This migrates all fields to the partition mesh. + partition(fromRestart); + // The "part" node now contains partitioned data. + + // Create secondary (corner) mesh topology and connectivity arrays, using the part mesh. CALI_MARK_BEGIN("Teton_Construct_Corner_Mesh"); - TetonBlueprint blueprintHelper(blueprint, options); - blueprintHelper.OutputTetonMesh(mRank, communicator); + TetonBlueprint blueprintHelper(part, options); + blueprintHelper.OutputTetonMesh(mRank, mCommunicator); CALI_MARK_END("Teton_Construct_Corner_Mesh"); if (verbose >= 2) { - conduit::blueprint::mesh::paint_adjset("main_corner", "corner_adjset", blueprint); + if (verbose >= 3) + conduit::blueprint::mesh::paint_adjset("main_corner", "corner_adjset", part); if (mRank == 0) { std::cerr << "Teton: Dump blueprint with generated topologies..." << std::endl; @@ -184,6 +311,36 @@ void Teton::initialize(MPI_Comm communicator, bool fromRestart) options["sweep/kernel/version"] = 0; } + // Initialize default numbers of hyper domains, if not provided. + // Valid values are: + // 0 - Indicates user did not set number of hyper-domains. Teton will automatically determine a number. (default) + // >= 1 - Overrides Teton's automatic setting with this number. + // Number of sweep hyper-domains: + if (!options.has_path("sweep/sn/numhyperdomains")) + { + // Default to 0 to allow Teton's automatic algorithm to determine a number. + options["sweep/sn/numhyperdomains"] = 0; + } + else if (options["sweep/sn/numhyperdomains"].as_int() < 0) + { + std::cerr + << "Teton: Invalid number of sweep hyper-domains, please set options/sweep/sn/numhyperdomains to 0 for automatic setting, or a value >= 1 ..." + << std::endl; + } + + // Number of new GTA hyper-domains: + if (!options.has_path("sweep/gta/numhyperdomains")) + { + // Default to 0 to allow Teton's automatic algorithm to determine a number. + options["sweep/gta/numhyperdomains"] = 0; + } + else if (options["sweep/gta/numhyperdomains"].as_int() < 0) + { + std::cerr + << "Teton: Invalid number of new GTA hyper-domains, please set options/sweep/gta/numhyperdomains to 0 for automatic setting, or a value >= 1 ..." + << std::endl; + } + if (blueprint.has_path("fields/absorption_opacity/values")) { updateOpacity(); @@ -202,15 +359,15 @@ void Teton::initialize(MPI_Comm communicator, bool fromRestart) void Teton::storeMeshData() { conduit::Node &options = getOptions(); - conduit::Node &blueprint = getMeshBlueprint(); + conduit::Node &part = getMeshBlueprintPart(); // To compute the radiation forces, Teton needs to hang on to // this connectivity array - if (blueprint.has_path("arrays/corner_to_vertex")) + if (part.has_path("arrays/corner_to_vertex")) { int ncornr = options.fetch_existing("size/ncornr").to_int(); mCornerToVertex.resize(ncornr); - int *corner_to_vert_ptr = blueprint.fetch_existing("arrays/corner_to_vertex").value(); + int *corner_to_vert_ptr = part.fetch_existing("arrays/corner_to_vertex").value(); for (int c = 0; c < ncornr; ++c) { // Store the vertex ID corresponding to this corner ID. @@ -220,42 +377,32 @@ void Teton::storeMeshData() } // To compute the radiation forces, Teton also needs to hang on to // this connectivity array - if (blueprint.has_path("relations/corner_to_zone")) + if (part.has_path("relations/corner_to_zone")) { int ncornr = options.fetch_existing("size/ncornr").to_int(); - int *corner_to_zone_ptr = blueprint.fetch_existing("relations/corner_to_zone").value(); + int *corner_to_zone_ptr = part.fetch_existing("relations/corner_to_zone").value(); mCornerToZone.resize(ncornr); for (int c = 0; c < ncornr; ++c) { mCornerToZone[c] = corner_to_zone_ptr[c]; } } - if (blueprint.has_path("arrays/zone_to_ncorners")) + + if (part.has_path("arrays/zone_to_ncorners")) { - int *zone_to_ncorner_ptr = blueprint.fetch_existing("arrays/zone_to_ncorners").value(); - int nzones = options.fetch_existing("size/nzones").to_int(); - mZoneToNCorners.resize(nzones); - for (int zone = 0; zone < nzones; ++zone) - { - int ncorners = zone_to_ncorner_ptr[zone]; - mZoneToNCorners[zone] = ncorners; - } + const conduit::Node &zones_to_ncorners = part.fetch_existing("arrays/zone_to_ncorners"); + const int *zone_to_ncorner_ptr = zones_to_ncorners.value(); + auto n = static_cast(zones_to_ncorners.dtype().number_of_elements()); + mZoneToNCorners.resize(n); + memcpy(&mZoneToNCorners[0], zone_to_ncorner_ptr, n * sizeof(int)); } - if (blueprint.has_path("arrays/zone_to_corners")) + if (part.has_path("arrays/zone_to_corners")) { - int *zone_to_corner_ptr = blueprint.fetch_existing("arrays/zone_to_corners").value(); - int corner_counter = 0; - int nzones = options.fetch_existing("size/nzones").to_int(); - for (int zone = 0; zone < nzones; ++zone) - { - int ncorners = mZoneToNCorners[zone]; - for (int c = 0; c < ncorners; ++c) - { - int corner = zone_to_corner_ptr[corner_counter]; - mZoneToCorners.push_back(corner); - corner_counter += 1; - } - } + const conduit::Node &zones_to_corners = part.fetch_existing("arrays/zone_to_corners"); + const int *zone_to_corner_ptr = zones_to_corners.value(); + auto n = static_cast(zones_to_corners.dtype().number_of_elements()); + mZoneToCorners.resize(n); + memcpy(&mZoneToCorners[0], zone_to_corner_ptr, n * sizeof(int)); } } @@ -307,9 +454,9 @@ void Teton::constructBoundaries() if (ndim > 1) { int numBCTotal = options.fetch_existing("boundary_conditions/num_total").value(); - int *BCTypeInt = options.fetch_existing("boundary_conditions/type").as_int_ptr(); - int *BCCornerFaces = options.fetch_existing("boundary_conditions/corner_face_ids").as_int_ptr(); - int *BCNeighborID = options.fetch_existing("boundary_conditions/neighbor_ids").as_int_ptr(); + int *BCTypeInt = options.fetch_existing("boundary_conditions/type").value(); + int *BCCornerFaces = options.fetch_existing("boundary_conditions/corner_face_ids").value(); + int *BCNeighborID = options.fetch_existing("boundary_conditions/neighbor_ids").value(); TETON_VERIFY_C(mRank, (numBCTotal > 0), "No boundary conditions defined."); @@ -317,9 +464,9 @@ void Teton::constructBoundaries() } else { - int *BCTypeInt = options.fetch_existing("boundary_conditions/type").as_int_ptr(); - int *BCNeighborID = options.fetch_existing("boundary_conditions/neighbor_ids").as_int_ptr(); - int *BCCornerFaces = options.fetch_existing("boundary_conditions/bc_ncorner_faces").as_int_ptr(); + int *BCTypeInt = options.fetch_existing("boundary_conditions/type").value(); + int *BCNeighborID = options.fetch_existing("boundary_conditions/neighbor_ids").value(); + int *BCCornerFaces = options.fetch_existing("boundary_conditions/bc_ncorner_faces").value(); int numBCTotal = 2; teton_addboundary(&numBCTotal, &BCTypeInt[0], &BCCornerFaces[0], &BCNeighborID[0]); @@ -503,9 +650,10 @@ void Teton::dump(MPI_Comm communicator, std::string path) { // This is defined in conduit_relay_config.h #if defined(CONDUIT_RELAY_IO_HDF5_ENABLED) - conduit::Node &blueprint = getMeshBlueprint(); + // NOTE: this routine saves the partitioned mesh given to Teton. + conduit::Node &part = getMeshBlueprintPart(); std::string file_protocol = "hdf5"; - conduit::relay::mpi::io::blueprint::save_mesh(blueprint, path + "/blueprint_mesh", file_protocol, communicator); + conduit::relay::mpi::io::blueprint::save_mesh(part, path + "/blueprint_mesh", file_protocol, communicator); #else std::cerr << " Teton: Unable to dump mesh blueprint viz file. Conduit was not built with HDF5 support." << std::endl; @@ -517,9 +665,16 @@ void Teton::dump(MPI_Comm communicator, std::string path) double Teton::step(int cycle) { +#if defined(PARTITION_DEBUG) + MPI_Barrier(mCommunicator); + std::stringstream cs; + cs << "Teton::step " << cycle; + utilities::Banner b(mCommunicator, cs.str()); +#endif conduit::Node &datastore = getDatastore(); conduit::Node &options = getOptions(); conduit::Node &blueprint = getMeshBlueprint(); + conduit::Node &part = getMeshBlueprintPart(); // TODO - These should be moved and made defaults in conduit node. int maxOSComptonChangeCorner = 1; @@ -604,26 +759,63 @@ double Teton::step(int cycle) // // Calling updateMeshPositions will cause the volume difference from // last cycle to current cycle, that Teton tracks, to be updated. + std::vector updateFields; int mesh_motion = 1; if (options.has_path("mesh_motion")) { mesh_motion = options.fetch_existing("mesh_motion").value(); } + // Determine whether materials are present and required fields. + int materials = 0; + if (blueprint.has_path("fields/thermo_density/values")) + { + materials = 1; + + // Fields required in setMaterial. + updateFields.push_back("thermo_density"); + updateFields.push_back("electron_specific_heat"); + updateFields.push_back("electron_temperature"); + updateFields.push_back("radiation_temperature"); + updateFields.push_back("electron_number_density"); + if (blueprint.has_path("fields/specific_energy_source")) + { + updateFields.push_back("specific_energy_source"); + } + } + int opacity = 0; + if (blueprint.has_path("fields/absorption_opacity/values")) + { + opacity = 1; + + // Fields required in updateOpacities + updateFields.push_back("absorption_opacity"); + updateFields.push_back("scattering_opacity"); + } + + // Update some fields, sending them through the partitioner from the blueprint + // mesh to the partitioned mesh. + std::string mainTopologyName(getMainTopology(part).name()); + sendFieldsOrig2Part(mainTopologyName, updateFields, mesh_motion == 1); + + // Now, do the updates using the partitiond data. if (mesh_motion) { - updateMeshPositions(); + // Partitioning is not necessary since if mesh coordinates were + // updated, it happened above in sendFieldsOrig2Part. + const bool dopartition = false; + updateMeshPositions(dopartition); } //setMeshVelocity(); // This updates the material properties (other than the opacities) // TODO Add something better than this to check for whether or not // new field values have been provided. - if (blueprint.has_path("fields/thermo_density/values")) + if (materials) { setMaterials(); } - if (blueprint.has_path("fields/absorption_opacity/values")) + if (opacity) { updateOpacity(); } @@ -631,7 +823,6 @@ double Teton::step(int cycle) // Set the time step information // A host code can either set these values in conduit, or can // setTimeStep() and that function will add these entries. - // cycle = options.fetch_existing("iteration/cycle").value(); double dtrad = options.fetch_existing("iteration/dtrad").value(); double timerad = options.fetch_existing("iteration/timerad").value(); double tfloor = options.fetch_existing("iteration/tfloor").value(); @@ -648,6 +839,7 @@ double Teton::step(int cycle) // Update cycle number in mesh blueprint. This is used by conduit or Visit if we dump this mesh. blueprint["state/cycle"] = cycle; + part["state/cycle"] = cycle; // If sanitizer level is provided: if (options.has_path("iteration/sanitizer/level")) @@ -668,38 +860,84 @@ double Teton::step(int cycle) // Main function in Teton to take a radiation step teton_radtr(); + // Update the radiation force (if the field is present) #if !defined(TETON_ENABLE_MINIAPP_BUILD) // Note that, if the radiation force is present, there will always be - // a z component in 2D or 3D ((r,z) or (x,y,z) coordinates). + // a z component in 2D or 3D ((r,z) or (x,y,z) coordinates). Also note + // that we make this check on the original blueprint mesh since that is + // where the host will have declared its field association. // TODO: fix when 1D is added + mMapBackFields.clear(); + bool has_rad_force = blueprint.has_path("fields/radiation_force_z"); std::string rad_force_type; + bool elementAssociation = false; if (has_rad_force) { - rad_force_type = blueprint["fields/radiation_force_z/association"].as_string(); + // The host provided radiation_force_z so we can check association. + rad_force_type = blueprint.fetch_existing("fields/radiation_force_z/association").as_string(); + elementAssociation = rad_force_type == "element"; } - if (has_rad_force && rad_force_type == "element") + + // The radiation_force_* fields are always computed so the getRadiationForceDensity() + // method can work when partitioning is enabled. + // + // Create radiation_force_* fields on part mesh if they do not exist. + // This does nothing if they already exist. + createRadiationForceDensity(part, elementAssociation); + if (elementAssociation) { updateZonalRadiationForce(); } - if (has_rad_force && rad_force_type != "element") + else { updateRadiationForce(); } - // Update the radiation energy deposited to the material - if (blueprint.has_path("fields/electron_energy_deposited/values")) - { - double *electron_energy_deposited = blueprint.fetch_existing("fields/electron_energy_deposited/values").value(); - getRadEnergyDeposited(electron_energy_deposited); - } + // Count number of zones in part topo. + const conduit::Node &part_topo = getMainTopology(part); + const int npart_zones = static_cast(conduit::blueprint::mesh::utils::topology::length(part_topo)); + + // Update the radiation energy deposited to the material. + // Always compute this field so the getRadiationDeposited() method can work when + // partitioning is enabled. + // + // Create field on the part mesh if it does not exist. + createZonalField(part, mainTopologyName, FIELD_ELECTRON_ENERGY_DEPOSITED, npart_zones); + + std::string path(field_values(FIELD_ELECTRON_ENERGY_DEPOSITED)); + double *electron_energy_deposited = part.fetch_existing(path).value(); + getRadEnergyDeposited(electron_energy_deposited, npart_zones); + mMapBackFields.push_back(FIELD_ELECTRON_ENERGY_DEPOSITED); // Update the radiation energy density - if (blueprint.has_path("fields/radiation_energy_density/values")) - { - double *radiation_energy_density = blueprint.fetch_existing("fields/radiation_energy_density/values").value(); + // Check the blueprint mesh in case the host added this field. + if (blueprint.has_path(field_values(FIELD_RADIATION_ENERGY_DENSITY))) + { + // During partitioning, FIELD_RADIATION_ENERGY_DENSITY would have been wrapped + // as an mcarray due to it being a "multigroup" field. However, the partitioner + // output would only be nzones in length. Since we're gathering data for mapback, + // and Teton expects a large contiguous buffer, make sure it is large enough. This + // should be ok since it is the partitioned mesh and we're immediately filling its + // values from Teton. + conduit::Node &red = part[field_path(FIELD_RADIATION_ENERGY_DENSITY)]; + if (doPartitioning()) + { + const int ngr = options.fetch_existing("quadrature/num_groups").to_int(); + const auto expected_elements = static_cast(npart_zones * ngr); + if (red["values"].dtype().number_of_elements() < expected_elements) + { + red["values"].set(conduit::DataType::float64(expected_elements)); + } + // If this is the first time filling out "red" then we may need to also set + // some additional fields. + red["association"] = "element"; + red["topology"] = mainTopologyName; + } + double *radiation_energy_density = red["values"].value(); teton_getradiationenergydensity(radiation_energy_density); + mMapBackFields.push_back(FIELD_RADIATION_ENERGY_DENSITY); } #endif @@ -721,6 +959,26 @@ double Teton::step(int cycle) // This also puts the recommended timestep for the next iteration in mDTrad teton_publishedits(&mDTrad); +#if !defined(TETON_ENABLE_MINIAPP_BUILD) + // Update the temperature result fields. NOTE: this had to come after teton_publishedits. + + // Create field on the part mesh if it does not exist. + createZonalField(part, mainTopologyName, FIELD_RADIATION_TEMPERATURE, npart_zones); + createZonalField(part, mainTopologyName, FIELD_MATERIAL_TEMPERATURE, npart_zones); + + double *radiation_temperature = part.fetch_existing(field_values(FIELD_RADIATION_TEMPERATURE)).value(); + getRadiationTemperature(radiation_temperature, npart_zones); + mMapBackFields.push_back(FIELD_RADIATION_TEMPERATURE); + + double *material_temperature = part.fetch_existing(field_values(FIELD_MATERIAL_TEMPERATURE)).value(); + getMaterialTemperature(material_temperature, npart_zones); + mMapBackFields.push_back(FIELD_MATERIAL_TEMPERATURE); +#endif + + // Migrate partition results to original mesh. + sendFieldsPart2Orig(mainTopologyName, mMapBackFields); + mMapBackFields.clear(); + double MatCoupTimeTotal, SweepTimeTotal, GPUSweepTimeTotal, GTATimeTotal; double RadtrTimeTotal, InitTimeTotal, FinalTimeTotal, timeNonRad = 0.0, timeOther = 0.0; @@ -745,6 +1003,40 @@ double Teton::step(int cycle) std::string dtmsg = dtmessage_ptr; options["iteration/dtcontrol/message"] = dtmsg; +#if defined(PARTITION_DEBUG) + MPI_Barrier(mCommunicator); + bool testing = false; + if (getenv("TETON_TESTING") != nullptr) + { + testing = atoi(getenv("TETON_TESTING")) > 0; + if (testing) + { + // Test the results that have been computed in the cycle, store them. + bool makeBaselines = getenv("TETON_TESTING_MAKE_BASELINES") != nullptr; + const int flags = Test_RadiationForceDensity | Test_RadiationTemperature | Test_ReconstructPsi; + conduit::Node n; + const std::string fileBase = makeTestNode(n, getDatastore(), getMeshBlueprint(), getOptions(), flags); + testing::test(n, fileBase, cycle, makeBaselines, mCommunicator); + + // Save the blueprint in a form we can look at in VisIt so we can compare baseline vs current. + int verbose = getVerbose(); + if (verbose >= 2) + { + std::string name = makeBaselines ? "baseline" : "current"; + add_mcarray_fields(blueprint); + conduit::relay::mpi::io::blueprint::save_mesh(blueprint, name, "hdf5", mCommunicator); + remove_mcarray_fields(blueprint); + + if (doPartitioning()) + { + std::string namep = makeBaselines ? "baseline_part" : "current_part"; + conduit::relay::mpi::io::blueprint::save_mesh(part, namep, "hdf5", mCommunicator); + } + } + } + } +#endif + return mDTrad; } @@ -818,12 +1110,16 @@ void Teton::computeGenericSurfaceFluxTally() // The SURFACE information is in blueprint. // The other details of the tally (shape, groups, frame, etc.) live in options. conduit::Node &blueprint = getMeshBlueprint(); + conduit::Node &part = getMeshBlueprintPart(); conduit::Node &options = getOptions(); - if (blueprint.has_path("teton/surface_edits")) + // NOTE: teton/surface_edits is created in TetonBlueprint::ProcessSurfaceEdits and + // all ranks will contain teton/surface_edits in the partitioned mesh, though + // some of the arrays may be empty if the rank had no faces. + if (part.has_path("teton/surface_edits")) { - conduit::Node &surface_edit_options_all = options["surface_edits"]; - conduit::Node &surface_edit_blueprint_all = blueprint["teton/surface_edits"]; + conduit::Node &surface_edit_options_all = options.fetch_existing("surface_edits"); + conduit::Node &surface_edit_blueprint_all = part.fetch_existing("teton/surface_edits"); conduit::NodeConstIterator surface_edit_blueprint_it = surface_edit_blueprint_all.children(); while (surface_edit_blueprint_it.has_next()) { @@ -831,9 +1127,15 @@ void Teton::computeGenericSurfaceFluxTally() std::string surface_edit_name = surface_info.name(); const conduit::Node &surface_edit_option = surface_edit_options_all[surface_edit_name]; // options for tallying - const int *corners_ptr = surface_info.fetch_existing("corners").as_int_ptr(); - const int *local_zone_faces_ptr = surface_info.fetch_existing("local_zone_faces").as_int_ptr(); - const int num_corner_faces = surface_info["corners"].dtype().number_of_elements(); + int tmp = 0; + const conduit::Node &corners = surface_info.fetch_existing("corners"); + const int num_corner_faces = corners.dtype().number_of_elements(); + const int *corners_ptr = (num_corner_faces > 0) ? corners.as_int_ptr() : &tmp; + + const conduit::Node &local_zone_faces = surface_info.fetch_existing("local_zone_faces"); + const int *local_zone_faces_ptr = (local_zone_faces.dtype().number_of_elements() > 0) + ? local_zone_faces.as_int_ptr() + : &tmp; const bool transform_to_lab_frame = surface_edit_option["transform_to_lab_frame"].as_int(); const bool apply_time_shift = surface_edit_option["apply_time_shift"].as_int(); @@ -859,6 +1161,9 @@ void Teton::computeGenericSurfaceFluxTally() const double scale_tally = surface_edit_option.fetch_existing("scale_tally").as_double(); const bool calculate_error_metrics = surface_edit_option.fetch_existing("calculate_error_metrics").as_int(); + // The tally result arrays were passed in from the host code as fields on the + // surface mesh, though it is not really a field for the surface mesh. Each + // field is a non-spatial results array that is the same size on all ranks. double *tally = blueprint["fields/" + surface_edit_name + "_tallies/values"].as_double_ptr(); double *tally_incident = nullptr; double *error_est_shift = nullptr; @@ -874,6 +1179,7 @@ void Teton::computeGenericSurfaceFluxTally() .as_double_ptr(); } + // NOTE: This function performs global reductions on the output fields. teton_surfaceedit(&num_corner_faces, &transform_to_lab_frame, corners_ptr, @@ -1092,7 +1398,6 @@ void Teton::setSourceProfiles() // all interior (volumetric?) sources: const conduit::Node &profiles_node = options["sources/profiles"]; conduit::NodeConstIterator interior_sources_it = options["sources/interior_sources"].children(); - int isrc = 0; while (interior_sources_it.has_next()) { const conduit::Node &src_node = interior_sources_it.next(); @@ -1105,8 +1410,29 @@ void Teton::setSourceProfiles() // const double* location = src_node["location"].as_double_ptr(); // TODO convert coordinate to zone index int source_rank = src_node.fetch_existing("rank").to_int(); // Rank that contains the point source - int teton_zone_index = (mRank == source_rank) ? src_node.fetch_existing("zone_index").to_int() : -1; - // int teton_zone_index = (mRank == 3) ? (isrc == 0 ? 4 : 8) : -1; + int teton_zone_index = src_node.fetch_existing("zone_index").to_int(); + int teton_part_zone_index = -1; + + if (doPartitioning()) + { + // The original rank and zone index were passed in for the source. The + // zone index identifies the zone that contains the source point. If we + // repartitioned, this could be a different rank and zone. We need to + // let the new owner rank return the new zone index and have everyone + // else return -1 for the zone index since they do not own it. + // + // NOTE: Any new sources that get implemented here would need to map + // their zone ids too to be compatible with partitioning. + int origDomZone[2] = {source_rank, teton_zone_index}; + int partDomZone[2] = {-1, -1}; + zoneLookupOrig2Part(origDomZone, partDomZone); + teton_part_zone_index = partDomZone[1]; + } + else + { + teton_part_zone_index = (mRank == source_rank) ? teton_zone_index : -1; + } + double multiplier = 1.0; if (src_node.has_path("multiplier")) multiplier = src_node.fetch_existing("multiplier").to_double(); @@ -1115,7 +1441,7 @@ void Teton::setSourceProfiles() { std::string tally_file_name = profile_node["filename"].as_string(); std::string tally_name = profile_node["tallyname"].as_string(); - mSourceManager.AddPointSourceFromTally(teton_zone_index, tally_file_name, tally_name, multiplier); + mSourceManager.AddPointSourceFromTally(teton_part_zone_index, tally_file_name, tally_name, multiplier); } // else if (profile_type == "isotropic") // { // TODO generic isotropic group-dependent @@ -1131,23 +1457,59 @@ void Teton::setSourceProfiles() std::cerr << "Unsupported source spatial shape " << spatial_shape << std::endl; exit(1); } - isrc++; + } +} + +void Teton::zoneLookupOrig2Part(int originalDomZone[2], int partDomZone[2]) const +{ + if (doPartitioning()) + { + const conduit::Node &part = getMeshBlueprintPart(); + const std::string mainTopologyName(getMainTopology(part).name()); + std::string vkey = "fields/" + mainTopologyName + "_original_element_ids/values"; + const conduit::Node &vnode = part.fetch_existing(vkey); + const auto orig_domains = vnode.fetch_existing("domains").as_int_accessor(); + const auto orig_zones = vnode.fetch_existing("ids").as_int_accessor(); + const conduit::index_t n = orig_domains.number_of_elements(); + + // Indicate not found. + partDomZone[0] = -1; + partDomZone[1] = -1; + + const int originalZone0 = originalDomZone[1] - 1; + for (conduit::index_t i = 0; i < n; i++) + { + // Compare orig_zones against zero-origin zone number since the array + // stores zero-origin zone ids. + if (orig_domains[i] == originalDomZone[0] && orig_zones[i] == originalZone0) + { + // This rank contains the zone we're looking for. Return the new rank, zone index. + partDomZone[0] = mRank; + partDomZone[1] = static_cast(i) + 1; // 1-origin zone + break; + } + } + } + else + { + // Return the inputs as partitioning did not occur. + partDomZone[0] = originalDomZone[0]; + partDomZone[1] = originalDomZone[1]; } } void Teton::setMeshSizeAndPositions() { conduit::Node &options = getOptions(); - conduit::Node &blueprint = getMeshBlueprint(); + conduit::Node &part = getMeshBlueprintPart(); int ndim = options.fetch_existing("size/ndim").value(); int nzones = options.fetch_existing("size/nzones").value(); if (ndim > 1) { - double *zone_verts_ptr = blueprint.fetch_existing("arrays/zone_verts").value(); - //int *ncorners_ptr = blueprint.fetch_existing("fields/ncorners").value(); - int *ncorners_ptr = blueprint.fetch_existing("arrays/zone_to_ncorners").value(); + double *zone_verts_ptr = part.fetch_existing("arrays/zone_verts").value(); + int *ncorners_ptr = part.fetch_existing("arrays/zone_to_ncorners").value(); int ndim = options.fetch_existing("size/ndim").value(); int maxCorner = options.fetch_existing("size/maxCorner").value(); @@ -1171,8 +1533,8 @@ void Teton::setMeshSizeAndPositions() } else { - int nvertices = blueprint["coordsets/coords/values/x"].dtype().number_of_elements(); - double *vertex_coords = blueprint["coordsets/coords/values/x"].value(); + int nvertices = part["coordsets/coords/values/x"].dtype().number_of_elements(); + double *vertex_coords = part["coordsets/coords/values/x"].value(); int nzones = nvertices - 1; std::vector zoneCoordinates(2); for (int zone = 0; zone < nzones; ++zone) @@ -1184,19 +1546,17 @@ void Teton::setMeshSizeAndPositions() teton_setnodeposition(&zoneID, &zoneCoordinates[0]); } } - - return; } void Teton::setMeshVelocity() { conduit::Node &options = getOptions(); - conduit::Node &blueprint = getMeshBlueprint(); + conduit::Node &part = getMeshBlueprintPart(); int nzones = options.fetch_existing("size/nzones").value(); // TODO: change this to conform to blueprint standard - double *velocities_ptr = blueprint.fetch_existing("fields/velocity_at_corners").as_double_ptr(); - int *ncorners_ptr = blueprint.fetch_existing("arrays/zone_to_ncorners").value(); + double *velocities_ptr = part.fetch_existing("fields/velocity_at_corners").as_double_ptr(); + int *ncorners_ptr = part.fetch_existing("arrays/zone_to_ncorners").value(); int ndim = options.fetch_existing("size/ndim").value(); int maxCorner = options.fetch_existing("size/maxCorner").value(); @@ -1224,23 +1584,23 @@ void Teton::setMeshVelocity() void Teton::setCommunication() { conduit::Node &options = getOptions(); - conduit::Node &blueprint = getMeshBlueprint(); + conduit::Node &part = getMeshBlueprintPart(); int nsfaces; int *shared_faces_ptr = nullptr; - if (blueprint.has_path("shared_boundaries/nsfaces")) + if (part.has_path("shared_boundaries/nsfaces")) { - nsfaces = blueprint.fetch_existing("shared_boundaries/nsfaces").to_int(); + nsfaces = part.fetch_existing("shared_boundaries/nsfaces").to_int(); if (nsfaces > 0) { - shared_faces_ptr = blueprint.fetch_existing("shared_boundaries/shared_faces").value(); + shared_faces_ptr = part.fetch_existing("shared_boundaries/shared_faces").value(); } } else // if (options.has_path("shared_boundaries/nsfaces")) { // For backward compatbility - nsfaces = options.fetch_existing("shared_boundaries/nsfaces").value(); + nsfaces = options.fetch_existing("shared_boundaries/nsfaces").to_int(); if (nsfaces > 0) { shared_faces_ptr = options.fetch_existing("shared_boundaries/shared_faces").value(); @@ -1279,11 +1639,11 @@ void Teton::setMeshConnectivity() CALI_CXX_MARK_FUNCTION; conduit::Node &options = getOptions(); - conduit::Node &blueprint = getMeshBlueprint(); + conduit::Node &part = getMeshBlueprintPart(); int nzones = options.fetch_existing("size/nzones").value(); - std::string coord_type = blueprint.fetch_existing("coordsets/coords/type").as_string(); + std::string coord_type = part.fetch_existing("coordsets/coords/type").as_string(); if (coord_type == "rectilinear") { // Teton expects two boundary conditions, one for zone1D == 1 and one for zoneID == nzones @@ -1299,7 +1659,7 @@ void Teton::setMeshConnectivity() else { int connect_off_set = 0; - int *connectivity_ptr = blueprint.fetch_existing("teton/arrays/corner_connectivity").value(); + int *connectivity_ptr = part.fetch_existing("teton/arrays/corner_connectivity").value(); for (int zone = 0; zone < nzones; ++zone) { int zoneID = connectivity_ptr[connect_off_set]; @@ -1357,7 +1717,7 @@ void Teton::setMeshConnectivity() &FaceToBCList[0]); } - blueprint.remove("teton/arrays/corner_connectivity"); + part.remove("teton/arrays/corner_connectivity"); } } @@ -1365,20 +1725,20 @@ void Teton::setMaterials() { conduit::Node &datastore = getDatastore(); conduit::Node &options = getOptions(); - conduit::Node &blueprint = getMeshBlueprint(); + conduit::Node &part = getMeshBlueprintPart(); int nzones = options.fetch_existing("size/nzones").value(); - double *density_ptr = blueprint.fetch_existing("fields/thermo_density/values").value(); - double *cv_ptr = blueprint.fetch_existing("fields/electron_specific_heat/values").value(); - double *tez_ptr = blueprint.fetch_existing("fields/electron_temperature/values").value(); - double *trz_ptr = blueprint.fetch_existing("fields/radiation_temperature/values").value(); - double *nez_ptr = blueprint.fetch_existing("fields/electron_number_density/values").value(); + double *density_ptr = part.fetch_existing("fields/thermo_density/values").value(); + double *cv_ptr = part.fetch_existing("fields/electron_specific_heat/values").value(); + double *tez_ptr = part.fetch_existing("fields/electron_temperature/values").value(); + double *trz_ptr = part.fetch_existing("fields/radiation_temperature/values").value(); + double *nez_ptr = part.fetch_existing("fields/electron_number_density/values").value(); // Really the effective electron specific energy source. - if (blueprint.has_path("fields/specific_energy_source")) + if (part.has_path("fields/specific_energy_source")) { - double *matSource = blueprint.fetch_existing("fields/specific_energy_source/values").value(); + double *matSource = part.fetch_existing("fields/specific_energy_source/values").value(); teton_setmaterialsource(matSource); } @@ -1408,14 +1768,11 @@ void Teton::updateOpacity() { #if !defined(TETON_ENABLE_MINIAPP_BUILD) conduit::Node &options = getOptions(); - conduit::Node &blueprint = getMeshBlueprint(); + conduit::Node &part = getMeshBlueprintPart(); - int ngr = options.fetch_existing("quadrature/num_groups").to_int(); //coerce from unsigned int or size_t - int ig; - std::vector siga_loc; - std::vector sigs_loc; - siga_loc.resize(ngr); - sigs_loc.resize(ngr); + // NOTE: nzones in this case is on the part mesh. + conduit::index_t nzones = options.fetch_existing("size/nzones").to_index_t(); + conduit::index_t ngroups = options.fetch_existing("quadrature/num_groups").to_index_t(); bool useInternalSigmaS = false; if (options.has_path("compton/use_internal_sigma_s")) @@ -1427,26 +1784,45 @@ void Teton::updateOpacity() // zero out opacities teton_initopacity(); - int nzones = options.fetch_existing("size/nzones").value(); - double *absorption_opacity_ptr = blueprint.fetch_existing("fields/absorption_opacity/values").value(); - double *scattering_opacity_ptr = nullptr; + // These fields would have been turned into mcarrays if partitioning. We + // get them as though they might be mcarrays and then use NDAccessor to + // get the (zone,ig) element data. Note though that we are getting them + // from the part mesh and even if the mcarrays do not exist on the blueprint + // mesh at present, they will be on the part mesh. + auto n_absorption_opacity = const_cast(fetch_mcarray(part, "absorption_opacity")); + std::vector dims{{"zone", nzones}, {"group", ngroups}}; + utilities::NDAccessor absorption_opacity(n_absorption_opacity["values"], dims, doInterleave("absorption_opacity")); + std::vector siga_loc(ngroups, 0), sigs_loc(ngroups, 0); if (useTableSigmaS) { - scattering_opacity_ptr = blueprint.fetch_existing("fields/scattering_opacity/values").value(); + auto n_scattering_opacity = const_cast(fetch_mcarray(part, "scattering_opacity")); + utilities::NDAccessor scattering_opacity(n_scattering_opacity["values"], + dims, + doInterleave("scattering_opacity")); + for (conduit::index_t zone = 0; zone < nzones; zone++) + { + for (conduit::index_t ig = 0; ig < ngroups; ig++) + { + std::vector idx{zone, ig}; + siga_loc[ig] = absorption_opacity(idx); + sigs_loc[ig] = scattering_opacity(idx); + } + int zoneID = zone + 1; + teton_setopacity(&zoneID, &siga_loc[0], &sigs_loc[0], &useTableSigmaS); + } } - - // Initialize opacities to handle multi-material zones - int offset = 0; - for (int zone = 0; zone < nzones; zone++) + else { - for (ig = 0; ig < ngr; ig++) + for (conduit::index_t zone = 0; zone < nzones; zone++) { - siga_loc[ig] = absorption_opacity_ptr[offset]; - sigs_loc[ig] = useTableSigmaS ? scattering_opacity_ptr[offset] : 0.; - offset += 1; + for (conduit::index_t ig = 0; ig < ngroups; ig++) + { + siga_loc[ig] = absorption_opacity(std::vector{zone, ig}); + sigs_loc[ig] = 0.; + } + int zoneID = zone + 1; + teton_setopacity(&zoneID, &siga_loc[0], &sigs_loc[0], &useTableSigmaS); } - int zoneID = zone + 1; - teton_setopacity(&zoneID, &siga_loc[0], &sigs_loc[0], &useTableSigmaS); } if (not useTableSigmaS) @@ -1592,13 +1968,6 @@ void Teton::checkpointFinished() node.reset(); } -double Teton::getMaterialTemperature(int zone) -{ - double matTemp; - teton_getmaterialtemperature(&zone, &matTemp); - return matTemp; -} - conduit::Node &Teton::getDatastore() { return *teton_get_datastore_cptr(); @@ -1609,38 +1978,75 @@ const conduit::Node &Teton::getDatastore() const return *teton_get_datastore_cptr(); } -double Teton::getRadiationTemperature(int zone) +double Teton::getRadiationTemperature(int zone) const +{ + // This used to call teton_getradiationtemperature directly but we get the + // results from the field to support partitioned meshes. + + const conduit::Node &blueprint = getMeshBlueprint(); + auto acc = blueprint.fetch_existing(field_values(FIELD_RADIATION_TEMPERATURE)).as_double_accessor(); + int zone0 = zone - 1; + return acc[zone0]; +} + +double Teton::getMaterialTemperature(int zone) const { - double radTemp; - teton_getradiationtemperature(&zone, &radTemp); - return radTemp; + // This used to call teton_getmaterialtemperature directly but we get the + // results from the field to support partitioned meshes. + + const conduit::Node &blueprint = getMeshBlueprint(); + auto acc = blueprint.fetch_existing(field_values(FIELD_MATERIAL_TEMPERATURE)).as_double_accessor(); + int zone0 = zone - 1; + return acc[zone0]; } -double Teton::getRadiationDeposited(int zone) +double Teton::getRadiationDeposited(int zone) const { - double eDep, tRad; - teton_getradiationdeposited(&zone, &eDep, &tRad); - return eDep; + // This used to call teton_getradiationdeposited directly but we get the + // results from the field to support partitioned meshes. + + const conduit::Node &blueprint = getMeshBlueprint(); + auto acc = blueprint.fetch_existing(field_values(FIELD_ELECTRON_ENERGY_DEPOSITED)).as_double_accessor(); + int zone0 = zone - 1; + return acc[zone0]; } void Teton::setTimeStep(int cycle, double dtrad, double timerad) { conduit::Node &options = getOptions(); + conduit::Node &blueprint = getMeshBlueprint(); + conduit::Node &part = getMeshBlueprintPart(); options["iteration/cycle"] = cycle; options["iteration/dtrad"] = dtrad; options["iteration/timerad"] = timerad; // Used by conduit or Visit if mesh is dumped for viz purposes. - conduit::Node &blueprint = getMeshBlueprint(); - blueprint["state/cycle"] = cycle; + part["state/cycle"] = cycle; } void Teton::updateMeshPositions() { + // This method is public it gets called by client codes. We need to ensure + // that the part mesh gets its coordinates updated from the blueprint mesh. + const bool doPartition = true; + updateMeshPositions(doPartition); +} + +void Teton::updateMeshPositions(bool doPartition) +{ + CALI_CXX_MARK_FUNCTION; + conduit::Node &options = getOptions(); - conduit::Node &blueprint = getMeshBlueprint(); + conduit::Node &part = getMeshBlueprintPart(); + + // If we're partitioning, update the coordinates. + if (doPartition) + { + std::string mainTopologyName(getMainTopology(part).name()); + sendFieldsOrig2Part(mainTopologyName, std::vector{}, true); + } int nzones = mZoneToNCorners.size(); int ndim = options.fetch_existing("size/ndim").value(); @@ -1650,7 +2056,12 @@ void Teton::updateMeshPositions() // MARBL is providing a zone_verts array directly, at the moment. This array is a listing of the zone vertices, // in the same order as the corners in each zone. Use that if it is present, otherwise generate it from the // blueprint coords. - if (!blueprint.has_path("arrays/zone_verts")) + // + // NOTE: If doing partitioning then there will be no arrays/zone_verts since + // it would have been supplied in the "blueprint" node instead of the + // "blueprint_partition" node. Thus, we'll make the node using data in + // the part mesh. + if (!part.has_path("arrays/zone_verts")) { int corner_counter = 0; int zoneVertsSize = 0; @@ -1662,32 +2073,32 @@ void Teton::updateMeshPositions() if (ndim == 1) { - m_x = blueprint.fetch_existing("coordsets/coords/values/x").value(); + m_x = part.fetch_existing("coordsets/coords/values/x").value(); } else if (ndim == 2) { - if (blueprint.has_path("coordsets/coords/values/r")) + if (part.has_path("coordsets/coords/values/r")) { - m_r = blueprint.fetch_existing("coordsets/coords/values/r").value(); + m_r = part.fetch_existing("coordsets/coords/values/r").value(); } else { // assuming zr ordering for marbl/ares as fallback. EVERYONE should just specify r and z directly. - m_r = blueprint.fetch_existing("coordsets/coords/values/y").value(); + m_r = part.fetch_existing("coordsets/coords/values/y").value(); } - if (blueprint.has_path("coordsets/coords/values/z")) + if (part.has_path("coordsets/coords/values/z")) { - m_z = blueprint.fetch_existing("coordsets/coords/values/z").value(); + m_z = part.fetch_existing("coordsets/coords/values/z").value(); } else { // assuming zr ordering for marbl/ares as fallback. EVERYONE should just specify r and z directly. For now, issue a warning. - m_z = blueprint.fetch_existing("coordsets/coords/values/x").value(); + m_z = part.fetch_existing("coordsets/coords/values/x").value(); } } else if (ndim == 3) { - m_x = blueprint.fetch_existing("coordsets/coords/values/x").value(); - m_y = blueprint.fetch_existing("coordsets/coords/values/y").value(); - m_z = blueprint.fetch_existing("coordsets/coords/values/z").value(); + m_x = part.fetch_existing("coordsets/coords/values/x").value(); + m_y = part.fetch_existing("coordsets/coords/values/y").value(); + m_z = part.fetch_existing("coordsets/coords/values/z").value(); } else { @@ -1733,7 +2144,7 @@ void Teton::updateMeshPositions() corner_counter += 1; } } - blueprint["arrays/zone_verts"].set(zoneVerts.data(), zoneVerts.size()); + part["arrays/zone_verts"].set(zoneVerts.data(), zoneVerts.size()); } setMeshSizeAndPositions(); @@ -1742,233 +2153,336 @@ void Teton::updateMeshPositions() teton_getvolume(); // We're done updating the node positions, we shouldn't need zone_verts anymore. - blueprint.remove("arrays/zone_verts"); + part.remove("arrays/zone_verts"); return; } -// NOTE: the Vectors RadiationForceXTotal, ..., must -// already be sized to the number of mesh vertices -void Teton::getRadiationForceDensity1D(double *RadiationForceDensityX) +const std::vector &Teton::radiationForceDensityFields() const { - // Compute the radiation force internally in Teton - // for each zone and corner - teton_setradiationforce(); + return mRadiationForceDensityFields; +} - conduit::Node &options = getOptions(); - int nzones = options.fetch_existing("size/nzones").value(); - int nverts = nzones + 1; - std::vector RadiationForce(2); - std::vector CornerVolumes(2); - std::vector CornerVolumeSumsAtVertex(nverts); +void Teton::initializeRadiationForceDensityFieldNames() +{ + // Get the number of dimensions from the blueprint mesh since it might not be + // in the options yet. + const conduit::Node &blueprint = getMeshBlueprint(); + std::string csname(getMainTopology(blueprint).fetch_existing("coordset").as_string()); + const conduit::Node &coordset = blueprint.fetch_existing("coordsets/" + csname); + const int ndim = static_cast(conduit::blueprint::mesh::coordset::dims(coordset)); - for (int v = 0; v < nverts; ++v) + mRadiationForceDensityFields.clear(); + mRadiationForceDensityFields.reserve(ndim); + if (ndim == 1) { - CornerVolumeSumsAtVertex[v] = 0.0; - RadiationForceDensityX[v] = 0.0; + mRadiationForceDensityFields.emplace_back(FIELD_RADIATION_FORCE_X); } - - for (int zone = 0; zone < nzones; ++zone) + else if (ndim == 2) { - // Get the radiation force and volume on each corner of each zone - int zoneID = zone + 1; - teton_getradiationforce(&zoneID, &RadiationForce[0]); - teton_getcornervolumes(&zoneID, &CornerVolumes[0]); - - int v1 = zone; - int v2 = zone + 1; - RadiationForceDensityX[v1] += RadiationForce[0]; - RadiationForceDensityX[v2] += RadiationForce[1]; - CornerVolumeSumsAtVertex[v1] += CornerVolumes[0]; - CornerVolumeSumsAtVertex[v2] += CornerVolumes[1]; + mRadiationForceDensityFields.emplace_back(FIELD_RADIATION_FORCE_Z); + mRadiationForceDensityFields.emplace_back(FIELD_RADIATION_FORCE_R); } - - for (int v = 0; v < nzones + 1; ++v) + else if (ndim == 3) { - RadiationForceDensityX[v] /= CornerVolumeSumsAtVertex[v]; + mRadiationForceDensityFields.emplace_back(FIELD_RADIATION_FORCE_X); + mRadiationForceDensityFields.emplace_back(FIELD_RADIATION_FORCE_Y); + mRadiationForceDensityFields.emplace_back(FIELD_RADIATION_FORCE_Z); } } -// NOTE: the Vectors RadiationForceXTotal, ..., must -// already be sized to the number of mesh vertices -void Teton::getRadiationForceDensity(double *RadiationForceDensityX, - double *RadiationForceDensityY, - double *RadiationForceDensityZ) +std::vector Teton::radiationForceDensity(conduit::Node &root) const { - conduit::Node &options = getOptions(); - int ndim = options.fetch_existing("size/ndim").value(); - if (ndim == 1) - { - getRadiationForceDensity1D(RadiationForceDensityX); - return; - } - - // Compute the radiation force internally in Teton - // for each zone and corner - teton_setradiationforce(); - - int maxCorner = options.fetch_existing("size/maxCorner").value(); - int nzones = options.fetch_existing("size/nzones").value(); - int nverts = options.fetch_existing("size/nverts").value(); - std::vector RadiationForce(ndim * maxCorner); - std::vector CornerVolumes(maxCorner); - std::vector CornerVolumeSumsAtVertex(nverts); - int corner_counter = 0; - - for (int v = 0; v < nverts; ++v) + std::vector ptrs; + const auto names = radiationForceDensityFields(); + for (const auto &name : names) { - CornerVolumeSumsAtVertex[v] = 0.0; - RadiationForceDensityX[v] = 0.0; - RadiationForceDensityY[v] = 0.0; - if (ndim == 3) - RadiationForceDensityZ[v] = 0.0; + double *d = root.fetch_existing(field_values(name)).value(); + ptrs.push_back(d); } + return ptrs; +} - for (int zone = 0; zone < nzones; ++zone) - { - // Get the radiation force and volume on each corner of each zone - int zoneID = zone + 1; - teton_getradiationforce(&zoneID, &RadiationForce[0]); - teton_getcornervolumes(&zoneID, &CornerVolumes[0]); +void Teton::createRadiationForceDensity(conduit::Node &root, bool elementAssociation) +{ + // NOTE: If we actually create these force fields then it could cause + // updateRadiationForce() to be called when it otherwise might + // not have been, as in the case where there were no force fields. + + // Make sure the radiation_force_ paths exist on the blueprint mesh since + // they will be queried in getRadiationForceDensity(). + std::string mainTopologyName(getMainTopology(root).name()); + const auto names = radiationForceDensityFields(); + conduit::index_t nnodes = 0, nzones = 0; + for (const auto &name : names) + { + // Determine number of nodes. + if (nnodes == 0) + { + const conduit::Node &topo = root.fetch_existing("topologies/" + mainTopologyName); + std::string csname(topo.fetch_existing("coordset").as_string()); + const conduit::Node &coordset = root.fetch_existing("coordsets/" + csname); + nnodes = conduit::blueprint::mesh::coordset::length(coordset); + nzones = conduit::blueprint::mesh::topology::length(topo); + } - // Average the radiation force around vertices - int ncorners = mZoneToNCorners[zone]; - for (int c = 0; c < ncorners; ++c) + const auto path = field_path(name); + if (!root.has_path(path) && nnodes > 0 && nzones > 0) { - int cornerID = mZoneToCorners[corner_counter]; - int vertexID = mCornerToVertex[cornerID]; - corner_counter += 1; - RadiationForceDensityX[vertexID] += RadiationForce[c * ndim + 0]; - if (ndim > 1) - RadiationForceDensityY[vertexID] += RadiationForce[c * ndim + 1]; - if (ndim == 3) - RadiationForceDensityZ[vertexID] += RadiationForce[c * ndim + 2]; - CornerVolumeSumsAtVertex[vertexID] += CornerVolumes[c]; + conduit::index_t nvalues = elementAssociation ? nzones : nnodes; + conduit::Node &f = root[path]; + f["association"] = elementAssociation ? "element" : "vertex"; + f["topology"] = mainTopologyName; + f["values"].set(conduit::DataType::float64(nvalues)); + memset(f["values"].as_float64_ptr(), 0, nvalues * sizeof(conduit::float64)); } } - for (int v = 0; v < nverts; ++v) + // Create a vertex field for the corner volume sums. These get used in + // getRadiationForceDensity. + if (!elementAssociation && !root.has_path(field_path(FIELD_CORNER_VOLUME_SUMS)) && nnodes > 0) { - RadiationForceDensityX[v] /= CornerVolumeSumsAtVertex[v]; - if (ndim > 1) - RadiationForceDensityY[v] /= CornerVolumeSumsAtVertex[v]; - if (ndim == 3) - RadiationForceDensityZ[v] /= CornerVolumeSumsAtVertex[v]; + conduit::Node &f = root[field_path(FIELD_CORNER_VOLUME_SUMS)]; + f["association"] = "vertex"; + f["topology"] = mainTopologyName; + f["values"].set(conduit::DataType::float64(nnodes)); + memset(f["values"].as_float64_ptr(), 0, nnodes * sizeof(conduit::float64)); } } -void Teton::updateRadiationForce() +void Teton::SumSharedNodalValues(conduit::Node &root, double *nodal_field) { - // Compute the radiation force internally in Teton - // for each zone and corner - teton_setradiationforce(); - - conduit::Node &options = getOptions(); - conduit::Node &blueprint = getMeshBlueprint(); - int ndim = options.fetch_existing("size/ndim").value(); - int maxCorner = options.fetch_existing("size/maxCorner").value(); - int nzones = options.fetch_existing("size/nzones").value(); - int nverts = options.fetch_existing("size/nverts").value(); - std::vector RadiationForce(ndim * maxCorner); - int corner_counter = 0; + const conduit::Node &options = getOptions(); - double *radiation_force_x = nullptr; - double *radiation_force_y = nullptr; - double *radiation_force_z = nullptr; - if (ndim == 2) - { - radiation_force_x = blueprint.fetch_existing("fields/radiation_force_r/values").as_double_ptr(); - radiation_force_y = blueprint.fetch_existing("fields/radiation_force_z/values").as_double_ptr(); - } - else if (ndim == 3) - { - radiation_force_x = blueprint.fetch_existing("fields/radiation_force_x/values").as_double_ptr(); - radiation_force_y = blueprint.fetch_existing("fields/radiation_force_y/values").as_double_ptr(); - radiation_force_z = blueprint.fetch_existing("fields/radiation_force_z/values").as_double_ptr(); - } - // TODO: finish this case - else + if (root.has_path("adjsets")) { - std::cerr << "1D updateRadiationForce not yet implemented! Teton is exiting . . ." << std::endl; - exit(1); - } + int ndim = options.fetch_existing("size/ndim").value(); + std::string adjset_name = ndim > 1 ? "adjsets/main_adjset" : "adjsets/mesh"; + const conduit::Node &vertex_adjset = root[adjset_name]; + conduit::NodeConstIterator groups_it = vertex_adjset["groups"].children(); + const int num_vertex_groups = vertex_adjset["groups"].number_of_children(); + const int num_vertices = root.fetch_existing("coordsets/coords/values/x").dtype().number_of_elements(); - for (int v = 0; v < nverts; ++v) - { - radiation_force_x[v] = 0.0; - radiation_force_y[v] = 0.0; - if (ndim == 3) - radiation_force_z[v] = 0.0; + while (groups_it.has_next()) + { + const conduit::Node &vertex_group = groups_it.next(); + const auto group_neighbors = vertex_group.fetch_existing("neighbors").as_int_accessor(); + const auto group_vertices = vertex_group.fetch_existing("values").as_int_accessor(); + const int num_neighbors = static_cast(group_neighbors.number_of_elements()); + const int num_vertices = static_cast(group_vertices.number_of_elements()); + + std::vector requests_vec(2 * num_neighbors); + MPI_Request *send_requests = requests_vec.data(); + MPI_Request *recv_requests = requests_vec.data() + num_neighbors; + std::vector statuses_vec(num_neighbors); + MPI_Status *statuses = statuses_vec.data(); + + std::vector> fields_to_send(num_neighbors); + std::vector> fields_to_recv(num_neighbors); + for (int vn = 0; vn < num_neighbors; ++vn) + { + fields_to_send[vn].resize(num_vertices); + fields_to_recv[vn].resize(num_vertices); + } + + for (int vn = 0; vn < num_neighbors; ++vn) + { + const int nbr_rank = group_neighbors[vn]; + + for (int j = 0; j < num_vertices; ++j) + { + const int vid = group_vertices[j]; + fields_to_send[vn][j] = nodal_field[vid]; + } + + int tag = 0; + MPI_Isend(&fields_to_send[vn][0], + num_vertices, + MPI_DOUBLE, + nbr_rank, + tag, + mCommunicator, + &send_requests[vn]); + MPI_Irecv(&fields_to_recv[vn][0], + num_vertices, + MPI_DOUBLE, + nbr_rank, + tag, + mCommunicator, + &recv_requests[vn]); + } + MPI_Waitall(num_neighbors, send_requests, statuses); + MPI_Waitall(num_neighbors, recv_requests, statuses); + + // Add neighboring contributions to nodal field + for (int vn = 0; vn < num_neighbors; ++vn) + { + for (int j = 0; j < num_vertices; ++j) + { + const int vid = group_vertices[j]; + nodal_field[vid] += fields_to_recv[vn][j]; + } + } + } } +} - for (int zone = 0; zone < nzones; ++zone) - { - // Get the radiation force and volume on each corner of each zone - int zoneID = zone + 1; - teton_getradiationforce(&zoneID, &RadiationForce[0]); +// NOTE: the Vectors RadiationForceXTotal, ..., must +// already be sized to the number of mesh vertices +void Teton::getRadiationForceDensity1D(double *RadiationForceDensityX) +{ + getRadiationForceDensity(RadiationForceDensityX, nullptr, nullptr); +} - // Average the radiation force around vertices - int ncorners = mZoneToNCorners[zone]; - for (int c = 0; c < ncorners; ++c) +// NOTE: the Vectors RadiationForceXTotal, ..., must +// already be sized to the number of mesh vertices +void Teton::getRadiationForceDensity(double *RadiationForceDensityX, + double *RadiationForceDensityY, + double *RadiationForceDensityZ) +{ + // The data arrays we're copying from in the blueprint fields were updated + // in updateRadiationForce() during step(). + const conduit::Node &blueprint = getMeshBlueprint(); + const auto fieldNames = radiationForceDensityFields(); + double *dest[] = {RadiationForceDensityX, RadiationForceDensityY, RadiationForceDensityZ}; + conduit::index_t n{}; + for (size_t c = 0; c < 3; c++) + { + if (c < fieldNames.size()) + { + const conduit::Node &n_cvs = blueprint.fetch_existing(field_values(FIELD_CORNER_VOLUME_SUMS)); + const conduit::Node &n_comp = blueprint.fetch_existing(field_values(fieldNames[c])); + const auto cvs = n_cvs.as_double_accessor(); + const auto acc = n_comp.as_double_accessor(); + n = acc.number_of_elements(); + for (conduit::index_t i = 0; i < n; i++) + dest[c][i] = acc[i] / cvs[i]; + } + else if (dest[c] != nullptr) { - int cornerID = mZoneToCorners[corner_counter]; - int vertexID = mCornerToVertex[cornerID]; - corner_counter += 1; - radiation_force_x[vertexID] += RadiationForce[c * ndim + 0]; - radiation_force_y[vertexID] += RadiationForce[c * ndim + 1]; - if (ndim == 3) - radiation_force_z[vertexID] += RadiationForce[c * ndim + 2]; + // Zero out this component. Relies on previous iteration setting n. + memset(dest[c], 0, n * sizeof(double)); } } } -void Teton::updateZonalRadiationForce() +void Teton::updateRadiationForce() { + CALI_CXX_MARK_FUNCTION; + // Compute the radiation force internally in Teton // for each zone and corner teton_setradiationforce(); conduit::Node &options = getOptions(); - conduit::Node &blueprint = getMeshBlueprint(); + conduit::Node &part = getMeshBlueprintPart(); int ndim = options.fetch_existing("size/ndim").value(); int maxCorner = options.fetch_existing("size/maxCorner").value(); + maxCorner = std::max(maxCorner, 2); int nzones = options.fetch_existing("size/nzones").value(); - std::vector RadiationForce(ndim * maxCorner); + int nverts = options.fetch_existing("size/nverts").value(); + std::vector RadiationForce(ndim * maxCorner, 0.); + std::vector CornerVolumes(maxCorner, 0.); + int corner_counter = 0; - double *radiation_force_x = nullptr; - double *radiation_force_y = nullptr; - double *radiation_force_z = nullptr; - if (ndim == 2) - { - radiation_force_x = blueprint.fetch_existing("fields/radiation_force_r/values").as_double_ptr(); - radiation_force_y = blueprint.fetch_existing("fields/radiation_force_z/values").as_double_ptr(); - } - else if (ndim == 3) - { - radiation_force_x = blueprint.fetch_existing("fields/radiation_force_x/values").as_double_ptr(); - radiation_force_y = blueprint.fetch_existing("fields/radiation_force_y/values").as_double_ptr(); - radiation_force_z = blueprint.fetch_existing("fields/radiation_force_z/values").as_double_ptr(); - } - // TODO: finish this case - else - { - std::cerr << "1D updateZonalRadiationForce not yet implemented! Teton is exiting . . ." << std::endl; - exit(1); - } + const auto fieldNames = radiationForceDensityFields(); + auto radiationForce = radiationForceDensity(part); + conduit::Node &n_cvs = part.fetch_existing(field_values(FIELD_CORNER_VOLUME_SUMS)); + const conduit::index_t nvalues = n_cvs.dtype().number_of_elements(); - for (int zone = 0; zone < nzones; ++zone) + // We need to map these fields back to the blueprint mesh because they are + // queried as results. + for (const auto &f : fieldNames) + mMapBackFields.push_back(f); + mMapBackFields.push_back(FIELD_CORNER_VOLUME_SUMS); + + // Zero out the radiation force components. + for (auto &ptr : radiationForce) + memset(ptr, 0, nvalues * sizeof(double)); + auto nc = static_cast(radiationForce.size()); + + // Zero out the corner volume sums. These are needed for getRadiationForceDensity + // to return the right values d=m/v. + double *cornerVolumeSums = n_cvs.value(); + memset(cornerVolumeSums, 0, nvalues * sizeof(double)); + + if (ndim == 1) { - radiation_force_x[zone] = 0.0; - radiation_force_y[zone] = 0.0; + // Note: 1D does not involve certain mapping arrays. + for (int zone = 0; zone < nzones; ++zone) + { + // Get the radiation force and volume on each corner of each zone + int zoneID = zone + 1; + teton_getradiationforce(&zoneID, &RadiationForce[0]); + teton_getcornervolumes(&zoneID, &CornerVolumes[0]); + + int v1 = zone; + int v2 = zone + 1; + radiationForce[0][v1] += RadiationForce[0]; + radiationForce[0][v2] += RadiationForce[1]; + cornerVolumeSums[v1] += CornerVolumes[0]; + cornerVolumeSums[v2] += CornerVolumes[1]; + } } - if (ndim == 3) + else { for (int zone = 0; zone < nzones; ++zone) { - radiation_force_z[zone] = 0.0; + // Get the radiation force and volume on each corner of each zone + int zoneID = zone + 1; + teton_getradiationforce(&zoneID, &RadiationForce[0]); + teton_getcornervolumes(&zoneID, &CornerVolumes[0]); + + // Average the radiation force around vertices + int ncorners = mZoneToNCorners[zone]; + for (int c = 0; c < ncorners; ++c) + { + int cornerID = mZoneToCorners[corner_counter]; + int vertexID = mCornerToVertex[cornerID]; + corner_counter += 1; + + for (int comp = 0; comp < nc; comp++) + radiationForce[comp][vertexID] += RadiationForce[c * ndim + comp]; + + cornerVolumeSums[vertexID] += CornerVolumes[c]; + } } } + // Sum shared vertex values across processor domains. + SumSharedNodalValues(part, cornerVolumeSums); + for (double *forceComponent : radiationForce) + SumSharedNodalValues(part, forceComponent); +} + +void Teton::updateZonalRadiationForce() +{ + CALI_CXX_MARK_FUNCTION; + + // Compute the radiation force internally in Teton + // for each zone and corner + teton_setradiationforce(); + + conduit::Node &options = getOptions(); + conduit::Node &part = getMeshBlueprintPart(); + int ndim = options.fetch_existing("size/ndim").value(); + int maxCorner = options.fetch_existing("size/maxCorner").value(); + int nzones = options.fetch_existing("size/nzones").value(); + std::vector RadiationForce(ndim * maxCorner); + + const auto fieldNames = radiationForceDensityFields(); + auto radiationForce = radiationForceDensity(part); + + // We need to map these fields back to the blueprint mesh because they are + // queried as results. + for (const auto &f : fieldNames) + mMapBackFields.push_back(f); + + // Zero out the radiation force components. + for (auto &ptr : radiationForce) + memset(ptr, 0, nzones * sizeof(double)); + auto nc = static_cast(radiationForce.size()); + for (int zone = 0; zone < nzones; ++zone) { // Get the radiation force and volume on each corner of each zone @@ -1978,22 +2492,14 @@ void Teton::updateZonalRadiationForce() int ncorners = mZoneToNCorners[zone]; for (int c = 0; c < ncorners; ++c) { - radiation_force_x[zone] += RadiationForce[c * ndim + 0]; - radiation_force_y[zone] += RadiationForce[c * ndim + 1]; - if (ndim == 3) - radiation_force_z[zone] += RadiationForce[c * ndim + 2]; + for (int comp = 0; comp < nc; comp++) + radiationForce[comp][zone] += RadiationForce[c * ndim + comp]; } } } -void Teton::getRadEnergyDeposited(double *RadEnergyDeposited) +void Teton::getRadEnergyDeposited(double *RadEnergyDeposited, int nzones) const { - conduit::Node &options = getOptions(); - int nzones = options.fetch_existing("size/nzones").value(); - for (int zone = 0; zone < nzones; ++zone) - { - RadEnergyDeposited[zone] = 0.0; - } for (int zone = 0; zone < nzones; ++zone) { // Get the radiation energy deposited @@ -2003,10 +2509,108 @@ void Teton::getRadEnergyDeposited(double *RadEnergyDeposited) } } -void Teton::reconstructPsi(double *rad_energy, double *rad_energy_density) +conduit::Node &Teton::getMainTopology(conduit::Node &root) +{ + conduit::Node &topologies = root.fetch_existing("topologies"); + return topologies.child(0); +} + +const conduit::Node &Teton::getMainTopology(const conduit::Node &root) const +{ + const conduit::Node &topologies = root.fetch_existing("topologies"); + return topologies.child(0); +} + +void Teton::createZonalField(conduit::Node &root, const std::string &topoName, const std::string &fieldName, int nzones) +{ + std::string path(field_path(fieldName)); + if (!root.has_path(path)) + { + conduit::Node &f = root[path]; + f["topology"] = topoName; + f["association"] = "element"; + f["values"].set(conduit::DataType::float64(nzones)); + memset(f["values"].data_ptr(), 0, sizeof(conduit::float64) * nzones); + } +} + +void Teton::getRadiationTemperature(double *RadTemp, int nzones) const +{ + for (int zone = 0; zone < nzones; ++zone) + { + // Get the radiation temperature + int zoneID = zone + 1; + RadTemp[zone] = 0.; + teton_getradiationtemperature(&zoneID, &RadTemp[zone]); + } +} + +void Teton::getMaterialTemperature(double *MatTemp, int nzones) const +{ + for (int zone = 0; zone < nzones; ++zone) + { + // Get the material temperature + int zoneID = zone + 1; + MatTemp[zone] = 0.; + teton_getmaterialtemperature(&zoneID, &MatTemp[zone]); + } +} + +void Teton::reconstructPsi(double *rad_energy, const double *rad_energy_density) { #if !defined(TETON_ENABLE_MINIAPP_BUILD) - teton_reconstructpsi(rad_energy, rad_energy_density); + CALI_CXX_MARK_FUNCTION; + + // Determine nzones, ngroups + conduit::Node &blueprint = getMeshBlueprint(); + conduit::Node &options = getOptions(); + + const conduit::Node &main_topo = getMainTopology(blueprint); + conduit::index_t nzones = conduit::blueprint::mesh::utils::topology::length(main_topo); + conduit::index_t ngroups = options.fetch_existing("quadrature/num_groups").to_index_t(); + + if (doPartitioning()) + { + conduit::Node &part = getMeshBlueprintPart(); + std::string mainTopologyName(main_topo.name()); + + // Add rad_energy_density as an mcarray on the blueprint mesh. + std::string fieldName(MCARRAY_PREFIX + "rad_energy_density"); + conduit::Node &fields = blueprint["fields"]; + conduit::Node &n_f = fields[fieldName]; + n_f["topology"] = mainTopologyName; + n_f["association"] = "element"; + conduit::Node &values = n_f["values"]; + // radiation_energy_density is shaped double[ngroups][nzones] so we do not need to interleave. + bool interleave = false; + utilities::NDAccessor acc(values, {{"zone", nzones}, {"group", ngroups}}, interleave); + acc.set_external(rad_energy_density); + + // Partition to get rad_energy_density on part mesh. + const std::vector fieldNames{fieldName}; + sendFieldsOrig2Part(mainTopologyName, fieldNames, false); + + // On part mesh, get various mcarray components, put back together into a + // contiguous block [ngroups][nzones] so we can call Teton. + conduit::Node &partFields = part["fields"]; + conduit::Node &partValues = partFields.fetch_existing(fieldName + "/values"); + const conduit::Node &part_main_topo = getMainTopology(part); + conduit::index_t npartZones = conduit::blueprint::mesh::utils::topology::length(part_main_topo); + utilities::NDAccessor accp(partValues, {{"zone", npartZones}, {"group", ngroups}}, interleave); + std::vector part_rad_energy_density(ngroups * npartZones); + accp.to_contiguous(&part_rad_energy_density[0]); + + // Call Teton + teton_reconstructpsi(rad_energy, &part_rad_energy_density[0]); + + // Clean up + fields.remove_child(fieldName); + partFields.remove_child(fieldName); + } + else + { + teton_reconstructpsi(rad_energy, const_cast(rad_energy_density)); + } #endif } @@ -2017,4 +2621,1357 @@ void Teton::reconstructPsiFromdV() #endif } +void Teton::getZonalPsi(int numAngles, double *psi) +{ +#if defined(TETON_PARTITIONING) && !defined(TETON_ENABLE_MINIAPP_BUILD) + CALI_CXX_MARK_FUNCTION; + + if (doPartitioning()) + { + conduit::Node &blueprint = getMeshBlueprint(); + conduit::Node &options = getOptions(); + conduit::Node &part = getMeshBlueprintPart(); + + const conduit::Node &main_topo = getMainTopology(part); + std::string mainTopologyName(main_topo.name()); + conduit::index_t nzones = conduit::blueprint::mesh::utils::topology::length(main_topo); + conduit::index_t ngroups = options.fetch_existing("quadrature/num_groups").to_index_t(); + //conduit::index_t nangles = options.fetch_existing("quadrature/nSetsMaster").to_index_t(); + conduit::index_t nangles = numAngles; + + // Get the zonalpsi for the part mesh. + double *part_psi = new double[nangles * ngroups * nzones]; + teton_getzonalpsi(&numAngles, part_psi); + + // Add zonalpsi as an mcarray on the part mesh (external). + const std::string fieldName(MCARRAY_PREFIX + "zonalpsi"); + const std::vector fieldNames{fieldName}; + conduit::Node &partfields = part.fetch_existing("fields"); + conduit::Node &pzpsi = partfields[fieldName]; + pzpsi["association"] = "element"; + pzpsi["topology"] = mainTopologyName; + bool interleave = false; // since zones is the fastest changing in how the data are stored. + utilities::NDAccessor accp(pzpsi["values"], + {{"zone", nzones}, {"group", ngroups}, {"angle", nangles}}, + interleave); + accp.set_external(part_psi); + + // Send zonalpsi mcarray back to the blueprint mesh. + sendFieldsPart2Orig(mainTopologyName, fieldNames); + + // Cleanup part fields. + partfields.remove_child(fieldName); + delete[] part_psi; + + // Construct an accessor on the blueprint mesh's zonalpsi field and then iterate + // over it, copying the data into the psi array. + const conduit::Node &bp_main_topo = getMainTopology(blueprint); + conduit::index_t nbpzones = conduit::blueprint::mesh::utils::topology::length(bp_main_topo); + conduit::Node &fields = blueprint.fetch_existing("fields"); + conduit::Node &zpsi = fields[fieldName]; + utilities::NDAccessor acc(zpsi["values"], + {{"zone", nbpzones}, {"group", ngroups}, {"angle", nangles}}, + interleave); +#if 1 + acc.to_contiguous(psi); +#else + // For reference + double *dptr = psi; + for (conduit::index_t a = 0; a < nangles; a++) + for (conduit::index_t g = 0; g < ngroups; g++) + for (conduit::index_t z = 0; z < nzones; z++) + { + *dptr++ = acc(std::vector{z, g, a}); + } +#endif + // Cleanup blueprint fields. + fields.remove_child(fieldName); + } + else + { + teton_getzonalpsi(&numAngles, psi); + } +#else + teton_getzonalpsi(&numAngles, psi); +#endif +} + +//--------------------------------------------------------------------------- + +bool Teton::doPartitioning() const +{ + bool p = false; +#if defined(TETON_PARTITIONING) + if (getOptions().has_path("partitioning")) + { + int value = getOptions().fetch_existing("partitioning").to_int(); + p = value != 0; + } + if (getenv("TETON_PARTITION") != nullptr) + { + int value = atoi(getenv("TETON_PARTITION")); + p = value != 0; + } +#endif + return p; +} + +conduit::Node &Teton::getMeshBlueprintPart() +{ +#if defined(TETON_PARTITIONING) + std::string meshKey(doPartitioning() ? "blueprint_partitioned" : "blueprint"); + return getDatastore()[meshKey]; +#else + return getDatastore()["blueprint"]; +#endif +} + +const conduit::Node &Teton::getMeshBlueprintPart() const +{ +#if defined(TETON_PARTITIONING) + std::string meshKey(doPartitioning() ? "blueprint_partitioned" : "blueprint"); + return getDatastore()[meshKey]; +#else + return getDatastore()["blueprint"]; +#endif +} + +std::vector Teton::createPartitionFields(conduit::Node &mesh, const std::vector &topoNames) +{ + std::vector fieldNames; +#if defined(TETON_PARTITIONING) + CALI_CXX_MARK_FUNCTION; + + // Make a list of field names + for (const auto &tname : topoNames) + fieldNames.push_back(Teton::PREFIX + "parmetis_result_" + tname); + + auto doms = conduit::blueprint::mesh::domains(mesh); + for (const auto &domptr : doms) + { + conduit::Node &dom = *domptr; + + // Look in the domain to see if any of the fields need to be created. + bool buildFields = true; + if (dom.has_path("fields")) + { + const conduit::Node &fields = dom.fetch_existing("fields"); + int missingCount = 0; + for (const auto &fname : fieldNames) + missingCount = fields.has_path(fname) ? 0 : 1; + buildFields = missingCount > 0; + } + + // If any fields need to be created, do it. + if (buildFields) + { + const conduit::Node &topo = getMainTopology(dom); + const conduit::Node &coordset = dom.fetch_existing("coordsets/" + topo["coordset"].as_string()); + // Assume all of the input topologies will have the same topological dimension and + // that they will match the "boundary" topology. It is possible the boundary topology + // does not exist. Let's assume that if the boundary topology does not exist then + // neither will the others. Any other topologies we're dealing with will have the + // same topological dimension as the boundary topology. + if (dom.has_path("topologies/" + TOPO_BOUNDARY)) + { + const conduit::Node &btopo = dom.fetch_existing("topologies/" + TOPO_BOUNDARY); + auto d = static_cast(conduit::blueprint::mesh::utils::topology::dims(btopo)); +#if defined(PARTITION_DEBUG) + if (mRank == 0) + std::cout << "Teton: partition - create partition field - build hash" << std::endl; +#endif + // Produce the external "faces" of the domain and for each "face", hash + // its node ids and associate that hash with the parent zone for the face. + std::vector> desired_maps{{d, d + 1}}; + conduit::blueprint::mesh::utils::TopologyMetadata md(topo, coordset, d, desired_maps); + const conduit::Node &dtopo = md.get_topology(d); + auto nent = md.get_topology_length(d); + std::map hashToZone; + for (conduit::index_t ei = 0; ei < nent; ei++) + { + auto vv = md.get_global_association(ei, d, d + 1); + if (vv.size() == 1) + { + // Get the ids that make up the entity and hash them. + auto ids = conduit::blueprint::mesh::utils::topology::unstructured::points(dtopo, ei); + std::sort(ids.begin(), ids.end()); + conduit::uint64 h = conduit::utils::hash(&ids[0], static_cast(ids.size())); + + // Save hash to parent zone. + hashToZone[h] = vv[0]; + } + } + + // Get the partition field for the main topology. + const conduit::Node &pf = dom.fetch_existing("fields/" + PARTITION_FIELD + "/values"); + auto f = pf.as_int32_accessor(); + + // Now, iterate through the secondary topologies, hash each entity's ids + // and try to look up the parent zone. The hashToZone map should contain + // all possible external faces for the domain so the boundary should be + // a subset of that. + for (size_t ti = 0; ti < topoNames.size(); ti++) + { + std::string fieldKey("fields/" + fieldNames[ti]); + if (!dom.has_path(fieldKey)) + { + // Only make the field if the topology exists on this rank. + std::string topoKey("topologies/" + topoNames[ti]); + if (dom.has_path(topoKey)) + { + const conduit::Node &topo = dom.fetch_existing(topoKey); + auto blen = conduit::blueprint::mesh::topology::length(topo); +#if defined(PARTITION_DEBUG) + if (mRank == 0) + std::cout << "Teton: partition - create partition field - " << fieldNames[ti] << std::endl; +#endif + conduit::Node &newfield = dom[fieldKey]; + newfield["association"] = "element"; + newfield["topology"] = topoNames[ti]; + newfield["values"].set(conduit::DataType::int32(blen)); + auto topoPartition = newfield["values"].as_int32_ptr(); + for (conduit::index_t ei = 0; ei < blen; ei++) + { + // Get the ids that make up the entity and hash them. + auto ids = conduit::blueprint::mesh::utils::topology::unstructured::points(topo, ei); + std::sort(ids.begin(), ids.end()); + conduit::uint64 h = conduit::utils::hash(&ids[0], static_cast(ids.size())); + + // Look up the zone id and map it through the partition field. + const auto it = hashToZone.find(h); + topoPartition[ei] = (it != hashToZone.end()) ? f[it->second] : 0; + } + } + } + } + } + } + } +#endif + return fieldNames; +} + +void Teton::assimilateTopology(conduit::Node &partmesh, + const std::string &topoName, + conduit::Node &secondPartmesh, + const std::string &secondTopoName) +{ +#if defined(TETON_PARTITIONING) + CALI_CXX_MARK_FUNCTION; + + if (partmesh.dtype().is_empty()) + { + return; + } + if (secondPartmesh.dtype().is_empty()) + { + return; + } + + // Get the coordset and topo for the volume mesh + auto domains = conduit::blueprint::mesh::domains(partmesh); + if (domains.size() < 1) + { + std::cout << "assimilateTopology: Must have at least one domain: " << domains.size() << std::endl; + return; + } + + // Get the coordset and topo for the boundary mesh + auto bdomains = conduit::blueprint::mesh::domains(secondPartmesh); + if (bdomains.size() < 1) + { + std::cout << "assimilateTopology: Must have at least one domain: " << bdomains.size() << std::endl; + return; + } + + if (domains.size() != bdomains.size()) + { + std::cout << "assimilateTopology: Incompatible numbers of domains " << domains.size() << ", " << bdomains.size() + << std::endl; + return; + } + + for (size_t domid = 0; domid < domains.size(); domid++) + { + conduit::Node &mesh = *domains[domid]; + const conduit::Node &topo = mesh["topologies/" + topoName]; + const conduit::Node &coordset = conduit::blueprint::mesh::utils::topology::coordset(topo); + + conduit::Node &bmesh = *bdomains[domid]; + const conduit::Node &btopo = bmesh["topologies/" + secondTopoName]; + const conduit::Node &bcoordset = conduit::blueprint::mesh::utils::topology::coordset(btopo); + + // Iterate over the boundary mesh coordinates and look them up in the + // volume mesh's coordset. + conduit::blueprint::mesh::utils::query::PointQuery Q(mesh); + const auto axes = conduit::blueprint::mesh::utils::coordset::axes(bcoordset); + const auto ndims = axes.size(); + const conduit::Node &bcvalues = bcoordset.fetch_existing("values"); + const int domain_id = static_cast(domid); + const auto bx = bcvalues[axes[0]].as_double_accessor(); + const auto by = bcvalues[axes[ndims > 1 ? 1 : 0]].as_double_accessor(); + const auto bz = bcvalues[axes[ndims > 2 ? 2 : 0]].as_double_accessor(); + conduit::index_t nSearchPoints = bx.number_of_elements(); + for (conduit::index_t i = 0; i < nSearchPoints; i++) + { + double pt[3]; + pt[0] = bx[i]; + pt[1] = ndims > 1 ? by[i] : 0.; + pt[2] = ndims > 2 ? bz[i] : 0.; + Q.add(domain_id, pt); + } + Q.execute(coordset.name()); + + // Make a new the topology that uses the volume mesh coordset. + // We remap the connectivity. + const auto &res = Q.results(domain_id); + auto bconnSrc = btopo["elements/connectivity"].as_int32_accessor(); + conduit::index_t nbconn = bconnSrc.number_of_elements(); + conduit::Node &newtopo = mesh["topologies/" + secondTopoName]; + newtopo["type"] = btopo["type"]; + newtopo["coordset"] = coordset.name(); + newtopo["elements/shape"] = btopo["elements/shape"]; + newtopo["elements/connectivity"].set(conduit::DataType::int32(nbconn)); + auto bconnNew = newtopo["elements/connectivity"].as_int32_array(); + for (conduit::index_t i = 0; i < nbconn; i++) + { + bconnNew[i] = res[bconnSrc[i]]; + } + if (btopo.has_path("elements/sizes")) + btopo["elements/sizes"].to_data_type(conduit::DataType::int32().id(), newtopo["elements/sizes"]); + if (btopo.has_path("elements/offsets")) + btopo["elements/offsets"].to_data_type(conduit::DataType::int32().id(), newtopo["elements/offsets"]); + + // Iterate over the boundary mesh's fields and steal them for the partmesh. + if (bmesh.has_child("fields")) + { + conduit::Node &srcFields = bmesh["fields"]; + conduit::Node &destFields = mesh["fields"]; + for (conduit::index_t i = 0; i < srcFields.number_of_children(); i++) + { + conduit::Node &f = srcFields[i]; + destFields[f.name()].set(f); + } + } + } +#endif +} + +void Teton::createRadiationTemperature() +{ + // Checking pre-partition so we use blueprint instead of part. + conduit::Node &blueprint = getMeshBlueprint(); + + // This field is used to return radiation temperature values from Teton as a + // field on the mesh that will be queried via getRadiationTemperature(). + if (doPartitioning()) + { + const conduit::Node &main_topo = getMainTopology(blueprint); + std::string mainTopologyName(main_topo.name()); + const conduit::index_t nzones = conduit::blueprint::mesh::utils::topology::length(main_topo); + createZonalField(blueprint, mainTopologyName, FIELD_RADIATION_TEMPERATURE, nzones); + } +} + +void Teton::createMaterialTemperature() +{ + // Checking pre-partition so we use blueprint instead of part. + conduit::Node &blueprint = getMeshBlueprint(); + + // This field is used to return radiation temperature values from Teton as a + // field on the mesh that will be queried via getMaterialTemperature(). + if (doPartitioning()) + { + const conduit::Node &main_topo = getMainTopology(blueprint); + std::string mainTopologyName(main_topo.name()); + const conduit::index_t nzones = conduit::blueprint::mesh::utils::topology::length(main_topo); + createZonalField(blueprint, mainTopologyName, FIELD_MATERIAL_TEMPERATURE, nzones); + } +} + +bool Teton::doInterleave(const std::string &fieldName) const +{ + bool retval = fieldName != FIELD_RADIATION_ENERGY_DENSITY; + return retval; +} + +void Teton::add_mcarray_fields(conduit::Node &root) +{ +#if defined(TETON_PARTITIONING) + conduit::Node &options = getOptions(); + const conduit::Node &main_topo = getMainTopology(root); + conduit::Node &fields = root.fetch_existing("fields"); + + conduit::index_t nzones = conduit::blueprint::mesh::utils::topology::length(main_topo); + conduit::index_t ngroups = options.fetch_existing("quadrature/num_groups").to_index_t(); + + // Some fields have been provided as a single oversized buffer that has + // multiple components but is not actually an MCArray. We'll make an MCArray + // for that field, giving it a new name. This is done so we can have + // the field pass through the partitioner while preserving all of the data. + + // Make new mcarray fields for the fields that look like mcarrays. + const std::vector copy_keys{"association", "topology"}; + utilities::iterate_mcarray_candidates(root, + main_topo.name(), + options, + std::vector{}, + [&](const conduit::Node &srcField) + { + std::string fieldName(srcField.name()); + std::string newFieldName(MCARRAY_PREFIX + srcField.name()); + + // Copy basic attributes + conduit::Node &newField = fields[newFieldName]; + for (const auto &k : copy_keys) + { + if (srcField.has_child(k)) + newField[k].set(srcField.fetch_existing(k)); + } + + // Make the mcarray so it points to the original field's data. These fields are interleaved. + const conduit::Node &srcValues = srcField.fetch_existing("values"); + const double *srcData = srcValues.as_float64_ptr(); + utilities::NDAccessor acc(newField["values"], {{"zone", nzones}, {"group", ngroups}}, doInterleave(fieldName)); + acc.set_external(srcData); + + // Record that we made a new mcarray. + mMCArrays[fieldName] = newFieldName; + }); +#endif +} + +void Teton::remove_mcarray_fields(conduit::Node &root) +{ +#if defined(TETON_PARTITIONING) + if (doPartitioning()) + { + conduit::Node &fields = root["fields"]; + for (auto it = mMCArrays.begin(); it != mMCArrays.end(); it++) + { + if (fields.has_path(it->second)) + { + fields.remove(it->second); + } + } + } +#endif +} + +const conduit::Node &Teton::fetch_mcarray(const conduit::Node &root, const std::string &fieldName) const +{ + const conduit::Node &fields = root.fetch_existing("fields"); + + // See if the requested field is an mcarray we created. + auto it = mMCArrays.find(fieldName); + if (it != mMCArrays.end()) + { + // Return the mcarray for the requested old field name. + return fields.fetch_existing(it->second); + } + // We did not find it. Assume that fieldName is a regular field. + return fields.fetch_existing(fieldName); +} + +void Teton::partition(bool fromRestart) +{ +#if defined(TETON_PARTITIONING) + // Make a lambda that helps display some dtype information for the mesh. +#if defined(PARTITION_DEBUG) + auto check_widest_dtype = [](const conduit::Node &mesh, int rank, const std::string &caption) + { + auto dtype = conduit::blueprint::mesh::utils::find_widest_dtype(mesh, conduit::DataType::int32()); + std::cout << " Widest " << caption << " int type: " << dtype.name() << std::endl; + if (!dtype.is_int32()) + { + const auto paths = utilities::find_int64(mesh); + if (!paths.empty()) + { + if (rank == 0) + { + std::cout << "int64 paths: "; + for (const auto &p : paths) + { + std::cout << p << ", "; + } + std::cout << std::endl; + } + } + } + }; + + utilities::Banner b(mCommunicator, "Teton::partition"); +#endif + conduit::Node &blueprint = getMeshBlueprint(); + conduit::Node &part = getMeshBlueprintPart(); + bool alreadyPartitioned = part.has_child("partition_options_main"); + if (doPartitioning() && (!alreadyPartitioned || fromRestart)) + { + CALI_CXX_MARK_SCOPE("Teton_Partition"); + + int rank = 0, size = 1; + MPI_Comm_rank(mCommunicator, &rank); + MPI_Comm_size(mCommunicator, &size); + + std::string mainTopoName(getMainTopology(blueprint).name()); + + // Create a new field on the blueprint mesh that we're partitioning. + conduit::Node opts; + opts["topology"] = mainTopoName; + opts["field_prefix"] = PREFIX; + if (blueprint.has_path("adjsets/main_adjset")) + opts["adjset"] = "main_adjset"; // plays a role in global node id generation + if (rank == 0) + std::cout << "Teton: partition - make partition field." << std::endl; + conduit::blueprint::mpi::mesh::generate_partition_field(blueprint, opts, mCommunicator); + + // The partition field will be index_t, which can make Conduit start + // generating index_t for other things like topology maps. Force int32 + // because other int types are catastrophic in the interface at this time. + const std::vector replacements{"fields/" + PARTITION_FIELD + "/values", + "fields/" + PREFIX + "global_element_ids/values", + "fields/" + PREFIX + "global_vertex_ids/values"}; + utilities::convert_int32(rank, blueprint, replacements); + + // There are no int64/index_t in blueprint mesh now. +#if defined(PARTITION_DEBUG) + auto dtype = conduit::blueprint::mesh::utils::find_widest_dtype(blueprint, conduit::DataType::int32()); + if (rank == 0) + std::cout << " Widest blueprint int type: " << dtype.name() << std::endl; + MPI_Barrier(mCommunicator); +#endif + // Create a field selection for the main topology. + conduit::Node partopts; + partopts["mapping"] = 1; + partopts["original_element_ids"] = "main_original_element_ids"; + partopts["original_vertex_ids"] = "main_vertex_element_ids"; + conduit::Node &selections = partopts["selections"]; + conduit::Node &sel1 = partopts["selections"].append(); + sel1["type"] = "field"; + sel1["domain_id"] = "any"; + sel1["field"] = PARTITION_FIELD; + sel1["topology"] = mainTopoName; + sel1["destination_ranks"].set(conduit::DataType::int32(size)); + auto ranks = sel1["destination_ranks"].as_int32_ptr(); + for (int i = 0; i < size; i++) + ranks[i] = i; + +#if defined(PARTITION_DEBUG) + // Print the partitioning options. + if (rank == 0) + { + std::cout << "Teton: partition enabled." << std::endl; + std::cout << "Teton: part.path=" << part.path() << std::endl; + std::cout << "Teton: partops:" << std::endl; + partopts.print(); + } + + // Save out the partition mesh and parameters to YAML + const std::string protocol("yaml"); + std::stringstream ss, ss2; + ss << "partition_mesh." << rank << "." << protocol; + std::string meshFilename(ss.str()); + ss2 << "partition_options_main." << rank << "." << protocol; + std::string optsFilename(ss2.str()); +#pragma message "Partition mesh input will be saved." + conduit::relay::io::save(blueprint, meshFilename, protocol); + conduit::relay::io::save(partopts, optsFilename, protocol); +#endif + + // Partition the blueprint mesh and store the results in part. + if (rank == 0) + std::cout << "Teton: partition - partition main" << std::endl; + add_mcarray_fields(blueprint); + conduit::blueprint::mpi::mesh::partition(blueprint, partopts, part, mCommunicator); + remove_mcarray_fields(blueprint); + + // If we sent fields to the part mesh that were mcarray, keep only the + // mcarray version of the field. The non-mcarray version is probably not + // the right size. + for (auto it = mMCArrays.begin(); it != mMCArrays.end(); it++) + { + std::string origFieldKey = "fields/" + it->first; + if (part.has_path(origFieldKey)) + { + part.remove(origFieldKey); + } + } + + // The partitioner may produce index_t data, even if the inputs were not + // index_t. We must convert it to int32 or other parts of Conduit may + // start inserting index_t data because they find the widest dtype. This + // causes problems for Teton down the line since it requires int32. + auto repkeys = utilities::find_int64(part); +#if defined(PARTITION_DEBUG) + if (rank == 0) + check_widest_dtype(part, rank, "part"); + MPI_Barrier(mCommunicator); +#endif + + // Do the conversion + utilities::convert_int32(rank, part, repkeys); + +#if defined(PARTITION_DEBUG) + // Double-check that the dtype is int32. + if (rank == 0) + check_widest_dtype(part, rank, "part"); + MPI_Barrier(mCommunicator); +#endif +#if defined(PARTITION_DEBUG) && defined(CONDUIT_RELAY_IO_HDF5_ENABLED) + // Save the partitioned mesh to a file that can be visualized. + const std::string file_protocol = "hdf5"; + if (rank == 0) + std::cout << "Teton: partition - save part node to " << file_protocol << std::endl; + conduit::relay::mpi::io::blueprint::save_mesh(part, "part", file_protocol, mCommunicator); +#endif + // Get the names of the topos that we need to partition in addition to main, + // the "secondary" topologies. These include the boundary topology. + auto partitionTopos = getPartitionTopologies(blueprint); + + // Create partition field for secondary topologies. + if (rank == 0) + std::cout << "Teton: partition - create partition fields" << std::endl; + auto partitionFields = createPartitionFields(blueprint, partitionTopos); + + // Process each secondary topology. + for (size_t ti = 0; ti < partitionTopos.size(); ti++) + { + // Now repartition the secondary topology. + conduit::Node tpartopts; + conduit::Node &tsel = tpartopts["selections"].append(); + tsel["type"] = "field"; + tsel["domain_id"] = "any"; + tsel["field"] = partitionFields[ti]; + tsel["topology"] = partitionTopos[ti]; + tsel["destination_ranks"].set(sel1["destination_ranks"]); // same rank map + tpartopts["mapping"] = 0; // If we ever need to map fields back, set this to 1. + tpartopts["original_element_ids"] = partitionTopos[ti] + "_original_element_ids"; + tpartopts["original_vertex_ids"] = partitionTopos[ti] + "_vertex_element_ids"; + tpartopts["build_adjsets"] = 0; + if (partitionTopos[ti] != TOPO_BOUNDARY) + { + // Restrict fields mapped to one that does not exist. Map no fields. + // This is done because surface flux topologies fields are not really + // fields on that mesh. + conduit::Node &fields = tpartopts["fields"]; + fields[PREFIX + "_impossible_to_find_123456789"] = 1; + } + + if (rank == 0) + { + std::cout << "Teton: partition - partition " << partitionTopos[ti] << std::endl; +#if defined(PARTITION_DEBUG) + tpartopts.print(); +#endif + } + conduit::Node newpart; + conduit::blueprint::mpi::mesh::partition(blueprint, tpartopts, newpart, mCommunicator); + + // Merge the secondary topology from newpart into part. + assimilateTopology(part, mainTopoName, newpart, partitionTopos[ti]); + + // Save partitioning options for later. + part["partition_options_" + partitionTopos[ti]].move(tpartopts); + } + + // Do the conversion + repkeys = utilities::find_int64(part); + utilities::convert_int32(rank, part, repkeys); + + // Save partitioning options for later. We have to do this after calls + // to mesh::partition() because that method resets the input node. + part["partition_options_main"].move(partopts); + +#if defined(PARTITION_DEBUG) + check_widest_dtype(part, rank, "combined part"); + MPI_Barrier(mCommunicator); + +#if defined(CONDUIT_RELAY_IO_HDF5_ENABLED) + // Save the partitioned mesh, plus boundary to a file that can be visualized. + if (rank == 0) + std::cout << "Teton: partition - save part_with_boundary to " << file_protocol << std::endl; + conduit::relay::mpi::io::blueprint::save_mesh(part, "part_with_boundary", file_protocol, mCommunicator); +#endif + + // Check whether there are duplicated local points. We hope not. + conduit::Node info; + bool dups = utilities::find_local_duplicate_points(rank, part, part["coordsets/coords"], info); + if (dups) + { + info.print(); + } + MPI_Barrier(mCommunicator); +#endif + } +#endif +} + +std::vector Teton::getPartitionTopologies(const conduit::Node &root) const +{ + std::vector topoNames; + + // We'll exclude any of these names. The "main" topology is handled explicitly + // and we do not want any of the derived topologies to go through partitioning. + const std::string mainTopologyName(getMainTopology(root).name()); + const std::vector exclusions{mainTopologyName, "main_corner", "main_face"}; + + const conduit::Node &topologies = root.fetch_existing("topologies"); + const std::string mainCoordset(getMainTopology(root).fetch_existing("coordset").as_string()); + // Make a vector of all of the topology names that share a coordset with main. + for (conduit::index_t i = 0; i < topologies.number_of_children(); i++) + { + const conduit::Node &topo = topologies[i]; + + if (std::find(exclusions.begin(), exclusions.end(), topo.name()) == exclusions.end()) + { + const std::string coordset = topo.fetch_existing("coordset").as_string(); + if (coordset == mainCoordset) + { + topoNames.push_back(topo.name()); + } + } + } + return utilities::globalizeStringVector(topoNames, mCommunicator); +} + +void Teton::sendFieldsOrig2Part(const std::string &topoName, + const std::vector &fieldNames, + bool updateCoords) +{ +#if defined(TETON_PARTITIONING) + CALI_CXX_MARK_FUNCTION; + + if (doPartitioning() && (!fieldNames.empty() || updateCoords)) + { +#if defined(PARTITION_DEBUG) + utilities::Banner b(mCommunicator, "sendFieldsOrig2Part"); + if (mRank == 0) + { + std::cout << "topoName: " << topoName << "\n"; + std::cout << "updateCoords: " << updateCoords << "\n"; + std::cout << "fieldNames:\n"; + for (const auto &name : fieldNames) + std::cout << " - \"" << name << "\"" << std::endl; + } + MPI_Barrier(mCommunicator); +#endif + conduit::Node &blueprint = getMeshBlueprint(); + conduit::Node &part = getMeshBlueprintPart(); + const conduit::Node &options = getOptions(); + conduit::Node &partopts = part["partition_options_" + topoName]; + + // Make sure mcarray fields are up to date. + add_mcarray_fields(blueprint); + + conduit::Node newpartmesh, updateopts; + // Copy the partition options for the topology and restrict the fields + // (add fields only if they exist in the blueprint mesh). + updateopts.set(partopts); + updateopts["mapping"] = 0; + updateopts["build_adjsets"] = 0; + conduit::Node &fields = updateopts["fields"]; + for (const auto &f : fieldNames) + { + // If the field happens to be the name of an mcarray then we want to send + // the mcarray instead of the original field name. + const conduit::Node &mcf = fetch_mcarray(blueprint, f); + std::string actualField(mcf.name()); + if (blueprint.has_path("fields/" + actualField)) + fields[actualField] = 1; + } + // Without a "fields" node, the partitioner will attempt to map all fields to + // the partitioned mesh. We really do not want to do any fields here since the + // host code could have deallocated the field memory that we know about in the + // blueprint node. This actually happened! Since we need a fields node, make up + // a field name that would never exist so the partitioner will not find any of + // the field names in this list. + if (fieldNames.empty()) + fields[PREFIX + "_impossible_to_find_123456789"] = 1; + + // Partition the mesh again. + conduit::blueprint::mpi::mesh::partition(blueprint, updateopts, newpartmesh, mCommunicator); + + // Iterate through the fields in each domain and move them over to the part mesh. + // This assumes that the newpartmesh was partitioned the same way as the + // original part mesh, which should be true. + auto destDoms = conduit::blueprint::mesh::domains(part); + auto srcDoms = conduit::blueprint::mesh::domains(newpartmesh); + assert(destDoms.size() == srcDoms.size()); + for (size_t i = 0; i < srcDoms.size(); i++) + { + conduit::Node &srcFields = srcDoms[i]->fetch_existing("fields"); + conduit::Node &destFields = destDoms[i]->fetch_existing("fields"); + for (conduit::index_t fi = 0; fi < srcFields.number_of_children(); fi++) + { + conduit::Node &src = srcFields[fi]; + std::string fname(src.name()); + // Move the field over if it was one we wanted in the options. + if (fields.has_child(fname)) + { + // BJW: There is a chance that the partitioned domain was passed straight + // through the partitioner without modification. For those cases, we + // do not want to disturb the destFields/srcFields. + conduit::Node &dest = destFields[fname]; + + // If the src and dest field are not the same then we need to copy. + bool doCopy = true; + if (dest.has_path("values")) + { + conduit::Node &srcValues = src["values"]; + conduit::Node &destValues = dest["values"]; + if (srcValues.number_of_children() > 0 && destValues.number_of_children() > 0) + doCopy = destValues[0].data_ptr() != srcValues[0].data_ptr(); + else + doCopy = destValues.data_ptr() != srcValues.data_ptr(); + } + if (doCopy) + { + // Copy the field to destFields. + dest.reset(); + dest["association"] = src["association"]; + dest["topology"] = src["topology"]; + dest["values"] = src["values"]; + } + } + } + + // Move coordset from src to dest. + if (updateCoords) + { + conduit::Node &srcCoords = srcDoms[i]->fetch_existing("coordsets/coords"); + conduit::Node &destCoords = destDoms[i]->fetch_existing("coordsets/coords"); + destCoords.set(srcCoords); + } + } + + // Remove mcarray fields + remove_mcarray_fields(blueprint); + } +#endif +} + +void Teton::sendFieldsPart2Orig(const std::string &topoName, const std::vector &fieldNames) +{ +#if defined(TETON_PARTITIONING) + CALI_CXX_MARK_FUNCTION; + + // The plan here is to send fields from the part mesh back to the original mesh. + // For normal fields, we can do this no problem. For oversize "mcarray" fields, + // we send back the mcarray field names and then recopy their data into the + // field they represent. On the partmesh, the mcarray components are often separate + // fields since they likely came from the original mesh in the first place and + // were not reaggregated into contiguous memory. + + if (doPartitioning()) + { + conduit::Node &blueprint = getMeshBlueprint(); + conduit::Node &part = getMeshBlueprintPart(); + const conduit::Node &options = getOptions(); + const std::string mainTopologyName(getMainTopology(blueprint).name()); + +#if defined(PARTITION_DEBUG) + utilities::Banner b(mCommunicator, "sendFieldsPart2Orig"); + if (mRank == 0) + { + std::cout << "topoName: " << topoName << std::endl; + std::cout << "fieldNames:\n"; + for (const auto &name : fieldNames) + std::cout << " - \"" << name << "\"" << std::endl; + } + MPI_Barrier(mCommunicator); +#endif + + // Make sure the part mesh has its mcarray fields wrapped to send back. + add_mcarray_fields(part); + + // Get the list of fields that we think need to be mcarrays. If we're sending + // back one of these, send back the mcarray instead since the mcarrays are often + // set_external'd from the host. Also, much of the time, we send the mcarray + // data + std::map normal2mcarray; + utilities::iterate_mcarray_candidates(blueprint, + mainTopologyName, + options, + fieldNames, + [&](const conduit::Node &f) + { normal2mcarray[f.name()] = MCARRAY_PREFIX + f.name(); }); + + // Build up the mapback options. + conduit::Node mbopts; + mbopts["field_prefix"] = PREFIX; + mbopts["original_element_ids"] = topoName + "_original_element_ids"; + mbopts["original_vertex_ids"] = topoName + "_original_vertex_ids"; + std::vector sname; + for (const auto &f : fieldNames) + { + // If the field we want to send back seems like an mcarray, we will send back + // the mcarray instead. + std::string sendName(f); + const auto it = normal2mcarray.find(f); + if (it != normal2mcarray.end()) + sendName = it->second; + + mbopts["fields"].append().set(sendName); + + // Make sure the part mesh contains the field name. + if (part.has_path("fields/" + sendName)) + { + sname.push_back(sendName); + } + } + + if (mbopts.has_path("fields") && mbopts["fields"].number_of_children() > 0) + { + // Move selected fields from the partitioned mesh back to the original mesh. + conduit::blueprint::mpi::mesh::partition_map_back(part, mbopts, blueprint, mCommunicator); + + if (topoName == mainTopologyName) + { + // At this point, fields have been mapped from part fields back onto blueprint + // fields. Copy the mcarray fields back to their single-buffer original variable. + const conduit::Node &main_topo = getMainTopology(blueprint); + conduit::Node &fields = blueprint.fetch_existing("fields"); + conduit::index_t nzones = conduit::blueprint::mesh::utils::topology::length(main_topo); + conduit::index_t ngroups = options.fetch_existing("quadrature/num_groups").to_index_t(); + + for (auto it = normal2mcarray.begin(); it != normal2mcarray.end(); it++) + { + conduit::Node &n_origField = fields[it->first]; + conduit::Node &n_origValues = n_origField["values"]; + double *origValues = n_origValues.as_float64_ptr(); + + // Get the mcarray field we're copying into the original field. + const conduit::Node &n_srcField = fields[it->second]; + auto values = const_cast(n_srcField.fetch_existing("values")); + + // Copy mcarray components back into original contiguous field. This is + // compatible with original fields that are set_external. + utilities::NDAccessor src(values, {{"zone", nzones}, {"group", ngroups}}, doInterleave(it->first)); + src.to_contiguous(origValues); + + // We don't need the mcarray data anymore. + fields.remove(it->second); + } + } + } + } +#endif +} + +void Teton::initializeRadiationFluxFieldNames() +{ + // Get the number of dimensions from the blueprint mesh since it might not be + // in the options yet. + const conduit::Node &blueprint = getMeshBlueprint(); + std::string csname(getMainTopology(blueprint).fetch_existing("coordset").as_string()); + const conduit::Node &coordset = blueprint.fetch_existing("coordsets/" + csname); + const int ndim = static_cast(conduit::blueprint::mesh::coordset::dims(coordset)); + + mRadiationFluxFields.clear(); + mRadiationFluxFields.reserve(ndim); + if (ndim == 1) + { + mRadiationFluxFields.emplace_back(FIELD_RADIATION_FLUX_X); + } + else if (ndim == 2) + { + mRadiationFluxFields.emplace_back(FIELD_RADIATION_FLUX_Z); + mRadiationFluxFields.emplace_back(FIELD_RADIATION_FLUX_R); + } + else if (ndim == 3) + { + mRadiationFluxFields.emplace_back(FIELD_RADIATION_FLUX_X); + mRadiationFluxFields.emplace_back(FIELD_RADIATION_FLUX_Y); + mRadiationFluxFields.emplace_back(FIELD_RADIATION_FLUX_Z); + } +} + +const std::vector &Teton::getRadiationFluxFields() const +{ + return mRadiationFluxFields; +} + +void Teton::setRadiationFlux() +{ + CALI_CXX_MARK_FUNCTION; + + // Instruct Teton to compute the radiation flux prior to retrieval + teton_setradiationflux(); + +#if defined(TETON_PARTITIONING) + if (doPartitioning()) + { + conduit::Node &options = getOptions(); + conduit::Node &part = getMeshBlueprintPart(); + + // Step 1, make some fields on the part mesh and store radiation flux into them. + const conduit::Node &main_topo = getMainTopology(part); + const std::string mainTopologyName(main_topo.name()); + conduit::Node &fields = part.fetch_existing("fields"); + conduit::index_t nzones = conduit::blueprint::mesh::utils::topology::length(main_topo); + conduit::index_t ngroups = options.fetch_existing("quadrature/num_groups").to_index_t(); + + // We arrange the data this way so we can more easily send it through the partitioner. + std::vector dimGroups[3]; + const auto fieldNames = getRadiationFluxFields(); + int ndims = static_cast(fieldNames.size()); + for (int dim = 0; dim < ndims; dim++) + { + conduit::Node &n = fields[fieldNames[dim]]; + n["topology"] = mainTopologyName; + n["association"] = "element"; + conduit::Node &values = n["values"]; + for (int g = 0; g < ngroups; g++) + { + std::stringstream gs; + gs << "group" << g; + std::string gname(gs.str()); + + values[gname].set(conduit::DataType::float64(nzones)); + dimGroups[dim].push_back(values[gname].as_float64_ptr()); + } + } + + // step 2, get data from Teton into Conduit fields + std::vector zflux(ngroups * ndims); + for (int zone = 0; zone < nzones; zone++) + { + int zone1 = zone + 1; + teton_getradiationflux(&zone1, &zflux[0]); + + int idx = 0; + for (int g = 0; g < ngroups; g++) + { + for (int dim = 0; dim < ndims; dim++) + { + dimGroups[dim][g][zone] = zflux[idx++]; + } + } + } + + // step 3, map back fields to main mesh. + sendFieldsPart2Orig(mainTopologyName, fieldNames); + } +#endif +} + +void Teton::getRadiationFlux(int zone, double *zflux) const +{ +#if defined(TETON_PARTITIONING) + if (doPartitioning()) + { + const conduit::Node &options = getOptions(); + const conduit::Node &blueprint = getMeshBlueprint(); + const conduit::Node &fields = blueprint.fetch_existing("fields"); + const conduit::index_t ngroups = options.fetch_existing("quadrature/num_groups").to_index_t(); + int zone0 = zone - 1; + + // Pull the data out from the Conduit fields and return in the order that + // Teton would have returned it. + const auto fieldNames = getRadiationFluxFields(); + int ndims = static_cast(fieldNames.size()); + for (int dim = 0; dim < ndims; dim++) + { + const conduit::Node &f = fields.fetch_existing(fieldNames[dim]); + const conduit::Node &values = f["values"]; + for (conduit::index_t g = 0; g < values.number_of_children(); g++) + { + const auto zonal_array = values[g].as_float64_ptr(); + zflux[g * ndims + dim] = zonal_array[zone0]; + } + } + } + else + { + teton_getradiationflux(&zone, zflux); + } +#endif +} + +void Teton::getEdits(int &noutrt, + int &ninrt, + int &ngdart, + int &nNLIters, + int &maxNLIters, + int &TrMaxZone, + int &TeMaxZone, + int &TrMaxProcess, + int &TeMaxProcess, + double &dtused, + double &dtrad, + double &TrMax, + double &TeMax, + double &EnergyRadiation, + double &PowerIncident, + double &PowerEscape, + double &PowerAbsorbed, + double &PowerEmitted, + double &PowerExtSources, + double &PowerCompton, + double &EnergyCheck) const +{ + teton_getedits(&noutrt, + &ninrt, + &ngdart, + &nNLIters, + &maxNLIters, + &TrMaxZone, + &TeMaxZone, + &TrMaxProcess, + &TeMaxProcess, + &dtused, + &dtrad, + &TrMax, + &TeMax, + &EnergyRadiation, + &PowerIncident, + &PowerEscape, + &PowerAbsorbed, + &PowerEmitted, + &PowerExtSources, + &PowerCompton, + &EnergyCheck); + + // If partitioning occurred then we need to fix up some max zone/process information. + if (doPartitioning()) + { + const conduit::Node &part = getMeshBlueprintPart(); + const std::string mainTopologyName(getMainTopology(part).name()); + std::string vkey = "fields/" + mainTopologyName + "_original_element_ids/values"; + const conduit::Node &vnode = part.fetch_existing(vkey); + const auto orig_domains = vnode.fetch_existing("domains").as_int_accessor(); + const auto orig_zones = vnode.fetch_existing("ids").as_int_accessor(); + + int maxvals[4] = {0, 0, 0, 0}, finalmaxvals[4] = {0, 0, 0, 0}; + if (mRank == TrMaxProcess) + { + // This rank owns the partitioned zone so it knows where it came from. + maxvals[0] = orig_zones[TrMaxZone - 1] + 1; + maxvals[1] = orig_domains[TrMaxZone - 1]; + } + if (mRank == TeMaxProcess) + { + // This rank owns the partitioned zone so it knows where it came from. + maxvals[2] = orig_zones[TeMaxZone - 1] + 1; + maxvals[3] = orig_domains[TeMaxZone - 1]; + } + MPI_Allreduce(maxvals, finalmaxvals, 4, MPI_INT, MPI_MAX, mCommunicator); + + TrMaxZone = finalmaxvals[0]; + TrMaxProcess = finalmaxvals[1]; + TeMaxZone = finalmaxvals[2]; + TeMaxProcess = finalmaxvals[3]; + } +} + +void Teton::getDtControls(int &flag, int &process, int &zone, std::string &message) const +{ + const conduit::Node &options = getOptions(); + flag = options.fetch_existing("iteration/dtcontrol/flag").value(); + process = options.fetch_existing("iteration/dtcontrol/process").value(); + zone = options.fetch_existing("iteration/dtcontrol/zone").value(); + message = options.fetch_existing("iteration/dtcontrol/message").as_string(); + + // If partitioning occurred then we need to fix up some max zone/process information. + if (doPartitioning()) + { + const conduit::Node &part = getMeshBlueprintPart(); + const std::string mainTopologyName(getMainTopology(part).name()); + // Check this in case partitioning has not actually happened yet. + std::string vkey = "fields/" + mainTopologyName + "_original_element_ids/values"; + if (part.has_path(vkey)) + { + const conduit::Node &vnode = part.fetch_existing(vkey); + const auto orig_domains = vnode.fetch_existing("domains").as_int_accessor(); + const auto orig_zones = vnode.fetch_existing("ids").as_int_accessor(); + + int maxvals[2] = {0, 0}, finalmaxvals[2] = {0, 0}; + if (mRank == process) + { + // This rank owns the partitioned zone so it knows where it came from. + maxvals[0] = orig_zones[zone - 1] + 1; + maxvals[1] = orig_domains[zone - 1]; + } + MPI_Allreduce(maxvals, finalmaxvals, 2, MPI_INT, MPI_MAX, mCommunicator); + + zone = finalmaxvals[0]; + process = finalmaxvals[1]; + + message += " Note - Teton repartitioned the mesh so process/zone may differ."; + } + } +} + +void Teton::partitionCleanup() +{ +#if defined(TETON_PARTITIONING) + CALI_CXX_MARK_FUNCTION; + + if (doPartitioning()) + { + conduit::Node &blueprint = getMeshBlueprint(); + conduit::Node &part = getMeshBlueprintPart(); +#if defined(CLEANUP_PARTITION_TOPOLOGY) + // Totally clear out the partitioned mesh. + part.reset(); +#endif + // Remove some fields that we added to the original mesh. + conduit::Node &fields = blueprint["fields"]; + std::vector removals{PARTITION_FIELD, PARTITION_FIELD_BOUNDARY}; + for (conduit::index_t i = 0; i < fields.number_of_children(); i++) + { + if (fields[i].name().find("original_element_ids") != std::string::npos) + removals.push_back(fields[i].name()); + if (fields[i].name().find("original_vertex_ids") != std::string::npos) + removals.push_back(fields[i].name()); + // Remove any field that begins with PREFIX. + if (fields[i].name().find(PREFIX) == 0) + removals.push_back(fields[i].name()); + } + for (const auto &name : removals) + { + if (fields.has_child(name)) + { + fields.remove(name); + } + } + } +#endif +} + +std::string Teton::makeTestNode(conduit::Node &n, + conduit::Node &datastore, + conduit::Node &bp, + conduit::Node &options, + int flags) +{ +#if defined(PARTITION_DEBUG) + utilities::Banner b(mCommunicator, "test"); +#endif + const conduit::Node &main_topo = getMainTopology(bp); + std::string mainTopologyName(main_topo.name()); + conduit::index_t nzones = conduit::blueprint::mesh::utils::topology::length(main_topo); + conduit::index_t ngroups = options.fetch_existing("quadrature/num_groups").to_index_t(); + conduit::index_t nangles = options.fetch_existing("quadrature/num_angles").to_index_t(); + + // Make a node that we'll check for validity. We make the node ourselves so + // it only has the things we want in it. + const std::vector names{ + // inputs + "fields/thermo_density", + "fields/electron_temperature", + "fields/radiation_temperature", + "fields/electron_number_density", + "fields/electron_specific_heat", + "fields/absorption_opacity", + "fields/scattering_opacity", + // Some outputs that may have been registered as fields + "fields/radiation_energy_density", + "fields/electron_energy_deposited", + "fields/radiation_force_x", + "fields/radiation_force_y", + "fields/radiation_force_z", + "fields/radiation_force_r", + }; + + for (const auto &name : names) + { + if (bp.has_path(name)) + n[name].set_external(bp.fetch_existing(name)); + } + + // Get the radiation flux outputs. + setRadiationFlux(); + for (const auto &name : getRadiationFluxFields()) + { + if (!n.has_path(name) && bp.has_path(name)) + n[name].set_external(bp.fetch_existing(name)); + } + + // Get some radiation force outputs. + if (((flags & Test_RadiationForceDensity) > 0) && bp.has_path(field_values(FIELD_CORNER_VOLUME_SUMS))) + { + conduit::index_t nnodes = conduit::blueprint::mesh::coordset::length(bp.fetch_existing("coordsets/coords")); + n["fields/__result__radiation_force_x/association"] = "element"; + n["fields/__result__radiation_force_x/topology"] = mainTopologyName; + n["fields/__result__radiation_force_x/values"].set(conduit::DataType::float64(nnodes)); + double *fx = n["fields/__result__radiation_force_x/values"].value(); + n["fields/__result__radiation_force_y/association"] = "element"; + n["fields/__result__radiation_force_y/topology"] = mainTopologyName; + n["fields/__result__radiation_force_y/values"].set(conduit::DataType::float64(nnodes)); + double *fy = n["fields/__result__radiation_force_y/values"].value(); + n["fields/__result__radiation_force_z/association"] = "element"; + n["fields/__result__radiation_force_z/topology"] = mainTopologyName; + n["fields/__result__radiation_force_z/values"].set(conduit::DataType::float64(nnodes)); + double *fz = n["fields/__result__radiation_force_z/values"].value(); + getRadiationForceDensity(fx, fy, fz); + } + + if ((flags & Test_ZonalPsi) > 0) + { + // Call getZonalPsi and get the values out. + auto nvalues = nzones * ngroups * nangles; + n["fields/__result__zonal_psi/association"] = "element"; + n["fields/__result__zonal_psi/topology"] = mainTopologyName; + n["fields/__result__zonal_psi/values"].set(conduit::DataType::float64(nvalues)); + double *values_ptr = n["fields/__result__zonal_psi/values"].value(); + getZonalPsi(nangles, values_ptr); + } + + // NOTE: getMaterialTemperature is not backed by a Conduit field with the right length. + // So, with partitioning, we can't ask for nzones values without going out of bounds + if ((flags & Test_MaterialTemperature) > 0) + { + // Call getMaterialTemperature and get the values out. + n["fields/__result__material_temperature/association"] = "element"; + n["fields/__result__material_temperature/topology"] = mainTopologyName; + n["fields/__result__material_temperature/values"].set(conduit::DataType::float64(nzones)); + double *mt = n["fields/__result__material_temperature/values"].value(); + for (int zid = 0; zid < nzones; zid++) + mt[zid] = getMaterialTemperature(zid + 1); + } + + if ((flags & Test_RadiationTemperature) > 0) + { + // Call getRadiationTemperature and get the values out. + n["fields/__result__radiation_temperature/association"] = "element"; + n["fields/__result__radiation_temperature/topology"] = mainTopologyName; + n["fields/__result__radiation_temperature/values"].set(conduit::DataType::float64(nzones)); + double *rt = n["fields/__result__radiation_temperature/values"].value(); + for (int zid = 0; zid < nzones; zid++) + rt[zid] = getRadiationTemperature(zid + 1); + } + + if ((flags & Test_RadiationDeposited) > 0) + { + n["fields/__result__rad_energy_deposited/association"] = "element"; + n["fields/__result__rad_energy_deposited/topology"] = mainTopologyName; + n["fields/__result__rad_energy_deposited/values"].set(conduit::DataType::float64(nzones)); + double *red = n["fields/__result__rad_energy_deposited/values"].value(); + for (int z = 0; z < nzones; z++) + red[z] = getRadiationDeposited(z + 1); + } + + if ((flags & Test_ReconstructPsi) > 0) + { + double erad = 0.; + n["fields/__result__reconstructpsi_radEnergyDensity/association"] = "element"; + n["fields/__result__reconstructpsi_radEnergyDensity/topology"] = mainTopologyName; + n["fields/__result__reconstructpsi_radEnergyDensity/values"].set(conduit::DataType::float64(nzones * ngroups)); + double *psired = n["fields/__result__reconstructpsi_radEnergyDensity/values"].value(); + reconstructPsi(&erad, psired); + n["__result__reconstructpsi_erad"] = erad; + } + + // Add in the options and edits so we can check their values too. + n["options"].set_external(options); + n["rtedits"].set_external(datastore.fetch_existing("rtedits")); + + int cycle = bp["state/cycle"].to_int(); + std::stringstream ss; + ss << "_cycle=" << cycle << "_rank=" << mRank << "_a=" << nangles << "_g=" << ngroups << "_z=" << nzones; + + return ss.str(); +} + } // namespace Teton diff --git a/src/teton/interface/TetonSources.cc b/src/teton/interface/TetonSources.cc index 5cfcb89..cc60a7f 100644 --- a/src/teton/interface/TetonSources.cc +++ b/src/teton/interface/TetonSources.cc @@ -76,10 +76,10 @@ PointSource::PointSource(int nangles, int ntimebins, const double *timebinbounds, const double *source_profile) - : m_num_time_bins(ntimebins), + : TetonSource(nangles, zone_index > 0 ? 1 : 0, ngroups), m_profile(), m_time_bin_bounds(), - TetonSource(nangles, zone_index > 0 ? 1 : 0, ngroups) + m_num_time_bins(ntimebins) { if (zone_index <= 0) return; diff --git a/src/teton/interface/TetonSurfaceTallies.cc b/src/teton/interface/TetonSurfaceTallies.cc index bdf5887..90411c5 100644 --- a/src/teton/interface/TetonSurfaceTallies.cc +++ b/src/teton/interface/TetonSurfaceTallies.cc @@ -25,7 +25,6 @@ void dumpTallyToJson(const conduit::Node &blueprint, const conduit::Node &option tally_node["runInfo/code/name"] = "teton"; // TODO read in host code name? std::string conduit_base_path = "xray/surfaceTallies/"; - const int ngroups_teton = options["quadrature/num_groups"].as_int(); const double *group_bounds = options.fetch_existing("quadrature/gnu").as_double_ptr(); int nanglebin_teton = -1; @@ -70,7 +69,6 @@ void dumpTallyToJson(const conduit::Node &blueprint, const conduit::Node &option const double *tbinbnds = surface_edit_option["time_bin_boundaries"].as_double_ptr(); const int ntimebin = surface_edit_option["time_bin_boundaries"].dtype().number_of_elements() - 1; const int size3d = nanglebin * ngrp * ntimebin; - const double *center = surface_edit_option.fetch_existing("center_point").as_double_ptr(); const double *tally_values_esc = blueprint["fields/" + tally_name + "_tallies/values"].as_double_ptr(); const double *tally_values_inc = nullptr; @@ -124,7 +122,6 @@ void dumpTallyToJson(const conduit::Node &blueprint, const conduit::Node &option angle_dim_info["units"] = "cos(theta)"; angle_dim_info["type"] = "angle bin"; - // Check that Teton and Ares have the same idea of how many bins there are: TETON_VERIFY_C(mpi_rank, nanglebin == nanglebin_teton, "nanglebin must match nanglebin_teton for surface tallies"); diff --git a/src/teton/interface/TetonTesting.cc b/src/teton/interface/TetonTesting.cc new file mode 100644 index 0000000..76fd6e7 --- /dev/null +++ b/src/teton/interface/TetonTesting.cc @@ -0,0 +1,343 @@ +#include "TetonTesting.hh" +#include "TetonUtilities.hh" + +#include "conduit/conduit_blueprint.hpp" +#include "conduit/conduit_blueprint_mesh.hpp" +#include "conduit/conduit_relay.hpp" +#include "conduit/conduit_relay_mpi.hpp" + +#include +#include +#include +#include +#include + +namespace Teton +{ + +namespace testing +{ + +namespace detail +{ + +//----------------------------------------------------------------------------- +std::vector split(const std::string &str) +{ + std::vector retval; + std::istringstream f(str); + std::string s; + while (getline(f, s, '/')) + { + retval.push_back(s); + } + return retval; +} + +//----------------------------------------------------------------------------- +std::vector filter(const std::vector &words, const std::vector &exclude) +{ + std::vector retval; + for (const auto &w : words) + { + if (std::find(exclude.begin(), exclude.end(), w) == exclude.end()) + retval.push_back(w); + } + return retval; +} + +//----------------------------------------------------------------------------- +std::string join(const std::vector &words, const std::string &delim) +{ + std::string retval; + for (size_t i = 0; i < words.size(); i++) + { + if (i > 0) + retval += delim; + retval += words[i]; + } + return retval; +} + +//----------------------------------------------------------------------------- +void diff_info_to_html_helper(std::ostream &os, + const conduit::Node &baseline, + const conduit::Node ¤t, + const conduit::Node &info) +{ + conduit::Node opts; + opts["num_elements_threshold"] = 50; + opts["num_children_threshold"] = 10000; + + const std::vector exclude{"children", "diff"}; + for (conduit::index_t i = 0; i < info.number_of_children(); i++) + { + const conduit::Node &n = info[i]; + if (n.name() == "errors") + { + auto path = join(filter(split(n.parent()->path()), exclude), "/"); + + os << "\n"; + os << "" << path << "\n"; + + os << ""; + if (baseline.has_path(path)) + { + const conduit::Node &obj = baseline.fetch_existing(path); + obj.to_summary_string_stream(os, opts); + + if (obj.dtype().number_of_elements() > 1) + os << "
len=" << obj.dtype().number_of_elements() << ""; + } + os << "\n"; + + os << ""; + if (current.has_path(path)) + { + const conduit::Node &obj = current.fetch_existing(path); + obj.to_summary_string_stream(os, opts); + + if (obj.dtype().number_of_elements() > 1) + os << "
len=" << obj.dtype().number_of_elements() << ""; + } + os << "\n"; + + os << "\n"; + } + else if (n.number_of_children() > 0) + { + diff_info_to_html_helper(os, baseline, current, n); + } +#if 0 + // Enable this if we want to see nodes that are the same. + else + { + auto path = join(filter(split(n.parent()->path()), exclude), "/"); + + os << "\n"; + os << "" << path << "\n"; + + os << "match\n"; + + os << "\n"; + } +#endif + } +} + +//--------------------------------------------------------------------------- +void diff_info_to_html(MPI_Comm comm, + std::ostream &os, + const conduit::Node &baseline, + const conduit::Node ¤t, + const conduit::Node &info, + int cycle) +{ + // Table style + const char *style = R"( + +)"; + + int rank = 0; + MPI_Comm_rank(comm, &rank); + + // Write real diffs out as a table. + os << style << "\n"; + if (cycle > 0) + { + std::stringstream prev; + prev << "diff." << rank << "." << (cycle - 1) << ".html"; + os << "Prev    "; + } + std::stringstream next; + next << "diff." << rank << "." << (cycle + 1) << ".html"; + os << "Next
\n"; + + os << "

Cycle " << cycle << "


\n"; + os << "\n"; + os << "\n"; + diff_info_to_html_helper(os, baseline, current, info); + os << "
PathBaselineCurrent
\n"; +} + +//----------------------------------------------------------------------------- +/** + @return True if the node is the same as the baseline node. + */ +bool compare_baseline(MPI_Comm comm, + const conduit::Node &baseline, + const conduit::Node ¤t, + conduit::Node &info, + int cycle, + bool forceFile = true) +{ + const double tolerance = 1.e-6; + + int rank = 0; + MPI_Comm_rank(comm, &rank); + + bool equal; + if (forceFile) + { + // Sometimes Node::diff lies about data arrays being equal when they are not. + // Save the results node n out to a file first and then read it back in. + // That seems to work around the issue. + std::stringstream ss; + ss << "conduit_tmp_result" << rank << "." << cycle << ".yaml"; + std::string tmp_file(ss.str()); + conduit::relay::io::save(current, tmp_file, "yaml"); + conduit::Node n_fromfile; + conduit::relay::io::load(tmp_file, "yaml", n_fromfile); + conduit::utils::remove_file(tmp_file); + // Node::diff returns true if the nodes are different. We want not different. + equal = !baseline.diff(n_fromfile, info, tolerance, true); + } + else + { + // Node::diff returns true if the nodes are different. We want not different. + equal = !baseline.diff(current, info, tolerance, true); + } + + return equal; +} + +//----------------------------------------------------------------------------- +void write_info(MPI_Comm comm, + const conduit::Node &baseline, + const conduit::Node ¤t, + const conduit::Node &info, + int cycle) +{ + int rank = 0; + MPI_Comm_rank(comm, &rank); + + conduit::Node opts; + opts["num_elements_threshold"] = 20; + opts["num_children_threshold"] = 10000; + + // Write the info as YAML. + std::stringstream ss; + ss << "diff." << rank << ".yaml"; + std::ofstream f; + f.open(ss.str().c_str()); + info.to_summary_string_stream(f, opts); + f.close(); + + // Write the important info as HTML. + std::stringstream ss2; + ss2 << "diff." << rank << "." << cycle << ".html"; + std::ofstream html; + html.open(ss2.str().c_str()); + html << "\n"; + html << " \n"; + html << " diff\n"; + html << " \n"; + html << " \n"; + diff_info_to_html(comm, html, baseline, current, info, cycle); + html << " \n"; + html << "\n"; + html.close(); +} + +} // namespace detail + +bool test(const conduit::Node &n, const std::string &fileBase, int cycle, bool make, MPI_Comm comm) +{ + int rank = 0; + MPI_Comm_rank(comm, &rank); + + if (n.has_child("fields")) + { + // Test that the fields contain "good" values. + const conduit::Node &fields = n["fields"]; + for (conduit::index_t i = 0; i < fields.number_of_children(); i++) + { + const conduit::Node &values = fields[i]["values"]; + if (values.number_of_children() > 0) + { + for (conduit::index_t comp = 0; comp < values.number_of_children(); comp++) + { + utilities::scan_field_values(rank, values[comp]); + } + } + else + { + utilities::scan_field_values(rank, values); + } + } + } + + // Make baseline filename. + std::string baselineFilename; + if (getenv("TETON_TESTING_BASELINE_DIR") != nullptr) + { + std::vector s{getenv("TETON_TESTING_BASELINE_DIR"), "baseline" + fileBase}; + baselineFilename = detail::join(s, "/"); + } + else + { + baselineFilename = "baseline" + fileBase; + } + + bool rv = true; + if (make) + conduit::relay::io::save(n, baselineFilename, "yaml"); + else + { + // Make current filename. + std::string filename; + if (getenv("TETON_TESTING_CURRENT_DIR") != nullptr) + { + std::vector s{getenv("TETON_TESTING_CURRENT_DIR"), "current" + fileBase}; + filename = detail::join(s, "/"); + } + else + { + filename = "current" + fileBase; + } + + // Save the current node. + conduit::relay::io::save(n, filename, "yaml"); + + // Load the baseline node. + conduit::Node baseline; + conduit::relay::io::load(baselineFilename, "yaml", baseline); + + // Compare current with the baseline and populate diff info. + conduit::Node info; + rv = detail::compare_baseline(comm, baseline, n, info, cycle); + + // If there were differences, write the info. + if (!rv) + detail::write_info(comm, baseline, n, info, cycle); + } + return rv; +} + +} // namespace testing + +} // namespace Teton diff --git a/src/teton/interface/TetonUtilities.cc b/src/teton/interface/TetonUtilities.cc new file mode 100644 index 0000000..63ece31 --- /dev/null +++ b/src/teton/interface/TetonUtilities.cc @@ -0,0 +1,210 @@ +#include "TetonUtilities.hh" + +#include "conduit/conduit_blueprint.hpp" +#include "conduit/conduit_blueprint_mesh.hpp" +#include "conduit/conduit_relay_mpi.hpp" + +#include +#include +#include + +namespace Teton +{ + +namespace utilities +{ + +void convert_int32(int rank, conduit::Node &root, const std::vector &keys) +{ + for (const auto &path : keys) + { + if (root.has_path(path)) + { + conduit::Node &n = root.fetch_existing(path); + if (!n.dtype().is_int32()) + { + // Convert the data to int32 and put it back in node n. I tried move/swap + // to try and steal ifield's data rather than copying it again but that + // caused n's name to be blank. + conduit::Node ifield; + n.to_int32_array(ifield); + n.set(ifield); + } + } + } +} + +void find_dtype(const conduit::Node &n, + const conduit::DataType &dtype, + const std::string &path, + std::vector &paths) +{ + auto concat = [](const std::string &path, const std::string &name) + { + if (path.empty()) + return name; + return path + "/" + name; + }; + + if (n.number_of_children() > 0) + { + // Make paths be relative to node n at the top level. + for (conduit::index_t i = 0; i < n.number_of_children(); i++) + { + if (path == "") + find_dtype(n[i], dtype, n[i].name(), paths); + else + find_dtype(n[i], dtype, concat(path, n[i].name()), paths); + } + } + else + { + if (n.dtype().id() == dtype.id()) + paths.push_back(path); + } +} + +std::vector find_int64(const conduit::Node &n) +{ + std::vector paths; + find_dtype(n, conduit::DataType::int64(), "", paths); + return paths; +} + +bool scan_field_values(int rank, const conduit::Node &n) +{ + bool retval = true; + if (n.dtype().is_float64()) + { + int count = 0; + conduit::float64_array arr = n.value(); + for (conduit::index_t i = 0; i < arr.number_of_elements(); i++) + { + if (!std::isfinite(arr[i])) + { + std::cout << rank << ":" << n.path() << ": elem[" << i << "] is not a number. " << arr[i] << std::endl; + retval = false; + count++; + if (count > 10) + break; + } + } + } + return retval; +} + +bool find_local_duplicate_points(int domainId, + const conduit::Node &dom, + const conduit::Node &coordset, + conduit::Node &info) +{ + bool retval = false; + // Make sure Conduit is new enough. +#if (CONDUIT_VERSION_MAJOR == 0 && CONDUIT_VERSION_MINOR >= 9) || (CONDUIT_VERSION_MAJOR > 0) + using conduit::index_t; + + // See whether any of the points in the local domain are duplicated. + // If a point's query result does not equal its query index then it + // must have been defined once before. + conduit::blueprint::mesh::utils::query::PointQuery localPQ(dom); + const index_t npts = conduit::blueprint::mesh::coordset::length(coordset); + for (index_t pi = 0; pi < npts; pi++) + { + auto pt = conduit::blueprint::mesh::utils::coordset::_explicit::coords(coordset, pi); + double pt3[3]; + pt3[0] = pt[0]; + pt3[1] = (pt.size() > 1) ? pt[1] : 0.; + pt3[2] = (pt.size() > 2) ? pt[2] : 0.; + localPQ.add(domainId, pt3); + } + localPQ.execute(coordset.name()); + for (index_t pi = 0; pi < npts; pi++) + { + const auto &res = localPQ.results(static_cast(domainId)); + if (res[pi] != pi) + { + const auto pts = localPQ.inputs(domainId); + double pt3[3]{pts[3 * pi], pts[3 * pi + 1], pts[3 * pi + 2]}; + std::stringstream ss; + ss << "Domain " << domainId << " duplicated point " << pi << " (" << pt3[0] << ", " << pt3[1] << ", " << pt3[2] + << ") at " << res[pi] << "."; + + conduit::Node &vn = info.append(); + vn["message"].set(ss.str()); + vn["vertex"] = pi; + vn["duplicate_vertex"] = res[pi]; + vn["coordinate"].set(pt3, 3); + + retval = true; + } + } +#endif + return retval; +} + +std::vector globalizeStringVector(const std::vector &vec, MPI_Comm comm) +{ + // Make a Conduit node from it. + conduit::Node send_node; + for (const auto &value : vec) + send_node[value] = 1; + + // Send the data to all ranks. + conduit::Node recv_node; + conduit::relay::mpi::all_gather_using_schema(send_node, recv_node, comm); + + // Pick through the output and make the output string vector from the node names. + std::set unique; + for (conduit::index_t i = 0; i < recv_node.number_of_children(); i++) + { + const conduit::Node &child = recv_node[i]; + for (conduit::index_t j = 0; j < child.number_of_children(); j++) + unique.insert(child[j].name()); + } + std::vector retval; + retval.insert(retval.begin(), unique.begin(), unique.end()); + + return retval; +} + +//------------------------------------------------------------------------------ +int Banner::level = 0; + +Banner::Banner(MPI_Comm c, const std::string &str) : comm(c), rank(0), name(str) +{ + MPI_Comm_rank(comm, &rank); + MPI_Barrier(comm); + if (rank == 0) + printLine(name + " (start)"); + MPI_Barrier(comm); + level++; +} + +Banner::~Banner() +{ + level--; + MPI_Barrier(comm); + if (rank == 0) + printLine(name + " (end)"); + MPI_Barrier(comm); +} + +void Banner::printLine(const std::string s) const +{ + int n = std::max(2, (80 - 2 - static_cast(s.size())) / 2); + emit(' ', level * 2); + emit('=', n); + std::cout << " " << s << " "; + emit('=', n); + std::cout << std::endl; +} + +void Banner::emit(char c, int n) const +{ + for (int i = 0; i < n; i++) + std::cout << c; +} + +} // namespace utilities + +} // namespace Teton diff --git a/src/teton/misc/CMakeLists.txt b/src/teton/misc/CMakeLists.txt index 67912bb..2082a9c 100644 --- a/src/teton/misc/CMakeLists.txt +++ b/src/teton/misc/CMakeLists.txt @@ -1,5 +1,5 @@ target_sources( teton PRIVATE - ArrayChecks_mod.F90 + CodeChecks_mod.F90 f90errors.F90 mpi_param_mod.F90 mpif90_mod.F90 diff --git a/src/teton/misc/ArrayChecks_mod.F90 b/src/teton/misc/CodeChecks_mod.F90 similarity index 61% rename from src/teton/misc/ArrayChecks_mod.F90 rename to src/teton/misc/CodeChecks_mod.F90 index d7ffb0e..3c05716 100644 --- a/src/teton/misc/ArrayChecks_mod.F90 +++ b/src/teton/misc/CodeChecks_mod.F90 @@ -1,9 +1,10 @@ !******************************************************************************* -! Array Bounds Checking - A set of subroutines that help detect array * -! operation errors, such as out-of-bounds accesses. * +! Code checks +! This module contains functions for checking code correctness at runtime. +! For example, verifying that an array access is valid. !******************************************************************************* -module ArrayChecks_mod +module CodeChecks_mod use, intrinsic :: iso_c_binding, only : c_double, c_int implicit none @@ -15,7 +16,11 @@ module ArrayChecks_mod is_legal_access_int_array_2d, & is_legal_access_int_array_3d, & is_legal_access_AngleSet_array_1d, & - is_legal_access_SetData_array_1d + is_legal_access_SetData_array_1d, & + is_legal_access_CommSet_array_1d, & + is_legal_access_GroupSet_array_1d, & + is_legal_access_Quadrature_array_1d + end interface @@ -32,6 +37,7 @@ function is_legal_access_double_array_1d(ptr, i) result(isLegal) if (i > SIZE(ptr, 1)) then isLegal = .FALSE. + print *, "Illegal array access. Shape ", SHAPE(ptr), " index: ", i else isLegal = .TRUE. endif @@ -46,6 +52,7 @@ function is_legal_access_double_array_2d(ptr, i, j) result(isLegal) logical :: isLegal if (i > SIZE(ptr, 1) .OR. j > SIZE(ptr, 2) ) then + print *, "Illegal array access. Shape ", SHAPE(ptr), " indices: ", i, ", ", j isLegal = .FALSE. else isLegal = .TRUE. @@ -61,6 +68,7 @@ function is_legal_access_double_array_3d(ptr, i, j, k) result(isLegal) logical :: isLegal if (i > SIZE(ptr, 1) .OR. j > SIZE(ptr, 2) .OR. k > SIZE(ptr, 3) ) then + print *, "Illegal array access. Shape ", SHAPE(ptr), " indices: ", i, ", ", j, ", ", k isLegal = .FALSE. else isLegal = .TRUE. @@ -77,6 +85,7 @@ function is_legal_access_int_array_1d(ptr, i) result(isLegal) if (i > SIZE(ptr, 1)) then isLegal = .FALSE. + print *, "Illegal array access. Shape ", SHAPE(ptr), " index: ", i else isLegal = .TRUE. endif @@ -92,6 +101,7 @@ function is_legal_access_int_array_2d(ptr, i, j) result(isLegal) if (i > SIZE(ptr, 1) .OR. j > SIZE(ptr, 2) ) then isLegal = .FALSE. + print *, "Illegal array access. Shape ", SHAPE(ptr), " indices: ", i, ", ", j else isLegal = .TRUE. endif @@ -107,6 +117,7 @@ function is_legal_access_int_array_3d(ptr, i, j, k) result(isLegal) if (i > SIZE(ptr, 1) .OR. j > SIZE(ptr, 2) .OR. k > SIZE(ptr, 3) ) then isLegal = .FALSE. + print *, "Illegal array access. Shape ", SHAPE(ptr), " indices: ", i, ", ", j, ", ", k else isLegal = .TRUE. endif @@ -123,6 +134,7 @@ function is_legal_access_AngleSet_array_1d(ptr, i) result(isLegal) if (i > SIZE(ptr, 1) ) then isLegal = .FALSE. + print *, "Illegal array access. Shape ", SHAPE(ptr), " index: ", i else isLegal = .TRUE. endif @@ -139,6 +151,58 @@ function is_legal_access_SetData_array_1d(ptr, i) result(isLegal) if (i > SIZE(ptr, 1) ) then isLegal = .FALSE. + print *, "Illegal array access. Shape ", SHAPE(ptr), " index: ", i + else + isLegal = .TRUE. + endif + + return + end function + + function is_legal_access_CommSet_array_1d(ptr, i) result(isLegal) + use CommSet_mod + + type(CommSet), intent(in), pointer, dimension(:) :: ptr + integer :: i + logical :: isLegal + + if (i > SIZE(ptr, 1) ) then + isLegal = .FALSE. + print *, "Illegal array access. Shape ", SHAPE(ptr), " index: ", i + else + isLegal = .TRUE. + endif + + return + end function + + function is_legal_access_GroupSet_array_1d(ptr, i) result(isLegal) + use GroupSet_mod + + type(GroupSet), intent(in), pointer, dimension(:) :: ptr + integer :: i + logical :: isLegal + + if (i > SIZE(ptr, 1) ) then + isLegal = .FALSE. + print *, "Illegal array access. Shape ", SHAPE(ptr), " index: ", i + else + isLegal = .TRUE. + endif + + return + end function + + function is_legal_access_Quadrature_array_1d(ptr, i) result(isLegal) + use Quadrature_mod + + type(Quadrature), intent(in), pointer, dimension(:) :: ptr + integer :: i + logical :: isLegal + + if (i > SIZE(ptr, 1) ) then + isLegal = .FALSE. + print *, "Illegal array access. Shape ", SHAPE(ptr), " index: ", i else isLegal = .TRUE. endif @@ -146,4 +210,4 @@ function is_legal_access_SetData_array_1d(ptr, i) result(isLegal) return end function -end module ArrayChecks_mod +end module CodeChecks_mod diff --git a/src/teton/misc/mpif90_mod.F90 b/src/teton/misc/mpif90_mod.F90 index a09fd43..7883c9d 100644 --- a/src/teton/misc/mpif90_mod.F90 +++ b/src/teton/misc/mpif90_mod.F90 @@ -44,19 +44,19 @@ subroutine mpi_MPIAllReduce_r(recvBuf,mpiOp,comm) implicit none ! passed variables - real(long), intent(inout) :: recvBuf + real(double), intent(inout) :: recvBuf character(*), intent(in) :: mpiOp integer, intent(in) :: comm ! local variables integer :: length, ierror - real(long) :: sendBuf + real(double) :: sendBuf ! character(4), dimension(4) :: mpiOps = & ! (/"min ","max ","prod","sum "/) ! assertions -! tetonAssert(any(mpiOp==mpiOps(:)),"Invalid MPI Reduction Operation") +! TETON_ASSERT(any(mpiOp==mpiOps(:)),"Invalid MPI Reduction Operation") ! copy the send buffer into temporary storage sendBuf = recvBuf @@ -102,19 +102,19 @@ subroutine mpi_MPIAllReduce_r_(recvBuf,mpiOp,comm) implicit none ! passed variables - real(long), contiguous, intent(inout) :: recvBuf(:) + real(double), contiguous, intent(inout) :: recvBuf(:) character(*), intent(in) :: mpiOp integer, intent(in) :: comm ! local variables integer :: length, ierror, alloc_stat - real(long), allocatable :: sendBuf(:) + real(double), allocatable :: sendBuf(:) ! character(4), dimension(4) :: mpiOps = & ! (/"min ","max ","prod","sum "/) ! assertions -! tetonAssert(any(mpiOp==mpiOps(:)),"Invalid MPI Reduction Operation") +! TETON_ASSERT(any(mpiOp==mpiOps(:)),"Invalid MPI Reduction Operation") ! allocate memory for the send buffer allocate(sendBuf(size(recvBuf))) @@ -178,7 +178,7 @@ subroutine mpi_MPIAllReduce_i(recvBuf,mpiOp,comm) ! (/"min ","max ","prod","sum "/) ! assertions -! tetonAssert(any(mpiOp==mpiOps(:)),"Invalid MPI Reduction Operation") +! TETON_ASSERT(any(mpiOp==mpiOps(:)),"Invalid MPI Reduction Operation") ! copy the send buffer into temporary storage sendBuf = recvBuf @@ -236,7 +236,7 @@ subroutine mpi_MPIAllReduce_i_(recvBuf,mpiOp,comm) ! (/"min ","max ","prod","sum "/) ! assertions -! tetonAssert(any(mpiOp==mpiOps(:)),"Invalid MPI Reduction Operation") +! TETON_ASSERT(any(mpiOp==mpiOps(:)),"Invalid MPI Reduction Operation") ! allocate memory for the send buffer allocate(sendBuf(size(recvBuf))) @@ -407,8 +407,8 @@ subroutine mpi_MPIGather_r_(sendBuf,recvBuf,gatherNode,comm) implicit none ! passed variables - real(long), contiguous, intent(in) :: sendBuf(:) - real(long), contiguous, intent(inout) :: recvBuf(:,:) + real(double), contiguous, intent(in) :: sendBuf(:) + real(double), contiguous, intent(inout) :: recvBuf(:,:) integer, intent(in) :: gatherNode integer, intent(in) :: comm @@ -416,7 +416,7 @@ subroutine mpi_MPIGather_r_(sendBuf,recvBuf,gatherNode,comm) logical :: tf integer :: commSize, myNode, sendCount, recvCount, ierror - real(long) :: recvBufDum(1,1) + real(double) :: recvBufDum(1,1) ! determine size and rank commSize = getMPISize(comm) @@ -425,9 +425,9 @@ subroutine mpi_MPIGather_r_(sendBuf,recvBuf,gatherNode,comm) ! assertions if (myNode == gatherNode) then tf = size(recvBuf,1)==size(sendBuf,1) - tetonAssert(tf,"Invalid MPI Gather") + TETON_ASSERT(tf,"Invalid MPI Gather") tf = (size(recvBuf,2)==commSize) - tetonAssert(tf,"Invalid MPI Gather") + TETON_ASSERT(tf,"Invalid MPI Gather") endif ! MPI Barrier before performing the gather @@ -478,8 +478,8 @@ subroutine mpi_MPIGather_r__(sendBuf,recvBuf,gatherNode,comm) implicit none ! passed variables - real(long), contiguous, intent(in) :: sendBuf(:,:) - real(long), contiguous, intent(inout) :: recvBuf(:,:,:) + real(double), contiguous, intent(in) :: sendBuf(:,:) + real(double), contiguous, intent(inout) :: recvBuf(:,:,:) integer, intent(in) :: gatherNode integer, intent(in) :: comm @@ -487,7 +487,7 @@ subroutine mpi_MPIGather_r__(sendBuf,recvBuf,gatherNode,comm) logical(kind=1) :: tf integer :: commSize, myNode, sendCount, recvCount, ierror - real(long) :: recvBufDum(1,1,1) + real(double) :: recvBufDum(1,1,1) ! determine size and rank commSize = getMPISize(comm) @@ -496,11 +496,11 @@ subroutine mpi_MPIGather_r__(sendBuf,recvBuf,gatherNode,comm) ! assertions if (myNode == gatherNode) then tf = size(recvBuf,1)==size(sendBuf,1) - tetonAssert(tf,"Invalid MPI Gather") + TETON_ASSERT(tf,"Invalid MPI Gather") tf = size(recvBuf,2)==size(sendBuf,2) - tetonAssert(tf,"Invalid MPI Gather") + TETON_ASSERT(tf,"Invalid MPI Gather") tf = size(recvBuf,3)==commSize - tetonAssert(tf,"Invalid MPI Gather") + TETON_ASSERT(tf,"Invalid MPI Gather") endif ! MPI Barrier before performing the gather @@ -551,7 +551,7 @@ subroutine mpi_MPIBcast_r(sendBuf,nsend,root,comm) implicit none ! passed variables - real(long), contiguous, intent(inout) :: sendBuf(:) + real(double), contiguous, intent(inout) :: sendBuf(:) integer, intent(in) :: nsend integer, intent(in) :: root integer, intent(in) :: comm @@ -620,7 +620,7 @@ subroutine mpi_MPISendInit(sendBuf,nsend,isend,tag,comm,request) implicit none ! passed variables - real(long), contiguous, intent(in) :: sendBuf(:,:) + real(double), contiguous, intent(in) :: sendBuf(:,:) integer, intent(in) :: nsend integer, intent(in) :: isend integer, intent(in) :: tag @@ -657,7 +657,7 @@ subroutine mpi_MPISendInit1(sendBuf,nsend,isend,tag,comm,request) implicit none ! passed variables - real(long), contiguous, intent(in) :: sendBuf(:) + real(double), contiguous, intent(in) :: sendBuf(:) integer, intent(in) :: nsend integer, intent(in) :: isend integer, intent(in) :: tag @@ -731,7 +731,7 @@ subroutine mpi_MPIRecvInit(recvBuf,nrecv,irecv,tag,comm,request) implicit none ! passed variables - real(long), contiguous, intent(inout) :: recvBuf(:,:) + real(double), contiguous, intent(inout) :: recvBuf(:,:) integer, intent(inout) :: nrecv integer, intent(in) :: irecv integer, intent(in) :: tag @@ -768,7 +768,7 @@ subroutine mpi_MPIRecvInit1(recvBuf,nrecv,irecv,tag,comm,request) implicit none ! passed variables - real(long), contiguous, intent(inout) :: recvBuf(:) + real(double), contiguous, intent(inout) :: recvBuf(:) integer, intent(inout) :: nrecv integer, intent(in) :: irecv integer, intent(in) :: tag @@ -843,7 +843,7 @@ subroutine mpi_MPIIsend_r(sendBuf,nsend,isend,tag,comm,request) implicit none ! passed variables - real(long), contiguous, intent(in) :: sendBuf(:) + real(double), contiguous, intent(in) :: sendBuf(:) integer, intent(in) :: nsend integer, intent(in) :: isend integer, intent(in) :: tag @@ -881,7 +881,7 @@ subroutine mpi_MPIIsend_r2(sendBuf,nsend,isend,tag,comm,request) implicit none ! passed variables - real(long), contiguous, intent(in) :: sendBuf(:,:) + real(double), contiguous, intent(in) :: sendBuf(:,:) integer, intent(in) :: nsend integer, intent(in) :: isend integer, intent(in) :: tag @@ -997,7 +997,7 @@ subroutine mpi_MPIIrecv_r(recvBuf,nrecv,irecv,tag,comm,request) implicit none ! passed variables - real(long), contiguous, intent(inout) :: recvBuf(:) + real(double), contiguous, intent(inout) :: recvBuf(:) integer, intent(inout) :: nrecv integer, intent(in) :: irecv integer, intent(in) :: tag @@ -1035,7 +1035,7 @@ subroutine mpi_MPIIrecv_r2(recvBuf,nrecv,irecv,tag,comm,request) implicit none ! passed variables - real(long), contiguous, intent(inout) :: recvBuf(:,:) + real(double), contiguous, intent(inout) :: recvBuf(:,:) integer, intent(inout) :: nrecv integer, intent(in) :: irecv integer, intent(in) :: tag diff --git a/src/teton/mods/AngleSet_mod.F90 b/src/teton/mods/AngleSet_mod.F90 index 307913e..441a8b9 100644 --- a/src/teton/mods/AngleSet_mod.F90 +++ b/src/teton/mods/AngleSet_mod.F90 @@ -40,6 +40,7 @@ module AngleSet_mod integer :: NumBin0 integer :: angle0 ! angle offset integer :: totalCycles + integer :: maxInterface logical (kind=1) :: GTASet ! Quadrature related @@ -94,8 +95,12 @@ module AngleSet_mod type, public :: HypPlane integer :: maxZones + integer :: maxCorners + integer :: interfaceLen integer, pointer, contiguous :: zonesInPlane(:) => null() + integer, pointer, contiguous :: cornersInPlane(:) => null() integer, pointer, contiguous :: badCornerList(:) => null() + integer, pointer, contiguous :: interfaceList(:) => null() integer, pointer, contiguous :: hplane1(:) => null() integer, pointer, contiguous :: hplane2(:) => null() integer, pointer, contiguous :: ndone(:) => null() @@ -233,6 +238,7 @@ subroutine AngleSet_ctor(self, & self% nPolarAngles = QuadPtr% nPolarAngles self% angle0 = angle0 self% Order = QuadPtr% Order + self% maxInterface = 1 self% GTASet = GTASet ! Allocate Memory @@ -369,6 +375,10 @@ subroutine AngleSet_ctor(self, & self% numCycles(:) = 0 self% cycleOffSet(:) = 0 self% nHyperPlanes(:) = 0 + + do n=1,NumAngles + self% HypPlanePtr(n)% interfaceLen = 0 + enddo endif @@ -453,10 +463,13 @@ end subroutine AngleSet_dtor ! construct HyperPlane !======================================================================= subroutine AngleSet_ctorHypPlane(self, angle, nHyperPlanes, meshCycles, & - nDomains, zonesInPlane, cycleList) + nHyperDomains, elementsInPlane, & + CToHypPlane, cycleList) use Size_mod + use Geometry_mod use MemoryAllocator_mod + use Options_mod implicit none @@ -465,95 +478,267 @@ subroutine AngleSet_ctorHypPlane(self, angle, nHyperPlanes, meshCycles, & integer, intent(in) :: angle integer, intent(in) :: nHyperPlanes integer, intent(in) :: meshCycles - integer, intent(in) :: nDomains - integer, intent(in) :: zonesInPlane(nHyperPlanes) + integer, intent(in) :: nHyperDomains + + integer, intent(in) :: elementsInPlane(nHyperPlanes) + integer, intent(in) :: CToHypPlane(Size% ncornr) integer, intent(in) :: cycleList(meshCycles) ! Local type(HypPlane), pointer :: HypPlanePtr + integer :: sweepVersion integer :: hPlane + integer :: hPlane1 + integer :: hPlane2 integer :: mCycle - integer :: zoneSum - integer :: zonesPerDomain - integer :: zoneTarget - integer :: planesPerDomain - integer :: planeTarget + integer :: elementSum + integer :: elementsPerDomain + integer :: elementTarget integer :: domID + integer :: numC + integer :: c + integer :: c0 + integer :: cfp + integer :: cez + integer :: cface + integer :: nCorner + integer :: nCFaces + integer :: nCFacesEZ + integer :: i + integer :: ii + integer :: ifp + integer :: zone + integer :: nzones + integer :: ndone + integer :: hp + real(adqt) :: afp + real(adqt) :: aez + + integer, allocatable :: cornerList(:) + logical (kind=1), allocatable :: done(:) logical(kind=1) :: notDone ! Allocate Memory - planesPerDomain = nHyperPlanes/nDomains - zonesPerDomain = Size% nzones/nDomains + if ( self% GTASet ) then + sweepVersion = 0 + else + sweepVersion = Options% getSweepVersion() + endif - HypPlanePtr => self% HypPlanePtr(angle) + HypPlanePtr => self% HypPlanePtr(angle) + + if ( sweepVersion == 0 ) then + call Allocator%allocate(Size%usePinnedMemory, self%label, "zonesInPlane", HypPlanePtr% zonesInPlane, nHyperPlanes) - call Allocator%allocate(Size%usePinnedMemory,self%label, "zonesInPlane", HypPlanePtr% zonesInPlane, nHyperPlanes) + elementsPerDomain = Size% nzones/nHyperDomains + HypPlanePtr% zonesInPlane(:) = elementsInPlane(:) + HypPlanePtr% maxZones = maxval( elementsInPlane(1:nHyperPlanes) ) + else + call Allocator%allocate(Size%usePinnedMemory, self%label, "cornersInPlane", HypPlanePtr% cornersInPlane, nHyperPlanes) + + elementsPerDomain = Size% ncornr/nHyperDomains + HypPlanePtr% cornersInPlane(:) = elementsInPlane(:) + HypPlanePtr% maxCorners = maxval( elementsInPlane(1:nHyperPlanes) ) + endif + + call Allocator%allocate(Size%usePinnedMemory, self%label, "hplane1", HypPlanePtr% hplane1, nHyperDomains+1) + call Allocator%allocate(Size%usePinnedMemory, self%label, "hplane2", HypPlanePtr% hplane2, nHyperDomains) + call Allocator%allocate(Size%usePinnedMemory, self%label, "ndone", HypPlanePtr% ndone, nHyperDomains+1) allocate( HypPlanePtr% badCornerList(meshCycles+1) ) - allocate( HypPlanePtr% hplane1(nDomains+1) ) - allocate( HypPlanePtr% hplane2(nDomains) ) - allocate( HypPlanePtr% ndone(nDomains+1) ) + allocate( cornerList(Size% ncornr) ) + allocate( done(Size% ncornr) ) domID = 1 - zoneSum = 0 - planeTarget = planesPerDomain - zoneTarget = zonesPerDomain + elementSum = 0 + elementTarget = elementsPerDomain notDone = .TRUE. HypPlanePtr% ndone(1) = 0 - if (Size% useGPU) then +! If nHyperDomains=1 there is no interface list, but it is given a length of 1 +! because it is mapped. In this case, it uses cornerList(1) which may be +! unset. We set it here. - do hPlane=1,nHyperPlanes - HypPlanePtr% zonesInPlane(hPlane) = zonesInPlane(hPlane) - zoneSum = zoneSum + zonesInPlane(hPlane) + cornerList(1) = 1 - if (hPlane >= planeTarget .and. notDone) then - HypPlanePtr% ndone(domID+1) = zoneSum - HypPlanePtr% hplane1(domID+1) = hPlane + 1 - HypPlanePtr% hplane2(domID) = hPlane - planeTarget = planeTarget + planesPerDomain - domID = domID + 1 + do hPlane=1,nHyperPlanes + elementSum = elementSum + elementsInPlane(hPlane) - if (domID == nDomains) then - notDone = .False. - endif + if (elementSum > elementTarget .and. notDone) then + + HypPlanePtr% ndone(domID+1) = elementSum + HypPlanePtr% hplane1(domID+1) = hPlane + 1 + HypPlanePtr% hplane2(domID) = hPlane + elementTarget = elementTarget + elementsPerDomain + domID = domID + 1 + + if (domID == nHyperDomains) then + notDone = .False. endif - enddo + endif + enddo - else + HypPlanePtr% hplane1(1) = 1 + HypPlanePtr% hplane2(nHyperDomains) = nHyperPlanes + +! We need to store the upstream corner fluxes at the hyperdomain +! boundaries so compute that size and corner list here, Note +! that we do not count the first domain as there are no +! upstream hyperdomains + + numC = 0 + done(:) = .FALSE. + + do domID=2,nHyperDomains + hPlane1 = HypPlanePtr% hplane1(domID) + hPlane2 = HypPlanePtr% hplane2(domID) + ndone = HypPlanePtr% ndone(domID) + + HyperPlaneLoop: do hPlane=hPlane1,hPlane2 + + if ( sweepVersion == 0 ) then + + nzones = elementsInPlane(hPlane) + + ZoneLoop: do ii=1,nzones + zone = iabs( self% nextZ(ndone+ii,angle) ) + c0 = Geom% cOffSet(zone) + nCorner = Geom% numCorner(zone) + + CornerLoop: do c=1,nCorner + nCFaces = Geom% nCFacesArray(c0+c) + + CornerFaceLoop: do cface=1,nCFaces + cfp = Geom% cFP(cface,c0+c) + +! Eliminate entries outside the mesh (cfp > ncornr) + if ( cfp <= Size% ncornr ) then + + hp = CToHypPlane(cfp) + +! Consider all corners in hyperplanes upstream from hplane1 + if (hp < hplane1) then + + afp = DOT_PRODUCT( self% omega(:,angle),Geom% A_fp(:,cface,c0+c) ) - do hPlane=1,nHyperPlanes - HypPlanePtr% zonesInPlane(hPlane) = zonesInPlane(hPlane) - zoneSum = zoneSum + zonesInPlane(hPlane) + if ( afp < zero ) then + if ( .not. done(cfp) ) then + numC = numC + 1 + cornerList(numC) = cfp + done(cfp) = .TRUE. + endif + endif + endif + endif + enddo CornerFaceLoop + enddo CornerLoop + enddo ZoneLoop - if (zoneSum > zoneTarget .and. notDone) then - HypPlanePtr% ndone(domID+1) = zoneSum - HypPlanePtr% hplane1(domID+1) = hPlane + 1 - HypPlanePtr% hplane2(domID) = hPlane - zoneTarget = zoneTarget + zonesPerDomain - domID = domID + 1 + ndone = ndone + nzones + + else + + nCorner = elementsInPlane(hPlane) + + CornerLoop1: do ii=1,nCorner + c = self% nextC(ndone+ii,angle) + zone = Geom% CToZone(c) + c0 = Geom% cOffSet(zone) + nCFaces = Geom% nCFacesArray(c) + + CornerFaceLoop1: do cface=1,nCFaces + cfp = Geom% cFP(cface,c) + +! Eliminate entries outside the mesh (cfp > ncornr) + if ( cfp <= Size% ncornr ) then + + hp = CToHypPlane(cfp) + +! Consider all corners in hyperplanes upstream from hplane1 + if (hp < hplane1) then + + afp = DOT_PRODUCT( self% omega(:,angle),Geom% A_fp(:,cface,c) ) + + if ( afp < zero ) then + if ( .not. done(cfp) ) then + numC = numC + 1 + cornerList(numC) = cfp + done(cfp) = .TRUE. + endif + endif + endif + endif + +! For the corner sweep we also add corners in the same zone + cez = c0 + Geom% cEZ(cface,c) + hp = CToHypPlane(cez) + +! Consider all corners in hyperplanes upstream from hplane1 + if (hp < hplane1) then + aez = DOT_PRODUCT( self% omega(:,angle),Geom% A_ez(:,cface,c) ) + + if ( aez < zero ) then + if ( .not. done(cez) ) then + numC = numC + 1 + cornerList(numC) = cez + done(cez) = .TRUE. + endif + + nCFacesEZ = Geom% nCFacesArray(cez) + + do i=1,nCFacesEZ + cfp = Geom% cFP(i,cez) + + if ( cfp <= Size% ncornr ) then + + if (CToHypPlane(cfp) < hplane1) then + + afp = DOT_PRODUCT( self% omega(:,angle),Geom% A_fp(:,i,cez) ) + + if ( afp < zero ) then + if ( .not. done(cfp) ) then + numC = numC + 1 + cornerList(numC) = cfp + done(cfp) = .TRUE. + endif + endif + endif + + endif + enddo + + endif + endif + enddo CornerFaceLoop1 + enddo CornerLoop1 + + ndone = ndone + nCorner - if (domID == nDomains) then - notDone = .False. - endif endif - enddo - endif + enddo HyperPlaneLoop + enddo - HypPlanePtr% hplane1(1) = 1 - HypPlanePtr% hplane2(nDomains) = nHyperPlanes +! Use the maximum to avoid mapping a zero-length array when nHyperDomains=1 + HypPlanePtr% interfaceLen = numC + numC = max( numC,1 ) + call Allocator%allocate(Size%usePinnedMemory, self%label, "interfaceList", HypPlanePtr% interfaceList, numC) + + do ii=1,numC + HypPlanePtr% interfaceList(ii) = cornerList(ii) + enddo do mCycle=1,meshCycles HypPlanePtr% badCornerList(mCycle) = cycleList(mCycle) enddo - HypPlanePtr% maxZones = maxval( zonesInPlane(1:nHyperPlanes) ) + deallocate( cornerList ) + deallocate( done ) return @@ -608,13 +793,14 @@ end function AngleSet_getZonesInPlane !======================================================================= ! destruct HyperPlane !======================================================================= - subroutine AngleSet_dtorHypPlane(self) + subroutine AngleSet_dtorHypPlane(self, sweepVersion) use MemoryAllocator_mod implicit none ! Passed variables type(AngleSet), intent(inout) :: self + integer, intent(in) :: sweepVersion ! Local type(HypPlane), pointer :: HypPlanePtr @@ -628,12 +814,18 @@ subroutine AngleSet_dtorHypPlane(self) HypPlanePtr => self% HypPlanePtr(angle) - call Allocator%deallocate(Size%usePinnedMemory,self%label,"zonesInPlane",HypPlanePtr% zonesInPlane) + if ( sweepVersion == 0 ) then + call Allocator%deallocate(Size%usePinnedMemory, self%label, "zonesInPlane", HypPlanePtr% zonesInPlane) + else + call Allocator%deallocate(Size%usePinnedMemory, self%label, "cornersInPlane", HypPlanePtr% cornersInPlane) + endif + + call Allocator%deallocate(Size%usePinnedMemory, self%label, "hplane1", HypPlanePtr% hplane1) + call Allocator%deallocate(Size%usePinnedMemory, self%label, "hplane2", HypPlanePtr% hplane2) + call Allocator%deallocate(Size%usePinnedMemory, self%label, "ndone", HypPlanePtr% ndone) + call Allocator%deallocate(Size%usePinnedMemory, self%label, "interfaceList", HypPlanePtr% interfaceList) deallocate( HypPlanePtr% badCornerList ) - deallocate( HypPlanePtr% hplane1 ) - deallocate( HypPlanePtr% hplane2 ) - deallocate( HypPlanePtr% ndone ) endif enddo diff --git a/src/teton/mods/CommSet_mod.F90 b/src/teton/mods/CommSet_mod.F90 index 6b37fed..ba1dafe 100644 --- a/src/teton/mods/CommSet_mod.F90 +++ b/src/teton/mods/CommSet_mod.F90 @@ -56,7 +56,7 @@ module CommSet_mod type(AngleSet), pointer :: AngleSetPtr => null() type(Communicator), pointer :: CommPtr(:,:) => null() - type(CommunicateFlux), pointer :: CommFluxPtr(:) => null() + type(CommunicateFlux), pointer :: CommFluxPtr(:) => null() end type CommSet @@ -209,6 +209,8 @@ subroutine CommSet_dtor(self) type(CommSet), intent(inout) :: self + integer :: ierr + if (Size% ncomm > 0) then deallocate( self% CommPtr ) @@ -216,6 +218,9 @@ subroutine CommSet_dtor(self) deallocate( self% RecvOrder ) deallocate( self% RecvOrder0 ) +! ERROR: Actually calling request free here causes MPI errors, but only durring shutdown +! call MPI_Request_free(self% request(1), ierr) +! call MPI_Request_free(self% request(2), ierr) deallocate( self% request ) deallocate( self% NetFlux ) @@ -225,14 +230,14 @@ subroutine CommSet_dtor(self) deallocate( self% IncFlux ) deallocate( self% IncFluxOld ) deallocate( self% relError ) + deallocate( self% Converged ) deallocate( self% NangBinList ) deallocate( self% AngleToBin ) deallocate( self% AngleOrder ) deallocate( self% AngleOrder0 ) - deallocate( self% Converged ) - + call MPI_COMM_FREE(self% COMM_GROUP, ierr) return diff --git a/src/teton/mods/Communicator_mod.F90 b/src/teton/mods/Communicator_mod.F90 index 81d7dde..9a2c4dd 100644 --- a/src/teton/mods/Communicator_mod.F90 +++ b/src/teton/mods/Communicator_mod.F90 @@ -27,15 +27,15 @@ module Communicator_mod type, public :: Communicator - integer :: nSend ! number of boundary elements to send - integer :: nRecv ! number of boundary elements to receive + integer :: nSend ! number of boundary elements to send + integer :: nRecv ! number of boundary elements to receive - integer, pointer, contiguous :: ListRecv(:) ! ListRecv(nRecv) - integer, pointer, contiguous :: ListSend(:,:) ! ListSend(2,nSend) - integer :: irequest(2) ! irequest(2) + integer, pointer, contiguous :: ListRecv(:) => null() ! ListRecv(nRecv) + integer, pointer, contiguous :: ListSend(:,:) => null() ! ListSend(2,nSend) + integer :: irequest(2) ! irequest(2) - real(adqt), pointer, contiguous :: psibsend(:,:) ! psibsend(Groups,nSend) - send buffer - real(adqt), pointer, contiguous :: psibrecv(:,:) ! psibrecv(Groups,nRecv) - receive buffer + real(adqt), pointer, contiguous :: psibsend(:,:) => null() ! psibsend(Groups,nSend) - send buffer + real(adqt), pointer, contiguous :: psibrecv(:,:) => null() ! psibrecv(Groups,nRecv) - receive buffer end type Communicator @@ -66,28 +66,28 @@ module Communicator_mod subroutine Communicator_ctor_buffer(self, nSend, nRecv, Groups, & nGroupSets) + use MemoryAllocator_mod, only : Allocator + use Size_mod, only : Size implicit none ! Passed variables - type(Communicator), intent(inout) :: self integer, intent(in) :: nSend integer, intent(in) :: nRecv integer, intent(in) :: Groups integer, intent(in) :: nGroupSets - self% nSend = nSend self% nRecv = nRecv if (self% nSend > 0) then allocate( self% ListSend(2,nSend) ) - allocate( self% psibsend(Groups,nSend*nGroupSets) ) + call Allocator%allocate(Allocator%use_for_comm_data,"Communicator","psibsend", self% psibsend, Groups, nSend*nGroupSets) endif if (self% nRecv > 0) then allocate( self% ListRecv(nRecv) ) - allocate( self% psibrecv(Groups,nRecv*nGroupSets) ) + call Allocator%allocate(Allocator%use_for_comm_data,"Communicator","psibrecv", self% psibrecv, Groups, nRecv*nGroupSets) endif return @@ -102,6 +102,8 @@ subroutine Communicator_dtor(self) use mpi_param_mod use mpif90_mod + use MemoryAllocator_mod, only : Allocator + use Size_mod, only : Size implicit none @@ -119,16 +121,17 @@ subroutine Communicator_dtor(self) call MPI_Request_Free(self%irequest(1), ierr) deallocate( self% ListSend ) - deallocate( self% psibsend ) + call Allocator%deallocate(Allocator%use_for_comm_data,"Communicator","psibsend", self% psibsend) + endif if (self% nRecv > 0) then call MPI_Request_Free(self%irequest(2), ierr) deallocate( self% ListRecv ) - deallocate( self% psibrecv ) + call Allocator%deallocate(Allocator%use_for_comm_data,"Communicator","psibrecv", self% psibrecv) + endif - return diff --git a/src/teton/mods/Datastore_mod.F90 b/src/teton/mods/Datastore_mod.F90 index 6b51b66..5620753 100644 --- a/src/teton/mods/Datastore_mod.F90 +++ b/src/teton/mods/Datastore_mod.F90 @@ -33,6 +33,7 @@ module Datastore_mod contains procedure :: save_hdf5 procedure :: initialize + procedure :: partitioning end type datastore_type @@ -79,6 +80,33 @@ end subroutine c_conduit_relay_io_save end subroutine +!======================================================================= +! Get whether partitioning is enabled. +!======================================================================= + function partitioning(self) result(res) + logical res + class(datastore_type) :: self + integer value + character(len=80) :: str + + res = .FALSE. + ! Get the value from the options. + if (theDatastore%root%has_path("options/partitioning")) then + value = theDatastore%root%fetch_path_as_int32("options/partitioning") + if (value .ne. 0) then + res = .TRUE. + endif + endif + ! Get the value from the environment + call getenv("TETON_PARTITION", str) + if (str .ne. " ") then + value = 0 + read (str,*) value + if (value .ne. 0) then + res = .TRUE. + endif + endif + end function !======================================================================= ! Get C pointer to conduit node diff --git a/src/teton/mods/GreyAcceleration_mod.F90 b/src/teton/mods/GreyAcceleration_mod.F90 index 80de755..9288046 100644 --- a/src/teton/mods/GreyAcceleration_mod.F90 +++ b/src/teton/mods/GreyAcceleration_mod.F90 @@ -13,7 +13,7 @@ module GreyAcceleration_mod type, public :: GreyAcceleration integer :: ID - integer :: nGreySweepIters + integer :: nGreySISweeps real(adqt) :: epsGrey ! Both GTA solvers @@ -24,15 +24,11 @@ module GreyAcceleration_mod real(adqt), pointer, contiguous :: Chi(:,:) => null() ! Chi(ngr,ncornr) real(adqt), pointer, contiguous :: TsaSource(:) => null() ! TsaSource(ncornr) - real(adqt), pointer, contiguous :: CGDirectionB(:,:) => null() ! CGDirectionB(nbelem,nangGTA) - real(adqt), pointer, contiguous :: CGResidualB(:,:) => null() ! CGResidualB(nbelem,nangGTA) - real(adqt), pointer, contiguous :: CGActionB(:,:) => null() ! CGActionB(nbelem,nangGTA) - real(adqt), pointer, contiguous :: CGActionSB(:,:) => null() ! CGActionSB(nbelem,nangGTA) - ! New GTA Solver only real(adqt), pointer, contiguous :: GreySigTotal(:) => null() real(adqt), pointer, contiguous :: GreySigtInv(:) => null() real(adqt), pointer, contiguous :: PhiInc(:) => null() + real(adqt), pointer, contiguous :: Sscat(:) => null() real(adqt), pointer, contiguous :: Q(:) => null() real(adqt), pointer, contiguous :: TT(:,:) => null() real(adqt), pointer, contiguous :: Pvv(:,:) => null() @@ -102,7 +98,8 @@ subroutine GreyAcceleration_ctor(self) ! Initialize a minimum convergence tolerance for the grey iteration self% ID = 1 - self% nGreySweepIters = 2 +! Number of Grey source iteration sweeps made before starting BiCG + self% nGreySISweeps = 5 self% epsGrey = 0.1_adqt if (Size% ndim == 2) then @@ -142,11 +139,6 @@ subroutine GreyAcceleration_ctor(self) call Allocator%allocate(Size%usePinnedMemory,self%label,"TsaSource", self% TsaSource, Size%ncornr) call Allocator%allocate(Size%usePinnedMemory,self%label,"GreySigScatVol", self% GreySigScatVol, Size%ncornr ) - allocate( self% CGDirectionB(Size%nbelem,Size%nangGTA) ) - allocate( self% CGResidualB(Size%nbelem,Size%nangGTA) ) - allocate( self% CGActionB(Size%nbelem,Size%nangGTA) ) - allocate( self% CGActionSB(Size%nbelem,Size%nangGTA) ) - if (Size% useNewGTASolver) then ! New GTA Solver only @@ -154,6 +146,7 @@ subroutine GreyAcceleration_ctor(self) call Allocator%allocate(Size%usePinnedMemory,self%label,"GreySigTotal", self% GreySigTotal, Size%ncornr) call Allocator%allocate(Size%usePinnedMemory,self%label,"GreySigtInv", self% GreySigtInv, Size%ncornr) call Allocator%allocate(Size%usePinnedMemory,self%label,"PhiInc", self% PhiInc, Size%ncornr) + call Allocator%allocate(Size%usePinnedMemory,self%label,"Sscat", self% Sscat, Size%ncornr) call Allocator%allocate(Size%usePinnedMemory,self%label,"Q", self% Q, Size%ncornr) call Allocator%allocate(Size%usePinnedMemory,self%label,"TT", self% TT, Size%maxCorner,Size%ncornr) call Allocator%allocate(Size%usePinnedMemory,self%label,"Pvv", self% Pvv, Size%maxCorner,Size%ncornr) @@ -221,11 +214,6 @@ subroutine GreyAcceleration_dtor(self) call Allocator%deallocate(Size%usePinnedMemory,self%label,"GreySigScatVol", self% GreySigScatVol) call Allocator%deallocate(Size%usePinnedMemory,self%label,"TsaSource", self% TsaSource) - deallocate( self% CGDirectionB ) - deallocate( self% CGResidualB ) - deallocate( self% CGActionB ) - deallocate( self% CGActionSB ) - if (Size% useNewGTASolver) then ! New GTA Solver only @@ -233,6 +221,7 @@ subroutine GreyAcceleration_dtor(self) call Allocator%deallocate(Size%usePinnedMemory,self%label,"GreySigTotal", self% GreySigTotal) call Allocator%deallocate(Size%usePinnedMemory,self%label,"GreySigtInv", self% GreySigtInv) call Allocator%deallocate(Size%usePinnedMemory,self%label,"PhiInc", self% PhiInc) + call Allocator%deallocate(Size%usePinnedMemory,self%label,"Sscat", self% Sscat) call Allocator%deallocate(Size%usePinnedMemory,self%label,"Q", self% Q) call Allocator%deallocate(Size%usePinnedMemory,self%label,"TT", self% TT) call Allocator%deallocate(Size%usePinnedMemory,self%label,"Pvv", self% Pvv) diff --git a/src/teton/mods/MemoryAllocator_mod.F90 b/src/teton/mods/MemoryAllocator_mod.F90 index 779338c..cc2d5ab 100644 --- a/src/teton/mods/MemoryAllocator_mod.F90 +++ b/src/teton/mods/MemoryAllocator_mod.F90 @@ -24,11 +24,14 @@ module MemoryAllocator_mod integer, public :: umpire_device_allocator_id = -1 logical, public :: host_allocator_present = .FALSE. logical, public :: device_allocator_present = .FALSE. + logical, public :: use_for_comm_data = .FALSE. contains - procedure, public :: construct => AllocatorType_construct - procedure, public :: destruct => AllocatorType_destruct + procedure, private :: isGPUAwareMPIEnabled + + procedure, public :: construct + procedure, public :: destruct procedure :: ar1 => allocate_host_real_c_double_1 procedure :: ar2 => allocate_host_real_c_double_2 @@ -55,12 +58,14 @@ module MemoryAllocator_mod type(AllocatorType), pointer, public :: Allocator => null() !----------------------------------------------------------------------------------------- -! AllocatorType_construct +! construct !----------------------------------------------------------------------------------------- contains - subroutine AllocatorType_construct(self, umpire_host_allocator_id, umpire_device_allocator_id) + subroutine construct(self, umpire_host_allocator_id, umpire_device_allocator_id) + use Options_mod, only : Options + class(AllocatorType), intent(inout) :: self integer :: umpire_host_allocator_id integer :: umpire_device_allocator_id @@ -72,6 +77,9 @@ subroutine AllocatorType_construct(self, umpire_host_allocator_id, umpire_device if (umpire_host_allocator_id >= 0) then self%umpire_host_allocator = umpire_resource_manager%get_allocator_by_id(umpire_host_allocator_id) +!#if defined(TETON_ENABLE_OPENMP) +! Need a way to verify this is a thread safe umpire allocator... +!#endif self%umpire_host_allocator_id = umpire_host_allocator_id self%host_allocator_present = .TRUE. endif @@ -81,18 +89,26 @@ subroutine AllocatorType_construct(self, umpire_host_allocator_id, umpire_device self%umpire_device_allocator_id = umpire_device_allocator_id self%device_allocator_present = .TRUE. endif + + self%use_for_comm_data = self%isGPUAwareMPIEnabled() + + if (self%use_for_comm_data) then + if ( Options%isRankVerbose() > 0 ) then + print *, "Teton memory allocator: Detected env variable MPICH_GPU_SUPPORT_ENABLED=1, allocating comm data using umpire." + endif + endif #endif - end subroutine AllocatorType_construct + end subroutine construct !----------------------------------------------------------------------------------------- ! AllocatorType_destruct !----------------------------------------------------------------------------------------- - subroutine AllocatorType_destruct(self) + subroutine destruct(self) class(AllocatorType), intent(inout) :: self - end subroutine AllocatorType_destruct + end subroutine destruct !----------------------------------------------------------------------------- ! Some notes on useful Umpire allocator functions. These all return @@ -127,6 +143,40 @@ end subroutine AllocatorType_destruct #define FTM_RANK 4 #include "MemoryAllocator_mod.F90.templates" +!*********************************************************************** +! isGPUAwareMPIEnabled - Return true if gpu aware mpi support is enabled. +! +! This functionality is for internal developer use only, for the El Cap EAS +! nodes and should not be exposed to users. +! +! The El Cap EAS nodes have an early MPI GPU-AWARE functionality +! available for testing. It currently requires setting the +! MPICH_GPU_SUPPORT_ENABLED env var to enable it. +! +! If Teton detects this env var, it needs to allocate its comm data +! using umpire. Otherwise, it should not allocate it via umpire. +! +! NOTE: Teton can not default to allocating the comm data using umpire. +! CPU based MPI functionality will crash if pointers allocated via hipMalloc +! are passed to it, despite the unified memory. If this is supported in +! the future, this function should be removed and Teton should always use +! the umpire allocator for comm data. +!*********************************************************************** + logical function isGPUAwareMPIEnabled(self) result(useUmpire) + class(AllocatorType), intent(inout) :: self + character(len=255) :: env_var_value + integer :: env_status + + call get_environment_variable("MPICH_GPU_SUPPORT_ENABLED", env_var_value, status=env_status) + if (env_status == 0 .AND. trim(env_var_value) == "1") then + useUmpire = .TRUE. + else + useUmpire = .FALSE. + endif + + return + end function + end module MemoryAllocator_mod !----------------------------------------------------------------------------- @@ -151,3 +201,17 @@ subroutine ConstructMemoryAllocator( umpire_host_allocator_id, & return end subroutine ConstructMemoryAllocator + + subroutine DestructMemoryAllocator( ) BIND(C,NAME="teton_destructmemoryallocator") + + use iso_c_binding + use MemoryAllocator_mod + implicit none + +! Destruct the Memory Allocator Module + call Allocator%destruct() + deallocate (Allocator) + + return + end subroutine DestructMemoryAllocator + diff --git a/src/teton/mods/MemoryAllocator_mod.F90.templates b/src/teton/mods/MemoryAllocator_mod.F90.templates index ee344e0..75de866 100644 --- a/src/teton/mods/MemoryAllocator_mod.F90.templates +++ b/src/teton/mods/MemoryAllocator_mod.F90.templates @@ -96,6 +96,8 @@ theShape = [ FTM_DIMS ] + TETON_VERIFY( SUM(theShape) > 0, "Attempt to allocate array "//parentName//"/"//varName//" with size 0. This is problematic with some OpenMP runtimes.") + !$omp critical TETON_VERIFY(.NOT. associated(varPtr), "MemoryAllocator: Unable to allocate memory for "//parentName//"/"//varName//", pointer is already associated!") nullify(varPtr) diff --git a/src/teton/mods/Options_mod.F90 b/src/teton/mods/Options_mod.F90 index e4a6660..0a356ab 100644 --- a/src/teton/mods/Options_mod.F90 +++ b/src/teton/mods/Options_mod.F90 @@ -21,6 +21,8 @@ module Options_mod procedure :: isRankZeroVerbose procedure :: setVerbose procedure :: getSweepVersion + procedure :: getSweepNumHyperDomains + procedure :: getGTANumHyperDomains end type options_type public :: setVerboseOldCVersion @@ -226,6 +228,8 @@ end subroutine setSweepVersion !*********************************************************************** ! getSweepVersion - Return the sweep implementation version to use ! from the input. +! +! The default sweep version is the zone sweep. !*********************************************************************** integer(kind=c_int) function getSweepVersion(self) result(sweepVersion) class(options_type) :: self @@ -241,4 +245,84 @@ integer(kind=c_int) function getSweepVersion(self) result(sweepVersion) return end function +!*********************************************************************** +! setSweepNumHyperDomains - Set the sweep number of hyper-domains +! from the input. +!*********************************************************************** + subroutine setSweepNumHyperDomains(hyperdomains) BIND(C,NAME="teton_setsweepnumhyperdomains") + integer(kind=C_INT), intent(in) :: hyperdomains + + ! 0 - Let Teton automatically set this value. (default) + ! >= 1 - Set to this number of hyper-domains. + + TETON_VERIFY(hyperdomains >= 0, "Setting sweep hyper-domains: the number of hyper-domains must be >= 0.") + + call theDatastore%initialize() + + call theDatastore%root%set_path("options/sweep/sn/numhyperdomains", hyperdomains) + + return + end subroutine setSweepNumHyperDomains + +!*********************************************************************** +! getSweepHyperDomains - Return the number of sweep hyper-domains to +! use. +!*********************************************************************** + integer(kind=c_int) function getSweepNumHyperDomains(self) result(hyperdomains) + class(options_type) :: self + logical*4 :: temp + + temp = theDatastore%root%has_path("options/sweep/sn/numhyperdomains") + if (.NOT. temp) then +! Default to 0, which indicates user did not provide a setting. Teton +! will set this to a non-zero value in ConstructPhaseSpaceSets.F90 + hyperdomains = 0 + else + hyperdomains = theDataStore%root%fetch_path_as_int32("options/sweep/sn/numhyperdomains") + TETON_VERIFY(hyperdomains >= 0, "Getting sweep hyper-domains: the number of hyper-domains must be >= 0.") + endif + + return + end function + +!*********************************************************************** +! setGTANumHyperDomains - Set the new GTA number of hyper-domains +! from the input. +!*********************************************************************** + subroutine setGTANumHyperDomains(hyperdomains) BIND(C,NAME="teton_setgtanumhyperdomains") + integer(kind=C_INT), intent(in) :: hyperdomains + + ! 0 - Let Teton automatically set this value. (default) + ! >= 1 - Set to this number of hyper-domains. + + TETON_VERIFY(hyperdomains >= 0, "Setting new GTA hyper-domains: the number of hyper-domains must be >= 0.") + + call theDatastore%initialize() + + call theDatastore%root%set_path("options/sweep/gta/numhyperdomains", hyperdomains) + + return + end subroutine setGTANumHyperDomains + +!*********************************************************************** +! getGTANumHyperDomains - Return the number of new GTA hyper-domains to +! use. +!*********************************************************************** + integer(kind=c_int) function getGTANumHyperDomains(self) result(hyperdomains) + class(options_type) :: self + logical*4 :: temp + + temp = theDatastore%root%has_path("options/sweep/gta/numhyperdomains") + if (.NOT. temp) then +! Default to 0, which indicates user did not provide a setting. Teton +! will set this to a non-zero value in ConstructPhaseSpaceSets.F90 + hyperdomains = 0 + else + hyperdomains = theDataStore%root%fetch_path_as_int32("options/sweep/gta/numhyperdomains") + TETON_VERIFY(hyperdomains >= 0, "Getting new GTA hyper-domains: the number of hyper-domains must be >= 0.") + endif + + return + end function + end module Options_mod diff --git a/src/teton/mods/QuadratureList_mod.F90 b/src/teton/mods/QuadratureList_mod.F90 index 90ac3e8..0c3bddc 100644 --- a/src/teton/mods/QuadratureList_mod.F90 +++ b/src/teton/mods/QuadratureList_mod.F90 @@ -12,6 +12,7 @@ module QuadratureList_mod use GroupSet_mod use CommSet_mod use Size_mod + use CodeChecks_mod private @@ -27,6 +28,8 @@ module QuadratureList_mod public getNumberOfGroupSets public getNumberOfCommSets public getNumberOfZoneSets + public getNumberOfHyperDomains + public getNumberOfHyperElements public constructSetPointers public getGTAQuadrature public getSNQuadrature @@ -59,7 +62,8 @@ module QuadratureList_mod ! At the point that the value of nSets is determined, we have not yet ! begun decomposing the SN problem into sets. - integer :: nHyperDomains + integer :: nHyperDomains(2) + integer :: nHyperElements(2) integer :: nAngleSets integer :: nGroupSets integer :: nCommSets @@ -119,6 +123,14 @@ module QuadratureList_mod module procedure QuadratureList_getNumberOfZoneSets end interface + interface getNumberOfHyperDomains + module procedure QuadratureList_getNumberOfHyperDomains + end interface + + interface getNumberOfHyperElements + module procedure QuadratureList_getNumberOfHyperElements + end interface + interface getGTAQuadrature module procedure QuadratureList_getGTAQuad end interface @@ -222,10 +234,10 @@ subroutine QuadratureList_ctor(self, nAnglesSn, nSetsMaster, nSets) integer :: nOmpMaxTeams - self% nSets = 1 - self% nGroupSets = 1 - self% nZoneSets = 1 - self% nHyperDomains = 1 + self% nSets = 1 + self% nGroupSets = 1 + self% nZoneSets = 1 + self% nHyperElements(:) = 0 #if defined(TETON_ENABLE_OPENMP) if (Size%useGPU) then @@ -380,6 +392,7 @@ function QuadratureList_getSetIDfromGroupAngle(self, group, angle) result(setID) integer :: setID + TETON_CHECK_BOUNDS2(self%SetIDList, group, angle) setID = self% SetIDList(group,angle) return @@ -529,6 +542,53 @@ function QuadratureList_getNumberOfZoneSets(self) result(nZoneSets) end function QuadratureList_getNumberOfZoneSets +!----------------------------------------------------------------------- + function QuadratureList_getNumberOfHyperDomains(self, ID) result(nHyperDomains) + +! Returns the number of hyper domains used for sweeps +! nHyperDomains number of hyper domains +! ID = 1 High-order angle set +! ID = 2 GTA angle set + +! variable declarations + implicit none + +! passed variables + type(QuadratureList), intent(in) :: self + integer :: ID + integer :: nHyperDomains + + TETON_ASSERT(ID <= 2, "Invalid ID for getNumberOfHyperDomains, must be 1 or 2") + nHyperDomains = self% nHyperDomains(ID) + + return + + end function QuadratureList_getNumberOfHyperDomains + +!----------------------------------------------------------------------- + function QuadratureList_getNumberOfHyperElements(self, ID) result(nHyperElements) + +! Returns the number of interface elements at hyper-domain +! boundaries used for sweeps +! nHyperElements number of interface elements +! ID = 1 High-order angle set +! ID = 2 GTA angle set + +! variable declarations + implicit none + +! passed variables + type(QuadratureList), intent(in) :: self + integer :: ID + integer :: nHyperElements + + TETON_ASSERT(ID <= 2, "Invalid ID for getNumberOfHyperElements, must be 1 or 2") + nHyperElements = self% nHyperElements(ID) + + return + + end function QuadratureList_getNumberOfHyperElements + !----------------------------------------------------------------------- function QuadratureList_getQuad(self,QuadID) result(QuadPtr) @@ -545,6 +605,7 @@ function QuadratureList_getQuad(self,QuadID) result(QuadPtr) type(Quadrature), pointer :: QuadPtr + TETON_CHECK_BOUNDS1(self%QuadPtr, QuadID) QuadPtr => self% QuadPtr(QuadID) return @@ -606,6 +667,7 @@ function QuadratureList_getSetData(self,setID) result(SetDataPtr) type(SetData), pointer :: SetDataPtr + TETON_CHECK_BOUNDS1(self%SetDataPtr, setID) SetDataPtr => self% SetDataPtr(setID) return @@ -628,6 +690,7 @@ function QuadratureList_getAngleSetData(self,angleSetID) result(AngSetPtr) type(AngleSet), pointer :: AngSetPtr + TETON_CHECK_BOUNDS1(self%AngSetPtr, angleSetID) AngSetPtr => self% AngSetPtr(angleSetID) return @@ -650,6 +713,7 @@ function QuadratureList_getGroupSetData(self,groupSetID) result(GrpSetPtr) type(GroupSet), pointer :: GrpSetPtr + TETON_CHECK_BOUNDS1(self%GrpSetPtr, groupSetID) GrpSetPtr => self% GrpSetPtr(groupSetID) return @@ -672,6 +736,7 @@ function QuadratureList_getCommSetData(self,commSetID) result(CommSetPtr) type(CommSet), pointer :: CommSetPtr + TETON_CHECK_BOUNDS1(self%CommSetPtr, commSetID) CommSetPtr => self% CommSetPtr(commSetID) return @@ -695,9 +760,11 @@ function QuadratureList_getAngleSetFromSetID(self,setID) result(AngSetPtr) integer :: localID + TETON_CHECK_BOUNDS1(self%angleID, setID) localID = self% angleID(setID) + TETON_CHECK_BOUNDS1(self%AngSetPtr, localID) AngSetPtr => self% AngSetPtr(localID) return @@ -721,9 +788,11 @@ function QuadratureList_getGroupSetFromSetID(self,setID) result(GrpSetPtr) integer :: localID + TETON_CHECK_BOUNDS1(self%groupID, setID) localID = self% groupID(setID) + TETON_CHECK_BOUNDS1(self%GrpSetPtr, localID) GrpSetPtr => self% GrpSetPtr(localID) return @@ -747,9 +816,11 @@ function QuadratureList_getCommSetFromSetID(self,setID) result(CommSetPtr) integer :: localID + TETON_CHECK_BOUNDS1(self%commID, setID) localID = self% commID(setID) + TETON_CHECK_BOUNDS1(self%CommSetPtr, localID) CommSetPtr => self% CommSetPtr(localID) return @@ -772,6 +843,7 @@ function QuadratureList_getGTASetData(self,setID) result(SetDataPtr) type(SetData), pointer :: SetDataPtr + TETON_CHECK_BOUNDS1(self%SetDataPtr, self% nSets + setID) SetDataPtr => self% SetDataPtr(self% nSets + setID) return @@ -795,6 +867,7 @@ function QuadratureList_getNumberOfGroups(self,setID) result(Groups) type(SetData), pointer :: SetDataPtr + TETON_CHECK_BOUNDS1(self%SetDataPtr, setID) SetDataPtr => self% SetDataPtr(setID) Groups = SetDataPtr% Groups @@ -819,6 +892,7 @@ function QuadratureList_getNumberOfAngles(self,setID) result(NumAngles) type(SetData), pointer :: SetDataPtr + TETON_CHECK_BOUNDS1(self%SetDataPtr, setID) SetDataPtr => self% SetDataPtr(setID) NumAngles = SetDataPtr% NumAngles @@ -845,6 +919,7 @@ function QuadratureList_getEnergyGroups(self,numGroups) result(GrpBnds) real(adqt) :: GrpBnds(numGroups+1) QuadPtr => self% QuadPtr(1) + TETON_CHECK_BOUNDS1(QuadPtr%Gnu, numGroups+1) do g=1,numGroups+1 GrpBnds(g) = QuadPtr% Gnu(g) enddo @@ -874,6 +949,7 @@ function QuadratureList_getGroupAverageEnergy(self,numGroups) result(gnuBar) real(adqt) :: gnuBar(0:numGroups+1) QuadPtr => self% QuadPtr(1) + TETON_CHECK_BOUNDS1(QuadPtr%gnuBar, numGroups) do g=1,numGroups gnuBar(g) = QuadPtr% gnuBar(g) enddo @@ -901,6 +977,7 @@ subroutine QuadratureList_setCounters(self) type(CommSet), pointer :: CommSetPtr integer :: setID + TETON_CHECK_BOUNDS1(self%CommSetPtr, self%nCommSets) do setID=1,self% nCommSets CommSetPtr => self% CommSetPtr(setID) CommSetPtr% fluxSweeps = 0 @@ -935,6 +1012,7 @@ function QuadratureList_getSweeps(self) result(nTotalSweeps) nSweeps = 0 totalAngles = 0 + TETON_CHECK_BOUNDS1(self%CommSetPtr, self%nCommSets) do setID=1,self% nCommSets CommSetPtr => self% CommSetPtr(setID) Groups = CommSetPtr% Groups diff --git a/src/teton/mods/RadIntensity_mod.F90 b/src/teton/mods/RadIntensity_mod.F90 index 92131c7..44cc1a8 100644 --- a/src/teton/mods/RadIntensity_mod.F90 +++ b/src/teton/mods/RadIntensity_mod.F90 @@ -80,9 +80,11 @@ subroutine RadIntensity_ctor(self) ! Multi-value fields are not support for visualization, however. num_elements_size_t = Size% nzones * Size% ngr - call theDatastore%root%set_path_external_float64_ptr("blueprint/fields/radiation_energy_density/values", self% RadEnergyDensity,num_elements_size_t ) - call theDatastore%root%set_path("blueprint/fields/radiation_energy_density/association", "element") - call theDatastore%root%set_path("blueprint/fields/radiation_energy_density/topology", "main") + if (.NOT. theDatastore%partitioning()) then + call theDatastore%root%set_path_external_float64_ptr("blueprint/fields/radiation_energy_density/values", self% RadEnergyDensity,num_elements_size_t ) + call theDatastore%root%set_path("blueprint/fields/radiation_energy_density/association", "element") + call theDatastore%root%set_path("blueprint/fields/radiation_energy_density/topology", "main") + endif ! Initialize @@ -120,7 +122,9 @@ subroutine RadIntensity_dtor(self) call Allocator%deallocate(.FALSE., self%label, "RadiationFlux", self% RadiationFlux) call Allocator%deallocate(.FALSE., self%label, "EddingtonTensorDiag", self% EddingtonTensorDiag) - call theDatastore%root%remove_path("blueprint/fields/radiation_energy_density") + if (.NOT. theDatastore%partitioning()) then + call theDatastore%root%remove_path("blueprint/fields/radiation_energy_density") + endif call Allocator%deallocate(.FALSE.,self%label,"RadEnergyDensity", self%RadEnergyDensity) diff --git a/src/teton/mods/SetData_mod.F90 b/src/teton/mods/SetData_mod.F90 index 3b35416..6b9be71 100644 --- a/src/teton/mods/SetData_mod.F90 +++ b/src/teton/mods/SetData_mod.F90 @@ -44,8 +44,7 @@ module SetData_mod real(C_DOUBLE), pointer, contiguous :: PsiB(:,:,:) => null() ! boundary intensities real(adqt), pointer, contiguous :: Phi(:,:) => null() real(adqt), pointer, contiguous :: cyclePsi(:,:) => null() - real(adqt), pointer, contiguous :: Q(:,:,:) => null() - real(adqt), pointer, contiguous :: S(:,:,:) => null() + real(adqt), pointer, contiguous :: PsiInt(:,:,:) => null() ! For GTA Sweeps @@ -56,6 +55,10 @@ module SetData_mod real(adqt), pointer, contiguous :: tInc(:) => null() real(adqt), pointer, contiguous :: src(:) => null() +! Pointers + + type(SweepSet), pointer :: SweepPtr(:) => null() + ! For Spectral Tallies real(adqt), pointer, contiguous :: RadPowerEscape(:) => null() @@ -72,26 +75,33 @@ module SetData_mod end type SetData + + type, public :: SweepSet + real(adqt), pointer, contiguous :: Q(:,:,:) => null() + real(adqt), pointer, contiguous :: S(:,:,:) => null() + end type SweepSet + contains !======================================================================= ! construct interface !======================================================================= - subroutine SetData_ctor(self, & - SetID, & - groupSetID, & - angleSetID, & - QuadID, & - Groups, & - NumAngles, & - g0, & - angle0, & - nZones, & - nCorner, & - QuadPtr, & - GTASet, & - fromRestart ) + subroutine SetData_ctor(self, & + SetID, & + groupSetID, & + angleSetID, & + QuadID, & + Groups, & + NumAngles, & + g0, & + angle0, & + nZones, & + nCorner, & + nHyperDomains, & + QuadPtr, & + GTASet, & + fromRestart ) use Size_mod use constant_mod @@ -114,6 +124,7 @@ subroutine SetData_ctor(self, & integer, intent(in) :: angle0 integer, intent(in) :: nZones integer, intent(in) :: nCorner + integer, intent(in) :: nHyperDomains type(Quadrature), target, intent(in) :: QuadPtr @@ -179,6 +190,8 @@ subroutine SetData_ctor(self, & endif endif + allocate( self% SweepPtr(nHyperDomains) ) + else call Allocator%allocate(Size%usePinnedMemory,self%label,"tPsi", self% tPsi, nCorner+Size% nbelem) @@ -277,6 +290,8 @@ subroutine SetData_dtor(self, GTASet) endif endif + deallocate( self% SweepPtr ) + else call Allocator%deallocate(Size%usePinnedMemory,self%label,"tPsi", self% tPsi ) @@ -310,15 +325,23 @@ end subroutine SetData_dtor !======================================================================= ! destruct Set dynamic memory at the end of the time step !======================================================================= - subroutine SetData_dtorDynMem(self) + subroutine SetData_dtorDynMem(self, nHyperDomains) use MemoryAllocator_mod + use Options_mod use Size_mod implicit none ! Passed variables class(SetData), intent(inout) :: self + integer, intent(in) :: nHyperDomains + + type(SweepSet), pointer :: Swp + integer :: dom + integer :: sweepVersion + + sweepVersion = Options% getSweepVersion() ! Release Memory @@ -327,8 +350,18 @@ subroutine SetData_dtorDynMem(self) ! These are only used by the GPU sweep if (Size% useGPU) then - call Allocator%deallocate(Size%usePinnedMemory, self%label, "Q", self% Q) - call Allocator%deallocate(Size%usePinnedMemory, self%label, "S", self% S) + + if ( sweepVersion == 0 ) then + + do dom=1,nHyperDomains + Swp => self% SweepPtr(dom) + call Allocator%deallocate(Size%usePinnedMemory, self%label, "Q", Swp% Q) + call Allocator%deallocate(Size%usePinnedMemory, self%label, "S", Swp% S) + enddo + + endif + + call Allocator%deallocate(Size%usePinnedMemory, self%label, "PsiInt", self% PsiInt) endif return diff --git a/src/teton/mods/Size_mod.F90 b/src/teton/mods/Size_mod.F90 index 86e94d0..83fc6ec 100644 --- a/src/teton/mods/Size_mod.F90 +++ b/src/teton/mods/Size_mod.F90 @@ -24,6 +24,7 @@ module Size_mod integer :: ncornr ! number of corners integer :: nSides integer :: nbelem ! number of boundary elements + integer :: nSurfElem ! boundary + interface elements integer :: maxcf ! maximum number of zone faces a corner touches integer :: maxCorner ! maximum number of corners in a zone integer :: maxFaces ! maximum number of zone-faces @@ -168,6 +169,7 @@ subroutine Size_ctor(self, myRankInGroup, nzones, ncornr, nSides, & self% ncornr = ncornr self% nSides = nSides self% nbelem = nbelem + self% nSurfElem = nbelem self% maxcf = maxcf self% maxCorner = maxCorner ! In 1D, 2D maxFaces = maxCorner; In 3D it's an overestimate diff --git a/src/teton/mods/cmake_defines_mod.F90.in b/src/teton/mods/cmake_defines_mod.F90.in index 16eb57c..456acec 100644 --- a/src/teton/mods/cmake_defines_mod.F90.in +++ b/src/teton/mods/cmake_defines_mod.F90.in @@ -13,5 +13,7 @@ implicit none character (len = *), parameter :: fortran_compiler="@CMAKE_Fortran_COMPILER@" integer, parameter :: omp_device_num_processors = @OMP_DEVICE_NUM_PROCESSORS@ integer, parameter :: omp_device_team_thread_limit = @OMP_DEVICE_TEAM_THREAD_LIMIT@ + integer, parameter :: min_groupset_size = @GSET_MIN_SIZE@ + integer, parameter :: max_num_hyperdomains = @MAX_NUM_HYPER_DOMAINS@ end module cmake_defines_mod diff --git a/src/teton/mods/constant_mod.F90 b/src/teton/mods/constant_mod.F90 index e8531fc..eca82de 100644 --- a/src/teton/mods/constant_mod.F90 +++ b/src/teton/mods/constant_mod.F90 @@ -46,8 +46,6 @@ module constant_mod real(adqt), parameter, public :: & adqtTiny = tiny(0.0_adqt), & adqtEpsilon = epsilon(0.0_adqt), & - longTiny = tiny(0.0_long), & - longEpsilon = epsilon(0.0_long), & adqtSmall = 1.e-150_adqt end module constant_mod diff --git a/src/teton/mods/iter_control_list_mod.F90 b/src/teton/mods/iter_control_list_mod.F90 index 314024a..160cc0f 100644 --- a/src/teton/mods/iter_control_list_mod.F90 +++ b/src/teton/mods/iter_control_list_mod.F90 @@ -118,9 +118,9 @@ subroutine iter_control_list_ctor(self) self % nIterCon = nIterCon ! assertions - tetonAssert(self%nIterCon > 0,"Invalid iter control list") - tetonAssert(self%nIterCon<=self%maxIterCon,"Invalid iter control list") - tetonAssert(associated(self%iControls),"Invalid iter control list") + TETON_ASSERT(self%nIterCon > 0,"Invalid iter control list") + TETON_ASSERT(self%nIterCon<=self%maxIterCon,"Invalid iter control list") + TETON_ASSERT(associated(self%iControls),"Invalid iter control list") return end subroutine iter_control_list_ctor @@ -182,9 +182,9 @@ subroutine iter_control_list_dtor(self) self % nIterCon = 0 ! assertions - tetonAssert(.not.allocated(self%names),"Invalid destruct") - tetonAssert(.not.associated(self%iControls),"Invalid destruct") - tetonAssert(self%nIterCon==0,"Invalid destruct") + TETON_ASSERT(.not.allocated(self%names),"Invalid destruct") + TETON_ASSERT(.not.associated(self%iControls),"Invalid destruct") + TETON_ASSERT(self%nIterCon==0,"Invalid destruct") return end subroutine iter_control_list_dtor @@ -208,8 +208,8 @@ function iter_control_list_get_nIterCon(self) result(nIterCon) nIterCon = self % nIterCon ! assertions - tetonAssert(nIterCon==self%nIterCon,"Improper data access") - tetonAssert(nIterCon>=0,"Improper data access") + TETON_ASSERT(nIterCon==self%nIterCon,"Improper data access") + TETON_ASSERT(nIterCon>=0,"Improper data access") return end function iter_control_list_get_nIterCon @@ -233,9 +233,9 @@ function iter_control_list_get_iCon(self,iteration) result(iControl) integer :: iIterCon ! assertions - tetonAssert(allocated(self%names),"Invalid iter control list") - tetonAssert(associated(self%iControls),"Invalid iter control list") - tetonAssert(any(iteration==self%names(:)),"Invalid iteration name") + TETON_ASSERT(allocated(self%names),"Invalid iter control list") + TETON_ASSERT(associated(self%iControls),"Invalid iter control list") + TETON_ASSERT(any(iteration==self%names(:)),"Invalid iteration name") iControl => self% iControls(self% maxIterCon) @@ -248,7 +248,7 @@ function iter_control_list_get_iCon(self,iteration) result(iControl) enddo ListLoop ! assertions - tetonAssert(associated(iControl),"Invalid iter control") + TETON_ASSERT(associated(iControl),"Invalid iter control") return end function iter_control_list_get_iCon diff --git a/src/teton/mods/iter_control_mod.F90 b/src/teton/mods/iter_control_mod.F90 index 3dfcdf0..ca45f92 100644 --- a/src/teton/mods/iter_control_mod.F90 +++ b/src/teton/mods/iter_control_mod.F90 @@ -236,15 +236,15 @@ subroutine iter_control_ctor(self,epsilonPoint,localError, & endif ! assertions - tetonAssert(self%epsilonPoint>zero,"Invalid iter control ctor") - tetonAssert(self%localError>=zero,"Invalid iter control ctor") - tetonAssert(self%globalError>=zero,"Invalid iter control ctor") - tetonAssert(self%maxIter>zero,"Invalid iter control ctor") - tetonAssert(self%globalMaxIterTaken>=zero,"Invalid iter control ctor") - tetonAssert(self%nIter>=zero,"Invalid iter control ctor") - tetonAssert(self%nTotIter>=zero,"Invalid iter control ctor") - tetonAssert(self%zoneOfMax>=zero,"Invalid iter control ctor") - tetonAssert(self%processOfMax>=zero,"Invalid iter control ctor") + TETON_ASSERT(self%epsilonPoint>zero,"Invalid iter control ctor") + TETON_ASSERT(self%localError>=zero,"Invalid iter control ctor") + TETON_ASSERT(self%globalError>=zero,"Invalid iter control ctor") + TETON_ASSERT(self%maxIter>zero,"Invalid iter control ctor") + TETON_ASSERT(self%globalMaxIterTaken>=zero,"Invalid iter control ctor") + TETON_ASSERT(self%nIter>=zero,"Invalid iter control ctor") + TETON_ASSERT(self%nTotIter>=zero,"Invalid iter control ctor") + TETON_ASSERT(self%zoneOfMax>=zero,"Invalid iter control ctor") + TETON_ASSERT(self%processOfMax>=zero,"Invalid iter control ctor") return end subroutine iter_control_ctor @@ -304,8 +304,8 @@ subroutine iter_control_reset_nIter(self) self % nTotIter = 0 ! assertions - tetonAssert(self%nIter==0,"Invalid iter control reset") - tetonAssert(self%nTotIter==0,"Invalid iter control reset") + TETON_ASSERT(self%nIter==0,"Invalid iter control reset") + TETON_ASSERT(self%nTotIter==0,"Invalid iter control reset") return end subroutine iter_control_reset_nIter @@ -327,15 +327,15 @@ subroutine iter_control_set_nIter(self, nIter) integer, intent(in) :: nIter ! assertions - tetonAssert(nIter>=0,"Invalid number of iterations") + TETON_ASSERT(nIter>=0,"Invalid number of iterations") ! reset the number of iterations self % nIter = nIter self % nTotIter = self % nTotIter + nIter ! assertions - tetonAssert(self%nIter>=0,"Invalid number of iterations") - tetonAssert(self%nTotIter>=0,"Invalid number of iterations") + TETON_ASSERT(self%nIter>=0,"Invalid number of iterations") + TETON_ASSERT(self%nTotIter>=0,"Invalid number of iterations") return end subroutine iter_control_set_nIter @@ -356,13 +356,13 @@ subroutine iter_control_set_maxIter(self, maxIter) integer, intent(in) :: maxIter ! assertions - tetonAssert(maxIter>=0, "Invalid maximum number of iterations") + TETON_ASSERT(maxIter>=0, "Invalid maximum number of iterations") ! reset the number of iterations self % maxIter = maxIter ! assertions - tetonAssert(self%maxIter>=0, "Invalid number of iterations") + TETON_ASSERT(self%maxIter>=0, "Invalid number of iterations") return end subroutine iter_control_set_maxIter @@ -383,13 +383,13 @@ subroutine iter_control_set_globalMaxIterTaken(self, maxTaken) integer, intent(in) :: maxTaken ! assertions - tetonAssert(maxTaken>=0, "Invalid maximum number of iterations taken") + TETON_ASSERT(maxTaken>=0, "Invalid maximum number of iterations taken") ! reset the number of iterations self % globalMaxIterTaken = maxTaken ! assertions - tetonAssert(self%globalMaxIterTaken>=0, "Invalid maximum number of iterations taken") + TETON_ASSERT(self%globalMaxIterTaken>=0, "Invalid maximum number of iterations taken") return end subroutine iter_control_set_globalMaxIterTaken @@ -411,7 +411,7 @@ subroutine iter_control_set_zoneOfMax(self, zoneOfMax) integer, intent(in) :: zoneOfMax ! assertions - tetonAssert(zoneOfMax>=0,"Invalid zone number") + TETON_ASSERT(zoneOfMax>=0,"Invalid zone number") ! reset the zone with maximum relative error self % zoneOfMax = zoneOfMax @@ -436,13 +436,13 @@ subroutine iter_control_set_processOfMax(self, processOfMax) integer, intent(in) :: processOfMax ! assertions - tetonAssert(processOfMax>=0,"Invalid process number") + TETON_ASSERT(processOfMax>=0,"Invalid process number") ! reset the process with maximum relative error self % processOfMax = processOfMax ! assertions - tetonAssert(self%processOfMax>=0,"Invalid process number") + TETON_ASSERT(self%processOfMax>=0,"Invalid process number") return end subroutine iter_control_set_processOfMax @@ -464,13 +464,13 @@ subroutine iter_control_set_localError(self, localError) real(adqt), intent(in) :: localError ! assertions - tetonAssert(localError>=0,"Invalid local error") + TETON_ASSERT(localError>=0,"Invalid local error") ! reset local maximum relative error self % localError = localError ! assertions - tetonAssert(self%localError>=0,"Invalid local error") + TETON_ASSERT(self%localError>=0,"Invalid local error") return end subroutine iter_control_set_localError @@ -492,13 +492,13 @@ subroutine iter_control_set_globalError(self, globalError) real(adqt), intent(in) :: globalError ! assertions - tetonAssert(globalError>=0,"Invalid global error") + TETON_ASSERT(globalError>=0,"Invalid global error") ! reset global maximum relative error self % globalError = globalError ! assertions - tetonAssert(self%globalError>=0,"Invalid global error") + TETON_ASSERT(self%globalError>=0,"Invalid global error") return end subroutine iter_control_set_globalError @@ -550,7 +550,7 @@ function iter_control_get_epsilonPoint(self) result(epsilonPoint) epsilonPoint = self % epsilonPoint ! assertions - tetonAssert(epsilonPoint==self%epsilonPoint,"Invalid data access") + TETON_ASSERT(epsilonPoint==self%epsilonPoint,"Invalid data access") return end function iter_control_get_epsilonPoint @@ -571,7 +571,7 @@ function iter_control_get_maxIter(self) result(maxIter) maxIter = self%maxIter ! assertions - tetonAssert(maxIter==self%maxIter,"Invalid data access") + TETON_ASSERT(maxIter==self%maxIter,"Invalid data access") return end function iter_control_get_maxIter @@ -591,7 +591,7 @@ function iter_control_get_globalMaxIterTaken(self) result(maxIter) maxIter = self%globalMaxIterTaken ! assertions - tetonAssert(maxIter==self%globalMaxIterTaken,"Invalid data access") + TETON_ASSERT(maxIter==self%globalMaxIterTaken,"Invalid data access") return end function iter_control_get_globalMaxIterTaken @@ -612,7 +612,7 @@ function iter_control_get_nIter(self) result(nIter) nIter = self%nIter ! assertions - tetonAssert(nIter==self%nIter,"Invalid data access") + TETON_ASSERT(nIter==self%nIter,"Invalid data access") return end function iter_control_get_nIter @@ -633,7 +633,7 @@ function iter_control_get_nTotIter(self) result(nTotIter) nTotIter = self%nTotIter ! assertions - tetonAssert(nTotIter==self%nTotIter,"Invalid data access") + TETON_ASSERT(nTotIter==self%nTotIter,"Invalid data access") return end function iter_control_get_nTotIter @@ -654,7 +654,7 @@ function iter_control_get_zoneOfMax(self) result(zoneOfMax) zoneOfMax = self%zoneOfMax ! assertions - tetonAssert(zoneOfMax==self%zoneOfMax,"Invalid data access") + TETON_ASSERT(zoneOfMax==self%zoneOfMax,"Invalid data access") return end function iter_control_get_zoneOfMax @@ -675,7 +675,7 @@ function iter_control_get_processOfMax(self) result(processOfMax) processOfMax = self%processOfMax ! assertions - tetonAssert(processOfMax==self%processOfMax,"Invalid data access") + TETON_ASSERT(processOfMax==self%processOfMax,"Invalid data access") return end function iter_control_get_processOfMax @@ -696,7 +696,7 @@ function iter_control_get_localError(self) result(localError) localError = self%localError ! assertions - tetonAssert(localError==self%localError,"Invalid data access") + TETON_ASSERT(localError==self%localError,"Invalid data access") return end function iter_control_get_localError @@ -717,7 +717,7 @@ function iter_control_get_globalError(self) result(globalError) globalError = self%globalError ! assertions - tetonAssert(globalError==self%globalError,"Invalid data access") + TETON_ASSERT(globalError==self%globalError,"Invalid data access") return end function iter_control_get_globalError diff --git a/src/teton/mods/kind_mod.F90 b/src/teton/mods/kind_mod.F90 index d6f6474..04f63f3 100644 --- a/src/teton/mods/kind_mod.F90 +++ b/src/teton/mods/kind_mod.F90 @@ -1,8 +1,9 @@ ! Kind Module: define the kind required for the requested precision; ! "adqt" is the default "adequate" precision. -! short: real kind for 6 digit accuracy -! long: real kind for 15 digit accuracy -! quadp: real kind for 33 digit accuracy +! +! Use the iso_c_binding C types in order to maximize compatibility +! with C, as we are sharing array pointers in conduit nodes. +! ! adqt: default ("adequate") precision module kind_mod @@ -10,10 +11,9 @@ module kind_mod private - integer, parameter, public :: short = selected_real_kind( 6, 37) - integer, parameter, public :: long = selected_real_kind(13, 307) - integer, parameter, public :: quadp = selected_real_kind(33,4931) -! integer, parameter, public :: adqt = selected_real_kind(13, 307) integer, parameter, public :: adqt = C_DOUBLE + integer, parameter, public :: float = C_FLOAT + integer, parameter, public :: double = C_DOUBLE + end module kind_mod diff --git a/src/teton/mods/system_info_mod.F90 b/src/teton/mods/system_info_mod.F90 index 979446b..0a3a4c7 100644 --- a/src/teton/mods/system_info_mod.F90 +++ b/src/teton/mods/system_info_mod.F90 @@ -61,8 +61,11 @@ subroutine printGPUMemInfo(rank) "MB, currently used: ", (gpu_total_bytes-gpu_free_bytes)/(2**20), "MB" #if defined(TETON_ENABLE_UMPIRE) + if (Allocator%umpire_host_allocator_id > -1) then + print *, "TETON UMPIRE allocator size: ", Allocator%umpire_host_allocator%get_current_size() / (2**20), "MB" + endif if (Allocator%umpire_device_allocator_id > -1) then - print *, "TETON UMPIRE device pool size: ", Allocator%umpire_device_allocator%get_current_size() / (2**20), "MB" + print *, "TETON UMPIRE device allocator size: ", Allocator%umpire_device_allocator%get_current_size() / (2**20), "MB" endif #endif diff --git a/src/teton/rt/CMakeLists.txt b/src/teton/rt/CMakeLists.txt index a9427bb..238072b 100644 --- a/src/teton/rt/CMakeLists.txt +++ b/src/teton/rt/CMakeLists.txt @@ -7,7 +7,7 @@ target_sources( teton PRIVATE GDASolver.F90 GTASolver.F90 GTASolver_OMPOL.F90 - GreySweep_OMPOL.F90 + GreySweep.F90 InitExchange.F90 LinearSolver.F90 RecvFlux.F90 @@ -25,6 +25,7 @@ target_sources( teton PRIVATE geometryUCBxyz.F90 getCollisionRate.F90 getCollisionRate_OMPOL.F90 + getDirectedGraph.F90 initFindExit.F90 initcomm.F90 pentalud.F90 @@ -38,7 +39,6 @@ target_sources( teton PRIVATE quadxyz.F90 rtmainsn.F90 rtgdac.F90 - rtorder.F90 rtquad.F90 scat_prod.F90 scat_prod1.F90 diff --git a/src/teton/rt/GTASolver.F90 b/src/teton/rt/GTASolver.F90 index 10c0c75..f64e511 100644 --- a/src/teton/rt/GTASolver.F90 +++ b/src/teton/rt/GTASolver.F90 @@ -34,6 +34,29 @@ ! * ! PHI(l+1) = PHI(l+1/2) + F(l+1) * ! * +! 20240305 - Comments from BCY: * +! I've added annotations marking where * +! each term of the BiCGSTAB algorithm on Wikipedia is computed. * +! The matrix system being solved looks something like * +! (I-Linv*S)*PHI = Linv*(GTA%GreySource) * +! where S is the grey scattering and Linv is the grey sweep * +! The action of the preconditioner is given by * +! K^{-1} = I+Linv_{eps}S_{eps} * +! where Linv_{eps} and S_{eps} are the stretched sweep and * +! scattering operators. * +! See the unpreconditioned BICGSTAB algorithm, +! https://en.wikipedia.org/wiki/Biconjugate_gradient_stabilized_method#Unpreconditioned_BiCGSTAB +! with A --> K^{-1}(I-Linv*S) and b --> K^{-1}b +! * +! Absent an extenral source, GreySweep applies the operator * +! I - (I+Linv_{eps}S_{eps})(I-Linv*S) * +! which is equivalent to * +! Linv S + Linv_{eps}S_{eps} Linv S - Linv_{eps} S_{eps} * +! See rt/GreySweep.F90 for more details. * +! * +! The variables ending in "B" seem to * +! be boundary information, though I'm not exactly sure. * +! * ! * ! Units: E/e/T/m/L/A/V/t - * ! energy/photon energy/temperature/mass/length/area/volume/time * @@ -107,6 +130,10 @@ subroutine GTASolver real(adqt), allocatable :: CGDirection(:) real(adqt), allocatable :: CGAction(:) real(adqt), allocatable :: CGActionS(:) + real(adqt), allocatable :: CGDirectionB(:,:) + real(adqt), allocatable :: CGResidualB(:,:) + real(adqt), allocatable :: CGActionB(:,:) + real(adqt), allocatable :: CGActionSB(:,:) ! Constants @@ -123,6 +150,10 @@ subroutine GTASolver allocate( CGDirection(Size% ncornr) ) allocate( CGAction(Size% ncornr) ) allocate( CGActionS(Size% ncornr) ) + allocate( CGDirectionB(Size% nbelem,Size% nangGTA) ) + allocate( CGResidualB(Size% nbelem,Size% nangGTA) ) + allocate( CGActionB(Size% nbelem,Size% nangGTA) ) + allocate( CGActionSB(Size% nbelem,Size% nangGTA) ) ! Initialize index of zone with maximum error: izRelErrPoint = -1 @@ -183,34 +214,39 @@ subroutine GTASolver ! residual GTA%GreyCorrection(:) = zero + ! x_0 = b pzOld(:) = zero CGResidual(:) = zero - GTA%CGResidualB(:,:) = zero + CGResidualB(:,:) = zero + ! \vec{\hat{r}}_0 = 1 ! Initialize the CG residual using an extraneous source - nGreyIter = 1 - withSource = .TRUE. - GTA% nGreySweepIters = 2 + nGreyIter = 1 + withSource = .TRUE. if (Size% useNewGTASolver) then - call GreySweepNEW(GTA%CGResidualB, CGResidual, withSource) + call GreySweepNEW(CGResidualB, CGResidual, withSource) else - call GreySweep(GTA%CGResidualB, CGResidual) + ! This computes CGResidual = \vec{r}_0 = \vec{\tilde{b}} = K_1^{-1} \vec{b} + ! i.e., the action of the preconditioner on the initial right hand side + ! We're starting with an initial guess of \vec{x}_0 = 0 + ! This is the only time that the GreySweep is called with a source. + call GreySweep(CGResidualB, CGResidual) endif ! Initialize the CG iteration. Remove entries with zero scattering -- ! they live in the null space of M, where A := [I-M]. - CGDirection(:) = CGResidual(:) - GTA%CGDirectionB(:,:) = GTA%CGResidualB(:,:) + CGDirection(:) = CGResidual(:) ! \vec{p}_0 = \vec{r}_0 + CGDirectionB(:,:) = CGResidualB(:,:) - rrProductOld = scat_prod1(CGResidual) + rrProductOld = scat_prod1(CGResidual) ! \rho_0 = \vec{r}_0 \cdot \vec{\hat{r}}_0 ! All CG sweeps are performed with zero extraneous source - GTA%GreySource(:) = zero - withSource = .FALSE. + GTA%GreySource(:) = zero + withSource = .FALSE. ! Begin CG loop, iterating on grey corrections @@ -239,20 +275,21 @@ subroutine GTASolver ! Perform a transport sweep to compute the action of M on the ! conjugate direction (stored in CGAction) - CGAction(:) = CGDirection(:) - GTA%CGActionB(:,:) = GTA%CGDirectionB(:,:) + CGAction(:) = CGDirection(:) + CGActionB(:,:) = CGDirectionB(:,:) if (Size% useNewGTASolver) then - call GreySweepNEW(GTA%CGActionB, CGAction, withSource) + call GreySweepNEW(CGActionB, CGAction, withSource) else - call GreySweep(GTA%CGActionB, CGAction) + call GreySweep(CGActionB, CGAction) endif ! Compute the action of the transport matrix, A, on the conjugate ! direction. Recall: A := [I-M] - CGAction(:) = CGDirection(:) - CGAction(:) - GTA%CGActionB(:,:) = GTA%CGDirectionB(:,:) - GTA%CGActionB(:,:) + ! CGAction = \vec{\nu} = K_1^{-1} A \vec{p}_{i-1} + CGAction(:) = CGDirection(:) - CGAction(:) + CGActionB(:,:) = CGDirectionB(:,:) - CGActionB(:,:) ! Compute the inner product, @@ -265,26 +302,29 @@ subroutine GTASolver exit GreyIteration endif + ! \alpha alphaCG = rrProductOld/dAdProduct ! Update the residual - CGResidual(:) = CGResidual(:) - alphaCG*CGAction(:) - GTA%CGResidualB(:,:) = GTA%CGResidualB(:,:) - alphaCG*GTA%CGActionB(:,:) + ! \vec{s} = \vec{r}_{i-1} - \alpha \vec{\nu} + CGResidual(:) = CGResidual(:) - alphaCG*CGAction(:) + CGResidualB(:,:) = CGResidualB(:,:) - alphaCG*CGActionB(:,:) - CGActionS(:) = CGResidual(:) - GTA%CGActionSB(:,:) = GTA%CGResidualB(:,:) + CGActionS(:) = CGResidual(:) + CGActionSB(:,:) = CGResidualB(:,:) if (Size% useNewGTASolver) then - call GreySweepNEW(GTA%CGActionSB, CGActionS, withSource) + call GreySweepNEW(CGActionSB, CGActionS, withSource) else - call GreySweep(GTA%CGActionSB, CGActionS) + call GreySweep(CGActionSB, CGActionS) endif ! Compute the action of the transport matrix, A, on the conjugate ! direction. Recall: A := [I-M] - CGActionS(:) = CGResidual(:) - CGActionS(:) - GTA%CGActionSB(:,:) = GTA%CGResidualB(:,:) - GTA%CGActionSB(:,:) + ! \vec{t} + CGActionS(:) = CGResidual(:) - CGActionS(:) + CGActionSB(:,:) = CGResidualB(:,:) - CGActionSB(:,:) omegaNum = scat_prod(CGActionS,CGResidual) omegaDen = scat_prod(CGActionS,CGActionS) @@ -295,26 +335,32 @@ subroutine GTASolver exit GreyIteration endif + ! \omega omegaCG = omegaNum/omegaDen ! Update the Grey additive correction + ! \vec{x}_i GTA%GreyCorrection(:) = GTA%GreyCorrection(:) + & alphaCG*CGDirection(:) + omegaCG*CGResidual(:) - CGResidual(:) = CGResidual(:) - omegaCG*CGActionS(:) - GTA%CGResidualB(:,:) = GTA%CGResidualB(:,:) - omegaCG*GTA%CGActionSB(:,:) + ! \vec{r}_i + CGResidual(:) = CGResidual(:) - omegaCG*CGActionS(:) + CGResidualB(:,:) = CGResidualB(:,:) - omegaCG*CGActionSB(:,:) ! Compute the inner product, + ! \rho_i rrProduct = scat_prod1(CGResidual) + ! \beta betaCG = (rrProduct*alphaCG)/(rrProductOld*omegaCG) ! update the conjugate direction - CGDirection(:) = CGResidual(:) + betaCG* & - (CGDirection(:) - omegaCG*CGAction(:)) + ! \vec{p}_i + CGDirection(:) = CGResidual(:) + betaCG* & + (CGDirection(:) - omegaCG*CGAction(:)) - GTA%CGDirectionB(:,:) = GTA%CGResidualB(:,:) + betaCG* & - (GTA%CGDirectionB(:,:) - omegaCG*GTA%CGActionB(:,:)) + CGDirectionB(:,:) = CGResidualB(:,:) + betaCG* & + (CGDirectionB(:,:) - omegaCG*CGActionB(:,:)) ! Compute the additive grey corrections on zones for convergence tests @@ -346,7 +392,7 @@ subroutine GTASolver print *, "Teton's GTASolver encountered a NaN on iteration", nGreyIter, " on rank ", Size% myRankInGroup, " in zone ", izRelErrPoint call sleep(15) TETON_FATAL("Grey solver encountered a NaN!") - else if (phiNew /= zero) then + else if (abs(phiNew) > zero) then relErrPoint = abs(errZone/phiNew) if (relErrPoint > maxRelErrPoint) then maxRelErrPoint = relErrPoint @@ -357,7 +403,7 @@ subroutine GTASolver pzOld(zone) = pz enddo CorrectionZoneLoop - if (phiL2 /= zero) then + if (abs(phiL2) > zero) then relErrL2 = sqrt( abs(errL2/phiL2) ) else relErrL2 = zero @@ -413,12 +459,16 @@ subroutine GTASolver ! Free memory - deallocate(pzOld, stat=alloc_stat) + deallocate(pzOld, stat=alloc_stat) - deallocate(CGResidual, stat=alloc_stat) - deallocate(CGDirection, stat=alloc_stat) - deallocate(CGAction, stat=alloc_stat) - deallocate(CGActionS, stat=alloc_stat) + deallocate(CGResidual, stat=alloc_stat) + deallocate(CGDirection, stat=alloc_stat) + deallocate(CGAction, stat=alloc_stat) + deallocate(CGActionS, stat=alloc_stat) + deallocate(CGDirectionB, stat=alloc_stat) + deallocate(CGResidualB, stat=alloc_stat) + deallocate(CGActionB, stat=alloc_stat) + deallocate(CGActionSB, stat=alloc_stat) return diff --git a/src/teton/rt/GTASolver_OMPOL.F90 b/src/teton/rt/GTASolver_OMPOL.F90 index 9c52989..1ec7d0e 100644 --- a/src/teton/rt/GTASolver_OMPOL.F90 +++ b/src/teton/rt/GTASolver_OMPOL.F90 @@ -53,26 +53,37 @@ subroutine GTASolver_GPU use RadIntensity_mod use GreyAcceleration_mod use QuadratureList_mod + use AngleSet_mod use ZoneSet_mod use ieee_arithmetic + use, intrinsic :: iso_fortran_env, only : stdin=>input_unit, & + stdout=>output_unit, & + stderr=>error_unit implicit none ! Local type(IterControl), pointer :: greyControl => NULL() + type(AngleSet), pointer :: ASet => NULL() + type(HypPlane), pointer :: HypPlanePtr => NULL() integer :: c integer :: c0 integer :: zone integer :: nCorner + integer :: angle integer :: alloc_stat integer :: nGreyIter integer :: izRelErrPoint integer :: ngdart integer :: nzones + integer :: setID integer :: zSetID integer :: nZoneSets + integer :: nAngleSets + integer :: nGTASets + integer :: nHyperElements real(adqt) :: errL2 real(adqt) :: errZone @@ -111,24 +122,38 @@ subroutine GTASolver_GPU real(adqt), allocatable :: CGDirection(:) real(adqt), allocatable :: CGAction(:) real(adqt), allocatable :: CGActionS(:) + real(adqt), allocatable :: CGDirectionB(:,:) + real(adqt), allocatable :: CGResidualB(:,:) + real(adqt), allocatable :: CGActionB(:,:) + real(adqt), allocatable :: CGActionSB(:,:) ! Constants greyControl => getIterationControl(IterControls, "grey") - nzones = Size%nzones - nZoneSets = getNumberOfZoneSets(Quad) + nzones = Size%nzones + nZoneSets = getNumberOfZoneSets(Quad) + nAngleSets = getNumberOfAngleSets(Quad) + nGTASets = getNumberOfGTASets(Quad) -! Allocate memory for BiConjugate Gradient +! We augment the number of boundary elements with the number of elements +! on hyper-domain boundaries - allocate( pzOld(nzones) ) + nHyperElements = getNumberOfHyperElements(Quad, 2) + Size% nSurfElem = Size% nbelem + nHyperElements +! Allocate memory for BiConjugate Gradient + allocate( pzOld(nzones) ) allocate( CGResidual(Size% ncornr) ) allocate( CGDirection(Size% ncornr) ) allocate( CGAction(Size% ncornr) ) allocate( CGActionS(Size% ncornr) ) + allocate( CGDirectionB(Size% nSurfElem,Size% nangGTA) ) + allocate( CGResidualB(Size% nSurfElem,Size% nangGTA) ) + allocate( CGActionB(Size% nSurfElem,Size% nangGTA) ) + allocate( CGActionSB(Size% nSurfElem,Size% nangGTA) ) ! Initialize index of zone with maximum error: izRelErrPoint = -1 @@ -136,33 +161,57 @@ subroutine GTASolver_GPU ! Sum current solution over groups for convergence test ! Compute grey source +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) +#else TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none)&) TOMPC(shared(nZoneSets, ZSet, Geom, Rad)) +#endif do zSetID=1,nZoneSets -!$omp parallel do default(none) schedule(dynamic) & -!$omp& shared(zSetID, ZSet, Geom, Rad) +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector +#else + !$omp parallel do default(none) schedule(dynamic) & + !$omp& shared(zSetID, ZSet, Geom, Rad) +#endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) ZSet% sumT(c) = Geom% Volume(c)*sum( Rad% PhiTotal(:,c) ) enddo -!$omp end parallel do +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif enddo +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else TOMP(end target teams distribute) +#endif +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(c0, nCorner) +#else TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) TOMPC(shared(nZoneSets, Geom, Rad, ZSet)&) TOMPC(private(c0, nCorner)) +#endif do zSetID=1,nZoneSets -!$omp parallel do default(none) schedule(dynamic) & -!$omp& shared(zSetID, Geom, Rad, ZSet) & -!$omp& private(c0, nCorner) +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector & + !$acc& private(c0, nCorner) +#else + !$omp parallel do default(none) schedule(dynamic) & + !$omp& shared(zSetID, Geom, Rad, ZSet) & + !$omp& private(c0, nCorner) +#endif do zone=Geom% zone1(zSetID),Geom% zone2(zSetID) nCorner = Geom% numCorner(zone) @@ -175,11 +224,17 @@ subroutine GTASolver_GPU Rad% radEnergy(zone) = Rad% radEnergy(zone)/Geom% VolumeZone(zone) enddo -!$omp end parallel do +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif enddo +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else TOMP(end target teams distribute) +#endif TOMP(target update from(Rad% radEnergy)) @@ -191,71 +246,109 @@ subroutine GTASolver_GPU call InitGreySweepUCBxyz_GPU endif -! Initialize the additive grey corrections, P, and CG -! residual +! Generate the LU decomposition of the preconditioner +! This decomposition is independent of the BiCGSTAB iteration +! and can be reused + call ScalarIntensityDecompose_GPU + +! Initialize the CG residual using an extraneous source with +! some number of source iterations - GTA%GreyCorrection(:) = zero - pzOld(:) = zero CGResidual(:) = zero - GTA%CGResidualB(:,:) = zero + CGResidualB(:,:) = zero + withSource = .TRUE. -! Initialize the CG residual using an extraneous source +! Here we allow for some number of source iterations before +! we start the BCG iteration - nGreyIter = 1 - withSource = .TRUE. - GTA% nGreySweepIters = 2 + SourceIterationLoop: do c=1,GTA% nGreySISweeps + + if (c == GTA% nGreySISweeps) then + CGDirection(:) = CGResidual(:) + CGDirectionB(:,:) = CGResidualB(:,:) + endif + +! This does a bunch of sweeps to get you a better initial guess + call GreySweepNEW(CGResidualB, CGResidual, withSource) + + enddo SourceIterationLoop - call GreySweepNEW(GTA%CGResidualB, CGResidual, withSource) +! Store the improved initial guess \vec{x}_0 + GTA%GreyCorrection(:) = CGDirection(:) +! This final step yields the initial preconditioned residual +! CGResidual = \vec{r}_0 = K^{-1}*(\vec{b} - A*\vec{x}_0) +! See the comment in GreySweep.F90 to see the action of GreySweepNEW +! and why subtracting \vec{x}_0 yields \vec{r}_0 + CGResidual(:) = CGResidual(:) - CGDirection(:) + CGResidualB(:,:) = CGResidualB(:,:) - CGDirectionB(:,:) -! Initialize the CG iteration. Remove entries with zero scattering -- +! Initialize zonal correction for convergence test + + pzOld(:) = zero + + do zone=1,nzones + nCorner = Geom% numCorner(zone) + c0 = Geom% cOffSet(zone) + + do c=1,nCorner + pzOld(zone) = pzOld(zone) + Geom% Volume(c0+c)*GTA%GreyCorrection(c0+c) + enddo + pzOld(zone) = pzOld(zone)/Geom% VolumeZone(zone) + enddo + + nGreyIter = GTA% nGreySISweeps + +! Initialize the BCG iteration. Remove entries with zero scattering -- ! they live in the null space of M, where A := [I-M]. - CGDirection(:) = CGResidual(:) - GTA%CGDirectionB(:,:) = GTA%CGResidualB(:,:) + CGDirection(:) = CGResidual(:) + CGDirectionB(:,:) = CGResidualB(:,:) rrProductOld = scat_prod1(CGResidual) -! All CG sweeps are performed with zero extraneous source +! All BCG sweeps are performed with zero extraneous source withSource = .FALSE. -! Begin CG loop, iterating on grey corrections +! Begin BCG loop, iterating on grey corrections +! BCY: This bicgstab iteration is the same as GTASolver.F90, see annotations +! there for more information. - ngdart = getNumberOfIterations(greyControl) - - GreyIteration: do + BCGIteration: do ! This only does something if mod(verbose_level,10) > 2 write(descriptor,'(A15,I5)') "GTASolver, GreyIteration number ", nGreyIter call PrintEnergies(trim(descriptor)) -! Exit CG if the residual is below the minimum. This used to test against zero, +! Exit BCG if the residual is below the minimum. This used to test against zero, ! but due to differences in rounding errors some platforms would return ! very small numbers and not zero. if (abs(rrProductOld) < adqtSmall) then - if (nGreyIter <= 2) then - GTA%GreyCorrection(:) = CGResidual(:) +! If source iteration has converged the corrections just exit + if (nGreyIter <= GTA% nGreySISweeps) then + GTA%GreyCorrection(:) = GTA%GreyCorrection(:) + CGResidual(:) endif - exit GreyIteration + exit BCGIteration endif -! increment the grey iteration counter +! increment the grey iteration counter; each BCG iteration requires +! two grey solves nGreyIter = nGreyIter + 2 ! Perform a transport sweep to compute the action of M on the ! conjugate direction (stored in CGAction) - CGAction(:) = CGDirection(:) - GTA%CGActionB(:,:) = GTA%CGDirectionB(:,:) + CGAction(:) = CGDirection(:) + CGActionB(:,:) = CGDirectionB(:,:) - call GreySweepNEW(GTA%CGActionB, CGAction, withSource) + call GreySweepNEW(CGActionB, CGAction, withSource) ! Compute the action of the transport matrix, A, on the conjugate ! direction. Recall: A := [I-M] - CGAction(:) = CGDirection(:) - CGAction(:) - GTA%CGActionB(:,:) = GTA%CGDirectionB(:,:) - GTA%CGActionB(:,:) + CGAction(:) = CGDirection(:) - CGAction(:) + CGActionB(:,:) = CGDirectionB(:,:) - CGActionB(:,:) ! Compute the inner product, @@ -265,25 +358,25 @@ subroutine GTASolver_GPU ! conjugate direction is zero if (abs(dAdProduct) < adqtSmall) then - exit GreyIteration + exit BCGIteration endif alphaCG = rrProductOld/dAdProduct ! Update the residual - CGResidual(:) = CGResidual(:) - alphaCG*CGAction(:) - GTA%CGResidualB(:,:) = GTA%CGResidualB(:,:) - alphaCG*GTA%CGActionB(:,:) + CGResidual(:) = CGResidual(:) - alphaCG*CGAction(:) + CGResidualB(:,:) = CGResidualB(:,:) - alphaCG*CGActionB(:,:) - CGActionS(:) = CGResidual(:) - GTA%CGActionSB(:,:) = GTA%CGResidualB(:,:) + CGActionS(:) = CGResidual(:) + CGActionSB(:,:) = CGResidualB(:,:) - call GreySweepNEW(GTA%CGActionSB, CGActionS, withSource) + call GreySweepNEW(CGActionSB, CGActionS, withSource) ! Compute the action of the transport matrix, A, on the conjugate ! direction. Recall: A := [I-M] - CGActionS(:) = CGResidual(:) - CGActionS(:) - GTA%CGActionSB(:,:) = GTA%CGResidualB(:,:) - GTA%CGActionSB(:,:) + CGActionS(:) = CGResidual(:) - CGActionS(:) + CGActionSB(:,:) = CGResidualB(:,:) - CGActionSB(:,:) omegaNum = scat_prod(CGActionS,CGResidual) omegaDen = scat_prod(CGActionS,CGActionS) @@ -291,7 +384,7 @@ subroutine GTASolver_GPU if (abs(omegaDen) < adqtSmall .or. abs(omegaNum) < adqtSmall) then GTA%GreyCorrection(:) = GTA%GreyCorrection(:) + alphaCG*CGDirection(:) - exit GreyIteration + exit BCGIteration endif omegaCG = omegaNum/omegaDen @@ -300,8 +393,8 @@ subroutine GTASolver_GPU GTA%GreyCorrection(:) = GTA%GreyCorrection(:) + & alphaCG*CGDirection(:) + omegaCG*CGResidual(:) - CGResidual(:) = CGResidual(:) - omegaCG*CGActionS(:) - GTA%CGResidualB(:,:) = GTA%CGResidualB(:,:) - omegaCG*GTA%CGActionSB(:,:) + CGResidual(:) = CGResidual(:) - omegaCG*CGActionS(:) + CGResidualB(:,:) = CGResidualB(:,:) - omegaCG*CGActionSB(:,:) ! Compute the inner product, rrProduct = scat_prod1(CGResidual) @@ -309,11 +402,11 @@ subroutine GTASolver_GPU betaCG = (rrProduct*alphaCG)/(rrProductOld*omegaCG) ! update the conjugate direction - CGDirection(:) = CGResidual(:) + betaCG* & - (CGDirection(:) - omegaCG*CGAction(:)) + CGDirection(:) = CGResidual(:) + betaCG* & + (CGDirection(:) - omegaCG*CGAction(:)) - GTA%CGDirectionB(:,:) = GTA%CGResidualB(:,:) + betaCG* & - (GTA%CGDirectionB(:,:) - omegaCG*GTA%CGActionB(:,:)) + CGDirectionB(:,:) = CGResidualB(:,:) + betaCG* & + (CGDirectionB(:,:) - omegaCG*CGActionB(:,:)) ! Compute the additive grey corrections on zones for convergence tests @@ -343,9 +436,9 @@ subroutine GTASolver_GPU if (ieee_is_nan(phiNew) .or. ieee_is_nan(errZone)) then izRelErrPoint = zone ! The zone where we first see a nan print *, "Teton's GTASolver encountered a NaN on iteration", nGreyIter, " on rank ", Size% myRankInGroup, " in zone ", izRelErrPoint - call sleep(15) + flush(stdout) TETON_FATAL("Grey solver encountered a NaN!") - else if (phiNew /= zero) then + else if (abs(phiNew) > zero) then relErrPoint = abs(errZone/phiNew) if (relErrPoint > maxRelErrPoint) then maxRelErrPoint = relErrPoint @@ -356,7 +449,7 @@ subroutine GTASolver_GPU pzOld(zone) = pz enddo CorrectionZoneLoop - if (phiL2 /= zero) then + if (abs(phiL2) > zero) then relErrL2 = sqrt( abs(errL2/phiL2) ) else relErrL2 = zero @@ -371,13 +464,13 @@ subroutine GTASolver_GPU if ( GTA% enforceHardGTAIterMax .and. nGreyIter >= getMaxNumberOfIterations(greyControl) ) then - exit GreyIteration + exit BCGIteration else if ( (maxRelErrGrey < getEpsilonPoint(greyControl) .or. & nGreyIter >= getMaxNumberOfIterations(greyControl)) .and. & maxRelErrGrey < GTA%epsGrey ) then - exit GreyIteration + exit BCGIteration else if ( nGreyIter >= 100*getMaxNumberOfIterations(greyControl)) then @@ -394,26 +487,31 @@ subroutine GTASolver_GPU else rrProductOld = rrProduct - cycle GreyIteration + cycle BCGIteration endif - enddo GreyIteration + enddo BCGIteration - call PrintEnergies("GTASolver, after end of GreyIteration") + call PrintEnergies("GTASolver, after end of GreyIterations") + ngdart = getNumberOfIterations(greyControl) ngdart = ngdart + nGreyIter call setNumberOfIterations(greyControl,ngdart) ! Free memory - deallocate(pzOld, stat=alloc_stat) + deallocate(pzOld, stat=alloc_stat) - deallocate(CGResidual, stat=alloc_stat) - deallocate(CGDirection, stat=alloc_stat) - deallocate(CGAction, stat=alloc_stat) - deallocate(CGActionS, stat=alloc_stat) + deallocate(CGResidual, stat=alloc_stat) + deallocate(CGDirection, stat=alloc_stat) + deallocate(CGAction, stat=alloc_stat) + deallocate(CGActionS, stat=alloc_stat) + deallocate(CGDirectionB, stat=alloc_stat) + deallocate(CGResidualB, stat=alloc_stat) + deallocate(CGActionB, stat=alloc_stat) + deallocate(CGActionSB, stat=alloc_stat) return diff --git a/src/teton/rt/GreySweep.F90 b/src/teton/rt/GreySweep.F90 index 62ba9cf..6dc6b27 100644 --- a/src/teton/rt/GreySweep.F90 +++ b/src/teton/rt/GreySweep.F90 @@ -1,4 +1,5 @@ #include "macros.h" +#include "omp_wrappers.h" !*********************************************************************** ! Last Update: 10/2016, PFN * @@ -20,7 +21,7 @@ subroutine GreySweepNEW(PsiB, P, withSource) ! Arguments - real(adqt), intent(inout) :: PsiB(Size%nbelem,Size%nangGTA) + real(adqt), intent(inout) :: PsiB(Size% nSurfElem,Size%nangGTA) real(adqt), intent(inout) :: P(Size%ncornr) logical (kind=1), intent(in) :: withSource @@ -29,18 +30,39 @@ subroutine GreySweepNEW(PsiB, P, withSource) integer :: zone + logical (kind=1) :: useGPU + + +! Set some constants + + useGPU = getGPUStatus(Size) + ! Perform a transport sweep to update the grey corrections, P - GTA%ID = 1 + if ( useGPU ) then - call GTASweep(P, PsiB) + TOMP_MAP(target enter data map(to: P)) - !$omp parallel do default(none) schedule(static) & - !$omp& shared(Size, P, withSource) - do zone=1,Size% nzones - call ScalarIntensityDecompose(zone, P, withSource) - enddo - !$omp end parallel do + call GTASweep_GPU(P, PsiB, withSource) + + call ScalarIntensitySolve_GPU(P) + + TOMP_MAP(target exit data map(from: P)) + + else + + GTA%ID = 1 + + call GTASweep(P, PsiB) + +!$omp parallel do default(none) schedule(static) & +!$omp& shared(Size, P, withSource) + do zone=1,Size% nzones + call ScalarIntensityDecompose(zone, P, withSource) + enddo +!$omp end parallel do + + endif return @@ -73,6 +95,9 @@ subroutine GreySweep(PsiB, P) nGTASets = getNumberOfGTASets(Quad) ! Perform a transport sweep to update the grey corrections, P +! +! BCY: This sets P --> Linv*(S*P + GTA%GreySource) +! Note that GTA%GreySource is zeroed out in GTASolver after the first sweep. GTA%OldGreyCorrection(:) = P(:) GTA%ID = 1 @@ -97,6 +122,7 @@ subroutine GreySweep(PsiB, P) GTA%ID = 2 +! BCY: With GTA%ID = 2, GTASweep applies Linv_{eps}*S_{eps} to (P-GTA%OldGreyCorrection) call GTASweep(P, GTA%TsaPsib) ! Add the "stretched" TSA correction @@ -105,7 +131,11 @@ subroutine GreySweep(PsiB, P) Set => getGTASetData(Quad, GTAsetID) P(:) = P(:) + Set% tPhi(:) enddo - +! BCY: The final result is P --> Linv_{eps}S_{eps}*(Linv*(S*P + GreySource) - P) + Linv*(S*P + GreySource) +! or equivalently, +! P --> ( I - (I + Linv_{eps}S_{eps})(I-Linv S) ) P + (I+Linv_{eps}S_{eps})*Linv GreySource +! When GreySource is zero, that simplifies to +! P --> Linv_{eps}S_{eps}(Linv*S*P - P) + Linv*S*P return end subroutine GreySweep diff --git a/src/teton/rt/GreySweep_OMPOL.F90 b/src/teton/rt/GreySweep_OMPOL.F90 deleted file mode 100644 index 8a7491a..0000000 --- a/src/teton/rt/GreySweep_OMPOL.F90 +++ /dev/null @@ -1,138 +0,0 @@ -#include "macros.h" -#include "omp_wrappers.h" - -!*********************************************************************** -! Last Update: 10/2016, PFN * -! * -! GREYSWEEP - Performs a grey sweep with a fixed residual with * -! stretched TSA preconditioner. This gives the "action" * -! of the grey-transport operator on the input vector P. * -! * -!*********************************************************************** - - subroutine GreySweepNEW(PsiB, P, withSource) - - use kind_mod - use constant_mod - use Size_mod - use GreyAcceleration_mod - - implicit none - -! Arguments - - real(adqt), intent(inout) :: PsiB(Size%nbelem,Size%nangGTA) - real(adqt), intent(inout) :: P(Size%ncornr) - - logical (kind=1), intent(in) :: withSource - -! Local - - integer :: zone - - logical (kind=1) :: useGPU - - -! Set some constants - - useGPU = getGPUStatus(Size) - -! Perform a transport sweep to update the grey corrections, P - - if ( useGPU ) then - - TOMP(target enter data map(to: P)) - - call GTASweep_GPU(P, PsiB, withSource) - - if ( withSource ) then - call ScalarIntensityDecompose_GPU(P) - else - call ScalarIntensitySolve_GPU(P) - endif - - TOMP(target exit data map(from: P)) - - else - - GTA%ID = 1 - - call GTASweep(P, PsiB) - -!$omp parallel do default(none) schedule(static) & -!$omp& shared(Size, P, withSource) - do zone=1,Size% nzones - call ScalarIntensityDecompose(zone, P, withSource) - enddo -!$omp end parallel do - - endif - - - return - end subroutine GreySweepNEW - - - subroutine GreySweep(PsiB, P) - - use kind_mod - use constant_mod - use Size_mod - use GreyAcceleration_mod - use QuadratureList_mod - use SetData_mod - - implicit none - -! Arguments - - real(adqt), intent(inout) :: PsiB(Size%nbelem,Size%nangGTA) - real(adqt), intent(inout) :: p(Size%ncornr) - -! Local - - type(SetData), pointer :: Set - - integer :: GTAsetID - integer :: nGTASets - - nGTASets = getNumberOfGTASets(Quad) - -! Perform a transport sweep to update the grey corrections, P - - GTA%OldGreyCorrection(:) = P(:) - GTA%ID = 1 - - call GTASweep(P, PsiB) - - P(:) = zero - - do GTAsetID=1,nGTASets - Set => getGTASetData(Quad, GTAsetID) - P(:) = P(:) + Set% tPhi(:) - enddo - -! Now use "stretched" TSA as a preconditioner. -! The source to the TSA equations is the residual -! from the grey transport sweep. The stretching parameter epsilon -! has been chosen such that the scattering source vanishes. - - GTA%TsaPsib(:,:) = zero - -! Perform a "stretched" transport sweep - - GTA%ID = 2 - - call GTASweep(P, GTA%TsaPsib) - -! Add the "stretched" TSA correction - - do GTAsetID=1,nGTASets - Set => getGTASetData(Quad, GTAsetID) - P(:) = P(:) + Set% tPhi(:) - enddo - - - return - end subroutine GreySweep - diff --git a/src/teton/rt/RecvFlux.F90 b/src/teton/rt/RecvFlux.F90 index 52092dc..00397b2 100644 --- a/src/teton/rt/RecvFlux.F90 +++ b/src/teton/rt/RecvFlux.F90 @@ -29,7 +29,7 @@ subroutine RecvFlux(SnSweep, cSetID, Angle, PsiB) logical (kind=1), intent(in) :: SnSweep integer, intent(in) :: cSetID integer, intent(in) :: Angle - real(adqt), optional, intent(inout) :: PsiB(Size%nbelem,Size%nangGTA) + real(adqt), optional, intent(inout) :: PsiB(Size%nSurfElem,Size%nangGTA) ! Local @@ -46,6 +46,8 @@ subroutine RecvFlux(SnSweep, cSetID, Angle, PsiB) integer :: sharedID integer :: nShared +! Constants + nShared = getNumberOfShared(RadBoundary) CSet => getCommSetData(Quad, cSetID) diff --git a/src/teton/rt/SendFlux.F90 b/src/teton/rt/SendFlux.F90 index 3aae409..bc90c5a 100644 --- a/src/teton/rt/SendFlux.F90 +++ b/src/teton/rt/SendFlux.F90 @@ -29,7 +29,7 @@ subroutine SendFlux(SnSweep, cSetID, sendIndex, PsiB) logical (kind=1), intent(in) :: SnSweep integer, intent(in) :: cSetID integer, intent(in) :: sendIndex - real(adqt), optional, intent(in) :: PsiB(Size%nbelem,Size%nangGTA) + real(adqt), optional, intent(in) :: PsiB(Size%nSurfElem,Size%nangGTA) ! Local @@ -47,6 +47,8 @@ subroutine SendFlux(SnSweep, cSetID, sendIndex, PsiB) integer :: sharedID integer :: nShared +! Constants + nShared = getNumberOfShared(RadBoundary) CSet => getCommSetData(Quad, cSetID) @@ -77,6 +79,7 @@ subroutine SendFlux(SnSweep, cSetID, sendIndex, PsiB) enddo ! Start send for this communicator (odd numbered handle) + call MPIStart(CommT% irequest(1)) endif @@ -94,6 +97,7 @@ subroutine SendFlux(SnSweep, cSetID, sendIndex, PsiB) enddo ! Start send for this communicator (odd numbered handle) + call MPIStart(CommT% irequest(1)) endif diff --git a/src/teton/rt/addGreyCorrections.F90 b/src/teton/rt/addGreyCorrections.F90 index 3469540..9669194 100644 --- a/src/teton/rt/addGreyCorrections.F90 +++ b/src/teton/rt/addGreyCorrections.F90 @@ -39,6 +39,7 @@ subroutine addGreyCorrections integer :: c0 integer :: nCorner integer :: angle + integer :: exit_angle integer :: NumAngles integer :: g integer :: g0 @@ -98,7 +99,8 @@ subroutine addGreyCorrections ! Update Set dependent boundary fluxes !$omp parallel do default(none) schedule(static) & -!$omp& private(NumAngles, g0, Groups, c, nBdyElem, b, b0, Set, ASet, CSet, CommT, BdyT) & +!$omp& private(NumAngles, g0, Groups, c, nBdyElem, b, b0, exit_angle) & +!$omp& private(Set, ASet, CSet, CommT, BdyT) & !$omp& shared(nSets, Quad, GTA, RadBoundary, wtiso, nReflecting, nShared) SetLoop: do setID=1,nSets @@ -139,13 +141,13 @@ subroutine addGreyCorrections b0 = getFirstBdyElement(BdyT) - 1 do i=1,ASet% nExit(reflID) - angle = ASet% ExitAngleList(i,reflID) + exit_angle = ASet% ExitAngleList(i,reflID) do b=1,nBdyElem c = BdyT% BdyToC(b) do g=1,Groups - Set% PsiB(g,b0+b,angle) = Set% PsiB(g,b0+b,angle) + wtiso* & - GTA%GreyCorrection(c)*GTA% Chi(g0+g,c) + Set% PsiB(g,b0+b,exit_angle) = Set% PsiB(g,b0+b,exit_angle) + wtiso* & + GTA%GreyCorrection(c)*GTA% Chi(g0+g,c) enddo enddo enddo diff --git a/src/teton/rt/addGreyCorrections_OMPOL.F90 b/src/teton/rt/addGreyCorrections_OMPOL.F90 index f404b0a..795074e 100644 --- a/src/teton/rt/addGreyCorrections_OMPOL.F90 +++ b/src/teton/rt/addGreyCorrections_OMPOL.F90 @@ -35,6 +35,7 @@ subroutine addGreyCorrections_GPU type(CommSet), pointer :: CSet type(Communicator), pointer :: CommT type(Boundary), pointer :: BdyT + type(HypPlane), pointer :: HypPlanePtr integer :: zone integer :: nzones @@ -44,6 +45,7 @@ subroutine addGreyCorrections_GPU integer :: c0 integer :: nCorner integer :: angle + integer :: exit_angle integer :: NumAngles integer :: g integer :: g0 @@ -74,48 +76,87 @@ subroutine addGreyCorrections_GPU ! Compute the group-dependent corrections - TOMP(target update to(GTA% GreyCorrection)) + TOMP_UPDATE(target update to(GTA% GreyCorrection)) -TOMP(target enter data map(to: ngr)) -TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) -TOMPC(shared(nZoneSets, GTA, Geom, Rad, ngr)) + TOMP_MAP(target enter data map(to: ngr, wtiso)) + +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) +#else + TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) + TOMPC(shared(nZoneSets, GTA, Geom, Rad, ngr)) +#endif do zSetID=1,nZoneSets +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) +#else !$omp parallel do collapse(2) default(none) schedule(dynamic) & !$omp& shared(zSetID, GTA, Geom, Rad, ngr) +#endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) do g=1,ngr Rad% PhiTotal(g,c) = Rad% PhiTotal(g,c) + GTA%GreyCorrection(c)*GTA% Chi(g,c) enddo enddo +#ifndef TETON_ENABLE_OPENACC !$omp end parallel do +#endif enddo -TOMP(end target teams distribute) - - -TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) -TOMPC(shared(nZoneSets, ZSet, Geom, Rad)) +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif + + +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) +#else + TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) + TOMPC(shared(nZoneSets, ZSet, Geom, Rad)) +#endif do zSetID=1,nZoneSets +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector +#else !$omp parallel do default(none) schedule(dynamic) & !$omp& shared(zSetID, ZSet, Geom, Rad) +#endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) ZSet% sumT(c) = sum( Rad% PhiTotal(:,c) ) enddo +#ifndef TETON_ENABLE_OPENACC !$omp end parallel do +#endif enddo -TOMP(end target teams distribute) - - -TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) -TOMPC(shared(nZoneSets, Geom, Rad, ZSet)&) -TOMPC(private(c0, nCorner)) +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif + + +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(c0, nCorner) +#else + TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) + TOMPC(shared(nZoneSets, Geom, Rad, ZSet)&) + TOMPC(private(c0, nCorner)) +#endif do zSetID=1,nZoneSets +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector & + !$acc& private(c0, nCorner) +#else !$omp parallel do default(none) schedule(dynamic) & !$omp& shared(zSetID, Geom, Rad, ZSet) private(c0, nCorner) +#endif do zone=Geom% zone1(zSetID),Geom% zone2(zSetID) nCorner = Geom% numCorner(zone) c0 = Geom% cOffSet(zone) @@ -126,18 +167,74 @@ subroutine addGreyCorrections_GPU Geom% Volume(c0+c)*ZSet% sumT(c0+c) enddo enddo +#ifndef TETON_ENABLE_OPENACC !$omp end parallel do +#endif + + enddo +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif + +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(Set, ASet, HypPlanePtr, Groups, NumAngles, c, g0) +#else + TOMP(target teams distribute num_teams(nSets) thread_limit(omp_device_team_thread_limit) default(none) &) + TOMPC(shared(nSets, Quad, GTA, wtiso)&) + TOMPC(private(Set, ASet, HypPlanePtr, Groups, NumAngles, c, g0)) +#endif + do setID=1,nSets + Set => Quad% SetDataPtr(setID) + ASet => Quad% AngSetPtr(Set% angleSetID) + Groups = Set% Groups + g0 = Set% g0 + NumAngles = Set% NumAngles + + do angle=1,NumAngles + HypPlanePtr => ASet% HypPlanePtr(angle) + +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) & + !$acc& private(c) +#else + !$omp parallel do collapse(2) default(none) & + !$omp& shared(Set, HypPlanePtr, GTA, angle, g0, Groups, wtiso) & + !$omp& private(c) +#endif + do i=1,HypPlanePtr% interfaceLen + do g=1,Groups + c = HypPlanePtr% interfaceList(i) + Set% PsiInt(g,i,angle) = Set% PsiInt(g,i,angle) + wtiso* & + GTA%GreyCorrection(c)*GTA% Chi(g0+g,c) + enddo + enddo +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif + + enddo enddo -TOMP(end target teams distribute) -TOMP(target exit data map(release: ngr)) -TOMP(target update from(Rad% radEnergy)) +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif + + +TOMP_MAP(target exit data map(release: ngr, wtiso)) + +TOMP_UPDATE(target update from(Rad% radEnergy)) ! Update Set dependent boundary fluxes - !$omp parallel do default(none) schedule(static) & - !$omp private(NumAngles, g0, Groups, b, c, reflID, nBdyElem, b0, Set, ASet, CSet, CommT, BdyT) & + !$omp parallel do default(none) schedule(static) & + !$omp& private(NumAngles, g0, Groups, b, c, reflID, nBdyElem, exit_angle) & + !$omp& private(b0, Set, ASet, CSet, CommT, BdyT) & !$omp& shared(nSets, nShared, nReflecting, Quad, GTA, RadBoundary, wtiso) SetLoop: do setID=1,nSets @@ -178,13 +275,13 @@ subroutine addGreyCorrections_GPU b0 = getFirstBdyElement(BdyT) - 1 do i=1,ASet% nExit(reflID) - angle = ASet% ExitAngleList(i,reflID) + exit_angle = ASet% ExitAngleList(i,reflID) do b=1,nBdyElem c = BdyT% BdyToC(b) do g=1,Groups - Set% PsiB(g,b0+b,angle) = Set% PsiB(g,b0+b,angle) + wtiso* & - GTA%GreyCorrection(c)*GTA% Chi(g0+g,c) + Set% PsiB(g,b0+b,exit_angle) = Set% PsiB(g,b0+b,exit_angle) + wtiso* & + GTA%GreyCorrection(c)*GTA% Chi(g0+g,c) enddo enddo enddo diff --git a/src/teton/rt/findexit.F90 b/src/teton/rt/findexit.F90 index f47088e..f2bf9d6 100644 --- a/src/teton/rt/findexit.F90 +++ b/src/teton/rt/findexit.F90 @@ -286,11 +286,6 @@ subroutine findexit(aSetID) enddo ConstructBufferLoop - -! Initialize communication handles for persistent communicators -! Move this to after comm set buffers are mapped in initializeSets -! call initcomm(aSetID) - endif DecompTest ! Loop over all boundaries and create a list of exiting boundary elements @@ -348,7 +343,6 @@ subroutine findexit(aSetID) deallocate( bdyList ) - return end subroutine findexit diff --git a/src/teton/rt/findexit1D.F90 b/src/teton/rt/findexit1D.F90 index 09eb008..a670c5e 100644 --- a/src/teton/rt/findexit1D.F90 +++ b/src/teton/rt/findexit1D.F90 @@ -152,15 +152,8 @@ subroutine findexit1D(cSetID) enddo CommunicatorLoop - -! Initialize communication handles for persistent communicators - - call initcomm(cSetID) - - endif DecompTest - return end subroutine findexit1D diff --git a/src/teton/rt/getCollisionRate_OMPOL.F90 b/src/teton/rt/getCollisionRate_OMPOL.F90 index 6ad20d8..89cea9e 100644 --- a/src/teton/rt/getCollisionRate_OMPOL.F90 +++ b/src/teton/rt/getCollisionRate_OMPOL.F90 @@ -44,15 +44,28 @@ subroutine getCollisionRate_GPU(residualFlag) ! Calculate the total energy absorption rate density -TOMP(target enter data map(to: ngr)) -TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none)&) -TOMPC(shared(nZoneSets, ZSet, Geom, Rad, Mat, ngr) &) -TOMPC(private(zone)) +TOMP_MAP(target enter data map(to: ngr)) + +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) & + !$acc& vector_length(omp_device_team_thread_limit) & + !$acc& private(zone) +#else + TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none)&) + TOMPC(shared(nZoneSets, ZSet, Geom, Rad, Mat, ngr) &) + TOMPC(private(zone)) +#endif + do zSetID=1,nZoneSets +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) & + !$acc& private(zone) +#else !$omp parallel do default(none) collapse(2) schedule(dynamic) & - !$omp& shared(ZSet, zSetID, Geom, Rad, Mat, ngr) & + !$omp& shared(nZoneSets, ZSet, zSetID, Geom, Rad, Mat, ngr) & !$omp& private(zone) +#endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) do g=1,ngr zone = Geom% CToZone(c) @@ -60,44 +73,86 @@ subroutine getCollisionRate_GPU(residualFlag) Rad% PhiTotal(g,c) enddo enddo +#ifndef TETON_ENABLE_OPENACC !$omp end parallel do +#endif enddo -TOMP(end target teams distribute) -TOMP(target exit data map(release: ngr)) + +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif + + +TOMP_MAP(target exit data map(release: ngr)) if (residualFlag == 0) then -TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none)&) -TOMPC(shared(nZoneSets, ZSet, Geom, GTA)) +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) & + !$acc& vector_length(omp_device_team_thread_limit) +#else + TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none)&) + TOMPC(shared(nZoneSets, ZSet, Geom, GTA)) +#endif + do zSetID=1,nZoneSets +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector +#else !$omp parallel do default(none) schedule(dynamic) & - !$omp& shared(ZSet, zSetID, Geom, GTA) + !$omp& shared(nZoneSets, ZSet, zSetID, Geom, GTA) +#endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) GTA% GreySource(c) = sum( ZSet% ex(:,c) ) enddo +#ifndef TETON_ENABLE_OPENACC !$omp end parallel do +#endif enddo -TOMP(end target teams distribute) + +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif else -TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none)&) -TOMPC(shared(nZoneSets, ZSet, Geom, GTA)) +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) & + !$acc& vector_length(omp_device_team_thread_limit) +#else + TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none)&) + TOMPC(shared(nZoneSets, ZSet, Geom, GTA)) +#endif do zSetID=1,nZoneSets +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector +#else !$omp parallel do default(none) schedule(dynamic) & - !$omp& shared(ZSet, zSetID, Geom, GTA) + !$omp& shared(nZoneSets, ZSet, zSetID, Geom, GTA) +#endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) GTA% GreySource(c) = sum( ZSet% ex(:,c) ) - GTA% GreySource(c) enddo +#ifndef TETON_ENABLE_OPENACC !$omp end parallel do +#endif enddo -TOMP(end target teams distribute) + +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif endif diff --git a/src/teton/rt/rtorder.F90 b/src/teton/rt/getDirectedGraph.F90 similarity index 50% rename from src/teton/rt/rtorder.F90 rename to src/teton/rt/getDirectedGraph.F90 index 9396f6a..496cfd1 100644 --- a/src/teton/rt/rtorder.F90 +++ b/src/teton/rt/getDirectedGraph.F90 @@ -1,17 +1,19 @@ !*********************************************************************** -! Last Update: 02/2012, PFN * +! Last Update: 04/2024, PFN * ! * -! RTORDER - This routine builds an ordered list of corners for each * -! unique direction. * +! getDirectedGraph - This routine builds an ordered list of corners * +! or zones (depending on sweeping method) for * +! each unique direction. * ! * !*********************************************************************** - subroutine rtorder(aSetID) + subroutine getDirectedGraph(aSetID) use kind_mod use Size_mod use constant_mod use QuadratureList_mod use AngleSet_mod + use Options_mod implicit none @@ -27,22 +29,26 @@ subroutine rtorder(aSetID) integer :: angle integer :: offSet integer :: mCycle - integer :: nDomains + integer :: nHyperDomains integer :: nAngleSets integer :: nGroupSets integer :: setID integer :: gSetID - integer :: maxZonesPerPlane + integer :: maxPerPlane + integer :: numAngles + integer :: sweepVersion ! Dynamic integer, allocatable :: badCornerList(:) ! Constants - ASet => getAngleSetData(Quad, aSetID) - nAngleSets = getNumberOfAngleSets(Quad) - nGroupSets = getNumberOfGroupSets(Quad) - nDomains = Quad% nHyperDomains + ASet => getAngleSetData(Quad, aSetID) + + nAngleSets = getNumberOfAngleSets(Quad) + nGroupSets = getNumberOfGroupSets(Quad) + numAngles = ASet% NumAngles + sweepVersion = Options% getSweepVersion() ! Determine the sweep order for each angle (i.e. the order in which the ! zones are solved: "nextZ") @@ -50,18 +56,38 @@ subroutine rtorder(aSetID) ASet% numCycles(:) = 0 ASet% cycleOffSet(:) = 0 ASet% nHyperPlanes(:) = 0 + maxPerPlane = 1 - AngleLoop: do angle=1,ASet% NumAngles + AngleLoop: do angle=1,numAngles if ( .not. ASet% FinishingDirection(angle) ) then - call snnext(aSetID, angle, nDomains) + + HypPlanePtr => ASet% HypPlanePtr(angle) + + if ( ASet% GTASet ) then + nHyperDomains = getNumberOfHyperDomains(Quad,2) + + call getZoneGraph(aSetID, angle, nHyperDomains) + else + nHyperDomains = getNumberOfHyperDomains(Quad,1) + + if ( sweepVersion == 0 ) then + call getZoneGraph(aSetID, angle, nHyperDomains) + maxPerPlane = max(maxPerPlane, HypPlanePtr% maxZones) + else + call getCornerGraph(aSetID, angle, nHyperDomains) + maxPerPlane = max(maxPerPlane, HypPlanePtr% maxCorners) + endif + + endif + endif enddo AngleLoop ASet% totalCycles = ASet% numCycles(1) - do angle=2,ASet% NumAngles + do angle=2,numAngles ASet% cycleOffSet(angle) = ASet% cycleOffSet(angle-1) + & ASet% numCycles(angle-1) ASet% totalCycles = ASet% totalCycles + & @@ -70,22 +96,26 @@ subroutine rtorder(aSetID) ! Construct cycle List - allocate( badCornerList(ASet% totalCycles) ) + ! Cray is having trouble with zero length arrays. If this array is + ! zero length it appears to cause code corruption in the cycle list constructor + ! below. + ! See https://rzlc.llnl.gov/gitlab/deterministic-transport/TRT/Teton/-/issues/429 + allocate( badCornerList(ASet% totalCycles +1 ) ) - offSet = 0 - maxZonesPerPlane = 1 + offSet = 0 + ASet% maxInterface = 1 - do angle=1,ASet% NumAngles + do angle=1,numAngles HypPlanePtr => ASet% HypPlanePtr(angle) do mCycle=1,ASet% numCycles(angle) badCornerList(offSet+mCycle) = HypPlanePtr% badCornerList(mCycle) enddo - offSet = offSet + ASet% numCycles(angle) + offSet = offSet + ASet% numCycles(angle) if ( .not. ASet% FinishingDirection(angle) ) then - maxZonesPerPlane = max(maxZonesPerPlane, HypPlanePtr%maxZones) + ASet% maxInterface = max( ASet% maxInterface , HypPlanePtr% interfaceLen ) endif enddo @@ -94,20 +124,23 @@ subroutine rtorder(aSetID) ! Allocate dynamic memory that can change size each cycle - if (aSetID <= nAngleSets) then + if ( .not. ASet% GTASet ) then offSet = (aSetID - 1)*nGroupSets do gSetID=1,nGroupSets setID = offSet + gSetID - call constructDynMemory(setID, maxZonesPerPlane) + call constructDynMemory(setID, maxPerPlane) enddo endif + + deallocate( badCornerList ) + return - end subroutine rtorder + end subroutine getDirectedGraph diff --git a/src/teton/rt/initFindExit.F90 b/src/teton/rt/initFindExit.F90 index 3fe9ae4..5b7a7f7 100644 --- a/src/teton/rt/initFindExit.F90 +++ b/src/teton/rt/initFindExit.F90 @@ -97,8 +97,6 @@ subroutine initFindExit(nAngleSets, nGTASets) enddo AngleSetLoop2 - - return end subroutine initFindExit diff --git a/src/teton/rt/rtmainsn.F90 b/src/teton/rt/rtmainsn.F90 index 889f969..fca161f 100644 --- a/src/teton/rt/rtmainsn.F90 +++ b/src/teton/rt/rtmainsn.F90 @@ -81,12 +81,6 @@ subroutine rtmainsn incidentFluxControl => getIterationControl(IterControls, "incidentFlux") nonlinearControl => getIterationControl(IterControls, "nonLinear") -! Check whether mesh is 3D for the corner sweep - - if ( sweepVersion == 1 .and. ndim /= 3 ) then - print *, "TETON WARNING: Corner sweep can only work with 3D meshes. Please set sweep version to 0, this run will use the zone sweep." - endif - ! Check that tolerances have been set identically on all ranks. epsilonCheck = getEpsilonPoint(temperatureControl) call MPIAllReduce(epsilonCheck, "max", MY_COMM_GROUP) diff --git a/src/teton/rt/setGTAOpacity.F90 b/src/teton/rt/setGTAOpacity.F90 index 7b3b012..262ac72 100644 --- a/src/teton/rt/setGTAOpacity.F90 +++ b/src/teton/rt/setGTAOpacity.F90 @@ -42,7 +42,7 @@ subroutine setGTAOpacityNEW(zone) real(adqt), parameter :: minRatio = 1.0e-10_adqt - tetonAssert(Size%ndim > 1, "setGTAOpacityNEW not implemented for 1D at this time.") + TETON_ASSERT(Size%ndim > 1, "setGTAOpacityNEW not implemented for 1D at this time.") ! Constants diff --git a/src/teton/rt/setGTAOpacity_OMPOL.F90 b/src/teton/rt/setGTAOpacity_OMPOL.F90 index 2cae486..183837e 100644 --- a/src/teton/rt/setGTAOpacity_OMPOL.F90 +++ b/src/teton/rt/setGTAOpacity_OMPOL.F90 @@ -48,61 +48,102 @@ subroutine setGTAOpacityNEW_GPU ! If the CUDA solver is used we need to map Eta/Chi if ( Size%useCUDASolver ) then - TOMP(target update to(GTA% Chi)) - TOMP(target update to(Mat% Eta)) + TOMP_UPDATE(target update to(GTA% Chi)) + TOMP_UPDATE(target update to(Mat% Eta)) endif - TOMP(target enter data map(to: tau, ngr)) + TOMP_MAP(target enter data map(to: tau, ngr)) + +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) +#else TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) TOMPC(shared(nZoneSets, ZSet, Geom, Mat, tau, ngr)) +#endif do zSetID=1,nZoneSets -!$omp parallel do collapse(2) default(none) schedule(dynamic) & -!$omp& shared(ZSet, Geom, Mat, tau, ngr, zSetID) +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector collapse(2) +#else + !$omp parallel do collapse(2) default(none) schedule(dynamic) & + !$omp& shared(ZSet, Geom, Mat, tau, ngr, zSetID) +#endif do zone=Geom% zone1(zSetID),Geom% zone2(zSetID) do g=1,ngr ZSet% B(g,zone) = one/(Mat% Siga(g,zone) + Mat% Sigs(g,zone) + tau) enddo enddo -!$omp end parallel do + +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif enddo +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else TOMP(end target teams distribute) +#endif +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) +#else TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) TOMPC(shared(nZoneSets, ZSet, Geom)) +#endif do zSetID=1,nZoneSets -!$omp parallel do default(none) schedule(dynamic) & -!$omp& shared(zSetID, ZSet, Geom) private(c) +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector +#else + !$omp parallel do default(none) schedule(dynamic) & + !$omp& shared(zSetID, ZSet, Geom) +#endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) ZSet% sumT(c) = zero ZSet% delta(c) = zero ZSet% netRate(c) = zero enddo -!$omp end parallel do + +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif enddo +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else TOMP(end target teams distribute) +#endif +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(zone, SigtInv, ChiSigt) +#else TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) TOMPC(shared(nZoneSets, ZSet, Geom, Mat, GTA, ngr) &) - TOMPC(private(zSetID, zone, SigtInv, ChiSigt) ) + TOMPC(private(zone, SigtInv, ChiSigt) ) +#endif do zSetID=1,nZoneSets -!$omp parallel do default(none) schedule(dynamic) & -!$omp& shared(ZSet, Geom, Mat, GTA, ngr, zSetID) & -!$omp& private(zone, SigtInv, ChiSigt) +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector & + !$acc& private(zone, SigtInv, ChiSigt) +#else + !$omp parallel do default(none) schedule(dynamic) & + !$omp& shared(ZSet, Geom, Mat, GTA, ngr, zSetID) & + !$omp& private(zone, SigtInv, ChiSigt) +#endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) do g=1,ngr @@ -116,20 +157,35 @@ subroutine setGTAOpacityNEW_GPU GTA% Chi(g,c) = ChiSigt enddo enddo -!$omp end parallel do + +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif enddo +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else TOMP(end target teams distribute) +#endif +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) +#else TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none) &) TOMPC(shared(nZoneSets, ZSet, GTA, Geom, ngr)) +#endif do zSetID=1,nZoneSets -!$omp parallel do default(none) schedule(dynamic) & -!$omp& shared(zSetID, ZSet, GTA, Geom, ngr) +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector +#else + !$omp parallel do default(none) schedule(dynamic) & + !$omp& shared(zSetID, ZSet, GTA, Geom, ngr) +#endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) if ( ZSet% sumT(c) > zero ) then @@ -138,20 +194,35 @@ subroutine setGTAOpacityNEW_GPU enddo endif enddo -!$omp end parallel do + +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif enddo +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else TOMP(end target teams distribute) +#endif +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) +#else TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none)&) TOMPC(shared(nZoneSets, ZSet, Geom, Mat, tau)) +#endif do zSetID=1,nZoneSets -!$omp parallel do default(none) schedule(dynamic) & -!$omp& shared(zSetID, ZSet, Geom, Mat, tau) +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector +#else + !$omp parallel do default(none) schedule(dynamic) & + !$omp& shared(zSetID, ZSet, Geom, Mat, tau) +#endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) if ( ZSet% sumT(c) > zero ) then @@ -162,20 +233,38 @@ subroutine setGTAOpacityNEW_GPU ZSet% comptonSe(c) = tau endif enddo -!$omp end parallel do + +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif enddo +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else TOMP(end target teams distribute) +#endif + +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) & + !$acc& private(greysigs, scatRatio) +#else TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none)&) TOMPC(shared(nZoneSets, ZSet, GTA, Geom)&) TOMPC(private(greysigs, scatRatio)) +#endif do zSetID=1,nZoneSets -!$omp parallel do default(none) schedule(dynamic) & -!$omp& shared(zSetID, ZSet, GTA, Geom) private(greysigs, scatRatio) +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector & + !$acc& private(greysigs, scatRatio) +#else + !$omp parallel do default(none) schedule(dynamic) & + !$omp& shared(zSetID, ZSet, GTA, Geom) private(greysigs, scatRatio) +#endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) greysigs = ZSet% dTCompton(c) - ZSet% comptonSe(c) @@ -189,35 +278,57 @@ subroutine setGTAOpacityNEW_GPU GTA%GreySigTotal(c) = ZSet% dTCompton(c) endif enddo -!$omp end parallel do + +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif enddo +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else TOMP(end target teams distribute) +#endif +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) vector_length(omp_device_team_thread_limit) +#else TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none)&) TOMPC(shared(nZoneSets, GTA, Geom)) +#endif do zSetID=1,nZoneSets +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector +#else !$omp parallel do default(none) schedule(dynamic) & !$omp& shared(zSetID, GTA, Geom) +#endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) GTA%GreySigScatVol(c) = GTA%GreySigScat(c)*Geom% Volume(c) GTA%GreySigtInv(c) = one/GTA%GreySigTotal(c) enddo -!$omp end parallel do + +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif enddo +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else TOMP(end target teams distribute) +#endif - TOMP(target exit data map(release: tau, ngr)) + TOMP_MAP(target exit data map(release: tau, ngr)) - TOMP(target update from(GTA% GreySigScatVol)) - TOMP(target update from(GTA% Chi)) + TOMP_UPDATE(target update from(GTA% GreySigScatVol)) + TOMP_UPDATE(target update from(GTA% Chi)) return diff --git a/src/teton/snac/CMakeLists.txt b/src/teton/snac/CMakeLists.txt index 837f041..13ef7ae 100644 --- a/src/teton/snac/CMakeLists.txt +++ b/src/teton/snac/CMakeLists.txt @@ -16,11 +16,17 @@ target_sources( teton PRIVATE SweepUCBxyz.F90 UpdateScalarIntensity.F90 assoc_legendre.F90 - cyclebreaker.F90 + cycleBreakerCorner.F90 + cycleBreakerZone.F90 factorial.F90 - findseeds.F90 fixZone.F90 + getCornerDependency.F90 + getCornerGraph.F90 getDownStreamData.F90 + getNewCorners.F90 + getNewZones.F90 + getZoneDependency.F90 + getZoneGraph.F90 harmonics.F90 harmonics_1d.F90 harmonics_2d.F90 @@ -28,10 +34,9 @@ target_sources( teton PRIVATE kronecker.F90 reflectArbitrary.F90 reflectAxis.F90 - sccsearch.F90 + sccSearchCorner.F90 + sccSearchZone.F90 snmoments.F90 - snneed.F90 - snnext.F90 snpnmset.F90 snreflect.F90 snynmset.F90 diff --git a/src/teton/snac/GTASweep.F90 b/src/teton/snac/GTASweep.F90 index a70334f..b9883eb 100644 --- a/src/teton/snac/GTASweep.F90 +++ b/src/teton/snac/GTASweep.F90 @@ -23,7 +23,7 @@ subroutine GTASweep(P, PsiB) ! Arguments - real(adqt), intent(inout) :: P(Size%ncornr) + real(adqt), intent(in) :: P(Size%ncornr) real(adqt), intent(inout) :: PsiB(Size%nbelem,Size%nangGTA) ! Local diff --git a/src/teton/snac/GTASweep_OMPOL.F90 b/src/teton/snac/GTASweep_OMPOL.F90 index f093dcf..4c036b1 100644 --- a/src/teton/snac/GTASweep_OMPOL.F90 +++ b/src/teton/snac/GTASweep_OMPOL.F90 @@ -25,7 +25,7 @@ subroutine GTASweep_GPU(P, PsiB, withSource) ! Arguments real(adqt), intent(in) :: P(Size%ncornr) - real(adqt), intent(inout) :: PsiB(Size%nbelem,Size%nangGTA) + real(adqt), intent(inout) :: PsiB(Size% nSurfElem,Size%nangGTA) logical (kind=1), intent(in) :: withSource @@ -57,7 +57,7 @@ subroutine GTASweep_GPU(P, PsiB, withSource) nGTASets = getNumberOfGTASets(Quad) nZoneSets = getNumberOfZoneSets(Quad) nCommSets = getNumberOfCommSets(Quad) - ndim = Size% ndim + ndim = Size% ndim SnSweep = .FALSE. wtiso = Size% wtiso @@ -65,7 +65,7 @@ subroutine GTASweep_GPU(P, PsiB, withSource) ! Initialize Communication - TOMP(target enter data map(to: wtiso)) + TOMP_MAP(target enter data map(to: wtiso)) do cSetID=nCommSets+1,nCommSets+nGTASets @@ -83,47 +83,79 @@ subroutine GTASweep_GPU(P, PsiB, withSource) if ( withSource ) then -TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none)&) -TOMPC(shared(nZoneSets, Geom, GTA, wtiso)) +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) & + !$acc& vector_length(omp_device_team_thread_limit) +#else + TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none)&) + TOMPC(shared(nZoneSets, Geom, GTA, wtiso, P)) +#endif ZoneSetLoop: do zSetID=1,nZoneSets -!$omp parallel do default(none) & -!$omp& shared(Geom, GTA, zSetID, wtiso) +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector +#else + !$omp parallel do default(none) & + !$omp& shared(Geom, GTA, zSetID, wtiso, P) +#endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) - GTA% PhiInc(c) = zero - GTA% Q(c) = wtiso*GTA%GreySigtInv(c)*GTA% GreySource(c) - GTA% TsaSource(c) = wtiso*Geom% Volume(c)*GTA% GreySource(c) + GTA% PhiInc(c) = GTA% Sscat(c) + GTA% Q(c) = wtiso*GTA%GreySigtInv(c)*( GTA% GreySource(c) + & + GTA% GreySigScat(c)*P(c) ) + GTA% TsaSource(c) = wtiso*Geom% Volume(c)*( GTA% GreySource(c) + & + GTA% GreySigScat(c)*P(c) ) enddo -!$omp end parallel do +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif enddo ZoneSetLoop -TOMP(end target teams distribute) +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif else -TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none)&) -TOMPC(shared(nZoneSets, Geom, GTA, wtiso, P)) +#ifdef TETON_ENABLE_OPENACC + !$acc parallel loop gang num_gangs(nZoneSets) & + !$acc& vector_length(omp_device_team_thread_limit) +#else + TOMP(target teams distribute num_teams(nZoneSets) thread_limit(omp_device_team_thread_limit) default(none)&) + TOMPC(shared(nZoneSets, Geom, GTA, wtiso, P)) +#endif ZoneSetLoop2: do zSetID=1,nZoneSets -!$omp parallel do default(none) & -!$omp& shared(Geom, GTA, zSetID, wtiso, P) +#ifdef TETON_ENABLE_OPENACC + !$acc loop vector +#else + !$omp parallel do default(none) & + !$omp& shared(Geom, GTA, zSetID, wtiso, P) +#endif do c=Geom% corner1(zSetID),Geom% corner2(zSetID) GTA% PhiInc(c) = zero GTA% Q(c) = wtiso*GTA%GreySigtInv(c)*GTA% GreySigScat(c)*P(c) GTA% TsaSource(c) = wtiso*Geom% Volume(c)*GTA% GreySigScat(c)*P(c) enddo -!$omp end parallel do +#ifndef TETON_ENABLE_OPENACC + !$omp end parallel do +#endif enddo ZoneSetLoop2 -TOMP(end target teams distribute) +#ifdef TETON_ENABLE_OPENACC + !$acc end parallel loop +#else + TOMP(end target teams distribute) +#endif endif -TOMP(target exit data map(release: wtiso)) +TOMP_MAP(target exit data map(release: wtiso)) ! Loop over angles, solving for each in turn: diff --git a/src/teton/snac/cycleBreakerCorner.F90 b/src/teton/snac/cycleBreakerCorner.F90 new file mode 100644 index 0000000..21e206e --- /dev/null +++ b/src/teton/snac/cycleBreakerCorner.F90 @@ -0,0 +1,161 @@ +!*********************************************************************** +! Last Update: 04/2024, PFN * +! * +! CycleBreakerCorner - This routine breaks cycles in the mesh by * +! selecting a corner that will use some old * +! (i.e. previous iterate) incident fluxes. * +! * +!*********************************************************************** + subroutine cycleBreakerCorner(ndone, meshCycles, nextCorner, & + addedCorners, need, cornerList, & + cycleList, nDSC, DSC, onCycleList) + + use kind_mod + use constant_mod + use Size_mod + + implicit none + +! Arguments + + integer, intent(in) :: ndone + integer, intent(inout) :: meshCycles + integer, intent(inout) :: nextCorner + integer, intent(inout) :: addedCorners + + integer, intent(inout) :: need(Size%ncornr) + integer, intent(inout) :: cornerList(Size%ncornr) + integer, intent(inout) :: cycleList(Size%ncornr) + integer, intent(inout) :: nDSC(Size%ncornr) + integer, intent(inout) :: DSC(2*Size%maxcf,Size%ncornr) + + logical (kind=1), intent(inout) :: onCycleList(Size%ncornr) + +! Local Variables + + integer :: i + integer :: ngraph + integer :: nleft + integer :: ncount + integer :: stackindex + integer :: ncornr + integer :: c + integer :: nBreaks + +! Dynamic + + integer, allocatable :: list(:) + integer, allocatable :: cBreakList(:) + integer, allocatable :: dfnum(:) + integer, allocatable :: lowlink(:) + integer, allocatable :: stack(:) + integer, allocatable :: tempList(:) + + logical (kind=1), allocatable :: new(:) + logical (kind=1), allocatable :: onstack(:) + +! Mesh Constants + + ncornr = Size% ncornr + +! Allocate arrays for the number of corners in the graph (= ncornr - ndone) + + ngraph = ncornr - ndone + + allocate( list(ngraph) ) + allocate( cBreakList(ngraph) ) + allocate( dfnum(ncornr) ) + allocate( lowlink(ncornr) ) + allocate( stack(ngraph) ) + allocate( tempList(ngraph) ) + + allocate( new(ncornr) ) + allocate( onstack(ncornr) ) + +! Initialize arrays and counters + + new(:) = .TRUE. + onstack(:) = .FALSE. + + nBreaks = 0 + ncount = 0 + stackindex = 0 + + stack(:) = 0 + +! Make a list of all remaining corners + + nleft = 0 + + do c=1,ncornr + if (need(c) == 0) then + new(c) = .FALSE. + else + nleft = nleft + 1 + list(nleft) = c + endif + enddo + + if (nleft /= ngraph) then + call f90fatal("Miscount of remaining corners in CycleBreakerCorner") + endif + +! Loop over the number of corners in the graph + + do i=1,ngraph + + c = list(i) + + if ( new(c) ) then + + call sccSearchCorner(c, ngraph, ncount, stackindex, & + nBreaks, meshCycles, dfnum, lowlink, & + need, stack, new, onstack, tempList, & + cycleList, cBreakList, nDSC, DSC, & + onCycleList) + + endif + + enddo + + + if (nBreaks == 0) then + + call f90fatal("CycleBreakerCorner: detection failed, no dependencies broken") + + else + + addedCorners = 0 + do i=1,nBreaks + c = cBreakList(i) + + if (need(c) == 0) then + nextCorner = nextCorner + 1 + addedCorners = addedCorners + 1 + cornerList(nextCorner) = c + elseif (need(c) < 0) then + call f90fatal("CycleBreakerCorner, need < 0") + endif + enddo + + if (addedCorners == 0) then + call f90fatal("Cycles found, but not broken in CycleBreakerCorner") + endif + + endif + +! Release memory + + deallocate( list ) + deallocate( cBreakList ) + deallocate( dfnum ) + deallocate( lowlink ) + deallocate( stack ) + deallocate( tempList ) + deallocate( new ) + deallocate( onstack ) + + + return + end subroutine cycleBreakerCorner + diff --git a/src/teton/snac/cyclebreaker.F90 b/src/teton/snac/cycleBreakerZone.F90 similarity index 69% rename from src/teton/snac/cyclebreaker.F90 rename to src/teton/snac/cycleBreakerZone.F90 index e665299..7b38ada 100644 --- a/src/teton/snac/cyclebreaker.F90 +++ b/src/teton/snac/cycleBreakerZone.F90 @@ -1,18 +1,14 @@ !*********************************************************************** -! Version 1: 07/01, PFN * +! Last Update: 04/2024, PFN * ! * -! CYCLEBREAKER - This routine breaks cycles in the mesh by selecting * -! a corner that will use some old (i.e. previous * -! iterate) incident fluxes. * -! * -! Input: * -! * -! Output: * +! CycleBreakerZone - This routine breaks cycles in the mesh by * +! selecting a zone that will use some old * +! (i.e. previous iterate) incident fluxes. * ! * !*********************************************************************** - subroutine cyclebreaker(ndoneZ, MESHCYCLES, nextZone, addedZones, & - needZ, listZone, cycleList, & - exitFace, onCycleList) + subroutine cycleBreakerZone(ndoneZ, MESHCYCLES, nextZone, & + addedZones,needZ, listZone, cycleList, & + exitFace, onCycleList) use kind_mod use constant_mod @@ -96,7 +92,7 @@ subroutine cyclebreaker(ndoneZ, MESHCYCLES, nextZone, addedZones, & enddo if (nleft /= ngraph) then - call f90fatal("Miscount of remaining zones in CYCLEBREAKER") + call f90fatal("Miscount of remaining zones in CycleBreakerZone") endif ! Loop over the number of zones in the graph @@ -107,11 +103,11 @@ subroutine cyclebreaker(ndoneZ, MESHCYCLES, nextZone, addedZones, & if ( new(zone) ) then - call sccsearch(zone, ngraph, ncount, stackindex, & - nBreaks, meshCycles, dfnum, lowlink, & - needZ, stack, new, onstack, exitFace, & - tempList, cycleList, zoneBreakList, & - onCycleList) + call sccSearchZone(zone, ngraph, ncount, stackindex, & + nBreaks, meshCycles, dfnum, lowlink, & + needZ, stack, new, onstack, exitFace, & + tempList, cycleList, zoneBreakList, & + onCycleList) endif @@ -120,7 +116,7 @@ subroutine cyclebreaker(ndoneZ, MESHCYCLES, nextZone, addedZones, & if (nBreaks == 0) then - call f90fatal("CYCLEBREAKER: detection failed, no dependencies broken") + call f90fatal("CycleBreakerZone: detection failed, no dependencies broken") else @@ -134,12 +130,12 @@ subroutine cyclebreaker(ndoneZ, MESHCYCLES, nextZone, addedZones, & addedZones = addedZones + 1 listZone(nextZone) = zone elseif (needZ(zone) < 0) then - call f90fatal("CycleBreaker, needZ < 0") + call f90fatal("CycleBreakerZone, needZ < 0") endif enddo if (addedZones == 0) then - call f90fatal("Cycles found, but not broken") + call f90fatal("CycleBreakerZone found, but not broken") endif endif @@ -157,5 +153,5 @@ subroutine cyclebreaker(ndoneZ, MESHCYCLES, nextZone, addedZones, & return - end subroutine cyclebreaker + end subroutine cycleBreakerZone diff --git a/src/teton/snac/getCornerDependency.F90 b/src/teton/snac/getCornerDependency.F90 new file mode 100644 index 0000000..2627f06 --- /dev/null +++ b/src/teton/snac/getCornerDependency.F90 @@ -0,0 +1,155 @@ +!*********************************************************************** +! Last Update: 04/2019, PFN * +! * +! getCornerDependency - This routine builds the NEED array which * +! indicates the number of incoming fluxes * +! required to compute the outgoing flux for * +! a particular direction. * +! * +!*********************************************************************** + + subroutine getCornerDependency(omega, need, nDSC, DSC) + + use kind_mod + use constant_mod + use Size_mod + use Geometry_mod + + implicit none + +! Arguments + + integer, intent(inout) :: need(Size%ncornr) + integer, intent(inout) :: nDSC(Size%ncornr) + integer, intent(inout) :: DSC(2*Size%maxcf,Size%ncornr) + + real(adqt), intent(in) :: omega(Size%ndim) + +! Local Variables + + integer :: c + integer :: c0 + integer :: cez + integer :: cfp + integer :: cface + integer :: nCFaces + integer :: nCorner + integer :: zone + + real(adqt) :: aez + real(adqt) :: afp + +! For incoming corner-faces we increment the need array; for outgoing +! corner-faces we put the downstream corner number into an index list. + + need(:) = 0 + nDSC(:) = 0 + nCorner = Size% ncornr + + CornerLoopEZ: do c=1,nCorner + + zone = Geom% CToZone(c) + c0 = Geom% cOffSet(zone) + + if (Size% ndim == 2) then + nCFaces = 2 + else + nCFaces = Geom% nCFacesArray(c) + endif + + CornerFaceLoopEZ: do cface=1,nCFaces + + cez = c0 + Geom% cEZ(cface,c) + + if (c < cez) then + + aez = DOT_PRODUCT( omega(:),Geom% A_ez(:,cface,c) ) + + if ( aez < zero ) then + need(c) = need(c) + 1 + nDSC(cez) = nDSC(cez) + 1 + DSC(nDSC(cez),cez) = c + elseif ( aez > zero ) then + need(cez) = need(cez) + 1 + nDSC(c) = nDSC(c) + 1 + DSC(nDSC(c),c) = cez + endif + + endif + + enddo CornerFaceLoopEZ + + enddo CornerLoopEZ + +! Zone Faces + + if (Size% ndim == 2) then + + CornerLoop2D: do c=1,nCorner + + CornerFaceLoop2D: do cface=1,2 + + cfp = Geom% cFP(cface,c) + +! If an fp-face is on a boundary (cfp > nCorner) then do not +! increment "need" + + if (c < cfp .and. cfp <= nCorner) then + afp = DOT_PRODUCT( omega(:),Geom% A_fp(:,cface,c) ) + + if (afp < zero) then + need(c) = need(c) + 1 + nDSC(cfp) = nDSC(cfp) + 1 + DSC(nDSC(cfp),cfp) = c + + elseif (afp > zero) then + need(cfp) = need(cfp) + 1 + nDSC(c) = nDSC(c) + 1 + DSC(nDSC(c),c) = cfp + endif + + endif + + enddo CornerFaceLoop2D + + enddo CornerLoop2D + + elseif (Size% ndim == 3) then + + CornerLoop3D: do c=1,nCorner + + nCFaces = Geom% nCFacesArray(c) + + CornerFaceLoop3D: do cface=1,nCFaces + + cfp = Geom% cFP(cface,c) + +! If an fp-face is on a boundary (cfp > nCorner) then do not +! increment "need" + + if (c < cfp .and. cfp <= nCorner) then + afp = DOT_PRODUCT( omega(:),Geom% A_fp(:,cface,c) ) + + if (afp < zero) then + need(c) = need(c) + 1 + nDSC(cfp) = nDSC(cfp) + 1 + DSC(nDSC(cfp),cfp) = c + + elseif (afp > zero) then + need(cfp) = need(cfp) + 1 + nDSC(c) = nDSC(c) + 1 + DSC(nDSC(c),c) = cfp + endif + + endif + + enddo CornerFaceLoop3D + + enddo CornerLoop3D + + endif + + + return + end subroutine getCornerDependency + diff --git a/src/teton/snac/getCornerGraph.F90 b/src/teton/snac/getCornerGraph.F90 new file mode 100644 index 0000000..a2578c8 --- /dev/null +++ b/src/teton/snac/getCornerGraph.F90 @@ -0,0 +1,215 @@ +!*********************************************************************** +! Last Update: 04/2019, PFN * +! * +! getCornerGraph - This routine builds the sweep ordering array * +! (a directed graph) of CORNERS for a single * +! direction. * +! * +!*********************************************************************** + subroutine getCornerGraph(aSetID, angle, nHyperDomains) + + use kind_mod + use Size_mod + use Geometry_mod + use QuadratureList_mod + use AngleSet_mod + + implicit none + +! Arguments + + integer, intent(in) :: aSetID + integer, intent(in) :: angle + integer, intent(in) :: nHyperDomains + +! Local Variables + + type(AngleSet), pointer :: ASet + + integer :: i + integer :: c + integer :: cExit + integer :: newCorners + integer :: addedCorners + integer :: lastCorner + integer :: nextCorner + integer :: meshCycles + integer :: ncornr + integer :: nzones + + integer :: cID + integer :: ndone + integer :: nCorner + integer :: nHyperPlanes + + real(adqt) :: omega(Size%ndim) + +! Dynamic + + integer, allocatable :: need(:) + integer, allocatable :: cornerList(:) + integer, allocatable :: cornersInPlane(:) + integer, allocatable :: cycleList(:) + integer, allocatable :: CToHypPlane(:) + integer, allocatable :: nDSC(:) + integer, allocatable :: DSC(:,:) + + logical (kind=1), allocatable :: done(:) + logical (kind=1), allocatable :: onCycleList(:) + +! Constants + + ASet => getAngleSetData(Quad, aSetID) + + nzones = Size% nzones + ncornr = Size% ncornr + omega(:) = ASet% Omega(:,angle) + +! Allocate arrays + + allocate( need(ncornr) ) + allocate( cornerList(ncornr) ) + allocate( cornersInPlane(nzones) ) + allocate( cycleList(ncornr) ) + allocate( CToHypPlane(ncornr) ) + allocate( nDSC(ncornr) ) + allocate( DSC(2*Size%maxcf,ncornr) ) + allocate( done(ncornr) ) + allocate( onCycleList(ncornr) ) + + done(:) = .FALSE. + onCycleList(:) = .FALSE. + meshCycles = 0 + +! Build NEED array by computing Outward_Normal dot Omega(m) + + call getCornerDependency(omega, need, nDSC, DSC) + +! Create a list of corners to start the sweep + + call getNewCorners(newCorners, meshCycles, need, & + cornerList, cycleList, nDSC, DSC) + +! Create the "next" array. + + ndone = 0 + nextCorner = 0 + lastCorner = 0 + nHyperPlanes = 0 + + OuterIteration: do + +! Advance to a new hyper-plane + + nHyperPlanes = nHyperPlanes + 1 + cornersInPlane(nHyperPlanes) = newCorners + nextCorner = lastCorner + newCorners + addedCorners = 0 + +! Loop over all corners in the current list + + CornerLoop: do cID=1,newCorners + + c = cornerList(lastCorner+cID) + ndone = ndone + 1 + done(c) = .TRUE. + +! Loop over the down-stream corners for the corner just added +! to the next list, decrementing the need array for these +! neighboring corners + + do i=1,nDSC(c) + cExit = DSC(i,c) + + if ( .not. done(cExit) ) then + need(cExit) = need(cExit) - 1 + + if ( need(cExit) == 0 ) then + nextCorner = nextCorner + 1 + addedCorners = addedCorners + 1 + cornerList(nextCorner) = cExit + elseif ( need(cExit) < 0 ) then + write(6,100) Size% myRankInGroup,angle,cExit,c + call f90fatal("need < 0 in getCornerGraph") + endif + + endif + enddo + + ASet% nextC(ndone,angle) = c + CToHypPlane(c) = nHyperPlanes + + enddo CornerLoop + + lastCorner = lastCorner + newCorners + + if (lastCorner == ncornr) then + + exit OuterIteration + + else + + if (addedCorners > 0) then + + newCorners = addedCorners + + elseif (addedCorners == 0) then + +! Break a cycle to add a corner to the list + + call cycleBreakerCorner(ndone, meshCycles, nextCorner, & + addedCorners, need, cornerList, & + cycleList, nDSC, DSC, onCycleList) + + newCorners = addedCorners + + endif + + cycle OuterIteration + + endif + + enddo OuterIteration + +! End of Outer Loop, save the number of hyperplanes + + if (meshCycles > Size% ncornr) then + call f90fatal("MeshCycles exceeds the number of corners in getCornerGraph") + endif + + ASet% numCycles(angle) = meshCycles + + call constructHyperPlane( ASet, angle, nHyperPlanes, meshCycles, & + nHyperDomains, cornersInPlane(1:nHyperPlanes), & + CToHypPlane, cycleList(1:meshCycles) ) + +! Set the number of hyperplanes in the set module for this angle + + ASet% nHyperPlanes(angle) = nHyperPlanes + +! Final error check + + if (ndone /= ncornr) then + call f90fatal("Wrong number of corners in getCornerGraph") + endif + + 100 format("On Process ",i7," angle = ",i4," cExit = ",i9, & + " has already been done and is down stream of ", & + " c = ",i9," in getCornerGraph") + + +! Release memory + + deallocate( need ) + deallocate( cornerList ) + deallocate( cornersInPlane ) + deallocate( cycleList ) + deallocate( CToHypPlane ) + deallocate( nDSC ) + deallocate( DSC ) + deallocate( done ) + + + return + end subroutine getCornerGraph + diff --git a/src/teton/snac/getNewCorners.F90 b/src/teton/snac/getNewCorners.F90 new file mode 100644 index 0000000..23eed0a --- /dev/null +++ b/src/teton/snac/getNewCorners.F90 @@ -0,0 +1,127 @@ +!*********************************************************************** +! Last Update: 04/2019, PFN * +! * +! getNewCorners - This routine creates a list of starting corners * +! These corners are on the boundary of the grid * +! and require no incident fluxes except from * +! boundary conditions. There may be situations * +! where no starting corners can be found; in this * +! situation, we are forced to use some old * +! information to get started. * +! * +!*********************************************************************** + subroutine getNewCorners(newCorners, meshCycles, need, & + cornerList, cycleList, nDSC, DSC) + + use kind_mod + use constant_mod + use Size_mod + use Geometry_mod + use BoundaryList_mod + use Boundary_mod + + implicit none + +! Arguments + + integer, intent(inout) :: newCorners + integer, intent(inout) :: meshcycles + + integer, intent(inout) :: need(Size%ncornr) + integer, intent(inout) :: cornerList(Size%ncornr) + integer, intent(inout) :: cycleList(Size%ncornr) + integer, intent(inout) :: nDSC(Size%ncornr) + integer, intent(inout) :: DSC(2*Size%maxcf,Size%ncornr) + +! Local Variables + + type(Boundary), pointer :: BdyT + + integer :: i + integer :: c + integer :: c0 + integer :: cez + integer :: nCorner + integer :: cface + integer :: nCFaces + integer :: n + integer :: nBoundary + integer :: b + integer :: NumBdyElem + integer :: zone + +! Create a list of corners to begin the sweep + + nBoundary = getNumberOfBoundaries(RadBoundary) + newCorners = 0 + + CornerLoop: do c=1,Size% ncornr + if (need(c) == 0) then + newCorners = newCorners + 1 + cornerList(newCorners) = c + endif + enddo CornerLoop + + + if (newCorners == 0) then + +! If no corners were found, find a corner on the boundary that requires +! only one incident flux. Also, add all corners on "ez" faces to the +! cycle list to be conservative. + + BoundaryLoop: do n=1,nBoundary + + BdyT => getBoundary(RadBoundary, n) + NumBdyElem = getNumberOfBdyElements(BdyT) + + do b=1,NumBdyElem + c = BdyT% BdyToC(b) + zone = Geom% CToZone(c) + c0 = Geom% cOffSet(zone) + + if (need(c) == 1) then + newCorners = 1 + cornerList(newCorners) = c + need(c) = 0 + + if (Size% ndim == 3) then + nCFaces = Geom% nCFacesArray(c) + elseif (Size% ndim == 2) then + nCFaces = 2 + endif + + do cface=1,nCFaces + cez = c0 + Geom% cEZ(cface,c) + +! If corner "c" is downstream of "cez", add cez to the cycle list +! and break the connection + + do i=1,nDSC(cez) + if ( DSC(i,cez) == c ) then + meshCycles = meshCycles + 1 + cycleList(meshCycles) = cez + DSC(i,cez) = Size% ncornr + 1 + endif + enddo + enddo + + exit BoundaryLoop + + endif + enddo + + enddo BoundaryLoop + + endif + +! Error Check + + if (newCorners == 0) then + call f90fatal("No starting corners found in getNewCorners!") + endif + + + + return + end subroutine getNewCorners + diff --git a/src/teton/snac/findseeds.F90 b/src/teton/snac/getNewZones.F90 similarity index 62% rename from src/teton/snac/findseeds.F90 rename to src/teton/snac/getNewZones.F90 index 9bfc322..19ef4dc 100644 --- a/src/teton/snac/findseeds.F90 +++ b/src/teton/snac/getNewZones.F90 @@ -1,22 +1,17 @@ !*********************************************************************** -! Version 1: 04/02, PFN * +! Last Update: 05/2023, PFN * ! * -! FINDSEEDS - This routine creates a list of starting points or * -! "seeds" for the grid sweep. The seeds are on the * -! boundary of the grid and require no incident * -! fluxes except from boundary conditions. There may * -! be situations where no seeds can be found; this * -! will occur if there is a mesh cycle right at the * -! boundary. In this situation, we are forced to use * -! some old information to get started. * -! * -! Input: * -! * -! Output: * +! getNewZones - This routine creates a list of starting zones. * +! These zones are on the boundary of the grid * +! and require no incident fluxes except from * +! boundary conditions. There may be situations * +! where no starting zones can be found; in this * +! situation, we are forced to use some old * +! information to get started. * ! * !*********************************************************************** - subroutine findseeds(NSEED, MESHCYCLES, needZ, listZone, & - cycleList, exitFace, onCycleList) + subroutine getNewZones(newZones, meshCycles, needZ, listZone, & + cycleList, exitFace, onCycleList) use kind_mod use constant_mod @@ -27,7 +22,7 @@ subroutine findseeds(NSEED, MESHCYCLES, needZ, listZone, & ! Arguments - integer, intent(inout) :: nseed + integer, intent(inout) :: newZones integer, intent(inout) :: meshcycles integer, intent(inout) :: needZ(Size%nzones) @@ -53,19 +48,19 @@ subroutine findseeds(NSEED, MESHCYCLES, needZ, listZone, & nzones = Size% nzones -! Create a list of zone "seeds" +! Create a list of zones to begin the sweep - nseed = 0 + newZones = 0 ZoneLoop: do zone=1,nzones if (needZ(zone) == 0) then - nseed = nseed + 1 - listZone(nseed) = zone + newZones = newZones + 1 + listZone(newZones) = zone endif enddo ZoneLoop - if (nseed == 0) then + if (newZones == 0) then ! If no seeds were found, find a zone on the boundary that requires ! the fewest incident fluxes @@ -82,7 +77,7 @@ subroutine findseeds(NSEED, MESHCYCLES, needZ, listZone, & endif enddo BoundaryZoneLoop - nseed = 1 + newZones = 1 listZone(1) = zoneID needZ(zoneID) = 0 nFaces = Geom% zoneFaces(zoneID) @@ -109,12 +104,12 @@ subroutine findseeds(NSEED, MESHCYCLES, needZ, listZone, & ! Error Check - if (nseed == 0) then - call f90fatal("No seeds found in FINDSEEDS!") + if (newZones == 0) then + call f90fatal("No starting zones found in getNewZones!") endif return - end subroutine findseeds + end subroutine getNewZones diff --git a/src/teton/snac/snneed.F90 b/src/teton/snac/getZoneDependency.F90 similarity index 84% rename from src/teton/snac/snneed.F90 rename to src/teton/snac/getZoneDependency.F90 index 44c983f..97559c7 100644 --- a/src/teton/snac/snneed.F90 +++ b/src/teton/snac/getZoneDependency.F90 @@ -1,21 +1,15 @@ !*********************************************************************** -! Version 1: 09/96, PFN * +! Last Update: 05/2023, PFN * ! * -! SNNEED - This routine builds the NEED array which indicates the * -! number of incoming fluxes required to compute the * -! outgoing flux for a particular direction (this is the * -! number of incoming sides or faces the corner has for * -! this direction). This routine is a decendant of SNRZANEE * -! by MLA. * -! * -! Input: * -! * -! Output: * +! getZoneDependency - This routine builds the NEED array which * +! indicates the number of incoming fluxes * +! required to compute the outgoing flux for * +! a particular direction. * ! * !*********************************************************************** - subroutine snneed(MESHCYCLES, omega, NEEDZ, cycleList, & - exitFace, onCycleList) + subroutine getZoneDependency(MESHCYCLES, omega, NEEDZ, cycleList, & + exitFace, onCycleList) use kind_mod use constant_mod @@ -194,5 +188,5 @@ subroutine snneed(MESHCYCLES, omega, NEEDZ, cycleList, & return - end subroutine snneed + end subroutine getZoneDependency diff --git a/src/teton/snac/snnext.F90 b/src/teton/snac/getZoneGraph.F90 similarity index 79% rename from src/teton/snac/snnext.F90 rename to src/teton/snac/getZoneGraph.F90 index 24f6c50..fac4f6b 100644 --- a/src/teton/snac/snnext.F90 +++ b/src/teton/snac/getZoneGraph.F90 @@ -1,15 +1,11 @@ !*********************************************************************** -! Version 1: 09/96, PFN * +! Last Update: 04/2024, PFN * ! * -! SNNEXT - This routine builds the sweep ordering array NEXT for a * -! single direction. * -! * -! Input: * -! * -! Output: * +! getZoneGraph - This routine builds the sweep ordering array * +! (a directed graph) of ZONES for a single direction. * ! * !*********************************************************************** - subroutine snnext(aSetID, angle, nDomains) + subroutine getZoneGraph(aSetID, angle, nHyperDomains) use kind_mod use Size_mod @@ -23,12 +19,14 @@ subroutine snnext(aSetID, angle, nDomains) integer, intent(in) :: aSetID integer, intent(in) :: angle - integer, intent(in) :: nDomains + integer, intent(in) :: nHyperDomains ! Local Variables type(AngleSet), pointer :: ASet + integer :: c + integer :: c0 integer :: Zexit integer :: newZones integer :: addedZones @@ -43,7 +41,6 @@ subroutine snnext(aSetID, angle, nDomains) integer :: zone integer :: zID integer :: ndoneZ - integer :: nCorner integer :: nHyperPlanes real(adqt) :: omega(Size%ndim) @@ -54,6 +51,7 @@ subroutine snnext(aSetID, angle, nDomains) integer, allocatable :: listZone(:) integer, allocatable :: zonesInPlane(:) integer, allocatable :: cycleList(:) + integer, allocatable :: CToHypPlane(:) logical (kind=1), allocatable :: badZone(:) logical (kind=1), allocatable :: onCycleList(:) @@ -74,6 +72,7 @@ subroutine snnext(aSetID, angle, nDomains) allocate( listZone(nzones) ) allocate( zonesInPlane(nzones) ) allocate( cycleList(ncornr) ) + allocate( CToHypPlane(ncornr) ) allocate( badZone(nzones) ) allocate( onCycleList(nzones) ) allocate( doneZ(nzones) ) @@ -86,12 +85,12 @@ subroutine snnext(aSetID, angle, nDomains) ! Build NEED array by computing Outward_Normal dot Omega(m) - call snneed(meshCycles, omega, NEEDZ, cycleList, exitFace, onCycleList) + call getZoneDependency(meshCycles, omega, NEEDZ, cycleList, exitFace, onCycleList) -! Create a list of zones to start the sweep ("seeds") +! Create a list of zones to start the sweep - call findseeds(newZones, meshCycles, needZ, listZone, & - cycleList, exitFace, onCycleList) + call getNewZones(newZones, meshCycles, needZ, listZone, & + cycleList, exitFace, onCycleList) ! Check for zones the have circular dependencies @@ -120,7 +119,6 @@ subroutine snnext(aSetID, angle, nDomains) ZoneLoop: do zID=1,newZones zone = listZone(lastZone+zID) - nCorner = Geom% numCorner(zone) nFaces = Geom% zoneFaces(zone) ndoneZ = ndoneZ + 1 @@ -163,6 +161,14 @@ subroutine snnext(aSetID, angle, nDomains) ASet% nextZ(ndoneZ,angle) = zone endif +! For the zone just added, create a map from corner to hyperplane + + c0 = Geom% cOffSet(zone) + + do c=1,Geom% numCorner(zone) + CToHypPlane(c0+c) = nHyperPlanes + enddo + enddo ZoneLoop lastZone = lastZone + newZones @@ -181,8 +187,9 @@ subroutine snnext(aSetID, angle, nDomains) ! Break a cycle to add a zone to the list - call cyclebreaker(ndoneZ, meshCycles, nextZone, addedZones, & - needZ, listZone, cycleList, exitFace, onCycleList) + call cycleBreakerZone(ndoneZ, meshCycles, nextZone, addedZones, & + needZ, listZone, cycleList, exitFace, & + onCycleList) newZones = addedZones @@ -203,8 +210,8 @@ subroutine snnext(aSetID, angle, nDomains) ASet% numCycles(angle) = meshCycles call constructHyperPlane( ASet, angle, nHyperPlanes, meshCycles, & - nDomains, zonesInPlane(1:nHyperPlanes), & - cycleList(1:meshCycles) ) + nHyperDomains, zonesInPlane(1:nHyperPlanes), & + CToHypPlane, cycleList(1:meshCycles) ) ! Set the number of hyperplanes in the set module for this angle @@ -224,6 +231,7 @@ subroutine snnext(aSetID, angle, nDomains) deallocate( listZone ) deallocate( zonesInPlane ) deallocate( cycleList ) + deallocate( CToHypPlane ) deallocate( badZone ) deallocate( onCycleList ) deallocate( doneZ ) @@ -231,5 +239,5 @@ subroutine snnext(aSetID, angle, nDomains) return - end subroutine snnext + end subroutine getZoneGraph diff --git a/src/teton/snac/sccSearchCorner.F90 b/src/teton/snac/sccSearchCorner.F90 new file mode 100644 index 0000000..36eb6ca --- /dev/null +++ b/src/teton/snac/sccSearchCorner.F90 @@ -0,0 +1,206 @@ +!*********************************************************************** +! Last Update: 04/2024, PFN * +! * +! sccSearchCorner- This recursive routine search the dependency graph * +! for strongly-connected components (SCC). * +! * +!*********************************************************************** + recursive subroutine sccSearchCorner(c, ngraph, ncount, stackindex, & + nBreaks, meshCycles, dfnum, lowlink, & + need, stack, new, onstack, tempList, & + cycleList, cBreakList, nDSC, DSC, & + onCycleList) + + use kind_mod + use Size_mod + use Geometry_mod + + implicit none + +! Arguments + + integer, intent(in) :: c + integer, intent(in) :: ngraph + integer, intent(inout) :: ncount + integer, intent(inout) :: stackindex + integer, intent(inout) :: nBreaks + integer, intent(inout) :: meshCycles + + integer, intent(inout) :: dfnum(Size%ncornr) + integer, intent(inout) :: lowlink(Size%ncornr) + integer, intent(inout) :: need(Size%ncornr) + integer, intent(inout) :: stack(ngraph) + integer, intent(inout) :: cBreakList(ngraph) + integer, intent(inout) :: tempList(ngraph) + integer, intent(inout) :: cycleList(Size%ncornr) + integer, intent(inout) :: nDSC(Size%ncornr) + integer, intent(inout) :: DSC(2*Size%maxcf,Size%ncornr) + + logical (kind=1), intent(inout) :: new(Size%ncornr) + logical (kind=1), intent(inout) :: onstack(Size%ncornr) + logical (kind=1), intent(inout) :: onCycleList(Size%ncornr) + +! Local Variables + + integer :: i + integer :: cExit + integer :: cyclesize + integer :: cBreak + integer :: lowlinkC + integer :: cface + integer :: nCFaces + integer :: c0 + integer :: zone + +! Start the search procedure + + ncount = ncount + 1 + dfnum(c) = ncount + lowlink(c) = ncount + new(c) = .FALSE. + +! Put current "corner" on the stack + + stackindex = stackindex + 1 + stack(stackindex) = c + onstack(c) = .TRUE. + +! Loop over all downstream corners that have not been completed + + DownStreamC: do i=1,nDSC(c) + + cExit = DSC(i,c) + + if ( new(cExit) ) then + + call sccSearchCorner(cExit, ngraph, ncount, stackindex, & + nBreaks, meshCycles, dfnum, lowlink, & + need, stack, new, onstack, tempList, & + cycleList, cBreakList, nDSC, DSC, & + onCycleList) + + if (lowlink(cExit) < lowlink(c)) then + lowlink(c) = lowlink(cExit) + endif + + else + + if (dfnum(cExit) < dfnum(c) .and. & + onstack(cExit) .and. & + lowlink(cExit) < lowlink(c)) then + + lowlink(c) = lowlink(cExit) + endif + + endif + + enddo DownStreamC + +! Cycle Check + + CheckCycle: if (lowlink(c) == dfnum(c)) then + + cExit = stack(stackindex) + stackindex = stackindex - 1 + onstack(cExit) = .FALSE. + + DetectCycle: if (cExit /= c) then + + cyclesize = 0 + + do while (cExit /= c) + cyclesize = cyclesize + 1 + tempList(cyclesize) = cExit + + cExit = stack(stackindex) + stackindex = stackindex - 1 + enddo + + cyclesize = cyclesize + 1 + tempList(cyclesize) = cExit + onstack(tempList(1)) = .TRUE. + +!*********************************************************************** +! Now break all connections of corners on the stack to the lowest * +! link. * +!*********************************************************************** + + lowlinkC = tempList(cyclesize) + zone = Geom% CToZone(lowlinkC) + c0 = Geom% cOffSet(zone) + +! Loop over all neighbors for this corner and find the ones on the stack + + if (Size% ndim == 2) then + nCFaces = 2 + else + nCFaces = Geom% nCFacesArray(lowlinkC) + endif + + CornerFaceLoop: do cface=1,nCFaces + +! Corners in the same zone + + cBreak = Geom% cEZ(cface,lowlinkC) + c0 + + if ( onstack(cBreak) .and. (.not. onCycleList(cBreak))) then + + do i=1,nDSC(cBreak) + cExit = DSC(i,cBreak) + + if (cExit == lowlinkC) then + DSC(i,cBreak) = Size% ncornr + 1 + meshCycles = meshCycles + 1 + cycleList(meshCycles) = cBreak + onCycleList(cBreak) = .TRUE. + need(lowlinkC) = need(lowlinkC) - 1 + endif + enddo + + endif + +! Corners in neighboring zones + + cBreak = Geom% cFP(cface,lowlinkC) + + if ( cBreak <= Size% ncornr ) then + + if ( onstack(cBreak) .and. (.not. onCycleList(cBreak))) then + + do i=1,nDSC(cBreak) + cExit = DSC(i,cBreak) + + if (cExit == lowlinkC) then + DSC(i,cBreak) = Size% ncornr + 1 + meshCycles = meshCycles + 1 + cycleList(meshCycles) = cBreak + onCycleList(cBreak) = .TRUE. + need(lowlinkC) = need(lowlinkC) - 1 + endif + enddo + + endif + endif + + enddo CornerFaceLoop + + if (need(lowlinkC) == 0) then + nBreaks = nBreaks + 1 + cBreakList(nBreaks) = lowlinkC + endif + +! Reset the stack + + do i=1,cyclesize + onstack( tempList(i) ) = .FALSE. + enddo + + endif DetectCycle + + endif CheckCycle + + + + return + end subroutine sccSearchCorner + diff --git a/src/teton/snac/sccsearch.F90 b/src/teton/snac/sccSearchZone.F90 similarity index 84% rename from src/teton/snac/sccsearch.F90 rename to src/teton/snac/sccSearchZone.F90 index 4c71580..079f1ec 100644 --- a/src/teton/snac/sccsearch.F90 +++ b/src/teton/snac/sccSearchZone.F90 @@ -1,15 +1,16 @@ !*********************************************************************** -! Last Update: 11/2017, PFN * +! Last Update: 04/2024, PFN * ! * -! SCCSEARCH - This recursive routine search the dependency graph * +! sccSearchZone- This recursive routine search the dependency graph * ! for strongly-connected components (SCC). * ! * !*********************************************************************** - recursive subroutine sccsearch(zone, ngraph, ncount, stackindex, & - nBreaks, meshCycles, dfnum, lowlink, & - needZ, stack, new, onstack, exitFace, & - tempList, cycleList, zoneBreakList, & - onCycleList) + recursive subroutine sccSearchZone(zone, ngraph, ncount, stackindex, & + nBreaks, meshCycles, dfnum, & + lowlink, needZ, stack, new, & + onstack, exitFace, tempList, & + cycleList, zoneBreakList, & + onCycleList) use kind_mod use Size_mod @@ -78,11 +79,11 @@ recursive subroutine sccsearch(zone, ngraph, ncount, stackindex, & if ( new(zone2) ) then - call sccsearch(zone2, ngraph, ncount, stackindex, & - nBreaks, meshCycles, dfnum, lowlink, & - needZ, stack, new, onstack, exitFace, & - tempList, cycleList, zoneBreakList, & - onCycleList) + call sccSearchZone(zone2, ngraph, ncount, stackindex, & + nBreaks, meshCycles, dfnum, lowlink, & + needZ, stack, new, onstack, exitFace, & + tempList, cycleList, zoneBreakList, & + onCycleList) if (lowlink(zone2) < lowlink(zone)) then lowlink(zone) = lowlink(zone2) @@ -184,5 +185,5 @@ recursive subroutine sccsearch(zone, ngraph, ncount, stackindex, & return - end subroutine sccsearch + end subroutine sccSearchZone diff --git a/src/teton/snac/snreflect.F90 b/src/teton/snac/snreflect.F90 index b14bf29..68e9992 100644 --- a/src/teton/snac/snreflect.F90 +++ b/src/teton/snac/snreflect.F90 @@ -26,7 +26,7 @@ subroutine snreflect(SnSweep, setID, Minc, PsiB) logical (kind=1), intent(in) :: SnSweep integer, intent(in) :: setID integer, intent(in) :: Minc - real(adqt), optional, intent(inout) :: PsiB(Size%nbelem,Size%nangGTA) + real(adqt), optional, intent(inout) :: PsiB(Size% nSurfElem,Size% nangGTA) ! Local Variables