Authors: Simon Schwitanski, Joachim Jenke, Felix Tomski, Christian Terboven, Matthias S. Müller
This is supplemental material for the paper "On-the-Fly Data Race Detection for MPI RMA Programs with MUST".
- must_rma: Sources of MUST-RMA with helper script for installation
- docker: Dockerfile to build the software environment for the classification quality benchmarks
- classification_quality: Script to generate the classification quality table out of the test cases
- overhead_measurement: JUBE scripts to reproduce the measurements
- overhead_results: Results of the overhead measurements on CLAIX18 (RWTH cluster)
The sources of MUST-RMA are available in must_rma/src. Note that this folder also contains many files unrelated to the paper. The contributions and tests can be found in the following folders and files:
- Analysis modules (RMA state tracking, concurrent region analysis)
- Own tests
  - must_rma/src/tests/OneSidedChecks/ProcessLocal: Local buffer races
  - must_rma/src/tests/OneSidedChecks/AcrossProcesses: Remote races
- MPI Bugs Initiative tests
The following software packages are needed to reproduce the results:
- Clang compiler (preferably in version 12.0.1)
- MPI library with support for at least MPI 3.0 (preferably Intel MPI or MPICH)
- CMake in version 3.20 or newer
- libxml2 parser (libxml2-dev)
- Python 3
The classification quality benchmarks in addition need:
- LLVM lit in version 14.0.0 (available via PyPI)
- FileCheck binary (distributed with LLVM)
The overhead evaluation in addition needs:
- JUBE benchmarking environment in version 2.4.2 or newer (http://www.fz-juelich.de/jsc/jube)
- Slurm scheduler to submit the batch scripts
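Before building, it can help to confirm that the listed tools are actually reachable. The following is a minimal sketch, not part of the artifact: the helper name check_tools is hypothetical, and the command names in the example call (clang, cmake, python3, lit, FileCheck, jube, sbatch) are assumptions about how the packages are installed on your system (some distributions add version suffixes to the binaries).

```shell
#!/bin/sh
# Report which of the given commands are missing from PATH.
# Prints one line per missing tool and returns non-zero if any is missing.
# (check_tools is a hypothetical helper, not shipped with MUST-RMA.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (command names are assumptions, adjust to your installation):
# check_tools clang cmake python3 lit FileCheck jube sbatch
```

The jube and sbatch entries are only relevant for the overhead evaluation and can be dropped when only the classification quality benchmarks are reproduced.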
To simplify the reproduction of the classification quality benchmarks, we provide a Dockerfile that sets up the software environment required to build and run MUST-RMA with the benchmarks. If a cluster environment is used instead, the following Docker build and run steps can be skipped.
Build the Docker image with the tag must-rma, adjust the permissions of the must_rma subfolder to match the container user, and run the resulting image with the MUST source code mounted as a volume:
# cd $ROOT
# docker build docker -t must-rma
# chown -R 1000:1000 ./must_rma
# docker run --rm -it \
-v $(pwd)/must_rma:/must_rma must-rma /bin/bash
Change to the must_rma directory and install MUST-RMA using the provided install script build_must.sh:
$ cd $ROOT/must_rma
$ ./build_must.sh
The build and installation paths can be set within the script. In the following, we assume that MUST-RMA was built in the folder $BUILD and installed in $INSTALL.
Change into the $BUILD directory and run the tests:
$ cd $BUILD
$ lit -j 1 tests/OneSidedChecks/ | tee test_output.log
This runs all 81 test cases and prints the results (number of passed and failed tests). Passed tests are marked as PASS, failed tests as FAIL or XFAIL. The number of workers (parameter -j) can be increased; however, spawning too many workers might cause test cases to fail if there are not enough cores available to run them.
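For a quick sanity check of the log before generating the table, the result lines can be counted per status. This is a rough sketch that assumes lit's default non-interactive output, in which each result line starts with the status followed by a colon (for example "PASS: "). The helper name count_status is hypothetical and not part of the artifact.

```shell
#!/bin/sh
# Count lit result lines with a given status (e.g. PASS, FAIL, XFAIL)
# in a log file. Assumes each result line starts with "STATUS: ".
# The anchor (^) keeps "FAIL" from also matching "XFAIL" lines.
count_status() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so the exit status is neutralised here.
    grep -c "^$1: " "$2" || true
}

# Example usage from within $BUILD:
# for status in PASS FAIL XFAIL; do
#     echo "$status: $(count_status "$status" test_output.log)"
# done
```

The three counts should add up to the number of test cases that lit reports in its summary.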
To produce the result table, we provide a Python script that parses the test_output.log file. Change back to the classification_quality folder and pass the test output log file to the script:
$ cd $ROOT/classification_quality
$ python3 generate_classification_quality_table.py \
$BUILD/test_output.log
To check your own applications or binaries, MUST-RMA can be run with:
$ $INSTALL/bin/mustrun --must:distributed \
--must:tsan --must:rma \
-np <number of processes> <binary>
The overhead evaluation is specific to the CLAIX cluster, so running the benchmarks in another environment will require manual adaptations. We provide a JUBE configuration to make the measurements easier to reproduce. Important parameter sets to consider within the JUBE configuration (prk_rma.xml):
- prk_kernel_args_pset: number of iterations and grid size to be used in the kernels
- prk_system_pset: system configuration, e.g., number of nodes to be used
After configuring all required parameters, the benchmarks can be run with
$ cd $ROOT/overhead_measurement
$ jube run prk_rma.xml -t kernel_name
where kernel_name can be stencil or transpose.
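To measure both kernels in one go, the command above can be wrapped in a small loop. The following is only a sketch: the function name run_kernels and the DRY_RUN variable are hypothetical additions, not part of the JUBE configuration. With DRY_RUN=1 the commands are printed instead of executed, which allows checking the invocation before any jobs are submitted.

```shell
#!/bin/sh
# Run the JUBE benchmark once per kernel (stencil, transpose).
# run_kernels and DRY_RUN are hypothetical helpers for illustration.
# Must be called from $ROOT/overhead_measurement, where prk_rma.xml lives.
run_kernels() {
    for kernel in stencil transpose; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only show what would be executed.
            echo "jube run prk_rma.xml -t $kernel"
        else
            jube run prk_rma.xml -t "$kernel"
        fi
    done
}

# Example usage:
# cd $ROOT/overhead_measurement
# DRY_RUN=1 run_kernels   # print the two commands
# run_kernels             # actually start both benchmark runs
```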
The JUBE configuration (1) builds MUST-RMA, (2) builds the chosen kernel with and without TSan instrumentation, and (3) submits one Slurm job per requested number of nodes, each of which runs the three configurations (plain, tsan, must-rma). After the Slurm jobs have finished, the results can be retrieved with
$ cd $ROOT/overhead_measurement
$ jube result -a bench_run --id <id of JUBE run>
This will print out the results (average iteration time per second per configuration) as a table.