Parallel Scan in CUDA and OpenMP

How to Run

The OpenMP version requires OpenMP to be installed and available to GCC. The Brent-Kung version requires an NVIDIA CUDA-capable GPU with at least 2 GB of memory.

make benchmarks
- Runs benchmarks for all versions
make benchmarks VERSION=iterative
- Runs benchmarks for the single-threaded iterative version
make benchmarks VERSION=openmp_release
- Runs benchmarks for the OpenMP version
make benchmarks VERSION=brent_release
- Runs benchmarks for the Brent-Kung version

The output is written to the local file benchmarks.csv.

CUDA (Brent-Kung)

Tested on wino.cs.pdx.edu with array sizes up to 268,435,456 (2^28) elements and section sizes of 1024 and 2048.
- The GPUs have ~2 GB of memory, and an array of 2^28 floats uses ~1 GB. It is safer not to approach 2 GB, since the GPU is shared with the display device, among other uses.
The scan runs in place on the device to conserve memory, and the result is then copied to the output array, so this is transparent to the caller.
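
As an illustration of the in-place scheme, the following is a minimal sequential sketch in C of the textbook Brent-Kung inclusive scan over a single section. It is not taken from brent-kung.cu: the function name brent_kung_scan_section is hypothetical, the section length is assumed to be a power of two (such as 1024 or 2048), and each inner loop stands in for work that GPU threads would perform in parallel.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential model of a Brent-Kung inclusive scan over one section.
 * Assumption: n is a power of two. The array x is overwritten in place. */
void brent_kung_scan_section(float *x, size_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);

    /* Reduction phase: combine pairs at doubling strides, so that the
     * last element of each 2*stride block holds that block's prefix sum. */
    for (size_t stride = 1; stride < n; stride *= 2) {
        for (size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
            x[i] += x[i - stride];
        }
    }

    /* Distribution phase: at halving strides, add each finished partial
     * sum to the element one stride further along. */
    for (size_t stride = n / 4; stride > 0; stride /= 2) {
        for (size_t i = 3 * stride - 1; i < n; i += 2 * stride) {
            x[i] += x[i - stride];
        }
    }
}
```

Because every update reads and writes the same buffer, no second array of length n is needed, which is the property that keeps the memory footprint near the size of the input.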

OpenMP

Tested on wino.cs.pdx.edu and babbage.cs.pdx.edu with array sizes up to 268,435,456 (2^28) elements, using 2, 4, 8, 16, and 32 threads.
Modeled after https://www.cs.fsu.edu/~engelen/courses/HPC/Synchronous.pdf
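
For readers unfamiliar with multithreaded scans, here is a rough sketch in C of one common three-phase approach: scan each chunk independently, scan the chunk totals serially, then add each chunk's offset. This is an assumption about the general technique, not a copy of openmp_inclusiveScan.c or of the linked slides; the function name blocked_inclusive_scan and the chunks parameter are hypothetical. The pragmas take effect only when the file is compiled with OpenMP enabled (for GCC, -fopenmp); otherwise the code runs serially with the same result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Three-phase blocked inclusive scan of in[0..n) into out[0..n). */
void blocked_inclusive_scan(const float *in, float *out, size_t n, size_t chunks)
{
    assert(chunks > 0);
    if (n == 0) return;

    size_t chunk_len = (n + chunks - 1) / chunks;   /* ceiling division */
    float *offsets = calloc(chunks, sizeof *offsets);
    assert(offsets != NULL);

    /* Phase 1: scan every chunk on its own and record the chunk total. */
    #pragma omp parallel for
    for (long c = 0; c < (long)chunks; c++) {
        size_t begin = (size_t)c * chunk_len;
        size_t end = begin + chunk_len < n ? begin + chunk_len : n;
        float sum = 0.0f;
        for (size_t i = begin; i < end; i++) {
            sum += in[i];
            out[i] = sum;
        }
        offsets[c] = sum;
    }

    /* Phase 2: serial exclusive scan of the chunk totals. */
    float running = 0.0f;
    for (size_t c = 0; c < chunks; c++) {
        float total = offsets[c];
        offsets[c] = running;
        running += total;
    }

    /* Phase 3: shift every chunk after the first by its offset. */
    #pragma omp parallel for
    for (long c = 1; c < (long)chunks; c++) {
        size_t begin = (size_t)c * chunk_len;
        size_t end = begin + chunk_len < n ? begin + chunk_len : n;
        for (size_t i = begin; i < end; i++) {
            out[i] += offsets[c];
        }
    }

    free(offsets);
}
```

In this sketch the number of chunks would typically match the thread count, so only the short phase 2 loop remains serial.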

Testing

make tests runs various tests on the CUDA and OpenMP versions and compares their output against the iterative version.
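
The kind of check this implies can be sketched in C as follows. This is a hypothetical illustration rather than the logic in runTests.py: the names iterative_reference_scan and first_mismatch, and the use of a relative tolerance, are assumptions. A tolerance is used because a parallel scan adds floats in a different order than a sequential loop, so results can differ slightly in the last bits.

```c
#include <assert.h>
#include <stddef.h>

/* Single-threaded reference: out[i] = in[0] + ... + in[i]. */
void iterative_reference_scan(const float *in, float *out, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += in[i];
        out[i] = sum;
    }
}

static float abs_f(float v) { return v < 0.0f ? -v : v; }

/* Returns the index of the first element whose difference from the
 * reference exceeds rel_tol * max(1, |expected|), or n if none does. */
size_t first_mismatch(const float *expected, const float *actual,
                      size_t n, float rel_tol)
{
    assert(rel_tol >= 0.0f);
    for (size_t i = 0; i < n; i++) {
        float scale = abs_f(expected[i]) > 1.0f ? abs_f(expected[i]) : 1.0f;
        if (abs_f(expected[i] - actual[i]) > rel_tol * scale) {
            return i;
        }
    }
    return n;
}
```

A return value equal to n means the candidate output agrees with the reference everywhere within the chosen tolerance.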

Scalability Study

make benchmarks produces a .csv covering various array sizes, section sizes of 1024 and 2048, and thread counts of 2, 4, 8, 16, and 32 for the CUDA, OpenMP, and iterative versions.

