Resource-based, Declarative task-Graphs for Parallel, Event-driven Scheduling
RedGrapes is a C++20 framework for declaratively creating and scheduling task-graphs, based on a high-level resource description.
Modern compute nodes concurrently perform computational tasks over various memory resource pools, cores and accelerator devices. In order to achieve high scalability in such a system, communication and computation tasks need to be overlapped extensively.
Until now, software developers who took up this challenge had to juggle data and in-node execution dependencies manually, which is a tedious and error-prone process. Real-world applications often use global shared state and also vary the workload at runtime depending on input parameters or other variables. In addition, asynchronous communication models complicate the program flow even further.
For this reason, one should decouple the aforementioned computational tasks from their execution model altogether. A typical approach involves task-graphs, which are directed acyclic graphs (DAGs) whose vertices are some sort of computation (or communication) and whose edges denote the execution precedence order. The execution precedence arises from the order in which those tasks were declared by the programmer, but it also has to take into account the data dependencies between the tasks.
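To make the notion of execution precedence in a DAG concrete, here is a minimal sketch in plain C++ that computes one valid execution order with Kahn's algorithm. It is an illustration only: the function `execution_order` and the adjacency-list representation are assumptions made for this example and are not part of RedGrapes.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical helper for illustration; not part of RedGrapes.
// successors[i] lists the tasks that may only start after task i.
inline std::vector<std::size_t> execution_order(
    std::vector<std::vector<std::size_t>> const& successors)
{
    // Number of unfinished predecessors per task.
    std::vector<std::size_t> pending(successors.size(), 0);
    for (auto const& out : successors)
        for (auto s : out)
            ++pending[s];

    // Tasks without predecessors are ready immediately.
    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < pending.size(); ++i)
        if (pending[i] == 0)
            ready.push(i);

    std::vector<std::size_t> order;
    while (!ready.empty())
    {
        auto t = ready.front();
        ready.pop();
        order.push_back(t);
        // Finishing t may release its successors.
        for (auto s : successors[t])
            if (--pending[s] == 0)
                ready.push(s);
    }
    return order; // shorter than the input if the graph contains a cycle
}

inline void example_order()
{
    // Diamond: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3
    std::vector<std::vector<std::size_t>> g{{1, 2}, {3}, {3}, {}};
    auto order = execution_order(g);
    assert(order.size() == 4);
    assert(order.front() == 0 && order.back() == 3);
}
```

In the diamond, tasks 1 and 2 have no edge between them, so a parallel scheduler would be free to run them at the same time; the sequential order computed here is just one of the valid choices.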
Consequently, RedGrapes provides you with a light-weight, application-level, task-based C++ programming framework. Here, the task-graph is derived declaratively from the resources each task accesses and from the order in which the tasks appear in your code, just as in serial programming.
The program is divided into tasks. A task can be a sub-step in a computational solver, the exchange of data between two memory resource pools, or anything else. Tasks are the smallest unit the RedGrapes scheduler operates on. Data dependencies are described via resources, which are accessed and potentially manipulated by tasks.
Each task is annotated with how it accesses its resources. The allowed access modes depend on the type of the resource. A simple example is read/write, but more complex operations are also possible, e.g., accessing sub-regions of a sequence container, or other atomic, commutative operations besides read. A resource combined with a specific access mode forms a resource access. Two resource accesses can be tested pairwise for whether they conflict and thereby create a data dependency (e.g., two writes to the same resource). Each task carries a list of these resource accesses in its so-called task properties. If two tasks have conflicting resource accesses, the task created first is executed first. This is exactly the behavior one would get when programming serially, without hints given via resources.
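The pairwise conflict test can be sketched in a few lines of plain C++. This is not the RedGrapes API: the names `Mode`, `Range`, `ResourceAccess` and `is_conflicting` are hypothetical, and the sketch assumes a resource whose access mode is read/write combined with a half-open index range, to illustrate both a simple mode and a sub-region access.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration; not the RedGrapes API.
enum class Mode { read, write };

// Half-open index interval [begin, end) inside a sequence container.
struct Range
{
    std::size_t begin;
    std::size_t end;
};

struct ResourceAccess
{
    int resource; // identifies the resource
    Mode mode;    // how the resource is accessed
    Range range;  // which part of the resource is accessed
};

inline bool overlaps(Range const& a, Range const& b)
{
    return a.begin < b.end && b.begin < a.end;
}

// Accesses conflict only if they touch the same resource, their
// sub-regions overlap, and at least one of them writes.
inline bool is_conflicting(ResourceAccess const& a, ResourceAccess const& b)
{
    return a.resource == b.resource
        && overlaps(a.range, b.range)
        && (a.mode == Mode::write || b.mode == Mode::write);
}

inline void example_conflicts()
{
    ResourceAccess write_front{0, Mode::write, {0, 8}};
    ResourceAccess write_back{0, Mode::write, {8, 16}};
    ResourceAccess read_all{0, Mode::read, {0, 16}};

    assert(is_conflicting(write_front, read_all));    // write vs. overlapping read
    assert(!is_conflicting(write_front, write_back)); // disjoint sub-regions
    assert(!is_conflicting(read_all, read_all));      // two reads never conflict
}
```

Because the two writes touch disjoint halves of the same resource, they do not create a dependency, which is the kind of extra parallelism a finer-grained access mode can expose.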
When a task is created, its resource-access list is compared against those of the previously enqueued tasks, and the corresponding dependencies are created in the task-graph. The resulting task-graph is read by a scheduling algorithm that executes individual tasks, e.g., across parallel threads.
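As a rough model of this step, the following plain C++ sketch derives graph edges while tasks are enqueued. It is a simplification under stated assumptions, not the RedGrapes implementation: `Access`, `Task`, `TaskGraph` and `add_task` are hypothetical names, only read/write accesses are modeled, and an edge is added to every earlier conflicting task without pruning redundant edges.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, simplified model; not the RedGrapes implementation.
struct Access
{
    int resource; // resource identifier
    bool write;   // true: write access, false: read access
};

struct Task
{
    std::vector<Access> accesses;          // the task's resource-access list
    std::vector<std::size_t> predecessors; // tasks that must finish first
};

inline bool conflict(Access const& a, Access const& b)
{
    return a.resource == b.resource && (a.write || b.write);
}

struct TaskGraph
{
    std::vector<Task> tasks; // in creation order

    // Enqueue a task: compare its accesses against all earlier tasks
    // and add an edge from each conflicting earlier task.
    std::size_t add_task(std::vector<Access> accesses)
    {
        Task t{std::move(accesses), {}};
        for (std::size_t i = 0; i < tasks.size(); ++i)
        {
            bool depends = false;
            for (auto const& prev : tasks[i].accesses)
                for (auto const& cur : t.accesses)
                    depends = depends || conflict(prev, cur);
            if (depends)
                t.predecessors.push_back(i);
        }
        tasks.push_back(std::move(t));
        return tasks.size() - 1;
    }
};

// One writer followed by two readers of the same resource.
inline void example_graph()
{
    TaskGraph g;
    auto w = g.add_task({{0, true}});   // writes resource 0
    auto r1 = g.add_task({{0, false}}); // reads resource 0
    auto r2 = g.add_task({{0, false}}); // reads resource 0

    assert(g.tasks[w].predecessors.empty());
    assert(g.tasks[r1].predecessors == std::vector<std::size_t>{w});
    // r2 depends on the writer only, so both readers may run in parallel.
    assert(g.tasks[r2].predecessors == std::vector<std::size_t>{w});
}
```

A scheduler could then start any task whose predecessors have all finished; here both readers become ready as soon as the writer completes.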
See examples for programs covering more features.
#include <cassert>
#include <iostream>
#include <redGrapes/redGrapes.hpp>
#include <redGrapes/resource/ioresource.hpp>

namespace rg = redGrapes;

int main()
{
    rg::init();
    rg::IOResource< int > a;

    // the first task writes to the resource
    rg::emplace_task(
        [] ( auto a ) { *a = 123; },
        a.write()
    );

    /* the following tasks may run in parallel,
     * but will only start once the first is done.
     */
    rg::emplace_task(
        [] ( auto a ) { assert( *a == 123 ); },
        a.read()
    );
    rg::emplace_task(
        [] ( auto a ) { std::cout << *a << std::endl; },
        a.read()
    );

    rg::finalize();
    return 0;
}
RedGrapes is documented using in-code Doxygen comments and reStructuredText files (in docs/source), built with Sphinx.
There are several other libraries and toolchains with similar goals, enabling some kind of task-based programming in C++. First, we should classify such programming systems by how the task-graph is built. The more low-level approach is to just create tasks as executable units and imperatively define task dependencies. This approach may be called "data-driven", because the dependencies can be created by waiting for futures of other tasks. However, we want declarative task dependencies, for which the runtime must also be aware of shared states so that it can detect data dependencies automatically and derive the task-graph. The imperative approach does not suffice for this, so we can exclude this entire class of runtime systems.
compile time checked memory access: The automatic creation of a task graph is often done via annotations, e.g., a pragma in OpenMP, but that does not guarantee the correctness of the access specifications. RedGrapes leverages the type system to write relatively safe code in that regard.
native C++: PaRSEC has a complicated toolchain using additional compilers, and OpenMP makes use of pragmas that require compiler support. RedGrapes only requires the C++20 standard.
typesafe: Some libraries like Legion or StarPU use an untyped argc/argv interface to pass parameters to tasks, which is error-prone. Both libraries in general also require a lot of C-style boilerplate.
custom access modes: RedGrapes supports arbitrary, user-configurable access modes beyond read/write, e.g., accesses to sub-areas of a multi-dimensional buffer can be described properly.
integration with asynchronous APIs: To correctly model asynchronous MPI or CUDA calls, the complete operation should be a task, yet the task must not block. The completion of the asynchronous operation has to be triggered externally. Systems that implement distributed scheduling do not offer this option, since the communication is done by the runtime itself.
inter-process scheduling: Legion, StarPU, HPX, etc. add another layer of abstraction to provide a virtualized programming interface for multiple nodes in an HPC cluster. This implies that domain decomposition, communication and task migration are handled to some extent implicitly by the tasking runtime. This is out of scope for RedGrapes, but could be built on top of it rather than being tightly coupled to it.
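The externally triggered completion described under "integration with asynchronous APIs" can be modeled with a small event object. The sketch below is a single-threaded toy in plain C++, not the RedGrapes API: `Event`, `FakeRequest` and `example_async` are hypothetical names, and `FakeRequest` merely stands in for a non-blocking request handle such as one returned by an asynchronous MPI or CUDA call.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, single-threaded model; not the RedGrapes API.
struct Event
{
    bool reached = false;
    std::vector<std::function<void()>> followers; // work waiting on this event

    // Called from outside the task, e.g. by a polling loop.
    void notify()
    {
        if (reached)
            return;
        reached = true;
        for (auto& f : followers)
            f();
        followers.clear();
    }

    // Register dependent work; runs at once if the event was already reached.
    void then(std::function<void()> f)
    {
        if (reached)
            f();
        else
            followers.push_back(std::move(f));
    }
};

// Stand-in for an asynchronous request: it reports completion
// only after `polls_left` unsuccessful polls.
struct FakeRequest
{
    int polls_left;

    bool test()
    {
        if (polls_left > 0)
        {
            --polls_left;
            return false;
        }
        return true;
    }
};

inline int example_async()
{
    Event done; // marks the task as finished
    bool successor_ran = false;
    done.then([&] { successor_ran = true; }); // dependent task

    // Task body: start the operation and return immediately.
    FakeRequest request{2};

    // The task body has returned, but its event is not reached yet.
    assert(!successor_ran);

    // A polling loop outside the task triggers the event on completion.
    int polls = 0;
    while (!done.reached)
    {
        ++polls;
        if (request.test())
            done.notify();
    }

    assert(successor_ran);
    return polls;
}
```

The point of the design is that no worker blocks while the operation is in flight: the dependent work is released only when the polling loop observes completion and notifies the event.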
Feature | native C++ | typesafe | custom access modes | compile time checked memory access | CUDA | MPI | other async APIs | inter-process scheduling |
---|---|---|---|---|---|---|---|---|
declarative task-dependencies | ||||||||
RedGrapes | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ 1 | ✔️ 1 | ✔️ 2 | ❌ |
MetaPass | ✔️ | ✔️ | ❌ | ✔️ | ✔️ | ✔️ 3 | ❌ | ✔️ |
Legion | ✔️ | ❌ | ❌ | ❌ | ✔️ | ✔️ 3 | ❌ | ✔️ |
StarPU | ✔️ | ❌ | ❌ | ❌ | ✔️ | ✔️ 3 | ❌ | ✔️ |
PaRSEC | ❌ | ✔️ | ❌ | ✔️ | ✔️ | ✔️ 3 | ❌ | ✔️ |
SYCL | ✔️ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ❌ 4 | ❌ |
OpenMP | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
imperative task-dependencies | ||||||||
Realm | ✔️ | ✔️ | - | - | ✔️ | ✔️ 3 | ❌ | ✔️ |
HPX | ✔️ | ✔️ | - | - | ✔️ | ✔️ 3 | ❌ | ✔️ |
TaskFlow | ✔️ | ✔️ | - | - | ✔️ | ❌ | ❌ | ❌ |
1. user controllable, decoupled helper code, but included with RedGrapes
2. events can be triggered externally, e.g., from a polling loop
3. only implicitly managed, not user controlled
4. see hipSYCL#181
Note: Should any libraries be misrepresented here, corrections are welcome.
This project is free software, licensed under the Mozilla Public License 2.0 (MPL 2.0).
RedGrapes is developed by members of the Computational Radiation Physics Group at HZDR. Its conceptual design is based on a whitepaper by A. Huebl, R. Widera, and A. Matthes (2017).
- Michael Sippel: library design & implementation
- Dr. Sergei Bastrakov: supervision
- Dr. Axel Huebl: whitepaper, supervision, CI
- René Widera: whitepaper, supervision
- Alexander Matthes: whitepaper
RedGrapes requires a compiler supporting the C++20 standard. RedGrapes further depends on the following libraries:
- ConcurrentQueue by Cameron Desrochers
- spdlog
- {fmt}
- (Optionally for testing) Catch2