Overview of distributed workloads

You can use the distributed workloads feature to queue, scale, and manage the resources required to run data science workloads across multiple nodes in an OpenShift cluster simultaneously. Typically, data science workloads include several types of artificial intelligence (AI) workloads, such as machine learning (ML) and Python workloads.

Distributed workloads provide the following benefits:

You can iterate faster and experiment more frequently because of the reduced processing time.
You can use larger datasets, which can lead to more accurate models.
You can use complex models that could not be trained on a single node.
You can submit distributed workloads at any time, and the system schedules each workload when the resources that it requires are available.

The distributed workloads infrastructure includes the following components:

CodeFlare Operator

Secures deployed Ray clusters and grants access to their URLs

CodeFlare SDK

Defines and controls the remote distributed compute jobs and infrastructure for any Python-based environment

Note	The CodeFlare SDK is not installed as part of {productname-short}, but it is included in some of the notebook images provided by {productname-short}.

KubeRay

Manages remote Ray clusters on OpenShift for running distributed compute workloads

Kueue

Manages quotas and how distributed workloads consume them, and queues distributed workloads according to those quotas
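
The quota-based queueing described above can be illustrated with a minimal sketch. This is a toy model in plain Python, not the Kueue API: the `Workload` and `QuotaQueue` names, the single CPU quota, and the strict first-in, first-out admission order are all assumptions made for illustration. It shows only the general idea that a workload can be submitted at any time and is admitted once enough quota is free.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Workload:
    name: str
    cpus: int  # CPUs requested by the workload (hypothetical unit)


@dataclass
class QuotaQueue:
    """Toy FIFO queue that admits workloads only while quota remains."""
    cpu_quota: int
    used: int = 0
    pending: deque = field(default_factory=deque)
    running: dict = field(default_factory=dict)

    def submit(self, workload):
        # Submission always succeeds; admission is decided separately.
        self.pending.append(workload)
        self._admit()

    def finish(self, name):
        # Release the finished workload's quota, then retry admission.
        self.used -= self.running.pop(name).cpus
        self._admit()

    def _admit(self):
        # Strict FIFO (an assumption): stop at the first workload that does not fit.
        while self.pending and self.used + self.pending[0].cpus <= self.cpu_quota:
            workload = self.pending.popleft()
            self.used += workload.cpus
            self.running[workload.name] = workload


queue = QuotaQueue(cpu_quota=8)
queue.submit(Workload("train-a", cpus=6))
queue.submit(Workload("train-b", cpus=4))  # waits: 6 + 4 exceeds the quota
print(sorted(queue.running), [w.name for w in queue.pending])
queue.finish("train-a")                    # frees 6 CPUs, so train-b is admitted
print(sorted(queue.running), [w.name for w in queue.pending])
```

In this sketch, `train-b` stays pending until `train-a` finishes and releases its share of the quota. A real queueing system typically tracks several resource types and supports more flexible ordering than this single-resource FIFO model.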

You can run distributed workloads from data science pipelines, from Jupyter notebooks, or from Microsoft Visual Studio Code files.

Note	Data science pipelines workloads are not managed by the distributed workloads feature, and are not included in the distributed workloads metrics.
