Creating a Ray Cluster in DKubeX

To run an ML model training job in DKubeX, you first need to create a Ray cluster. The following steps guide you through creating and managing Ray clusters in DKubeX.

  • Open the Clusters page in your DKubeX workspace. This page lists all the Ray clusters that are currently available to work on.

  • To create a new Ray cluster, click the “Create Cluster” button (the “+” button in the top-left corner of the Clusters page).

    Create Cluster Button (+)

  • On the General page, provide the following details:

    General Page – Cluster Creation

    • In the General section, select Ray as the cluster type. In the fields below, provide the following details:

      Field     Description
      -------   -----------
      Name      Provide a unique name for the Ray cluster. For this tutorial, provide mnistraycluster.
      Version   Provide the Ray version to be used for this cluster (Optional). Default: 2.11.0

    • Optionally, in the Docker Configuration section, you can provide a custom Docker image for the Ray cluster. Provide the Docker registry server, image tag, username, and password for the custom image.

    • Once done, click Next to proceed to the Resources page.

  • On the Resources page, provide the following details:

    Resources Page – Cluster Creation

    • In the Head Node section, provide the following details:

      Field           Description
      -------------   -----------
      CPU             Number of CPUs to be allocated to the head node of the Ray cluster (Optional). Default: 2
      Memory          Amount of memory (in GB) to be allocated to the head node of the Ray cluster (Optional). Default: 4
      GPU             Number of GPUs to be allocated to the head node of the Ray cluster (Optional). Default: 0
      Instance Type   Select the node instance in which the head node of the Ray cluster will be launched.

    • Optionally, in the Worker Nodes section, provide the following details:

      Field           Description
      -------------   -----------
      CPU             Number of CPUs to be allocated to each worker node of the Ray cluster (Optional). Default: 2
      Memory          Amount of memory (in GB) to be allocated to each worker node of the Ray cluster (Optional). Default: 4
      GPU             Number of GPUs to be allocated to each worker node of the Ray cluster (Optional). Default: 0
      Instance Type   Select the node instance in which the worker nodes of the Ray cluster will be launched (Optional).

  • Once done, click Submit to launch the Ray cluster. When the cluster reaches the Ready state, you can start submitting Ray jobs to it.

    Ray Cluster in Ready State

  • To see the details of a Ray cluster, including information about its head node and worker nodes, click the cluster name. You can also open the cluster's Ray dashboard by clicking the Ray Dashboard button on the right side of the cluster's entry on the Clusters page.
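The form fields from the General and Resources pages, together with their documented defaults, can be summarized in a small data model. This is a hypothetical sketch only: in this tutorial the cluster is created through the DKubeX UI, and the names RayClusterSpec, NodeResources, DockerConfig, and validate_spec are illustrative, not part of any DKubeX API. The name-uniqueness check comes from the form; the non-negative resource check is an assumption.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class NodeResources:
    """Resources for one node type (head or worker), with the form's defaults."""
    cpu: int = 2                          # CPU field, default 2
    memory_gb: float = 4                  # Memory field in GB, default 4
    gpu: int = 0                          # GPU field, default 0
    instance_type: Optional[str] = None   # Instance Type chosen in the UI


@dataclass
class DockerConfig:
    """Optional custom image from the Docker Configuration section."""
    registry: str
    image_tag: str
    username: Optional[str] = None
    password: Optional[str] = None


@dataclass
class RayClusterSpec:
    """Hypothetical summary of the Create Cluster form (not a DKubeX API)."""
    name: str
    ray_version: str = "2.11.0"           # Version field, default 2.11.0
    docker: Optional[DockerConfig] = None
    head: NodeResources = field(default_factory=NodeResources)
    worker: Optional[NodeResources] = None  # Worker Nodes section is optional


def validate_spec(spec: RayClusterSpec, existing_names: Iterable[str]) -> None:
    """Raise ValueError if the spec breaks the form's stated or assumed rules."""
    if not spec.name:
        raise ValueError("a cluster name is required")
    # Stated by the form: the cluster name must be unique.
    if spec.name in set(existing_names):
        raise ValueError(f"a cluster named {spec.name!r} already exists")
    # Assumption: resource amounts cannot be negative.
    for role, node in (("head", spec.head), ("worker", spec.worker)):
        if node is not None and min(node.cpu, node.memory_gb, node.gpu) < 0:
            raise ValueError(f"{role} node resources cannot be negative")
```

For the tutorial cluster, RayClusterSpec(name="mnistraycluster") describes a head node with 2 CPUs, 4 GB of memory, and no GPUs running Ray 2.11.0, with no worker-node settings, which matches the defaults listed in the tables.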

Now that your Ray cluster is up and running, you can use Ray jobs to train an ML model on it. To learn how, refer to Training a ML Model in DKubeX / Running Ray Jobs.
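For orientation, a job can also be sent to a Ready cluster through Ray's Jobs REST API, which the Ray dashboard serves at POST /api/jobs/. This is a minimal sketch that assumes the dashboard address opened by the Ray Dashboard button is reachable from your machine without extra authentication. The dashboard URL, the entrypoint command, and the helper names build_job_request and submit_job are hypothetical, and the DKubeX-specific procedure is the one described in the tutorial referenced above.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_job_request(
    dashboard_url: str,
    entrypoint: str,
    runtime_env: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request to Ray's Jobs REST API."""
    payload: Dict[str, Any] = {"entrypoint": entrypoint}
    if runtime_env is not None:
        # Passed through unchanged; see Ray's runtime environment docs for keys.
        payload["runtime_env"] = runtime_env
    return urllib.request.Request(
        url=dashboard_url.rstrip("/") + "/api/jobs/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_job(request: urllib.request.Request) -> Dict[str, Any]:
    """Send the request and return the decoded JSON response."""
    # Assumption: no authentication is needed; a DKubeX deployment may require more.
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling submit_job(build_job_request("http://<dashboard-address>", "python train.py")) would send the request, and the JSON response is expected to include the new job's submission ID. Ray's Python SDK offers the same operation through ray.job_submission.JobSubmissionClient(address).submit_job(entrypoint=...), which, unlike a plain REST call, can also upload a local working directory.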