Creating a Sky Cluster in DKubeX

To run an ML model training job in DKubeX, you first need to create a Sky cluster. The following steps guide you through creating a Sky cluster in DKubeX.

  • Open the Clusters page in your DKubeX workspace. This page lists all the clusters that are currently available.

  • To create a new Sky cluster, click on the “Create Cluster” button (shown as a “+” button on the top left corner of the Clusters page).

  • On the General page, provide the following details:

    • In the General section, select sky as the type of cluster to create, then fill in the following field:

      Field   Description
      Name    Provide a unique name for the Sky cluster. For this tutorial, provide mnistskycluster.

    • Optionally, in the Docker Configuration section, you can specify a custom Docker image for the Sky cluster. Provide the Docker registry server, the image tag, and the username and password for the custom Docker image. You can also provide spot cluster details as per your requirements.

    • Once done, click on the Next button to proceed to the Resources page.

  • On the Resources page, provide the following details:

    • In the Provisioning section, provide the following details:

      Field             Description
      Cloud             Select the cloud provider to be used for provisioning the Sky cluster.
      GPU Accelerator   Select the GPU accelerator type to be used for the Sky cluster.
      GPU Count         Select the number of GPUs to be used for each node in the Sky cluster.
      Region            Select the region where the Sky cluster needs to be provisioned.
      CPU               Number of CPUs to be allocated to each node of the Sky cluster.
      Memory            Amount of memory (in GB) to be allocated to each node of the Sky cluster.

  • Once done, click on the Submit button to launch the Sky cluster. When the cluster reaches the Ready state, you can start submitting Sky jobs to it.
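
If you want to keep a record of the same settings in a script, the form values and the wait for the Ready state can be sketched in Python. This is only an illustrative sketch under stated assumptions: SkyClusterRequest, wait_until_ready, and the get_status callable are hypothetical names and not part of any DKubeX API, the rule that counts must be positive is an assumption, and the cloud, region, accelerator, and size values are placeholders to replace with your own choices.

```python
import time
from dataclasses import asdict, dataclass
from typing import Callable, Optional


@dataclass
class SkyClusterRequest:
    """Hypothetical record of the values entered in the Create Cluster form."""

    name: str                           # General > Name
    cloud: str                          # Resources > Cloud
    region: str                         # Resources > Region
    gpu_accelerator: str                # Resources > GPU Accelerator
    gpu_count: int                      # GPUs per node
    cpus: int                           # CPUs per node
    memory_gb: int                      # memory per node, in GB
    docker_image: Optional[str] = None  # optional custom Docker image
    cluster_type: str = "sky"           # cluster type selected on the General page

    def validate(self) -> None:
        """Raise ValueError if a text field is empty or a count is not positive.

        The positivity rule is an assumption made for this sketch.
        """
        for field_name in ("name", "cloud", "region", "gpu_accelerator"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"{field_name} must not be empty")
        for field_name in ("gpu_count", "cpus", "memory_gb"):
            if getattr(self, field_name) <= 0:
                raise ValueError(f"{field_name} must be a positive integer")


def wait_until_ready(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    poll_s: float = 10.0,
) -> bool:
    """Poll get_status() until it returns "Ready" or the timeout expires.

    get_status is a caller-supplied function (hypothetical here) that returns
    the current cluster state as a string. Returns True if "Ready" was seen.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status() == "Ready":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# Placeholder values; only the cluster name comes from this tutorial.
request = SkyClusterRequest(
    name="mnistskycluster",
    cloud="aws",
    region="us-east-1",
    gpu_accelerator="V100",
    gpu_count=1,
    cpus=8,
    memory_gb=32,
)
request.validate()
print(asdict(request))
```

Here wait_until_ready would be called with whatever function you use to look up the cluster state; the sketch deliberately leaves that lookup to the caller because this tutorial only describes the Clusters page in the UI.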

Now that your Sky cluster is up and running, you can proceed to train an ML model using Sky jobs in DKubeX. To learn how, refer to Training a ML Model in DKubeX / Running Ray Jobs.