Deploying LLMs with SkyPilot


When you deploy a model (an embedding model or an LLM) on DKubeX with SkyPilot, a Sky-serve controller is created first. This controller manages all deployments made on and from that DKubeX setup.

You can deploy LLMs on DKubeX with SkyPilot in the following ways:

  1. Deploying LLMs from Catalog with SkyPilot

  2. Deploying Custom LLMs with SkyPilot

Note

Only one Sky-serve controller can be launched in a given cloud region. If a Sky-serve controller tries to launch in a region where another Sky-serve controller already exists, the new controller gets stuck in the init state. To resolve this issue, follow the instructions provided here:

Serve/SkyPilot: Sky Serve Controller is not launching/stuck in init status.

Prerequisites

Make sure SkyPilot is configured and set up properly on your DKubeX setup. For details, visit Configuring SkyPilot on DKubeX.

Deploying LLMs from Catalog with SkyPilot

  • To check the LLM catalog on DKubeX, run the following command. It lists all LLMs registered in the DKubeX catalog. The complete list is also available in List of LLMs in DKubeX LLM Catalog.

    d3x llms list
    
  • To deploy an LLM from the DKubeX catalog, use the following command. The values to provide are described below.

    d3x llms deploy --name <deployment-name> --model <model-name> --token <access token> -sky
    
    • Provide a unique name for the LLM deployment after the --name flag, replacing <deployment-name> in the command.

    • Replace <model-name> with the name of the LLM from the DKubeX catalog after the --model flag.

    • Provide the Hugging Face access token for the LLM after the --token flag, replacing <access token> in the command.

    • Use the --publish flag to make the deployment details available for all users on the DKubeX setup.

    • Use the --min_replicas and --max_replicas flags, each followed by a number, to set the minimum and maximum number of replicas for the deployment. For example, --min_replicas 1.

    • If you want to create the deployment with a different type of accelerator from the default one, use the --sky-accelerator flag along with the type and number of accelerators to be used. For example, --sky-accelerator A10:1.

  • To check the status of the deployment and the service replica on SkyPilot, run the following command:

    d3x sky status -ra
    

    Once both the service and the service replica are in ready status, the deployment is ready to use.

    Note

    If the deployment is created with 0 replicas, the deployment service shows a NO REPLICA state. To bring up a service replica for the deployment, send a test request to the deployment's service endpoint, as shown below:

    curl xxx.xxx.xxx.xxx:xxxxx
    
  • Once the deployment is ready, you can get its service endpoint and service token by visiting the Deployments page in DKubeX and opening the deployment’s details page, or by running the following command in the terminal:

    d3x serve list
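
The test request from the note above may need to be repeated, since a replica brought up from zero may take some time before it responds. A minimal sketch of a retry loop follows, assuming the endpoint is the one reported by d3x serve list. retry_probe is a hypothetical helper (it is not part of d3x or SkyPilot), and the attempt count and delay are illustrative guesses rather than recommended values.

```shell
#!/usr/bin/env bash
# Sketch: retry a probe command until it succeeds or the attempts run out.
# retry_probe is a hypothetical helper, not a d3x or SkyPilot command.

retry_probe() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # Run the probe command exactly as passed in.
    if "$@"; then
      echo "probe succeeded on attempt $i"
      return 0
    fi
    # Wait between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "probe failed after $attempts attempts" >&2
  return 1
}

# Usage (commented out because it needs a live deployment):
# ENDPOINT="xxx.xxx.xxx.xxx:xxxxx"   # placeholder; take the value from `d3x serve list`
# retry_probe 10 30 curl --silent --max-time 10 --output /dev/null "$ENDPOINT"
```

With these curl options, any HTTP response counts as success. If your endpoint answers with an error status while no replica is up, adding curl's --fail option makes such responses count as failed attempts.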
    

Deploying Custom LLMs with SkyPilot

To deploy a custom LLM on DKubeX with SkyPilot, the model's configuration file must be present in your workspace. Then use the following command to deploy the LLM. The values to provide are described below.

    d3x llms deploy --name <deployment-name> --config <config-file-path> --token <access token> -sky

  • Provide a unique name for the LLM deployment after the --name flag, replacing <deployment-name> in the command.

  • Provide the absolute path of the LLM configuration file in your workspace after the --config flag, replacing <config-file-path> in the command.

  • Provide the Hugging Face access token for the LLM after the --token flag, replacing <access token> in the command.

  • Use the --publish flag to make the deployment details available for all users on the DKubeX setup.

  • Use the --min_replicas and --max_replicas flags, each followed by a number, to set the minimum and maximum number of replicas for the deployment. For example, --min_replicas 1.

  • If you want to create the deployment with a different type of accelerator from the default one, use the --sky-accelerator flag along with the type and number of accelerators to be used. For example, --sky-accelerator A10:1.

  • To check the status of the deployment and the service replica on SkyPilot, run the following command:

    d3x sky status -ra
    

    Once both the service and the service replica are in ready status, the deployment is ready to use.

    Note

    If the deployment is created with 0 replicas, the deployment service shows a NO REPLICA state. To bring up a service replica for the deployment, send a test request to the deployment's service endpoint, as shown below:

    curl xxx.xxx.xxx.xxx:xxxxx
    
  • Once the deployment is ready, you can get its service endpoint and service token by visiting the Deployments page in DKubeX and opening the deployment’s details page, or by running the following command in the terminal:

    d3x serve list
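
The flags described above can be combined into a single command. The sketch below assembles such a command for a custom LLM and prints it as a dry run instead of executing it. It also rejects a relative --config path, since the path is expected to be absolute. build_custom_deploy_cmd, the environment variable names, the deployment name, and the file path are all hypothetical, and the sketch assumes that the order of the optional flags does not matter.

```shell
#!/usr/bin/env bash
# Sketch: build a custom-LLM deploy command from the flags described above.
# build_custom_deploy_cmd and the variable names are hypothetical helpers.

build_custom_deploy_cmd() {
  local name="$1" config="$2" token="$3"

  # The --config flag expects an absolute path, so reject anything else.
  case "$config" in
    /*) ;;
    *)
      echo "error: --config needs an absolute path, got: $config" >&2
      return 1
      ;;
  esac

  # Required flags.
  set -- d3x llms deploy --name "$name" --config "$config" --token "$token"

  # Optional flags, appended only when the matching variable is set.
  if [ -n "${MIN_REPLICAS:-}" ]; then set -- "$@" --min_replicas "$MIN_REPLICAS"; fi
  if [ -n "${MAX_REPLICAS:-}" ]; then set -- "$@" --max_replicas "$MAX_REPLICAS"; fi
  if [ -n "${SKY_ACCELERATOR:-}" ]; then set -- "$@" --sky-accelerator "$SKY_ACCELERATOR"; fi
  if [ "${PUBLISH:-0}" = "1" ]; then set -- "$@" --publish; fi

  # Keep -sky last, matching the command shown in this section.
  set -- "$@" -sky

  # Dry run: print the command rather than executing it.
  printf '%s\n' "$*"
}

# Example with placeholder values, run in a subshell so the variables do not leak.
(
  MIN_REPLICAS=1
  MAX_REPLICAS=2
  build_custom_deploy_cmd "my-custom-llm" "/home/demo/llm-config.yaml" "<access token>"
)
```

Review the printed command, substitute your own values, and then run it yourself. The same pattern applies to catalog deployments by swapping --config for --model.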