Deploying LLMs in DKubeX¶
LLMs can be deployed in DKubeX using KServe. The steps to deploy them are given below.
Note
To make the LLM deployment accessible to all users on a particular DKubeX setup, use the --publish flag in the deployment command.
To deploy an LLM on DKubeX with SkyPilot and Sky-Serve, visit Deploying LLMs with SkyPilot.
Deploying LLMs on DKubeX UI¶
To deploy an LLM through DKubeX UI, use the following steps:
Open the Deployments page on your DKubeX workspace. This page lists all the model deployments that are currently running or have been executed previously.
To create a new LLM deployment, click on the “Create Deployment” button (shown as a “+” button on the top left corner of the Deployments page).
On the General page, provide the following details:
- Select LLM as the type of deployment to be launched.
- Provide a unique name for the LLM deployment in the Name field.
- Select KServe as the deployment framework if you want to deploy the LLM with local resources, or select Sky if you want to deploy the LLM with SkyPilot. In the case of Sky, you can also provide deployment configuration details.
Once done, click on the Next button to proceed to the Configuration page.
On the Configuration page, provide the following details:
In the Model Configuration section, provide the following details:
Source
    Select the source from where the LLM has to be deployed.
Provider
    Select the provider of the LLM.
LLM Model
    Select the LLM model to be deployed from the provided catalog.
Token
    Provide the access token for the LLM model (if required).
Once done, click on the Next button to proceed to the Advanced page.
On the Advanced page, provide the following details:
In the Advanced Configuration section, provide the following details:
CPU
    Provide the number of CPU cores to be allocated for the LLM deployment.
Memory
    Provide the amount of memory to be allocated for the LLM deployment.
GPU
    Provide the number of GPUs to be allocated for the LLM deployment.
QPS
    Provide the Queries Per Second (QPS) limit for the LLM deployment.
Replicas
    Provide the minimum and maximum number of replicas for the LLM deployment.
Once done, click on the Submit button to create the LLM deployment.
Once the deployment is in the Running state, it is ready to be used. You can access the deployment details by clicking on the deployment name on the Deployments page.
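To use the deployment, send requests to its endpoint. The endpoint URL and request schema depend on how your DKubeX setup exposes the serve, so the snippet below is a hypothetical sketch: the URL is a placeholder to be copied from the deployment details page, and an OpenAI-compatible completions route is assumed.

    # Hypothetical request -- the endpoint URL and payload schema are
    # assumptions; copy the real URL from the deployment details page.
    curl -X POST "<deployment-endpoint-URL>/v1/completions" \
      -H "Content-Type: application/json" \
      -d '{"model": "llama38b", "prompt": "Hello!", "max_tokens": 64}'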
Deploying LLM models from CLI¶
You can deploy base LLMs that are registered with the DKubeX LLM Catalog, or deploy base LLMs from the Huggingface repository with a custom configuration file.
To list all base LLMs registered with DKubeX, use the following command.
d3x llms list
Information
To see the full list of LLMs registered with the DKubeX LLM Catalog, please visit the List of LLMs in DKubeX LLM Catalog page.
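If the catalog listing is long, ordinary shell filtering can narrow it down; the example below assumes the model name appears verbatim in the command's output.

    # Check whether a particular model family is in the catalog.
    d3x llms list | grep -i llama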
To deploy a base LLM registered with the DKubeX LLM Catalog, use the following command. Replace the parts enclosed within <> with the appropriate details.
Note
In case you are using an EKS setup, please change the value of the --type flag from a10 to g5.4xlarge in the following command.
d3x llms deploy --name <name of the deployment> --model <LLM Name> --type <GPU Type> --token <access token for the model (if required)>
d3x llms deploy --name llama38b --model meta-llama/Meta-Llama-3-8B-Instruct --type a10 --token hf_Ahq***********jWmO
- Provide a unique name for the LLM deployment after the --name flag, replacing <name of the deployment> in the command.
- Replace <LLM Name> with the name of the LLM from the DKubeX catalog after the --model flag.
- Provide the Huggingface access token for the LLM after the --token flag, replacing <access token for the model (if required)> in the command.
- Use the --publish flag to make the deployment details available for all users on the DKubeX setup.
- Use the --kserve flag to deploy the LLM using KServe.
- Use the --min_replicas and --max_replicas flags along with the number of replicas to set the minimum and maximum replica configuration for the deployment. For example, --min_replicas 1. A command combining these optional flags is shown after the status command below.
You can check the status of the deployment from the Deployments page in DKubeX or by running the following command.
d3x serve list
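For reference, the optional flags described above can be combined in a single deployment command; the token value below is a placeholder.

    # Combined example using the optional flags documented above; the
    # token value is a placeholder.
    d3x llms deploy --name llama38b --model meta-llama/Meta-Llama-3-8B-Instruct \
        --type a10 --token hf_<your token> --publish --kserve \
        --min_replicas 1 --max_replicas 2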
You can deploy base LLMs from the Huggingface repository using a custom configuration file. Replace the parts enclosed within <> with the appropriate details.
Attention
Make sure the deployment configuration file for the LLM that you want to deploy is present in your workspace.
Note
In case you are using an EKS setup, please change the value of the --type flag from a10 to g5.4xlarge in the following command.
d3x llms deploy --name <deployment name> --config <path to deployment config file> --type <GPU type> --token <access token for the model (if required)>
d3x llms deploy --name llama38b --config /home/demo/llama38b.yaml --type a10 --token hf_Ahq***************WmO
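The file passed to --config (for example, /home/demo/llama38b.yaml above) describes the Huggingface model to deploy. The actual schema is defined by DKubeX; the sketch below is a hypothetical illustration only, and every key name in it is an assumption rather than the documented format.

    # Hypothetical sketch of a deployment config file -- the key name
    # below is an assumption, not the documented DKubeX schema; consult
    # the DKubeX documentation for the actual format and fields.
    model: meta-llama/Meta-Llama-3-8B-Instruct   # Huggingface repo ID (assumed key)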
- Provide a unique name for the LLM deployment after the --name flag, replacing <deployment name> in the command.
- Provide the absolute path of the LLM configuration file in your workspace after the --config flag.
- Provide the Huggingface access token for the LLM after the --token flag, replacing <access token for the model (if required)> in the command.
- Use the --publish flag to make the deployment details available for all users on the DKubeX setup.
- Use the --kserve flag to deploy the LLM using KServe.
- Use the --min_replicas and --max_replicas flags along with the number of replicas to set the minimum and maximum replica configuration for the deployment. For example, --min_replicas 1.
You can check the status of the deployment from the Deployments page in DKubeX or by running the following command.
d3x serve list
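If you want to block on the command line until the deployment comes up, a simple polling loop over the list command works; this sketch assumes the deployment name and its status (e.g. Running) appear on the same line of the output.

    # Poll until the deployment reports Running. Assumes the name and
    # status share a line in the `d3x serve list` output.
    until d3x serve list | grep llama38b | grep -q Running; do
        echo "Waiting for the deployment to become Running..."
        sleep 15
    done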