Deploying LLMs in DKubeX
Both base and finetuned LLMs can be deployed in DKubeX using KServe. The steps to deploy them are given below.
Note
To make the LLM deployment accessible to all users on a particular DKubeX setup, add the --public flag to the deployment command.
To deploy an LLM on DKubeX with SkyPilot and Sky-Serve, see Deploying LLMs with SkyPilot.
Deploying Base LLMs
You can deploy base LLMs that are registered in the DKubeX LLM Catalog, or deploy them from the Huggingface repository with a custom configuration file.
To list all base LLMs registered with DKubeX, use the following command.
d3x llms list
Information
To see the full list of LLMs registered in the DKubeX LLM Catalog, visit the List of LLMs in DKubeX LLM Catalog page.
To deploy a base LLM registered with the DKubeX LLM Catalog, use the following command. Replace the parts enclosed within <> with the appropriate details.
Note
If you are using an EKS setup, change the value of the --type flag from a10 to g5.4xlarge in the following command.
d3x llms deploy --name <name of the deployment> --model <LLM Name> --type <GPU Type> --token <access token for the model (if required)> --kserve
d3x llms deploy --name llama38b --model meta-llama/Meta-Llama-3-8B-Instruct --type a10 --token hf_Ahq***********jWmO --kserve
Provide a unique name for the LLM deployment after the --name flag, replacing <name of the deployment> in the command.
Replace <LLM Name> after the --model flag with the name of the LLM from the DKubeX catalog.
Provide the Huggingface access token for the LLM after the --token flag, replacing <access token for the model (if required)> in the command.
Use the --publish flag to make the deployment details available to all users on the DKubeX setup.
Use the --kserve flag to deploy the LLM using KServe.
Use the --min_replicas and --max_replicas flags, each followed by a number, to set the minimum and maximum number of replicas for the deployment. For example, --min_replicas 1.
You can check the status of the deployment from the Deployment page in DKubeX or by running the following command.
d3x serve list
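The EKS note and the replica flags above can be folded into a small wrapper. The following is a minimal sketch, not part of DKubeX: the function names gpu_type_for and catalog_deploy_cmd and the platform label eks are hypothetical, and the check that the minimum does not exceed the maximum is an assumption about sensible input rather than documented d3x behaviour. It only prints the command so it can be reviewed first, and it leaves out --token, which you would add if the model requires one.

```shell
#!/bin/sh
# Hypothetical helper: pick the --type value described in the EKS note.
# "eks" is an illustrative label supplied by the caller, not something d3x reports.
gpu_type_for() {
    case "$1" in
        eks) echo "g5.4xlarge" ;;
        *)   echo "a10" ;;
    esac
}

# Hypothetical helper: print (not run) a catalog deployment command.
# Arguments: <deployment name> <model> <platform> [min replicas] [max replicas]
catalog_deploy_cmd() (
    name=$1; model=$2; platform=$3; min=${4:-}; max=${5:-}
    cmd="d3x llms deploy --name $name --model $model --type $(gpu_type_for "$platform")"
    if [ -n "$min" ] && [ -n "$max" ]; then
        # Refuse a replica range whose minimum exceeds its maximum.
        if [ "$min" -gt "$max" ]; then
            echo "min_replicas ($min) must not exceed max_replicas ($max)" >&2
            exit 1
        fi
        cmd="$cmd --min_replicas $min --max_replicas $max"
    fi
    echo "$cmd --kserve"
)

catalog_deploy_cmd llama38b meta-llama/Meta-Llama-3-8B-Instruct eks 1 2
```

Printing the command instead of running it keeps the sketch safe to try on a machine without d3x installed.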
To deploy a base LLM from the Huggingface repository using a custom configuration file, use the following command. Replace the parts enclosed within <> with the appropriate details.
Attention
Make sure the deployment configuration file for the LLM that you want to deploy is present in your workspace.
Note
If you are using an EKS setup, change the value of the --type flag from a10 to g5.4xlarge in the following command.
d3x llms deploy --name <deployment name> --config <path to deployment config file> --type <GPU type> --token <access token for the model (if required)> --kserve
d3x llms deploy --name llama38b --config /home/demo/llama38b.yaml --type a10 --token hf_Ahq***************WmO --kserve
Provide a unique name for the LLM deployment after the --name flag, replacing <deployment name> in the command.
Provide the absolute path of the LLM configuration file in your workspace after the --config flag.
Provide the Huggingface access token for the LLM after the --token flag, replacing <access token for the model (if required)> in the command.
Use the --publish flag to make the deployment details available to all users on the DKubeX setup.
Use the --kserve flag to deploy the LLM using KServe.
Use the --min_replicas and --max_replicas flags, each followed by a number, to set the minimum and maximum number of replicas for the deployment. For example, --min_replicas 1.
You can check the status of the deployment from the Deployment page in DKubeX or by running the following command.
d3x serve list
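Because --config expects an absolute path to a file that already exists in your workspace, a quick pre-flight check can catch mistakes before the deployment is submitted. The sketch below is illustrative only: check_config_path is a hypothetical helper, not a d3x feature, and the final line prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical pre-flight check for --config: the path must be absolute
# and must point to an existing regular file.
check_config_path() {
    case "$1" in
        /*) ;;
        *)  echo "config path must be absolute: $1" >&2; return 1 ;;
    esac
    if [ ! -f "$1" ]; then
        echo "config file not found: $1" >&2
        return 1
    fi
}

# Usage sketch: print the deploy command only if the check passes.
config=/home/demo/llama38b.yaml
if check_config_path "$config"; then
    echo "d3x llms deploy --name llama38b --config $config --type a10 --kserve"
fi
```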
Deploying Finetuned LLMs
You can deploy finetuned LLMs that are saved in your workspace or registered in MLFlow in your workspace.
To deploy a finetuned LLM saved in your workspace, use the following command. Replace the parts enclosed within <> with the appropriate details.
Note
If you are using an EKS setup, change the value of the --type flag from a10 to g5.4xlarge in the following command.
d3x llms deploy -n <name of the deployment> --base_model <base LLM name> -m <absolute path to the finetuned model> --type <GPU type> --token <access token for the model (if required)> --kserve
d3x llms deploy -n llama38bft --base_model meta-llama/Meta-Llama-3-8B-Instruct -m /home/demo/finetuned_llama38b --type a10 --token hf_Ahq*********************WmO --kserve
You can check the status of the deployment from the Deployment page in DKubeX or by running the following command.
d3x serve list
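Since the access token is needed only for some models, and typing it on the command line leaves it in the shell history, one option is to read it from an environment variable and add --token only when that variable is set. This is a minimal sketch under stated assumptions: deploy_finetuned, HF_TOKEN, and DRY_RUN are names chosen for illustration, and nothing here implies that d3x itself reads either variable.

```shell
#!/bin/sh
# Hypothetical wrapper around the finetuned-model deployment command.
# Arguments: <deployment name> <base model> <absolute model path> <GPU type>
deploy_finetuned() (
    set -- llms deploy -n "$1" --base_model "$2" -m "$3" --type "$4"
    # Add --token only when HF_TOKEN is set and non-empty.
    if [ -n "${HF_TOKEN:-}" ]; then
        set -- "$@" --token "$HF_TOKEN"
    fi
    # With DRY_RUN set, print the command instead of running it.
    ${DRY_RUN:+echo} d3x "$@" --kserve
)

# Dry run: shows the command that would be executed.
DRY_RUN=1
deploy_finetuned llama38bft meta-llama/Meta-Llama-3-8B-Instruct /home/demo/finetuned_llama38b a10
```

Note that a dry run prints the token when HF_TOKEN is set, so avoid it in shared terminals or logs.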
To deploy a finetuned LLM registered in MLFlow in your workspace, follow the steps below.
To list all LLMs registered in MLFlow, use the following command.
d3x mlflow models list
To deploy a finetuned LLM registered in MLFlow, use the following command. Replace the parts enclosed within <> with the appropriate details.
Note
If you are using an EKS setup, change the value of the --type flag from a10 to g5.4xlarge in the following command.
d3x llms deploy -n <name of the deployment> --base_model <base LLM name> --mlflow <name of registered model>:<model version> --type <GPU type> --token <access token for the model (if required)> --kserve
d3x llms deploy -n llama38bft --base_model meta-llama/Meta-Llama-3-8B-Instruct --mlflow llama38b:1 --type a10 --token hf_Ahq****************WmO --kserve
You can check the status of the deployment from the Deployment page in DKubeX or by running the following command.
d3x serve list
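The value passed to --mlflow combines the registered model name and its version as <name>:<version>. The sketch below builds that value and rejects obviously malformed input. mlflow_ref is a hypothetical helper, and the digits-only rule for versions is an assumption based on the llama38b:1 example, not a documented d3x constraint.

```shell
#!/bin/sh
# Hypothetical helper: join a registered model name and version into the
# <name>:<version> form passed to --mlflow.
# Assumption: versions consist of digits only, as in llama38b:1.
mlflow_ref() {
    case "$2" in
        ''|*[!0-9]*) echo "version must be numeric: '$2'" >&2; return 1 ;;
    esac
    if [ -z "$1" ]; then
        echo "model name must not be empty" >&2
        return 1
    fi
    echo "$1:$2"
}

# Usage sketch: print the deploy command only if the reference is valid.
ref=$(mlflow_ref llama38b 1) &&
    echo "d3x llms deploy -n llama38bft --base_model meta-llama/Meta-Llama-3-8B-Instruct --mlflow $ref --type a10 --kserve"
```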