Deploying Embedding Models with SkyPilot
When you deploy a model (embedding or LLM) on DKubeX with SkyPilot, a Sky-serve controller is first created locally on Kubernetes. This controller manages all deployments made on and from that DKubeX setup.
With SkyPilot, you can deploy embedding models on DKubeX from the DKubeX embedding model catalog.
Prerequisites
Make sure SkyPilot is configured and set up properly on your DKubeX setup. For details, visit Configuring SkyPilot on DKubeX.
Deploying Embedding Models from Catalog with SkyPilot
The complete list of embedding models registered in the DKubeX catalog is available in List of Embedding Models in DKubeX Embedding Model Catalog. To list the registered models from the terminal, run the following command:
d3x emb list
To deploy an embedding model from the DKubeX catalog, use the following command. The fields to fill in are described below.
d3x emb deploy --name <deployment-name> --model <model-name> --token <access token> -sky
For example:
d3x emb deploy --name bgecloud --model BAAI--bge-large-en-v1-5 --token hf_hstxh*******hd9cb -sky
Replace <deployment-name> with a unique name for the embedding model deployment.
Replace <model-name> with the embedding model name from the DKubeX catalog.
Replace <access token> after the --token flag with the Hugging Face access token for the embedding model.
Use the --publish flag to make the deployment details available to all users on the DKubeX setup.
Use the --min_replicas and --max_replicas flags, each followed by a number, to set the minimum and maximum number of replicas for the deployment. For example, --min_replicas 1.
To create the deployment with a different type of accelerator from the default, use the --sky-accelerator flag followed by the type and number of accelerators to use. For example, --sky-accelerator A10G.
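The optional flags can be combined in one invocation. The following is a minimal sketch, not a documented recipe: deploy_embedding and the DRY_RUN and HF_TOKEN variables are hypothetical names introduced here, the value 2 for --max_replicas is only illustrative, and passing all optional flags together in this order is an assumption. By default the helper prints the assembled command for review instead of running it.

```shell
# Hypothetical helper: assemble a `d3x emb deploy` command with the optional flags.
# With DRY_RUN=1 (the default) it only prints the command so it can be reviewed.
deploy_embedding() {
  # $1 = deployment name, $2 = catalog model name, $3 = Hugging Face access token
  set -- d3x emb deploy --name "$1" --model "$2" --token "$3" \
    --publish \
    --min_replicas 1 --max_replicas 2 \
    --sky-accelerator A10G \
    -sky
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"   # print the assembled command
  else
    "$@"                 # run it (requires the d3x CLI)
  fi
}

# HF_TOKEN is an assumed environment variable; the template placeholder is the fallback.
deploy_embedding bgecloud BAAI--bge-large-en-v1-5 "${HF_TOKEN:-<access token>}"
```

Set DRY_RUN=0 to execute the command once the printed form looks right.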
To check the status of the deployment and the service replica on SkyPilot, run the following command:
d3x sky status -ra
Once both the service and the service replica are in ready status, the deployment is ready to use.
Note
If the deployment is created with 0 replicas, the deployment service shows a NO REPLICA state. To bring up a service replica for the deployment, send a test request to the service endpoint of the deployment, as shown below:
curl xxx.xxx.xxx.xxx:xxxxx
For example:
curl 123.45.67.89:30001
Once the deployment is ready, you can get its service endpoint and service token by visiting the Deployments page in DKubeX and opening the deployment's details page, or by running the following command in the terminal:
d3x serve list
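For a deployment created with 0 replicas, the test request described in the note can be wrapped in a small retry loop. This is a sketch under assumptions: warm_up and the CURL_BIN override are hypothetical names introduced here, and the attempt count, delay, and 10-second timeout are arbitrary defaults. A successful request only means that the endpoint answered; replica readiness still has to be confirmed with d3x sky status -ra.

```shell
# Hypothetical helper: send test requests to a service endpoint until one is answered.
# CURL_BIN is an assumed override so a stub can replace curl when testing the loop.
warm_up() {
  # $1 = service endpoint (host:port), $2 = max attempts, $3 = seconds between attempts
  endpoint=$1
  attempts=${2:-5}
  delay=${3:-30}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --silent and --output discard the response body; --max-time bounds each request.
    if "${CURL_BIN:-curl}" --silent --output /dev/null --max-time 10 "$endpoint"; then
      echo "request $i reached $endpoint"
      return 0
    fi
    echo "request $i failed" >&2
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example call, using the sample endpoint from the note:
# warm_up 123.45.67.89:30001
```

After the call returns, run d3x sky status -ra again and wait for the service replica to reach ready status.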