Deploying Embedding Models on DKubeX
Embedding models registered in the DKubeX LLM Catalog can be deployed on the DKubeX platform using local resources with KServe. The steps to deploy them are given below.
For a detailed guide regarding deploying embedding models in DKubeX using SkyPilot, visit Deploying Embedding Models with SkyPilot.
To list all embedding models registered in the DKubeX LLM Catalog, use the following command.
d3x emb list
Deploying Embedding Models from DKubeX Embedding Model Catalog
To deploy an embedding model from the DKubeX embedding model catalog using KServe, use the command given below. The first line shows the general form and the second a concrete example.
d3x emb deploy --name <name of the deployment> --model <emb Name> --publish --kserve
d3x emb deploy -n bge-large --model BAAI--bge-large-en-v1-5 --publish --kserve
Provide a unique name for the embedding model deployment after the --name flag, replacing <name of the deployment> in the command.
Replace <emb Name> with the name of the embedding model from the DKubeX catalog after the --model flag.
Use the --publish flag to make the deployment details available to all users on the DKubeX setup.
Use the --kserve flag to deploy the model using KServe.
Use the --min_replicas and --max_replicas flags to specify the minimum and maximum number of replicas for the deployment. For example, --min_replicas 1.
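As a minimal sketch of how the flags above combine, the helper below assembles a catalog deployment command with replica bounds and prints it for review instead of running it. The function name build_emb_deploy_cmd and the replica values 1 and 2 are illustrative assumptions; only flags named in this section are used.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the d3x CLI): assembles the deploy command
# from the flags described in this section and prints it without executing it.
build_emb_deploy_cmd() {
  local name="$1" model="$2" min="$3" max="$4"
  printf 'd3x emb deploy --name %s --model %s --publish --kserve --min_replicas %s --max_replicas %s\n' \
    "$name" "$model" "$min" "$max"
}

# Illustrative values: the deployment and model names come from the example above;
# the replica counts are assumptions.
build_emb_deploy_cmd bge-large BAAI--bge-large-en-v1-5 1 2
```

Printing the command first makes it easy to check the deployment name and replica bounds before running it on the DKubeX setup.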
Deploying Embedding Models with Custom Configuration File
To deploy an embedding model from a Hugging Face repository with a custom configuration file using KServe, use the command given below.
d3x emb deploy --name <name of the deployment> --config <path to config file> --publish --kserve
d3x emb deploy -n bge-large --config /home/demo/bge_config.yaml --publish --kserve
Provide a unique name for the embedding model deployment after the --name flag, replacing <name of the deployment> in the command.
Provide the absolute path of the embedding model configuration file in your workspace after the --config flag, replacing <path to config file>.
Use the --publish flag to make the deployment details available to all users on the DKubeX setup.
Use the --kserve flag to deploy the model using KServe.
Use the --min_replicas and --max_replicas flags to specify the minimum and maximum number of replicas for the deployment. For example, --min_replicas 1.
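Because the --config flag expects an absolute path, a small guard can catch a relative path before the command is run. The sketch below is one possible way to do that. The function name build_emb_config_deploy_cmd is a hypothetical helper, and it only prints the command rather than executing it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the d3x CLI): rejects relative config paths,
# then prints the config-based deploy command without executing it.
build_emb_config_deploy_cmd() {
  local name="$1" config="$2"
  case "$config" in
    /*) ;;  # absolute path, as this section requires
    *)
      echo "config path must be absolute: $config" >&2
      return 1
      ;;
  esac
  printf 'd3x emb deploy --name %s --config %s --publish --kserve\n' "$name" "$config"
}

# Values taken from the example command in this section.
build_emb_config_deploy_cmd bge-large /home/demo/bge_config.yaml
```

The check only inspects the leading slash. It does not verify that the file exists or that its contents are valid.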