Deploying Embedding Models on DKubeX

Embedding models registered in the DKubeX LLM Catalog can be deployed on the DKubeX platform using local resources with KServe. The steps to deploy them are given below.

For a detailed guide to deploying embedding models in DKubeX using SkyPilot, see Deploying Embedding Models with SkyPilot.

  • To list all embedding models registered in the DKubeX LLM Catalog, use the following command.

    d3x emb list
    

Deploying Embedding Models from DKubeX Embedding Model Catalog

  • To deploy an embedding model from the DKubeX embedding model catalog using KServe, use the command given below.

    d3x emb deploy --name <deployment-name> --model <model-name> --publish --kserve
    
    • Provide a unique name for the embedding model deployment after the --name flag, replacing <deployment-name> in the command.

    • Replace <model-name> with the name of the embedding model from the DKubeX catalog after the --model flag.

    • Use the --publish flag to make the deployment details available to all users on the DKubeX setup.

    • Use the --kserve flag to deploy the model using KServe.

    • Use the --min_replicas and --max_replicas flags to specify the minimum and maximum number of replicas for the deployment. For example, --min_replicas 1.
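As a rough illustration, the flags described above can be combined into a single invocation. The sketch below uses only the flags listed in this section; the deployment name, the model name, the replica counts, and the DRY_RUN variable are hypothetical placeholders for this example, not values or features defined by DKubeX. By default it only prints the assembled command so that it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical values; replace them with your own.
DEPLOYMENT_NAME="my-emb-deployment"   # hypothetical deployment name
MODEL_NAME="my-catalog-model"         # hypothetical; use a name printed by `d3x emb list`

# Assemble the documented command, with example replica bounds.
set -- d3x emb deploy --name "$DEPLOYMENT_NAME" --model "$MODEL_NAME" \
    --publish --kserve --min_replicas 1 --max_replicas 2

# Print the command for review; execute it only when DRY_RUN=0 is set.
# DRY_RUN is a convention of this sketch, not a d3x option.
DEPLOY_CMD="$*"
echo "$DEPLOY_CMD"
if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
fi
```

Running the script as is prints the command; running it with DRY_RUN=0 on a DKubeX workspace would submit the deployment.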

Deploying Embedding Models with Custom Configuration File

  • To deploy an embedding model from a Hugging Face repository with KServe using a custom configuration file, use the command given below.

    d3x emb deploy --name <deployment-name> --config <path to config file> --publish --kserve
    
    • Provide a unique name for the embedding model deployment after the --name flag, replacing <deployment-name> in the command.

    • Provide the absolute path of the embedding model configuration file in your workspace after the --config flag.

    • Use the --publish flag to make the deployment details available to all users on the DKubeX setup.

    • Use the --kserve flag to deploy the model using KServe.

    • Use the --min_replicas and --max_replicas flags to specify the minimum and maximum number of replicas for the deployment. For example, --min_replicas 1.
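Because the --config flag expects an absolute path, it can help to check the path before deploying. The following is a minimal sketch under stated assumptions: the deployment name, the configuration file path, and the DRY_RUN variable are hypothetical and exist only for this example, and the contents of the configuration file are not shown here. It uses only the flags described above and prints the command by default instead of running it.

```shell
#!/bin/sh
# Hypothetical values; replace them with your own.
DEPLOYMENT_NAME="my-custom-emb"                    # hypothetical deployment name
CONFIG_PATH="/home/user/configs/emb-config.yaml"   # hypothetical absolute path in the workspace

# The --config flag expects an absolute path, so reject relative ones early.
case "$CONFIG_PATH" in
    /*) PATH_KIND="absolute" ;;
    *)  PATH_KIND="relative" ;;
esac
if [ "$PATH_KIND" != "absolute" ]; then
    echo "error: $CONFIG_PATH is not an absolute path" >&2
    exit 1
fi

# Assemble the documented command.
set -- d3x emb deploy --name "$DEPLOYMENT_NAME" --config "$CONFIG_PATH" --publish --kserve

# Print the command for review; execute it only when DRY_RUN=0 is set.
# DRY_RUN is a convention of this sketch, not a d3x option.
DEPLOY_CMD="$*"
echo "$DEPLOY_CMD"
if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
fi
```

The --min_replicas and --max_replicas flags can be appended to the `set --` line in the same way as for catalog deployments.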