Deploying Embedding Models on DKubeX

Embedding models registered in the DKubeX LLM Catalog can be deployed on the DKubeX platform using local resources. The steps to deploy them are given below.

For a detailed guide to deploying embedding models in DKubeX with SkyPilot, see Deploying Embedding Models with SkyPilot.

Deploying Embedding Models on DKubeX UI

To deploy an Embedding Model through the DKubeX UI, use the following steps:

  • Open the Deployments page in your DKubeX workspace. This page lists all model deployments that are currently running or were run previously.

  • To create a new Embedding Model deployment, click the “Create Deployment” button (shown as a “+” button in the top-left corner of the Deployments page).

  • On the General page, provide the following details:

    • Select Embedding Model as the type of deployment to be launched.

    • Provide a unique name for the Embedding Model deployment in the Name field.

    • Select KServe as the deployment framework to deploy the Embedding Model with local resources, or select Sky to deploy it with SkyPilot. If you select Sky, you can also provide deployment configuration details.

    • Once done, click on the Next button to proceed to the Configuration page.

  • On the Configuration page, provide the following details:

    • In the Model Configuration section, provide the following details:

      Field            Description
      Source           Select the source from where the Embedding Model has to be deployed.
      Provider         Select the provider of the Embedding Model.
      Embedding Model  Select the Embedding Model to be deployed from the provided catalog.
      Token            Provide the access token for the Embedding Model (if required).

    • Once done, click on the Next button to proceed to the Advanced page.

  • On the Advanced page, provide the following details:

    • In the Advanced Configuration section, provide the following details:

      Field     Description
      CPU       Provide the number of CPU cores to be allocated for the Embedding Model deployment.
      Memory    Provide the amount of memory to be allocated for the Embedding Model deployment.
      GPU       Provide the number of GPUs to be allocated for the Embedding Model deployment.
      QPS       Provide the Queries Per Second (QPS) limit for the Embedding Model deployment.
      Replicas  Provide the number of minimum and maximum replicas for the Embedding Model deployment.

    • Once done, click on the Submit button to create the Embedding Model deployment.

Once the deployment reaches the Running state, it is ready to use. To view the deployment details, click the deployment name on the Deployments page.

Deploying Embedding Models from CLI

  • To list all embedding models registered in the DKubeX LLM Catalog, use the following command.

    d3x emb list
    

Deploying Embedding Models from DKubeX Embedding Model Catalog

  • To deploy an embedding model from the DKubeX embedding model catalog using KServe, use the command given below.

    d3x emb deploy --name <name of the deployment> --model <emb Name> --token <access token> --publish
    
    • Provide a unique name for the embedding model deployment after the --name flag, replacing <name of the deployment> in the command.

    • Replace <emb Name> after the --model flag with the name of the embedding model from the DKubeX catalog.

    • Provide the Hugging Face access token for the embedding model after the --token flag, replacing <access token> in the command.

    • Use the --publish flag to make the deployment details available for all users on the DKubeX setup.

    • Use the --kserve flag to deploy the model using KServe.

    • Use the --min_replicas and --max_replicas flags to specify the minimum and maximum number of replicas for the deployment. For example, --min_replicas 1.
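As a worked illustration of the flags above, the following shell sketch assembles a catalog deployment command with explicit replica limits. The deployment name, the model name, the HF_TOKEN and DRY_RUN environment variables, and the helper function deploy_embedding are hypothetical and not part of DKubeX. Combining all of the flags in one invocation is also an assumption. By default the sketch only prints the command; it runs d3x only when DRY_RUN is set to 0.

```shell
# Hypothetical helper, not part of the d3x CLI.
# $1 = deployment name, $2 = model name, $3 = min replicas, $4 = max replicas
deploy_embedding() {
  # Assemble the command from the flags described above.
  # HF_TOKEN is an assumed environment variable holding the access token.
  set -- d3x emb deploy \
    --name "$1" \
    --model "$2" \
    --token "${HF_TOKEN:-<access token>}" \
    --kserve \
    --min_replicas "$3" \
    --max_replicas "$4" \
    --publish

  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run (default): print the command instead of executing it.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Placeholder values: choose a real model name from `d3x emb list`.
deploy_embedding my-embedding example-embedding-model 1 2
```

Review the printed command first, then rerun with DRY_RUN=0 once the values are correct.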

Deploying Embedding Models with Custom Configuration File

  • To deploy an embedding model from a Hugging Face repository with a custom configuration file using KServe, use the command given below.

    d3x emb deploy --name <name of the deployment> --config <path to config file> --token <access token> --publish
    
    • Provide a unique name for the embedding model deployment after the --name flag, replacing <name of the deployment> in the command.

    • Provide the absolute path of the embedding model configuration file in your workspace after the --config flag, replacing <path to config file> in the command.

    • Provide the Hugging Face access token for the embedding model after the --token flag, replacing <access token> in the command.

    • Use the --publish flag to make the deployment details available for all users on the DKubeX setup.

    • Use the --kserve flag to deploy the model using KServe.

    • Use the --min_replicas and --max_replicas flags to specify the minimum and maximum number of replicas for the deployment. For example, --min_replicas 1.
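Because the --config flag expects an absolute path, a small wrapper can catch a relative path before the command runs. The sketch below is illustrative only: the helper function deploy_from_config, the HF_TOKEN and DRY_RUN environment variables, the deployment name, and the configuration file path (including its .yaml extension) are hypothetical. By default it only prints the command; it runs d3x only when DRY_RUN is set to 0.

```shell
# Hypothetical helper, not part of the d3x CLI.
# $1 = deployment name, $2 = path to the configuration file
deploy_from_config() {
  # Reject relative paths: the --config flag is documented as taking an absolute path.
  case "$2" in
    /*) ;;
    *)
      echo "error: config path must be absolute: $2" >&2
      return 1
      ;;
  esac

  # HF_TOKEN is an assumed environment variable holding the access token.
  set -- d3x emb deploy \
    --name "$1" \
    --config "$2" \
    --token "${HF_TOKEN:-<access token>}" \
    --kserve \
    --publish

  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run (default): print the command instead of executing it.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Placeholder values: substitute your own deployment name and file path.
deploy_from_config my-custom-embedding /home/example-user/emb-config.yaml
```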