Deploying Models from UI
Deploying models on KServe
To create a model deployment in DKubeX with KServe, go to the Deployments tab on the left panel and click the “+” icon. Then follow these steps:
General – Basic Deployment Settings
Deployment Type: Select the LLM radio button to create an LLM deployment, or Embedding for an embedding model.
Name: Enter a unique name for your deployment.
Framework
* Type: Select KServe from the dropdown; this deploys the model using KServe.
* Published: (Optional) Check this box to make the deployment publicly accessible.
Buttons
* Cancel: Exits the deployment setup process.
* Back: Disabled on this screen, since General is the first step.
* Next: Proceeds to the next step, Model Configuration.
Model Configuration
Source: Choose the source from which the model is pulled.
Option 1 – MLflow Registered Model (the model must already be registered in MLflow; see the sketch after the source options)
* Image: Docker image that serves the MLflow model.
* Token: Access token for the image registry or MLflow server.
* Model: Name of the MLflow-registered model.
* Model Version: Specific version of the model to deploy.
* Base Model Configuration: General configuration for the model.
* Name: Name for the deployment instance.
Option 2 – Custom LLM Model
* Token: API key or access token for external sources such as Hugging Face or private servers.
* Configuration File: Upload a file that defines model parameters, endpoints, and authentication.
Option 3 – DKubeX LLM Deployment
* Source: Default option for models already present in DKubeX.
* Provider: Select a provider from the dropdown (e.g., “Dkubex”, “Nim”).
* LLM Model: Choose from the list of supported models.
* Token: Required when deploying external models.
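Option 1 assumes the model already exists in the MLflow registry. For reference, here is a minimal registration sketch in Python; the tracking URI and registered name (“my-model”) are placeholders, and the toy scikit-learn model stands in for your own:

```python
# Minimal sketch: register a model in MLflow so it appears under
# "MLflow Registered Model". The URI and names below are placeholders.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

mlflow.set_tracking_uri("http://mlflow.example.com")  # assumed tracking server

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

with mlflow.start_run():
    # Logging with registered_model_name registers the model (or adds a
    # new version if the name already exists).
    mlflow.sklearn.log_model(model, "model", registered_model_name="my-model")
```

Each such run adds a new version under the same name, which is what the Model Version field lets you pick.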
Buttons
* Cancel: Aborts and exits the configuration.
* Back: Returns to the previous step.
* Next: Proceeds to the final step, Advanced Configuration.
Advanced Configuration
Instance Type: Enter the instance type based on your cloud provider (e.g., “g4dn.xlarge”).
CPU: Allocate the number of virtual CPUs (e.g., 8).
Memory: Enter the memory allocation (e.g., “16GB”).
GPU: Specify the number of GPUs (e.g., 1).
QPS: (Optional) Set the number of Queries Per Second if rate limiting is needed.
Replicas (Min): Set the minimum number of instances (e.g., 1).
Replicas (Max): Set the maximum number of instances (e.g., 2).
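For orientation: these fields roughly correspond to the resource and autoscaling settings of the KServe InferenceService that DKubeX manages for you. The manifest below is a hypothetical sketch of that mapping, not what DKubeX actually generates; the name and image are placeholders, and QPS would map to an autoscaling target:

```yaml
# Hypothetical illustration only -- DKubeX generates the real manifest.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-llm                       # the deployment Name
spec:
  predictor:
    minReplicas: 1                   # Replicas (Min)
    maxReplicas: 2                   # Replicas (Max)
    nodeSelector:
      node.kubernetes.io/instance-type: g4dn.xlarge   # Instance Type
    containers:
      - name: kserve-container
        image: my-serving-image:latest   # placeholder serving image
        resources:
          limits:
            cpu: "8"                 # CPU
            memory: 16Gi             # Memory
            nvidia.com/gpu: "1"      # GPU
```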
Buttons
* Cancel: Aborts and exits the deployment setup.
* Back: Returns to the previous step.
* Submit: Finalizes and submits the deployment.
Once submitted, the deployment will appear on the Deployments page. Wait until the status changes to running.
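Once it is running, you can exercise the endpoint from any HTTP client. A minimal sketch, assuming an OpenAI-compatible chat route; the base URL, API key, and route are assumptions, so copy the real values from the deployment’s details page:

```python
# Client sketch; endpoint host, path, and auth header are assumptions.
import requests

BASE_URL = "https://dkubex.example.com/deployments/my-llm"  # placeholder
API_KEY = "..."  # placeholder; use your deployment's key

resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",           # assumed OpenAI-style route
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "my-llm",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```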
Deploying models with SkyPilot
To deploy a model using SkyPilot in DKubeX, go to the Deployments tab and click the “+” icon. Then follow these steps:
General – Basic Deployment Settings
Deployment Type: Select the LLM radio button to create an LLM deployment, or Embedding for an embedding model.
Name: Enter a name for your deployment.
Framework
* Type: Select Sky from the dropdown.
* Published: (Optional) Check to publish the deployment.
* Sky Cloud Name: Enter your SkyPilot-configured cloud name (e.g., “aws:us-west-2”).
* Accelerator: Define the hardware (e.g., “A100:1”, “V100:1”, “T4:1”).
* Sky Configuration: Upload your .yaml SkyPilot configuration file; see the sketch after this list.
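As a rough sketch of what such a SkyPilot task file can look like: the `resources` block should agree with the Sky Cloud Name and Accelerator fields above, while the `setup` and `run` commands (an illustrative vLLM server here) are placeholders for your own serving stack:

```yaml
# Illustrative SkyPilot task file; commands and model are placeholders.
resources:
  cloud: aws
  region: us-west-2          # matches "aws:us-west-2" above
  accelerators: A100:1       # matches the Accelerator field

setup: |
  pip install vllm           # placeholder serving framework

run: |
  python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b-chat-hf --port 8080
```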
Buttons
* Cancel: Exits the deployment setup process.
* Back: Disabled on this screen, since General is the first step.
* Next: Proceeds to the next step, Model Configuration.
Model Configuration
Source: Choose the source from which the model is pulled.
Option 1 – MLflow Registered Model
* Image: Docker image that serves the MLflow model.
* Token: Access token for the image registry or MLflow server.
* Model: Name of the MLflow-registered model.
* Model Version: Specific version of the model to deploy.
* Base Model Configuration: General configuration for the model.
* Name: Name for the deployment instance.
Option 2 – Custom LLM Model
* Token: API key or access token for external sources such as Hugging Face or private servers.
* Configuration File: Upload a file that defines model parameters, endpoints, and authentication.
Option 3 – DKubeX LLM Deployment
* Source: Default option for models already present in DKubeX.
* Provider: Select a provider from the dropdown (e.g., “Dkubex”, “Nim”).
* LLM Model: Choose from the list of supported models.
* Token: Required when deploying external models.
Buttons
* Cancel: Aborts and exits the configuration.
* Back: Returns to the previous step.
* Next: Proceeds to the final step, Advanced Configuration.
Advanced Configuration
Instance Type: Enter the instance type based on your cloud provider (e.g., “g4dn.xlarge”).
CPU: Allocate the number of virtual CPUs (e.g., 8).
Memory: Enter the memory allocation (e.g., “16GB”).
GPU: Specify the number of GPUs (e.g., 1).
QPS: (Optional) Set the number of Queries Per Second if rate limiting is needed.
Replicas (Min): Set the minimum number of instances (e.g., 1).
Replicas (Max): Set the maximum number of instances (e.g., 2).
Buttons
* Cancel: Aborts and exits the deployment setup.
* Back: Returns to the previous step.
* Submit: Finalizes and submits the deployment.
Once submitted, the deployment will appear on the Deployments page. Wait until the status changes to running.
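For an Embedding deployment, the client call is analogous to the LLM example in the KServe section. A sketch assuming an OpenAI-style embeddings route; the base URL, API key, and model name are placeholders:

```python
# Embedding request sketch; route and auth are assumptions, as above.
import requests

BASE_URL = "https://dkubex.example.com/deployments/my-embedder"  # placeholder
API_KEY = "..."  # placeholder; use your deployment's key

resp = requests.post(
    f"{BASE_URL}/v1/embeddings",                 # assumed OpenAI-style route
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"model": "my-embedder", "input": ["some text to embed"]},
    timeout=60,
)
resp.raise_for_status()
vector = resp.json()["data"][0]["embedding"]
print(len(vector), "dimensions")
```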