Deploying Models from UI
Deploying models on KServe
To create a model deployment in DKubeX with KServe, go to the Deployments tab on the left panel and click the “+” icon. Then follow these steps:
General – Basic Deployment Settings
Deployment Type: Select the LLM radio button to create an LLM deployment, or Embedding for an embedding model.
Name: Enter a unique name for your deployment.
Framework
* Type: Select KServe from the dropdown; this deploys the model using KServe.
* Published: (Optional) Check this box to make the deployment publicly accessible.
Buttons
* Cancel: Exits the deployment setup process.
* Back: Disabled on this screen, since General is the first step.
* Next: Proceeds to the next step, Model Configuration.
Model Configuration
Source: Choose the source from which the model is pulled.
Option 1 – MLflow Registered Model (the model must already be registered in MLflow; see the sketch after the source options)
* Image: Docker image that serves the MLflow model.
* Token: Access token for the image registry or MLflow server.
* Model: Name of the MLflow-registered model.
* Model Version: Specific version of the model to deploy.
* Base Model Configuration: General configuration for the model.
* Name: Name for the deployment instance.
Option 2 – Custom LLM Model
* Token: API key or access token for external sources such as Hugging Face or private servers.
* Configuration File: Upload a file that defines model parameters, endpoints, and authentication.
Option 3 – DKubeX LLM Deployment
* Source: Default option for models already present in DKubeX.
* Provider: Select a provider from the dropdown (e.g., “Dkubex”, “Nim”).
* LLM Model: Choose from the list of supported models.
* Token: Required when deploying external models.
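Option 1 assumes the model already exists in the MLflow registry. For reference, here is a minimal registration sketch in Python; the tracking URI and registered name (“my-model”) are placeholders, and the toy scikit-learn model stands in for your own:

```python
# Minimal sketch: register a model in MLflow so it appears under
# "MLflow Registered Model". The URI and names below are placeholders.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

mlflow.set_tracking_uri("http://mlflow.example.com")  # assumed tracking server

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

with mlflow.start_run():
    # Logging with registered_model_name registers the model (or adds a
    # new version if the name already exists).
    mlflow.sklearn.log_model(model, "model", registered_model_name="my-model")
```

Each such run adds a new version under the same name, which is what the Model Version field lets you pick.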
Buttons
* Cancel: Aborts and exits the configuration.
* Back: Returns to the previous step.
* Next: Proceeds to the final step, Advanced Configuration.
Advanced Configuration
Instance Type: Enter the instance type based on your cloud provider (e.g., “g4dn.xlarge”).
CPU: Allocate the number of virtual CPUs (e.g., 8).
Memory: Enter the memory allocation (e.g., “16GB”).
GPU: Specify the number of GPUs (e.g., 1).
QPS: (Optional) Set the number of Queries Per Second if rate limiting is needed.
Replicas (Min): Set the minimum number of instances (e.g., 1).
Replicas (Max): Set the maximum number of instances (e.g., 2).
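For orientation: these fields roughly correspond to the resource and autoscaling settings of the KServe InferenceService that DKubeX manages for you. The manifest below is a hypothetical sketch of that mapping, not what DKubeX actually generates; the name and image are placeholders, and QPS would map to an autoscaling target:

```yaml
# Hypothetical illustration only -- DKubeX generates the real manifest.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-llm                       # the deployment Name
spec:
  predictor:
    minReplicas: 1                   # Replicas (Min)
    maxReplicas: 2                   # Replicas (Max)
    nodeSelector:
      node.kubernetes.io/instance-type: g4dn.xlarge   # Instance Type
    containers:
      - name: kserve-container
        image: my-serving-image:latest   # placeholder serving image
        resources:
          limits:
            cpu: "8"                 # CPU
            memory: 16Gi             # Memory
            nvidia.com/gpu: "1"      # GPU
```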
Buttons
* Cancel: Aborts and exits the deployment setup.
* Back: Returns to the previous step.
* Submit: Finalizes and submits the deployment.
Once submitted, the deployment will appear on the Deployments page. Wait until the status changes to running.
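Once it is running, you can exercise the endpoint from any HTTP client. A minimal sketch, assuming an OpenAI-compatible chat route; the base URL, API key, and route are assumptions, so copy the real values from the deployment’s details page:

```python
# Client sketch; endpoint host, path, and auth header are assumptions.
import requests

BASE_URL = "https://dkubex.example.com/deployments/my-llm"  # placeholder
API_KEY = "..."  # placeholder; use your deployment's key

resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",           # assumed OpenAI-style route
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "my-llm",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```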
Deploying models with SkyPilot
To deploy a model using SkyPilot in DKubeX, go to the Deployments tab and click the “+” icon. Then follow these steps:
General – Basic Deployment Settings
Deployment Type: Select the LLM radio button to create an LLM deployment, or Embedding for an embedding model.
Name: Enter a name for your deployment.
Framework
* Type: Select Sky from the dropdown.
* Published: (Optional) Check to publish the deployment.
* Sky Cloud Name: Enter your SkyPilot-configured cloud name (e.g., “aws:us-west-2”).
* Accelerator: Define the hardware (e.g., “A100:1”, “V100:1”, “T4:1”).
* Sky Configuration: Upload your .yaml SkyPilot configuration file; see the sketch after this list.
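As a rough sketch of what such a SkyPilot task file can look like: the `resources` block should agree with the Sky Cloud Name and Accelerator fields above, while the `setup` and `run` commands (an illustrative vLLM server here) are placeholders for your own serving stack:

```yaml
# Illustrative SkyPilot task file; commands and model are placeholders.
resources:
  cloud: aws
  region: us-west-2          # matches "aws:us-west-2" above
  accelerators: A100:1       # matches the Accelerator field

setup: |
  pip install vllm           # placeholder serving framework

run: |
  python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b-chat-hf --port 8080
```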
Buttons
* Cancel: Exits the deployment setup process.
* Back: Disabled on this screen, since General is the first step.
* Next: Proceeds to the next step, Model Configuration.
Model Configuration
Source: Choose the source from which the model is pulled.
Option 1 – MLflow Registered Model
* Image: Docker image that serves the MLflow model.
* Token: Access token for the image registry or MLflow server.
* Model: Name of the MLflow-registered model.
* Model Version: Specific version of the model to deploy.
* Base Model Configuration: General configuration for the model.
* Name: Name for the deployment instance.
Option 2 – Custom LLM Model
* Token: API key or access token for external sources such as Hugging Face or private servers.
* Configuration File: Upload a file that defines model parameters, endpoints, and authentication.
Option 3 – DKubeX LLM Deployment
* Source: Default option for models already present in DKubeX.
* Provider: Select a provider from the dropdown (e.g., “Dkubex”, “Nim”).
* LLM Model: Choose from the list of supported models.
* Token: Required when deploying external models.
Buttons
* Cancel: Aborts and exits the configuration.
* Back: Returns to the previous step.
* Next: Proceeds to the final step, Advanced Configuration.
Advanced Configuration
Instance Type: Enter the instance type based on your cloud provider (e.g., “g4dn.xlarge”).
CPU: Allocate the number of virtual CPUs (e.g., 8).
Memory: Enter the memory allocation (e.g., “16GB”).
GPU: Specify the number of GPUs (e.g., 1).
QPS: (Optional) Set the number of Queries Per Second if rate limiting is needed.
Replicas (Min): Set the minimum number of instances (e.g., 1).
Replicas (Max): Set the maximum number of instances (e.g., 2).
Buttons
* Cancel: Aborts and exits the deployment setup.
* Back: Returns to the previous step.
* Submit: Finalizes and submits the deployment.
Once submitted, the deployment will appear on the Deployments page. Wait until the status changes to running.
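For an Embedding deployment, the client call is analogous to the LLM example in the KServe section. A sketch assuming an OpenAI-style embeddings route; the base URL, API key, and model name are placeholders:

```python
# Embedding request sketch; route and auth are assumptions, as above.
import requests

BASE_URL = "https://dkubex.example.com/deployments/my-embedder"  # placeholder
API_KEY = "..."  # placeholder; use your deployment's key

resp = requests.post(
    f"{BASE_URL}/v1/embeddings",                 # assumed OpenAI-style route
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"model": "my-embedder", "input": ["some text to embed"]},
    timeout=60,
)
resp.raise_for_status()
vector = resp.json()["data"][0]["embedding"]
print(len(vector), "dimensions")
```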