Serving Models and Open-source LLMs

Hugging Face models, models registered in MLFlow, and LLMs can all be deployed in DKubeX. For the steps required to serve a particular type of model, follow the reference beside its description below.

Description                                          Reference
Deploying models registered in MLFlow in DKubeX      Deploying models from MLFlow
Deploying models from the Hugging Face repo          Deploying Hugging Face models
Deploying LLMs registered in DKubeX                  Deploying LLMs in DKubeX

Deploying models from MLFlow

Models registered on the Models page of MLFlow in DKubeX can be deployed.

  • On the DKubeX Workspace, click on the terminal application to open the DKubeX CLI in a new tab.

  • Type the following command, replacing the placeholders enclosed in <> with the appropriate details.

d3x serve create -n <deployment name> -r mlflow \
  --model <model name> --model_version <model version number> \
  --repo_name <name of repo containing serving script> --repo_org <username of repo owner> \
  --is_private_repo --access_token <your github personal access token> \
  --branch_name <name of the branch containing the serving script> \
  --depfilepath <deployment file path> --ngpus <number of gpus> \
  --min_cpu <min number of cpus for cluster creation> --max_cpu <max number of cpus for cluster creation> \
  --min_memory <min memory for cluster creation> --max_memory <max memory for cluster creation> \
  --min_replicas <min number of deployment replicas> --max_replicas <max number of deployment replicas>
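
For illustration, a hypothetical invocation might look like the following sketch. Every value here is a placeholder: the deployment name, model name and version, repository details, and file path are made up, and the resource values assume the units your cluster expects.

    # Hypothetical example: deploy version 2 of an MLFlow-registered model named
    # "sentiment-model", with the serving script kept in a private GitHub repo.
    # All names and values below are placeholders, not real resources.
    d3x serve create -n sentiment-dep -r mlflow \
      --model sentiment-model --model_version 2 \
      --repo_name serving-scripts --repo_org example-user \
      --is_private_repo --access_token <your github personal access token> \
      --branch_name main --depfilepath serving/deploy.py \
      --ngpus 1 --min_cpu 2 --max_cpu 4 \
      --min_memory 4 --max_memory 8 \
      --min_replicas 1 --max_replicas 2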

Note

  • In the serving command, if values for hardware type, CPU, GPU, memory, and replicas are not provided, the default values are used.

  • Use the --is_private_repo and --access_token flags if the repository containing the serving script is private. Also make sure that you have access to that repository.

  • To find the model name and version number, use the following steps:

    • Open the MLFlow tab on DKubeX UI.

    • Open the Models page in MLFlow. This will show the list of all registered models.

    • Note down the name and the version number of the model you are going to deploy.

  • After entering all necessary details, press Enter. The deployment creation will start automatically.

  • On the DKubeX UI, go to the Deployments tab. The list on the screen should show your deployment, along with its status, the name of the serve cluster, and the serve endpoint.

    • The endpoint will show up after the deployment is successful, and the status turns to running.

    • Clicking on the name of the deployment opens the page containing all the details of that particular deployment.

    • Clicking on the name of the serve cluster opens the Ray dashboard for that cluster.


Deploying Hugging Face models

Pre-trained models available on Hugging Face can be deployed in DKubeX.

  • On the DKubeX Workspace, click on the terminal application to open the DKubeX CLI in a new tab.

  • Type the following command, replacing the placeholders enclosed in <> with the appropriate details.

    d3x serve create -n <deployment name> -r hugging_face \
      --hface_repoid <repo-id> --hface_tokenizer <tokenizer> --hface_classifier <classifier> \
      --repo_name <name of repo containing serving script> --repo_org <username of repo owner> \
      --is_private_repo --access_token <your github personal access token> \
      --branch_name <name of the branch containing the serving script> \
      --depfilepath <deployment file path> --ngpus <number of gpus> \
      --min_cpu <min number of cpus for cluster creation> --max_cpu <max number of cpus for cluster creation> \
      --min_memory <min memory for cluster creation> --max_memory <max memory for cluster creation> \
      --min_replicas <min number of deployment replicas> --max_replicas <max number of deployment replicas>
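
For illustration, a hypothetical invocation for the BioGPT model used as the example in the Note below might look like this sketch. The repo ID microsoft/biogpt and the BioGptTokenizer come from that example; the classifier value, repository details, and resource settings are assumptions made up for illustration.

    # Hypothetical example: deploy BioGPT from Hugging Face, with the serving
    # script kept in a public GitHub repo. Names and values are placeholders;
    # the classifier string is an assumption, not a verified value.
    d3x serve create -n biogpt-dep -r hugging_face \
      --hface_repoid microsoft/biogpt \
      --hface_tokenizer BioGptTokenizer \
      --hface_classifier text-generation \
      --repo_name serving-scripts --repo_org example-user \
      --branch_name main --depfilepath serving/deploy.py \
      --ngpus 1 --min_replicas 1 --max_replicas 1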

Note

  • In the serving command, if values for hardware type, CPU, GPU, memory, and replicas are not provided, the default values are used.

  • Use the --is_private_repo and --access_token flags if the repository containing the serving script is private. Also make sure that you have access to that repository.

  • To find the Hugging Face repo ID, tokenizer, and model classifier, use the following steps:

    • Open the model's page on the Hugging Face website.

    • Click on the Copy model name to clipboard button next to the model name. This copies the model repo ID to the clipboard. Make a note of it in a text editor to use later.

    • In the model description, you can find the tokenizer for the model from the sample code provided. For this example (BioGPT), the tokenizer is BioGptTokenizer. Make a note of it.

    • For the model classifier, check the first tag below the model name on the model's page. Make a note of it too.

  • After entering all necessary details, press Enter. The deployment creation will start automatically.

  • On the DKubeX UI, go to the Deployments tab. The list on the screen should show your deployment, along with its status, the name of the serve cluster, and the serve endpoint.

    • The endpoint will show up after the deployment is successful, and the status turns to running.

    • Clicking on the name of the deployment opens the page containing all the details of that particular deployment.

    • Clicking on the name of the serve cluster opens the Ray dashboard for that cluster.


Deploying LLMs in DKubeX

Both base and finetuned LLMs can be deployed in DKubeX. The steps to deploy them are given below.

Deploying Base LLMs

You can deploy base LLMs that are registered with the DKubeX LLM Registry, as well as base LLMs available in the Hugging Face repository.

  • To list all base LLMs registered with DKubeX, use the following command.

    d3x llms list
    

    Information

    To see the full list of LLMs registered with DKubeX LLM Registry, please visit the List of LLMs in DKubeX LLM Registry page.

To deploy a base LLM registered with the DKubeX LLM registry, use the following command. Replace the parts enclosed within <> with the appropriate details.

Note

In case you are using an EKS setup, please change the value of the flag --type from a10 to g5.4xlarge in the following command.

d3x llms deploy --name <name of the deployment> --model <LLM Name> --type <GPU Type> --token <access token for the model (if required)>
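
A hypothetical invocation might look like the following sketch; the deployment name is made up, and the model name stands in for one returned by d3x llms list.

    # Hypothetical example: deploy a registry LLM on an A10 GPU.
    # On an EKS setup, use --type g5.4xlarge instead (see the note above).
    # "llama2-7b" is a placeholder; pick a real name from `d3x llms list`.
    d3x llms deploy --name llama2-dep --model llama2-7b --type a10
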
  • You can check the status of the deployment from the Deployments page in DKubeX or by running the following command.

    d3x serve list
    

Deploying Finetuned LLMs

You can deploy LLMs finetuned and saved in your workspace, as well as finetuned LLMs registered in MLFlow in your workspace.

To deploy a finetuned LLM saved in your workspace, use the following command. Replace the parts enclosed within <> with the appropriate details.

Note

In case you are using an EKS setup, please change the value of the flag --type from a10 to g5.4xlarge in the following command.

d3x llms deploy -n <name of the deployment> --base_model <base LLM name> -m <absolute path to the finetuned model> --type <GPU type> --token <access token for the model (if required)>
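
A hypothetical invocation might look like the following sketch; the deployment name, base model name, and checkpoint path are placeholders.

    # Hypothetical example: deploy a finetuned checkpoint saved in the workspace.
    # On an EKS setup, use --type g5.4xlarge instead (see the note above).
    d3x llms deploy -n llama2-ft-dep --base_model llama2-7b \
      -m /home/<username>/models/llama2-finetuned --type a10
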
  • You can check the status of the deployment from the Deployments page in DKubeX or by running the following command.

    d3x serve list