Merging Finetuned Models


In this tutorial, we will go through the steps of merging finetuned model checkpoints on DKubeX.

Prerequisites

  • Make sure that your embedding model or LLM finetuning run has completed successfully. You can refer to the respective tutorials for finetuning embedding models and LLMs on DKubeX.

    Finetuning tutorials:

    • Finetuning Embedding Models: tutorial regarding finetuning embedding models locally on DKubeX (./emb-finetuning.html)

    • Finetuning LLMs: tutorial regarding finetuning LLMs locally on DKubeX (./llm-finetuning.html)

    For this example, we will assume that you have successfully finetuned the meta-llama/Meta-Llama-3-8B-Instruct LLM, and that the finetuning run is named llama-3-ft.

  • Make sure that at least one of the worker nodes in your cluster running DKubeX contains an NVIDIA A10 GPU (at minimum an AWS g5.4xlarge instance, with at least 16 vCPU cores and 64 GiB of memory).

    • In case of an RKE2 cluster, make sure the node is labeled as a10 during installation.

      Command to label a node as a10 on an RKE2 cluster
      Command
      kubectl label node <node-name> node.kubernetes.io/nodetype=a10
      
      Example
      kubectl label node demo-worker-node node.kubernetes.io/nodetype=a10
      
    • In case of an AWS EKS cluster, make sure that the cluster contains a g5.4xlarge nodegroup with a maximum size of at least 1. One possible way to add such a nodegroup is sketched below.
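
      The following is a sketch only; your cluster may have been provisioned differently, and the cluster, region, and nodegroup names are placeholders you should replace with your own.

      Example
      # create a single-node g5.4xlarge nodegroup; names and region below are placeholders
      eksctl create nodegroup --cluster <eks-cluster-name> --region <aws-region> --name g5-nodegroup --node-type g5.4xlarge --nodes 1 --nodes-min 0 --nodes-max 1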

  • Make sure that you have an active Huggingface access token which has access to the model you are finetuning and merging. For this example you need to have an active access token for the meta-llama/Meta-Llama-3-8B-Instruct model on Huggingface. You can generate these tokens on the Access Tokens page on Huggingface. For more information, refer to the Huggingface documentation.

  • Open the DKubeX terminal and export the following environment variables. Replace <username> with your DKubeX username and <access-token> with your Huggingface access token.

    export HOMEDIR=/home/<username>
    export HF_TOKEN=<access-token>
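
    For example, assuming the DKubeX username is demo and using an obviously fake placeholder token (replace both with your own values):

    Example
    export HOMEDIR=/home/demo
    export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx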
    

Merging Finetuned Model Checkpoints

  • Create a Ray cluster on which the merge job will run by using the command provided below. Replace <cluster-name> with the name of the Ray cluster you want to create; an example command is shown after the note below.

    d3x ray create --name <cluster-name> --cpu 8 --memory 64 --gpu 1 --type a10
    

    Note

    In case you are using an AWS EKS setup, please change the value of the flag --type from a10 to g5.4xlarge in the command.
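
    For example, to create a Ray cluster named llama3-merge-cluster (a name chosen purely for illustration) on an A10 node:

    Example
    d3x ray create --name llama3-merge-cluster --cpu 8 --memory 64 --gpu 1 --type a10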

  • Check the status of the Ray cluster by running the following command:

    d3x ray list
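
    You can re-run this command until the cluster is reported as running; for example, with watch (assuming it is available in your DKubeX terminal):

    Example
    watch -n 10 d3x ray list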
    
  • Once the Ray cluster status becomes running, you can run the merge job on the cluster with the following command. Replace the placeholders in the command as follows:

    Variable              Replace with
    <merge-job-name>      Unique name for the merge job.
    <ft-run-name>         Name of the finetuning run.
    <ray-cluster-name>    Name of the Ray cluster on which the merge job will run.
    <hf-token>            Huggingface access token.

    d3x ft merge --name <merge-job-name> --ft-name <ft-run-name> --ray_cluster <ray-cluster-name> --hf-token <hf-token>
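
    For example, using the merge run name llama3merge and the finetuning run llama-3-ft assumed in this tutorial, the illustrative Ray cluster name llama3-merge-cluster from the earlier step, and the HF_TOKEN variable exported in the prerequisites:

    Example
    d3x ft merge --name llama3merge --ft-name llama-3-ft --ray_cluster llama3-merge-cluster --hf-token $HF_TOKEN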
    
  • Once the merge run goes into the succeeded state, open MLFlow from the DKubeX workspace and open the experiment corresponding to the merge run to view its metrics and artifacts, along with the recorded merged model checkpoint. The experiment name in MLFlow will be the same as the merge run name (llama3merge for this example).