Finetuning Embedding Models

In this tutorial we will go through the steps of finetuning an embedding model locally on DKubeX. You can finetune the model using one of the default configurations provided in the DKubeX finetuning catalog, or you can provide your own custom configuration.

Prerequisites

  • Make sure that at least one of the worker nodes in your cluster running DKubeX contains an NVIDIA A10 GPU (at minimum an AWS g5.4xlarge type instance, with at least 16 vCPU cores and 64 GiB of memory).

    • In case of an RKE2 cluster, make sure the node is labeled as a10 during installation.

      Command to label a node as a10 on an RKE2 cluster
      Command
      kubectl label node <node-name> node.kubernetes.io/nodetype=a10
      
      Example
      kubectl label node demo-worker-node node.kubernetes.io/nodetype=a10
      
    • In case of an AWS EKS cluster, make sure that the cluster contains a g5.4xlarge type nodegroup with a maximum size of 1 or more (a sample command to create one is shown below).
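
      One possible way to create such a nodegroup is with eksctl. The following is only a sketch; the cluster and nodegroup names are illustrative placeholders that you should replace with your own.

      Example
      eksctl create nodegroup --cluster demo-cluster --name a10-ng --node-type g5.4xlarge --nodes 1 --nodes-min 0 --nodes-max 1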

  • Make sure that you have an active Huggingface access token that has access to the models you are finetuning. For this example, you need an access token with access to the BAAI/bge-large-en-v1.5 and BAAI/bge-m3 models on Huggingface. You can generate a token on the Access Tokens page on Huggingface. For more information, refer to the Huggingface documentation.

  • Open the DKubeX terminal and export the following environment variables. Replace <username> with your DKubeX username and <access-token> with your Huggingface access token.

    export HOMEDIR=/home/<username>
    export HF_TOKEN=<access-token>
    
  • To download the sample data that will be used to finetune the model, run the following command on the terminal.

    wget -P ${HOMEDIR}/ https://raw.githubusercontent.com/dkubeio/dkubex-examples/refs/tags/v0.8.7.1/rag/finetuning/sample.json
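
    If you later substitute your own training data, note that FlagEmbedding-style finetuning (the flagemb kind) typically expects JSON records containing a query, a list of positive passages, and a list of negative passages; inspect the downloaded sample.json to confirm the exact schema DKubeX expects. A single record in that style looks roughly like this (the content is purely illustrative):

    {"query": "Which GPU is required for finetuning on DKubeX?", "pos": ["At least one worker node must contain an NVIDIA A10 GPU."], "neg": ["DKubeX provides a terminal in the workspace."]}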
    

Once the prerequisites are satisfied, click on the appropriate link provided below to get started with finetuning an embedding model on DKubeX.

With Default Configuration

Tutorial on finetuning embedding models with the default configuration provided by DKubeX.

./emb-finetuning.html#emb-ft-default
With Custom Configuration

Tutorial on finetuning embedding models with a custom configuration provided by the user.

./emb-finetuning.html#emb-ft-custom

Finetuning Embedding Models with Default Configuration

We will finetune the BAAI/bge-large-en-v1.5 model in this tutorial. The finetuning configuration is already provided in the DKubeX finetuning catalog.

  • To check the list of embedding model finetuning configurations provided by DKubeX, run the following command:

    d3x ft list-configs --kind flagemb
    
  • To check the provided finetuning configuration, run the following command. Replace the <emb-config-name> part with the name of the embedding model finetuning configuration you want to check. For this example, to check the BAAI-bge-large-en-v1.5 configuration, run the command provided in the example below.

    d3x ft get-config --name <emb-config-name> --kind flagemb
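
    For instance, assuming the configuration used in this tutorial is listed in the catalog under the name BAAI-bge-large-en-v1.5, the command would be:

    Example
    d3x ft get-config --name BAAI-bge-large-en-v1.5 --kind flagemb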
    
  • To trigger the finetuning process with the default configuration, run the following command. Replace the following in the command:

    Variable             Replace with
    <ft-run-name>        A unique finetuning run name that has not been used before.
    <emb-config-name>    Name of the embedding model finetuning configuration you want to use.
    <ft-data-path>       Absolute path to the training data.
    <hf-token>           Huggingface access token.

    For this example, to finetune the BAAI/bge-large-en-v1.5 model, run the command provided in the example below.

    d3x ft finetune --name <ft-run-name> --config <emb-config-name> --gpu 1 --hf-token <hf-token> --kind flagemb --type a10 --train_data=<ft-data-path>
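
    For instance, with the values used in this tutorial (assuming the sample data downloaded in the prerequisites as the training data and the environment variables exported there), the command would look like:

    Example
    d3x ft finetune --name bge-large-ft --config BAAI-bge-large-en-v1.5 --gpu 1 --hf-token ${HF_TOKEN} --kind flagemb --type a10 --train_data=${HOMEDIR}/sample.json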
    

    Note

    If you are using an AWS EKS setup, change the value of the --type flag from a10 to g5.4xlarge in the command.

  • Once the finetuning run reaches the succeeded state, open MLFlow from the DKubeX workspace and open the experiment corresponding to the finetuning run to view its metrics and artifacts, along with the recorded finetuned model checkpoint. The experiment name in MLFlow will be the same as the finetuning run name (bge-large-ft for this example).

Once the model finetuning is completed, you can proceed to the following link for a tutorial on merging finetuned model checkpoints.

Merging Finetuned Model

Tutorial on merging finetuned model checkpoints.

./merge.html

Finetuning Embedding Models with Custom Configuration

We will finetune the BAAI/bge-m3 model in this tutorial with a user-provided custom finetuning configuration.

  • For this example, we need to provide a custom finetuning configuration file. To download the file used in this example to your workspace, run the following command on the DKubeX terminal.

    wget -P ${HOMEDIR}/ https://raw.githubusercontent.com/dkubeio/dkubex-examples/refs/tags/v0.8.7.1/rag/finetuning/ft-configs/flagemb/BAAI-bge-m3-ft-config.yaml
    
  • To trigger the finetuning process with the custom configuration, run the following command. Replace the following in the command:

    Variable             Replace with
    <ft-run-name>        A unique finetuning run name that has not been used before.
    <emb-config-path>    Absolute path to the custom finetuning configuration file.
    <ft-data-path>       Absolute path to the training data.
    <hf-token>           Huggingface access token.

    For this example, to finetune the BAAI/bge-m3 model using the custom configuration, run the command provided in the example below.

    d3x ft finetune --name <ft-run-name> --config <emb-config-path> --gpu 1 --hf-token <hf-token> --kind flagemb --type a10 --train_data=<ft-data-path>
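
    For instance, with the values used in this tutorial (assuming the configuration file and sample data downloaded earlier, and the environment variables exported in the prerequisites), the command would look like:

    Example
    d3x ft finetune --name bge-m3-ft --config ${HOMEDIR}/BAAI-bge-m3-ft-config.yaml --gpu 1 --hf-token ${HF_TOKEN} --kind flagemb --type a10 --train_data=${HOMEDIR}/sample.json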
    

    Note

    If you are using an AWS EKS setup, change the value of the --type flag from a10 to g5.4xlarge in the command.

  • Once the finetuning run reaches the succeeded state, open MLFlow from the DKubeX workspace and open the experiment corresponding to the finetuning run to view its metrics and artifacts, along with the recorded finetuned model checkpoint. The experiment name in MLFlow will be the same as the finetuning run name (bge-m3-ft for this example).

Once the model finetuning is completed, you can proceed to the following link for a tutorial on merging finetuned model checkpoints.

Merging Finetuned Model

Tutorial on merging finetuned model checkpoints.

./merge.html