Finetuning Embedding Models¶
In this tutorial we will go through the steps of finetuning an embedding model locally on DKubeX. You can finetune the model using one of the default configurations provided in the DKubeX finetuning catalog, or provide your own custom configuration.
Prerequisites¶
Make sure that at least one of the worker nodes in the cluster running DKubeX contains an NVIDIA A10 GPU (at minimum an AWS g5.4xlarge type instance, with at least 16 vCPU cores and 64 GiB of memory).
In case of an RKE2 cluster, make sure the node is labeled as a10 during installation.
In case of an AWS EKS cluster, make sure that the cluster contains a g5.4xlarge type nodegroup with a maximum size of 1 or more.
Make sure that you have an active Huggingface access token that has access to the models you are finetuning. For this example you need a token with access to the BAAI/bge-large-en-v1.5 and BAAI/bge-m3 models on Huggingface. You can generate tokens on the Access Tokens page on Huggingface. For more information, refer to the Huggingface documentation.
Open the DKubeX terminal and export the following environment variables. Replace <username> with your DKubeX username, and <access-token> with your Huggingface access token.
export HOMEDIR=/home/<username>
export HF_TOKEN=<access-token>
For example:
export HOMEDIR=/home/demo
export HF_TOKEN=hf_aJ0eX**************WJlIn0
To download the sample data that will be used to finetune the model, run the following command in the terminal.
wget -P ${HOMEDIR}/ https://raw.githubusercontent.com/dkubeio/dkubex-examples/refs/tags/v0.8.7.1/rag/finetuning/sample.json
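You can peek at the downloaded file to see the record layout before launching a run. FlagEmbedding-style trainers generally expect each record to pair a query with positive (and optionally hard-negative) passages, but treat sample.json itself as the reference for the exact fields used by DKubeX.
head -n 2 ${HOMEDIR}/sample.json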
Once the prerequisites are satisfied, click on the appropriate link provided below to get started with finetuning an embedding model on DKubeX.
Tutorial on finetuning embedding models with the default configuration provided by DKubeX.
Tutorial on finetuning embedding models with a custom configuration provided by the user.
Finetuning Embedding Models with Default Configuration¶
We will finetune the BAAI/bge-large-en-v1.5 model in this tutorial. The finetuning configuration is already provided in the DKubeX finetuning catalog.
To check the list of embedding model finetuning configurations provided by DKubeX, run the following command:
d3x ft list-configs --kind flagemb
To check a provided finetuning configuration, run the following command. Replace <emb-config-name> with the name of the embedding model finetuning configuration you want to check.
d3x ft get-config --name <emb-config-name> --kind flagemb
For this example, to check the BAAI-bge-large-en-v1.5 configuration, run:
d3x ft get-config --name BAAI-bge-large-en-v1.5 --kind flagemb
To trigger the finetuning process with the default configuration, run the following command. Replace the following in the command:
| Variable | Replace with |
| --- | --- |
| <ft-run-name> | Unique finetuning run name which was not used before. |
| <emb-config-name> | Name of the embedding model finetuning configuration you want to use. |
| <ft-data-path> | Absolute path to the training data. |
| <hf-token> | Huggingface access token. |
d3x ft finetune --name <ft-run-name> --config <emb-config-name> --gpu 1 --hf-token <hf-token> --kind flagemb --type a10 --train_data=<ft-data-path>
For this example, to finetune the BAAI/bge-large-en-v1.5 model, run the command provided in the example below.
d3x ft finetune --name bge-large-ft --config BAAI-bge-large-en-v1.5 --gpu 1 --hf-token ${HF_TOKEN} --kind flagemb --type a10 --train_data=${HOMEDIR}/sample.json
Note
In case you are using an AWS EKS setup, change the value of the --type flag from a10 to g5.4xlarge in the command.
Once the finetuning run goes into the succeeded state, open MLflow from the DKubeX workspace and open the experiment corresponding to the finetuning run to view the run metrics and artifacts, along with the recorded finetuned model checkpoint. The experiment name in MLflow will be the same as the finetuning run name (bge-large-ft for this example).
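If you prefer to pull the recorded checkpoint programmatically instead of browsing the MLflow UI, the sketch below uses the MLflow Python client. It assumes the mlflow package is available in your workspace, that the tracking URI is already configured by DKubeX (otherwise set it with mlflow.set_tracking_uri()), and that the experiment is named after the finetuning run (bge-large-ft in this example).
# Minimal sketch: locate the latest run in the experiment and download its artifacts,
# including the recorded finetuned model checkpoint, to a local directory.
import mlflow

runs = mlflow.search_runs(
    experiment_names=["bge-large-ft"],        # experiment name matches the finetuning run name
    order_by=["attributes.start_time DESC"],  # newest run first
)
run_id = runs.iloc[0]["run_id"]

# Download everything the run logged; artifacts land in the destination directory.
local_dir = mlflow.artifacts.download_artifacts(run_id=run_id, dst_path="./bge-large-ft-artifacts")
print("Artifacts downloaded to:", local_dir)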
Once the model finetuning is completed, you can proceed to the following link for the tutorial on merging finetuned model checkpoints.
Tutorial regarding merging finetuned model checkpoints.
Finetuning Embedding Models with Custom Configuration¶
We will finetune the BAAI/bge-m3 model in this tutorial with a user-provided custom finetuning configuration.
For this example, we need to provide a custom finetuning configuration file. To download the file to your workspace, run the following command in the DKubeX terminal.
wget -P ${HOMEDIR}/ https://raw.githubusercontent.com/dkubeio/dkubex-examples/refs/tags/v0.8.7.1/rag/finetuning/ft-configs/flagemb/BAAI-bge-m3-ft-config.yaml
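Before triggering the run, you can open the downloaded file to review and adjust the training hyperparameters it defines; the file itself is the authoritative reference for which fields DKubeX accepts. For example, to view it in the terminal:
cat ${HOMEDIR}/BAAI-bge-m3-ft-config.yaml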
To trigger the finetuning process with the custom configuration, run the following command. Replace the following in the command:
| Variable | Replace with |
| --- | --- |
| <ft-run-name> | Unique finetuning run name which was not used before. |
| <emb-config-path> | Absolute path to the custom finetuning configuration file. |
| <ft-data-path> | Absolute path to the training data. |
| <hf-token> | Huggingface access token. |
d3x ft finetune --name <ft-run-name> --config <emb-config-path> --gpu 1 --hf-token <hf-token> --kind flagemb --type a10 --train_data=<ft-data-path>
For this example, to finetune the BAAI/bge-m3 model using the custom configuration, run the command provided in the example below.
d3x ft finetune --name bge-m3-ft --config ${HOMEDIR}/BAAI-bge-m3-ft-config.yaml --gpu 1 --hf-token ${HF_TOKEN} --kind flagemb --type a10 --train_data=${HOMEDIR}/sample.json
Note
In case you are using an AWS EKS setup, change the value of the --type flag from a10 to g5.4xlarge in the command.
Once the finetuning run goes into the succeeded state, open MLflow from the DKubeX workspace and open the experiment corresponding to the finetuning run to view the run metrics and artifacts, along with the recorded finetuned model checkpoint. The experiment name in MLflow will be the same as the finetuning run name (bge-m3-ft for this example).
Once the model finetuning is completed, you can proceed to the following link for the tutorial on merging finetuned model checkpoints.
Tutorial regarding merging finetuned model checkpoints.