Finetuning LLMs

In this tutorial we will go through the steps of finetuning an LLM locally on DKubeX. You can finetune an LLM using one of the default configurations provided in the DKubeX finetuning catalog, or with your own custom configuration.

Prerequisites

  • Make sure that at least one of the worker nodes in the cluster running DKubeX contains an NVIDIA A10 GPU (at minimum an AWS g5.4xlarge type instance, with at least 16 vCPU cores and 64 GiB of memory).

    • In case of an RKE2 cluster, make sure the node is labeled as a10 during installation.

      Command to label a node as a10 on an RKE2 cluster

      Command
      kubectl label node <node-name> node.kubernetes.io/nodetype=a10
      
      Example
      kubectl label node demo-worker-node node.kubernetes.io/nodetype=a10
      
    • In case of an AWS EKS cluster, make sure that the cluster contains a g5.4xlarge type nodegroup with a maximum size of at least 1.
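
      If the cluster does not yet have such a nodegroup, one way to create it is with eksctl, as in the sketch below; <cluster-name> and <nodegroup-name> are placeholders.

      Command
      eksctl create nodegroup --cluster <cluster-name> --name <nodegroup-name> --node-type g5.4xlarge --nodes 1 --nodes-min 0 --nodes-max 1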

  • Make sure that you have an active Huggingface access token that has access to the model you are finetuning. For this example, you need an access token with access to the meta-llama/Meta-Llama-3-8B-Instruct and meta-llama/Llama-3.1-8B-Instruct models on Huggingface. You can generate tokens on the Access Tokens page on Huggingface. For more information, refer to the Huggingface documentation.

  • Open the DKubeX terminal and export the following environment variables. Replace <username> with your DKubeX username, and <access-token> with your Huggingface access token.

    export HOMEDIR=/home/<username>
    export HF_TOKEN=<access-token>
    
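    For example, for a DKubeX user named demo (an illustrative name; the token value shown is only a placeholder):

    export HOMEDIR=/home/demo
    export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
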
  • To download the sample data that will be used for finetuning, run the following command in the terminal.

    wget -P ${HOMEDIR}/ https://raw.githubusercontent.com/dkubeio/dkubex-examples/refs/tags/v0.8.7.1/rag/finetuning/sample.json
    

Once the prerequisites are satisfied, click on the appropriate link provided below to get started with finetuning an LLM on DKubeX.

With Default Configuration

Tutorial on finetuning LLMs with the default configurations provided by DKubeX.

./llm-finetuning.html#llm-ft-default
With Custom Configuration

Tutorial on finetuning LLMs with a user-provided custom configuration.

./llm-finetuning.html#llm-ft-custom

Finetuning LLMs with Default Configuration

We will finetune the meta-llama/Meta-Llama-3-8B-Instruct model in this tutorial. The finetuning configuration is already provided in the DKubeX finetuning catalog.

  • To check the list of LLM finetuning configurations provided by DKubeX, run the following command:

    d3x ft list-configs --kind llm
    
  • To view the details of a finetuning configuration, run the following command. Replace <llm-config-name> with the name of the LLM finetuning configuration you want to check. For this example, to check the llama-3-qlora configuration, run the command provided in the example below.

    d3x ft get-config --name <llm-config-name> --kind llm
    
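    Example

    d3x ft get-config --name llama-3-qlora --kind llm
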
  • To trigger the finetuning process with the default configuration, run the following command. Replace the following in the command:

    Variable             Replace with

    <ft-run-name>        A unique finetuning run name that has not been used before.
    <llm-config-name>    Name of the LLM finetuning configuration you want to use.
    <ft-data-path>       Absolute path to the training data.
    <hf-token>           Huggingface access token.

    For this example, to finetune the meta-llama/Meta-Llama-3-8B-Instruct model, run the command provided in the example below.

    d3x ft finetune --name <ft-run-name> --config <llm-config-name> --gpu 1 --hf-token <hf-token> --kind llm --type a10 --train_data=<ft-data-path>
    
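    Using the run name llama-3-ft, the llama-3-qlora configuration, and the sample data and token set up in the prerequisites, the command looks like the following.

    Example

    d3x ft finetune --name llama-3-ft --config llama-3-qlora --gpu 1 --hf-token ${HF_TOKEN} --kind llm --type a10 --train_data=${HOMEDIR}/sample.json
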

    Note

    If you are using an AWS EKS setup, change the value of the --type flag from a10 to g5.4xlarge in the command.

  • Once the finetuning run reaches the succeeded state, open MLFlow in the DKubeX workspace and open the experiment corresponding to the finetuning run to view its metrics and artifacts, along with the recorded finetuned model checkpoint. The experiment name in MLFlow will be the same as the finetuning run name (llama-3-ft for this example).

Once the model finetuning is complete, you can proceed to the following link for a tutorial on merging finetuned model checkpoints.

Merging Finetuned Model

Tutorial on merging finetuned model checkpoints.

./merge.html

Finetuning LLMs with Custom Configuration

We will finetune the meta-llama/Llama-3.1-8B-Instruct model in this tutorial with a user-provided custom finetuning configuration.

  • For this example, we need to provide a custom finetuning configuration file. To download the file to your workspace, run the following command in the DKubeX terminal.

    wget -P ${HOMEDIR}/ https://raw.githubusercontent.com/dkubeio/dkubex-examples/refs/tags/v0.8.7.1/rag/finetuning/ft-configs/llm/llama-3.1-ft-config.yaml
    
  • To trigger the finetuning process with the custom configuration, run the following command. Replace the following in the command:

    Variable             Replace with

    <ft-run-name>        A unique finetuning run name that has not been used before.
    <llm-config-path>    Absolute path to the custom finetuning configuration file.
    <ft-data-path>       Absolute path to the training data.
    <hf-token>           Huggingface access token.

    For this example, to finetune the meta-llama/Llama-3.1-8B-Instruct model using the custom configuration, run the command provided in the example below.

    d3x ft finetune --name <ft-run-name> --config <llm-config-path> --gpu 1 --hf-token <hf-token> --kind llm --type a10 --train_data=<ft-data-path>
    
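    Using the run name llama-31-ft, the configuration file downloaded above, and the sample data and token set up in the prerequisites, the command looks like the following.

    Example

    d3x ft finetune --name llama-31-ft --config ${HOMEDIR}/llama-3.1-ft-config.yaml --gpu 1 --hf-token ${HF_TOKEN} --kind llm --type a10 --train_data=${HOMEDIR}/sample.json
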

    Note

    If you are using an AWS EKS setup, change the value of the --type flag from a10 to g5.4xlarge in the command.

  • Once the finetuning run reaches the succeeded state, open MLFlow in the DKubeX workspace and open the experiment corresponding to the finetuning run to view its metrics and artifacts, along with the recorded finetuned model checkpoint. The experiment name in MLFlow will be the same as the finetuning run name (llama-31-ft for this example).

Once the model finetuning is complete, you can proceed to the following link for a tutorial on merging finetuned model checkpoints.

Merging Finetuned Model

Tutorial on merging finetuned model checkpoints.

./merge.html