Finetuning Open Source LLMs

Finetuning a Large Language Model (LLM) means continuing to train a pre-trained model on a task-specific dataset so that it adapts to a particular application. In this example, the Llama2-7b model is finetuned on chunks of text extracted from a set of documents.

Prerequisites

  • This workflow requires an A10 GPU node. Make sure your cluster has at least one.

  • Complete the ingestion for your dataset first. The ingestion run is recorded on MLFlow, and you will need its run ID when generating the chunks.

  • In the Terminal application on DKubeX, export the following variables to your workspace by running the commands below.

    • Replace the <your huggingface token> part with your Hugging Face token, and <username> with your DKubeX workspace name.

      Hint

      Use the following steps to find your DKubeX API key:

      • Open the DKubeX UI and click on your username in the upper-right corner of the UI.

      • Click on the API Key option from the dropdown menu. A pop-up dialog box containing your DKubeX API key will open. Copy and note down this key.

      export HF_TOKEN="<your huggingface token>"
      export NAMESPACE="<username>"
      export HOMEDIR=/home/${NAMESPACE}
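
Before continuing, it can help to confirm that the three variables are actually set in the current shell. The following is a minimal sketch; the function name check_dkubex_env is a hypothetical helper and not part of DKubeX.

```shell
# Hypothetical helper (not part of DKubeX): report which of the
# variables exported above are still unset or empty in this shell.
check_dkubex_env() {
    missing=0
    for name in HF_TOKEN NAMESPACE HOMEDIR; do
        # Indirect lookup through eval keeps this POSIX sh compatible.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Print a short status line instead of aborting the shell.
if check_dkubex_env; then
    echo "environment ready"
else
    echo "export the variables listed above, then re-run the check"
fi
```

The check only verifies that the variables are non-empty; it does not validate the token itself.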
      

Generating chunks for finetuning

  • You need a custom Python script to extract the chunks used to finetune your LLM. Run the following commands to pull the script into your workspace.

    cd && git clone -b v0.8.4.1 https://github.com/dkubeio/dkubex-examples.git
    cp dkubex-examples/rag/finetuning/extract_chunks.py ${HOMEDIR}/extract_chunks.py
    
  • Generate the chunks using the following command. Replace the <ingestion run ID on MLFlow> part with the run ID of the ingestion run for your dataset on the MLFlow application.

    python3 ${HOMEDIR}/extract_chunks.py --experiment_name chunk-generation --run_id <ingestion run ID on MLFlow> -d ${HOMEDIR}/chunks_for_finetuning/
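
Once the script finishes, a quick way to confirm that chunks were written is to count the files in the output folder. This is a sketch only: count_chunk_files is a hypothetical helper, the fallback path is a placeholder, and the file names and format produced by extract_chunks.py are not assumed here.

```shell
# Hypothetical helper: count the regular files below a directory.
count_chunk_files() {
    # find prints one line per file; wc -l counts the lines.
    # A missing directory yields 0 because find's error is suppressed.
    find "$1" -type f 2>/dev/null | wc -l | tr -d ' '
}

# Same folder that was passed to -d above. The fallback value is a
# placeholder used only when HOMEDIR is not exported.
CHUNK_DIR="${HOMEDIR:-/home/your-workspace}/chunks_for_finetuning"
echo "files under ${CHUNK_DIR}: $(count_chunk_files "$CHUNK_DIR")"
```

A count of 0 suggests that the run ID or the output path should be rechecked before starting the finetuning job.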
    

Finetuning workflow

  • Train the LLM with the chunks generated earlier using the following command.

    d3x fm tune model finetune -j <name of finetuning job> -e <number of epochs> -b <batch size> -l <training chunks folder path> -o <output folder path> -c <CPU> -m <memory> -g <GPU> -t <instance type> -n <name of model to be finetuned> --ctx-len <context length>

Note

For setups brought up on a Rancher cluster, the -t or --type option in this command denotes the node or instance type that you provided in the Installing DKubeX section.

Attention

  • The time taken by the finetuning process depends on the size of the dataset. Please wait patiently for the process to complete.

  • If the terminal shows a timeout error, the finetuning is still in progress. Run the command that the CLI prints after the error message to continue following the finetuning logs.
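
To make the placeholders in the finetuning command concrete, the sketch below fills them in from shell variables and prints the resulting command for review instead of running it. Every value is an illustrative assumption, in particular the memory value, the instance type, and the model name; only the flags come from the command shown above. Confirm the expected formats against your own DKubeX installation.

```shell
# All values below are illustrative assumptions; replace them with your own.
JOB_NAME="llama2-finetune"
EPOCHS=1
BATCH_SIZE=20
CHUNKS_DIR="${HOMEDIR:-/home/your-workspace}/chunks_for_finetuning"
OUTPUT_DIR="ft-output"                 # assumed to be relative to HOMEDIR
CPUS=8
MEMORY=64                              # unit and format are an assumption
GPUS=1
INSTANCE_TYPE="g5.4xlarge"             # hypothetical A10-class node type
MODEL_NAME="meta-llama/Llama-2-7b-hf"  # assumed name for Llama2-7b
CTX_LEN=512

# Assemble the command as a single line of text (a dry run).
finetune_cmd() {
    printf '%s %s %s\n' \
        "d3x fm tune model finetune -j $JOB_NAME -e $EPOCHS -b $BATCH_SIZE" \
        "-l $CHUNKS_DIR -o $OUTPUT_DIR -c $CPUS -m $MEMORY -g $GPUS" \
        "-t $INSTANCE_TYPE -n $MODEL_NAME --ctx-len $CTX_LEN"
}

# Print the command for review.
finetune_cmd
# To launch it after review (values must not contain spaces):
# eval "$(finetune_cmd)"
```

Keeping the values in variables makes it easier to reuse the same output folder and model name in the later steps.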

  • To merge the finetuned model with the base model, you need the absolute path to the finetuned model checkpoint. Use the following command to print it. Replace <output folder path for finetuned model> with the output folder you passed to the finetuning command, and <model_name> with the full name of the model being finetuned.

    echo ${HOMEDIR}/<output folder path for finetuned model>/<model_name>/TorchTrainer_*/TorchTrainer_*/checkpoint*/
    
    • Export the absolute path to the finetuned model checkpoint so that it can be used during the merge process. Replace the <checkpoint absolute path> part with the absolute path to the finetuned model checkpoint you got in the previous step.

      export CHECKPOINT="<checkpoint absolute path>"
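
If the glob in the previous step matches more than one checkpoint directory, one of them has to be chosen by hand. The sketch below automates that choice under one assumption: that the checkpoint directory names sort lexically in the order they were written, so the last match is the most recent. The helper latest_checkpoint and the variable MODEL_OUTPUT_DIR are hypothetical names, not part of DKubeX.

```shell
# Hypothetical helper: print the last checkpoint directory (in lexical
# order) below a model output directory, or fail if none exists.
latest_checkpoint() {
    found=""
    for dir in "$1"/TorchTrainer_*/TorchTrainer_*/checkpoint*/; do
        # An unmatched glob stays literal, so keep only real directories.
        if [ -d "$dir" ]; then
            found="$dir"
        fi
    done
    if [ -z "$found" ]; then
        echo "no checkpoint found under $1" >&2
        return 1
    fi
    echo "$found"
}

# Example usage. MODEL_OUTPUT_DIR stands for
# ${HOMEDIR}/<output folder path for finetuned model>/<model_name>:
# CHECKPOINT="$(latest_checkpoint "$MODEL_OUTPUT_DIR")" && export CHECKPOINT
```

If the lexical-order assumption does not hold for your run, list the matches with the echo command above and pick the checkpoint manually.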
      
  • Merge the finetuned model checkpoint with the base model to create the final finetuned model using the following command:

    d3x fm tune model merge -j <merge job name> -n <full HF path to the base model> -cp <absolute path to the finetuned checkpoint> -o <absolute path to merged finetuned model output folder>
    
  • To quantize the finetuned model, use the following command:

    d3x fm tune model quantize -j <quantization job name> -p <absolute path to merged finetuned model> -o <absolute path to quantized model output folder>
    

    Attention

    • The time taken by the quantization process depends on the size of the dataset. Please wait patiently for the process to complete.

    • If the terminal shows a timeout error, the quantization is still in progress. Run the command that the CLI prints after the error message to continue following the quantization logs.
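
The merge and quantize steps are chained through their paths: the folder passed to -o in the merge command is the folder passed to -p in the quantize command. The sketch below makes that explicit by building both commands from shared variables and printing them for review. The job names, folder names, fallback paths, and base model name are illustrative assumptions; only the flags come from the commands shown above.

```shell
# All values below are illustrative assumptions; replace them with your own.
MERGE_JOB="llama2-merge"
QUANT_JOB="llama2-quantize"
BASE_MODEL="meta-llama/Llama-2-7b-hf"            # assumed HF path for Llama2-7b
CHECKPOINT="${CHECKPOINT:-/path/to/checkpoint}"  # exported in the earlier step
MERGED_DIR="${HOMEDIR:-/home/your-workspace}/merged-model"
QUANT_DIR="${HOMEDIR:-/home/your-workspace}/quantized-model"

# The merge step writes the merged model to MERGED_DIR.
merge_cmd() {
    printf '%s\n' "d3x fm tune model merge -j $MERGE_JOB -n $BASE_MODEL -cp $CHECKPOINT -o $MERGED_DIR"
}

# The quantize step reads the same MERGED_DIR and writes to QUANT_DIR.
quantize_cmd() {
    printf '%s\n' "d3x fm tune model quantize -j $QUANT_JOB -p $MERGED_DIR -o $QUANT_DIR"
}

# Print both commands for review; run each one only after the previous
# job has completed.
merge_cmd
quantize_cmd
```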