Evaluating Base and Finetuned LLMs

In this tutorial, we will evaluate the performance of a base and a finetuned LLM and compare them against the performance of OpenAI. For this example, the base Llama2-7B and a finetuned Llama2-7B model will be used.

Prerequisites

  • You need to ingest your data corpus and create a dataset from it. You can refer to the Data ingestion and creating dataset tutorial for more information on how to do this.

    • The dataset name used in this tutorial is contracts.

  • You need to finetune the Llama2-7B model on DKubeX. For a comprehensive guide on how to finetune a model, refer to the Finetuning Open Source LLMs tutorial.

  • You need to deploy the base Llama2-7B and the finetuned Llama2-7B model on DKubeX. To learn how to deploy an LLM on DKubeX, refer to the Deploying LLMs in DKubeX tutorial.

    • The names of the base and finetuned Llama2-7B deployments used in this tutorial are llama27bbase and llama27bft respectively.

  • Export the following variables to your workspace by running these commands on your DKubeX Terminal.

    • Replace the <username> part with your DKubeX workspace name.

      export PYTHONWARNINGS="ignore"
      export OPENAI_API_KEY="dummy"
      export NAMESPACE="<username>"
      export HOMEDIR=/home/${NAMESPACE}
      
  • A few .yaml configuration files are required for the evaluation process.

    • On the Terminal application in DKubeX UI, run the following commands:

      git clone https://github.com/dkubeio/dkubex-examples.git
      cd dkubex-examples
      git checkout llamaidx
      cp query/query.yaml ${HOMEDIR}/query.yaml && cp evaluation/eval.yaml ${HOMEDIR}/eval.yaml && cd
      

Evaluating LLM Models

In this example, we will first evaluate the base Llama2-7B model, comparing it to the performance of OpenAI, and then follow the same steps to evaluate the finetuned Llama2-7B model.

To evaluate the base Llama2-7B model, follow the steps provided below:

  • Provide the appropriate details in the query.yaml file, which will be used during the evaluation process. Run vim query.yaml and provide the following details:

    Note

    In the chat_engine:url: section, provide the endpoint URL of the deployed model to be used. The URL syntax is shown below; replace the <your username> part with your username.

    "http://llama27bbase-serve-svc.<your username>:8000"
    

    You provide your own username here because the llama27bbase deployment was created from your workspace earlier. If you are going to use a model deployed by another user, replace llama27bbase with that deployment's name and use that user's username.

    input
      • question: The input question to be answered by the RAG system.
      • mode: The mode of interaction with the pipeline.

    vectorstore_retriever
      • kind: Specifies the type of vector store retriever.
      • provider: Provider for the vector store retriever.
      • embedding_class: Class of embedding used for retrieval.
      • embedding_model: Name of the embedding model from Hugging Face.
      • dataset: Name of the ingested dataset.
      • textkey: Key identifying the text data within the dataset.
      • top_k: The number of results to retrieve per query.

    prompt_builder
      • prompt_str: The prompt string used for generation.
      • prompt_file: The file containing the prompt string.

    nodes_sorter
      • max_sources: Maximum number of sources to consider during sorting.

    reranker
      • model: Name of the re-ranker model from Hugging Face.
      • top_n: The number of results to re-rank.

    contexts_joiner
      • separator: Separator used for joining different contexts.

    chat_engine
      • llm: Specifies the LLM to be used for generation.
      • url: Service URL for the LLM deployment to be used.
      • llmkey: Authentication key for accessing the LLM service.
      • window_size: Size of the window for context generation.
      • max_tokens: Maximum number of tokens for generation.

    tracking
      • experiment: MLflow experiment name for tracking.
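
    Taken together, a filled-in query.yaml might look like the sketch below. Only the dataset name (contracts), the chat_engine URL syntax, and the "dummy" key come from this tutorial; every other value (embedding model, prompt, top_k, and so on) is an illustrative assumption, so check the query.yaml you copied for the actual defaults.

    ```yaml
    # Illustrative sketch of query.yaml. Only `dataset` and the
    # `chat_engine: url` syntax come from this tutorial; all other
    # values are assumptions to show the shape of the file.
    input:
      question: "What are the termination terms in the contract?"
      mode: "query"
    vectorstore_retriever:
      kind: "vectorstore"
      provider: "weaviate"
      embedding_class: "HuggingFaceEmbedding"
      embedding_model: "BAAI/bge-large-en-v1.5"
      dataset: "contracts"
      textkey: "text"
      top_k: 5
    prompt_builder:
      prompt_str: "Answer the question using only the provided context."
    nodes_sorter:
      max_sources: 3
    reranker:
      model: "BAAI/bge-reranker-large"
      top_n: 3
    contexts_joiner:
      separator: "\n\n"
    chat_engine:
      llm: "llama2-7b"
      url: "http://llama27bbase-serve-svc.<your username>:8000"
      llmkey: "dummy"
      window_size: 3
      max_tokens: 256
    tracking:
      experiment: "llama27bbase-query"
    ```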

  • Provide the appropriate details in the eval.yaml file, which will be used during the evaluation process. Run vim eval.yaml and provide the following details:

    Note

    • Provide your own OpenAI API key in the questions_generator:llmkey: section.

    • In the semantic_similarity_evaluator:llmurl: section, provide the endpoint URL of the deployed model to be used. The URL syntax is shown below; replace the <your username> part with your username.

      "http://llama27bbase-serve-svc.<your username>:8000"
      

      You provide your own username here because the llama27bbase deployment was created from your workspace earlier. If you are going to use a model deployed by another user, replace llama27bbase with that deployment's name and use that user's username.

    vectorstore_reader
      • kind: Specifies the type of vectorstore reader.
      • provider: Indicates the provider of the vectorstore reader.
      • properties: Lists the properties of the vectorstore reader.

    questions_generator
      • prompt_str: Defines the strategy for generating prompts.
      • prompt_file: File containing a custom prompt.
      • num_questions_per_chunk: Specifies the number of questions to generate per data chunk.
      • max_chunks: Sets the maximum number of data chunks from which to generate questions.
      • llm: Determines the language model (LLM) to use for generating questions.
      • llm_key: The API key for the chosen LLM.
      • llmurl: Indicates the URL where the chosen LLM service is deployed.
      • max_tokens: Specifies the maximum number of tokens allowed in each question.

    retrieval_evaluator
      • vector_retriever
          • kind: Indicates the type of vector retriever.
          • provider: Specifies the provider of the vector retriever.
          • textkey: The key used to access the text data within the vector retriever.
          • embedding_model: Specifies the name of the embedding model used for text representation.
          • similarity_top_k: Sets the number of similar items to retrieve for each query.
      • metrics: Specifies the evaluation metrics used for retrieval evaluation.

    semantic_similarity_evaluator
      • prompt_str: Defines the strategy for similarity evaluation.
      • prompt_file: File containing a custom prompt.
      • llm: Specifies the language model (LLM) to use for semantic similarity evaluation.
      • llmkey: Set to "dummy" for local deployments available within DKubeX, or used to pass the authentication key when using an external endpoint.
      • llmurl: Indicates the URL where the chosen LLM service is deployed.
      • max_tokens: Specifies the maximum number of tokens allowed in each semantic similarity evaluation prompt.
      • metrics: Specifies the evaluation metric used for semantic similarity evaluation.

    tracking
      • experiment: Provides a unique name for the MLflow experiment, allowing for tracking and comparison of different runs of the pipeline.
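
    A filled-in eval.yaml might look like the sketch below. Only the llmurl syntax, the OpenAI key placement, and the "dummy" key convention come from this tutorial; all other values (providers, models, metrics, limits) are illustrative assumptions, so check the eval.yaml you copied for the actual defaults.

    ```yaml
    # Illustrative sketch of eval.yaml. Only the `llmurl` syntax and
    # the "dummy"/OpenAI key conventions come from this tutorial;
    # other values are assumptions to show the shape of the file.
    vectorstore_reader:
      kind: "vectorstore"
      provider: "weaviate"
    questions_generator:
      num_questions_per_chunk: 2
      max_chunks: 20
      llm: "gpt-3.5-turbo"
      llm_key: "<your OpenAI API key>"
      max_tokens: 128
    retrieval_evaluator:
      vector_retriever:
        kind: "vectorstore"
        provider: "weaviate"
        textkey: "text"
        embedding_model: "BAAI/bge-large-en-v1.5"
        similarity_top_k: 5
      metrics: ["hit_rate", "mrr"]
    semantic_similarity_evaluator:
      llm: "llama2-7b"
      llmkey: "dummy"
      llmurl: "http://llama27bbase-serve-svc.<your username>:8000"
      max_tokens: 256
      metrics: ["semantic_similarity"]
    tracking:
      experiment: "llama27bbase-eval"
    ```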

  • Once done, run the following command to start the evaluation process of the base Llama2-7B model. Replace the <dataset name> part with the name of the dataset created during ingestion (for this example, contracts).

    d3x dataset evaluate -d <dataset name> --config ${HOMEDIR}/eval.yaml