Evaluating LLMs
In this tutorial, we will evaluate the performance of an LLM while comparing it to the performance of OpenAI. For this example, the Llama3-8B model will be used.
Prerequisites
- You need to ingest your data corpus and create a dataset from it. You can refer to the Data ingestion and creating dataset tutorial for more information on how to do this. The dataset name used in this tutorial is contracts.
 
- You need to deploy the Llama3-8B model on DKubeX. To learn how to deploy an LLM on DKubeX, refer to the Deploying LLMs in DKubeX tutorial. The name of the Llama3-8B deployment used in this tutorial is llama38bbase.
 
- Create a SecureLLM application key with the LLM deployment which will be used in the evaluation process. The steps are provided below.
  - From the DKubeX UI, open and log into the SecureLLM application. Once open, click on the Admin Login button and log in using the admin credentials provided during installation.
    - Hint: In case you do not have the credentials for logging in to SecureLLM, please contact your administrator.
  - On the left sidebar, click on the Keys menu and go to the Application Keys tab on that page.
  - To create a new key for your application, use the following steps:
    - On the API key name field, provide a unique name for the key to be created.
    - From the LLM Keys dropdown list, select DKUBEX.
    - From the Models dropdown list, select your deployed base model.
    - Click on the Generate Key button.
 
  - A pop-up window will show up on your screen containing the application key for your new application. Alternatively, you can also access your application key from the list of keys in the Application Keys tab.
    - Copy this application key for further use, as it will be required during the evaluation process. Also make sure that you copy the entire key, including the sk- part.
 
 
- Export the following variables to your workspace by running the following commands on your DKubeX Terminal. Replace the <username> part with your DKubeX workspace name.

    export NAMESPACE="<username>"
    export HOMEDIR=/home/${NAMESPACE}
 
- A few .yaml files are required to be used in the evaluation process. On the Terminal application in DKubeX UI, run the following commands to download them:

    wget https://raw.githubusercontent.com/dkubeio/dkubex-examples/refs/tags/v0.8.5.4.1/rag/evaluation/eval.yaml -P ${HOMEDIR}/
    wget https://raw.githubusercontent.com/dkubeio/dkubex-examples/refs/tags/v0.8.5.4.1/rag/query/query.yaml -P ${HOMEDIR}/
 
Evaluating LLM Models
To evaluate the Llama3-8B model against OpenAI, follow the steps provided below:
- Provide the appropriate details in the query.yaml file which will be used during the evaluation process. Run vim query.yaml and provide the following details (a filled-in sketch of these fields is shown after this list):
  - On the dataset field, provide the name of the dataset you created earlier, i.e. contracts.
  - On the embedding field, provide the type of the embedding model used for ingestion, i.e. huggingface.
  - In the synthesizer section, provide the following details:
    - On the llm field, make sure dkubex is selected.
    - On the llm_url field, provide the endpoint URL of the deployed model to be used. The endpoint URL can be found on the Deployments page of DKubeX UI.
    - On the llm_key field, provide the serving token for the deployed model to be used. To find the serving token, go to the Deployments page of DKubeX UI and click on the deployed model name. The serving token will be available on the model details page.
 
  - Under the Embedding Model config section, uncomment the entire huggingface section. This is where the name of the embedding model to be used is provided.
  - In the securellm section, provide the following details:
    - On the appkey field, provide the application key that you created earlier on the SecureLLM application.
    - On the dkubex_url field, provide the URL to access DKubeX.
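For reference, a minimal sketch of how the edited query.yaml fields might look after these steps. The exact nesting and any keys not named above (for example, the key that holds the model name inside the huggingface section) are assumptions; the downloaded file shows the actual layout, and the angle-bracket values are placeholders.

    dataset: contracts                 # dataset created during ingestion
    embedding: huggingface             # embedding type used for ingestion

    synthesizer:
      llm: dkubex                      # use the DKubeX-deployed model
      llm_url: <endpoint URL from the Deployments page>
      llm_key: <serving token from the model details page>

    # Embedding Model config: uncomment the entire huggingface section
    huggingface:
      model: BAAI/bge-large-en-v1.5    # assumed key name; holds the embedding model

    securellm:
      appkey: <application key from SecureLLM, including the sk- part>
      dkubex_url: <URL used to access DKubeX>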
 
 
- Provide the appropriate details in the eval.yaml file which will be used during the evaluation process. Run vim eval.yaml and provide the following details (a filled-in sketch of these fields is shown after this list):
  - On the dataset field, provide the name of the dataset you created earlier, i.e. contracts.
  - Under the questions_generator section, provide the following details:
    - On the llm field, provide openai.
    - On the llm_url field, leave it blank.
    - On the llm_key field, provide your OpenAI API key.
 
  - On the rag_configuration field, provide the absolute path to the RAG config (query.yaml) file. In this case it will be /home/<username>/query.yaml, where <username> is the name of your DKubeX workspace.
  - Under the correctness_evaluator section, provide the following details:
    - On the llm field, provide dkubex.
    - On the llm_key field, provide the serving token for the deployed model to be used. To find the serving token, go to the Deployments page of DKubeX UI and click on the deployed model name. The serving token will be available on the model details page.
    - On the llm_url field, provide the endpoint URL of the deployed model to be used. The endpoint URL can be found on the Deployments page of DKubeX UI.
 
  - Under the answer_relevancy_evaluator section, provide the following details:
    - On the llm field, provide dkubex.
    - On the llm_key field, provide the serving token for the deployed model to be used. To find the serving token, go to the Deployments page of DKubeX UI and click on the deployed model name. The serving token will be available on the model details page.
    - On the llm_url field, provide the endpoint URL of the deployed model to be used. The endpoint URL can be found on the Deployments page of DKubeX UI.
 
  - Under the retrieval_evaluator section, provide the following details:
    - On the embedding_provider field, provide huggingface.
    - On the embedding_model field, provide the name of the embedding model used to create the dataset, i.e. BAAI/bge-large-en-v1.5.
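For reference, a minimal sketch of how the edited eval.yaml fields might look after these steps, under the same caveat: the exact nesting comes from the downloaded file, and the angle-bracket values are placeholders.

    dataset: contracts

    questions_generator:
      llm: openai
      llm_url:                         # left blank
      llm_key: <your OpenAI API key>

    rag_configuration: /home/<username>/query.yaml

    correctness_evaluator:
      llm: dkubex
      llm_key: <serving token from the model details page>
      llm_url: <endpoint URL from the Deployments page>

    answer_relevancy_evaluator:
      llm: dkubex
      llm_key: <serving token from the model details page>
      llm_url: <endpoint URL from the Deployments page>

    retrieval_evaluator:
      embedding_provider: huggingface
      embedding_model: BAAI/bge-large-en-v1.5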
 
 
- Once done, run the following command to start the evaluation process of the Llama3-8B model:

    d3x dataset evaluate --config ${HOMEDIR}/eval.yaml
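Based on the configuration above, the run can be expected to generate evaluation questions from the contracts dataset using OpenAI, answer them through the RAG pipeline defined in query.yaml, and score the Llama3-8B responses with the correctness, answer relevancy, and retrieval evaluators set up in eval.yaml.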
