Deploying a RAG-based Chatbot

This guide explains how to deploy a RAG-based chat application from the DKubeX UI.

Prerequisites

  • The dataset that the RAG application will use must already be ingested. For the steps to ingest a dataset, refer to the Data Ingestion and Creating Datasets guide.

  • If you are going to use custom deployments for your chat application, the embedding model deployment and the LLM deployment must be created beforehand and be in the running state. For the steps to create a deployment, refer to the Deploying Embedding Models on DKubeX and Deploying LLMs in DKubeX guides.

  • If you are going to use SecureLLM along with your chat application, you should have a SecureLLM application key created beforehand. For the steps to create a SecureLLM application key, refer to the SecureLLM guide.

Configuring and deploying the RAG Application

To configure and deploy the RAG-based chat application, follow the steps below:

  • On the DKubeX UI, click on the Applications button on the left sidebar. The Applications page will be displayed.

  • Click on the + (create application) button on the top left of the Applications window. The Application Create window will open.

  • On the General page, under the General Configuration section, provide the following details:

    RAG Application - General Configuration

    Field        Description
    Name         Provide a name for the RAG application.
    Description  (OPTIONAL) Provide a description for the RAG application.

  • Once done, click on the Next button to go to the Image Configuration page.

    • Here, under the Image Configuration section, provide the following details:

      RAG Application - Image Configuration

      Field  Description
      Type   Select the type of application to deploy. Select Secure Chat for a RAG-based chat application, or Custom to deploy a custom application.

      Note

      If you select Custom as the type of the application, you will need to provide the Docker image and account details in the respective fields that appear.

    • Under the Access Control section, provide the following details:

      RAG Application - Access Control

      Field              Description
      Secure Access Key  Provide a custom password to be used when logging in to the RAG application.
      Publish            Select this option to make the RAG application available to all users on the DKubeX setup. If you do not select this option, the RAG application will be available only to you.

    • Under the Resource Settings dropdown, provide the following details:

      RAG Application - Resource Settings

      Field              Description
      CPU Count          Number of CPUs to be allocated for the RAG application. The default value is 1.
      GPU Count          Number of GPUs to be allocated for the RAG application. The default value is 0.
      Memory             Amount of memory to be allocated for the RAG application. The default value is 4.
      Port               Port number on which the RAG application will be running. The default value is 3000.
      Mount Home         Select the location you want to be mounted. The default value is All.
      MLFlow Experiment  (OPTIONAL) Provide a unique MLFlow experiment in which you want to log the RAG records.
      Cache Enabled      (OPTIONAL) Select this option to enable caching for the RAG application. This improves performance and response time by temporarily storing responses for repeated identical requests.
      Rewrite Target     (OPTIONAL) Select this option to rewrite the target of the RAG application. This modifies the path of incoming requests before they are routed to the backend service (usually your LLM or inference server).
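
To make the Cache Enabled and Rewrite Target options more concrete, here is a minimal conceptual sketch in Python. It is not DKubeX code: the `ResponseCache` class, the `rewrite_path` helper, and the `/myapp` prefix are all hypothetical names chosen for illustration. The cache returns a stored response when an identical request is seen again, and the rewrite helper strips a routing prefix from the request path before it would be forwarded to a backend.

```python
import hashlib
import re


class ResponseCache:
    """Toy in-memory cache keyed by a hash of the request text (illustrative only)."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    def get_or_compute(self, request_text, compute):
        # Identical requests hash to the same key, so the stored response is reused.
        key = hashlib.sha256(request_text.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        response = compute(request_text)
        self._store[key] = response
        return response


def rewrite_path(path, prefix="/myapp"):
    """Strip a hypothetical routing prefix so the backend sees a root-relative path."""
    return re.sub(r"^" + re.escape(prefix) + r"(/|$)", "/", path)
```

In this sketch, a second call with the same request text never reaches the `compute` function, which is the effect described for repeated identical requests.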

    • In the Environment Variables section, you can define environment variables by clicking on + Add ENV, and then entering the key and value of each variable in the respective fields.

  • Once done, click on the Next button to go to the Advanced page.

    • Here, under the Advanced Configuration section, provide the following details:

      RAG Application - Advanced Configuration

      Field                Description
      Enable RAG           Select this option to enable the RAG functionality for the application to be deployed. For Secure Chat applications, this option is enabled by default.
      Dataset Name         Select the dataset that you want to use for the RAG application from the dropdown. The dataset must already be ingested before you deploy the RAG application.
      Secure LLM           (OPTIONAL) If you want to use SecureLLM with the RAG application, provide the SecureLLM application key in this field. If you do not want to use SecureLLM, leave this field blank.
      Use Adjacent Chunks  (OPTIONAL) Select this option to also use the chunks adjacent to the retrieved ones, which can help improve the performance of the RAG application.
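
As a rough illustration of what using adjacent chunks can mean, the sketch below expands a set of retrieved chunk positions to include their neighbours. This is an assumption about the general technique, not a description of how DKubeX implements the option; the `with_adjacent_chunks` function and its `window` parameter are hypothetical.

```python
def with_adjacent_chunks(hit_indices, total_chunks, window=1):
    """Return sorted, de-duplicated chunk indices including neighbours of each hit.

    hit_indices: positions of the chunks returned by the search.
    total_chunks: number of chunks in the source document.
    window: how many neighbours to add on each side (hypothetical parameter).
    """
    selected = set()
    for idx in hit_indices:
        start = max(0, idx - window)              # do not run past the first chunk
        end = min(total_chunks - 1, idx + window)  # do not run past the last chunk
        selected.update(range(start, end + 1))
    return sorted(selected)
```

For example, with hits at positions 0 and 5 in a 10-chunk document and a window of 1, the function returns positions 0, 1, 4, 5, and 6, so each hit is passed along with its surrounding text.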

    • Under the Embedding section, provide the following details:

      RAG Application - Embedding Configuration

      Field     Description
      Provider  Select the embedding model provider to be used from the dropdown. The available options are DkubeX, Sky, and OpenAI.
      Model     Select the embedding model/deployment to be used from the dropdown.
      Key       If you have selected OpenAI as the embedding model provider, provide the OpenAI API key in this field.

    • Under the Retrieval section, provide the following details:

      RAG Application - Retrieval Configuration

      Field     Description
      Provider  Select the retrieval LLM provider to be used from the dropdown. The available options are DkubeX, Sky, and OpenAI.
      Model     Select the retrieval LLM/model deployment to be used from the dropdown.
      Key       If you have selected OpenAI as the retrieval LLM provider, provide the OpenAI API key in this field.

    • Under the Vector Store Settings section, provide the following details:

      RAG Application - Vector Store Configuration

      Field                  Description
      Vector Store Provider  Select the vector store provider to be used from the dropdown. By default, dkubex will be selected.
      Vector Store           Select the vector store to be used from the dropdown. By default, weaviate_vectorstore will be selected.
      URL                    (OPTIONAL) Provide the URL of the vector store to be used.
      Auth Key               (OPTIONAL) Provide the authentication key for the vector store to be used.
      Text Key               (OPTIONAL) Provide the text key for the vector store to be used.

    • Under the Search section, provide the following details:

      RAG Application - Search Configuration

      Field        Description
      Search Type  Select the vector search type to be used from the dropdown. By default, vector_search will be selected.
      Top K        Provide the number of top results to be returned from the vector search. The default value is 3.
      Rerank       (OPTIONAL) Select this option to enable reranking of the search results. This reorders the results by their relevance to the query.
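
The Top K and Rerank settings can be pictured with a small, self-contained sketch. It assumes cosine similarity as the vector search metric and uses a deliberately simple word-overlap score as a stand-in reranker; neither is confirmed as what DKubeX uses, and the function names `cosine`, `top_k`, and `rerank` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=3):
    """chunks is a list of (text, vector) pairs; return the k most similar texts."""
    scored = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in scored[:k]]


def rerank(query, texts):
    """Stand-in reranker: order texts by how many query words they contain."""
    words = set(query.lower().split())
    return sorted(texts, key=lambda t: len(words & set(t.lower().split())), reverse=True)
```

The default of 3 for Top K corresponds to `k=3` here: only the three closest chunks are kept, and reranking then reorders that short list with a second relevance score.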

    • Under the Query Settings section, provide the following details:

      RAG Application - Query Settings Configuration

      Field             Description
      Batch Size        Enter the batch size for query processing. The default value is 32.
      Acronym Expander  (OPTIONAL) Select this option to enable acronym expansion in the query processing. This expands acronyms in the query to their full forms.
      Query Rewrite     (OPTIONAL) Select this option to enable query rewriting. This rewrites the query to improve the relevance of the search results.

      Note

      If you enable the Acronym Expander or Query Rewrite options, provide the relevant details in the form that appears.
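
As an illustration of acronym expansion, the following sketch appends the full form after each known acronym in a query. The mapping and the `expand_acronyms` function are hypothetical and only show the general idea; the actual details you provide in the DKubeX form may differ.

```python
import re

# Hypothetical acronym mapping, for illustration only.
ACRONYMS = {
    "RAG": "retrieval-augmented generation",
    "LLM": "large language model",
}


def expand_acronyms(query, mapping=ACRONYMS):
    """Append the full form after each known all-caps acronym in the query."""

    def repl(match):
        token = match.group(0)
        full = mapping.get(token)
        # Unknown acronyms are left untouched.
        return f"{token} ({full})" if full else token

    return re.sub(r"\b[A-Z]{2,}\b", repl, query)
```

Keeping the original acronym next to its expansion lets the search match documents that use either form.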

    • Under the Synthesizer Settings section, provide the following details:

      RAG Application - Synthesizer Settings Configuration

      Field                Description
      Chat Memory Tokens   Enter the number of chat memory tokens to be used for the synthesizer. The default value is 6000.
      Max Tokens           Enter the maximum number of tokens to be used for the synthesizer. The default value is 1024.
      Window Size          Enter the window size for the synthesizer. The default value is 2.
      User Prompt          Enter a custom user prompt to be used for the synthesizer. The default value is default.
      System Prompt        (OPTIONAL) Enter a custom system prompt to be used for the synthesizer.
      Use Adjacent Chunks  (OPTIONAL) Select this option to also use the chunks adjacent to the retrieved ones, which can help improve the performance of the synthesizer.
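
One common way a chat memory token limit is applied is to keep only the most recent messages that fit within the budget. The sketch below shows that idea under stated assumptions: the `trim_history` function is hypothetical, tokens are approximated by whitespace-separated words, and nothing here is confirmed as the DKubeX behavior.

```python
def trim_history(messages, max_tokens=6000, count_tokens=lambda text: len(text.split())):
    """Keep the most recent messages whose combined token count fits the budget.

    count_tokens is a crude word-count stand-in; a real tokenizer would differ.
    """
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

With a larger Chat Memory Tokens value, more of the earlier conversation is carried into each new request, at the cost of a longer prompt.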

    • Under the FAQ Settings section, provide the following details:

      RAG Application - FAQ Settings Configuration

      Field  Description
      FAQ    (OPTIONAL) Select this option to enable FAQ functionality in the RAG application. This helps to provide answers to frequently asked questions.

  • Once done, click on the Submit button on the bottom right of the page. The RAG application will be created and deployed on the DKubeX platform. You can open and manage the application from the Applications page.
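
Before filling in the form, it can help to collect the values in one place. The sketch below gathers the default values documented in the steps above into a Python dataclass and checks two of the stated conditions (an OpenAI provider needs a key). The `RagAppSettings` class, its field names, and the `problems` method are hypothetical and are not part of any DKubeX API; treating Name, Dataset Name, and the providers as required is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class RagAppSettings:
    # Treated as required in this sketch (no default is documented for them).
    name: str
    dataset_name: str
    embedding_provider: str  # one of "DkubeX", "Sky", "OpenAI"
    llm_provider: str        # one of "DkubeX", "Sky", "OpenAI"
    # Defaults as documented in the steps above.
    cpu_count: int = 1
    gpu_count: int = 0
    memory: int = 4
    port: int = 3000
    mount_home: str = "All"
    vector_store_provider: str = "dkubex"
    vector_store: str = "weaviate_vectorstore"
    search_type: str = "vector_search"
    top_k: int = 3
    batch_size: int = 32
    chat_memory_tokens: int = 6000
    max_tokens: int = 1024
    window_size: int = 2
    user_prompt: str = "default"
    embedding_key: str = ""
    llm_key: str = ""

    def problems(self):
        """Return a list of human-readable issues; an empty list means none found."""
        issues = []
        if not self.name:
            issues.append("Name is empty")
        if not self.dataset_name:
            issues.append("Dataset Name is empty")
        if self.embedding_provider == "OpenAI" and not self.embedding_key:
            issues.append("Embedding provider OpenAI needs a Key")
        if self.llm_provider == "OpenAI" and not self.llm_key:
            issues.append("Retrieval provider OpenAI needs a Key")
        return issues
```

A quick usage example: `RagAppSettings(name="docs-bot", dataset_name="manuals", embedding_provider="OpenAI", llm_provider="DkubeX").problems()` reports the missing embedding key, which is a reminder to have the OpenAI API key ready before reaching the Embedding section.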