Query Rewriting in RAG Applications


Creating an AI assistant that generates helpful answers from a knowledge base is a complex problem. A significant hurdle is the frequent mismatch between how users phrase their questions and how information is structured within the data. Most users do not write well-formed questions, which often results in irrelevant or incomplete answers and frustrated users.

As builders of these systems, we should not expect users to write well-crafted queries. In our application, we have implemented query rewriting to rephrase user queries so they better align with the underlying data. This has dramatically improved the accuracy and helpfulness of our AI assistant's responses.

In this post, I will share how we implemented query rewriting in our application. We will end the post by looking at how popular open source systems do query rewriting.

You can listen to this post in podcast format here – https://notebooklm.google.com/notebook/ed6e648e-c95c-4ad8-88a2-767be02c7c4d/audio

Query Rewriting

The following is the list of query rewriting transformations we perform to retrieve relevant information and generate helpful responses.

1. Leveraging Conversation History for Contextual Queries

Imagine a user asks, "case study on pre-approved loans." We rewrite this using the chat history to: "Can you provide information on case studies involving pre-approved loans?"

Similarly, if a user follows up with "in travel" after asking "what work we have done in retail?", we rewrite it to: "What work have we done in travel?" This ensures the AI assistant understands the user’s intent in follow-up messages.

2. Transforming Keyword Searches into Meaningful Queries

Many users approach AI assistants the way they approach Google search: they throw in a few keywords and expect an answer. RAG systems, however, perform best with more specific queries.

The Data Speaks: We found that 20% of user queries are under five words. These initial queries are crucial – a poor experience here can lead to user abandonment.

Here’s how we transform keyword searches:

  • "Scala" becomes: "What work have we done using Scala?"
  • "CapitalOne" becomes: "What work have we done for Capital One?"
  • "Hyperpersonalization" becomes: "What work have we done for our customers in Hyperpersonalization?"

This not only aids retrieval but also improves answer generation by providing context for the LLM.
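
To make the pattern concrete, here is a minimal TypeScript sketch of the template-based rewrites shown above. In our system the LLM performs this transformation through the prompt shown later in the post; the keyword categories here are hypothetical placeholders for illustration.

// Map a bare keyword to a fully formed question, mirroring the examples above.
// The category detection is assumed to happen elsewhere (or via the LLM).
type KeywordCategory = "technology" | "client" | "capability";

function rewriteKeyword(keyword: string, category: KeywordCategory): string {
  switch (category) {
    case "technology":
      return `What work have we done using ${keyword}?`;
    case "client":
      return `What work have we done for ${keyword}?`;
    case "capability":
      return `What work have we done for our customers in ${keyword}?`;
  }
}

// rewriteKeyword("Scala", "technology") => "What work have we done using Scala?"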

3. Expanding Context-Specific Abbreviations

Users often employ abbreviations in their queries. Expanding these improves retrieval accuracy:

  • "Share IA capability deck" becomes: "Share the Intelligent Automation (IA) capability deck."
  • "latest MX deck" becomes: "What are the latest offerings and capabilities in the Mendix (MX) deck?"

4. Enriching Short Queries with Entity Background

For short keyword queries, adding a one-liner about the entity improves retrieval and enables more informative “I don’t know” responses when nothing relevant is found.

  • "Old National Bank" becomes: "What work have we done for Old National Bank? Old National Bank is a regional bank in the United States, providing a range of financial services."
  • "Blue Prism" becomes: "What work have we done with Blue Prism? Blue Prism is a leading robotic process automation (RPA) software company that provides a digital workforce for automating business processes."

5. Pre-Built Queries for Common Keywords

Xebia offers many solution accelerators, which are listed on our website. When a user enters a keyword like "xflake", they likely expect a write-up on it. For such queries, we bypass the LLM and use a pre-built query like: "Write a 500-word write up on XFlake solution accelerator." This ensures a consistent and informative answer.
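
A minimal sketch of this bypass, assuming a simple keyword lookup; the actual matching logic in our application may be more involved.

// Pre-built queries for solution accelerator keywords. Entries are illustrative.
const preBuiltQueries: Record<string, string> = {
  xflake: "Write a 500-word write up on XFlake solution accelerator.",
};

// Returns a pre-built query if the user input is a known keyword,
// otherwise undefined so the normal LLM rewriting path runs.
function resolveQuery(userQuery: string): string | undefined {
  return preBuiltQueries[userQuery.trim().toLowerCase()];
}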

Prompt for Query Rewriting

Below is the prompt we use for query rewriting:

# IDENTITY and PURPOSE
You are an expert at rewriting questions. You are given a conversation between a human and an assistant, and a follow-up query from the human. You have to rewrite the message into a standalone question that captures all relevant context from the conversation.

Take a deep breath and think step by step about how to best accomplish this goal using the following steps.

# STEPS
- Consume the entire conversation and follow-up message and think deeply about it.
- Expand the abbreviations. For example, ${commonAbbrevations}
- Fix common typos in the generated standalone question. For example, xebec is Xebia.
- If the user query is about a ${solutionAccelerators} then make sure to add "solution accelerator" in the standalone question
- If the user query is about a company or organization or technology then include one line about it in the generated standalone question

# EXAMPLES

A set of examples

# CHAT HISTORY

${chatHistory}

# INPUT

User query: ${query}
Standalone question:

We do the obvious housekeeping, like trimming the chat history to the last N messages; in our case, N is 10. We also check context length limits and drop older messages if required.
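
Here is a sketch of that trimming step. The token estimate below is a rough word-count stand-in; a real implementation would use the model's tokenizer.

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

function trimHistory(
  history: ChatMessage[],
  maxMessages = 10,
  maxTokens = 3000,
): ChatMessage[] {
  // Keep only the last N messages.
  let trimmed = history.slice(-maxMessages);
  // Rough token estimate; swap in a proper tokenizer in production.
  const estimate = (msgs: ChatMessage[]) =>
    msgs.reduce((sum, m) => sum + m.content.split(/\s+/).length, 0);
  // Drop the oldest messages until the history fits the context budget.
  while (trimmed.length > 1 && estimate(trimmed) > maxTokens) {
    trimmed = trimmed.slice(1);
  }
  return trimmed;
}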

We use the gpt-3.5-turbo model for query rewriting and limit the response to a maximum of 250 tokens. This keeps latency low.
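
Putting it together, the rewriting call looks roughly like the sketch below, assuming the OpenAI Node SDK; error handling and retries are omitted.

import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function rewriteQuery(prompt: string): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    max_tokens: 250, // a small cap keeps latency low
    temperature: 0,
    messages: [{ role: "user", content: prompt }],
  });
  return completion.choices[0].message.content?.trim() ?? "";
}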

How Open Source RAG Systems Rewrite Queries

It is always a good idea to see how others are solving the same problem. So, I looked at the source code of three popular open source systems to understand how they perform query rewriting. I found that they follow a similar approach.

Quivr: Quivr is a popular open-source RAG framework for building GenAI Second Brains. Following is the prompt they use for generating standalone questions.

Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language. Keep as much details as possible from previous messages. Keep entity names and all.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:

Important points to note in the prompt are:

  • One useful instruction they add is "Keep entity names and all", since you don’t want the standalone question to lose important details.

The second system I looked at is Azure Search OpenAI Demo. It is a sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.

They use the following prompt to generate a search query for the Azure AI Search index.

Below is a history of the conversation so far, and a new question asked by the user that needs to be answered by searching in a knowledge base.
You have access to Azure AI Search index with 100's of documents.
Generate a search query based on the conversation and the new question.
Do not include cited source filenames and document names e.g info.txt or doc.pdf in the search query terms.
Do not include any text inside [] or <<>> in the search query terms.
Do not include any special characters like '+'.
If the question is not in English, translate the question to English before generating the search query.
If you cannot generate a search query, return just the number 0.

Important points to note in the prompt are:

  • They include a reference to the search system that will retrieve the documents
  • They translate non-English queries to English
  • They handle the failure scenario where the LLM cannot generate a search query: the model is instructed to return just the number 0, as sketched below
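
That failure handling could look like the following sketch, which reuses the rewriteQuery helper from earlier; the function name and fallback choice are assumptions, not the demo's actual code.

// If the model cannot generate a search query it returns "0" per the prompt;
// fall back to the user's original question in that case.
async function generateSearchQuery(
  prompt: string,
  originalQuestion: string,
): Promise<string> {
  const generated = (await rewriteQuery(prompt)).trim();
  return generated === "0" ? originalQuestion : generated;
}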

RAGFlow is another open-source RAG (Retrieval-Augmented Generation) engine. Below is the prompt they use.

You are an expert at query expansion to generate a paraphrasing of a question.
I can't retrieval relevant information from the knowledge base by using user's question directly.     
You need to expand or paraphrase user's question by multiple ways such as using synonyms words/phrase, 
writing the abbreviation in its entirety, adding some extra descriptions or explanations, 
changing the way of expression, translating the original question into another language (English/Chinese), etc. 
And return 5 versions of question and one is from translation.
Just list the question. No other words are needed.

Important points to note in the prompt are:

  • They also add special instructions to handle abbreviations and to add extra details to the rephrased query
  • Just like the Azure Search OpenAI Demo, they also perform language translation
  • They return 5 versions of the question instead of 1, which a caller can fan out across retrieval (see the sketch below)
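
Here is a sketch of how those five paraphrases might be consumed for retrieval; the search callback and deduplication are assumptions, and the actual RAGFlow implementation may differ.

// Run retrieval once per paraphrase and union the hits.
async function multiQueryRetrieve(
  questions: string[],
  search: (q: string) => Promise<string[]>, // returns document ids
): Promise<string[]> {
  const perQuestion = await Promise.all(questions.map((q) => search(q)));
  // Deduplicate while preserving first-seen order.
  return [...new Set(perQuestion.flat())];
}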

Conclusion

Query rewriting is essential for effective RAG applications. It bridges the gap between user intent and the language of knowledge sources, improving retrieval and answer generation.

We explored techniques used in our presales AI assistant, like leveraging conversation history and enriching short queries. These techniques improve retrieval accuracy and help the LLM generate informative answers.

We also analyzed how open-source RAG systems approach query rewriting, gaining valuable insights for improvement.

By continuously refining our strategies, we can ensure a seamless user experience with RAG applications.

