Adding Web Search to an LLM Chat Application


A common request from customers I have helped build custom ChatGPT-style assistants for is web search support. Users want to fetch relevant data from web searches directly within the chat interface. In this post, I will show you how to quickly add web search functionality to a chat application. We will use Gradio to build a quick chatbot.

Gradio is an open-source Python library that allows you to build interactive web interfaces for machine learning models with ease. It simplifies the process of creating user-friendly applications by providing pre-built components and a straightforward API, enabling developers to focus on the core functionality of their projects without worrying about the intricacies of web development.

Let’s get started.

Create a Gradio App

Start by creating a new directory for your project. We will call it webchat. Create a new virtual environment.
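On macOS or Linux (assuming Python 3 is installed), the setup might look like this:

```shell
mkdir webchat
cd webchat

# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate
```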

Install Gradio and OpenAI’s Python package:

pip install --upgrade gradio
pip install openai

Make sure to set the OPENAI_API_KEY environment variable with your OpenAI API key.

Create a new file main.py and populate it with the following code:

import gradio as gr
from openai import OpenAI

client = OpenAI()


def chat_handler(message, history):
    history.append({"role": "user", "content": message})

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
        ] + history,
        temperature=0.7,
        max_tokens=800,
        stream=True
    )

    partial_message = ""
    for chunk in response:
        if chunk.choices[0].delta.content is not None:
            partial_message = partial_message + chunk.choices[0].delta.content
            yield partial_message


gr.ChatInterface(chat_handler, type="messages").launch()

Code Explanation

Let’s break down the main.py file:

  1. Imports and Initialization:
    • gradio: Imported to create the web interface.
    • OpenAI: Imported to interact with OpenAI’s API.
    • client: An instance of the OpenAI client used to make API calls.
  2. Chat Handler Function (chat_handler):
    • Parameters: message (the user’s input message) and history (the conversation history).
    • Appends the user’s message to the conversation history.
    • Sends a request to the OpenAI API to generate a response using the gpt-4o-mini model.
    • Streams the response back to the user incrementally for a more dynamic experience.
  3. Launching the Gradio Interface:
    • gr.ChatInterface(chat_handler, type="messages").launch() initializes and launches the Gradio chat interface, connecting it to the chat_handler function.

You can run the app using:

python main.py

The above code sets up a simple chat application. You can ask any query, and it will make a call to OpenAI to generate the response.

Adding Web Search to the Chat Application

We will use SearXNG to add search capability to our chat app.

SearXNG is an open-source, privacy-respecting metasearch engine that aggregates results from multiple search services. It allows users to perform searches without tracking their activities, making it an excellent choice for integrating web search functionality into applications where user privacy is a priority. SearXNG is highly customizable and can be self-hosted, giving developers full control over the search experience.

To run SearXNG, we will use Docker. The following is the corresponding docker-compose.yml file:

services:
  searxng:
    image: docker.io/searxng/searxng:latest
    volumes:
      - ./searxng:/etc/searxng:rw
    ports:
      - 4000:8080
    networks:
      - webchat-nwk
    restart: unless-stopped
networks:
  webchat-nwk:

You can access the SearXNG UI at http://localhost:4000/.
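One caveat worth flagging: a default SearXNG installation may only serve HTML results. If the JSON API we use later returns a 403, check that the json format is enabled in the search section of searxng/settings.yml (the file SearXNG generates on first start):

```yaml
search:
  formats:
    - html
    - json
```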

Now, let’s integrate SearXNG with our application.

First, let’s add a checkbox in the UI:

gr.ChatInterface(chat_handler,
                 type="messages",
                 additional_inputs=[
                     gr.Checkbox(False, label="WebSearch Mode", render=True),
                 ]).launch()

This code adds a WebSearch Mode checkbox to the UI, allowing users to toggle web search functionality on or off.

We can then use the WebSearch Mode in our chat_handler function:

def chat_handler(message, history, websearch_mode):
    if websearch_mode:
        yield from handle_websearch(message, history)
        return
    # same as previous

Note that chat_handler is a generator function (it yields streamed chunks), so we forward the web search output with yield from rather than return; a plain return inside a generator would silently discard it.

Handling Web Search Queries

Let’s implement the handle_websearch function that will use SearXNG. The first thing we do is rewrite the query to make it standalone.

def rewrite_query(message, history):
    history_str = "\n".join([f"{m['role']}: {m['content']}" for m in history])
    prompt = f"""You are an AI question rephraser. You will be given a conversation and a follow-up question. You will rephrase the follow-up question so it is a standalone question and can be used by another LLM to search the web for information to answer it.

<conversation>
    {history_str}
</conversation>
Follow up question: {message}
Rephrased question:
    """
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": prompt
            }
        ],
        temperature=0
    )
    return response.choices[0].message.content

Code Explanation and Example

The rewrite_query function transforms a follow-up question into a standalone query suitable for web searching. Here’s how it works:

  1. Concatenate Conversation History:
    • Combines the roles and contents of all messages in the conversation history into a single string, one role: content pair per line.
  2. Rewrite the Query:
    • Embeds the conversation and the follow-up question in a rephrasing prompt and sends it to the OpenAI API (with temperature set to 0 for deterministic output) to obtain the standalone question.

Example:

  • Conversation History:
    user: How does Uber utilize AI in its operations?
    assistant: Uber uses AI for various purposes including route optimization, demand prediction, and enhancing user experience.
  • Follow-up Question:
    user: How Uber is using generative AI in software engineering
  • Rephrased Question:
    How is Uber utilizing generative AI in its software engineering processes?
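To make the history serialization concrete, here is what history_str looks like for the example conversation above (a standalone sketch of the join step inside rewrite_query):

```python
history = [
    {"role": "user", "content": "How does Uber utilize AI in its operations?"},
    {"role": "assistant", "content": "Uber uses AI for route optimization and demand prediction."},
]

# Same join used inside rewrite_query
history_str = "\n".join(f"{m['role']}: {m['content']}" for m in history)
print(history_str)
# user: How does Uber utilize AI in its operations?
# assistant: Uber uses AI for route optimization and demand prediction.
```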

Searching the Web with SearXNG

Next, we will pass the rewritten query to SearXNG to perform a web search.

def handle_websearch(message, history):
    rewritten_query = rewrite_query(message, history)
    print("Rewritten Query: ", rewritten_query)
    search_results = search_searxng(rewritten_query)
    print("Search Results", len(search_results))

Explanation of search_searxng Function

import requests


def search_searxng(query):
    response = requests.get("http://localhost:4000/search", params={
        "format": "json",
        "q": query
    })
    data = response.json()
    return data["results"]

The search_searxng function performs the following steps:

  1. Send HTTP GET Request:
    • Sends a GET request to the SearXNG instance running locally on port 4000.
    • The query parameter q contains the rewritten search query.
    • The format parameter is set to json to receive the search results in JSON format.
  2. Parse JSON Response:
    • Parses the JSON body returned by SearXNG.
  3. Return Search Results:
    • Extracts and returns the results list from the parsed JSON data.
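To illustrate the shape of the data, here is a sketch using a fabricated response in the structure SearXNG’s JSON endpoint returns (a results list whose entries carry title, url, and content), mapped into the document format we build in the next section:

```python
# Fabricated sample in the shape of a SearXNG JSON response
sample_response = {
    "results": [
        {
            "title": "Uber Engineering Blog",
            "url": "https://www.uber.com/blog/engineering/",
            "content": "How Uber applies generative AI in its developer tools.",
        }
    ]
}

# Map each result into the document structure used for answer generation
documents = [
    {
        "content": r["content"],
        "metadata": {"title": r["title"], "url": r["url"]},
    }
    for r in sample_response["results"]
]
print(documents[0]["metadata"]["url"])
# https://www.uber.com/blog/engineering/
```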

Answer Generation

Finally, we will generate a response based on the search results.

def handle_websearch(message, history):
    rewritten_query = rewrite_query(message, history)
    print("Rewritten Query: ", rewritten_query)
    search_results = search_searxng(rewritten_query)
    print("Search Results", len(search_results))
    search_results = search_results[:20]
    documents = [
        {
            "content": r['content'],
            "metadata": {
                "title": r['title'],
                "url": r['url'],
            }
        } for r in search_results
    ]

    stream = generate_response(rewritten_query, documents)
    partial_message = ""
    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            partial_message += chunk.choices[0].delta.content
            yield partial_message

Code Explanation

  1. Rewrite the Query:
    • Transforms the user’s follow-up message into a standalone search query.
  2. Perform Web Search:
    • Uses the search_searxng function to fetch search results from SearXNG.
  3. Limit and Structure Search Results:
    • Limits the search results to the first 20 entries to manage performance.
    • Structures each search result into a document containing the content, title, and URL for further processing.
  4. Stream the Final Response:
    • Calls the generate_response function (defined next) to create a comprehensive answer based on the search results.
    • Iterates over the streamed chunks, yielding the growing partial message so Gradio can render it incrementally, just like the plain chat path.

from datetime import datetime


def generate_response(query, documents):
    context = "\n\n".join([
        f"Title: {d['metadata']['title']}\nPage URL: {d['metadata']['url']}\nPage Content: {d['content']}\n"
        for d in documents
    ])

    prompt = f"""You are a helpful assistant at searching the web and answering user's queries. 

- Generate a response that is informative and relevant to the user's query based on provided context (the context consists of search results containing a brief description of the content of that page).
- You must use this context to answer the user's query in the best way possible. Use an unbiased and journalistic tone in your response. Do not repeat the text.
- You must not tell the user to open any link or visit any website to get the answer. You must provide the answer in the response itself. If the user asks for links, you can provide them.
- Your responses should be medium to long in length, be informative and relevant to the user's query. You can use markdown to format your response. You should use bullet points to list the information. Make sure the answer is not short and is informative.
- You have to cite the answer using [number](URL) notation along with the associated URL. You must cite the sentences with their relevant context number. You must cite each and every part of the answer so the user can know where the information is coming from.
- Place these citations at the end of that particular sentence. You can cite the same sentence multiple times if it is relevant to the user's query like [number1][number2].
- However, you do not need to cite it using the same number. You can use different numbers to cite the same sentence multiple times. The number refers to the number of the search result (passed in the context) used to generate that part of the answer.

Anything inside the following `context` HTML block provided below is for your knowledge returned by the search engine and is not shared by the user. You have to answer the question on the basis of it and cite the relevant information from it but you do not have to talk about the context in your response.

<context>
{context}
</context>

If you think there's nothing relevant in the search results, you can say that 'Hmm, sorry I could not find any relevant information on this topic. Would you like me to search again or ask something else?'. You do not need to do this for summarization tasks.
Anything between the `context` is retrieved from a search engine and is not a part of the conversation with the user. Today's date is {datetime.now()}
"""
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": prompt
            },
            {
                "role": "user",
                "content": query
            }
        ],
        temperature=0,
        stream=True
    )

Code Explanation

  1. Prepare the Context:
    • Combines the titles, URLs, and content snippets of the search results into a single context string that the model can ground its answer in.
  2. Create the Prompt for Response Generation:
    • Provides detailed instructions to the language model on how to generate the response, emphasizing the use of the provided context, citation of sources, and the tone of the response.
  3. Call the OpenAI API to Generate the Response:
    • Sends the prompt and the user’s query to the OpenAI API.
    • Temperature is set to 0 to ensure deterministic and focused responses.
    • Streaming is enabled to allow incremental delivery of the response.

Now, if you start the app and ask a question like how Uber is using generative AI in software engineering, you will receive a detailed response citing relevant sources from the search results.

The number citations are links to the actual URLs, allowing users to reference the original sources of the information provided.
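In my experience the model usually emits full [number](URL) links as instructed, but if it falls back to bare [1]-style markers, a small post-processing helper (hypothetical, not part of the code above) can rewrite them using the documents list:

```python
import re


def linkify_citations(text, documents):
    """Rewrite bare [n] citation markers as [n](url) markdown links.

    n is treated as a 1-based index into the documents list; markers
    already followed by a link, like [1](https://...), are left alone.
    """
    def repl(match):
        idx = int(match.group(1)) - 1
        if 0 <= idx < len(documents):
            url = documents[idx]["metadata"]["url"]
            return f"[{match.group(1)}]({url})"
        return match.group(0)  # out-of-range marker: leave untouched

    # Negative lookahead skips markers that are already markdown links
    return re.sub(r"\[(\d+)\](?!\()", repl, text)
```

For example, linkify_citations("Uber uses LLMs [1].", documents) turns the bare marker into a clickable link while leaving already-linked citations untouched.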

Limitations

  • Incomplete Page Content: We use partial content of the webpage returned by SearXNG, potentially missing out on crucial information that could be on the actual page.
  • Clickbait Articles: Search results can include clickbait articles, so you need to handle them in order to generate a high-quality answer.
  • Intelligent Filtering: Implementing intelligent filtering of search results is necessary to ensure that only relevant and high-quality pages contribute to the answer generation.
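As a starting point for such filtering, a naive sketch might drop results from a blocklist of domains and results whose snippets are too short to be useful (the domain names and threshold here are purely illustrative):

```python
from urllib.parse import urlparse

# Hypothetical blocklist; in practice you would curate or learn this
BLOCKED_DOMAINS = {"clickbait.example.com"}


def filter_results(results, min_snippet_len=40):
    filtered = []
    for r in results:
        host = urlparse(r["url"]).netloc
        if host in BLOCKED_DOMAINS:
            continue  # skip known low-quality domains
        if len(r.get("content") or "") < min_snippet_len:
            continue  # skip results with too little snippet text
        filtered.append(r)
    return filtered
```

This would run between search_searxng and the documents mapping in handle_websearch.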

Thanks to Perplexica

The prompts I use come from Perplexica. Perplexica is an open-source alternative to Perplexity. Perplexity AI is a conversational search engine that uses large language models to answer queries.

I also created a video where I walk through the Perplexica architecture and code. You can watch it if you want to learn more.

