Why you should consider building your own AI assistant


For the past six months, I’ve been leading the development of a custom AI assistant for our organization. It began with the straightforward concept of offering an alternative to publicly available chat assistants like OpenAI’s ChatGPT or Google’s Gemini. However, it has evolved into a comprehensive platform powering multiple bots tailored to specific business units, departments, and individual needs.

The feedback on the AI Assistant has been positive, with people reporting productivity gains. It is also helping to break down knowledge and information silos within the organization.

A common question I receive is why we opted to build our own solution instead of leveraging existing assistants like ChatGPT, Perplexity, Microsoft’s Enterprise Copilot, or the plethora of other options available. After all, isn’t the AI chat assistant landscape already saturated? While there are indeed numerous products vying for a slice of this multi-billion dollar market, I believe we are still in its nascent stages. The optimal workflows and functionalities to integrate into these tools are still under exploration by all players.

In this blog post, I’ll delve into why I believe organizations should strongly consider building their own AI assistants. To be clear, I’m not advocating that everyone build entirely from scratch.

Reason 1. You know how best to answer your users’ queries

One of the problems these AI assistants solve is enabling Q&A over the organization’s knowledge base. This is our standard Retrieval-Augmented Generation (RAG) use case. Here’s a high-level overview of a RAG solution:

  1. Indexing Pipeline: Documents (PDF, Word, PPT, text, web pages) are chunked and stored in a vector database. Chunking typically involves dividing documents by paragraph, word count, or character count. There are many chunking strategies, and evaluating which one yields the best results can be challenging.
  2. Retrieval: When a user query arrives, the system finds relevant chunks in the vector database.
  3. Answer Generation: The answer is generated from the chunks retrieved in step 2. A typical answer-generation prompt looks like the one below, which I found in the Azure Search OpenAI Demo GitHub repository.
Assistant helps the company employees with their healthcare plan questions, and questions about the employee handbook. Be brief in your answers.
Answer ONLY with the facts listed in the list of sources below. If there isn't enough information below, say you don't know. Do not generate answers that don't use the sources below. If asking a clarifying question to the user would help, ask the question.
For tabular information return it as an html table. Do not return markdown format. If the question is not in English, answer in the language used in the question.
Each source has a name followed by colon and the actual information, always include the source name for each fact you use in the response. Use square brackets to reference the source, for example [info1.txt]. Don't combine sources, list each source separately, for example [info1.txt][info2.pdf].
Question: {question}
Context: {context}
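
Putting the three steps together, here is a minimal sketch of such a demo pipeline in Python. It assumes an OpenAI-style client and uses a plain in-memory list in place of a real vector database; the helper names, chunk size, and models are illustrative rather than taken from any particular framework.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# In practice, use the full grounding prompt shown above.
ANSWER_PROMPT = "Question: {question}\nContext: {context}"
EMBED_MODEL = "text-embedding-3-small"

def embed(texts):
    resp = client.embeddings.create(model=EMBED_MODEL, input=texts)
    return [np.array(d.embedding) for d in resp.data]

def chunk(document, size=500):
    # 1. Indexing: naive fixed-size chunking; real pipelines need far more care.
    return [document[i:i + size] for i in range(0, len(document), size)]

def build_index(documents):
    chunks = [c for doc in documents for c in chunk(doc)]
    return list(zip(chunks, embed(chunks)))  # the in-memory "vector database"

def retrieve(index, query, k=3):
    # 2. Retrieval: cosine similarity between the query and every chunk.
    q = embed([query])[0]
    scored = sorted(index, key=lambda item: -float(
        np.dot(q, item[1]) / (np.linalg.norm(q) * np.linalg.norm(item[1]))))
    return [text for text, _ in scored[:k]]

def answer(index, question):
    # 3. Answer generation: stuff the retrieved chunks into the prompt.
    context = "\n\n".join(retrieve(index, question))
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": ANSWER_PROMPT.format(question=question, context=context)}],
    )
    return resp.choices[0].message.content
```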

This is what most demo RAG solutions look like. They work well in demos but fail miserably in the real world, where you have to address a range of concerns at each stage of your RAG solution.

Indexing

  • What document types do you need to support?
  • What is the structure of the documents? Do they contain tables, images, or handwritten text? These factors will determine the complexity of the ingestion pipeline.
  • Which chunking strategy should you use, and what is the optimal chunk size? Which embedding model should you use, and when? Do you need separate indexes for better search? (A small chunking sketch follows this list.)
  • How do you handle new data sources?
  • How do you evaluate and test your indexing pipeline?
  • What kind of questions will people ask, and how will that impact your indexing strategy?
  • Do you need different indexing strategies depending on the document type? How many such strategies do you need?
  • How can you continuously refresh the index without impacting search quality?
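
To make the chunking question concrete, here is a small sketch of two common strategies: fixed-size character chunks with overlap, and paragraph-grouped chunks. The sizes are illustrative; which strategy wins depends entirely on your documents and the questions users ask, which is why evaluating the pipeline matters.

```python
def chunk_by_chars(text, size=1000, overlap=200):
    # Fixed-size windows with overlap, so a fact split across a boundary
    # still appears whole in at least one chunk.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def chunk_by_paragraphs(text, max_chars=1500):
    # Group whole paragraphs until the budget is reached, preserving structure.
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks
```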

Retrieval

  • What kind of queries will users be asking? Do they ask search-engine-like keyword-based queries, or do they write proper questions?
  • Do you want to support follow-up questions?
  • Do you want to rewrite the queries? If yes, should you consider a fine-tuned model or a generic model for that?
  • Do you need keyword-based, semantic, or hybrid search?
  • Do you want to remove response format instructions from search queries?
  • How many chunks should you retrieve from the vector database?
  • Do you need to search in multiple indexes, or only one?
  • Do you want to break the user query into multiple queries for better retrieval?
  • Do you want to generate HyDE (Hypothetical Document Embeddings) documents and use them for searching? (See the sketch after this list.)
  • Do you want to rerank retrieved chunks?
  • How many chunks do you want to pass to your answer generation prompt?
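
As an illustration of one of the options above, here is a rough sketch of HyDE-style retrieval. It reuses the client and retrieve helpers from the demo pipeline sketch earlier; the prompt wording is illustrative.

```python
def hyde_retrieve(index, query, k=5):
    # Ask the model to draft a hypothetical answer, then search with that
    # passage instead of the raw query; it often sits closer to the relevant
    # chunks in embedding space than the question itself.
    hypothetical = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": f"Write a short passage that plausibly answers: {query}"}],
    ).choices[0].message.content
    return retrieve(index, hypothetical, k=k)
```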

Answer Generation

  • What answer format do you want to have? Do you have any examples?
  • Do you want to support different answer formats depending on the question?
  • Do you want users to be able to change the tone of the answer?
  • Do you want to enforce any content guidelines? If yes, how will it work with streaming responses?
  • Do you want to support document summarisation? If yes, note that there is no single right summarisation format; depending on the document and the user’s need, you will have to generate different kinds of summaries.
  • How do you want to handle “I don’t know” answers?
  • Do you want to support citations? (A small citation post-processing sketch follows this list.)
  • How will you ensure consistent responses?
  • What follow-up actions do you want to enable for each answer?
  • How do you want to analyze the responses and incorporate user feedback?
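
Since the grounding prompt above asks the model to cite sources as [info1.txt], here is a small sketch of post-processing those citations so the UI can link each fact back to a document, and so you can flag references the model made up.

```python
import re

def extract_citations(answer_text, known_sources):
    # Pull out every [source] reference and split it into valid citations
    # (sources that were actually retrieved) and unknown ones (likely hallucinated).
    cited = set(re.findall(r"\[([^\]]+)\]", answer_text))
    valid = cited & set(known_sources)
    unknown = cited - valid
    return valid, unknown

# e.g. extract_citations(answer_text, ["benefits.pdf", "handbook.docx"])
```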

The questions I’ve listed above are some of the key considerations for building a robust and production-ready RAG application. The answers to many of these questions will be specific to each organization and its use case. This means that even if you purchase a pre-built solution, significant customization will likely be necessary to ensure the system functions consistently within your unique environment.

One challenge I’ve encountered with chat interfaces is the user’s tendency to ask questions based on their own interests, which may not always align with your LLM application’s current capabilities. Training users and designing a UX that guides them towards optimal queries can be a complex endeavor.

For instance, we built a bot to answer queries related to our customer work case studies. It’s a valuable tool for users to explore our past projects by client, technology, or domain. However, some users might ask broader questions like “who are your top 10 clients?” or “list all retail case studies in the UK.”

Structured databases can offer more consistent responses for such inquiries. This approach involves parsing documents, extracting relevant information, and storing it within the database. During query time, the system can then identify whether a structured database holds the optimal answer and utilize that data for response generation.
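
Here is a rough sketch of that routing step, assuming the same OpenAI-style client as before. The two categories and the prompt wording are illustrative; in practice you would tune them to your own query patterns and send “aggregate” questions to a SQL or analytics layer instead of the vector index.

```python
ROUTER_PROMPT = (
    "Classify the question as 'aggregate' if it asks for counts, rankings, or "
    "lists across many case studies, otherwise as 'lookup'. Answer with one word.\n"
    "Question: {q}"
)

def route(question):
    # Returns which backend should answer: the structured store or the vector index.
    label = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": ROUTER_PROMPT.format(q=question)}],
    ).choices[0].message.content.strip().lower()
    return "structured" if "aggregate" in label else "vector"
```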

Reason 2. It is about your workflows and your system integrations

While many AI assistants begin by providing chat interfaces for querying knowledge bases, organizations quickly discover the importance of integrating with structured data sources and APIs to create a unified user experience.

Take an HR AI assistant for example. Sure, it can answer questions about company policies. But wouldn’t it be even more valuable if it could also connect to the HRMS system via APIs? This would allow employees to manage leave, approve requests, and perform other common HR tasks directly through the chatbot, streamlining their workflows.
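
One way to wire this up is with tool (function) calling, sketched below with an OpenAI-style API. The apply_leave function, its parameters, and the HRMS behind it are hypothetical; your backend would map the tool call to the real HRMS API, with the user’s consent and a properly scoped token.

```python
from openai import OpenAI

client = OpenAI()

leave_tool = {
    "type": "function",
    "function": {
        "name": "apply_leave",  # hypothetical HRMS operation
        "description": "Submit a leave request in the HRMS on behalf of the employee.",
        "parameters": {
            "type": "object",
            "properties": {
                "start_date": {"type": "string", "description": "YYYY-MM-DD"},
                "end_date": {"type": "string", "description": "YYYY-MM-DD"},
                "leave_type": {"type": "string", "enum": ["annual", "sick", "unpaid"]},
            },
            "required": ["start_date", "end_date", "leave_type"],
        },
    },
}

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "I need next Monday and Tuesday off"}],
    tools=[leave_tool],
)
# If the model chooses to call apply_leave, the arguments arrive in
# response.choices[0].message.tool_calls; your backend validates them and
# calls the real HRMS API before confirming back to the user.
```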

Integrating with structured data sources and APIs necessitates careful consideration of the user journey within a chat context. Here are some key questions to address:

  • User Interface vs. Natural Language: Will your assistant utilize UI components for actions or rely solely on natural language?
  • Communicating Workflow Options: How will users discover available workflows? Consider user training methods like slash commands or dedicated bots for specific workflows.
  • Read-Only vs. Write Access: Should your assistant limit users to read-only actions in the chat, directing them to the existing system UI for modifications?
  • Personalizing Responses: Can you leverage data from downstream systems to personalize responses? For instance, HR data on designation, location, or experience can enhance the user experience.
  • Follow-up Actions: Providing follow-up actions within the assistant empowers users to complete tasks. One example is offering to draft an email based on the chat conversation.
  • Identifying Repetitive Tasks: How will you uncover repetitive tasks suitable for workflow automation? Analyzing user interactions alongside surveys and conversations can be helpful.
  • Linear vs. Non-Linear Chat History: Depending on the workflow, do you need a strictly linear chat history, or can the conversation deviate for a more natural flow?

In addition to the functional aspects, building an AI assistant necessitates addressing security and manageability:

  • User Consent Management: How will you obtain and manage user consent for data access?
  • API Token Management: What secure practices will you implement to manage API tokens for accessing external systems?
  • Resiliency Patterns: How will you design your assistant and connected systems to handle potential issues and maintain uptime? (A minimal retry-and-fallback sketch follows this list.)
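
As a minimal example of the resiliency point, here is a sketch of a wrapper that retries with exponential backoff and falls back to a secondary model when the primary keeps failing. The model names and retry counts are illustrative.

```python
import time

def complete_with_fallback(client, messages,
                           models=("gpt-4-turbo", "gpt-3.5-turbo"), retries=3):
    # Try each model in order; back off exponentially between attempts.
    for model in models:
        for attempt in range(retries):
            try:
                return client.chat.completions.create(model=model, messages=messages)
            except Exception:
                time.sleep(2 ** attempt)  # 1s, 2s, 4s
    raise RuntimeError("All models and retries exhausted")
```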

As with the first point, you will have to do custom engineering to build useful workflows and integrations into your AI assistant. If you are planning to buy or subscribe to one, consider the questions above.

Reason 3. Avoid Vendor Lock-in and Leverage the Evolving AI Landscape

Building your own AI assistant allows you to escape the limitations of being tightly coupled to a single Large Language Model (LLM) provider. This was one of the core reasons why I built an AI assistant. Here’s why this flexibility is crucial:

  • Embrace the Pace of Innovation: The field of LLMs is rapidly evolving. New models are constantly emerging, offering improved capabilities and functionalities. By building your own assistant, you’re not confined to the release schedule of a single vendor. You can adopt a hybrid model strategy and use different models for different use cases, experiment faster, and avoid lock-in to one model or vendor. (A small model-routing sketch follows this list.)
  • Optimize Cost and Efficiency: Building your own assistant allows you to explore alternative approaches that might be more cost-effective for your specific needs. It is possible to cache some responses and use smaller fine-tuned LLMs for specific use cases. Depending on your workload and usage patterns, this can bring real cost savings.
  • Expand Your LLM Choices: This is linked to the first point, but given its importance it bears repeating. Building your own assistant opens doors to a wider range of LLMs, including open-source options that can be customized to address your specific requirements. In our AI assistant we use OpenAI models (GPT-3.5 Turbo and GPT-4 Turbo), Llama 2, and Mistral 7B. We will soon add support for Llama 3. You can A/B test with multiple models and find the one that best suits your needs.
  • Data Curation for Fine-Tuning Small LLMs: Large LLMs require vast amounts of data and computational resources for training, making them inaccessible to some organizations. I believe the future lies in fine-tuning smaller models for most use cases. However, fine-tuning requires question-and-answer (QA) pairs as training data. Building your own AI assistant offers the flexibility to collect this data, which you can then leverage to fine-tune smaller models specifically for your needs. This approach allows you to conduct A/B testing, comparing the accuracy and quality of your fine-tuned model against existing LLMs.
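
A small sketch of what that flexibility can look like in code: each use case maps to a provider and model behind a common interface, so swapping or A/B testing models becomes a configuration change rather than a rewrite. The adapter classes, routes, and the local endpoint are illustrative placeholders.

```python
from openai import OpenAI

class OpenAIAdapter:
    def __init__(self):
        self.client = OpenAI()

    def chat(self, model, messages):
        return self.client.chat.completions.create(model=model, messages=messages)

class LocalAdapter:
    """Stands in for a self-hosted model server (e.g. Llama 2 or Mistral 7B)."""
    def __init__(self, base_url):
        self.base_url = base_url

    def chat(self, model, messages):
        raise NotImplementedError("call your own inference endpoint here")

ADAPTERS = {"openai": OpenAIAdapter(), "local": LocalAdapter("http://localhost:8000")}

MODEL_ROUTES = {                      # use case -> (provider, model)
    "qa":             ("openai", "gpt-4-turbo"),
    "summarisation":  ("openai", "gpt-3.5-turbo"),
    "classification": ("local",  "mistral-7b-instruct"),
}

def complete(use_case, messages):
    provider, model = MODEL_ROUTES[use_case]
    return ADAPTERS[provider].chat(model=model, messages=messages)
```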

By building your own AI assistant, you gain the flexibility to adapt and evolve alongside the rapidly changing LLM landscape. You can choose the most cost-effective and efficient solutions, while ensuring your assistant offers the specific capabilities required for your organization’s success.

Reason 4. Most problems are logical extensions of chat

While chat interfaces are a popular starting point, building your own AI assistant unlocks a world of possibilities beyond simple conversations. It establishes a foundational platform that can be extended to tackle diverse challenges and broaden its reach within your organization. Here are some real-world examples of how we’ve helped clients leverage AI assistants for various use cases, all of which can be seen as logical extensions of chat functionality:

  • Search Reimagined with Generative AI: This application isn’t fundamentally different from chat. The same core components discussed earlier can power a generative search experience. However, integrating with multiple data sources and adding personalization becomes crucial to deliver a powerful user experience.
  • Transforming Customer Service: Customer service chatbots are essentially AI assistants tailored for external users. They leverage different data sources and APIs but fulfill similar functions: answering FAQs, troubleshooting basic issues, and escalating complex inquiries. The investment in building an internal employee-focused AI assistant can often be leveraged to create a customer-facing chatbot. Klarna’s case study exemplifies this perfectly: their AI assistant handled two-thirds of customer service chats, saving costs and boosting customer satisfaction.
    • The AI assistant has conducted 2.3 million conversations, handling two-thirds of Klarna’s customer service chats, equivalent to the work of 700 full-time agents.
    • It has matched human agents in customer satisfaction and improved errand resolution accuracy, leading to a 25% decrease in repeat inquiries.
    • Customers now resolve their errands in less than 2 minutes compared to 11 minutes previously, with availability in 23 markets and communication in over 35 languages.
    • The AI assistant is estimated to drive a $40 million USD profit improvement for Klarna in 2024.
    • Klarna has enhanced communication with local immigrant and expat communities across its markets through language support.
  • Code Exploration and Analysis: We recently assisted a client with a legacy modernization effort. They wanted to explore how generative AI could help with rapid prototyping. This involved integrating static code analyzers, parsers, and generative AI to build comprehensive documentation that powered a chat interface. While the data sources and storage mechanisms differed, the core process of building a code exploration and analysis tool mirrored building a knowledge AI assistant.
  • Conversational Banking: We’ve also helped clients create natural language interfaces for their transaction data. This application utilizes the AI assistant foundation to provide insights and visualizations, allowing users to understand their spending patterns more effectively. (A small text-to-SQL sketch follows this list.)
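
As an example of the conversational-banking case, here is a sketch of turning a natural-language question into SQL over a transactions table. The schema and prompt are illustrative, and in practice the generated SQL should be validated and run with read-only permissions before the results are summarised back to the user.

```python
SQL_PROMPT = (
    "You write SQLite queries over this table:\n"
    "transactions(id, date, merchant, category, amount)\n"
    "Return only the SQL, no explanation, for: {question}"
)

def question_to_sql(client, question):
    return client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": SQL_PROMPT.format(question=question)}],
    ).choices[0].message.content.strip()

# e.g. question_to_sql(client, "How much did I spend on groceries last month?")
```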

By building your own AI assistant, you gain a versatile platform that can be adapted to address a wide range of challenges within your organization. You’re not limited to a single use case; you can leverage the core functionalities to create custom solutions that meet your specific needs.

Conclusion

I know I have not given many answers in this post; these are early days for LLM-based apps. The landscape of AI assistants is rapidly evolving. While pre-built solutions offer a starting point, building your own assistant unlocks a new level of flexibility and control. You gain the power to tailor functionalities to your specific requirements, integrate seamlessly with existing systems, and leverage the latest advancements in AI technology.

Remember, a custom AI assistant isn’t just a chat interface; it’s a foundational platform with the potential to transform the way your organization works. It can empower employees, enhance customer experiences, and drive innovation across all departments.
