From Screenshots to Markdown Tables with LLMs

One of the tasks I frequently use ChatGPT-like tools for is extracting markdown text from images. I enjoy watching conference videos on YouTube, and I often come across slides in these videos that I want to keep for future reference. To capture them, I take screenshots and add them to my notebook. However, if I forget to add any textual comments alongside the screenshots, searching for them later becomes difficult. There are also times when I need to extract the text from these screenshots in markdown format for later use.

Let’s look at an example screenshot I took yesterday from a talk by an OpenAI engineer on fine-tuning.
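
As a quick sketch of how this extraction can be scripted with the OpenAI Python SDK (the file name, model, and prompt below are illustrative placeholders, not necessarily what I use):

import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# "slide.png" stands in for whatever screenshot you want to convert
with open("slide.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable model works
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract the content of this slide as a markdown table."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)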

Continue reading “From Screenshots to Markdown Tables with LLMs”

Query Rewriting in RAG Applications

Creating an AI assistant that generates helpful answers from a knowledge base is a complex problem. A significant hurdle is the frequent mismatch between how users ask questions and how information is structured within the data. Most people struggle to ask good questions, which often results in irrelevant or incomplete answers and frustrated users.

As builders of these systems, we should not expect users to write well-crafted queries. In our application, we have implemented query rewriting to rephrase user queries so they better align with the underlying data. This has dramatically improved the accuracy and helpfulness of our AI assistant’s responses.

In this post, I will share details of how we implemented query rewriting in our application, and we will end by looking at how popular open-source systems handle it.
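
To give a taste of the approach before the details, here is a minimal sketch of a query-rewriting step; the instruction text and model name are illustrative, not our exact production setup:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

REWRITE_INSTRUCTION = (
    "Rewrite the user's question so that it is specific and self-contained, "
    "using terminology likely to appear in the knowledge base. "
    "Return only the rewritten question."
)

def rewrite_query(user_query: str) -> str:
    # A single LLM call rephrases the raw query before retrieval.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": REWRITE_INSTRUCTION},
            {"role": "user", "content": user_query},
        ],
    )
    return response.choices[0].message.content.strip()

The rewritten query, not the raw one, is then embedded and used for retrieval.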

You can listen to this post in podcast format here: https://notebooklm.google.com/notebook/ed6e648e-c95c-4ad8-88a2-767be02c7c4d/audio

Continue reading “Query Rewriting in RAG Applications”

A simple optimization that reduced output tokens by 30% in our LLM-based RAG solution

I’ve been running a chat assistant application built on OpenAI for the past year. My biggest learning has come from analyzing our AI assistant’s responses and finding ways to optimize them, both for cost and quality. Like all RAG applications, we add source URLs to all chunks and instruct the LLM to include citations referencing the source link. Here’s a snippet of our answer-generation prompt:

For each document, indicate which sources most support it via valid citation markers at the end of the sentence, in markdown format. Add a link to the source using markdown format. Also, include the page number with the source.

Our analysis revealed that over 60% of our answers contain more than five source links, with list-style questions exceeding ten. These links inflate both input and output tokens.
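
One common way to claw back those tokens (a sketch of the general technique; the full post covers the exact change we made) is to have the model emit short numeric markers such as [1] and expand them into full markdown links in post-processing, so URLs never appear in the model’s output:

import re

# Hypothetical mapping from citation id to (source URL, page number),
# built while assembling the RAG context sent to the model.
sources = {
    "1": ("https://example.com/doc-a", 12),
    "2": ("https://example.com/doc-b", 3),
}

def expand_citations(answer: str) -> str:
    # Replace each short [n] marker with a full markdown link.
    def repl(match: re.Match) -> str:
        n = match.group(1)
        url, page = sources[n]
        return f"[{n}]({url}#page={page})"
    return re.sub(r"\[(\d+)\]", repl, answer)

print(expand_citations("AXA moved to the cloud [1]. Costs fell [2]."))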

Continue reading “A simple optimization that reduced output tokens by 30% in our LLM-based RAG solution”

RouteLLM Paper

Paper Link: https://arxiv.org/pdf/2406.18665v2

Paper Title: RouteLLM: Learning to Route LLMs with Preference Data

With the growing capabilities of large language models (LLMs), efficiently utilizing them becomes crucial. LLM routing emerges as a promising solution. It directs user queries to the most suitable LLM based on factors like complexity and domain. This approach aims to optimize response quality while minimizing costs. However, optimal routing presents a challenge: the router model needs to understand the query’s intent, complexity, and domain, along with the capabilities of candidate LLMs. Additionally, it should be economical, fast, and adaptable to new, improved models.
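
To make the core idea concrete, here is a toy router sketch; it uses a naive prompt-based classifier, not the learned router trained on preference data that the paper proposes, and the model names are placeholders:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def route_and_answer(query: str) -> str:
    # A cheap model first labels the query; the label decides which
    # model ultimately answers it.
    label = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Classify this query as SIMPLE or COMPLEX. Reply with one word.\n\n{query}",
        }],
    ).choices[0].message.content.strip().upper()
    model = "gpt-4o" if "COMPLEX" in label else "gpt-4o-mini"
    return client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": query}],
    ).choices[0].message.content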

Continue reading “RouteLLM Paper”

Practical Takeaways from “APIGen: Automated PIpeline for Generating Verifiable and Diverse Function-Calling Datasets” Paper

A recent paper by the Salesforce AI research team describes a method for generating function-calling datasets for Large Language Models (LLMs). Function calling enables LLMs to interact with external systems, like remote APIs, databases, or in-process code. This equips LLMs with tools to perform specific actions, such as retrieving weather information, booking reservations, or fetching stock data from APIs.

If you’re unfamiliar with function calling, refer to the OpenAI docs to learn more.
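
As a quick refresher, here is a minimal function-calling round trip with the OpenAI Python SDK; the get_weather tool and model name are illustrative:

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# One hypothetical tool, declared in OpenAI's function-calling schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Pune?"}],
    tools=tools,
)

# Instead of answering directly, the model returns the function name and
# JSON arguments; your code executes the call and feeds the result back.
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))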

This post explores practical takeaways for developers building LLM applications.

Continue reading “Practical Takeaways from “APIGen: Automated PIpeline for Generating Verifiable and Diverse Function-Calling Datasets” Paper”

Building a web page summarizer with llm utility

One of the useful LLM tools I’ve recently started using is the llm Python CLI by Simon Willison. It makes it easy to experiment with different LLMs from the command line and to build quick scripts by piping together multiple command-line utilities.

On macOS, you can install llm using brew:

brew install llm

In my daily work, I frequently use LLMs for summarization. Summarization can take many forms, and there’s no single best way to summarize a given text. To address this, I built a CLI using the llm tool that extracts text from a web URL and then summarizes it for me.

The core of the script is the following one-line command:

curl -s "https://r.jina.ai/$url" | llm -m "$model" -s "$prompt"

Continue reading “Building a web page summarizer with llm utility”

Extracting structured output with instructor LLM library

For the past year, I’ve been building applications using Large Language Models (LLMs). A common task I’ve used LLMs for is extracting structured information from unstructured text.

Imagine you’re an IT service provider like Accenture with a set of customer case studies. You want to extract specific information from each case study, such as the customer name, the industry, the business problem addressed, the technologies used in the solution, whether it is a cloud-native solution, whether it uses AI, and the project duration.

Take, for example, this case study from the Accenture website (https://www.accenture.com/in-en/case-studiesnew/cloud/axa-cloud). We want to extract the information into the Python class shown below.

from pydantic import BaseModel

class CaseStudy(BaseModel):
    client_name: str
    industry: str
    country: str
    business_problem: str
    technologies: list[str]
    is_cloud_native_solution: bool
    uses_ai: bool
    project_duration: int 

If you have been following LLMs, you know that most closed-source (and even some open-source) LLMs support structured outputs. For example, OpenAI has a JSON mode that always returns a valid JSON object.

Let’s use OpenAI JSON mode to extract the relevant information.
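
Here is a minimal sketch of that call (the model name and prompt wording are illustrative):

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

case_study_text = "..."  # the text of the case study page

response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},  # JSON mode
    messages=[
        {
            "role": "system",
            "content": (
                "Extract the case study details as a JSON object with keys: "
                "client_name, industry, country, business_problem, technologies, "
                "is_cloud_native_solution, uses_ai, project_duration."
            ),
        },
        {"role": "user", "content": case_study_text},
    ],
)

case_study = CaseStudy(**json.loads(response.choices[0].message.content))

JSON mode guarantees syntactically valid JSON, but not that the keys and types match our CaseStudy model; that gap is exactly what the instructor library addresses.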

Continue reading “Extracting structured output with instructor LLM library”