Making sense of screenshots with CLIP model embeddings

Today I was reading Chapter 9, “Multimodal Large Language Models”, of the Hands-On Large Language Models book and thought of applying it to a problem I face occasionally. The chapter covers the CLIP model and how you can use it to embed both text and images in the same vector space.

Like most normal humans, I take a lot of screenshots, and if I don’t categorize them at the time I take them, there’s a lot of manual effort required to find them when I need them. So, I decided to build a quick semantic search over them using the llm utility.
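
The post builds this with the llm CLI and its CLIP plugin. As a rough illustration of the underlying idea, here is a minimal Python sketch using the sentence-transformers CLIP checkpoint; the folder name and query are illustrative, not from the post:

```python
# Embed screenshots and a text query into the same CLIP vector space,
# then rank screenshots by similarity to the query.
from pathlib import Path
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")  # embeds both text and images

# Embed every screenshot in a folder (path is hypothetical).
paths = sorted(Path("screenshots").glob("*.png"))
image_embeddings = model.encode([Image.open(p) for p in paths])

# Embed a natural-language query into the same space.
query_embedding = model.encode("a terminal window showing a stack trace")

# Rank screenshots by cosine similarity to the query.
scores = util.cos_sim(query_embedding, image_embeddings)[0].tolist()
for score, path in sorted(zip(scores, paths), key=lambda t: t[0], reverse=True)[:5]:
    print(f"{score:.3f}  {path}")
```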

Continue reading “Making sense of screenshots with CLIP model embeddings”

Oreilly Answers: A case study of poorly designed LLM powered RAG system

I enjoy reading books on the O’Reilly learning platform https://learning.oreilly.com/. For the past month, a new feature on the platform called “Answers” has been staring me down, and I haven’t been tempted to click it. Maybe it’s LLM fatigue, or maybe something else, but I just didn’t give it a try. I do use LLM tools daily, but most of them are tools I have designed for myself around my own workflows.

Today, I decided to give it a try. If you go to a book page, like the one I am currently reading https://learning.oreilly.com/library/view/hands-on-large-language/9781098150952/, you will see the Answers icon in the right sidebar.

When you click on Answers, it shows a standard chat input box with suggestions. We have all seen these a million times by now.

It looks like a standard Retrieval Augmented Generation (RAG) use case. When you ask a question, it searches its knowledge base (some sort of vector/hybrid search) and then generates the answer.
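
In sketch form, that retrieve-then-generate loop looks roughly like this. This is a toy stand-in, not O’Reilly’s actual implementation; the keyword retriever below substitutes for a real vector/hybrid index:

```python
from openai import OpenAI

client = OpenAI()

# Stand-in corpus; a real system indexes the full book catalog.
CORPUS = [
    "CLIP embeds text and images into a shared vector space.",
    "BERTopic clusters documents and labels each cluster with a topic.",
]

def retrieve(question: str, top_k: int = 2) -> list[str]:
    # Toy keyword-overlap scoring in place of vector/hybrid search.
    q = set(question.lower().split())
    return sorted(CORPUS, key=lambda d: -len(q & set(d.lower().split())))[:top_k]

def answer(question: str) -> str:
    context = "\n\n".join(retrieve(question))
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(answer("What does CLIP do?"))
```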

Continue reading “Oreilly Answers: A case study of poorly designed LLM powered RAG system”

Generating architecture.md with code2prompt and OpenAI gpt-4o-mini model

New contributors often struggle to grasp a project’s intricacies without well-defined architecture documentation. This post shows how we can leverage the power of Large Language Models (LLMs) to automate the generation of architecture documentation (architecture.md) directly from a project’s codebase. I first learnt about architecture.md from a post published in 2021. Many popular open source projects like Caddy have an architecture.md in their source code.

We call our script as shown below.

./architecturemd-generator.sh https://github.com/frdel/agent-zero.git

and it generates architecture.md, as shown in the screenshot below. You can look at the complete generated architecture.md file in the GitHub repository.
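
For a sense of what the script does internally, here is a rough Python equivalent: clone the repo, pack the codebase into a single prompt with code2prompt, then ask gpt-4o-mini to write architecture.md. The code2prompt output flag is an assumption and may differ across versions:

```python
import subprocess
import sys
from openai import OpenAI

repo_url = sys.argv[1]  # e.g. https://github.com/frdel/agent-zero.git
subprocess.run(["git", "clone", "--depth", "1", repo_url, "repo"], check=True)
# Flag name may vary by code2prompt version; check its --help output.
subprocess.run(["code2prompt", "repo", "--output", "codebase.txt"], check=True)

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You write architecture.md documents."},
        {"role": "user", "content": "Describe the high-level architecture of "
         "this codebase as architecture.md:\n\n" + open("codebase.txt").read()},
    ],
)
open("architecture.md", "w").write(response.choices[0].message.content)
```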

Continue reading “Generating architecture.md with code2prompt and OpenAI gpt-4o-mini model”

Building a YouTube Video Summarizer with llm and yt-dlp

In this blog post, we’ll create a handy utility that summarizes YouTube videos using the power of large language models (LLMs) and the versatility of Python’s yt-dlp tool. It leverages the summarizing capabilities of llm to extract key points and insights from YouTube subtitles, making it easier to grasp the video’s content without having to watch the entire thing.

Setting the Stage

Before we dive in, let’s ensure you have the necessary tools:

  1. llm: This command-line interface allows us to interact with large language models. Follow the installation instructions on the llm project’s website https://llm.datasette.io/en/stable/index.html.
  2. yt-dlp: This versatile tool helps download various formats from YouTube, including subtitles. Install it using pip install yt-dlp.
  3. Set the OPENAI_API_KEY environment variable. This utility defaults to the OpenAI API and the gpt-4o-mini model.
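
With these in place, here is a minimal sketch of the pipeline: yt-dlp fetches the subtitles and gpt-4o-mini summarizes the transcript. The actual utility shells out to llm; the URL, file names, and prompt here are illustrative:

```python
import glob
import yt_dlp
from openai import OpenAI

url = "https://www.youtube.com/watch?v=VIDEO_ID"  # placeholder

# Download English subtitles only, no video.
opts = {
    "skip_download": True,
    "writesubtitles": True,
    "writeautomaticsub": True,   # fall back to auto-generated captions
    "subtitleslangs": ["en"],
    "outtmpl": "video",
}
with yt_dlp.YoutubeDL(opts) as ydl:
    ydl.download([url])

# yt-dlp writes something like video.en.vtt next to the script.
subtitles = open(glob.glob("video*.vtt")[0]).read()

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": "Summarize the key points of this video transcript:\n\n"
                          + subtitles}],
)
print(response.choices[0].message.content)
```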

GitHub Repo

You can get the complete source code here https://github.com/shekhargulati/llm-tools.

Continue reading “Building a YouTube Video Summarizer with llm and yt-dlp”

Leveraging BERTopic to Understand AI Assistant Usage Patterns

I have been building and operating a ChatGPT-like enterprise AI assistant for a year. We log all user queries in a database for future analysis and for building personalized features. Usage has grown over time, and it is becoming difficult for our small team of four to rely on manual quality-analysis methods like eyeballing and vibe checks to understand system accuracy and usage patterns.

In this quick post I will cover how we can use BERTopic and the OpenAI gpt-4o-mini model to cluster user queries into labelled groups. We will run this analysis on the Chatbot Arena dataset.

This dataset contains 33K cleaned conversations with pairwise human preferences. It is collected from 13K unique IP addresses on the Chatbot Arena from April to June 2023. Each sample includes a question ID, two model names, their full conversation text in OpenAI API JSON format, the user vote, the anonymized user ID, the detected language tag, the OpenAI moderation API tag, the additional toxic tag, and the timestamp.

BERTopic is an open-source project that offers a novel approach to topic modelling. Topic modelling is an unsupervised, exploratory approach to making sense of a bunch of documents.

By leveraging the power of BERT, a state-of-the-art language model, and c-TF-IDF (a variation of the traditional TF-IDF (Term Frequency-Inverse Document Frequency) algorithm designed to work with multiple classes or clusters of documents), BERTopic helps uncover hidden thematic structures within your text data. This approach assumes that documents grouped by semantic similarity can effectively represent a topic, where each cluster reflects a major theme and the combined clusters paint a broader picture.
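
Here is a minimal sketch of the approach: cluster the queries with BERTopic and let gpt-4o-mini generate a readable label for each cluster. The sample queries stand in for the Chatbot Arena questions used in the post:

```python
import openai
from bertopic import BERTopic
from bertopic.representation import OpenAI as OpenAIRepresentation

queries = [
    "How do I reverse a linked list in Python?",
    "Write a poem about autumn leaves.",
    "Explain the difference between TCP and UDP.",
    # ... thousands of logged user queries in practice; BERTopic needs
    # a reasonably large corpus to form meaningful clusters.
]

# gpt-4o-mini turns each cluster's keywords and examples into a label.
client = openai.OpenAI()
representation_model = OpenAIRepresentation(client, model="gpt-4o-mini", chat=True)

topic_model = BERTopic(representation_model=representation_model)
topics, probs = topic_model.fit_transform(queries)
print(topic_model.get_topic_info())  # one labelled row per discovered topic
```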

Continue reading “Leveraging BERTopic to Understand AI Assistant Usage Patterns”

From Screenshots to Markdown Tables with LLMs

One of the tasks I frequently use ChatGPT-like tools for is extracting markdown text from images. I enjoy watching conference videos on YouTube, and these videos often contain slides that I want to keep for future reference. To do this, I take screenshots and add them to my notebook. However, if I forget to add any textual comments with the screenshots, searching for them later becomes difficult. There are also times when I need to extract text in markdown format from the screenshots for future use.

Let’s look at an example screenshot that I took yesterday from a talk by an OpenAI engineer on fine-tuning.
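
The extraction step itself is a single vision call. Here is a small sketch; the file name and prompt wording are illustrative:

```python
import base64
from openai import OpenAI

client = OpenAI()

# Encode the screenshot so it can be sent inline with the request.
image_b64 = base64.b64encode(open("slide.png", "rb").read()).decode()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Extract the text from this slide as a markdown table."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```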

Continue reading “From Screenshots to Markdown Tables with LLMs”

Query Rewriting in RAG Applications

Creating an AI assistant that generates helpful answers from a knowledge base is a complex problem. A significant hurdle is the frequent mismatch between how users ask questions and how information is structured within the data. Most people struggle to ask good questions, which often results in irrelevant or incomplete answers and frustrated users.

As builders of these systems, we should not expect users to write well-crafted queries. In our application, we have implemented query rewriting to rephrase user queries so that they better align with the underlying data. This has dramatically improved the accuracy and helpfulness of our AI assistant’s responses.

In this post, I will share details on how we implemented query rewriting in our application, and we will end by looking at how popular open-source systems do query rewriting.
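
To give a flavor of the technique, here is a minimal sketch of a rewrite step that runs before retrieval. The prompt wording is illustrative, not our production prompt:

```python
from openai import OpenAI

client = OpenAI()

def rewrite_query(user_query: str) -> str:
    # A cheap model turns a vague question into a retrieval-friendly one.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Rewrite the user's question as a clear, specific, "
                        "self-contained search query. Return only the query."},
            {"role": "user", "content": user_query},
        ],
    )
    return response.choices[0].message.content.strip()

# The rewritten query, not the raw one, is what hits the vector index.
print(rewrite_query("it errors when i do the upload thing, why"))
```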

You can listen to this post in podcast format here – https://notebooklm.google.com/notebook/ed6e648e-c95c-4ad8-88a2-767be02c7c4d/audio

Continue reading “Query Rewriting in RAG Applications”

A simple optimization that reduced output tokens by 30% in our LLM-based RAG solution

I’ve been running a chat assistant application built on OpenAI for the past year. My biggest learning has come from analyzing our AI assistant’s responses and finding ways to optimize them, both for cost and quality. Like all RAG applications, we add source URLs to all chunks and instruct the LLM to include citations referencing the source link. Here’s a snippet of our answer generation prompt:

For each document indicate which sources most support it via valid citation markers at the end of sentence in the markdown format. Add a link to the source using markdown format. Also, include page number with the source.

Our analysis revealed that over 60% of our answers contain more than five source links, with listing questions exceeding ten links. These links inflate both input and output tokens.
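
One way to attack this is to pass chunks with short numeric IDs, have the model cite “[1]” instead of full URLs, and expand the markers back into markdown links after generation. Here is a sketch of that idea; the details are illustrative, not necessarily our exact implementation:

```python
import re

chunks = [  # retrieved chunks with their source URLs (example data)
    {"text": "LLM routing sends queries to the cheapest capable model.",
     "url": "https://example.com/routing", "page": 3},
    {"text": "Query rewriting aligns user questions with the index.",
     "url": "https://example.com/rewriting", "page": 7},
]

# Prompt context carries only "[1]", "[2]", ... instead of long URLs.
context = "\n".join(f"[{i + 1}] {c['text']}" for i, c in enumerate(chunks))

# Illustrative model output: short markers cost far fewer tokens than URLs.
answer = "Routing cuts cost [1]. Rewriting improves recall [2]."

# Post-process: expand each short marker into a full markdown citation.
def expand(match: re.Match) -> str:
    chunk = chunks[int(match.group(1)) - 1]
    return f"[source, p. {chunk['page']}]({chunk['url']})"

print(re.sub(r"\[(\d+)\]", expand, answer))
```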

Continue reading “A simple optimization that reduced output tokens by 30% in our LLM-based RAG solution”

RouteLLM Paper

Paper Link : https://arxiv.org/pdf/2406.18665v2

Paper Title: RouteLLM: Learning to Route LLMs with Preference Data

With the growing capabilities of large language models (LLMs), efficiently utilizing them becomes crucial. LLM routing emerges as a promising solution. It directs user queries to the most suitable LLM based on factors like complexity and domain. This approach aims to optimize response quality while minimizing costs. However, optimal routing presents a challenge: the router model needs to understand the query’s intent, complexity, and domain, along with the capabilities of candidate LLMs. Additionally, it should be economical, fast, and adaptable to new, improved models.
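
The core idea, in sketch form: a router scores each query and dispatches it to a strong or a weak model. The difficulty heuristic below is a toy stand-in for RouteLLM’s learned routers, which are trained on human preference data:

```python
from openai import OpenAI

client = OpenAI()
STRONG, WEAK = "gpt-4o", "gpt-4o-mini"  # example model pair

def route(query: str, threshold: float = 0.5) -> str:
    # Toy difficulty score; RouteLLM learns this from preference data.
    difficulty = min(len(query.split()) / 50, 1.0)
    return STRONG if difficulty >= threshold else WEAK

def answer(query: str) -> str:
    model = route(query)
    response = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": query}]
    )
    return f"[{model}] {response.choices[0].message.content}"

print(answer("What is 2 + 2?"))  # short query routes to the cheap model
```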

Continue reading “RouteLLM Paper”

Practical Takeaways from “APIGen: Automated PIpeline for Generating Verifiable and Diverse Function-Calling Datasets” Paper

A recent paper by the Salesforce AI research team describes a method for generating function-calling datasets for Large Language Models (LLMs). Function calling enables LLMs to interact with external systems, like remote APIs, databases, or in-process code. This equips LLMs with tools to perform specific actions, such as retrieving weather information, booking reservations, or fetching stock data from APIs.

If you’re unfamiliar with function calling, refer to the OpenAI docs to learn more.
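
For a flavor of what a function-calling request looks like, here is a minimal example against the OpenAI API; the get_weather function is the stock illustration, not something from the paper:

```python
from openai import OpenAI

client = OpenAI()

# Declare a tool the model is allowed to call, with a JSON Schema
# describing its parameters.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Pune?"}],
    tools=tools,
)
# Instead of prose, the model returns a structured call such as
# get_weather(arguments='{"city": "Pune"}') for your code to execute.
print(response.choices[0].message.tool_calls)
```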

This post explores practical takeaways for developers building LLM applications.

Continue reading “Practical Takeaways from “APIGen: Automated PIpeline for Generating Verifiable and Diverse Function-Calling Datasets” Paper”