Building a web page summarizer with the llm utility


One of the useful LLM tools I’ve recently started using is the llm Python CLI by Simon Willison. It simplifies playing with different LLM models from the command line and allows you to build quick scripts by piping together multiple command-line utilities.

On macOS, you can install llm using brew:

brew install llm

In my daily work, I frequently use LLMs for summarization. Summarization can take many forms, and there’s no single best way to summarize a given text. To address this, I built a CLI using the llm tool that extracts text from a web URL and then summarizes it for me.

The core of the script is the following one-line command:

curl -s "https://r.jina.ai/$url" | llm -m "$model" -s "$prompt"
  • First, we use curl to silently fetch the URL. Instead of curling the URL directly, we pass it to the Jina Reader API, which converts any URL into clean Markdown text.
  • Next, we pipe that content to the llm utility. The page contents become the prompt, and our summarization prompt is passed as the system prompt via the -s flag.
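To make the shape of this pipeline concrete, here is a small offline sketch. The stub_llm function is a hypothetical stand-in for the real llm call, so nothing here hits the network; it only shows how the system prompt and the piped-in page text combine.

```shell
#!/bin/bash
# Offline sketch of the pipeline shape. stub_llm is a hypothetical
# stand-in for `llm -m "$model" -s "$prompt"`: it reads the page text
# from stdin and prints what would be sent to the model.
stub_llm() {
    local system_prompt="$1"
    local page_text
    page_text=$(cat)   # the web page text arrives on stdin, via the pipe
    printf 'SYSTEM: %s\nINPUT: %s\n' "$system_prompt" "$page_text"
}

# In the real script, `curl -s "https://r.jina.ai/$url"` produces this text.
echo "Example page contents" | stub_llm "Summarize the text."
```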

The default model I use is gpt-3.5-turbo.

The default summarization prompt is shown below, but I can override it by passing an alternative prompt on the command line.

# IDENTITY and PURPOSE
You are an expert content summarizer.
Take a deep breath and follow steps mentioned in STEPS section.

# STEPS
- Read the complete text carefully and deeply understand it
- You should start with an introductory paragraph giving user a high level understanding of the topic.
- You should then list all the key points in a bullet list.
- The length of the summary should be appropriate for the length and complexity of the original text, providing a clear and accurate overview without omitting any important information.
- Generate 5 follow-up questions as a bullet list in a section called FOLLOW UP QUESTIONS that a user can ask to explore the text in more detail. These questions should be thought-provoking and dig further into the original topic.

# INPUT
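In the full script, this multi-line prompt is captured into a shell variable with a heredoc. A minimal sketch of that pattern (the script uses an unquoted EOF; quoting it, as here, additionally prevents accidental variable expansion inside the prompt text):

```shell
#!/bin/bash
# Capture a multi-line prompt into a variable with a heredoc.
# Quoting 'EOF' keeps $variables inside the prompt text literal.
prompt=$(cat <<'EOF'
# IDENTITY and PURPOSE
You are an expert content summarizer.
EOF
)

echo "$prompt"
```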

When you execute the script for the URL https://shekhargulati.com/2024/04/28/why-you-should-consider-building-your-own-ai-assistants/, you get the following response in your terminal.

./summarizer.sh https://shekhargulati.com/2024/04/28/why-you-should-consider-building-your-own-ai-assistants/
## Summary:
Building your own AI assistant can offer significant benefits for organizations. Here are the key reasons outlined in the article:

- Custom AI assistants can provide tailored solutions for specific business units and individual needs, leading to increased productivity and breaking down knowledge silos within the organization.
- By building your own AI assistant, you have better control over answering user queries, enabling personalized responses and efficient retrieval of information from various data sources.
- Integrating with structured data sources and APIs allows for a unified user experience, streamlining workflows and enhancing productivity.
- Avoiding vendor lock-in enables organizations to leverage the evolving landscape of AI models, optimize costs, and expand choices in selecting models that best suit their needs.
- Building your own AI assistant allows for logical extensions beyond simple chat interfaces, addressing diverse challenges and enhancing customer experiences.

### Follow Up Questions:
1. What are some challenges organizations may face when customizing AI assistants for their specific needs?
2. How can integrating AI assistants with structured data sources and APIs benefit organizations?
3. How does building your own AI assistant help in avoiding vendor lock-in and staying abreast of evolving AI technologies?
4. Can you provide examples of how AI assistants can be extended beyond chat interfaces to tackle different organizational challenges?
5. What considerations should organizations keep in mind when deciding whether to build their own AI assistant or opt for pre-built solutions?

We can ask a follow-up question by passing our own prompt with the -p option.

./summarizer.sh https://shekhargulati.com/2024/04/28/why-you-should-consider-building-your-own-ai-assistants/ -p 'What are some challenges organizations may face when customizing AI assistants for their specific needs?'

To use a different model, pass the -m option. For example, to try OpenAI's GPT-4o model:

./summarizer.sh https://shekhargulati.com/2024/04/28/why-you-should-consider-building-your-own-ai-assistants/ -m 'gpt-4o'

We can also use open-source models, but first we need to install a plugin. I use Ollama to run models locally, so I need to install the llm-ollama plugin.

llm install llm-ollama

Now, when you run the llm models command, you will see the Ollama models installed on your machine listed alongside the OpenAI ones.

OpenAI Chat: gpt-3.5-turbo (aliases: 3.5, chatgpt)
OpenAI Chat: gpt-3.5-turbo-16k (aliases: chatgpt-16k, 3.5-16k)
OpenAI Chat: gpt-4 (aliases: 4, gpt4)
OpenAI Chat: gpt-4-32k (aliases: 4-32k)
OpenAI Chat: gpt-4-1106-preview
OpenAI Chat: gpt-4-0125-preview
OpenAI Chat: gpt-4-turbo-2024-04-09
OpenAI Chat: gpt-4-turbo (aliases: gpt-4-turbo-preview, 4-turbo, 4t)
OpenAI Chat: gpt-4o (aliases: 4o)
OpenAI Completion: gpt-3.5-turbo-instruct (aliases: 3.5-instruct, chatgpt-instruct)
Ollama: qwen2:latest (aliases: qwen2)
Ollama: llama3:latest (aliases: llama3)
Ollama: gemma2:9b
Ollama: knoopx/hermes-2-pro-mistral:7b-q8_0
Ollama: all-minilm:v2

Let’s give the qwen2 and llama3 models a try. Here we use the 7-billion-parameter qwen2 model and the 8-billion-parameter llama3 model.

./summarizer.sh https://shekhargulati.com/2024/04/28/why-you-should-consider-building-your-own-ai-assistants/ -m 'qwen2'

Below is the generated summary.

I appreciate that you're considering building your own AI assistant, as opposed to using pre-built solutions. It's true that custom development offers unique advantages like flexibility, tailored functionalities, and better integration with existing systems. Here are some additional points to consider when thinking about building an AI assistant:

### 1. **Customization and Adaptability**
Building from scratch allows you to tailor the AI assistant precisely to your needs. You can integrate specific APIs or data sources that are crucial for your business operations, such as customer databases, internal workflows, or industry-specific information.

### 2. **Scalability**
With a custom AI assistant, you have the flexibility to scale up or down as your business evolves. This means adding new features, improving existing ones, or adjusting resource allocation based on your changing requirements without being constrained by pre-defined limitations of third-party solutions.

### 3. **Cost and Value**
While developing an AI assistant might require upfront investment in terms of time and resources, it can lead to long-term cost savings. Custom development allows you to optimize the system for your unique scenarios, potentially reducing operational costs related to training data acquisition or maintenance compared to using general-purpose models.

### 4. **Intellectual Property**
Building your own AI assistant means having full control over its intellectual property and algorithms. This can be particularly important if proprietary processes are central to your business operations.

### 5. **Performance Optimization**
In a custom-built solution, you can fine-tune the model's performance based on specific use cases, data quality, or the type of interactions expected in your organization. This might involve optimizing for speed, accuracy, or resource utilization depending on your priorities.

### 6. **Security and Compliance**
With direct control over development, you can ensure that the AI assistant adheres to strict security standards and complies with relevant regulations, which is particularly crucial in industries like finance, healthcare, and government where data privacy is paramount.

### 7. **Innovation**
Building your own AI assistant encourages innovation within your organization. As you iterate on the system, it can foster a culture of continuous improvement and experimentation that might not be as readily available with off-the-shelf solutions.

### Conclusion
Building an AI assistant from scratch offers significant advantages in terms of customization, scalability, intellectual property control, performance optimization, security compliance, and innovation. However, this path requires careful planning and investment in time, resources, and expertise. It's important to assess your organization's specific needs, available resources, and technical capabilities before deciding on a development approach.

If you're considering this route, it might also be beneficial to consult with AI experts or technology partners who can help guide the development process, provide insights into best practices, and assist in navigating any potential roadblocks.

The quality of the generated summary is not great. The model did not follow the instructions completely, and it did not generate the follow-up questions.

Let’s now give the llama3 model a try.

./summarizer.sh https://shekhargulati.com/2024/04/28/why-you-should-consider-building-your-own-ai-assistants/ -m 'llama3'

Below is the generated summary.

Here's a summary of the article:

The author argues that building one's own AI assistant is a better approach than relying on pre-built solutions. They highlight four reasons for this:

1. **Customization**: By building your own AI assistant, you can tailor its functionality to meet your specific needs and requirements.
2. **Data curation**: Fine-tuning smaller LLMs requires question-and-answer pairs (QA pairs) as training data. Building your own AI assistant gives you the flexibility to collect this data and fine-tune models specifically for your organization's use cases.
3. **Flexibility and evolution**: The landscape of AI assistants is rapidly changing, making it essential to have a platform that can adapt and evolve alongside these changes.
4. **Logical extensions to chat**: Building an AI assistant opens up a world of possibilities beyond simple conversations, including search reimagined with generative AI, transforming customer service, code exploration and analysis, and conversational banking.

The author also shares examples of how clients have leveraged AI assistants for various use cases, such as:

* Klarna's AI assistant handled two-thirds of customer service chats, saving costs and boosting customer satisfaction.
* A client used the AI assistant to explore how generative AI could help with rapid prototyping.
* Another client created a natural language interface for their transaction data.

The author concludes that building your own AI assistant is not just about having a chat interface but about creating a foundational platform that can be extended to tackle diverse challenges and broaden its reach within an organization.

The generated summary looks fine, but the llama3 model also did not generate the follow-up questions.

Below is the complete summarizer.sh file.

#!/bin/bash

# Function to check if the input is a valid URL
is_valid_url() {
    if [[ $1 =~ ^https?://[a-zA-Z0-9./?=_-]+$ ]]; then
        return 0
    else
        return 1
    fi
}

# Check if a URL is provided as an argument
if [ -z "$1" ]; then
    echo "Usage: $0 <URL> [-p <prompt>] [-m <model>]"
    exit 1
fi

url="$1"

if is_valid_url "$url"; then
    echo "Valid URL: $url"
else
    echo "Invalid URL: $url"
    exit 1
fi

# Default prompt
prompt=$(cat <<EOF
# IDENTITY and PURPOSE
You are an expert content summarizer.
Take a deep breath and follow steps mentioned in STEPS section.

# STEPS
- Read the complete text carefully and deeply understand it
- You should start with an introductory paragraph giving user a high level understanding of the topic.
- You should then list all the key points in a bullet list.
- The length of the summary should be appropriate for the length and complexity of the original text, providing a clear and accurate overview without omitting any important information.
- Generate 5 follow-up questions as a bullet list in a section called FOLLOW UP QUESTIONS that a user can ask to explore the text in more detail. These questions should be thought-provoking and dig further into the original topic.

# INPUT
EOF
)

# Default model
model="gpt-3.5-turbo"

# Parse command-line arguments
while [[ $# -gt 0 ]]; do
    case $1 in
        -p)
            prompt="$2"
            shift # past argument
            shift # past value
            ;;
        -m)
            model="$2"
            shift # past argument
            shift # past value
            ;;
        *)
            shift # past argument
            ;;
    esac
done

# Check if llm command exists
if ! command -v llm &> /dev/null; then
    echo "Error: 'llm' command not found. Please install it first."
    exit 1
fi

# Make API call, parse and summarize the discussion
response=$(curl -s "https://r.jina.ai/$url")

if [ $? -ne 0 ]; then
    echo "Error: Failed to fetch the URL."
    exit 1
fi

echo "$response" | llm -m "$model" -s "$prompt"
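One caveat worth noting: the is_valid_url regex only accepts the characters a-z, A-Z, 0-9, ., /, ?, =, _, and -, so URLs containing a port number, &, %, or # are rejected. A quick way to check this against your own URLs is to extract the function and probe it:

```shell
#!/bin/bash
# The same validation regex as in summarizer.sh, extracted so its
# behavior can be probed in isolation.
is_valid_url() {
    [[ $1 =~ ^https?://[a-zA-Z0-9./?=_-]+$ ]]
}

is_valid_url "https://example.com/post?id=1" && echo "accepted"
is_valid_url "https://example.com/post?id=1&page=2" || echo "rejected: & not in character class"
is_valid_url "http://localhost:8080/page" || echo "rejected: colon (port) not in character class"
```

If you need to summarize URLs with query strings or ports, widening the character class (or dropping the check entirely and letting curl report failures) is a reasonable tweak.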
