Building a YouTube Video Summarizer with llm and yt-dlp


In this blog post, we’ll build a small utility that summarizes YouTube videos using the llm command-line tool and yt-dlp. It downloads a video’s subtitles with yt-dlp and asks a large language model (LLM) to extract the key points and insights, so you can grasp the video’s content without watching the entire thing.

Setting the Stage

Before we dive in, let’s ensure you have the necessary tools:

  1. llm: This command-line interface allows us to interact with large language models. Follow the installation instructions on the llm project’s website https://llm.datasette.io/en/stable/index.html.
  2. yt-dlp: This versatile tool helps download various formats from YouTube, including subtitles. Install it using pip install yt-dlp.
  3. OPENAI_API_KEY: Set this environment variable to your OpenAI API key. The utility defaults to the OpenAI API and the gpt-4o-mini model.
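
Before going further, it can help to confirm that these prerequisites are in place. The following is a minimal sketch and not part of the original script: the helper name check_prereqs is hypothetical, and it assumes only what the list above states, namely that llm and yt-dlp should be on the PATH and that the key is read from OPENAI_API_KEY.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script).
# Prints one line per missing prerequisite and returns non-zero if any is missing.
check_prereqs() {
  local rc=0
  local tool
  # Every argument is treated as a command that must exist on the PATH.
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool"
      rc=1
    fi
  done
  # The utility defaults to the OpenAI API, so the key must be set.
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set"
    rc=1
  fi
  return "$rc"
}

# Example call: check_prereqs llm yt-dlp
```

If the call prints nothing and returns zero, both tools and the key were found.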

GitHub Repo

The complete source code is available at https://github.com/shekhargulati/llm-tools.

yt-summarizer.sh

Create a new file named yt-summarizer.sh. The complete script is available in the GitHub repo linked above.

#!/bin/bash

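# The YouTube URL is the first argument
video_url="$1"

# Unique name so repeated runs do not overwrite each other's subtitles
uuid=$(uuidgen)

# Construct the file path
file_path="./subtitles/${uuid}.txt"

# Download auto-generated subtitles as SRT without downloading the video
yt-dlp --quiet --write-auto-sub --convert-subs=srt --skip-download "$video_url" --output "$file_path"

# The English subtitle file is expected at the output name plus .en.srt
srt_file_path="${file_path}.en.srt"

subtitles_content=$(cat "$srt_file_path")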

prompt=$(cat <<EOF
# IDENTITY and PURPOSE
You are an expert at summarizing and extracting key insights from Youtube videos. You will be given Youtube subtitles in the SRT (SubRip Subtitle) format. 
Take a deep breath and think step by step about how to best accomplish this goal using the following steps.

# STEPS
- Read the complete youtube subtitle text carefully and deeply understand it
- You should start with an introductory paragraph giving the user a high-level understanding of the topic.
- Extract key points and insights from the input text and group them into logical groups.
    - You can create 10-20 logical groups
- For each group come up with a logical group name.
    - Group name length should be 10-20 words long
    - Use the group name as the heading
- For each logical group, extract 5-10 key points
    - Each key point should be detailed and up to 100 words
    - With each key point also mention the timestamp

# OUTPUT

1. Do not create groups with name Group Name 1, Group Name 2

# INPUT
EOF
)

model="gpt-4o-mini"

echo "Summary of the Youtube Video: $video_url"

echo "$subtitles_content" | llm -m "$model" -s "$prompt" -o temperature 0.2 -o max_tokens 2000 

The Script Breakdown

Our script takes a YouTube video URL as input and delivers a concise summary of its key points. Let’s break down the steps:

  1. Checking Dependencies: The script verifies that llm is installed and that a URL was passed as an argument. A missing dependency or argument produces a helpful error message.
  2. Downloading Subtitles: Using yt-dlp, the script downloads the auto-generated subtitles in SRT (SubRip Subtitle) format without downloading the video itself.
  3. Prompt Structure: A well-defined prompt instructs the model on how to process the subtitles. It guides the model to:
    • Read the subtitles thoroughly
    • Create an introductory paragraph summarizing the video’s topic
    • Extract key points and group them logically (10-20 groups)
    • Generate a descriptive group name for each group (10-20 words)
    • Summarize each key point in detail (5-10 points per group, up to 100 words each) and include timestamps
  4. Summary Generation: The script feeds the downloaded subtitles and the crafted prompt to the llm model. The model, powered by gpt-4o-mini (or your chosen model), generates a concise summary adhering to the specified format.
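
The listing shown earlier does not include the argument check from step 1, so here is a minimal sketch of how steps 1 and 2 could be expressed. The function names require_url and srt_path_for are hypothetical and do not come from the repository; the .en.srt suffix is taken from the srt_file_path line in the listing.

```shell
#!/bin/bash
# Hypothetical helpers sketching steps 1 and 2; names are illustrative only.

# Step 1: fail with a usage message when no URL argument is given.
require_url() {
  if [ -z "${1:-}" ]; then
    echo "Usage: yt-summarizer.sh <youtube-url>" >&2
    return 1
  fi
}

# Step 2: the listing passes a file path to yt-dlp as the output name and
# then reads that path plus ".en.srt", so the SRT path is derived the same way.
srt_path_for() {
  printf '%s.en.srt\n' "$1"
}

# Example wiring inside the script:
#   require_url "$1" || exit 1
#   srt_file_path=$(srt_path_for "$file_path")
```

Keeping these as small functions makes each step easy to try on its own before wiring it into the full script.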

Running the Script

  1. Save the script as yt-summarizer.sh.
  2. Create the subtitles directory the script writes to, then make the script executable: mkdir -p subtitles && chmod +x yt-summarizer.sh
  3. Run the script with a YouTube video URL as an argument: ./yt-summarizer.sh https://www.youtube.com/watch?v=example_video_id

This will download the subtitles, craft the prompt, and utilize the llm model to provide a bulleted summary of the video, highlighting key points with timestamps.

Let’s see the tool in action. We will summarize the conversation between Jensen Huang and Mark Zuckerberg at the SIGGRAPH conference.

./yt-summarizer.sh https://www.youtube.com/watch\?v\=w-cmMcMZoZ4

It produces the following output:

In this engaging discussion, Mark Zuckerberg and Jensen Huang delve into the transformative impact of artificial intelligence (AI) and its integration into various industries, particularly focusing on the advancements in generative AI and its applications. They explore the evolution of AI technologies, the importance of open-source initiatives, and the future of computing, including the development of smart glasses and mixed reality devices. The conversation highlights the potential of AI to revolutionize not only technology but also everyday interactions and business operations.

## The Impact of AI on Society
- The advent of generative AI is reshaping industries and consumer experiences at an unprecedented pace (00:51:00).
- AI is becoming integral in various fields, including healthcare, climate tech, and biotech, facilitating advancements that were previously unimaginable (00:51:30).
- The ability to create personalized AI models allows businesses to embed their institutional knowledge, enhancing operational efficiency (00:33:00).

## Open Source Philosophy
- Zuckerberg emphasizes the significance of open-source projects like Llama, which enable a collaborative ecosystem for AI development (00:24:00).
- Open-source initiatives allow companies to build upon existing technologies, fostering innovation and reducing costs (00:24:30).
- The open-source strategy is viewed as a beneficial business model that encourages widespread adoption and adaptation of AI technologies (00:32:00).

## Advancements in Mixed Reality
- The discussion includes the development of smart glasses and mixed reality headsets, which are expected to become mainstream consumer products (00:44:00).
- These devices aim to blend digital and physical experiences, allowing for more immersive interactions (00:44:30).
- The evolution of display technology is crucial for the success of these devices, with ongoing efforts to make them more compact and user-friendly (00:45:00).

## AI in Business Operations
- Companies are increasingly adopting AI to enhance customer interactions, streamline operations, and improve service delivery (00:37:00).
- The integration of AI into business processes is seen as a way to amplify the capabilities of employees, making them more efficient (00:36:30).
- AI tools are being developed to assist in various tasks, from customer support to content creation, allowing businesses to scale their operations effectively (00:36:50).

## Future of AI and Computing
- The conversation touches on the potential for AI to evolve into a more generalized assistant capable of handling diverse tasks across different domains (00:52:00).
- There is optimism about the future of AI, with expectations that it will continue to improve and integrate into everyday life (00:52:30).
- The importance of developing AI responsibly, with a focus on ethics and safety, is highlighted as a critical aspect of future advancements (00:52:50).

## Personalization and User Experience
- The ability to create personalized AI experiences is becoming increasingly important, allowing users to interact with AI in a way that reflects their preferences and needs (00:52:10).
- AI is expected to facilitate more natural and intuitive interactions, enhancing user engagement and satisfaction (00:52:40).
- The future of AI includes the potential for users to train their own models, tailoring them to specific tasks or styles (00:52:20).

## Collaboration and Innovation
- The discussion emphasizes the collaborative nature of AI development, with various stakeholders contributing to the ecosystem (00:25:00).
- Innovation in AI is driven by the collective efforts of researchers, developers, and businesses working together to push the boundaries of technology (00:25:30).
- The importance of fostering a diverse range of AI applications is underscored, as it leads to richer and more effective solutions (00:25:50).

## The Role of Generative AI
- Generative AI is transforming how content is created, enabling users to generate high-quality outputs based on minimal input (00:51:10).
- The technology is being applied in creative fields, allowing artists and creators to enhance their work and reach new audiences (00:51:40).
- The potential for generative AI to revolutionize industries is vast, with applications ranging from entertainment to education (00:51:20).

## Challenges and Opportunities
- The conversation acknowledges the challenges of integrating AI into existing systems and the need for careful planning and execution (00:37:30).
- Despite these challenges, the opportunities presented by AI are immense, with the potential to drive significant advancements across various sectors (00:37:50).
- The importance of continuous learning and adaptation in the face of rapid technological change is emphasized (00:38:10).

## Vision for the Future
- Zuckerberg and Huang express a shared vision for a future where AI is seamlessly integrated into daily life, enhancing productivity and creativity (00:52:30).
- The discussion highlights the need for ongoing innovation and the development of new technologies to support this vision (00:52:50).
- There is a strong belief that the next generation of computing will be defined by open ecosystems that foster collaboration and creativity (00:52:10).

Let me end the summary with a quote: "The future is not something we enter. The future is something we create."

Note: This script provides a basic framework. You can customize it further by:

  • Implementing error handling for subtitle downloads
  • Allowing users to specify the desired LLM model
  • Expanding the output format to include more information

Feel free to experiment and tailor it to your specific needs!
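
As a starting point, two of the customizations listed above could look like the sketch below. It is not taken from the repository: the function names pick_model and require_subtitles and the LLM_MODEL environment variable are made up for illustration, and gpt-4o-mini is kept as the default because that is what the script uses.

```shell
#!/bin/bash
# Hypothetical helpers sketching two customizations; not from the repo.

# Choose the model: an explicit argument wins, then the LLM_MODEL environment
# variable (a name invented for this sketch), then the script's default.
pick_model() {
  printf '%s\n' "${1:-${LLM_MODEL:-gpt-4o-mini}}"
}

# Fail early when the subtitle file is missing or empty, for example when
# the video has no English auto-generated captions.
require_subtitles() {
  if [ ! -s "$1" ]; then
    echo "No subtitles found at $1" >&2
    return 1
  fi
}

# Example wiring inside the script:
#   model=$(pick_model "$2")
#   require_subtitles "$srt_file_path" || exit 1
```

With this in place, a second command-line argument or the environment variable would override the hard-coded model, and a failed subtitle download would stop the script before anything is sent to the model.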

I am building a course on how to build production apps using LLMs. We will cover topics like prompt engineering, RAG, search, testing and evals, fine tuning, feedback analysis, and agents. You can register now and get 50% discount. Register using form – https://forms.gle/twuVNs9SeHzMt8q68

