How We Used Claude to Implement Text Synchronization Feature of Videocrawl

At Videocrawl (https://www.videocrawl.dev/), we’ve been exploring how AI assistants can enhance our development process. Recently, we built a text synchronization feature for our video player using Claude as our AI pair programmer. The feature highlights transcript text as a video plays, but the journey to get there revealed both the strengths and the limitations of AI-assisted development.

The Initial Approach

We presented Claude with our requirements: synchronize transcript text with video playback, highlight the current text, and auto-scroll to keep it visible. Claude quickly generated a wireframe showing how the feature would look and proposed an initial implementation.

The first solution used custom HTML spans with direct styling to highlight words. While technically sound, this approach had a critical flaw: it broke our existing markdown rendering system. The highlighting was being applied at the DOM level after markdown processing, causing formatting inconsistencies.

As the developer, I had to intervene: “This breaks our markdown formatting. Can we use markdown bold tags instead of custom styling?”

Claude immediately pivoted to a new approach using markdown bold syntax (**word**), which preserved our existing formatting system. This was our first insight: AI needs guidance on system context that isn’t obvious from the code alone.
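To make the pivot concrete, here is a minimal sketch of the markdown-based approach, assuming transcript segments carry start/end timestamps. The segment shape and the `renderTranscript` name are illustrative, not Videocrawl’s actual code:

```javascript
// Illustrative segment shape: { start, end, text } with times in seconds.
// On each `timeupdate` event, re-render the transcript as markdown, wrapping
// the active segment in ** ** so the existing markdown pipeline does the
// highlighting instead of post-hoc DOM styling.
function renderTranscript(segments, currentTime) {
  return segments
    .map((s) =>
      currentTime >= s.start && currentTime < s.end ? `**${s.text}**` : s.text
    )
    .join(" ");
}
```

Because the output is plain markdown, it flows through the same renderer as the rest of the transcript, which is what resolved the formatting inconsistencies.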

Continue reading “How We Used Claude to Implement Text Synchronization Feature of Videocrawl”

Giving Summary Generation Some Agency

One of the most common use cases for LLMs is summary generation. I have worked on multiple systems where we summarized different kinds of documents: Word documents, PDFs, plain text, web pages, call transcripts, and video transcripts. I am building Videocrawl, where we generate summaries of video content. In almost all of the summarization use cases I have implemented, we used a static summary prompt that instructs the LLM to generate a summary in a specific format. In my recent work, I have been playing with the idea of giving the summarizer some agency so that it can generate dynamic summarization prompts. In this short post, I will share my approach.

Let’s make it concrete. Assume we want to summarize the Search-R1 paper, which covers how to train LLMs to reason and leverage search engines using reinforcement learning.
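One way to make “giving the summarizer agency” concrete is a two-step flow: a first model call drafts a summarization prompt tailored to the document, and a second call runs it. For the Search-R1 paper, the planner might propose sections on the training setup and the retrieval loop rather than a generic abstract-style summary. This is a hypothetical sketch of the idea, not necessarily the approach described in the full post; `callLLM` and all other names are invented:

```javascript
// Step 1: ask the model to design a summarization prompt for this document.
function buildPlannerPrompt(excerpt) {
  return [
    "You are planning how to summarize a document.",
    "Read the excerpt below, then write the summarization prompt",
    "(sections, emphasis, length) best suited to this kind of document.",
    "Return only the prompt text.",
    "",
    `Excerpt:\n${excerpt}`,
  ].join("\n");
}

// Step 2: run the model-designed prompt against the full document.
// `callLLM` is a placeholder for whatever client the real system uses.
async function agenticSummarize(doc, callLLM) {
  const plan = await callLLM(buildPlannerPrompt(doc.slice(0, 2000)));
  return callLLM(`${plan}\n\nDocument:\n${doc}`);
}
```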

Continue reading “Giving Summary Generation Some Agency”

New Videocrawl Feature: Tracking Video Progress

We’ve implemented a smart video progress tracking system in Videocrawl (https://www.videocrawl.dev/) that remembers your watching position across sessions. Now when you close a tab or navigate away from a video, you can pick up right where you left off when you return.

The feature includes:

  • A visual progress bar showing how much of the video you’ve watched
  • Automatic resumption from your last position when returning to a video
  • Persistent progress tracking across browser sessions
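A minimal sketch of how such persistence could work with localStorage-style string storage; the key scheme and the resume thresholds here are assumptions, not Videocrawl’s actual implementation:

```javascript
// Storage is injected so the logic works with localStorage or any
// getItem/setItem-compatible store (and stays testable outside a browser).
const keyFor = (videoId) => `videocrawl:progress:${videoId}`;

function saveProgress(store, videoId, seconds) {
  store.setItem(keyFor(videoId), String(seconds));
}

// Resume only when the saved position is meaningful: skip the first few
// seconds, and treat positions near the end as "finished" (restart instead).
function resumePosition(store, videoId, duration) {
  const saved = Number(store.getItem(keyFor(videoId)));
  if (!Number.isFinite(saved) || saved < 5 || saved > duration - 10) return 0;
  return saved;
}
```

In the browser, `saveProgress(localStorage, id, video.currentTime)` could run from a throttled `timeupdate` listener, and `resumePosition` would seed `video.currentTime` when the player loads.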

Continue reading “New Videocrawl Feature: Tracking Video Progress”

How I Built Videocrawl’s Screenshot Feature with Claude

I am building Videocrawl (https://www.videocrawl.dev/), an AI companion app for videos. The application aims to improve my learning experience while watching videos. Most of my feature ideas come from using the application, identifying gaps in the experience, implementing solutions, testing them in production, learning from actual usage, and then making further improvements. This development cycle continues iteratively. I use LLMs for writing most of the code, primarily relying on Claude for my chat-driven development workflow.

Videocrawl works by processing a YouTube video URL that you provide. We then present a side-by-side view with the video on the left and various LLM tools (clean transcript, summary, chat, and FAQs) on the right, as shown below. You can customize the layout based on your workflow preferences.

One feature I recently wanted to add was the ability to take a screenshot of the current video frame and save it as a note. We already supported text-based notes, so this seemed like a natural extension.

The concept was straightforward: when the user presses a camera button or uses a keyboard shortcut, we capture the current video frame and save it to their notes. Without LLMs, I would likely have avoided implementing such a feature, as it would require extensive research and trial and error. With LLMs, however, I felt confident I could pull off the implementation.
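For a directly playable HTML5 video, the capture step can be sketched with an offscreen canvas. This assumes a same-origin or CORS-enabled stream (a plain YouTube iframe embed cannot be read this way), and `frameLabel` is an invented helper for naming the note:

```javascript
// Draw the current frame onto an offscreen canvas and export it as a PNG
// data URL that can be stored alongside the user's text notes.
function captureFrame(video) {
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d").drawImage(video, 0, 0);
  return canvas.toDataURL("image/png");
}

// Label the note with the capture position, e.g. 187 seconds -> "03:07".
function frameLabel(seconds) {
  const m = Math.floor(seconds / 60);
  const s = Math.floor(seconds % 60);
  return `${String(m).padStart(2, "0")}:${String(s).padStart(2, "0")}`;
}
```

Wiring the camera button or keyboard shortcut to call `captureFrame` and store the result with a `frameLabel` timestamp is then a small amount of glue code.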

Continue reading “How I Built Videocrawl’s Screenshot Feature with Claude”