How I Built Videocrawl’s Screenshot Feature with Claude

I am building Videocrawl (https://www.videocrawl.dev/), an AI companion app for videos that aims to improve my learning experience while watching them. Most of my feature ideas come from using the application myself: I identify gaps in the experience, implement solutions, test them in production, learn from actual usage, and then iterate. I use LLMs to write most of the code, relying primarily on Claude for my chat-driven development workflow.

Videocrawl works by processing a YouTube video URL that you provide. We then present a side-by-side view with the video on the left and various LLM tools (clean transcript, summary, chat, and FAQs) on the right, as shown below. You can customize the layout based on your workflow preferences.

One feature I recently wanted to add was the ability to take a screenshot of the current video frame and save it as a note. We already supported text-based notes, so this seemed like a natural extension.

The concept was straightforward: when the user presses a camera button or uses a keyboard shortcut, we capture the current video frame and save it to their notes. Without LLMs, I would likely have avoided such a feature, since it would have required extensive research and trial and error. With LLMs, I felt confident enough to attempt it.

I prompted Claude with the following request, providing relevant context:

We want to give users an option to take a screenshot of the current video frame and show that in the notes section. 
The option should be visible next to the play and reset buttons. 
Also, there should be keyboard shortcuts for both Windows and Mac to take the screenshot. 
Make sure the keyboard shortcut uses two keys like opt+s or something that makes sense.
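
As a side note on the shortcut request, here is a minimal sketch of how a two-key shortcut such as Alt+S (Option+S on a Mac) could be detected. The ShortcutEvent interface and the isScreenshotShortcut helper are hypothetical names for illustration, not Videocrawl's actual code. The sketch matches on the physical key (code) rather than the typed character (key), because on macOS holding Option changes the character that S produces.

```typescript
// Minimal shape of the keyboard event fields this sketch reads.
// A browser KeyboardEvent satisfies this interface.
interface ShortcutEvent {
  altKey: boolean;
  ctrlKey: boolean;
  metaKey: boolean;
  shiftKey: boolean;
  code: string; // physical key, e.g. "KeyS"
}

// Returns true only for Alt+S (Windows/Linux) or Option+S (Mac),
// with no other modifier held down.
function isScreenshotShortcut(e: ShortcutEvent): boolean {
  return (
    e.altKey &&
    !e.ctrlKey &&
    !e.metaKey &&
    !e.shiftKey &&
    e.code === "KeyS"
  );
}
```

In a page, this would typically be called from a keydown listener, which then calls preventDefault() on the event and triggers the capture.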

In my Claude project instructions, I had directed Claude to first discuss any changes before making code modifications. Additionally, I requested SVG wireframes for frontend changes. Claude responded with the following wireframe:

Claude also informed me that it would implement a canvas-based screenshot capture mechanism. This approach seemed correct, so I proceeded with the implementation.

Below is part of the code generated by Claude. It created a hidden canvas element for capturing the current video frame. When the user clicked the button, it called the captureScreenshot function, which performed the following steps:

Accessed the video element inside the YouTube iframe
Drew the current frame to a canvas
Converted the canvas to a base64 JPEG image
Set the screenshot quality to 85% JPEG for a good balance between quality and file size

<canvas 
  ref={canvasRef} 
  style={{ display: 'none' }} 
  className="screenshot-canvas"
/>
const canvasRef = useRef<HTMLCanvasElement>(null);

const captureScreenshot = useCallback(() => {
  try {
    logger.info('Capturing screenshot of current video frame');
    setIsCapturingScreenshot(true);

    // Get the video element from the iframe
    const iframe = document.getElementById('youtube-player-watch-read') as HTMLIFrameElement;

    // Wait a short delay to ensure player is fully loaded
    setTimeout(() => {
      try {
        // Get the video element from within the iframe
        const iframeContent = iframe.contentDocument || iframe.contentWindow.document;
        const videoElement = iframeContent.querySelector('video');


        // Set canvas dimensions to match video
        const canvas = canvasRef.current;
        if (!canvas) {
          logger.error('Canvas element not available');
          setIsCapturingScreenshot(false);
          return;
        }

        canvas.width = videoElement.videoWidth;
        canvas.height = videoElement.videoHeight;

        // Draw the current frame on the canvas
        const ctx = canvas.getContext('2d');

        ctx.drawImage(videoElement, 0, 0, canvas.width, canvas.height);

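        // Convert canvas to base64 image data
        const screenshotData = canvas.toDataURL('image/jpeg', 0.85);
        createScreenshotNote(screenshotData);
      } catch (err) {
        logger.error('Failed to capture video frame', err);
      } finally {
        setIsCapturingScreenshot(false);
      }
    });
  } catch (err) {
    logger.error('Failed to start screenshot capture', err);
    setIsCapturingScreenshot(false);
  }
}, [createScreenshotNote]);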

The code looked reasonable, but when I ran it, it failed with a cross-origin error:

SecurityError: Failed to read a named property 'document' from 'Window': 
Blocked a frame with origin "http://localhost:4000" from accessing a cross-origin frame.

When working with a same-origin HTML5 video element, you can draw its current frame to a canvas using JavaScript (via methods like canvas.getContext('2d').drawImage(video, 0, 0, ...)) and then extract an image from that canvas. A YouTube embed, however, runs inside an iframe served from a different origin, and the browser's same-origin policy blocks the parent page from reaching into that iframe's document at all. That is the error above: the code failed at iframe.contentWindow.document, before it ever found the video element or touched the canvas. The related "tainted canvas" restriction applies when cross-origin media is drawn to a canvas without CORS approval: the draw succeeds, but reading the pixels back with toDataURL() throws a SecurityError.
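
Whether a page may reach into an iframe comes down to comparing origins, meaning scheme, host, and port. As a minimal sketch, the hypothetical helper isSameOrigin below makes that comparison explicit. It is not part of the generated code, and it ignores special cases such as about:blank or srcdoc frames.

```typescript
// Returns true when `frameSrc` has the same origin (scheme, host, port)
// as `pageUrl`. Relative frame URLs are resolved against the page URL.
function isSameOrigin(frameSrc: string, pageUrl: string): boolean {
  try {
    return new URL(frameSrc, pageUrl).origin === new URL(pageUrl).origin;
  } catch {
    // An unparsable URL is treated as not accessible.
    return false;
  }
}
```

For a page on http://localhost:4000 embedding a player from https://www.youtube.com, the check returns false, which matches the SecurityError: the parent page cannot read the iframe's document, so it never gets a video element to draw.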

When I prompted Claude again with the error message, it suggested using the html2canvas library instead:

window.html2canvas(playerContainer, {
  useCORS: true,
  allowTaint: true,
  backgroundColor: '#000000'
}).then(renderedCanvas => {
  // Convert the rendered canvas to data URL
  const screenshotData = renderedCanvas.toDataURL('image/jpeg', 0.9);
  createScreenshotNote(screenshotData);
}).catch(err => {
  logger.error('html2canvas screenshot capture failed, falling back to basic method', err);
  captureBasicScreenshot();
});

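Unfortunately, this approach failed for the same underlying reason: html2canvas builds its image by reading the page's DOM, and it cannot read the contents of a cross-origin iframe. When I took the screenshot, I got a tainted image: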

When I prompted Claude again, it began to loop, unable to solve the problem. It started implementing a three-tier screenshot strategy:

Primary Method: Use the html2canvas library (if available) for high-quality screenshots
YouTube Thumbnail Method: Fall back to using YouTube’s thumbnail image for the video
Enhanced Visual Placeholder: Create a visually appealing, informative placeholder with video details
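
To make the thumbnail tier concrete, here is a minimal sketch of how a thumbnail URL could be derived from a video URL. The helper names extractVideoId and thumbnailUrl are hypothetical, and the img.youtube.com/vi/<id>/hqdefault.jpg pattern is a widely used convention that I am assuming here, not something taken from Claude's output. Note that a thumbnail is a fixed image chosen for the video, not the frame the user is watching, which is why this fallback does not deliver the feature.

```typescript
// Extracts the video ID from common YouTube URL shapes:
// watch?v=<id>, youtu.be/<id>, and /embed/<id>.
function extractVideoId(videoUrl: string): string | null {
  let url: URL;
  try {
    url = new URL(videoUrl);
  } catch {
    return null; // not a valid absolute URL
  }
  const host = url.hostname.replace(/^www\./, "");
  if (host === "youtu.be") {
    return url.pathname.split("/")[1] || null;
  }
  if (host === "youtube.com" || host === "m.youtube.com") {
    if (url.pathname === "/watch") {
      return url.searchParams.get("v");
    }
    const match = url.pathname.match(/^\/embed\/([^/]+)/);
    return match ? match[1] : null;
  }
  return null;
}

// Builds the thumbnail URL, or returns null when no video ID is found.
function thumbnailUrl(videoUrl: string): string | null {
  const id = extractVideoId(videoUrl);
  return id ? `https://img.youtube.com/vi/${id}/hqdefault.jpg` : null;
}
```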

At this point, I recalled that a few days earlier, while considering this functionality, I had researched whether browsers natively support screenshot features. I remembered finding the Screen Capture API (https://developer.mozilla.org/en-US/docs/Web/API/Screen_Capture_API/Using_Screen_Capture).

I saved the Screen Capture API documentation as a PDF and prompted Claude:

Can we use the browser Screen Capture API instead of 
html2canvas as it does not work because of cross-origin issues? 
I have attached documentation.

Claude responded enthusiastically:

You raise a great point! Yes, we can use the Screen Capture API as an alternative approach to html2canvas for capturing video frames. 
This is actually a better solution since it's built directly into modern browsers and is designed specifically for capturing screen content.

It then generated the following code:

const stream = await navigator.mediaDevices.getDisplayMedia({
  video: {
    // Settings for capturing good quality snapshots
    frameRate: 30
  },
  audio: false,
  // Hint that we prefer the current browser tab.
  // preferCurrentTab is a top-level option, not a video constraint;
  // browsers that do not support it ignore it.
  preferCurrentTab: true
});


// Keep a reference to the video track so it can be stopped after the capture
const track = stream.getVideoTracks()[0];

// Create a video element to receive the stream
const videoEl = document.createElement('video');
videoEl.srcObject = stream;

// When video is ready, we can take a screenshot
videoEl.onloadedmetadata = () => {
  videoEl.play();

  // Create a canvas to capture the frame
  const canvas = canvasRef.current;
  if (!canvas) {
    throw new Error('Canvas element not available');
  }

  // Set canvas size to match video dimensions
  canvas.width = videoEl.videoWidth;
  canvas.height = videoEl.videoHeight;

  // Pause briefly to ensure frame is loaded
  setTimeout(() => {
    try {
      // Get canvas context and draw the current video frame
      const ctx = canvas.getContext('2d');
      if (!ctx) {
        throw new Error('Could not get canvas context');
      }

      // Draw the current frame
      ctx.drawImage(videoEl, 0, 0, canvas.width, canvas.height);

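      // Convert to image data
      const screenshotData = canvas.toDataURL('image/jpeg', 0.9);
      createScreenshotNote(screenshotData);
    } finally {
      // Stop sharing so the browser's capture indicator goes away
      track.stop();
    }
  }, 100); // delay value is illustrative; the original excerpt omits it
};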

The Screen Capture API offers several advantages:

Browser-native solution: It’s built into modern browsers and specifically designed for capturing screen content
Bypasses cross-origin restrictions: Since the user explicitly grants permission and the capture happens at the browser level, the page never has to read the iframe's contents, so we avoid cross-origin security issues
Higher quality captures: The API captures exactly what’s on screen at full resolution

The code above:

Uses navigator.mediaDevices.getDisplayMedia() to prompt the user to choose what to share (a screen, window, or tab)
Captures a frame from the selected area using a canvas

This solution worked, but it captured the entire page rather than just the video frame, so it solved the problem only partially. I researched whether the capture could be restricted to a specific element on the page, but the standard Screen Capture API (using getDisplayMedia()) doesn't allow that: it only lets users select an entire screen, window, or tab. There is an experimental Element Capture API, but only recent versions of Chrome implement it, so it is not widely available.

Instead of waiting for broader implementation of the Element Capture API, I decided to add a crop feature that would allow users to select just the video portion of the screenshot. I asked Claude to implement this functionality:

Can we crop the screenshot so that users can only 
select the video part of the screenshot before saving it to notes?

Claude generated an ImageCropper component that allows users to crop the image before saving.
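
As a possible complement to manual cropping, the crop rectangle could be pre-computed from the player's position on the page. The sketch below is my own illustration, not the ImageCropper that Claude generated: cropRectForElement, Rect, and Size are hypothetical names, and the sketch assumes the captured frame covers exactly the tab's viewport, which may not hold in every browser.

```typescript
interface Rect { x: number; y: number; width: number; height: number; }
interface Size { width: number; height: number; }

// Maps an element's rectangle in viewport CSS pixels to pixel coordinates
// in a captured frame of the same viewport, clamped to the frame bounds.
function cropRectForElement(el: Rect, viewport: Size, frame: Size): Rect {
  // Captured frames are usually larger than the CSS viewport on
  // high-density displays, so scale each axis separately.
  const scaleX = frame.width / viewport.width;
  const scaleY = frame.height / viewport.height;

  const left = Math.max(0, Math.round(el.x * scaleX));
  const top = Math.max(0, Math.round(el.y * scaleY));
  const right = Math.min(frame.width, Math.round((el.x + el.width) * scaleX));
  const bottom = Math.min(frame.height, Math.round((el.y + el.height) * scaleY));

  return {
    x: left,
    y: top,
    width: Math.max(0, right - left),
    height: Math.max(0, bottom - top),
  };
}
```

In the browser, el would come from the iframe element's getBoundingClientRect() (the iframe element itself lives in the parent page, so its position is readable), viewport from window.innerWidth and window.innerHeight, and frame from the captured video's videoWidth and videoHeight. The result could seed the cropper's initial selection or be passed to the nine-argument form of drawImage.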

You can see a demonstration of the complete feature in action below:

Discover more from Shekhar Gulati
