llm-experiments – Shekhar Gulati

I conducted an LLM training session last week. To teach attendees about structured output, I built an HTML/JS web application. This application allows users to input a webpage and specify fields they want to extract. The web app uses OpenAI’s LLM to extract the relevant information. Before making the OpenAI call, the app first sends a request to Jina to retrieve a markdown version of the webpage. Then, the extracted markdown is passed to OpenAI for further processing. You can access the tool here: Structured Extraction Tool.

Note: The tool will prompt you to enter your OpenAI key, which is stored in your browser’s local storage.

Below, I will demonstrate the app’s workflow using screenshots. The user starts by entering the webpage URL. In this example, I want to extract some information from a case study.

Next, users specify the fields they want to extract. We also support field templates for reusability. For each field, users provide a name, description, and its type.

After specifying the fields, users press the Extract Data button. The app displays the extracted data, along with token usage and cost.

I remain skeptical about using LLMs in an autonomous, agentic manner. In my experience, for software development tasks, they are most useful when employed in a chat-driven development manner, where humans guide their behavior and work with them step by step. The Answer.ai team recently published a post about Devin, sharing multiple tasks where their AI agent failed. Devin is a collaborative AI teammate built to help ambitious engineering teams achieve more. According to Answer.ai blog post, out of 20 tasks they gave to Devin, it failed at 14 tasks, succeeded in 3, and 3 were inconclusive. These results are quite disappointing. They have shared the complete task list in the appendix for anyone to try.

Many of the tasks they mentioned seemed achievable using AI assistants like Claude or ChatGPT, so I decided to experiment with Claude to complete one of them. I’ve increasingly been using Claude for coding-related tasks and have a paid subscription. This experiment uses Claude 3.5 Sonnet.

Tag: llm-experiments

How Good Are LLMs at Generating Functional and Aesthetic UIs? An Experiment

Can Claude Single Call and Zero Shot Do What Devin Can’t Do?