devin – Shekhar Gulati

I remain skeptical about using LLMs in an autonomous, agentic manner. In my experience, for software development tasks, they are most useful in chat-driven development, where humans guide their behavior and work with them step by step. The Answer.ai team recently published a post about Devin, describing multiple tasks at which the AI agent failed. Devin is billed as a collaborative AI teammate built to help ambitious engineering teams achieve more. According to the Answer.ai blog post, of the 20 tasks they gave Devin, it failed 14, succeeded at 3, and 3 were inconclusive. These results are quite disappointing. They have shared the complete task list in the appendix for anyone to try.

Many of the tasks they mentioned seemed achievable using AI assistants like Claude or ChatGPT, so I decided to experiment with Claude to complete one of them. I’ve increasingly been using Claude for coding-related tasks and have a paid subscription. This experiment uses Claude 3.5 Sonnet.

Can Claude Single Call and Zero Shot Do What Devin Can’t Do?