Notes on mini-swe-agent


I was going over the mini-swe-agent code base today. The core agent loop is 100 lines long, and every agentic framework does something similar. Some interesting facts about mini-swe-agent:

  • It uses only a bash tool
  • It does not depend on function calling. Instead, it parses the model's response to extract the command that needs to be run
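The parsing step mentioned above can be sketched in a few lines. This is a minimal sketch, not the project's actual code: the `parse_action` name, the regex, and the local `FormatError` class are assumptions made for illustration. The fence string is built with `"`" * 3` only so the example does not contain a literal triple backtick.

```python
import re

# Hypothetical stand-in for the agent's format error (assumption).
class FormatError(Exception):
    pass

FENCE = "`" * 3  # three backticks, built indirectly to keep this listing readable

# Assumed pattern: a fenced block tagged "bash"; DOTALL lets a command span lines.
BASH_BLOCK = re.compile(FENCE + r"bash\s*\n(.*?)\n" + FENCE, re.DOTALL)

def parse_action(response: str) -> str:
    """Return the single bash command in a model response, or raise FormatError."""
    blocks = BASH_BLOCK.findall(response)
    if len(blocks) != 1:
        raise FormatError(f"Expected exactly one bash block, found {len(blocks)}.")
    return blocks[0].strip()

reply = "THOUGHT: list the files first.\n\n" + FENCE + "bash\nls -la\n" + FENCE
print(parse_action(reply))
```

A response with zero or several bash blocks raises `FormatError`, which the loop can hand back to the model as feedback instead of crashing.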

The Mini-SWE-Agent operates in a continuous loop, iteratively solving problems by querying an LLM for actions, executing bash commands, and observing results until the task is complete.

Below is the agent's run() method, followed by a pictorial representation of the agent loop.

    def run(self, task: str, **kwargs) -> tuple[str, str]:
        """Run step() until agent is finished. Return exit status & message"""
        self.extra_template_vars |= {"task": task, **kwargs}
        self.messages = []
        self.add_message("system", self.render_template(self.config.system_template))
        self.add_message("user", self.render_template(self.config.instance_template))
        while True:
            try:
                self.step()
            except NonTerminatingException as e:
                self.add_message("user", str(e))
            except TerminatingException as e:
                self.add_message("user", str(e))
                return type(e).__name__, str(e)
┌─────────────────────────────────────────────────────────────────────────────┐
│                            AGENT.RUN(task)                                   │
│                                                                               │
│  1. Initialize messages with system & instance prompts                       │
│  2. Enter main loop                                                          │
│                                                                               │
│     ┌───────────────────────────────────────────────────────────┐           │
│     │                      MAIN LOOP                             │           │
│     │                                                            │           │
│     │  ┌──────────────────────────────────────────────────────┐ │           │
│     │  │                    STEP()                            │ │           │
│     │  │                                                      │ │           │
│     │  │  ┌────────────────────────────────────────────┐     │ │           │
│     │  │  │ 1. QUERY MODEL                             │     │ │           │
│     │  │  │    • Check limits (cost/steps)             │     │ │           │
│     │  │  │    • Send messages[] to LLM                │     │ │           │
│     │  │  │    • Get response with bash command        │     │ │           │
│     │  │  │    • Add assistant message to history      │     │ │           │
│     │  │  └────────────────────────┬───────────────────┘     │ │           │
│     │  │                           ▼                          │ │           │
│     │  │  ┌────────────────────────────────────────────┐     │ │           │
│     │  │  │ 2. PARSE ACTION                            │     │ │           │
│     │  │  │    • Extract ```bash...``` blocks          │     │ │           │
│     │  │  │    • Validate exactly one command          │     │ │           │
│     │  │  │    • Raise FormatError if invalid          │     │ │           │
│     │  │  └────────────────────────┬───────────────────┘     │ │           │
│     │  │                           ▼                          │ │           │
│     │  │  ┌────────────────────────────────────────────┐     │ │           │
│     │  │  │ 3. EXECUTE ACTION                          │     │ │           │
│     │  │  │    • Run command in environment            │     │ │           │
│     │  │  │    • Handle timeouts                       │     │ │           │
│     │  │  │    • Check for completion marker           │     │ │           │
│     │  │  └────────────────────────┬───────────────────┘     │ │           │
│     │  │                           ▼                          │ │           │
│     │  │  ┌────────────────────────────────────────────┐     │ │           │
│     │  │  │ 4. OBSERVE & ADD TO HISTORY                │     │ │           │
│     │  │  │    • Format output with template           │     │ │           │
│     │  │  │    • Add user message with observation     │     │ │           │
│     │  │  │    • Return to loop for next iteration     │     │ │           │
│     │  │  └────────────────────────────────────────────┘     │ │           │
│     │  │                                                      │ │           │
│     │  └──────────────────────────────────────────────────────┘ │           │
│     │                           │                                │           │
│     │                           ▼                                │           │
│     │        ┌─────────────────────────────────┐                │           │
│     │        │   EXCEPTION HANDLING            │                │           │
│     │        ├─────────────────────────────────┤                │           │
│     │        │ NonTerminatingException:        │                │           │
│     │        │   • FormatError                 │────────────────▶│ Continue  │
│     │        │   • ExecutionTimeoutError       │                │  Loop     │
│     │        │   • Add error to messages[]     │                │           │
│     │        ├─────────────────────────────────┤                │           │
│     │        │ TerminatingException:           │                │           │
│     │        │   • Submitted (task complete)   │────────────────▶│ Exit      │
│     │        │   • LimitsExceeded              │                │  Loop     │
│     │        │   • Return (status, message)    │                │           │
│     │        └─────────────────────────────────┘                │           │
│     │                                                            │           │
│     └────────────────────────────────────────────────────────────┘           │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
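Step 3 of the diagram (execute action) can be sketched with the standard library. This is a minimal sketch under stated assumptions, not the project's implementation: `execute_action`, the 30-second default timeout, and the local exception classes are hypothetical names, and a POSIX shell is assumed. Only `subprocess.run` and `subprocess.TimeoutExpired` are real standard-library APIs.

```python
import subprocess

# Hypothetical stand-ins for the agent's exception types (assumption).
class NonTerminatingException(Exception):
    pass

class ExecutionTimeoutError(NonTerminatingException):
    pass

def execute_action(command: str, timeout: float = 30.0) -> dict:
    """Run one command in a fresh shell and capture stdout and stderr together."""
    try:
        result = subprocess.run(
            command,
            shell=True,                # each call gets a new shell process
            text=True,                 # decode output as str
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            timeout=timeout,
        )
    except subprocess.TimeoutExpired as e:
        # Converted to a non-terminating error so the loop can report it and continue.
        raise ExecutionTimeoutError(
            f"Command timed out after {timeout} seconds: {command}"
        ) from e
    return {"returncode": result.returncode, "output": result.stdout}
```

Because every call starts a new shell, a `cd` or `export` in one action does not carry over to the next, which is why the instance prompt tells the model to prefix commands with `cd /path && ...` when it needs a specific directory.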

If you are using a model that supports structured outputs and function calling, you do not need the parse step. mini-swe-agent uses exceptions for control flow. There are two kinds of exceptions:

  • Non-terminating exceptions: These exceptions don’t stop the agent, they just add error context:
    • FormatError: LLM output wasn’t properly formatted
    • ExecutionTimeoutError: Command took too long to execute
    • The error message is added to the conversation and the loop continues
  • Terminating exceptions: These exceptions end the agent’s execution:
    • Submitted: Task completed successfully (completion marker found)
    • LimitsExceeded: Cost or step limit reached
    • The agent returns with a status and final message
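The two exception families map naturally onto a small class hierarchy. The sketch below uses the exception names from the list above, but the class bodies, the `run_loop` driver, and the `raising` helper are hypothetical, written only to show how the two `except` branches behave. Each "step" is scripted rather than backed by a real model.

```python
class NonTerminatingException(Exception):
    """Recoverable: the message is fed back to the model and the loop continues."""

class FormatError(NonTerminatingException):
    pass

class ExecutionTimeoutError(NonTerminatingException):
    pass

class TerminatingException(Exception):
    """Final: the loop stops and returns a status and message."""

class Submitted(TerminatingException):
    pass

class LimitsExceeded(TerminatingException):
    pass

def run_loop(steps):
    """Drive scripted steps (hypothetical); each step is a callable that may raise."""
    messages = []
    for step in steps:
        try:
            step()
        except NonTerminatingException as e:
            messages.append(("user", str(e)))   # report the error, keep going
        except TerminatingException as e:
            messages.append(("user", str(e)))
            return type(e).__name__, str(e), messages
    return "ScriptEnded", "", messages

def raising(exc):
    """Build a step that raises the given exception (illustration only)."""
    def step():
        raise exc
    return step

status, message, messages = run_loop([
    raising(FormatError("Please provide exactly one bash block.")),
    lambda: None,                      # a step that succeeds
    raising(Submitted("done")),
])
print(status, message)
```

The design choice worth noting is that the exit status is simply the exception's class name, so adding a new way to stop the agent means adding one subclass rather than another flag in the loop.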

The agent recognizes task completion when the command output starts with:

  • COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT
  • MINI_SWE_AGENT_FINAL_OUTPUT
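A completion check along these lines could look as follows. This is a sketch, assuming that the first line of the command output is compared against the two markers and that any remaining output becomes the final message. The `check_finished` name and the local `Submitted` class are hypothetical.

```python
# Hypothetical stand-in for the agent's terminating exception (assumption).
class Submitted(Exception):
    pass

COMPLETION_MARKERS = (
    "COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT",
    "MINI_SWE_AGENT_FINAL_OUTPUT",
)

def check_finished(output: str) -> None:
    """Raise Submitted if the first line of the command output is a completion marker."""
    lines = output.lstrip().splitlines(keepends=True)
    if lines and lines[0].strip() in COMPLETION_MARKERS:
        # Everything after the marker line is treated as the final message.
        raise Submitted("".join(lines[1:]))

check_finished("total 8\ncollatz.py\n")  # ordinary output: nothing happens
```

This is why the instance prompt asks the model to run `echo COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT` on its own: a marker buried after other output would not be on the first line.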

Below is the execution trace for a task involving the Collatz sequence.

  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant that can interact with a computer.\n\nYour response must contain exactly ONE bash code block with ONE command (or commands connected with && or ||).\nInclude a THOUGHT section before your command where you explain your reasoning process.\nFormat your response as shown in <format_example>.\n\n<format_example>\nYour reasoning and analysis here. Explain why you want to perform the action.\n\n```bash\nyour_command_here\n```\n</format_example>\n\nFailure to follow these rules will cause your response to be rejected."
    },
    {
      "role": "user",
      "content": "Please solve this issue: A Collatz sequence in mathematics can be defined as follows. Starting with any positive integer:\n\nif n is even, the next number in the sequence is n / 2\nif n is odd, the next number in the sequence is 3n + 1\nIt is conjectured that every such sequence eventually reaches the number 1. Test this conjecture.\n\nBonus: What input n <= 1000000 gives the longest sequence?\n\nYou can execute bash commands and edit files to implement the necessary changes.\n\n## Recommended Workflow\n\nThis workflows should be done step-by-step so that you can iterate on your changes and any possible problems.\n\n1. Analyze the codebase by finding and reading relevant files\n2. Create a script to reproduce the issue\n3. Edit the source code to resolve the issue\n4. Verify your fix works by running your script again\n5. Test edge cases to ensure your fix is robust\n6. Submit your changes and finish your work by issuing the following command: `echo COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT`.\n   Do not combine it with any other command. <important>After this command, you cannot continue working on this task.</important>\n\n## Important Rules\n\n1. Every response must contain exactly one action\n2. The action must be enclosed in triple backticks\n3. Directory or environment variable changes are not persistent. Every action is executed in a new subshell.\n   However, you can prefix any action with `MY_ENV_VAR=MY_VALUE cd /path/to/working/dir && ...` or write/load environment variables from files\n\n<system_information>\nDarwin 24.6.0 Darwin Kernel Version 24.6.0: Mon Jul 14 11:28:17 PDT 2025; root:xnu-11417.140.69~1/RELEASE_X86_64 x86_64 i386\n</system_information>\n\n## Formatting your response\n\nHere is an example of a correct response:\n\n<example_response>\nTHOUGHT: I need to understand the structure of the repository first. 
Let me check what files are in the current directory to get a better understanding of the codebase.\n\n```bash\nls -la\n```\n</example_response>\n\n## Useful command examples\n\n### Create a new file:\n\n```bash\ncat <<'EOF' > newfile.py\nimport numpy as np\nhello = \"world\"\nprint(hello)\nEOF\n```\n\n### Edit files with sed:<important>\nYou are on MacOS. For all the below examples, you need to use `sed -i ''` instead of `sed -i`.\n</important>```bash\n# Replace all occurrences\nsed -i 's/old_string/new_string/g' filename.py\n\n# Replace only first occurrence\nsed -i 's/old_string/new_string/' filename.py\n\n# Replace first occurrence on line 1\nsed -i '1s/old_string/new_string/' filename.py\n\n# Replace all occurrences in lines 1-10\nsed -i '1,10s/old_string/new_string/g' filename.py\n```\n\n### View file content:\n\n```bash\n# View specific lines with numbers\nnl -ba filename.py | sed -n '10,20p'\n```\n\n### Any other command you want to run\n\n```bash\nanything\n```"
    },
    {
      "role": "assistant",
      "content": "THOUGHT: I need to inspect the repository to find any existing Collatz implementation or test files so I can reproduce and fix the issue. I'll list the files in the current directory (including hidden ones) to start the investigation.\n\n```bash\nls -la\n```"
    },
    {
      "role": "user",
      "content": "<returncode>0</returncode>\n<output>....</output>"
    },
    {
      "role": "assistant",
      "content": "THOUGHT: I should search the repository for any existing Collatz-related code or tests so I know whether to modify existing files or create a new script to compute Collatz sequences and find the number \u2264 1,000,000 that produces the longest sequence.\n\n```bash\ngrep -RIn \"collatz\" . || true\n```"
    },
    {
      "role": "user",
      "content": "<returncode>0</returncode>\n<output>\n</output>"
    },
    {
      "role": "assistant",
      "content": "THOUGHT: There is no existing Collatz implementation in the repo, so I'll create a new Python script that efficiently computes Collatz sequence lengths for all starting values n \u2264 1,000,000 using memoization, then runs it to report whether each sequence reached 1 in the tested range and which starting number \u2264 1,000,000 produces the longest chain. I'll write the script to be fast and memory-efficient (iterative approach, caching intermediate lengths), then execute it.\n\n```bash\ncat <<'EOF' > collatz.py\n#!/usr/bin/env python3\nimport time\n\ndef main():\n    MAX_N = 1_000_000\n    start_time = time.time()\n    cache = {1: 1}  # length of sequence starting at 1 is 1 (just 1)\n    max_len = 1\n    max_start = 1\n\n    for i in range(1, MAX_N + 1):\n        n = i\n        seq = []\n        # traverse until we hit a cached value\n        while n not in cache:\n            seq.append(n)\n            if n % 2 == 0:\n                n //= 2\n            else:\n                n = 3 * n + 1\n        # now n is cached; build lengths back\n        length = cache[n]\n        for k in reversed(seq):\n            length += 1\n            cache[k] = length\n        if cache[i] > max_len:\n            max_len = cache[i]\n            max_start = i\n\n    elapsed = time.time() - start_time\n    # Basic verification: any number's chain ends at 1 if it's cached ultimately referencing 1\n    # (since we seeded cache with 1 and only follow Collatz rules, this indicates reaching 1)\n    print(f\"Tested range: 1..{MAX_N}\")\n    print(\"Result: All tested starting numbers reached 1 (Collatz conjecture holds for this range).\")\n    print(f\"Longest chain starting number <= {MAX_N}: {max_start} with length {max_len}\")\n    print(f\"Elapsed time: {elapsed:.3f} seconds\")\n\nif __name__ == '__main__':\n    main()\nEOF\n\npython3 collatz.py\n```"
    },
    {
      "role": "user",
      "content": "<returncode>0</returncode>\n<output>\nTested range: 1..1000000\nResult: All tested starting numbers reached 1 (Collatz conjecture holds for this range).\nLongest chain starting number <= 1000000: 837799 with length 525\nElapsed time: 1.120 seconds\n</output>"
    },
    {
      "role": "assistant",
      "content": "THOUGHT: I implemented and ran a Collatz test script (collatz.py) that verifies the conjecture for all starting n \u2264 1,000,000, found that all tested sequences reached 1, and determined the longest chain starts at 837799 with length 525. Now I'll submit and finalize the task by emitting the required completion string.\n\n```bash\necho COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT\n```"
    }
  ],
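The user messages in the trace above all share one shape: a return code followed by the command output. A sketch of how such an observation string could be produced is shown below. It uses a plain f-string and a hypothetical `format_observation` function, which is an assumption made for illustration and not the templating the project itself uses.

```python
def format_observation(returncode: int, output: str) -> str:
    """Wrap a command result in the tags seen in the trace (illustrative only)."""
    return f"<returncode>{returncode}</returncode>\n<output>\n{output}</output>"

# An empty grep result, as in the second user observation of the trace.
print(format_observation(0, ""))
```

Keeping the return code in its own tag lets the model tell a failed command apart from a successful one that simply printed nothing.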

From the system prompt we can learn the following:

  • It forces decomposition of complex tasks into atomic steps, since it asks the model to respond with a single bash code block containing a single command.
  • Reason first, then write code. This can work with non-thinking models as well.
  • Specify the format the model should respond in, so that parsing is reliable.
  • An explicit completion signal prevents infinite loops.

From the instance/user prompt we can learn the following:

  • Suggest a workflow for the model to follow. The prompt encourages systematic exploration.
  • Give examples of useful commands such as sed and cat.
  • Be clear about how the task terminates.

