Debugging a failed HTTP request with Claude Code

I have a Python HTTP client built with the httpx library. The client worked fine against the test environment, but as soon as I pointed it to the prod environment, the request started failing with an HTTP 403 error. To be sure, I tested the same request using the Bruno REST client and it worked. I used Bruno to generate cURL and plain Python 3 (no dependencies) code, and both of those worked as well. I spent 15 minutes debugging it myself but couldn’t figure out the reason: the same request succeeded in Bruno, cURL, and dependency-free Python 3, yet the client written with the httpx library kept failing.

So, I opened Claude Code and prompted it to debug the issue for me.

  I am facing an issue where if I run the cURL request below it succeeds. The same call, if we make it to @app/clients/identity_client.py via id_test.py, gives a 403. I am
  having a hard time understanding the difference between the two. Help me debug what the difference between the two is and why it is failing. Then, help me fix it.


  curl --request POST \
    --url URL \
    --header 'content-type: application/json' \
    --data '{
    "key": "00e78489023b4d6ea2ed440ef5fb3e2f",
    "tenantDomain": "domain",
    "username": "user"
  }'
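For reference, the failing httpx side of the comparison looked roughly like this (a sketch, not the actual code in app/clients/identity_client.py; the URL is the same placeholder as in the cURL example):

  import httpx

  response = httpx.post(
      "URL",  # placeholder, same endpoint the cURL command hits
      headers={"content-type": "application/json"},
      json={
          "key": "00e78489023b4d6ea2ed440ef5fb3e2f",
          "tenantDomain": "domain",
          "username": "user",
      },
  )
  print(response.status_code)  # 403 here, while the cURL request succeeded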

Claude Code first suggested that the 403 error was caused by:

  1. Different base URLs: Your Python client uses env1.identitysystem.io but the working cURL uses env2.identitysystem.io
  2. Different key: The cURL uses a specific key while your client generates a random UUID

I told Claude that this was not the case: in my id_test.py I had already set the base URL to env2.identitysystem.io.

Claude Code then said:

You’re absolutely right! You are already setting the correct base URL in id_test.py. Looking at the client code again, the issue is likely the random key generation.

Claude Code then ran the Python client code with the key value from the cURL request and found that the key was not the issue: even with the same key, the API call failed.

The issue persists. Let me debug further by checking if there are any HTTP headers differences. Let me modify the client to print the actual request being made and add proper headers:

Claude Code then added debug logging to print the request headers and body.
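The debug logging was along these lines (a sketch using httpx event hooks, not the exact code Claude Code wrote):

  import httpx

  def log_request(request: httpx.Request) -> None:
      # Print the exact method, URL, and headers httpx is about to send.
      print(f"{request.method} {request.url}")
      for name, value in request.headers.items():
          print(f"  {name}: {value}")

  def log_response(response: httpx.Response) -> None:
      response.read()  # make the body available inside the hook
      print(f"status: {response.status_code}")
      print(response.text[:500])  # first part of the response body

  client = httpx.Client(
      event_hooks={"request": [log_request], "response": [log_response]},
  )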

Using the debug logs, it figured out the issue.

Great! Now I can see the issue more clearly. The response is an HTML 403 error from the load balancer (server: awselb/2.0), not from the API itself. This suggests there might be a User-Agent or other header requirement.

Now let’s test if setting a cURL-like User-Agent fixes the issue

Perfect! The issue was indeed the User-Agent header. The API/load balancer was blocking requests with the default httpx User-Agent but allows requests with a cURL-like User-Agent.

The httpx library uses python-httpx/X.Y.Z as its default User-Agent header, where X.Y.Z is the installed httpx version.
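The fix itself is a one-liner in httpx; roughly like this (the real change went into app/clients/identity_client.py, and the exact User-Agent string you send is up to you):

  import httpx

  # Override the default python-httpx/X.Y.Z User-Agent with a cURL-like one.
  client = httpx.Client(headers={"User-Agent": "curl/8.4.0"})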

Claude Code successfully fixed the issue.

Root Cause: The Identity provider API/load balancer was blocking requests based on the User-Agent header. The default httpx User-Agent was being rejected with a 403 Forbidden error.

Conclusion

This type of issue is particularly frustrating for developers because the test-versus-prod split creates a false assumption: since only the environment changed, the environment must be at fault. When the same request works in Bruno and cURL but fails in your application, you don’t immediately suspect something as basic as the User-Agent header.

These bugs are time-consuming because they violate our expectations: if the HTTP request looks identical, it should work regardless of the client. The root cause often lies in subtle differences that aren’t obvious, like default headers that vary between tools.

Having a systematic debugging approach, whether through AI assistance or methodical logging, helps identify these hidden variables more efficiently than manual trial and error. Sometimes an external perspective is needed to spot what you’ve overlooked.

Notes from Gemini Embedding Paper

I was reading a paper by the Google DeepMind team on how they trained Gemini Embedding, a state-of-the-art, unified embedding model. This is the second paper I’ve read this month on training embedding models. Last week, I read about how the Jina embedding model was trained. The Jina embedding paper was thin and lacked details, so I didn’t write about it. This paper is full of insights, so I thought I’d write a short post sharing what I learned.

Gemini Embedding achieves state-of-the-art performance across MMTEB’s multilingual, English, and code benchmarks.

Gemini Embedding uses Matryoshka Representation Learning (MRL) so that a single model can produce embeddings of different sizes (768, 1536, and 3072 dimensions). During training, the model applies separate contrastive losses to different sub-portions (prefixes) of the embedding vector, ensuring that both the shorter and the full-length embeddings are well trained. This provides flexibility: smaller embeddings for efficiency, larger ones for accuracy, all from the same model.
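In code, the idea looks roughly like the sketch below (my own illustration in PyTorch, not code from the paper; the temperature value and the simple averaging over dimensions are assumptions):

  import torch
  import torch.nn.functional as F

  def mrl_contrastive_loss(query_emb, target_emb, dims=(768, 1536, 3072)):
      # query_emb, target_emb: (batch, 3072) tensors; row i of each is a positive pair.
      # A separate in-batch contrastive loss is computed on each embedding prefix.
      total = 0.0
      for d in dims:
          q = F.normalize(query_emb[:, :d], dim=-1)
          t = F.normalize(target_emb[:, :d], dim=-1)
          logits = q @ t.T / 0.05  # illustrative temperature
          labels = torch.arange(q.size(0), device=q.device)
          total = total + F.cross_entropy(logits, labels)
      return total / len(dims)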

They cite two main reasons why the Gemini Embedding model achieves state-of-the-art performance in benchmarks:

  • The Gemini Embedding model is initialized from the weights of the Gemini LLM backbone. They also note that several recent embedding models such as E5-Mistral, SFR-Mistral, BGE-ICL, and NV-Embed have been initialized from the Mistral-7B (Jiang et al., 2023) backbone and then further adapted as embedding models. The same is true for the jina-code-embeddings-0.5b and 1.5b models, as they are built on the Qwen2.5-Coder-0.5B and Qwen2.5-Coder-1.5B backbones.
  • The second reason they cite is high-quality datasets. These datasets are synthetically generated using Gemini LLM. They mention: “Leveraging Gemini’s diverse capabilities, we train Gemini Embedding on a comprehensive suite of embedding tasks. To construct a high-quality, heterogeneous training dataset, we employ Gemini for several critical data curation steps: filtering low-quality examples, determining relevant positive and negative passages for retrieval, and generating rich synthetic datasets. This curated dataset facilitates training with a contrastive learning objective, enabling Gemini Embedding to learn robust semantic representations.”

In the paper, they also mention that the Gemini embedding model is trained with a contrastive loss that pulls queries close to their correct targets while pushing away incorrect ones. Negatives are usually sampled from the same batch, and sometimes hard negatives are added to make learning more robust. Each example is also tagged with a task type, which conditions the model to learn embeddings useful across different domains like Q&A or fact-checking.

Each training example also includes a task description such as "question answering" or "fact checking". This string tells the model what kind of relationship between the query and target it should focus on. In effect, it makes the embeddings task-aware, allowing a single embedding model to generalize across multiple use cases.
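Concretely, I imagine the model input looks something like the snippet below (the template is my own guess for illustration; the paper does not spell out the exact prompt format):

  # Hypothetical template: the task description is prepended to the query
  # before it is embedded, making the resulting embedding task-aware.
  def build_query(task: str, query: str) -> str:
      return f"task: {task} | query: {query}"

  build_query("question answering", "who discovered penicillin?")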

They also discuss the two-stage process used to train the model: pre-finetuning followed by finetuning (a rough sketch of the loss in both stages follows the list).

  • Pre-finetuning: First, the model is “pre-finetuned” on a large number of potentially noisy (query, target) pairs, omitting the hard-negative term from the loss function. They found it beneficial to use a large batch size, as the primary objective is to adapt the parameters from autoregressive generation to encoding.
  • Finetuning: Next, the model is fine-tuned on a large mixture of task-specific datasets containing (query, target, hard negative target) triples. For this phase of training, they found it beneficial to use smaller batch sizes (e.g., less than 1024) and to limit each batch to a single dataset, as distinguishing a given positive target from in-batch targets from the same task provides greater signal than discerning (say) a retrieval target from a classification label.
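Here is how I picture the difference between the two stages in loss terms (a sketch under my own assumptions about the temperature and one hard negative per example, not the paper’s implementation):

  import torch
  import torch.nn.functional as F

  def contrastive_loss(q, pos, hard_neg=None, temperature=0.05):
      # q, pos, hard_neg: (batch, dim) tensors; row i of q and pos is a positive pair.
      # Pre-finetuning: call with hard_neg=None, i.e. the hard-negative term is omitted.
      # Finetuning: pass one hard negative per example as an extra logit column.
      q = F.normalize(q, dim=-1)
      pos = F.normalize(pos, dim=-1)
      logits = q @ pos.T  # (batch, batch); the diagonal holds the positives
      if hard_neg is not None:
          hard_neg = F.normalize(hard_neg, dim=-1)
          hard_logits = (q * hard_neg).sum(dim=-1, keepdim=True)  # (batch, 1)
          logits = torch.cat([logits, hard_logits], dim=1)
      labels = torch.arange(q.size(0), device=q.device)
      return F.cross_entropy(logits / temperature, labels)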

Comparing Different OpenAI Models on Extracting Structured Information from PDF Documents

I was working on a problem where I needed to extract information from hotel tariff sheet PDF documents. These documents provide details on seasonal room rates, occupancy terms, and related supplements. They serve as standard reference material for travel agents, tour operators, and partners when contracting accommodations. Below is a screenshot of a synthetic document (similar to the original) that I created using ChatGPT.

For this use case I used the OpenAI Responses API. I tried extraction with the gpt-4.1-mini, gpt-4o, gpt-4o-mini, gpt-5-nano, and gpt-5-mini models.
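The setup looked roughly like this (a simplified sketch, not the exact code from the comparison; the file name, prompt, and model choice here are placeholders):

  from openai import OpenAI

  client = OpenAI()

  # Upload the tariff sheet PDF so the model can read it.
  uploaded = client.files.create(
      file=open("hotel_tariff_sheet.pdf", "rb"),
      purpose="user_data",
  )

  response = client.responses.create(
      model="gpt-4o-mini",  # swapped for each model in the comparison
      input=[
          {
              "role": "user",
              "content": [
                  {"type": "input_file", "file_id": uploaded.id},
                  {
                      "type": "input_text",
                      "text": "Extract the seasonal room rates, occupancy terms, "
                              "and supplements from this tariff sheet as JSON.",
                  },
              ],
          }
      ],
  )
  print(response.output_text)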

Continue reading “Comparing Different OpenAI Models on Extracting Structured Information from PDF Documents”

Notes on mini-swe-agent

I was going over the code base of mini-swe-agent today. The core agent loop is 100 lines long, and every agentic framework does something similar. Interesting facts about mini-swe-agent:

  • It only uses a bash tool
  • It does not depend on function calling; it parses the LLM response to extract the commands to run

mini-swe-agent operates in a continuous loop: it queries an LLM for the next action, executes the bash command it returns, and feeds the result back, repeating until the task is complete.
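In condensed form, the loop is something like this (my paraphrase of the idea, not the actual mini-swe-agent source; the llm callable and the fenced-bash convention are assumptions):

  import re
  import subprocess

  def extract_command(reply: str) -> str | None:
      # Assume the model returns its next action inside a ```bash ... ``` block.
      match = re.search(r"```bash\n(.*?)```", reply, re.DOTALL)
      return match.group(1).strip() if match else None

  def run_agent(llm, task: str, max_steps: int = 20) -> None:
      messages = [{"role": "user", "content": task}]
      for _ in range(max_steps):
          reply = llm(messages)  # any callable returning the model's text reply
          messages.append({"role": "assistant", "content": reply})
          command = extract_command(reply)
          if command is None:
              break  # nothing left to run; treat the task as done
          result = subprocess.run(command, shell=True, capture_output=True, text=True)
          messages.append({"role": "user", "content": result.stdout + result.stderr})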

Continue reading “Notes on mini-swe-agent”