Use OpenAI Code Interpreter to RAG over User Data


When building RAG systems, one common challenge is helping users query their own data. Users often come with a couple of Excel files, Word documents, or CSV files and want to ask questions like “Which department has the highest expenses?” or “What are the trends in our sales data?” Traditional RAG approaches struggle here because they’re designed for large, pre-processed knowledge bases, not for ad-hoc analysis of user-uploaded files.

I am a big fan of OpenAI’s Code Interpreter feature for solving exactly this problem. The code interpreter lets models write and run Python code in a sandboxed environment to solve tasks. It is available in all tiers of ChatGPT, so you might already have seen it in action. Last week, I used it to process a large Excel file (50 sheets) and extract structured JSON from it. It first generated the code, then executed the code against my Excel file, and then gave me a link to download the JSON. The best part is that you can iterate on the solution step by step over the course of the conversation.

If you have data in Excel, CSV, or any other structured format like JSON or XML, then you can use the code interpreter tool to ask questions about the data. For me, this is a better way to do RAG over user data when the data is not huge. Unlike traditional RAG that requires preprocessing, embedding, and vector storage, the Code Interpreter approach lets users directly upload their files and start querying immediately.

When you are using ChatGPT, it automatically figures out when to invoke the code interpreter tool. If you want to invoke it explicitly, mention “the python tool” in your prompt. You can look at the leaked system prompt of ChatGPT to see how it defines the Python tool.

You are ChatGPT, a large language model trained by OpenAI.

Knowledge cutoff: 2024-06
Current date: 2025-05-06
.....

# Tools

...

## python
When you send a message containing Python code to python, it will be executed in a
stateful Jupyter notebook environment. python will respond with the output of the execution or time out after 60.0
seconds. The drive at '/mnt/data' can be used to save and persist user files. Internet access for this session is disabled. Do not make external web requests or API calls as they will fail.
Use ace_tools.display_dataframe_to_user(name: str, dataframe: pandas.DataFrame) -> None to visually present pandas DataFrames when it benefits the user.
 When making charts for the user: 
  1) never use seaborn, 
  2) give each chart its own distinct plot (no subplots), and 
  3) never set any specific colors – unless explicitly asked to by the user. 

...

Let’s look at a simple example of using the Code Interpreter via ChatGPT. We can ask ChatGPT to list all the packages installed in the code interpreter environment with the following prompt.

list all the packages installed in python tool. Give the complete list as a text file. 

It will generate the following code.

import pkg_resources

# Get a list of all installed packages
installed_packages = sorted(["{}=={}".format(d.project_name, d.version) for d in pkg_resources.working_set])

# Save the list to a text file
file_path = "/mnt/data/installed_packages.txt"
with open(file_path, "w") as f:
    f.write("\n".join(installed_packages))

file_path
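Note that pkg_resources is deprecated in recent setuptools releases. If you want to reproduce the same listing locally, a sketch using the standard library’s importlib.metadata works just as well:

```python
from importlib.metadata import distributions

# Build "name==version" strings for every installed distribution,
# mirroring the pkg_resources output above
installed_packages = sorted(
    f"{dist.metadata['Name']}=={dist.version}" for dist in distributions()
)

# Print the first few entries as a sanity check
for pkg in installed_packages[:5]:
    print(pkg)
```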

The generated file shows that there are 369 packages installed. You can view the complete listing here https://gist.github.com/shekhargulati/73bcb33f0646792aa63e60669728a1e0.

The complete ChatGPT conversation is here https://chatgpt.com/share/684fbbef-93dc-800d-b876-521e5b3b3b19.

Why Code Interpreter Works Better for User Data

Traditional RAG systems require you to preprocess documents, create embeddings, and store them in vector databases. This works well for static knowledge bases but becomes cumbersome when users want to query their own dynamic data files. With Code Interpreter, users can simply upload their Excel files, CSVs, or JSON files and immediately start asking questions. The model can perform complex data analysis, aggregations, and visualizations on the fly—something that traditional RAG cannot do effectively.

Let’s use Code Interpreter Programmatically

Now, let’s see how we can use the Code Interpreter in our code.

You start by creating an OpenAI client:

from openai import OpenAI

client = OpenAI()

Next, you have to upload the file that you want to analyze or query. You can upload multiple files as well. I picked the Aviation Grievance dataset from data.gov https://www.data.gov.in/datasets_webservices/datasets/6622425. It is a CSV with 1,644 records.

file = client.files.create(
    file=open("data/aviation_grievance.csv", "rb"),
    purpose="assistants"
)
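If you upload several files, each call returns its own ID, and the container mounts every ID you list. Here is a sketch of the tools payload with hypothetical file IDs (the IDs below are placeholders, not real uploads):

```python
# Hypothetical IDs as returned by client.files.create(...).id
file_ids = ["file-abc123", "file-def456"]

# The code_interpreter tool accepts all file IDs the container should mount
tools = [
    {
        "type": "code_interpreter",
        "container": {"type": "auto", "file_ids": file_ids},
    }
]
print(tools)
```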

Once we have uploaded the file, we can start asking queries. We will ask the following query:

query = "which airline has the most checkin and delay related issues?"

response = client.responses.create(
    model="o4-mini",
    input=query,
    tools=[
        {
            "type": "code_interpreter",
            "container": {"type": "auto", "file_ids": [file.id]}
        }
    ],
)

We used the OpenAI o4-mini model, passing it the code-interpreter tool. We also had to pass the file IDs that this container will have access to. When the container is instantiated, the files are mounted.

The above query will give the following response.output_text:

Analyzing the "Airline" grievances by issue type (using the "totalReceived" counts):

* Check-in & Boarding complaints
   – IndiGo: 2,783
   – Air India: 1,895
   – SpiceJet: 868
   – Vistara: 587
   – GoAir: 552

* Flight-delay complaints
   – IndiGo: 3,720
   – SpiceJet: 3,634
   – Air India: 3,176
   – Air India Express: 1,272
   – GoAir: 1,247

So IndiGo tops the list for both check-in/boarding issues and delay-related issues.

Along with the answer, you also have access to the code the model generated to reach it. The model does not generate the code in one go; it works incrementally. You can inspect response.output to see the complete sequence: the model first reasons and produces a code snippet, the snippet is executed in the container, and the execution output informs the next snippet.
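As a sketch, you can filter the executed snippets out of response.output by item type. The items below are illustrative dicts in the same shape as the entries shown in this post (the real SDK returns typed objects with matching fields, and the code strings here are made up for illustration):

```python
# Illustrative items mirroring the shape of response.output entries
output_items = [
    {"type": "reasoning", "id": "rs_1"},
    {"type": "code_interpreter_call", "id": "ci_1",
     "code": "import pandas as pd\ndf = pd.read_csv('/mnt/data/aviation_grievance.csv')"},
    {"type": "reasoning", "id": "rs_2"},
    {"type": "code_interpreter_call", "id": "ci_2",
     "code": "df.head()"},
]

# Collect every code snippet the model executed, in order
code_snippets = [
    item["code"] for item in output_items
    if item["type"] == "code_interpreter_call"
]

for i, snippet in enumerate(code_snippets, 1):
    print(f"--- snippet {i} ---")
    print(snippet)
```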

So, it starts with reasoning and generates code:

{
    "id": "rs_684fd2dd8b10819c9058de38b62ba25f0d741e9e64a6f67d",
    "summary": [],
    "type": "reasoning",
    "encrypted_content": null,
    "status": null
},

The code is passed to the code interpreter to execute. The code below tries to understand the structure of the file by reading the first few records:

{
    "id": "ci_684fd2e0433c819ca97e5f563ba2aa650d741e9e64a6f67d",
    "code": "import pandas as pd\n\ndf = pd.read_csv('/mnt/data/aviation_grievance.csv')\ndf.head(), df.columns, len(df)",
    "results": null,
    "status": "completed",
    "type": "code_interpreter_call",
    "container_id": "cntr_684fd2dbf120819180130d4b895086e705646a165004624e",
    "outputs": null
},

Then, it again reasons based on the container output and generates new code. This is a simple example of an agentic application—agents are models using tools in a loop.
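That loop can be sketched in a few lines. The model() and sandbox() functions below are stand-ins I made up for the LLM and the stateful container, not real API calls:

```python
# A minimal sketch of the agent loop behind the code interpreter:
# the model proposes code, a sandbox executes it, and the output
# feeds the next model call until the model answers.

def model(history):
    # Stand-in for the LLM: propose the next code step, or stop
    if not history:
        return {"action": "run", "code": "df = load('/mnt/data/data.csv')"}
    if len(history) == 1:
        return {"action": "run", "code": "df.groupby('airline').size()"}
    return {"action": "answer", "text": "IndiGo tops the list."}

def sandbox(code):
    # Stand-in for the stateful Jupyter container
    return f"executed: {code}"

history = []
while True:
    step = model(history)
    if step["action"] == "answer":
        final_answer = step["text"]
        break
    history.append({"code": step["code"], "output": sandbox(step["code"])})

print(final_answer)
print(len(history), "tool calls")
```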

Limitations of OpenAI Code Interpreter

The OpenAI Code Interpreter tool has some limitations:

  • Code execution times out after 60 seconds
  • It cannot make external web requests or API calls
  • Containers expire after 20 minutes of inactivity, so treat them as ephemeral
  • It only supports the file types listed here – https://platform.openai.com/docs/guides/tools-code-interpreter#supported-files
  • You can only make 100 requests per minute (RPM) per organization
  • Maximum file upload size is 512 MB per file
  • OpenAI has not specified the RAM or compute available to each container, but users have reported that large data loads can be sluggish or fail
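Given the 512 MB per-file cap, a pre-upload size check can save a failed request. Here is a small helper sketch (the limit constant comes from the documented cap; the guard itself is my own):

```python
import os

MAX_UPLOAD_BYTES = 512 * 1024 * 1024  # documented 512 MB per-file limit

def check_uploadable(path: str) -> int:
    """Return the file size in bytes, raising if it exceeds the upload cap."""
    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(
            f"{path} is {size} bytes, over the {MAX_UPLOAD_BYTES}-byte limit; "
            "consider splitting or sampling the data before upload"
        )
    return size

# Demo with a small sample file
with open("sample.csv", "w") as f:
    f.write("airline,totalReceived\nIndiGo,2783\n")
print(check_uploadable("sample.csv"), "bytes, ok to upload")
```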
