When building RAG systems, one common challenge is helping users query their own data. Users often come with a couple of Excel files, Word documents, or CSV files and want to ask questions like “Which department has the highest expenses?” or “What are the trends in our sales data?” Traditional RAG approaches struggle here because they’re designed for large, pre-processed knowledge bases, not for ad-hoc analysis of user-uploaded files.
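Questions like these are computations over every row, not lookups. Top-k retrieval hands the model a few chunks, but a total per department needs all of them. The sketch below shows the kind of short script that answers the first question directly. It assumes a CSV with hypothetical `department` and `amount` columns, and the sample values are made up for illustration:

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; a real upload would be read from the user's file.
SAMPLE_CSV = """department,amount
Engineering,1200.50
Marketing,800.00
Engineering,300.25
Sales,950.00
Marketing,400.00
"""


def highest_expense_department(csv_text: str) -> tuple[str, float]:
    """Sum the 'amount' column per 'department' and return the largest total."""
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["department"]] += float(row["amount"])
    # Pick the department whose summed amount is largest.
    dept, total = max(totals.items(), key=lambda item: item[1])
    return dept, total


if __name__ == "__main__":
    dept, total = highest_expense_department(SAMPLE_CSV)
    print(f"{dept}: {total:.2f}")  # prints "Engineering: 1500.75"
```

The answer depends on every Engineering row being summed, which is exactly what a retriever returning a handful of chunks cannot guarantee.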
I am a big fan of OpenAI’s Code Interpreter feature for solving exactly this problem. Code Interpreter lets the model write Python code and run it in a sandboxed environment to complete a task. It is available in all tiers of ChatGPT, so you may have already seen it in action. Last week, I used it to process a large Excel file with 50 sheets and extract structured JSON from it. It first generated the code, then ran that code against my Excel file, and finally gave me a link to download the JSON. The best part is that you can refine the solution step by step over the course of the conversation.
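The generate-then-execute loop can be approximated locally to see what happens between "write the code" and "return the result". This is a minimal sketch, not how OpenAI implements it: `run_generated_code` is a hypothetical helper, the code string stands in for what the model would generate, and a plain subprocess is **not** a security sandbox (the child process has the same permissions as the caller). With the real tool, the code runs in a container that OpenAI manages.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_generated_code(code: str, workdir: str, timeout_s: float = 10.0) -> str:
    """Run a code string in a fresh Python process inside workdir; return its stdout.

    Hypothetical helper. NOT a security sandbox: the child process has the
    same filesystem and network access as the caller.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        cwd=workdir,          # the "uploaded" files live here
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,
        timeout=timeout_s,    # stop runaway scripts
    )
    if result.returncode != 0:
        # Hand the traceback back so the model (or user) can revise the code.
        return "ERROR:\n" + result.stderr
    return result.stdout


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Simulate a user upload by writing a small CSV into the working directory.
        Path(tmp, "expenses.csv").write_text("department,amount\nSales,10\nOps,25\n")
        # Stand-in for model-generated code.
        generated = (
            "import csv\n"
            "rows = list(csv.DictReader(open('expenses.csv')))\n"
            "print(max(rows, key=lambda r: float(r['amount']))['department'])\n"
        )
        print(run_generated_code(generated, tmp))  # prints "Ops"
```

Returning stderr on failure is what makes step-by-step iteration possible: the error text becomes input for the next attempt at the code.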
If you have data in Excel, CSV, or another structured format such as JSON or XML, you can use the Code Interpreter tool to ask questions about it. In my view, this is a better way to do RAG over user data when the data is relatively small. Unlike traditional RAG, which requires preprocessing, embedding, and vector storage, the Code Interpreter approach lets users upload their files directly and start querying immediately.