I have spent the last few months working on regulatory intelligence software. One of its most important features is extracting obligations from dense PDF documents. In this post I share some of the lessons we’ve learned about architecting AI systems that work in production.
#1. Break complex tasks: List First, Analyze Later
One of our biggest breakthroughs came from realizing that obligation extraction isn’t a single-step process. Initially, we tried to extract complete, structured obligations in one pass, but this led to inconsistent results and missed obligations.
Our solution? A two-step approach that mirrors how human analysts work:
Step 1: Obligation Identification – Cast a wide net to find all potential obligation statements using trigger phrases like “shall”, “must”, “should”, and “is required to”. This agent prioritizes completeness over precision, ensuring we don’t miss anything.
async def identify_obligations(section_text):
    prompt = """
    Extract all obligation statements from this text.
    Look for trigger phrases: shall, must, should, is required to
    Return only the obligation statements as a list.
    """
    return await identification_agent.run(prompt + section_text)
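A cheap way to sanity-check the identification agent's recall is to compare its output against a plain keyword scan over the same trigger phrases. The sketch below is an illustration, not part of our pipeline: the regex approach, the naive sentence splitting, and the names `TRIGGER_PATTERN` and `find_candidate_sentences` are all assumptions.

```python
import re

# Trigger phrases from the post; matching them with a regex is an assumption.
TRIGGER_PATTERN = re.compile(
    r"\b(shall|must|should|is required to)\b", re.IGNORECASE
)


def find_candidate_sentences(section_text: str) -> list[str]:
    """Return sentences that contain at least one trigger phrase."""
    # Naive split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", section_text.strip())
    return [s for s in sentences if TRIGGER_PATTERN.search(s)]


sample = (
    "The firm shall notify the regulator within 30 days. "
    "This section provides background. "
    "Records must be retained for five years."
)
candidates = find_candidate_sentences(sample)
```

Any sentence this scan flags but the agent does not return is worth a second look. The reverse does not hold, since obligations are not always phrased with a trigger word.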
Step 2: Detailed Analysis – Take each identified obligation and extract structured information: who is obligated, what they must do, under what conditions, and whether it’s a general requirement or regulatory power.
async def analyze_obligation(obligation_text, context):
    prompt = """
    Analyze this obligation and extract:
    - obligated_party: Who must comply
    - action: What they must do
    - conditions: When/how it applies
    - is_general_requirement: Boolean
    - is_regulatory_power: Boolean
    """
    return await analysis_agent.run(prompt, obligation_text, context)
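To make the output of the analysis step easier to validate, the structured fields can be captured in a small typed record. This is a minimal sketch that assumes the agent replies with a JSON object. The fields follow the questions the analysis step asks (who, what, under what conditions, which kind), while the names `ObligationAnalysis`, `parse_analysis` and `action` are hypothetical.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class ObligationAnalysis:
    obligated_party: str          # who must comply
    action: str                   # what they must do (hypothetical field name)
    conditions: str               # when/how it applies
    is_general_requirement: bool
    is_regulatory_power: bool


def parse_analysis(raw_response: str) -> ObligationAnalysis:
    """Parse a JSON reply into an ObligationAnalysis, failing loudly on gaps."""
    data = json.loads(raw_response)
    expected = [f.name for f in fields(ObligationAnalysis)]
    missing = [name for name in expected if name not in data]
    if missing:
        raise ValueError(f"analysis reply is missing fields: {missing}")
    # Ignore any extra keys the model may have added.
    return ObligationAnalysis(**{name: data[name] for name in expected})


# Hand-written example reply for illustration, not real model output.
example_reply = json.dumps({
    "obligated_party": "the firm",
    "action": "notify the regulator",
    "conditions": "within 30 days of a breach",
    "is_general_requirement": True,
    "is_regulatory_power": False,
})
analysis = parse_analysis(example_reply)
```

Rejecting incomplete replies at this boundary keeps malformed analyses from flowing silently into downstream steps.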
This separation of concerns dramatically improved our recall rate. The identification agent can focus purely on finding obligations without getting bogged down in complex structuring tasks.
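Wiring the two steps together might look like the sketch below. To keep it runnable without a model, both agents are replaced by trivial stand-ins. The orchestration function `extract_obligations` and the use of `asyncio.gather` to analyze statements concurrently are assumptions for illustration, not a description of our production code.

```python
import asyncio


async def identify_obligations(section_text: str) -> list[str]:
    # Stand-in for the identification agent: keep lines with a trigger word.
    return [
        line.strip()
        for line in section_text.splitlines()
        if "shall" in line or "must" in line
    ]


async def analyze_obligation(obligation_text: str, context: str) -> dict:
    # Stand-in for the analysis agent: echo the statement and context size.
    return {"obligation": obligation_text, "context_chars": len(context)}


async def extract_obligations(section_text: str) -> list[dict]:
    """Step 1 lists the statements; step 2 analyzes each one concurrently."""
    statements = await identify_obligations(section_text)
    # gather() returns results in the same order as the statements.
    return await asyncio.gather(
        *(analyze_obligation(s, section_text) for s in statements)
    )


section = (
    "The firm shall file a report.\n"
    "This paragraph is background.\n"
    "Records must be kept for five years."
)
results = asyncio.run(extract_obligations(section))
```

Because each statement is analyzed independently, a failure or odd result in step 2 can be retried for that one statement without rerunning identification.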