How executives select GenAI vendors

I was reading the State of AI in Business 2025 report today, and the excerpt below resonated with me. As someone building enterprise Generative AI products, I found it useful.

Memory is becoming critical to succeeding with Generative AI products. People expect these products to get smarter over time. I have listened to multiple talks this year from folks at Microsoft and various AI labs, and they are all talking about memory.

Paper: Working with AI: Measuring the Occupational Implications of Generative AI

Today I was going over a paper by the Microsoft Research team on how AI is impacting professional work. The paper was published in July 2025. The authors analyzed 200k anonymized and privacy-scrubbed conversations between users and Microsoft Bing Copilot to understand how generative AI affects different occupations and work activities.

They separated the analysis into two distinct perspectives:

  • User Goals: What people are trying to accomplish with AI assistance
  • AI Actions: What work activities the AI actually performs

They used the O*NET database’s 332 Intermediate Work Activities (IWAs) as the basis of their classification. One of the surprising findings of the paper is that in 40% of conversations, user goals and AI actions were completely different: the AI often acts as a coach or advisor rather than directly performing the user’s task.
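
To make the dual-perspective classification concrete, here is a minimal sketch of how a single conversation might be labeled along both axes. The prompt wording and the generic llm.complete call are my assumptions, not the paper’s actual pipeline.

import json

async def classify_conversation(conversation, llm):
    # Label the same conversation twice: once for what the user wanted
    # (goal) and once for what the AI actually did (action), each mapped
    # to one of O*NET's 332 Intermediate Work Activities.
    prompt = (
        "Classify this conversation against O*NET Intermediate Work Activities.\n"
        'Return JSON: {"user_goal_iwa": "...", "ai_action_iwa": "..."}\n\n'
        + conversation
    )
    # llm.complete is a placeholder for whatever completion call you use.
    return json.loads(await llm.complete(prompt))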

They also list the occupations with the highest AI applicability, such as translators, sales representatives, customer service representatives, and writers.

According to their study, AI currently augments human work rather than fully automating it. Most occupations have some AI applicability, but none are fully automated. They also note that the impact is uneven: some work activities are highly affected, others not at all. Even successful AI assistance typically covers only a moderate portion of an occupation’s work activities.

Continue reading “Paper: Working with AI: Measuring the Occupational Implications of Generative AI”

I Tested Gemma 3 270M on the Simplest NLP Task

Google recently released Gemma 3 270M, a remarkably compact 270 million parameter language model that promises efficient AI capabilities in a tiny package. As someone building AI voice agents, I was immediately interested in testing whether this model could handle one of my simplest but frequent use cases: generating message variations for conversational AI.

For example, given a message like “Please wait. I am checking if your username exists in the system,” I want the LLM to generate semantically equivalent variations such as “One moment please while I verify your username in our system.” This is a lightweight task that models like GPT-4.1-mini, Claude Haiku, or Gemini Flash handle well, but they still add latency. To minimize this, I’m considering using the Gemma 270M model in a sidecar to eliminate unnecessary network delays.

The Gemma 3 270M represents Google’s “right tool for the job” philosophy—a model designed specifically for fine-tuning rather than general-purpose use. According to Google’s release:

“Its true power is unlocked through fine-tuning. Once specialized, it can execute tasks like text classification and data extraction with remarkable accuracy, speed, and cost-effectiveness.”

What makes this model particularly interesting from a technical perspective is its parameter allocation: approximately 170M parameters are dedicated to embeddings, with only 100M for the transformer layers. This unusual split reflects Google’s strategy to maintain a large vocabulary while keeping the model compact—a design choice that facilitates adaptation to different languages and domains through fine-tuning.

The model is available in GGUF format and can run efficiently on CPU, making it accessible for edge deployment scenarios where larger models would be prohibitive.
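
As a quick illustration of what the sidecar call could look like, here is a minimal sketch using llama-cpp-python to load a local GGUF build and request one variation. The model filename and prompt wording are my assumptions, not part of Google’s release.

from llama_cpp import Llama

# Path to a locally downloaded GGUF build of Gemma 3 270M (filename is an
# assumption; point it at whatever quantization you actually pulled).
llm = Llama(model_path="./gemma-3-270m-it.gguf", n_ctx=2048, verbose=False)

message = "Please wait. I am checking if your username exists in the system."

response = llm.create_chat_completion(
    messages=[{
        "role": "user",
        "content": f"Rewrite this in different words, keeping the exact meaning: {message}",
    }],
    max_tokens=64,
    temperature=0.7,
)

print(response["choices"][0]["message"]["content"])

Running everything in-process on CPU removes the network round trip entirely, which is the whole point of the sidecar approach.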

Continue reading “I Tested Gemma 3 270M on the Simplest NLP Task”

Making coderunner-ui work with Docker using Claude Code

Today, I was browsing Hacker News when I stumbled upon an interesting project: coderunner-ui. The premise was compelling – a local-first AI workspace that lets you chat with LLMs and execute generated code in isolated environments, all without sending your data to the cloud. As someone who’s always looking for tools that respect privacy while providing powerful capabilities, this caught my attention immediately.

I cloned the repository, excited to try it out. Then I hit a wall: “Requires macOS on Apple Silicon.”

I use an Intel Mac, and the Apple container system that coderunner-ui depends on is only available on Apple Silicon Macs. I have spent considerable time over the last few weeks solving a similar problem, so I decided to dig deeper.

Continue reading “Making coderunner-ui work with Docker using Claude Code”

Extracting obligations from regulatory text

I have spent the last few months working on regulatory intelligence software. One of its important features is extracting obligations from dense PDF documents. In this post I am sharing some of the lessons we’ve learned about architecting AI systems that work in production.

#1. Break complex tasks: List First, Analyze Later

One of our biggest breakthroughs came from realizing that obligation extraction isn’t a single-step process. Initially, we tried to extract complete, structured obligations in one pass, but this led to inconsistent results and missed obligations.

Our solution? A two-step approach that mirrors how human analysts work:

Step 1: Obligation Identification – Cast a wide net to find all potential obligation statements using trigger phrases like “shall”, “must”, “should”, and “is required to”. This agent prioritizes completeness over precision, ensuring we don’t miss anything.

async def identify_obligations(section_text):
    # Step 1 agent: favor recall over precision so nothing is missed.
    prompt = """
    Extract all obligation statements from this text.
    Look for trigger phrases: shall, must, should, is required to
    Return only the obligation statements as a list.
    """
    # identification_agent is our LLM agent configured for this step.
    return await identification_agent.run(prompt + section_text)

Step 2: Detailed Analysis – Take each identified obligation and extract structured information: who is obligated, what they must do, under what conditions, and whether it’s a general requirement or regulatory power.

async def analyze_obligation(obligation_text, context):
    # Step 2 agent: turn one identified statement into structured fields.
    prompt = """
    Analyze this obligation and extract:
    - obligated_party: Who must comply
    - action: What they must do
    - conditions: When/how it applies
    - is_general_requirement: Boolean
    - is_regulatory_power: Boolean
    """
    # analysis_agent receives the obligation plus the surrounding section
    # text as context.
    return await analysis_agent.run(prompt, obligation_text, context)

This separation of concerns dramatically improved our recall rate. The identification agent can focus purely on finding obligations without getting bogged down in complex structuring tasks.
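
Putting the two steps together, the full pipeline is just a fan-out over the identified statements. A minimal sketch, assuming identify_obligations returns a list of obligation strings:

import asyncio

async def extract_obligations(section_text):
    # Step 1: cast a wide net for candidate obligation statements.
    candidates = await identify_obligations(section_text)
    # Step 2: analyze each candidate concurrently, passing the section
    # text along as context for the structured extraction.
    return await asyncio.gather(
        *(analyze_obligation(c, section_text) for c in candidates)
    )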

Continue reading “Extracting obligations from regulatory text”