Building Production-Ready Voice Agents

I spent the second half of 2025 building a voice agent platform for a company that provides IT support services to higher education institutions. The platform is now live and handling calls from students and staff across multiple universities.

We built a multi-tenant system where students call a phone number, speak with an AI agent, and get help with common IT tasks. The three primary use cases we’ve deployed are:

  1. Password resets — verifying identity and generating new credentials
  2. FAQ responses — answering common questions about IT services
  3. Front desk routing — transferring calls to the appropriate department or staff member

The platform is extensible, allowing us to add new use cases with minimal changes to the core system.

Tech Stack

  • Backend: Python 3.x, FastAPI, Pipecat, Pipecat Flows
  • Speech-to-Text: Deepgram
  • LLM: OpenAI GPT-4.1 and GPT-4.1-mini
  • Text-to-Speech: Cartesia
  • Telephony: Twilio
  • Database: PostgreSQL

This stack worked well for our team of three developers and let us build and iterate quickly.

Architecture Overview

Incoming calls arrive through Twilio, which streams the call audio over a WebSocket (browser-based testing uses WebRTC instead). The WebSocket handler creates a processing pipeline that orchestrates the speech-to-text, LLM, and text-to-speech services. The Flow Manager maintains conversation state and determines which prompts and functions are available at each step.
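
To make the Flow Manager's role concrete, here is a minimal, library-free sketch of the idea: each conversation step carries its own prompt and its own set of callable functions, and calling a function moves the conversation to the next step. This is not Pipecat Flows' API. The `Node` and `FlowManager` classes, the node names, and the function names are all hypothetical and chosen only to mirror the use cases described earlier.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One conversation step: its prompt and the functions the LLM may call."""
    prompt: str
    # Maps a function name to the node the conversation moves to when it is called.
    functions: dict[str, str] = field(default_factory=dict)


class FlowManager:
    """Tracks the current node and exposes only that node's functions."""

    def __init__(self, nodes: dict[str, Node], start: str):
        self.nodes = nodes
        self.current = start

    def available_functions(self) -> list[str]:
        return sorted(self.nodes[self.current].functions)

    def call(self, name: str) -> str:
        """Apply a function call and return the prompt of the new node."""
        node = self.nodes[self.current]
        if name not in node.functions:
            raise ValueError(f"{name!r} is not available in node {self.current!r}")
        self.current = node.functions[name]
        return self.nodes[self.current].prompt


# Hypothetical flow, loosely modeled on the password reset and routing use cases.
NODES = {
    "greeting": Node(
        prompt="Greet the caller and ask what they need help with.",
        functions={"start_password_reset": "verify_identity", "route_call": "transfer"},
    ),
    "verify_identity": Node(
        prompt="Ask for the details needed to verify the caller's identity.",
        functions={"identity_verified": "reset_password", "route_call": "transfer"},
    ),
    "reset_password": Node(prompt="Tell the caller their password has been reset."),
    "transfer": Node(prompt="Tell the caller you are transferring them."),
}

flow = FlowManager(NODES, start="greeting")
print(flow.available_functions())   # functions offered during the greeting
flow.call("start_password_reset")   # the LLM picks a function; state advances
print(flow.available_functions())   # a different set is offered while verifying
```

The point of the design is that the LLM never sees functions that do not belong to the current step, so a caller cannot reach the password reset without first passing through identity verification.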


In this post, I’ll share the principles, best practices, and lessons learned from building this system. While our examples use Pipecat, the concepts apply regardless of your tech stack.

