Build a real-time AI voice agent using only Vobiz XML WebSocket streaming: no LiveKit, no Pipecat, no third-party SDK.
View on GitHub: clone and run the full working example.
Overview
This example shows the lowest-level integration possible with Vobiz: raw WebSocket audio frames, manual VAD, direct STT/LLM/TTS API calls, and base64 audio encoding back to Vobiz. Use this when you need maximum control and minimum latency, with no intermediary layers.
Architecture
How it works
XML routing
When an inbound call hits your FastAPI webhook, respond with Vobiz XML that instructs the platform to open a bidirectional WebSocket to your server.
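A minimal sketch of that webhook, assuming a hypothetical `<Stream>` element and `/inbound` path (check the Vobiz XML reference for the exact verb and attribute names):

```python
# Answer an inbound-call webhook with XML that tells Vobiz to open a
# bidirectional WebSocket to this server. The <Stream> element and the
# /inbound path are illustrative assumptions, not the confirmed Vobiz schema.
from fastapi import FastAPI
from fastapi.responses import Response

app = FastAPI()

VOBIZ_XML = """<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Stream bidirectional="true">wss://your-server.example.com/media</Stream>
</Response>"""

@app.post("/inbound")
async def inbound_call() -> Response:
    return Response(content=VOBIZ_XML, media_type="application/xml")
```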
Audio frame parsing
Vobiz sends JSON frames containing base64-encoded G.711 μ-law audio. Decode these frames into raw byte streams.
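A sketch of the decoding step; the frame field names (`event`, `media`, `payload`) are assumptions modeled on common telephony media-streaming schemas, so adjust them to the actual Vobiz frame format:

```python
# Decode one Vobiz WebSocket message into raw audio bytes.
import base64
import json

def decode_frame(message: str) -> bytes | None:
    """Return raw G.711 mu-law bytes, or None for non-media control frames."""
    frame = json.loads(message)
    if frame.get("event") != "media":
        return None  # e.g. start/stop/mark events
    mulaw = base64.b64decode(frame["media"]["payload"])
    # For manual VAD, convert to 16-bit linear PCM first, e.g. with
    # audioop.ulaw2lin(mulaw, 2) (stdlib audioop, removed in Python 3.13).
    return mulaw
```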
Streaming STT
Forward raw audio bytes to Deepgram’s streaming WebSocket for real-time transcription. As words are recognized, stream them to the LLM.
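A sketch of the STT leg using the `websockets` library. The endpoint and query parameters follow Deepgram's documented live API; `audio_queue` and `on_transcript` are hypothetical plumbing you supply:

```python
# Forward raw mu-law bytes to Deepgram's live transcription WebSocket and
# surface finalized transcripts as they arrive.
import asyncio
import json
import os

import websockets  # pip install websockets

DEEPGRAM_URL = (
    "wss://api.deepgram.com/v1/listen"
    "?encoding=mulaw&sample_rate=8000&channels=1&interim_results=true"
)

async def run_stt(audio_queue: asyncio.Queue, on_transcript) -> None:
    headers = {"Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}"}
    # websockets <= 12 uses extra_headers; newer releases call it additional_headers.
    async with websockets.connect(DEEPGRAM_URL, extra_headers=headers) as dg:

        async def sender() -> None:
            # A falsy sentinel (None) on the queue ends the stream.
            while chunk := await audio_queue.get():
                await dg.send(chunk)

        async def receiver() -> None:
            async for message in dg:
                result = json.loads(message)
                if result.get("type") != "Results":
                    continue  # skip metadata frames
                alt = result["channel"]["alternatives"][0]
                if result.get("is_final") and alt["transcript"]:
                    await on_transcript(alt["transcript"])

        await asyncio.gather(sender(), receiver())
```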
LLM response
Send the transcription to OpenAI's Chat Completions API. Response tokens stream back as they are generated.
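A sketch of the streaming call, assuming the official `openai` Python SDK (v1+); the model name and the `on_token` callback are placeholders:

```python
# Stream an LLM reply token-by-token so the TTS stage can start speaking
# before the full response exists.
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def stream_reply(transcript: str, on_token) -> None:
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": "You are a concise voice assistant."},
            {"role": "user", "content": transcript},
        ],
        stream=True,
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            await on_token(delta)
```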
Vobiz XML hook
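This is the payload the webhook from the routing step returns. The `<Stream>` element and its attribute are assumptions for illustration and may not match the real Vobiz schema:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <!-- Assumed verb: instructs Vobiz to open a bidirectional WebSocket
       to your media endpoint and stream call audio over it. -->
  <Stream bidirectional="true">wss://your-server.example.com/media</Stream>
</Response>
```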
When to use this
| Use case | Recommendation |
|---|---|
| Maximum control over latency | ✅ This example |
| Rapid prototyping | Use LiveKit or Pipecat examples |
| Custom audio processing | ✅ This example |
| Production-ready pipeline | Use LiveKit or Pipecat examples |
Environment variables
.env
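A sketch of the expected variables: the Deepgram and OpenAI key names are the providers' conventional ones, while the Vobiz entries are hypothetical placeholders.

```bash
# Required by the STT and LLM stages above.
DEEPGRAM_API_KEY=...
OPENAI_API_KEY=...
# Hypothetical names - check your Vobiz dashboard for the real ones.
VOBIZ_AUTH_ID=...
VOBIZ_AUTH_TOKEN=...
```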