
Build a real-time AI voice agent using only Vobiz XML WebSocket streaming - no LiveKit, no Pipecat, no third-party SDK.


Clone and run the full working example

Getting started

git clone https://github.com/vobiz-ai/Vobiz-All-XML.git
cd Vobiz-All-XML
pip install -r requirements.txt
python server.py

Overview

This example shows the lowest-level integration possible with Vobiz - raw WebSocket audio frames, manual VAD, direct STT/LLM/TTS API calls, and base64 audio encoding back to Vobiz. Use this when you need maximum control and minimum latency with no intermediary layers.

Architecture

Caller → Vobiz SIP
  → XML: <Response><Connect><Stream url="wss://your-server/ws"/></Connect></Response>
  → FastAPI WebSocket endpoint
  → JSON event parsing → base64 decode → G.711 μ-law bytes
  → Deepgram streaming STT WebSocket (speech → text)
  → OpenAI ChatCompletions (text → response tokens)
  → ElevenLabs / OpenAI TTS (tokens → audio bytes)
  → base64 encode → JSON → WebSocket → Vobiz → Caller

How it works

1. XML routing

When an inbound call hits your FastAPI webhook, respond with Vobiz XML instructing Vobiz to open a bidirectional WebSocket to your server.
2. Audio frame parsing

Vobiz sends JSON frames containing base64-encoded G.711 μ-law audio. Decode these frames into raw byte streams.
3. Streaming STT

Forward raw audio bytes to Deepgram’s streaming WebSocket for real-time transcription. As words are recognized, stream them to the LLM.
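A sketch of the Deepgram leg, assuming the `websockets` package (its header keyword is `extra_headers` in older releases and `additional_headers` in newer ones, so check your version). The `encoding=mulaw&sample_rate=8000` query parameters tell Deepgram the raw format Vobiz delivers:

```python
import asyncio
import json
import os
import urllib.parse

DEEPGRAM_WS = "wss://api.deepgram.com/v1/listen"

def deepgram_url(encoding: str = "mulaw", sample_rate: int = 8000) -> str:
    """Build the streaming-listen URL for raw G.711 mu-law telephone audio."""
    params = {"encoding": encoding, "sample_rate": sample_rate,
              "interim_results": "true"}
    return f"{DEEPGRAM_WS}?{urllib.parse.urlencode(params)}"

async def transcribe(audio_chunks, on_transcript) -> None:
    """Pump mu-law chunks into Deepgram's socket; emit transcripts as they land."""
    import websockets  # third-party: pip install websockets

    headers = {"Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}"}
    async with websockets.connect(deepgram_url(), extra_headers=headers) as dg:
        async def pump() -> None:
            async for chunk in audio_chunks:
                await dg.send(chunk)
            await dg.send(json.dumps({"type": "CloseStream"}))

        pump_task = asyncio.create_task(pump())
        async for message in dg:
            result = json.loads(message)
            alternatives = result.get("channel", {}).get("alternatives", [])
            if alternatives and alternatives[0].get("transcript"):
                on_transcript(alternatives[0]["transcript"],
                              result.get("is_final", False))
        await pump_task
```

Acting only on `is_final` results is the simplest policy; using interim results for earlier LLM kick-off cuts latency further but requires handling retractions.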
4. LLM response

Send the transcription to OpenAI’s ChatCompletions API. Response tokens stream back as they are generated.
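A sketch of the LLM leg using the official `openai` package's streaming mode; the model name and system prompt here are placeholders, not values from this example:

```python
SYSTEM_PROMPT = "You are a concise voice assistant. Answer in one or two short sentences."

def build_messages(history: list[dict], user_text: str) -> list[dict]:
    """Assemble the ChatCompletions message list from prior turns plus the new transcript."""
    return [{"role": "system", "content": SYSTEM_PROMPT},
            *history,
            {"role": "user", "content": user_text}]

def stream_reply(history: list[dict], user_text: str):
    """Yield response tokens as the model generates them."""
    from openai import OpenAI  # third-party: pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    stream = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; use whatever you have access to
        messages=build_messages(history, user_text),
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta
```

Feeding tokens to TTS as they arrive, rather than waiting for the full reply, is what makes the agent feel responsive.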
5. TTS and playback

Synthesize tokens using a TTS engine (ElevenLabs or OpenAI). Base64-encode the resulting audio and send it back over the WebSocket to Vobiz, which plays it to the caller.
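The return trip can be sketched as below. The outbound frame shape mirrors the inbound one, which is an assumption rather than a documented Vobiz schema; the 160-byte chunking corresponds to 20 ms of 8 kHz μ-law audio, a common telephony frame size:

```python
import base64
import json

def encode_media_frame(mulaw_audio: bytes) -> str:
    """Wrap synthesized mu-law audio in a JSON frame for the WebSocket back to Vobiz."""
    payload = base64.b64encode(mulaw_audio).decode("ascii")
    return json.dumps({"event": "media", "media": {"payload": payload}})

def frames_20ms(mulaw_audio: bytes, frame_size: int = 160):
    """Split audio into 20 ms chunks (160 bytes at 8 kHz, 1 byte/sample)."""
    for i in range(0, len(mulaw_audio), frame_size):
        yield mulaw_audio[i:i + frame_size]
```

Make sure the TTS output is actually 8 kHz μ-law before encoding; most engines default to higher-rate PCM or MP3, so request a telephony format or resample first.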

Vobiz XML hook

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="wss://your-server.com/ws" />
  </Connect>
</Response>
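If you prefer to build this response programmatically rather than hard-code the string, a small stdlib helper works; `build_stream_xml` is a hypothetical name, not part of this example:

```python
import xml.etree.ElementTree as ET

def build_stream_xml(ws_url: str) -> str:
    """Render the <Response><Connect><Stream/></Connect></Response> hook."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=ws_url)
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Return this string from your webhook with a `Content-Type` of `application/xml` so Vobiz parses it as the call-control document.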

When to use this

Use case                          | Recommendation
Maximum control / minimum latency | ✅ This example
Rapid prototyping                 | Use the LiveKit or Pipecat examples
Custom audio processing           | ✅ This example
Production-ready pipeline         | Use the LiveKit or Pipecat examples

Environment variables

.env
DEEPGRAM_API_KEY=your-deepgram-key
OPENAI_API_KEY=sk-...
ELEVENLABS_API_KEY=your-elevenlabs-key
HTTP_PORT=8000
PUBLIC_URL=https://your-server.com
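A fail-fast loader for these variables is a small safeguard worth adding at startup; `load_config` is a hypothetical helper, not part of the repository:

```python
import os

REQUIRED_KEYS = ("DEEPGRAM_API_KEY", "OPENAI_API_KEY", "ELEVENLABS_API_KEY")

def load_config() -> dict:
    """Raise at startup if any required API key is missing, with sane defaults otherwise."""
    missing = [key for key in REQUIRED_KEYS if not os.environ.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "http_port": int(os.environ.get("HTTP_PORT", "8000")),
        "public_url": os.environ.get("PUBLIC_URL", "http://localhost:8000"),
    }
```

Failing here beats discovering a missing key mid-call, when the STT or TTS socket silently refuses to authenticate.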