Build a real-time AI voice agent using only Vobiz XML WebSocket streaming: no LiveKit, no Pipecat, no third-party SDK.
View on GitHub: clone and run the full working example.
Overview
This example shows the lowest-level integration possible with Vobiz: raw WebSocket audio frames, manual VAD, direct STT/LLM/TTS API calls, and base64 audio encoding back to Vobiz. Use this when you need maximum control and minimum latency, with no intermediary layers.
Architecture
How it works
XML routing
When an inbound call hits your FastAPI webhook, respond with Vobiz XML that instructs the platform to open a bidirectional WebSocket to your server.
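A minimal sketch of that webhook, assuming a hypothetical `<Stream>` element and `/inbound` path (check the Vobiz XML reference for the exact verb and attribute names):

```python
# Answer an inbound-call webhook with XML that tells Vobiz to open a
# bidirectional WebSocket to this server. The <Stream> element and the
# /inbound path are illustrative assumptions, not the confirmed Vobiz schema.
from fastapi import FastAPI
from fastapi.responses import Response

app = FastAPI()

VOBIZ_XML = """<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Stream bidirectional="true">wss://your-server.example.com/media</Stream>
</Response>"""

@app.post("/inbound")
async def inbound_call() -> Response:
    return Response(content=VOBIZ_XML, media_type="application/xml")
```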
Audio frame parsing
Vobiz sends JSON frames containing base64-encoded G.711 μ-law audio. Decode these frames into raw byte streams.
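A sketch of the decoding step; the frame field names (`event`, `media`, `payload`) are assumptions modeled on common telephony media-streaming schemas, so adjust them to the actual Vobiz frame format:

```python
# Decode one Vobiz WebSocket message into raw audio bytes.
import base64
import json

def decode_frame(message: str) -> bytes | None:
    """Return raw G.711 mu-law bytes, or None for non-media control frames."""
    frame = json.loads(message)
    if frame.get("event") != "media":
        return None  # e.g. start/stop/mark events
    mulaw = base64.b64decode(frame["media"]["payload"])
    # For manual VAD, convert to 16-bit linear PCM first, e.g. with
    # audioop.ulaw2lin(mulaw, 2) (stdlib audioop, removed in Python 3.13).
    return mulaw
```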
Streaming STT
Forward raw audio bytes to Deepgram’s streaming WebSocket for real-time transcription. As words are recognized, stream them to the LLM.
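A sketch of the STT leg using the `websockets` library. The endpoint and query parameters follow Deepgram's documented live API; `audio_queue` and `on_transcript` are hypothetical plumbing you supply:

```python
# Forward raw mu-law bytes to Deepgram's live transcription WebSocket and
# surface finalized transcripts as they arrive.
import asyncio
import json
import os

import websockets  # pip install websockets

DEEPGRAM_URL = (
    "wss://api.deepgram.com/v1/listen"
    "?encoding=mulaw&sample_rate=8000&channels=1&interim_results=true"
)

async def run_stt(audio_queue: asyncio.Queue, on_transcript) -> None:
    headers = {"Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}"}
    # websockets <= 12 uses extra_headers; newer releases call it additional_headers.
    async with websockets.connect(DEEPGRAM_URL, extra_headers=headers) as dg:

        async def sender() -> None:
            # A falsy sentinel (None) on the queue ends the stream.
            while chunk := await audio_queue.get():
                await dg.send(chunk)

        async def receiver() -> None:
            async for message in dg:
                result = json.loads(message)
                if result.get("type") != "Results":
                    continue  # skip metadata frames
                alt = result["channel"]["alternatives"][0]
                if result.get("is_final") and alt["transcript"]:
                    await on_transcript(alt["transcript"])

        await asyncio.gather(sender(), receiver())
```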
LLM response
Send the transcription to OpenAI's Chat Completions API. Response tokens stream back as they are generated.
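A sketch of the streaming call, assuming the official `openai` Python SDK (v1+); the model name and the `on_token` callback are placeholders:

```python
# Stream an LLM reply token-by-token so the TTS stage can start speaking
# before the full response exists.
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def stream_reply(transcript: str, on_token) -> None:
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": "You are a concise voice assistant."},
            {"role": "user", "content": transcript},
        ],
        stream=True,
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            await on_token(delta)
```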
Vobiz XML hook
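This is the payload the webhook from the routing step returns. The `<Stream>` element and its attribute are assumptions for illustration and may not match the real Vobiz schema:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <!-- Assumed verb: instructs Vobiz to open a bidirectional WebSocket
       to your media endpoint and stream call audio over it. -->
  <Stream bidirectional="true">wss://your-server.example.com/media</Stream>
</Response>
```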
When to use this
| Use case | Recommendation |
|---|---|
| Maximum control over latency | ✅ This example |
| Rapid prototyping | Use LiveKit or Pipecat examples |
| Custom audio processing | ✅ This example |
| Production-ready pipeline | Use LiveKit or Pipecat examples |
Environment variables
.env
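A sketch of the expected variables: the Deepgram and OpenAI key names are the providers' conventional ones, while the Vobiz entries are hypothetical placeholders.

```bash
# Required by the STT and LLM stages above.
DEEPGRAM_API_KEY=...
OPENAI_API_KEY=...
# Hypothetical names - check your Vobiz dashboard for the real ones.
VOBIZ_AUTH_ID=...
VOBIZ_AUTH_TOKEN=...
```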