SIP trunking and WebSocket streaming are the two fundamental architectures for connecting phone calls to AI voice agents. Neither is “better” in the abstract: the right choice depends on what you are building, which platforms you use, and how you weigh deployment speed against total ownership.

SIP Trunking

The telephony industry standard: robust, enterprise-ready, and required for PBX integration or managed platforms like LiveKit, VAPI, and Retell AI. It is also the simplest route to call transfer without custom code.

WebSocket Streaming

Developer-native path - highly cost-effective at scale, more direct, and ideal for custom AI pipelines built with Pipecat or bare-metal code.

Architectural Comparison

The two approaches operate at completely different layers of the stack. SIP is a telephony-layer protocol. WebSocket streaming is an application-layer transport.

SIP Trunking Path

  1. PSTN: Caller dials phone number
  2. SIP Trunk (Vobiz): PSTN → SIP INVITE routed to endpoint URI
  3. Platform SIP Endpoint: LiveKit / VAPI / Retell terminates SIP, creates room
  4. RTP Audio: UDP audio stream directly to platform
  5. AI Agent: Receives WebRTC audio, runs STT → LLM → TTS
  6. RTP Audio (back): TTS audio back to caller via UDP

WebSocket Streaming Path

  1. PSTN: Caller dials phone number
  2. Vobiz Webhook: Fetches your webhook, receives VoiceXML stream directive
  3. WebSocket (wss://): Direct TCP connection established to your server
  4. Your Server: Receives base64 µ-law audio directly from Vobiz
  5. AI Pipeline: STT → LLM → TTS logic driven entirely by your code
  6. WebSocket (back): µ-law voice audio sent back over same socket connection

Important architectural nuance: These two architectures are not completely mutually exclusive. You can use a generic SIP Trunk provider to route a call to Vobiz, and then use a Vobiz VoiceXML <Stream> directive to pipe that exact call to your custom WebSocket server. SIP handles the initial routing; WebSocket handles the audio layer.
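
Step 4 of the WebSocket path (receiving base64 µ-law audio) can be sketched as follows. This is a minimal sketch, not Vobiz's documented message format: the JSON shape (`event`, `media.payload`) and the function names are assumptions for illustration, so check the Vobiz streaming reference for the real field names. The µ-law expansion itself is the standard G.711 algorithm, written in pure Python so it does not depend on the `audioop` module that was removed in Python 3.13.

```python
import base64
import json


def ulaw_to_pcm16(byte: int) -> int:
    """Expand one G.711 µ-law byte to a signed 16-bit PCM sample."""
    u = ~byte & 0xFF                  # µ-law bytes are stored bit-inverted
    t = ((u & 0x0F) << 3) + 0x84      # 4-bit mantissa plus the bias (0x84 = 132)
    t <<= (u & 0x70) >> 4             # scale by the 3-bit segment (exponent)
    return (0x84 - t) if (u & 0x80) else (t - 0x84)


def decode_media_message(raw: str) -> list[int]:
    """Return PCM samples from one media message, or [] for any other event.

    Assumed (hypothetical) message shape:
        {"event": "media", "media": {"payload": "<base64 µ-law bytes>"}}
    """
    msg = json.loads(raw)
    if msg.get("event") != "media":
        return []
    ulaw = base64.b64decode(msg["media"]["payload"])
    return [ulaw_to_pcm16(b) for b in ulaw]
```

The resulting 8 kHz PCM samples are what most STT providers expect after resampling or direct ingestion; sending audio back (step 6) is the reverse: encode to µ-law, base64, and wrap in whatever outbound message the API defines.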

Full Decision Matrix

| Evaluation Factor | SIP Trunking | WebSocket Streaming |
| --- | --- | --- |
| Developer Profile | Telephony admins, DevOps, or teams comfortable with SIP/RTP concepts | Python / Node.js engineers - WebSocket + async patterns are familiar |
| Setup Complexity | High - trunk config, IP ACLs, SIP URI routing, codec negotiation, firewall rules | Low/Medium - VoiceXML webhook + WebSocket server handling JSON |
| Call Setup Latency | 1–5 seconds (SIP INVITE handshake + PSTN routing overhead) | Near-instant (WebSocket TCP handshake + Vobiz webhook fetch) |
| Audio Transport Latency | Lower - UDP/RTP has no retransmission. Dropped packets are skipped, preserving real-time flow. | Slightly higher - TCP guarantees delivery. Retransmitted packets can add jitter on poor networks. |
| Audio Quality Support | G.711 8kHz or G.722 16kHz (HD wideband, if chosen carrier supports) | G.711 µ-law 8kHz (PSTN floor, same as standard SIP) |
| Infrastructure Cost | Vobiz trunk rate + AI platform fee (LiveKit/VAPI/Retell markup) | Vobiz channel rate + raw AI API costs only. No platform markup. |
| Live Call Transfer | Supported - blind and warm transfer via SIP REFER | Supported - Vobiz handles call transfer on the WebSocket path as well |
| Enterprise PBX Integration | Native - Avaya, Cisco UCM, Teams Direct Routing demand SIP | Not applicable - no standard bridge to existing PBX infra |
| Turn-Taking / Interruption | Abstracted - handled completely by the managed platform | Manual - you must build VAD + async pipeline cancellation |
| Horizontal Scaling | Carrier-layer - add trunk channels without touching server infra | Process-layer - you must scale WebSocket workers/containers |
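
The "Manual" entry in the Turn-Taking / Interruption row means, in practice, cancelling in-flight TTS playback the moment the caller starts speaking. Below is a minimal `asyncio` sketch of that pattern. A timer stands in for a real VAD and a list stands in for the WebSocket; `speak` and `run_turn` are hypothetical names, not part of any Vobiz or Pipecat API.

```python
import asyncio


async def speak(chunks, sent):
    """Stand-in for streaming TTS audio to the caller, one chunk at a time."""
    for chunk in chunks:
        sent.append(chunk)            # a real handler would write to the socket here
        await asyncio.sleep(0.01)     # simulates real-time pacing of audio frames


async def run_turn(chunks, caller_speaks_after):
    """Play one agent turn; stop early if the caller barges in.

    Returns (chunks actually sent, whether playback was interrupted).
    """
    sent = []
    playback = asyncio.create_task(speak(chunks, sent))
    # Stand-in for VAD: "detects speech" after a fixed delay in seconds.
    barge_in = asyncio.create_task(asyncio.sleep(caller_speaks_after))

    done, _ = await asyncio.wait(
        {playback, barge_in}, return_when=asyncio.FIRST_COMPLETED
    )
    interrupted = playback not in done

    # Cancel whichever task is still running, then wait for both to settle.
    for task in (playback, barge_in):
        if not task.done():
            task.cancel()
    await asyncio.gather(playback, barge_in, return_exceptions=True)
    return sent, interrupted
```

For example, `asyncio.run(run_turn(list(range(50)), 0.03))` stops playback after only a few chunks, while a turn that finishes before the caller speaks returns every chunk with `interrupted` set to `False`. In a real pipeline the same cancellation would also need to reach any in-flight LLM and TTS requests.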

Platform Compatibility Matrix

| Platform | SIP Trunking | WebSocket Streaming | Role in Architecture |
| --- | --- | --- | --- |
| LiveKit | ✅ Primary | - | Complete AI voice platform. SIP trunk terminates into LiveKit SIP Service. AI agent runs as LiveKit participant. |
| VAPI | ✅ Primary | - | Managed AI voice platform. BYO SIP trunk or direct SIP URI. PSTN calls route exclusively through SIP trunking. |
| Retell AI | ✅ Primary | - | Managed AI voice platform. Elastic SIP trunk or Register Phone Call API (SIP URI dialing). |
| ElevenLabs | ✅ Primary | - | Conversational AI platform with native SIP integration. Connects directly to PSTN phone calls via SIP trunking. |
| Pipecat | - | ✅ Primary | Open-source Python pipeline framework. Integrates with Vobiz over its WebSocket transport. No native SIP support. |
| Direct Python (Vobiz) | - | ✅ Primary | Bare-metal WebSocket handler against Vobiz streaming API. Maximum control, maximum ownership. |
| Bolna | Supported | ✅ Primary | Managed voice AI orchestration layer. Can integrate via WebSocket streams or via SIP trunk configuration. |
| Ultravox | Supported | ✅ Primary | Real-time AI voice platform. Primary integration via WebSocket audio; SIP via intermediary transport. |

Cost Analysis

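Both paths share the same Vobiz cost structure: a per-minute channel rate plus a monthly fee per number. The per-minute rates differ (₹0.45 for SIP, ₹0.65 for WebSocket), but the larger difference comes from whether you add a managed AI platform layer on top (SIP path) or own the pipeline yourself (WebSocket path). All pricing below is in INR.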

SIP Trunking Cost Stack

| Item | Cost | Notes |
| --- | --- | --- |
| Vobiz SIP channel | ₹0.45/min | 45 paise per minute, inbound + outbound |
| Phone number (DID) | ₹500/month | Per active Vobiz number |
| Managed AI platform | Their pricing | LiveKit, VAPI, Retell, ElevenLabs each charge their own per-minute or subscription rate. Main cost driver at scale. |
| STT (e.g. Deepgram) | Included or API | VAPI/Retell include STT; LiveKit needs your own API key |
| LLM (e.g. GPT-4o) | API key required | Pass-through or bundled per-minute rate |
| TTS (e.g. ElevenLabs) | API key required | Per character or per minute |

Vobiz base cost: ₹0.45/min + ₹500/month per number. Total = Vobiz rate + AI platform fees + STT/LLM/TTS API costs.

WebSocket Streaming Cost Stack

| Item | Cost | Notes |
| --- | --- | --- |
| Vobiz channel rate | ₹0.65/min | 65 paise per minute, inbound + outbound |
| Phone number (DID) | ₹500/month | Same as SIP |
| No managed AI platform | ₹0 | You build the pipeline yourself. This is the key saving. |
| STT (e.g. Deepgram) | Direct API rate | Pay STT provider directly |
| LLM (e.g. GPT-4o-mini) | Direct API rate | Pay OpenAI / Anthropic / Google directly |
| TTS (e.g. Cartesia / ElevenLabs) | Direct API rate | Choose your TTS provider |
| Server compute | Cloud infra | One process per concurrent call |

Vobiz WebSocket rate: ₹0.65/min + ₹500/month per number. Total = Vobiz rate + direct AI API costs only. No platform markup.

The bottom line: Under 50,000 calls/month, the platform premium is often worth the saved engineering time. Above 50,000 calls/month, owning the pipeline (WebSocket path) pays off significantly.
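
To make the two cost stacks concrete, here is a small sketch that computes only the Vobiz portion of the monthly bill from the rates listed above. It works in paise to avoid floating-point rounding. The function name and the 10,000-minute workload are illustrative assumptions; platform fees, STT/LLM/TTS charges, and server compute are deliberately left out because they depend on your providers.

```python
PAISE_PER_RUPEE = 100
DID_PAISE_PER_MONTH = 500 * PAISE_PER_RUPEE          # ₹500/month per number
RATE_PAISE_PER_MIN = {"sip": 45, "websocket": 65}    # ₹0.45 and ₹0.65 per minute


def vobiz_monthly_rupees(path: str, minutes: int, numbers: int = 1) -> float:
    """Vobiz-only monthly cost in rupees for the given path and usage."""
    paise = RATE_PAISE_PER_MIN[path] * minutes + DID_PAISE_PER_MONTH * numbers
    return paise / PAISE_PER_RUPEE


sip = vobiz_monthly_rupees("sip", minutes=10_000)        # 5000.0
ws = vobiz_monthly_rupees("websocket", minutes=10_000)   # 7000.0
extra_per_minute = (ws - sip) / 10_000                   # 0.2
```

On the Vobiz side alone, the WebSocket path costs ₹0.20/min more. It comes out cheaper overall only when the managed platform's per-minute fee exceeds that gap plus whatever you spend running your own pipeline.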

Latency Analysis

SIP has lower audio transport latency than WebSocket streaming. On the SIP path, audio travels as RTP over UDP, which never retransmits dropped packets, so delivery stays strictly real-time. WebSocket runs over TCP, which guarantees delivery by retransmitting lost packets: useful for data, but a source of jitter for live audio on poor networks.

If latency is the only factor you care about, SIP wins. But latency is rarely why developers choose WebSocket streaming. They choose it for the ecosystem: direct access to AI frameworks (Pipecat), raw STT/LLM/TTS APIs, full pipeline control, and lower cost.

| Path | Vobiz Telephony Layer | Notes |
| --- | --- | --- |
| Both paths | < 50ms | Audio delivery from PSTN to your server |
| SIP | UDP/RTP | No retransmission; dropped packets skipped |
| WebSocket | TCP | Retransmissions add jitter on poor networks |
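
The jitter difference can be illustrated with a deliberately simplified model. The sketch below assumes 20 ms audio frames, zero network transit time, and a fixed 200 ms retransmission delay; all three numbers are illustrative assumptions, not measurements of Vobiz or of any real network. With in-order reliable delivery (TCP-like), one lost frame holds back every frame behind it. With unreliable delivery (UDP/RTP-like), the lost frame is skipped and the rest arrive on time.

```python
from typing import Optional


def delivery_times(
    n_frames: int,
    lost: set[int],
    reliable: bool,
    frame_ms: int = 20,
    retransmit_ms: int = 200,
) -> list[Optional[int]]:
    """Time in ms at which each frame reaches the application (None = dropped)."""
    times: list[Optional[int]] = []
    last_delivered = 0
    for i in range(n_frames):
        sent_at = i * frame_ms
        if i in lost and not reliable:
            times.append(None)        # UDP/RTP-like: the frame is simply skipped
            continue
        arrival = sent_at + (retransmit_ms if i in lost else 0)
        if reliable:
            # TCP-like: a frame cannot be delivered before the ones ahead of it.
            last_delivered = max(last_delivered, arrival)
        else:
            last_delivered = arrival
        times.append(last_delivered)
    return times


tcp_like = delivery_times(5, lost={1}, reliable=True)    # [0, 220, 220, 220, 220]
udp_like = delivery_times(5, lost={1}, reliable=False)   # [0, None, 40, 60, 80]
```

In the reliable case, frames 2 through 4 were sent on schedule but all arrive in a burst at 220 ms, which is the jitter a voice pipeline has to absorb. In the unreliable case the caller loses 20 ms of audio and everything else stays on time.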

When to Choose SIP

  • You need to integrate with enterprise PBX (Avaya, Cisco UCM, Microsoft Teams Direct Routing)
  • You’re using a managed AI platform (LiveKit, VAPI, Retell, ElevenLabs)
  • You need maximum audio quality (G.722 wideband)
  • Live call transfer must work without custom code
  • You want carrier-layer scaling (add trunk channels without touching infra)

When to Choose WebSockets

  • You’re building a custom AI pipeline (Pipecat, direct Python, Node.js)
  • You want direct access to STT/LLM/TTS APIs
  • Total cost matters more than platform abstraction
  • You want full control of the pipeline (interruption, VAD, turn-taking)
  • You’re comfortable scaling WebSocket workers yourself

Migration Path

You don’t have to commit to one architecture forever:
  1. Start with SIP + managed platform for fastest time-to-market
  2. Validate product-market fit with low engineering investment
  3. Migrate to WebSocket streaming once volume justifies engineering ownership
  4. Use the Vobiz <Stream> directive as a bridge: the SIP trunk routes the call, then WebSocket pipes the audio to your server