For Voice AI developers, SIP (Session Initiation Protocol) is the bridge between the public switched telephone network (PSTN) and modern application infrastructure. It sets up the phone calls whose audio your AI agents then receive as digital media streams.

The Golden Rule of Telephony: SIP is a signaling protocol only. It handles call routing and setup but carries no audio. The voice data travels separately over RTP (Real-time Transport Protocol). Understanding this split is critical.

How SIP works

SIP operates similarly to HTTP, using text-based requests and responses to negotiate settings like audio codecs via SDP (Session Description Protocol). It typically runs over UDP for speed, but supports TCP and TLS for secure, reliable signaling.
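To make the HTTP comparison concrete, here is a minimal sketch that splits a SIP message into its start line, headers, and SDP body, then lists the codecs offered in the SDP. The sample INVITE, its addresses, and the helper names `parse_sip_message` and `offered_codecs` are illustrative assumptions, not part of any Vobiz API, and the parser ignores details such as folded or repeated headers.

```python
def parse_sip_message(raw: str):
    """Split a SIP message into (start_line, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


def offered_codecs(sdp: str):
    """Return {payload_type: encoding} from the a=rtpmap lines of an SDP body."""
    codecs = {}
    for line in sdp.splitlines():
        if line.startswith("a=rtpmap:"):
            payload_type, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            codecs[int(payload_type)] = encoding
    return codecs


# Hypothetical INVITE using documentation-only addresses and domains.
SAMPLE_INVITE = "\r\n".join([
    "INVITE sip:agent@example.com SIP/2.0",
    "Via: SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds",
    "From: <sip:+15550100@example.net>;tag=1928301774",
    "To: <sip:agent@example.com>",
    "Call-ID: a84b4c76e66710@192.0.2.10",
    "CSeq: 1 INVITE",
    "Content-Type: application/sdp",
    "",
    "v=0",
    "o=- 0 0 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 10000 RTP/AVP 0 9",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:9 G722/8000",
])

start_line, headers, body = parse_sip_message(SAMPLE_INVITE)
print(start_line)
print(offered_codecs(body))
```

Note that the SDP body only describes where and how audio should flow (IP address, port, codecs). The audio itself never appears in the message.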

Signaling vs. Media (RTP)

SIP: The Signaling Layer

  • Runs via UDP, TCP, or TLS
  • Text-based messages (like HTTP)
  • Negotiates codecs and ports
  • Handles ringing, answering, hanging up
  • Zero audio bytes ever pass through SIP

RTP: The Media Layer

  • Almost always runs over UDP (prioritizes speed)
  • Binary packets, typically 20 ms audio frames
  • Carries G.711 µ-law or G.722 audio
  • Direct path between endpoints
  • Fully distinct from SIP servers

When you hit the classic “one-way audio” bug during testing, SIP negotiation has usually succeeded. The failure is in the RTP path, most likely blocked by symmetric NAT or restrictive UDP firewall rules.
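The binary RTP framing described above can be inspected with a few lines of Python. This is a minimal sketch, assuming a packet without a header extension; `parse_rtp_header` and `SAMPLE_PACKET` are illustrative names. The sample packet carries 20 ms of G.711 µ-law audio, which at 8000 samples per second and one byte per sample is 160 bytes of payload.

```python
import struct

# Static payload types from the RTP audio/video profile.
PAYLOAD_TYPES = {0: "PCMU (G.711 µ-law)", 8: "PCMA (G.711 A-law)", 9: "G722"}


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header (header extensions are not handled)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    b0, b1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F
    payload_offset = 12 + 4 * csrc_count  # skip any CSRC identifiers
    return {
        "version": b0 >> 6,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[payload_offset:],
    }


# Hypothetical packet: version 2, payload type 0, 160 bytes of µ-law silence.
SAMPLE_PACKET = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x1234ABCD) + b"\xff" * 160

header = parse_rtp_header(SAMPLE_PACKET)
print(PAYLOAD_TYPES.get(header["payload_type"], "unknown"), len(header["payload"]))
```

Running a parser like this against a packet capture is one way to confirm whether RTP is arriving at all when a call connects but one side hears nothing.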

The SIP handshake flow

SIP Packet Timeline
PSTN CALLER                           YOUR ENDPOINT
Caller   ─── INVITE (SDP offer) ───▶  Endpoint
Caller   ◀── 100 Trying ────────────  Endpoint
Caller   ◀── 180 Ringing ───────────  Endpoint
Caller   ◀── 200 OK (SDP answer) ───  Endpoint
Caller   ─── ACK ───────────────────▶ Endpoint

Caller   ◀══ RTP AUDIO STREAM (UDP) ══▶ Endpoint

Caller   ─── BYE ───────────────────▶ Endpoint
Caller   ◀── 200 OK ────────────────  Endpoint
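The timeline above can be read as a small state machine. The sketch below is a simplified model of that single happy-path flow, not a full implementation of SIP dialog handling (it ignores retransmissions, CANCEL, and error responses). The state names and the `run_dialog` helper are assumptions made for illustration.

```python
# (current_state, message) -> next_state, following the timeline above.
TRANSITIONS = {
    ("idle", "INVITE"): "inviting",
    ("inviting", "100"): "inviting",      # 100 Trying: request received
    ("inviting", "180"): "ringing",       # 180 Ringing
    ("inviting", "200"): "answered",      # answer without a ringing phase
    ("ringing", "200"): "answered",       # 200 OK with SDP answer
    ("answered", "ACK"): "established",   # RTP audio flows in this state
    ("established", "BYE"): "terminating",
    ("terminating", "200"): "ended",      # 200 OK confirming the BYE
}


def run_dialog(messages):
    """Walk the message sequence and return the final call state."""
    state = "idle"
    for message in messages:
        key = (state, message)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {message!r} in state {state!r}")
        state = TRANSITIONS[key]
    return state


print(run_dialog(["INVITE", "100", "180", "200", "ACK"]))
print(run_dialog(["INVITE", "100", "180", "200", "ACK", "BYE", "200"]))
```

The first call stops in the `established` state, which is the only point in the flow where audio is exchanged.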

SIP trunking for voice AI

A SIP trunk is a virtual phone line that connects your infrastructure directly to a carrier like Vobiz. When someone dials your number, Vobiz sends a SIP INVITE to your platform - LiveKit, Vapi, or your own server.

Call routing path:

1. PSTN - Caller dials your number
2. Vobiz - Authenticates the call and forwards the INVITE to your SIP URI
3. Platform - LiveKit / Vapi answers and negotiates the codec
4. AI Agent - RTP audio flows to your STT → LLM → TTS pipeline

Advantages

  • Universal access - SIP is the de facto global standard for IP telephony.
  • Native transfer support - The SIP REFER method enables live hand-offs to a human agent that are seamless for the caller.
  • Enterprise PBX integration - Plugs directly into corporate PBX systems such as Asterisk, FreeSWITCH, Microsoft Teams, and Cisco.
  • G.722 wideband audio - Samples at 16 kHz instead of 8 kHz, which can noticeably improve speech recognition accuracy, especially in noisy environments.

Disadvantages

  • Connection setup delay - SIP handshakes add ~1 second before audio starts. Not ideal for sub-second latency requirements.
  • Firewall configuration - RTP uses dynamically assigned UDP ports (typically 10000–20000). Your firewall must allow UDP traffic on that range.
  • AI barge-in complexity - SIP wasn’t designed for streaming tokens or mid-sentence interruptions - you may need middleware to bridge them.
  • Silent failures - Misconfigured ACLs or URI routes fail quietly - calls just drop with no error log. Test end-to-end before production.
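Because firewall problems tend to fail quietly, it can help to check the negotiated media port against your allowed range while testing. The following is a minimal sketch, assuming the typical 10000–20000 range quoted above. The helper names `audio_port` and `port_allowed` are hypothetical, and your carrier or platform may use a different range.

```python
def audio_port(sdp: str) -> int:
    """Return the UDP port announced on the m=audio line of an SDP body."""
    for line in sdp.splitlines():
        if line.startswith("m=audio "):
            # Format: m=audio <port>[/<count>] <proto> <payload types...>
            return int(line.split()[1].split("/")[0])
    raise ValueError("no m=audio line in SDP")


def port_allowed(port: int, low: int = 10000, high: int = 20000) -> bool:
    """Check the port against the firewall range (assumed, adjust to your setup)."""
    return low <= port <= high


# Hypothetical SDP answer from an endpoint.
SAMPLE_SDP = "v=0\r\nc=IN IP4 192.0.2.20\r\nm=audio 14532 RTP/AVP 0\r\n"

port = audio_port(SAMPLE_SDP)
print(port, port_allowed(port))
```

If the port in the SDP falls outside the range your firewall allows, signaling will still succeed and the call will connect without audio.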

Platform compatibility

LiveKit - primary SIP hub

LiveKit runs a dedicated SIP service that ingests trunks cleanly and exposes callers as standard SIP participants in your WebRTC rooms. It preserves native encryption, Krisp noise filtering, and call transfer support.

Vapi - unified telephony

Vapi exposes unified SIP destinations that accept bring-your-own trunks. It supports custom SIP headers, so you can inject dynamic context (for example, the result of a CRM lookup) into the AI agent directly from the incoming INVITE.
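As an illustration of header-based context injection, the sketch below pulls custom `X-` headers out of a raw INVITE. It is protocol-level only and does not use any Vapi API. The header names `X-Customer-Id` and `X-Plan` and the helper `custom_headers` are hypothetical; check your platform's documentation for the headers it actually accepts.

```python
def custom_headers(raw_invite: str) -> dict:
    """Collect headers whose names start with 'X-' from a raw SIP message."""
    head = raw_invite.split("\r\n\r\n", 1)[0]
    context = {}
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower().startswith("x-"):
            context[name.strip()] = value.strip()
    return context


# Hypothetical INVITE carrying CRM context in custom headers.
SAMPLE_INVITE = "\r\n".join([
    "INVITE sip:agent@example.com SIP/2.0",
    "From: <sip:+15550100@example.net>;tag=42",
    "To: <sip:agent@example.com>",
    "X-Customer-Id: 42",
    "X-Plan: premium",
    "Content-Length: 0",
    "",
    "",
])

print(custom_headers(SAMPLE_INVITE))
```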

Retell AI - elastic integration

Retell AI supports two connection paths: direct bridging from Vobiz, or dynamic API registration hooks that give you fine-grained control over application flow before SIP connectivity is established.

ElevenLabs - native SIP integration

ElevenLabs Conversational AI connects directly to PSTN phone calls via SIP trunking - no intermediary platform required. Point a Vobiz SIP trunk at ElevenLabs’ SIP endpoint and inbound calls are answered directly by an ElevenLabs AI agent. ElevenLabs also provides standalone TTS, STT, and voice cloning services that other SIP-based pipelines (LiveKit agents, etc.) can consume as a voice synthesis layer.

Common Developer Pitfalls

01. Wrong username in SIP credentials
Your Vobiz dashboard display name is not your SIP username. Use the credential username you created in the Trunks section - anything else returns 401.

02. Do not hardcode platform IPs
SaaS AI platforms (Vapi, LiveKit) rotate egress IPs. Hardcoding them in your firewall ACL will cause silent disconnections. Use hostname-based rules instead.

03. One-way audio = blocked UDP
If the call connects but audio only flows one way, SIP signaling worked - the problem is your firewall blocking RTP (UDP). Open ports 10000–20000.

04. International calls return 403
International dialing is off by default. Enable it in the console under your account settings before calling international numbers.
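The status codes mentioned in these pitfalls can be turned into a quick triage helper for test logs. This is a minimal sketch: the hints restate the pitfalls above, the `diagnose` function and `PITFALL_HINTS` table are hypothetical names, and a 401 or 403 can have other causes in practice.

```python
# Hints derived from the pitfalls above, keyed by SIP status code.
PITFALL_HINTS = {
    401: "Check the SIP credential username from the Trunks section, "
         "not your dashboard display name.",
    403: "International dialing may be disabled; enable it in the console "
         "under your account settings.",
}


def diagnose(status_line: str) -> str:
    """Map a SIP response status line to a likely cause."""
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("SIP/") or not parts[1].isdigit():
        raise ValueError(f"not a SIP status line: {status_line!r}")
    code = int(parts[1])
    return PITFALL_HINTS.get(
        code, f"No known pitfall for SIP {code}; check trunk and platform logs."
    )


print(diagnose("SIP/2.0 401 Unauthorized"))
print(diagnose("SIP/2.0 403 Forbidden"))
```

One-way audio does not appear in this table because it produces no SIP error at all: the call completes with 200 OK and the fault lies in the RTP path.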

When to choose SIP

  • LiveKit / Vapi / Retell integration
  • Live call transfers to a human agent
  • Enterprise PBX (Asterisk, Cisco)
  • Regulated environments (HIPAA, PCI)
  • Standard PSTN phone numbers required
  • High call volume at scale

Want to skip SIP entirely? If you want to route audio directly from your app without a phone number, WebSocket streaming is simpler and lower latency for server-side pipelines.

What developers usually do next

  • Create your first SIP trunk - Step-by-step console walkthrough: outbound + inbound (5 min)
  • Connect Vapi - Route SIP directly into a Vapi AI agent (3 min)
  • WebSocket streaming - Skip SIP and stream audio directly from your server (10 min read)
  • SIP vs WebSockets - Full decision matrix: 10 factors, real latency numbers (10 min read)