For Voice AI developers, SIP is the vital bridge between the global telephone network (PSTN) and modern application infrastructure. It translates standard phone calls into digital streams your AI agents can interact with.
How SIP works
SIP operates similarly to HTTP, using text-based requests and responses to negotiate settings like audio codecs via SDP (Session Description Protocol). It typically runs over UDP for speed, but also supports TCP for reliable delivery and TLS for encrypted signaling.
Signaling vs. Media (RTP)
SIP: The Signaling Layer
- Runs over UDP, TCP, or TLS
- Text-based messages (like HTTP)
- Negotiates codecs and ports
- Handles ringing, answering, hanging up
- Zero audio bytes ever pass through SIP
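To make the "text-based messages" and "negotiates codecs and ports" points concrete, here is a minimal sketch that splits a SIP INVITE into headers and SDP body, then reads the offered audio port and codecs. The sample message is trimmed and entirely made up (addresses, tags, and Call-ID are placeholders), and the helper names `parse_sip_message` and `parse_sdp_audio` are hypothetical, not part of any Vobiz or platform API.

```python
# Illustrative, trimmed INVITE. Every address, tag, and identifier is made up.
SAMPLE_INVITE = "\r\n".join([
    "INVITE sip:agent@ai.example.com SIP/2.0",
    "Via: SIP/2.0/UDP 203.0.113.10:5060;branch=z9hG4bK776asdhds",
    "From: <sip:+15550100@carrier.example.com>;tag=1928301774",
    "To: <sip:agent@ai.example.com>",
    "Call-ID: a84b4c76e66710@203.0.113.10",
    "CSeq: 314159 INVITE",
    "Content-Type: application/sdp",
    "",  # blank line separates SIP headers from the SDP body
    "v=0",
    "o=- 2890844526 2890844526 IN IP4 203.0.113.10",
    "s=-",
    "c=IN IP4 203.0.113.10",
    "t=0 0",
    "m=audio 14000 RTP/AVP 0 9",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:9 G722/8000",
])


def parse_sip_message(raw: str) -> dict:
    """Split a SIP request into its request line, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, uri, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "uri": uri, "headers": headers, "body": body}


def parse_sdp_audio(sdp: str) -> dict:
    """Extract the media address, audio port, and offered codecs from SDP."""
    result = {"address": None, "port": None, "codecs": {}}
    for line in sdp.splitlines():
        if line.startswith("c="):
            # c=IN IP4 <address>: where the sender wants to receive RTP
            result["address"] = line.split()[-1]
        elif line.startswith("m=audio"):
            # m=audio <port> RTP/AVP <payload types...>
            result["port"] = int(line.split()[1])
        elif line.startswith("a=rtpmap:"):
            # a=rtpmap:<payload type> <encoding>/<clock rate>
            payload_type, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            result["codecs"][int(payload_type)] = encoding
    return result


message = parse_sip_message(SAMPLE_INVITE)
audio = parse_sdp_audio(message["body"])
print(message["method"], audio["port"], audio["codecs"])
```

Note that the INVITE only describes where audio should go (address, port, codecs). The audio itself travels separately over RTP.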
RTP: The Media Layer
- Always runs over UDP (prioritizes speed)
- Binary packets, 20ms audio frames
- Carries G.711 µ-law or G.722 audio
- Direct path between endpoints
- Fully distinct from SIP servers
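As a rough illustration of "binary packets, 20ms audio frames", the sketch below builds and parses the fixed 12-byte RTP header with Python's `struct` module and works out the payload size of one G.711 frame. The function names and the SSRC value are hypothetical; the parser deliberately ignores header extensions and padding, so treat it as a teaching aid rather than a complete RTP implementation.

```python
import struct

SAMPLE_RATE_HZ = 8000   # G.711 samples at 8 kHz
FRAME_MS = 20           # one packet carries 20 ms of audio
BYTES_PER_SAMPLE = 1    # G.711 encodes each sample as one byte

# 8000 samples/s * 0.020 s * 1 byte = 160 bytes of audio per packet
FRAME_BYTES = SAMPLE_RATE_HZ * FRAME_MS // 1000 * BYTES_PER_SAMPLE


def build_rtp_packet(payload_type: int, sequence: int, timestamp: int,
                     ssrc: int, payload: bytes) -> bytes:
    """Prepend a minimal 12-byte RTP header (version 2, no CSRCs) to payload."""
    first_byte = 2 << 6                  # version=2, padding=0, extension=0, CC=0
    second_byte = payload_type & 0x7F    # marker bit left clear
    header = struct.pack("!BBHII", first_byte, second_byte,
                         sequence, timestamp, ssrc)
    return header + payload


def parse_rtp_packet(packet: bytes) -> dict:
    """Decode the fixed RTP header; extensions and padding are not handled."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    first_byte, second_byte, sequence, timestamp, ssrc = struct.unpack(
        "!BBHII", packet[:12])
    csrc_count = first_byte & 0x0F
    header_len = 12 + 4 * csrc_count     # each CSRC entry is 4 bytes
    return {
        "version": first_byte >> 6,
        "marker": bool(second_byte >> 7),
        "payload_type": second_byte & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[header_len:],
    }


# One frame of mu-law silence (0xFF), payload type 0 = PCMU.
silence = b"\xff" * FRAME_BYTES
packet = build_rtp_packet(payload_type=0, sequence=1, timestamp=FRAME_BYTES,
                          ssrc=0x1234ABCD, payload=silence)
parsed = parse_rtp_packet(packet)
print(len(packet), parsed["payload_type"], len(parsed["payload"]))
```

For G.711 at 8 kHz, the RTP timestamp advances by 160 for each 20 ms packet, which is why sequence gaps and timestamp jumps are useful signals when debugging choppy audio.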
When you hit the classic “one-way audio” bug during testing, SIP negotiation has usually succeeded: the call connects, but audio flows in only one direction. The failure is almost always in the RTP path, most often because symmetric NAT or restrictive UDP firewall rules block the media packets.
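One common way NAT breaks the RTP path is an endpoint advertising a private address in its SDP, so the far end sends audio somewhere it cannot reach. The check below is a minimal sketch of that diagnosis, assuming IPv4 and an SDP body you have already captured; the helper names are hypothetical and the address ranges are the RFC 1918 private blocks.

```python
import ipaddress

# RFC 1918 private IPv4 ranges: not routable across the public internet.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def sdp_connection_address(sdp: str):
    """Return the IPv4 address from the first c= line, or None if absent."""
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            return line.split()[-1]
    return None


def is_rfc1918(address: str) -> bool:
    """True if the address falls inside a private RFC 1918 range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


def diagnose_sdp(sdp: str) -> str:
    """Give a rough hint about whether the advertised media address is reachable."""
    address = sdp_connection_address(sdp)
    if address is None:
        return "no IPv4 connection line found"
    if is_rfc1918(address):
        return f"{address} is private: remote RTP will likely never arrive"
    return f"{address} is not in a private range: check firewall rules next"


print(diagnose_sdp("v=0\r\nc=IN IP4 192.168.1.20\r\nm=audio 14000 RTP/AVP 0"))
```

A private address in the SDP is only a hint, since some carriers and media relays rewrite or ignore it, but it is a quick first check before inspecting firewall rules.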
The SIP handshake flow
SIP Packet Timeline
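A typical successful call in standard SIP runs INVITE, 100 Trying, 180 Ringing, 200 OK, ACK, then media over RTP, and finally BYE answered by another 200 OK. The sketch below models that sequence as a small state machine. The state names are illustrative only, and the model is deliberately simplified: real calls may skip provisional responses or end with 4xx to 6xx failures that it does not cover.

```python
# (current state, message seen) -> next state
TRANSITIONS = {
    ("idle", "INVITE"): "calling",            # caller asks to start a session
    ("calling", "100 Trying"): "proceeding",  # request received, being processed
    ("proceeding", "180 Ringing"): "ringing", # callee is being alerted
    ("ringing", "200 OK"): "answered",        # callee accepted, SDP answer sent
    ("answered", "ACK"): "established",       # caller confirms; RTP audio flows
    ("established", "BYE"): "terminating",    # either side hangs up
    ("terminating", "200 OK"): "ended",       # hang-up acknowledged
}


def run_call(messages) -> str:
    """Walk the message sequence and return the final call state."""
    state = "idle"
    for message in messages:
        key = (state, message)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {message!r} in state {state!r}")
        state = TRANSITIONS[key]
    return state


HAPPY_PATH = ["INVITE", "100 Trying", "180 Ringing", "200 OK",
              "ACK", "BYE", "200 OK"]

print(run_call(HAPPY_PATH[:5]))  # state once the handshake completes
print(run_call(HAPPY_PATH))      # state after hang-up
```

Audio only starts after the ACK, so every round trip before it contributes to the connection setup delay.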
SIP trunking for voice AI
A SIP trunk is a virtual phone line that connects your infrastructure directly to a carrier like Vobiz. When someone dials your number, Vobiz sends a SIP INVITE to your platform - LiveKit, Vapi, or your own server.
Call routing path: caller → PSTN → Vobiz → SIP INVITE → your platform
Advantages
- Universal Access - SIP is the de facto global standard for telephony signaling.
- Native Transfer Support - The SIP REFER method lets you hand a live call off to a human agent without the caller noticing the switch.
- Enterprise PBX integration - Plugs directly into corporate PBX systems - Asterisk, FreeSWITCH, Teams, Cisco.
- G.722 wideband audio - Samples at 16 kHz instead of 8 kHz, which can improve speech recognition accuracy, especially in noisy environments.
Disadvantages
- Connection setup delay - SIP handshakes add ~1 second before audio starts. Not ideal for sub-second latency requirements.
- Firewall configuration - RTP uses dynamically assigned UDP ports (typically in the 10000–20000 range). Your firewall must allow that UDP range in both directions, or audio will fail in at least one direction.
- AI barge-in complexity - SIP wasn’t designed for streaming tokens or mid-sentence interruptions - you may need middleware to bridge them.
- Silent failures - Misconfigured ACLs or URI routes fail quietly - calls just drop with no error log. Test end-to-end before production.
Platform compatibility
LiveKit - primary SIP hub
LiveKit runs a dedicated SIP service that ingests trunks and exposes callers as standard SIP participants in your WebRTC rooms. It preserves native encryption, Krisp noise filtering, and call transfer support.
Vapi - unified telephony
Vapi exposes unified SIP destinations that accept bring-your-own trunks. It supports custom SIP headers, so you can inject AI context, such as CRM lookup results, based on the incoming INVITE.
Retell AI - elastic integration
Retell AI supports two connection paths: direct bridging from Vobiz, or dynamic API registration hooks that give you fine-grained control over application flow before SIP connectivity is established.
ElevenLabs - native SIP integration
ElevenLabs Conversational AI connects directly to PSTN phone calls via SIP trunking - no intermediary platform required. Point a Vobiz SIP trunk at ElevenLabs’ SIP endpoint and inbound calls are answered directly by an ElevenLabs AI agent. ElevenLabs also provides standalone TTS, STT, and voice cloning services that other SIP-based pipelines (LiveKit agents, etc.) can consume as a voice synthesis layer.
Common Developer Pitfalls
When to choose SIP
- LiveKit / Vapi / Retell integration
- Live call transfers to a human agent
- Enterprise PBX (Asterisk, Cisco)
- Regulated environments (HIPAA, PCI)
- Standard PSTN phone numbers required
- High call volume at scale
What developers usually do next
- Create your first SIP trunk - Step-by-step console walkthrough, outbound + inbound (5 min)
- Connect Vapi - Route SIP directly into a Vapi AI agent (3 min)
- WebSocket streaming - Skip SIP and stream audio directly from your server (10 min read)
- SIP vs WebSockets - Full decision matrix with 10 factors and real latency numbers (10 min read)