Key takeaways
- VoIP (Voice over Internet Protocol) carries voice as data packets over IP networks instead of the circuit-switched phone network.
- A call = digitize → encode with a codec → packetize → signal with SIP → stream media over RTP/SRTP → reassemble.
- A standard G.711 call uses ~85–90 kbps per direction; Opus scales from 6 to 510 kbps and adapts to the network.
- There are five main types of VoIP (ATA, softphone, mobile, cloud PBX, programmable/API). For voice AI, the ones that matter are programmable and low-latency.
- VoIP’s weak points (jitter, power/internet dependency, security, E911) are real but solvable with the right network.
What is VoIP?
Voice over Internet Protocol (VoIP) is a set of technologies for voice communication over Internet Protocol (IP) networks. The one distinction that matters: the legacy telephone network is circuit-switched, opening a dedicated path for the whole call, while VoIP is packet-switched, breaking your voice into small packets that share the network with all other internet traffic. That single change is why VoIP is cheap, global, and programmable, and also why call quality depends on the network underneath.
How VoIP works (step by step)
- Digitization. Your analog voice is sampled (typically 8,000 times a second, 8 kHz, for telephony) and turned into digital values.
- Encoding with a codec. A codec compresses that audio. The two you’ll meet most:
  - G.711: the PSTN-grade codec, 64 kbps of audio (≈85–90 kbps with IP/UDP/RTP overhead). Pristine but heavy.
  - Opus: modern and adaptive from 6 to 510 kbps, with built-in packet-loss concealment and forward error correction; the default for AI-grade audio.
- Signaling (SIP). The call is set up by the Session Initiation Protocol (SIP). SIP messages (INVITE, 200 OK, BYE) run over ports 5060 (unencrypted) or 5061 (TLS).
- Media (RTP/SRTP). SIP carries no audio; once the call connects, the voice streams over RTP, or SRTP when encrypted.
- Jitter buffer & playback. The receiver buffers packets to smooth out jitter, reorders them, decodes, and plays them back as sound.
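The packetize and reassemble steps above can be sketched in a few lines. This is a minimal illustration, not a production media stack: it assumes G.711 audio (one byte per sample), a 20 ms packet interval, and a 12-byte RTP header in the RFC 3550 layout. The function names, the SSRC value, and the fake audio are hypothetical, and the "jitter buffer" here only reorders packets; a real one also holds packets for a short time and handles loss and sequence wraparound.

```python
import struct

SAMPLE_RATE_HZ = 8000      # telephony sampling rate from the steps above
PTIME_MS = 20              # assumed packetization interval
SAMPLES_PER_PACKET = SAMPLE_RATE_HZ * PTIME_MS // 1000  # 160 samples
PAYLOAD_TYPE_PCMU = 0      # static RTP payload type for G.711 mu-law


def build_rtp_packet(seq, timestamp, ssrc, payload):
    """Prepend a 12-byte RTP header (RFC 3550 layout) to the payload."""
    first_byte = 2 << 6              # version 2, no padding, extension, or CSRC
    second_byte = PAYLOAD_TYPE_PCMU  # marker bit left at 0
    header = struct.pack("!BBHII", first_byte, second_byte,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + payload


def packetize(encoded_audio, ssrc=0x1234ABCD):
    """Split G.711 bytes (1 byte per sample) into 20 ms RTP packets."""
    packets = []
    for offset in range(0, len(encoded_audio), SAMPLES_PER_PACKET):
        chunk = encoded_audio[offset:offset + SAMPLES_PER_PACKET]
        seq = offset // SAMPLES_PER_PACKET
        # The RTP timestamp counts samples, so it advances by 160 per packet.
        packets.append(build_rtp_packet(seq, offset, ssrc, chunk))
    return packets


def reassemble(received_packets):
    """Toy jitter buffer: sort by sequence number, then join the payloads."""
    def seq_of(packet):
        return struct.unpack("!H", packet[2:4])[0]
    ordered = sorted(received_packets, key=seq_of)
    return b"".join(packet[12:] for packet in ordered)


audio = bytes(range(256)) * 5   # 1280 bytes = 160 ms of fake G.711 audio
packets = packetize(audio)
shuffled = [packets[2], packets[0], packets[1]] + packets[3:]  # simulate reordering
restored = reassemble(shuffled)
```

Each packet carries 160 bytes of audio plus the 12-byte header, and sorting by sequence number restores the original byte stream even though the first three packets arrived out of order.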
How much bandwidth does VoIP use?
A single VoIP call is light by modern standards:
| Codec | Audio bitrate | Per call (with overhead) |
|---|---|---|
| G.711 | 64 kbps | ~85–90 kbps each way |
| G.729 | 8 kbps | ~30 kbps each way |
| Opus | 6–510 kbps (adaptive) | ~30–40 kbps typical |
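The per-call figures in the table can be roughly reproduced with simple arithmetic. The sketch below assumes a 20 ms packet interval, IPv4 + UDP + RTP headers (40 bytes), and 18 bytes of Ethernet framing per packet; those values and the function name are assumptions for illustration, and other link types or packet intervals will give different numbers.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP headers
ETHERNET_OVERHEAD_BYTES = 18            # assumed: 14-byte header + 4-byte FCS


def per_call_kbps(codec_kbps, ptime_ms=20, link_overhead=ETHERNET_OVERHEAD_BYTES):
    """One-direction bandwidth for a call, including per-packet headers."""
    packets_per_second = 1000 / ptime_ms
    payload_bytes = codec_kbps * 1000 / 8 * ptime_ms / 1000
    packet_bytes = payload_bytes + IP_UDP_RTP_HEADER_BYTES + link_overhead
    return packet_bytes * 8 * packets_per_second / 1000


g711 = per_call_kbps(64)   # 160-byte payload per packet
g729 = per_call_kbps(8)    # 20-byte payload per packet
```

Under these assumptions G.711 comes out at about 87 kbps and G.729 at about 31 kbps each way, which is where the table's rounded values come from. Note how the fixed header cost dominates for low-bitrate codecs.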
Components of a VoIP network
A working VoIP deployment is more than “the internet.” The pieces:
- VoIP endpoints: IP phones and softphones (apps) that originate and receive calls.
- An IP-PBX or call-control server: routes calls internally and connects to the outside world (or a cloud platform that does this for you).
- Network hardware: routers and switches that should prioritize voice traffic (QoS).
- A Session Border Controller (SBC): sits at the edge for security, NAT traversal, and trunk control.
- A connection to the PSTN: a SIP trunk or Voice API that reaches real phone numbers.
The 5 types of VoIP
- ATA / adapter-based. A small box (an Analog Telephone Adapter) that plugs a legacy analog phone into VoIP. It is the cheapest way to keep familiar handsets.
- Softphone. An app on a laptop or desktop (a headset plus software), popular for remote teams.
- Mobile VoIP. Calling apps on a smartphone over Wi-Fi or data.
- Cloud PBX / business VoIP. A hosted phone system for an organization: extensions, IVR, voicemail, and recording.
- Programmable / API VoIP. Voice built into your own software via APIs and VobizXML. This is the category that powers contact centres, campaigns, and voice AI agents, and it’s where Vobiz lives.
What equipment do you need to set up VoIP?
- A stable internet connection (~100 kbps of headroom per concurrent call).
- An endpoint: an IP phone, a softphone app, an ATA for legacy handsets, or, for programmable VoIP, just your application and an API key.
- A provider: a SIP trunk, a cloud PBX, or a Voice API.
- (For production) a Session Border Controller and encryption (SRTP/TLS) to secure trunks and block toll fraud.
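The ~100 kbps-per-call headroom figure above turns into a quick capacity estimate. This is a rough sketch: the function name and the `voice_share` parameter (the fraction of the link you reserve for voice) are hypothetical, and the limit that matters is usually the slower direction of the connection, often the uplink.

```python
def max_concurrent_calls(uplink_kbps, per_call_kbps=100, voice_share=1.0):
    """Estimate how many simultaneous calls fit in the slower direction of a link.

    voice_share is a hypothetical knob: the fraction of the link kept for voice.
    """
    usable_kbps = uplink_kbps * voice_share
    return int(usable_kbps // per_call_kbps)


calls_on_10_mbps = max_concurrent_calls(10_000)                  # whole uplink for voice
calls_with_half = max_concurrent_calls(10_000, voice_share=0.5)  # half kept for data
```

On a 10 Mbps uplink this gives about 100 calls with the whole link devoted to voice, or about 50 if half is kept for other traffic. Raw capacity is only part of the picture; latency and jitter still decide how those calls sound.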
VoIP call quality and QoS
Because voice rides a shared network, quality depends on three things: latency (delay), jitter (variation in packet arrival), and packet loss. To keep calls crisp:
- Apply Quality of Service (QoS) on your network so voice packets get priority over bulk data.
- Use wideband codecs (Opus) where possible for HD audio.
- Choose a low-latency provider: a single-hop, direct-carrier path beats public-internet routing. (Vobiz runs sub-80 ms single-hop vs the 300–400 ms many legacy platforms hit.)
- Monitor: watch MOS, jitter, and loss in your call logs.
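Two of the metrics worth monitoring, jitter and packet loss, can be derived from RTP packet arrivals. The sketch below uses the running interarrival-jitter estimate defined in RFC 3550 (each new deviation is smoothed in with a gain of 1/16) and a simple loss percentage from sequence numbers. The function names and sample data are hypothetical, an 8 kHz RTP clock is assumed, and sequence-number wraparound is ignored for brevity.

```python
def interarrival_jitter_ms(packets, clock_rate_hz=8000):
    """Running jitter estimate in the style of RFC 3550.

    packets: list of (arrival_time_seconds, rtp_timestamp) in arrival order.
    """
    jitter = 0.0
    prev_transit = None
    for arrival_s, rtp_timestamp in packets:
        # Relative transit time, measured in RTP timestamp units.
        transit = arrival_s * clock_rate_hz - rtp_timestamp
        if prev_transit is not None:
            deviation = abs(transit - prev_transit)
            jitter += (deviation - jitter) / 16
        prev_transit = transit
    return jitter / clock_rate_hz * 1000


def loss_percent(sequence_numbers):
    """Share of packets missing between the lowest and highest sequence seen."""
    expected = max(sequence_numbers) - min(sequence_numbers) + 1
    received = len(set(sequence_numbers))
    return 100 * (expected - received) / expected


steady = [(i * 0.02, i * 160) for i in range(10)]   # packets exactly 20 ms apart
one_late = [(0.0, 0), (0.02, 160), (0.05, 320)]     # third packet 10 ms late
steady_jitter = interarrival_jitter_ms(steady)
late_jitter = interarrival_jitter_ms(one_late)
loss = loss_percent([0, 1, 2, 4, 5, 6, 7, 8, 9])    # packet 3 never arrived
```

Perfectly spaced packets give a jitter of essentially zero, a single packet arriving 10 ms late nudges the estimate to 0.625 ms, and one missing packet out of ten is a 10% loss.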
VoIP vs landline (PSTN)
| | Landline (PSTN) | VoIP |
|---|---|---|
| Network | Circuit-switched | Packet-switched (IP) |
| Cost | Per-line, distance-sensitive | Low, distance-agnostic |
| Setup time | Days–weeks, on-site | Minutes, self-serve |
| Mobility | Fixed location | Any device, anywhere |
| Scaling | Add physical lines | Provision in software |
| Number types | Local only | Local, mobile, toll-free, global |
| Programmability | None | Routing, IVR, recording, streaming via API |
| HD audio | No (8 kHz) | Yes (wideband / 24 kHz with Opus) |
| Analytics | None | Recording, transcription, call analytics |
| Power outage | Works (line-powered) | Needs power + internet |
| Emergency (E911) | Precise location | Location must be registered |
Advantages of VoIP
- Lower cost: one shared network for voice and data slashes communication costs.
- Global reach: numbers and routing anywhere, no distance premium.
- Programmability: embed calling in apps; add IVR, recording, transfer, and real-time streaming.
- HD audio: wideband codecs beat the 8 kHz landline ceiling, which also helps speech recognition.
- Elastic scaling: add capacity in software for spiky or seasonal traffic.
- Feature-rich: voicemail, conferencing, analytics, and more, all in software.
Disadvantages of VoIP (and how they’re solved)
- Power + internet dependency. Unlike a line-powered landline, VoIP needs both. Mitigation: UPS, failover internet, mobile fallback.
- Latency & jitter. Packets can arrive late or out of order, causing lag or choppiness. Mitigation: low-latency carrier routing, a jitter buffer, and QoS to prioritize voice. That is the difference between a 300 ms legacy path and a sub-80 ms one.
- Security. An unprotected trunk on the internet is exposed to toll fraud and eavesdropping. Mitigation: encrypt signaling and media (TLS + SRTP), put an SBC in front, and use IP access control lists.
- Emergency-call location. VoIP numbers aren’t tied to a physical exchange, so E911/emergency location must be registered. Mitigation: register service addresses; use the local emergency framework.
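The IP access control list mentioned under Security boils down to checking a packet's source address against a short list of trusted networks before accepting SIP traffic from it. A minimal sketch follows; the networks use reserved documentation ranges and the function name is hypothetical. In practice this check normally lives in a firewall or SBC rather than in application code.

```python
import ipaddress

# Hypothetical allow-list: a provider's signaling range and one office address.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.10/32"),
]


def is_allowed(source_ip):
    """Return True if the source address falls inside any allowed network."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


trusted = is_allowed("203.0.113.45")     # inside the /24
stranger = is_allowed("198.51.100.11")   # one address off the allowed /32
```

An allow-list like this complements TLS and SRTP rather than replacing them: it cuts down who can reach the trunk at all, while encryption protects the traffic of those who can.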
VoIP, SIP, and SIP trunking: how they relate
These terms get used interchangeably but aren’t the same. VoIP is the broad category (voice over IP). SIP is the signaling protocol most VoIP uses to set up calls. A SIP trunk is one productized VoIP service: the connection that links your phone system or app to the public telephone network. Some VoIP uses WebRTC or WebSocket streaming instead of SIP, for example in browser-based calling. For the full comparison, see SIP vs VoIP.
How much does VoIP cost?
VoIP pricing is usually per-minute or per-seat, far below legacy per-line plus long-distance models. The exact number depends on destination, volume, and features, and many providers price in USD with asymmetric inbound/outbound rates. Vobiz keeps it flat and INR-native: ₹0.65/min (65 paise) for both inbound and outbound, with enterprise pricing above ~50,000 minutes a month. Compared with a traditional PRI (per-circuit fees, long-distance charges, and on-site maintenance), VoIP’s pay-as-you-go model is dramatically cheaper to start and scale.
VoIP use cases
- Cloud contact centres: route, queue, record, and analyze at scale.
- Remote and distributed teams: softphones and mobile apps replace desk lines.
- Click-to-call in apps and websites, via WebRTC.
- Voice AI agents: connect Vapi/Retell/ElevenLabs/Pipecat to real numbers over SIP or WebSocket.
- Notifications, OTP, and reminders: programmable outbound calls.
VoIP for voice AI: what actually matters
When VoIP carries a voice AI agent, the priorities shift from “cheap minutes” to conversation quality in real time:
- Latency budget. A natural turn has to fit under ~1 second across telephony + STT + LLM + TTS. Legacy VoIP at 300–400 ms eats that budget before the model runs.
- Audio fidelity. 24 kHz wideband audio (Opus) gives speech-to-text more signal than the 8 kHz norm.
- Barge-in & streaming. The caller has to be able to interrupt; that needs bidirectional audio streaming, not record-then-process.
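The latency budget above is plain addition, and writing it out shows how much the telephony leg matters. In this sketch the ~1 second budget and the telephony figures (350 ms as a midpoint of the legacy range, 80 ms for a low-latency path) come from the text; the STT, LLM, and TTS timings are hypothetical placeholders, so substitute your own measurements.

```python
TURN_BUDGET_MS = 1000   # ~1 second per conversational turn, from the text


def remaining_budget_ms(telephony_ms, stt_ms, llm_ms, tts_ms,
                        budget_ms=TURN_BUDGET_MS):
    """Milliseconds left in the turn budget (negative means over budget)."""
    return budget_ms - (telephony_ms + stt_ms + llm_ms + tts_ms)


# Hypothetical stage timings for illustration only.
STAGES = {"stt_ms": 200, "llm_ms": 400, "tts_ms": 150}

legacy = remaining_budget_ms(telephony_ms=350, **STAGES)
low_latency = remaining_budget_ms(telephony_ms=80, **STAGES)
```

With these placeholder timings the legacy path overshoots the budget by 100 ms, while the low-latency path leaves 170 ms to spare.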
How Vobiz delivers VoIP
Vobiz is VoIP infrastructure built for voice AI, not retrofitted from a BPO-era stack:
- Sub-80 ms latency on a single-hop, event-driven architecture with direct carrier connect (vs 300–400 ms legacy).
- AI-native media: bidirectional 24 kHz audio streaming with native noise cancellation.
- Instant eKYC provisioning: API key to live call in minutes (not 4–8 weeks); DIDs in 130+ countries, outbound to 190+, all number types via SIP trunking or BYOC.
- Programmable: routing, IVR, recording, transfer, and a full Voice API.
- Secure & reliable: SRTP/TLS 1.3; 99.99% uptime; 4.2+ MOS at 3M+ calls/day; flat ₹0.65/min.
- It powers your stack, not a locked-in agent: voice-AI builders like Bolna, fintechs like Razorpay and Acko, and enterprises like KPMG run on Vobiz.
Frequently asked questions
What does VoIP stand for?
VoIP stands for Voice over Internet Protocol, carrying voice calls as data over IP networks instead of the traditional circuit-switched telephone network.
How much internet speed does VoIP need?
Budget roughly 100 kbps per concurrent call for a clean G.711 call (less with Opus). Consistent low latency and low jitter matter more than raw bandwidth.
Is VoIP the same as SIP?
No. VoIP is the broad category; SIP is the signaling protocol most VoIP uses to set up calls. SIP is one part of how VoIP works.
Does VoIP work in a power cut?
Not on its own: VoIP needs both power and internet. Use a UPS, backup connectivity, or mobile fallback. A line-powered landline keeps working; that’s its one durable edge.
What equipment do I need for VoIP?
An internet connection, an endpoint (IP phone, softphone, or ATA, or just your app for programmable VoIP), and a provider (SIP trunk, cloud PBX, or Voice API).
What codecs does VoIP use?
Common ones are G.711 (64 kbps, PSTN-grade), G.729 (8 kbps, bandwidth-saving), and Opus (6–510 kbps, adaptive, AI-grade). Opus is preferred for high-fidelity, low-latency calls.
Is VoIP secure?
It can be, with TLS for signaling, SRTP for media, IP access control lists, and a Session Border Controller. Unencrypted VoIP on the public internet is exposed to fraud and eavesdropping.
Is VoIP good enough for a voice AI agent?
Yes, if the latency and audio are right. Prioritize sub-80 ms telephony, 24 kHz audio, and bidirectional streaming so the conversation feels natural in real time.
Further reading on Vobiz
- What is SIP? · SIP vs VoIP · What is SIP trunking? · What is a Voice API?
- SIP trunking overview · Audio streaming · VobizXML, how it works
- Cloud IVR · Call recording · WebRTC integration
Sources
- Wikipedia, “Voice over IP”.
- IETF, “SIP: Session Initiation Protocol” (RFC 3261), June 2002.
- Wikipedia, “Session Initiation Protocol”.