What Happens
When Vobiz encounters a Stream element in your XML response:
- Vobiz initiates a WebSocket connection to your specified URL
- Once connected, raw audio packets are streamed in real-time
- Your application can process, analyze, or forward the audio
- For bidirectional streams, your app can also send audio back to the call
XML Setup
To initiate an audio stream, include the Stream element in your XML response with the WebSocket URL:
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Stream
    bidirectional="true"
    audioTrack="inbound"
    streamTimeout="7200"
    keepCallAlive="true">
    wss://stream.vobiz.ai/stream
  </Stream>
</Response>
Understanding keepCallAlive
Without keepCallAlive="true": The <Stream> element starts the background stream process and finishes immediately. Since there are no more elements in your XML response, the call hits “End Of XML Instructions” and hangs up instantly.
With keepCallAlive="true": The system explicitly waits for the stream to finish (disconnect/error/timeout) before moving on or hanging up. This keeps the call open without needing a <Pause> element.
Note: keepCallAlive requires bidirectional="true". Use this attribute instead of <Pause> for a cleaner implementation.
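If your application generates the XML response in code rather than serving a static file, a small builder keeps the attributes and the URL in one place. The following is a minimal sketch using Python's standard library; the function name build_stream_response is hypothetical and not part of any Vobiz SDK.

```python
import xml.etree.ElementTree as ET


def build_stream_response(ws_url: str, **attrs: str) -> str:
    """Return a <Response> document containing a single <Stream> element.

    Hypothetical helper: attribute names are passed through unchanged,
    so the caller is responsible for using names Vobiz understands.
    """
    response = ET.Element("Response")
    stream = ET.SubElement(response, "Stream", attrs)
    stream.text = ws_url  # the WebSocket URL is the element's text content
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


xml_doc = build_stream_response(
    "wss://stream.vobiz.ai/stream",
    bidirectional="true",
    audioTrack="inbound",
    streamTimeout="7200",
    keepCallAlive="true",  # keeps the call open while the stream runs
)
print(xml_doc)
```

Return the resulting string as the body of your answer URL response. Using an XML library rather than string concatenation means any special characters in attribute values are escaped for you.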
Key Configuration Parameters
- WebSocket URL: The text content of the element (e.g., wss://stream.vobiz.ai/stream). Must be publicly reachable; use wss:// (TLS) in production - ws:// is for local testing only.
- bidirectional: Set to true to enable sending audio back to the call. Requires audioTrack="inbound" (or omit it).
- audioTrack: inbound (caller), outbound (callee), or both. Default inbound. Do not combine both/outbound with bidirectional="true".
- contentType: Negotiated codec + rate - one of audio/x-l16;rate=8000 (default), audio/x-l16;rate=16000, audio/x-l16;rate=24000, audio/x-mulaw;rate=8000. Echoed back in start.mediaFormat.
- streamTimeout: Maximum streaming duration in seconds (default: 86400 / 24 hours). When reached, Vobiz stops the stream and (for server-side termination) fires Event=StopStream.
- keepCallAlive: Set to true to keep the call open while the stream is running: Vobiz waits for the stream to finish (disconnect, error, or timeout) before moving on or hanging up. Requires bidirectional="true".
- maxRetries: Reconnect attempts if the WebSocket fails to open or drops mid-stream. Default 0 (disabled), max 10.
- statusCallbackUrl / extraHeaders: See the full attribute reference on the <Stream> element page.
For the complete attribute table, allowed values, and the HTTP status-callback payloads, see Stream XML element.
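The constraints between these attributes are easy to get wrong, so it can help to check a configuration before returning the XML. The sketch below encodes only the rules stated in the list above; the function name validate_stream_attrs is hypothetical, and treating an omitted bidirectional attribute as "not bidirectional" is an assumption.

```python
ALLOWED_CONTENT_TYPES = {
    "audio/x-l16;rate=8000",
    "audio/x-l16;rate=16000",
    "audio/x-l16;rate=24000",
    "audio/x-mulaw;rate=8000",
}


def validate_stream_attrs(attrs: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means no rule was violated."""
    problems = []
    # Assumption: an omitted bidirectional attribute means one-way streaming.
    bidirectional = attrs.get("bidirectional") == "true"
    audio_track = attrs.get("audioTrack", "inbound")  # documented default

    if audio_track not in {"inbound", "outbound", "both"}:
        problems.append(f"unknown audioTrack: {audio_track!r}")
    if bidirectional and audio_track != "inbound":
        problems.append('bidirectional="true" requires audioTrack="inbound"')
    if attrs.get("keepCallAlive") == "true" and not bidirectional:
        problems.append('keepCallAlive="true" requires bidirectional="true"')

    content_type = attrs.get("contentType", "audio/x-l16;rate=8000")
    if content_type not in ALLOWED_CONTENT_TYPES:
        problems.append(f"unsupported contentType: {content_type!r}")

    max_retries = attrs.get("maxRetries", "0")
    if not max_retries.isdigit() or int(max_retries) > 10:
        problems.append("maxRetries must be an integer from 0 to 10")
    return problems


print(validate_stream_attrs({"bidirectional": "true", "audioTrack": "both"}))
```

An empty list does not guarantee that Vobiz accepts the configuration; it only means none of the rules listed on this page were broken.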
WebSocket Connection
Your WebSocket server must be ready to accept connections from Vobiz. Here’s what the initial connection looks like:
Connection Start Message
Vobiz sends this when the stream starts
{
  "sequenceNumber": 0,
  "event": "start",
  "start": {
    "callId": "5401fd2e-6344-40df-a22c-c8ffea7a92e7",
    "streamId": "c4dfd815-a92a-4140-ab85-5ff28c004116",
    "accountId": "500025",
    "tracks": ["inbound"],
    "mediaFormat": {
      "encoding": "audio/x-l16",
      "sampleRate": 16000
    }
  },
  "extra_headers": "{}"
}
This frame fires once, immediately after the WebSocket opens. The call/stream identifiers live inside the nested start object - not at the top level. start.mediaFormat mirrors the contentType and sample rate negotiated on the <Stream> element.
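Because the identifiers are nested, it can be convenient to flatten the start frame into a typed object once and pass that around. The following sketch parses the sample frame shown above; the StreamStart class and parse_start function are hypothetical names, and the extra_headers field is deliberately left unparsed because its populated format is not described here.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamStart:
    call_id: str
    stream_id: str
    account_id: str
    tracks: tuple[str, ...]
    encoding: str
    sample_rate: int


def parse_start(raw: str) -> StreamStart:
    """Parse a start frame; identifiers live under the nested "start" key."""
    frame = json.loads(raw)
    if frame.get("event") != "start":
        raise ValueError(f"expected a start frame, got {frame.get('event')!r}")
    start = frame["start"]
    media_format = start["mediaFormat"]
    return StreamStart(
        call_id=start["callId"],
        stream_id=start["streamId"],
        account_id=start["accountId"],
        tracks=tuple(start["tracks"]),
        encoding=media_format["encoding"],
        sample_rate=int(media_format["sampleRate"]),
    )


sample = json.dumps({
    "sequenceNumber": 0,
    "event": "start",
    "start": {
        "callId": "5401fd2e-6344-40df-a22c-c8ffea7a92e7",
        "streamId": "c4dfd815-a92a-4140-ab85-5ff28c004116",
        "accountId": "500025",
        "tracks": ["inbound"],
        "mediaFormat": {"encoding": "audio/x-l16", "sampleRate": 16000},
    },
    "extra_headers": "{}",
})
start_info = parse_start(sample)
print(start_info.call_id, start_info.encoding, start_info.sample_rate)
```

Keeping encoding and sample_rate from this frame matters because later media frames do not repeat them.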
Audio Data Messages
Continuous 20 ms audio frames sent during the stream
{
  "sequenceNumber": 2,
  "streamId": "c4dfd815-a92a-4140-ab85-5ff28c004116",
  "event": "media",
  "media": {
    "track": "inbound",
    "timestamp": "1778597597091",
    "chunk": 2,
    "payload": "base64-encoded-audio"
  },
  "extra_headers": "{}"
}
Vobiz emits ~50 media frames per second (one every 20 ms) while the call is up. media.payload is the base64-encoded raw audio in whatever encoding start.mediaFormat declared - there is no per-frame contentType or sampleRate. For L16, bytes are network byte order (big-endian); swap to little-endian before writing to a WAV file.
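To make the byte-order note concrete, the sketch below decodes an L16 payload, swaps each 16-bit sample to little-endian, and wraps the result in a WAV container using only the standard library. The helper names are hypothetical. Two assumptions are made: the audio is a single mono track, and each frame holds exactly 20 ms of audio, which would give 320 bytes per frame for L16 at 8000 Hz (160 samples × 2 bytes).

```python
import base64
import io
import wave


def frame_size_bytes(sample_rate: int, bytes_per_sample: int = 2, frame_ms: int = 20) -> int:
    """Expected payload size for one frame, assuming exactly frame_ms of audio."""
    return sample_rate * frame_ms // 1000 * bytes_per_sample


def l16_payload_to_little_endian(payload_b64: str) -> bytes:
    """Decode base64 L16 audio and swap each sample from big- to little-endian."""
    raw = base64.b64decode(payload_b64)
    if len(raw) % 2:
        raise ValueError("L16 payload must contain whole 16-bit samples")
    swapped = bytearray(len(raw))
    swapped[0::2] = raw[1::2]  # low byte first
    swapped[1::2] = raw[0::2]  # high byte second
    return bytes(swapped)


def pcm_to_wav(pcm_le: bytes, sample_rate: int) -> bytes:
    """Wrap little-endian 16-bit PCM in a WAV container (assumes mono)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_le)
    return buf.getvalue()


# Two big-endian samples: 0x0102 and 0x0304.
payload = base64.b64encode(bytes([0x01, 0x02, 0x03, 0x04])).decode()
pcm = l16_payload_to_little_endian(payload)
wav_bytes = pcm_to_wav(pcm, sample_rate=8000)
print(pcm.hex(), len(wav_bytes))
```

In a real handler you would append each decoded frame to a buffer or file and write the WAV header once, using the sample rate from start.mediaFormat. For audio/x-mulaw payloads the byte swap does not apply, since each sample is a single byte.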
End of stream
When the call ends, the sequence Vobiz sends over the WebSocket is:
- The final media frame (no special marker - its shape is identical to every other media frame).
- The WebSocket close event - this is the end-of-stream signal.
- Out-of-band: the hangup_url HTTP webhook fires with the full Event=Hangup payload (HangupCause, Duration, EndTime, etc).
Handle end of stream from your close handler
// streamId is assumed to have been saved from data.start.streamId when the start frame arrived.
ws.on('close', (code, reason) => {
  console.log(`Stream ended: ${streamId}`, code, reason?.toString());
  // Flush recording / transcripts / per-call state here.
});
End-of-stream triggers include: the caller hanging up, the carrier ending the call, streamTimeout being reached, or your own server sending a stop command.
Vobiz does not send an inbound { "event": "stop" }. The WebSocket simply closes after the final media frame - that is the only in-band signal. If you’re waiting for a stop JSON message on the socket, you’ll wait forever.
Event=StopStream on statusCallbackUrl is only delivered when the server initiates the termination via an outbound stop packet (it fires reliably in that case). It does not fire for caller-hangup or mid-stream-kill paths.
The authoritative “call is over” signal across all termination paths is the hangup_url HTTP webhook, which fires once with Event=Hangup plus the full hangup payload. Configure it on the REST call-create request.
stop does exist as an outbound (Server → Vobiz) command for agent-initiated hangup - see Server-initiated stop on the Stream events page.
Connection Flow
- Vobiz receives your XML response containing the Stream element
- WebSocket connection established to your specified URL
- “start” event sent with call metadata and stream configuration
- Continuous “media” events stream audio packets in real-time
- Bidirectional streams (optional): Your app can send playAudio events back to Vobiz
- WebSocket close - the final media frame is followed by a socket close (no inbound stop event)
- hangup_url webhook fires with the authoritative Event=Hangup payload
Reconnects and idempotency
If the WebSocket fails to open or drops mid-stream and you set maxRetries (1-10) on the <Stream> element, Vobiz retries the connection. Each retry is a fresh connection: it opens a new socket and replays a new start event with a new streamId (the callId stays the same).
Design your handler to be idempotent across reconnects:
- Key per-call state on start.callId, not start.streamId, so a reconnect resumes the same logical session.
- Expect to receive a second start after a drop. Re-send your greeting only if the conversation hadn’t progressed, or resume from saved state.
- A close you see may be a transient drop that Vobiz will retry, not the end of the call. The authoritative end-of-call signal is the hangup_url Event=Hangup webhook (see End of stream above).
With maxRetries="0" (the default), a dropped socket is terminal.
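One way to apply these guidelines is a small in-memory registry keyed on callId. The sketch below is illustrative only: CallSession, SessionRegistry, and the greeted flag are hypothetical names, and on_hangup is assumed to be called from whatever HTTP handler receives your hangup_url webhook.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CallSession:
    call_id: str
    stream_ids: list[str] = field(default_factory=list)
    greeted: bool = False  # lets a reconnect skip the greeting

    @property
    def reconnects(self) -> int:
        return max(len(self.stream_ids) - 1, 0)


class SessionRegistry:
    """Per-call state keyed on callId so a reconnect resumes the same session."""

    def __init__(self) -> None:
        self._sessions: dict[str, CallSession] = {}

    def on_start(self, frame: dict) -> CallSession:
        start = frame["start"]
        call_id = start["callId"]
        session = self._sessions.get(call_id)
        if session is None:
            session = self._sessions[call_id] = CallSession(call_id)
        if start["streamId"] not in session.stream_ids:
            session.stream_ids.append(start["streamId"])
        return session

    def on_hangup(self, call_id: str) -> Optional[CallSession]:
        # Only the hangup webhook ends the call; a socket close does not.
        return self._sessions.pop(call_id, None)


registry = SessionRegistry()
first = registry.on_start({"start": {"callId": "call-1", "streamId": "stream-a"}})
first.greeted = True
second = registry.on_start({"start": {"callId": "call-1", "streamId": "stream-b"}})
print(second is first, second.reconnects, second.greeted)
```

Because state is released only in on_hangup, a socket close followed by a retry finds the existing session intact. An in-memory dictionary works for a single process; a deployment with several workers would need shared storage instead.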
Implementation Examples
Prefer a runnable reference? See the Bun Media Stream Server - a minimal sink-only example that answers a call with <Stream>, records audio to WAV, and logs status callbacks. Drop in your own STT/LLM/TTS pipeline to extend it.
Node.js WebSocket Server
Simple WebSocket server to receive audio
const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });
wss.on('connection', (ws) => {
  console.log('Vobiz connected to audio stream');
  ws.on('message', (message) => {
    const data = JSON.parse(message);
    if (data.event === 'start') {
      console.log('Stream started:', data.start.streamId);
      console.log('Call ID:', data.start.callId);
    } else if (data.event === 'media') {
      // Process base64-encoded audio data (big-endian L16 for audio/x-l16)
      const audioBuffer = Buffer.from(data.media.payload, 'base64');
      console.log('Received audio chunk:', audioBuffer.length, 'bytes');
      // Your audio processing logic here
      // e.g., send to transcription service, save to file, etc.
    }
    // Note: Vobiz does NOT send an inbound { event: 'stop' }.
    // Detect end of stream via the 'close' event below.
  });
  ws.on('close', () => {
    // Canonical end-of-stream signal. Flush recordings/transcripts here.
    console.log('WebSocket connection closed');
  });
});
console.log('WebSocket server listening on port 8080');
Python WebSocket Handler
Python asyncio WebSocket server
import asyncio
import base64
import json

import websockets


# Recent websockets releases call the handler with the connection only;
# the older second `path` argument is no longer passed.
async def handle_audio_stream(websocket):
    print("Vobiz connected to audio stream")
    async for message in websocket:
        data = json.loads(message)
        if data['event'] == 'start':
            print(f"Stream started: {data['start']['streamId']}")
            print(f"Call ID: {data['start']['callId']}")
        elif data['event'] == 'media':
            # Decode base64 audio data (big-endian L16 for audio/x-l16)
            audio_bytes = base64.b64decode(data['media']['payload'])
            print(f"Received audio chunk: {len(audio_bytes)} bytes")
            # Your audio processing logic here
            # e.g., send to transcription, analysis, etc.
    # Note: Vobiz does NOT send an inbound 'stop' event.
    # The async-for loop exits when the socket closes - that is
    # your end-of-stream signal.
    print("WebSocket connection closed")


async def main():
    async with websockets.serve(handle_audio_stream, "0.0.0.0", 8080):
        print("WebSocket server listening on port 8080")
        await asyncio.Future()  # run forever


if __name__ == "__main__":
    asyncio.run(main())
Vobiz XML Response with Stream
Complete example with status callbacks
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Speak>This call is being recorded for quality assurance.</Speak>
  <Stream
    bidirectional="false"
    audioTrack="both"
    streamTimeout="3600"
    statusCallbackUrl="https://api.vobiz.ai/stream-status"
    statusCallbackMethod="POST"
    contentType="audio/x-mulaw;rate=8000"
    extraHeaders="session_id=abc123,agent_id=john">
    wss://stream.vobiz.ai/stream
  </Stream>
  <Dial>+14155551234</Dial>
</Response>
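This example streams both inbound and outbound audio as 8 kHz mu-law, stops streaming after at most one hour (streamTimeout="3600"), includes custom session headers, and sends status updates to your callback URL. Because audioTrack="both" is used, bidirectional is set to "false".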
Troubleshooting
Invalid Stream Configuration
Problem: If your call hangs up with End Of XML Instructions, you likely have an invalid configuration.
Solution: The media server does not support audioTrack="both" when bidirectional="true". You must set audioTrack="inbound" (or remove the attribute) for bidirectional streams.