A minimal Bun server that answers a Vobiz call with `<Stream>` XML, accepts the inbound WebSocket, and writes each call’s audio to a local WAV file. Use it as a sink-only reference when bringing up a new agent stack; replace the WAV writer with your own STT/LLM/TTS pipeline once frames are flowing.
Run the example
`bun install && bun start`. The server listens on port 3000 by default.
What it does
The server exposes three endpoints:

| Endpoint | Purpose |
|---|---|
| `GET /` | Returns VobizXML containing a `<Stream>` element pointing at `/stream`. |
| `WS /stream` | Receives `start` / `media` / `stop` JSON frames; writes audio to `recordings/call-<timestamp>.wav`. |
| `POST /webhook` | Receives status callbacks (`StartStream`, `MediaError`, `StopStream`, …) and appends each to `webhook-events.log`. |
This example only receives audio: it never sends `playAudio` / `checkpoint` / `clearAudio` / `stop` packets back to Vobiz. The full bidirectional protocol is documented in Stream Events, Play Audio, Checkpoint Event, and Clear Audio; extend `index.js` with those once you wire in an agent.
Run it
index.js:
VobizXML answer response
GET / returns:
| Attribute | Purpose |
|---|---|
| `bidirectional="true"` | Caller audio streams to your server and your server can stream audio back. |
| `keepCallAlive="true"` | Keeps the call up while no further XML is executing; required for long-running agent sessions. |
| `statusCallbackUrl` | Vobiz POSTs lifecycle events here (`StartStream`, `MediaError`, `StopStream`, …). |
See Initiate a Stream for the full list of `<Stream>` attributes.
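As a rough illustration of how the three attributes in the table fit together, the sketch below builds an answer document as a string. It is not the example's actual response: the helper names, the `example.com` URLs, and the choice to carry the WebSocket URL as the text content of `<Stream>` are all assumptions, so check the `<Stream>` reference before relying on it.

```javascript
// Hypothetical helper: escape characters that are unsafe in XML text/attributes.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: build the answer XML.
// ASSUMPTION: the WebSocket URL is the text content of <Stream>.
function buildAnswerXml({ streamUrl, statusCallbackUrl }) {
  const attrs = [
    'bidirectional="true"',
    'keepCallAlive="true"',
    `statusCallbackUrl="${escapeXml(statusCallbackUrl)}"`,
  ].join(" ");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    `  <Stream ${attrs}>${escapeXml(streamUrl)}</Stream>`,
    "</Response>",
  ].join("\n");
}

// Placeholder hosts; substitute your own public URLs.
const xml = buildAnswerXml({
  streamUrl: "wss://example.com/stream",
  statusCallbackUrl: "https://example.com/webhook",
});
console.log(xml);
```

Escaping the URLs matters as soon as a callback URL carries a query string, since a bare `&` would make the XML invalid.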
What the handler does with each frame
- `GET /` returns the `<Response><Stream>…</Stream></Response>` XML.
- On WebSocket `/stream` open, allocates `recordings/call-<ISO timestamp>.wav`.
- On every `media` frame, base64-decodes `media.payload`, byte-swaps L16 → WAV PCM little-endian, and appends to the buffer.
- On close, flushes the buffered PCM to disk as a mono 16-bit WAV.
- On `POST /webhook`, appends each status event to `webhook-events.log` as one JSON line.
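The decode, byte-swap, and flush steps above can be sketched as three small functions. This is an illustration under stated assumptions rather than the contents of `index.js`: the function names are hypothetical, and the header layout is the standard 44-byte RIFF/WAVE header for mono 16-bit PCM.

```javascript
// Hypothetical helper: swap each byte pair, big-endian L16 -> little-endian PCM.
// A trailing odd byte, if any, is dropped.
function swap16(bytes) {
  const out = new Uint8Array(bytes.length - (bytes.length % 2));
  for (let i = 0; i < out.length; i += 2) {
    out[i] = bytes[i + 1];
    out[i + 1] = bytes[i];
  }
  return out;
}

// Hypothetical helper: base64-decode one media.payload and return LE PCM.
function decodeMediaPayload(base64Payload) {
  return swap16(Buffer.from(base64Payload, "base64"));
}

// Hypothetical helper: prepend a 44-byte RIFF/WAVE header to mono 16-bit PCM.
function buildWav(pcmChunks, sampleRate = 8000) {
  const dataLength = pcmChunks.reduce((total, chunk) => total + chunk.length, 0);
  const wav = new Uint8Array(44 + dataLength);
  const view = new DataView(wav.buffer);
  const ascii = (offset, text) => {
    for (let i = 0; i < text.length; i++) wav[offset + i] = text.charCodeAt(i);
  };
  ascii(0, "RIFF");
  view.setUint32(4, 36 + dataLength, true); // RIFF chunk size
  ascii(8, "WAVE");
  ascii(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk size
  view.setUint16(20, 1, true);              // format 1 = linear PCM
  view.setUint16(22, 1, true);              // channels: mono
  view.setUint32(24, sampleRate, true);     // sample rate
  view.setUint32(28, sampleRate * 2, true); // byte rate = rate * 2 bytes
  view.setUint16(32, 2, true);              // block align
  view.setUint16(34, 16, true);             // bits per sample
  ascii(36, "data");
  view.setUint32(40, dataLength, true);     // data chunk size
  let offset = 44;
  for (const chunk of pcmChunks) {
    wav.set(chunk, offset);
    offset += chunk.length;
  }
  return wav;
}
```

In a socket handler, each `media` frame would push `decodeMediaPayload(frame.media.payload)` onto an array, and the close handler would pass that array to `buildWav` and write the result to disk, for instance with `Bun.write(path, wav)`.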
Sample console output
A live call produces one `media` frame every 20 ms. A typical mid-stream frame looks like:
Mid-stream media frame
The last media frame
When the caller hangs up, the final `media` frame arrives, the WebSocket closes, and the buffered PCM is flushed to a WAV. There is no special marker inside the last frame: its shape is identical to every other media frame, and stream end is signalled by the `stop` event (and the subsequent socket close), not by the frame contents.
In this real example, `sequenceNumber: 274` was the final frame. Its payload happens to be all 0xFF bytes (every L16 sample is 0xFFFF, i.e. -1, effectively silence as the carrier wound the call down), but that is a property of the audio at that instant, not an end-of-stream indicator:
Final media frame, immediately followed by socket close + WAV flush
Audio format
The example’s `<Stream>` element negotiates:
- Encoding: L16 (linear 16-bit PCM, network byte order)
- Sample rate: 8000 Hz
- Channels: Mono
- Chunk size: 20 ms (320 bytes per frame at 8 kHz L16)
To receive μ-law instead, add `contentType="audio/x-mulaw;rate=8000"` to the `<Stream>` tag; the JSON envelope is unchanged, only the bytes inside `media.payload` differ.
| Property | L16 | μ-law |
|---|---|---|
| `contentType` | `audio/l16` (this example) | `audio/x-mulaw;rate=8000` |
| Bit depth | 16-bit | 8-bit |
| Bytes per 20 ms frame | 320 | 160 |
| Byte order | Network (big-endian) — swap to little-endian for WAV PCM | n/a |
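The per-frame byte counts in the table follow from sample rate × frame duration × bytes per sample. A small sketch of that arithmetic (the function name is hypothetical):

```javascript
// Hypothetical helper: payload size in bytes for one frame of raw audio.
function bytesPerFrame(sampleRateHz, bitsPerSample, frameMs = 20, channels = 1) {
  const samplesPerFrame = (sampleRateHz * frameMs) / 1000; // 8000 Hz * 20 ms = 160 samples
  return samplesPerFrame * channels * (bitsPerSample / 8);
}

console.log(bytesPerFrame(8000, 16)); // L16:   160 samples * 2 bytes = 320
console.log(bytesPerFrame(8000, 8));  // μ-law: 160 samples * 1 byte  = 160
```

These are the decoded sizes; the base64 string inside `media.payload` is about a third longer.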
Output files
receivedAt timestamp.
Extending it
To turn this sink into a full agent, send these frames back over the same socket (see the linked protocol pages for the exact JSON shape):
- `playAudio`: queue a 20 ms chunk for playback.
- `checkpoint`: mark end of an utterance; Vobiz replies with `playedStream` once it’s actually played.
- `clearAudio`: drop queued audio on barge-in; Vobiz replies with `clearedAudio`.
- `stop`: terminate the call leg from the WebSocket side without a second REST round-trip. Equivalent REST option: `POST /audio-streams/.../stop`.
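Before anything can be sent as `playAudio`, outbound audio has to be cut into 20 ms chunks. The sketch below shows only that chunking step and deliberately leaves out the JSON envelope, whose shape is defined on the Play Audio page. The function name is hypothetical, and base64-encoding each chunk is an assumption carried over from the inbound `media.payload` format.

```javascript
// Hypothetical helper: split raw audio bytes into fixed-size frames and
// base64-encode each one. 320 bytes = 20 ms of 8 kHz L16 (use 160 for μ-law).
// A trailing partial frame is dropped; a caller could pad it with silence instead.
function toBase64Frames(audioBytes, frameBytes = 320) {
  const frames = [];
  for (let start = 0; start + frameBytes <= audioBytes.length; start += frameBytes) {
    const frame = audioBytes.subarray(start, start + frameBytes);
    frames.push(Buffer.from(frame).toString("base64"));
  }
  return frames;
}

// 700 bytes of silence -> two full 320-byte frames, 60 bytes left over.
const frames = toBase64Frames(new Uint8Array(700));
console.log(frames.length); // 2
```

Each resulting string would become the payload of one outbound frame once wrapped in the documented `playAudio` envelope.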
Related
- Initiate a Stream: full `<Stream>` XML reference.
- Stream events: every inbound JSON event.
- Vobiz + Pipecat: same idea, Python + Pipecat pipeline instead of Bun.
- WebSockets integration: protocol-level overview.