> ## Documentation Index
> Fetch the complete documentation index at: https://docs.vobiz.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Play Audio Event

> Use the Vobiz playAudio Stream event to transmit audio through the WebSocket connection back into a live call - ideal for AI voice agent responses.

<Info>
  **Prerequisites**

  * Stream element must have `bidirectional="true"`
  * Active WebSocket connection established by Vobiz
  * Audio must be base64-encoded in the correct format (PCM or μ-law)
</Info>

## Attributes

| Attribute                                   | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
| ------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `event` <br /> *string* <br /> **Required** | Indicates the event type. `playAudio` is the value required to transmit audio over the WebSocket.                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| `media` <br /> *object* <br /> **Required** | An object containing media metadata and payload. <br /><br />**Media Object Properties:** <br />- **contentType** (string, required): The audio codec format <br />  - `audio/x-l16` - Linear PCM 16-bit <br />  - `audio/x-mulaw` - μ-law encoded <br />- **sampleRate** (integer, required): Sample rate of the audio transmitted <br />  - `8000` - 8kHz (standard telephony) <br />  - `16000` - 16kHz (wideband) <br />  - `24000` - 24kHz (high quality) <br />- **payload** (string, required): Base64-encoded string of raw audio |

## Audio format requirements

### Supported formats

**Linear PCM (audio/x-l16)**

* **Format:** 16-bit linear PCM
* **Endianness:** Little-endian
* **Sample rates:** 8000 Hz, 16000 Hz, or 24000 Hz
* **Channels:** Mono (1 channel)
* **Encoding:** Base64

**μ-law (audio/x-mulaw)**

* **Format:** μ-law compressed
* **Sample rate:** 8000 Hz
* **Channels:** Mono (1 channel)
* **Encoding:** Base64
* **Use case:** Lower bandwidth

<Warning>
  **Important:**

  * Audio must be raw PCM or μ-law data (no WAV headers or file containers)
  * The contentType and sampleRate must match your actual audio data **and** the format Vobiz negotiated for the call (see `start.mediaFormat`)
  * For best quality, use 16kHz or 24kHz sample rate with audio/x-l16
  * Each playAudio event should contain a reasonable chunk size (typically 20-60ms of audio)
</Warning>

### Endianness

For `audio/x-l16`, the audio you send in `playAudio` must be **little-endian** 16-bit PCM. Note the asymmetry with the receive path: inbound `media` frames Vobiz sends you are **big-endian (network byte order)**. If you echo inbound audio back, byte-swap it first.

### Frame size cheat sheet

A 20 ms chunk is the sweet spot - small enough for responsive [barge-in](/xml/stream/clear-audio), large enough to avoid per-message overhead.

| Format                 | Bytes / 20 ms | Bytes / sample |
| ---------------------- | ------------- | -------------- |
| `audio/x-mulaw` @ 8000 | 160           | 1              |
| `audio/x-l16` @ 8000   | 320           | 2              |
| `audio/x-l16` @ 16000  | 640           | 2              |
| `audio/x-l16` @ 24000  | 960           | 2              |

Formula: `bytes = sampleRate × bytesPerSample × seconds`. For L16, `bytesPerSample = 2`; for μ-law, `1`.

<Note>
  **Matching `start.mediaFormat`.** The `start` event reports the negotiated format as `start.mediaFormat.encoding` (e.g. `audio/x-l16`) and `start.mediaFormat.sampleRate` (e.g. `8000`). Use those exact values in every `playAudio.media`. Sending a different rate or codec produces garbled or chipmunk/slow-motion audio - Vobiz does not resample for you.
</Note>

## Examples

### Request format

```json playAudio event with 8kHz PCM audio theme={null}
{
  "event": "playAudio",
  "media": {
    "contentType": "audio/x-l16",
    "sampleRate": 8000,
    "payload": "base64 encoded raw audio..."
  }
}
```

### Node.js implementation

```javascript Generate and send audio from text theme={null}
const WebSocket = require('ws');
const fs = require('fs');

// Example: Convert text to speech and send via WebSocket
async function textToSpeechAndPlay(ws, text) {
  try {
    // 1. Generate audio from text (using TTS service)
    const audioBuffer = await generateSpeech(text, {
      format: 'pcm',
      sampleRate: 8000,
      bitDepth: 16
    });

    // 2. Convert to base64
    const audioBase64 = audioBuffer.toString('base64');

    // 3. Send playAudio event
    const playAudioEvent = {
      event: 'playAudio',
      media: {
contentType: 'audio/x-l16',
sampleRate: 8000,
payload: audioBase64
      }
    };

    ws.send(JSON.stringify(playAudioEvent));
    console.log('Sent audio:', text);

  } catch (error) {
    console.error('Error generating/sending audio:', error);
  }
}

// Example: Stream audio from file in chunks
function streamAudioFile(ws, filepath) {
  // Read raw PCM audio file
  const audioData = fs.readFileSync(filepath);

  // Calculate chunk size (e.g., 20ms of audio)
  const sampleRate = 8000;
  const bytesPerSample = 2; // 16-bit = 2 bytes
  const chunkDuration = 0.020; // 20ms
  const chunkSize = Math.floor(sampleRate * bytesPerSample * chunkDuration);

  // Send audio in chunks
  for (let i = 0; i < audioData.length; i += chunkSize) {
    const chunk = audioData.slice(i, i + chunkSize);
    const chunkBase64 = chunk.toString('base64');

    const playAudioEvent = {
      event: 'playAudio',
      media: {
contentType: 'audio/x-l16',
sampleRate: 8000,
payload: chunkBase64
      }
    };

    ws.send(JSON.stringify(playAudioEvent));

    // Small delay to match real-time playback
    // In production, you'd time this more precisely
  }

  console.log('Finished streaming audio file');
}

// Mock TTS function (replace with actual TTS service)
async function generateSpeech(text, options) {
  // This would call your TTS service (e.g., Google TTS, AWS Polly, etc.)
  // and return raw PCM audio data
  return Buffer.from('...'); // Placeholder
}
```

### Python implementation

```python Send audio using asyncio WebSocket theme={null}
import asyncio
import websockets
import json
import base64

async def play_audio(websocket, audio_bytes, sample_rate=8000, content_type='audio/x-l16'):
    """Send playAudio event to Vobiz"""
    # Encode audio as base64
    audio_base64 = base64.b64encode(audio_bytes).decode('utf-8')

    # Create playAudio event
    play_audio_event = {
        'event': 'playAudio',
        'media': {
            'contentType': content_type,
            'sampleRate': sample_rate,
            'payload': audio_base64
        }
    }

    # Send to Vobiz
    await websocket.send(json.dumps(play_audio_event))
    print(f"Sent {len(audio_bytes)} bytes of audio")

async def text_to_speech_and_play(websocket, text):
    """Generate speech from text and play it"""
    # 1. Generate audio from text (using TTS)
    audio_bytes = await generate_speech(text, sample_rate=8000)

    # 2. Send to Vobiz
    await play_audio(websocket, audio_bytes, sample_rate=8000)

async def stream_audio_file(websocket, filepath):
    """Stream audio file in chunks"""
    chunk_size = 320  # 20ms at 8kHz, 16-bit = 320 bytes

    with open(filepath, 'rb') as f:
while True:
chunk = f.read(chunk_size)
if not chunk:
                break

await play_audio(websocket, chunk, sample_rate=8000)

            # Delay to simulate real-time playback
await asyncio.sleep(0.020)  # 20ms

    print("Finished streaming audio file")

async def generate_speech(text, sample_rate=8000):
    """Generate speech from text (mock)"""
    # This would call your TTS service
    # (e.g., Google TTS, Amazon Polly, gTTS, pyttsx3, etc.)
    # and return raw PCM audio bytes

    # Placeholder - replace with actual TTS
    return b'...'  # Raw PCM audio bytes

# Example usage in your WebSocket handler
async def handle_stream(websocket, path):
    async for message in websocket:
data = json.loads(message)

if data['event'] == 'start':
            # Play greeting when stream starts
await text_to_speech_and_play(
                websocket,
                "Hello! Welcome to Vobiz audio streaming."
            )

elif data['event'] == 'media':
            # Process incoming audio from call
pass
```

### Converting WAV to PCM

```javascript Convert WAV file to raw PCM for playAudio theme={null}
const fs = require('fs');
const wav = require('wav');

function convertWavToPCM(wavFilePath) {
  return new Promise((resolve, reject) => {
    const reader = new wav.Reader();
    const chunks = [];

    reader.on('format', (format) => {
      console.log('WAV Format:', format);

      // Ensure it's the right format
      if (format.audioFormat !== 1) { // 1 = PCM
reject(new Error('Only PCM WAV files are supported'));
return;
      }

      if (format.channels !== 1) {
reject(new Error('Only mono audio is supported'));
return;
      }

      if (![8000, 16000, 24000].includes(format.sampleRate)) {
reject(new Error('Sample rate must be 8000, 16000, or 24000'));
return;
      }
    });

    reader.on('data', (chunk) => {
      chunks.push(chunk);
    });

    reader.on('end', () => {
      const pcmData = Buffer.concat(chunks);
      resolve(pcmData);
    });

    reader.on('error', reject);

    fs.createReadStream(wavFilePath).pipe(reader);
  });
}

// Usage
async function playWavFile(ws, wavFilePath) {
  try {
    // Convert WAV to raw PCM
    const pcmData = await convertWavToPCM(wavFilePath);

    // Convert to base64
    const audioBase64 = pcmData.toString('base64');

    // Send playAudio event
    ws.send(JSON.stringify({
      event: 'playAudio',
      media: {
contentType: 'audio/x-l16',
sampleRate: 8000, // or 16000 depending on your file
payload: audioBase64
      }
    }));

    console.log('WAV file sent successfully');
  } catch (error) {
    console.error('Error playing WAV file:', error);
  }
}
```

## Best practices

### Match audio format to ContentType

Ensure your audio data format exactly matches the contentType and sampleRate you specify. Mismatches will result in distorted or garbled audio.

### Use Appropriate Chunk Sizes

Send audio in reasonable chunks (20-60ms). Too small chunks add overhead; too large chunks increase latency. For 8kHz 16-bit audio, 20ms = 320 bytes.

### Remove file headers

Always send raw PCM or μ-law data without WAV headers or any file container metadata. Strip headers before base64-encoding.

### Use checkpoints to track playback

Send checkpoint events after your playAudio events to get acknowledgments when audio finishes playing. This helps synchronize your application logic. See [Checkpoint Event](/xml/stream/checkpoint-event).

### Let Vobiz buffer - don't pace by sleeping

Vobiz queues `playAudio` chunks and plays them to the caller at real-time rate. You can send a full utterance's chunks back-to-back without `await sleep(20ms)` between them - the queue handles pacing. Pacing matters only when streaming from a file in the *receive-and-relay* direction; for TTS output, push as fast as your TTS produces audio. Keeping the queue full also means a [`clearAudio`](/xml/stream/clear-audio) barge-in can flush everything that hasn't reached the caller yet.

### Ordering with stop

After you send a [`stop`](/xml/stream/stop-event) event the WebSocket closes immediately. Any `playAudio` you send after `stop` is dropped. To play a farewell, send the audio, send a `checkpoint`, wait for `playedStream`, *then* send `stop`.
