The Speak element converts text to speech (TTS) and plays it to the caller. The text to read is the element’s content; voice, language, and loop count are set as attributes. Speak runs to completion before Vobiz moves to the next element, and posts no parameters of its own to any URL.

Attributes

voice (string)
The voice used to read the text. Allowed values: WOMAN, MAN. Defaults to WOMAN.

language (string)
The language used to read the text. Allowed values: see the “Supported voices and languages” table below. Defaults to en-US.

loop (integer)
The number of times to speak the text. Allowed values: any integer >= 0, where 0 loops continuously. Defaults to 1.

Nesting rules

Speak takes plain text or SSML markup as its content. It cannot contain other elements. You can nest Speak inside Gather (to prompt the caller for input) and inside PreAnswer (to play speech before the call is answered).
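The minimal sketch below shows both nesting patterns in one response. Gather's own attributes are left out because they are documented with Gather, not with Speak. One example is where Gather sends the collected digits. Add them as your flow requires. The sketch also assumes the call is answered once Vobiz moves past PreAnswer to the next element. Treat that ordering as an assumption, not documented behavior.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <!-- Speak inside PreAnswer: played before the call is answered. -->
    <PreAnswer>
        <Speak>Please wait while we connect your call.</Speak>
    </PreAnswer>
    <!-- Speak inside Gather: prompts the caller while input is collected.
         Assumption: Gather attributes (such as where digits are sent)
         are omitted here; see the Gather reference. -->
    <Gather>
        <Speak>Press 1 for sales or 2 for support.</Speak>
    </Gather>
</Response>
```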

Examples

Speak a basic message

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Speak>Thank you for calling. Please hold while we connect you.</Speak>
</Response>

Choose a voice and language

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Speak voice="MAN" language="en-GB">Welcome to our London office.</Speak>
</Response>

Repeat a message in a loop

Set loop to repeat an announcement. Use loop="0" to repeat indefinitely (for example, a waiting-room message) until another event moves the call forward.
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Speak loop="3">Your call is important to us. Please stay on the line.</Speak>
</Response>
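For the indefinite case described above, set loop to 0. This is a sketch of a waiting-room announcement. What eventually stops the repetition depends on how your application handles the call, for example the caller hanging up or the call being moved on by another event. It is not something this element controls.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <!-- loop="0" repeats the text continuously until something outside
         this element ends or moves the call forward. -->
    <Speak loop="0">All of our agents are busy. Please stay on the line.</Speak>
</Response>
```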

Edge cases and tips

  • Escape special XML characters. Ampersands and angle brackets in the spoken text must be escaped as &amp;, &lt;, and &gt;. Unescaped characters cause an XML parsing error and the call fails.
  • A voice may not exist for every language. Not every language has both a WOMAN and a MAN voice (see the table below). If you request an unavailable combination, Vobiz falls back to the available voice for that language.
  • Speak vs Play. Use Speak for dynamic, per-call text (account balances, names, confirmation read-backs). Use Play for prerecorded audio when you need consistent quality, music, or branding.
  • Pronunciation control. For numbers, dates, currency, and pauses, use SSML to control how text is read aloud.
  • Keep prompts short. Callers lose attention after 15-20 seconds. Split long content into shorter Speak elements or break it up with menu prompts.
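The sketch below applies two of the tips above. The ampersand in a placeholder business name ("Smith &amp; Sons", chosen only for illustration) is escaped as &amp;amp; so the document parses. A longer announcement is also split into short Speak elements, which play one after another because each Speak runs to completion before the next element starts.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <!-- "&" must be written as &amp; or the XML fails to parse. -->
    <Speak>Thank you for calling Smith &amp; Sons.</Speak>
    <!-- Long content split into short prompts instead of one long Speak. -->
    <Speak>Our office is open Monday to Friday.</Speak>
    <Speak>For directions, please stay on the line.</Speak>
</Response>
```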

Supported voices and languages

Language (code)                 WOMAN  MAN
Danish (da-DK)                  yes    no
Dutch (nl-NL)                   yes    yes
English - Australian (en-AU)    yes    yes
English - British (en-GB)       yes    yes
English - USA (en-US)           yes    yes
French (fr-FR)                  yes    yes
French - Canadian (fr-CA)       yes    no
German (de-DE)                  yes    yes
Italian (it-IT)                 yes    yes
Polish (pl-PL)                  yes    yes
Portuguese (pt-PT)              no     yes
Portuguese - Brazilian (pt-BR)  yes    yes
Russian (ru-RU)                 yes    no
Spanish (es-ES)                 yes    yes
Spanish - USA (es-US)           yes    yes
Swedish (sv-SE)                 yes    no
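For example, this table lists fr-CA with a WOMAN voice only. Under the fallback behavior described in "Edge cases and tips", the request below for the MAN voice would be read with the WOMAN voice instead. To avoid relying on the fallback, request a voice that the table lists as available for the language.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <!-- fr-CA has no MAN voice in the table, so Vobiz falls back to
         the voice that is available for fr-CA (WOMAN). -->
    <Speak voice="MAN" language="fr-CA">Bienvenue à notre bureau.</Speak>
</Response>
```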