# Gather XML Element – Collect DTMF & Speech Input

> Collect caller input via DTMF or speech recognition (ASR) in a live call. Configurable timeout, digit count, and action URL for dynamic IVR flows.

You can use the Gather XML element to collect user input through automatic speech recognition or DTMF "digit press" inputs.

When collecting speech as input, Vobiz transcribes and relays a user's speech to the specified action URL in real time.

When collecting input through digit press, Vobiz relays the digits entered to the specified action URL.

The Gather XML element supports simultaneous detection of both speech and digit press inputs.

## Nesting elements

You can nest Speak XML (text-to-speech) and Play XML elements inside Gather XML to prompt users for inputs. This is useful for building interactive voice response (IVR) experiences.

## Attributes

**`action`** *(string, required)* *Callback-retry configurable*

The URL to which Vobiz sends the collected input. See the "Parameters sent to the action URL" table below for the parameters included in the request.

**Allowed values:** a fully qualified URL

**`method`** *(string)*
The HTTP method to use when invoking the action URL.

**Allowed values:** GET, POST

Defaults to POST.

**`inputType`** *(string)*
The type of input(s) you expect to receive.

**Allowed values:** `dtmf`, `speech`, `dtmf speech`

When set to `dtmf speech`, Vobiz listens for both speech and digit inputs. Whichever input is detected first is relayed to the action URL.

**`executionTimeout`** *(integer)*
Maximum execution time, in seconds, for which input detection is carried out. If the user fails to provide input within the timeout period, the next element in the response will be processed. This duration is counted after nested Play/Speak elements have ended.

**Allowed values:** 5 to 60

Defaults to 15.

**`digitEndTimeout`** *(string)*
Time, in seconds, allowed between consecutive digit inputs. If no new digit input is provided within the digitEndTimeout period, digits entered until then will be processed.

**Allowed values:** 2 to 10, or auto

Defaults to auto.

This attribute is applicable to input types dtmf and dtmf speech.

**`speechEndTimeout`** *(string)*
Time, in seconds, that Vobiz waits for more speech once silence is detected before it stops speech recognition. At that point, a transcription of the collected speech is relayed to the action URL.

**Allowed values:** 2 to 10, or auto

Defaults to auto.

This attribute is applicable to input types speech and dtmf speech.

**`finishOnKey`** *(string)*
A key that the user can press to submit the digits entered.

**Allowed values:** One and only one of 0–9, \*, #, `<empty string>`, none

Defaults to #.

If set to `<empty string>` or none, input capture will end based on a timeout or the numDigits attribute.

This attribute is applicable to input types dtmf and dtmf speech.

**`numDigits`** *(integer)*
The maximum number of digits to be processed in the current operation. Vobiz relays the digits to the action URL as soon as the maximum number of digits specified is collected.

**Allowed values:** 1 to 32

Defaults to 32.

This attribute is applicable to input types dtmf and dtmf speech.
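
The allowed values listed so far can be checked before the XML is generated. The following is a minimal sketch, not part of any Vobiz SDK: `validate_gather_attrs` is a hypothetical helper that takes the attributes as a plain dict and returns a list of problems, using only the ranges stated in this section.

```python
def validate_gather_attrs(attrs: dict) -> list:
    """Return a list of problems found in a dict of Gather attributes.

    Hypothetical helper: the ranges mirror the allowed values documented
    above; an empty list means no problem was found.
    """
    problems = []

    def check_range(name, low, high, allow_auto=False):
        if name not in attrs:
            return  # attribute omitted, so the documented default applies
        value = str(attrs[name])
        if allow_auto and value == "auto":
            return
        is_int = value.isascii() and value.isdigit()
        if not is_int or not low <= int(value) <= high:
            suffix = " or 'auto'" if allow_auto else ""
            problems.append(f"{name} must be {low} to {high}{suffix}, got {value!r}")

    check_range("executionTimeout", 5, 60)
    check_range("digitEndTimeout", 2, 10, allow_auto=True)
    check_range("speechEndTimeout", 2, 10, allow_auto=True)
    check_range("numDigits", 1, 32)

    input_type = attrs.get("inputType")
    if input_type is not None and input_type not in ("dtmf", "speech", "dtmf speech"):
        problems.append(f"inputType has unsupported value {input_type!r}")

    finish = attrs.get("finishOnKey")
    if finish is not None and finish not in ("", "none"):
        # Exactly one key out of 0-9, * and # is allowed.
        if finish not in set("0123456789*#"):
            problems.append(f"finishOnKey must be a single key, got {finish!r}")

    return problems


print(validate_gather_attrs({"executionTimeout": 3, "finishOnKey": "12"}))
```

Running such a check when the XML is built surfaces a bad value as a readable message instead of a failed call.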

**`speechModel`** *(string)*
The automatic speech recognition (ASR) model to use for transcribing the speech.

**Allowed values:** default, command\_and\_search, phone\_call, telephony

Defaults to default.

This attribute is applicable to input types speech and dtmf speech.

* **command\_and\_search:** Optimized for short queries such as voice commands and voice search.

* **phone\_call:** Optimized for transcribing audio from a phone call where the audio quality is slightly inconsistent.

* **telephony:** An enhanced version of the phone\_call model, optimized for audio that typically originates from phone calls.

* **default:** Optimized for audio that does not fit one of the specific models above, such as long-form audio.

**`hints`** *(string)*
A list of phrases that act as "hints" to the speech recognition model; these phrases can boost the probability that such words or phrases will be recognized. Phrases may be provided either as single words or as small groups of words.

**Allowed values:** a non-empty string of comma-separated phrases

**Limits:**

* Phrases per request: 500

* Characters per request: 10,000

* Characters per phrase: 100

This attribute is applicable to input types speech and dtmf speech.
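
The limits above are easy to enforce when the `hints` value is assembled from a list of phrases. This is a minimal sketch with a hypothetical `build_hints` helper. It assumes the per-request character limit applies to the whole comma-separated string, which is the conservative reading; the source does not say whether separators count.

```python
MAX_PHRASES = 500          # phrases per request
MAX_TOTAL_CHARS = 10_000   # characters per request
MAX_PHRASE_CHARS = 100     # characters per phrase


def build_hints(phrases):
    """Join phrases into a comma-separated hints string, enforcing the limits."""
    cleaned = [p.strip() for p in phrases if p.strip()]
    if not cleaned:
        raise ValueError("hints must contain at least one phrase")
    if len(cleaned) > MAX_PHRASES:
        raise ValueError(f"too many phrases: {len(cleaned)} > {MAX_PHRASES}")
    for phrase in cleaned:
        if "," in phrase:
            # A comma inside a phrase would split it into two phrases.
            raise ValueError(f"phrase contains a comma: {phrase!r}")
        if len(phrase) > MAX_PHRASE_CHARS:
            raise ValueError(f"phrase longer than {MAX_PHRASE_CHARS} characters")
    hints = ",".join(cleaned)
    if len(hints) > MAX_TOTAL_CHARS:
        raise ValueError(f"hints longer than {MAX_TOTAL_CHARS} characters")
    return hints


print(build_hints(["billing", " support ", "sales"]))
```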

**`language`** *(string)*
Specifies the language Vobiz should recognize from the user.

**Allowed values:** See the list of [supported languages](/xml/gather/supported-languages)

Defaults to en-US.

This attribute is applicable to input types speech and dtmf speech.

**`interimSpeechResultsCallback`** *(string)* *Callback-retry configurable*
If specified, Vobiz sends requests to this URL in real time as it recognizes speech.

See the "parameters sent to the interimSpeechResultsCallback URL" table below for more information.

**Allowed values:** a fully qualified URL

This attribute is applicable to input types speech and dtmf speech.

**`interimSpeechResultsCallbackMethod`** *(string)*
The HTTP method to use when invoking the interimSpeechResultsCallback URL.

**Allowed values:** GET, POST

Defaults to POST.

This attribute is applicable to input types speech and dtmf speech.

**`log`** *(boolean)*
If true, Vobiz will log digits or recognized speech from the caller. If false, logging will be disabled while processing the Gather element.

**Allowed values:** true, false

Defaults to true.

**`redirect`** *(boolean)*
If true, the call is redirected to the action URL. If false, Vobiz only requests the action URL and continues with the next element.

**Allowed values:** true, false

Defaults to true.

**`profanityFilter`** *(boolean)*
If true, filters out profane words. Words filtered out are transcribed with their first letter and asterisks for the remaining characters (e.g. f\*\*\*). The profanity filter operates on single words; it doesn't detect abusive or offensive speech that's a phrase or a combination of words.

**Allowed values:** true, false

Defaults to false.

This attribute is applicable to input types speech and dtmf speech.

## Parameters sent to the action URL

In addition to the standard action URL request parameters, these parameters are sent to the action URL specified.

| Parameter               | Description                                                                                                   |
| ----------------------- | ------------------------------------------------------------------------------------------------------------- |
| `InputType`             | The type of input detected. **Allowed values:** `dtmf`, `speech`                                              |
| `Digits`                | The digits entered by the caller, excluding the finishOnKey input, if used. Empty if `InputType` is `speech`. |
| `Speech`                | The transcribed result of the caller's speech. Empty if `InputType` is `dtmf`.                                |
| `SpeechConfidenceScore` | A confidence score between 0.0 and 1.0. The higher the score, the more likely the transcription is accurate.  |
| `BilledAmount`          | The total amount billed for speech input transcription.                                                       |

## Parameters sent to the interimSpeechResultsCallback URL

In addition to the standard callback URL request parameters, these parameters are sent to the interim speech results callback URL.

| Parameter        | Description                                                                                                                                                                      |
| ---------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `StableSpeech`   | The stable transcribed result of the user's speech.                                                                                                                              |
| `UnstableSpeech` | The newer, unstable transcribed result of the user's speech. This is an interim result and may change as more speech is gathered.                                                |
| `Stability`      | Likelihood that the recognizer will not change its guess about the interim result. Range: 0.0 (completely unstable) to 1.0 (completely stable). Only applies to unstable speech. |
| `SequenceNumber` | Sequence number of the interim speech callback, to help with ordering incoming callback requests.                                                                                |
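
Because interim callbacks are separate HTTP requests, they can arrive out of order. The sketch below shows one way to use `SequenceNumber` to keep only the newest result. `InterimTranscript` is a hypothetical class, and it assumes parameters arrive as strings in a dict, as they would after form decoding.

```python
class InterimTranscript:
    """Track the most recent interim speech result for one call."""

    def __init__(self):
        self.last_seq = -1   # lower than any sequence number we expect
        self.stable = ""
        self.unstable = ""
        self.stability = None

    def update(self, params: dict) -> bool:
        """Apply a callback; return False if it is stale and was ignored."""
        seq = int(params["SequenceNumber"])
        if seq <= self.last_seq:
            return False  # an older or duplicate callback arrived late
        self.last_seq = seq
        self.stable = params.get("StableSpeech", "")
        self.unstable = params.get("UnstableSpeech", "")
        raw = params.get("Stability", "")
        self.stability = float(raw) if raw else None
        return True


tracker = InterimTranscript()
tracker.update({"SequenceNumber": "2", "StableSpeech": "I have", "UnstableSpeech": "a bill"})
tracker.update({"SequenceNumber": "1", "StableSpeech": "I", "UnstableSpeech": "have"})
print(tracker.stable)
```

In a real handler, one tracker would be kept per call, for example keyed by the call identifier sent with the standard callback parameters.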

## Examples

### Collect a single DTMF digit (IVR menu)

Nest a `Speak` or `Play` element to prompt the caller. With `numDigits="1"`, Vobiz posts to the action URL as soon as one digit is pressed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Gather action="https://yourapp.com/menu-choice" method="POST"
            inputType="dtmf" numDigits="1" executionTimeout="10">
        <Speak>Press 1 for sales, 2 for support, or 0 for an operator.</Speak>
    </Gather>
    <Speak>We didn't receive your input. Goodbye.</Speak>
    <Hangup/>
</Response>
```
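
On the server side, the action URL for the menu above has to turn the `Digits` value into the next XML response. The following is a minimal sketch using only the Python standard library. The menu wording and the `menu_response` function are illustrative, and `https://yourapp.com/answer` is assumed to be the URL that serves the menu XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from menu digit to the message spoken next.
MENU = {
    "1": "Connecting you to sales.",
    "2": "Connecting you to support.",
    "0": "Connecting you to an operator.",
}
MENU_URL = "https://yourapp.com/answer"  # assumed to return the menu XML


def menu_response(digits: str) -> str:
    """Build the XML returned by the action URL for a menu choice."""
    response = ET.Element("Response")
    message = MENU.get(digits)
    if message is None:
        # Unknown or empty input: apologize and send the caller back to the menu.
        ET.SubElement(response, "Speak").text = "Sorry, that is not a valid option."
        ET.SubElement(response, "Redirect").text = MENU_URL
    else:
        ET.SubElement(response, "Speak").text = message
    return ET.tostring(response, encoding="unicode")


print(menu_response("1"))
```

Building the response with `ElementTree` rather than string concatenation keeps any text placed inside `Speak` correctly escaped.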

### Collect a multi-digit number with finishOnKey

For variable-length input such as an account number, set `finishOnKey="#"` and let the caller signal completion.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Gather action="https://yourapp.com/account" method="POST"
            inputType="dtmf" numDigits="12" finishOnKey="#" executionTimeout="20">
        <Speak>Enter your account number, then press pound.</Speak>
    </Gather>
    <Redirect>https://yourapp.com/answer</Redirect>
</Response>
```

### Collect speech input

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Gather action="https://yourapp.com/intent" method="POST"
            inputType="speech" language="en-US" speechEndTimeout="auto"
            hints="billing,support,sales">
        <Speak>In a few words, tell us what you're calling about.</Speak>
    </Gather>
    <Speak>Sorry, we didn't catch that.</Speak>
</Response>
```

### Accept either speech or digits

With `inputType="dtmf speech"`, whichever input Vobiz detects first is the one relayed to the action URL.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Gather action="https://yourapp.com/choice" method="POST"
            inputType="dtmf speech" numDigits="1" speechEndTimeout="auto">
        <Speak>Say "yes" or press 1 to confirm. Say "no" or press 2 to cancel.</Speak>
    </Gather>
</Response>
```
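
Since only the first detected input is relayed, the handler for this prompt has to branch on `InputType`. The sketch below is one possible approach, not a prescribed one: `interpret_confirmation` and the word lists are hypothetical, and the parameters are assumed to be available as a dict of strings.

```python
import re

# Hypothetical vocabularies; extend them to match your prompt.
YES_WORDS = {"yes", "yeah", "yep", "confirm"}
NO_WORDS = {"no", "nope", "cancel"}


def interpret_confirmation(params: dict):
    """Return True (confirm), False (cancel), or None (unclear)."""
    input_type = params.get("InputType")
    if input_type == "dtmf":
        # 1 confirms and 2 cancels, as announced in the prompt.
        return {"1": True, "2": False}.get(params.get("Digits", ""))
    if input_type == "speech":
        words = set(re.findall(r"[a-z']+", params.get("Speech", "").lower()))
        says_yes = bool(words & YES_WORDS)
        says_no = bool(words & NO_WORDS)
        if says_yes != says_no:  # exactly one of the two was heard
            return says_yes
    return None


print(interpret_confirmation({"InputType": "speech", "Speech": "Yes, please."}))
```

Returning `None` for anything ambiguous lets the caller of this function decide whether to re-prompt.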

## Webhook payload sent to the action URL

After the caller responds (or input ends), Vobiz POSTs the standard call parameters plus the Gather-specific parameters to your action URL.

**DTMF input:**

```http
POST /menu-choice HTTP/1.1
Host: yourapp.com
Content-Type: application/x-www-form-urlencoded

CallUUID=xyz789&From=14155551234&To=14155559999&Direction=inbound&InputType=dtmf&Digits=1&Speech=
```

**Speech input:**

```http
POST /intent HTTP/1.1
Host: yourapp.com
Content-Type: application/x-www-form-urlencoded

CallUUID=xyz789&From=14155551234&To=14155559999&Direction=inbound&InputType=speech&Digits=&Speech=I+have+a+billing+question&SpeechConfidenceScore=0.92&BilledAmount=0.0050
```
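
The request bodies above are ordinary form-encoded strings, so they can be decoded with the standard library. This is a minimal sketch: `GatherResult` and `parse_gather_body` are hypothetical names, and a web framework would normally do this decoding for you.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs


@dataclass
class GatherResult:
    input_type: str
    digits: str
    speech: str
    confidence: Optional[float]


def parse_gather_body(body: str) -> GatherResult:
    """Decode a form-encoded action URL request body."""
    # keep_blank_values preserves empty fields such as "Speech=".
    fields = parse_qs(body, keep_blank_values=True)

    def first(name: str) -> str:
        return fields.get(name, [""])[0]

    raw_score = first("SpeechConfidenceScore")
    return GatherResult(
        input_type=first("InputType"),
        digits=first("Digits"),
        speech=first("Speech"),
        confidence=float(raw_score) if raw_score else None,
    )


body = ("CallUUID=xyz789&InputType=speech&Digits="
        "&Speech=I+have+a+billing+question&SpeechConfidenceScore=0.92")
print(parse_gather_body(body))
```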

## Edge cases and tips

* **No input / timeout.** If the caller provides no input within `executionTimeout` seconds (counted *after* nested `Play`/`Speak` finishes), Vobiz moves on to the next element in the document. Always place fallback XML (a retry, a `Redirect` back to the menu, or a `Hangup`) after the `Gather`. By default, Vobiz still POSTs to the action URL with empty `Digits` and `Speech` on timeout - check for empty values in your handler.
* **Use `executionTimeout`, never `timeout`.** `Gather` has no `timeout` attribute. The `timeout` attribute belongs only to [`Dial`](/xml/dial) and `Number`. The valid range for `executionTimeout` is 5-60 seconds (default 15).
* **`finishOnKey` is excluded from `Digits`.** The terminating key (default `#`) is not included in the `Digits` parameter. Set `finishOnKey=""` or `finishOnKey="none"` to rely solely on `numDigits` or the timeout.
* **`numDigits` ends collection early.** Vobiz posts as soon as it collects `numDigits` digits, before the timeout or `finishOnKey`. For single-key menus, set `numDigits="1"` for the snappiest response.
* **DTMF vs speech.** Use `dtmf` for menus and structured input (account numbers, PINs) - it is precise and free. Use `speech` for open-ended intent capture; it incurs a per-request charge (see [Pricing for speech recognition](/xml/gather/pricing-for-speech-recognition)) and returns a `SpeechConfidenceScore` you should threshold before acting. Use `dtmf speech` when you want to accept both.
* **Improve recognition.** Pass domain words via `hints`, set the correct `language` (see [supported languages](/xml/gather/supported-languages)), and pick a `speechModel` suited to the input (`command_and_search` for short commands, `phone_call`/`telephony` for call audio).
* **Validate input server-side.** Never trust `Digits` or `Speech` directly in queries or business logic - sanitize and range-check them in your handler.
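
Several of the tips above (empty input on timeout, confidence thresholds, server-side validation) come together in the action URL handler. The following sketch shows one way to triage a request before any business logic runs. `classify_input` is a hypothetical helper, and the `0.6` threshold is an arbitrary illustrative value that should be tuned against real calls.

```python
import re

MIN_CONFIDENCE = 0.6  # hypothetical threshold, not a Vobiz recommendation


def classify_input(params: dict, max_digits: int = 12) -> str:
    """Triage action URL parameters into a small set of outcomes."""
    digits = params.get("Digits", "")
    speech = params.get("Speech", "").strip()

    if not digits and not speech:
        return "no_input"  # for example, the caller stayed silent

    if digits:
        # Accept only keypad characters, and no more than we asked for.
        if re.fullmatch(r"[0-9*#]{1,%d}" % max_digits, digits):
            return "digits_ok"
        return "invalid"

    try:
        score = float(params.get("SpeechConfidenceScore", ""))
    except ValueError:
        return "low_confidence"  # a missing or malformed score is not trusted
    return "speech_ok" if score >= MIN_CONFIDENCE else "low_confidence"


print(classify_input({"Speech": "billing", "SpeechConfidenceScore": "0.92"}))
```

A handler might re-prompt on `no_input` and `low_confidence`, reject `invalid` outright, and continue only on the two `_ok` outcomes.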