
Text-to-Translated-Speech

The tts_task command generates speech from text, with optional translation. Depending on your output_stream setting, the TTS data is delivered over WebSockets or the WebRTC data channel. Follow the general API connection flow described here.

Refer to the Recommended settings for exact option values in the examples below.

Tip: If you only need text-to-speech (without translation), use the dedicated Realtime TTS API instead.

The tts_task command

Send a WebSocket / WebRTC data channel message with the following structure:

{
  "message_type": "tts_task",
  "data": {
    "text": "Hello, how are you?",
    "language": "en",
    "translate_text": true
  }
}

Required fields

| Field | Type | Description | Constraints |
| --- | --- | --- | --- |
| text | string | Text to generate speech from | Max length: 2048 characters |
| language | string | Language of the text | Must be one of the supported languages |

Optional fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| translate_text | boolean | false | Enable translation before TTS |
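
The field tables above can be enforced on the client before a message is sent. The following is a minimal sketch, not part of the official API: the helper name `build_tts_task` and the constant `MAX_TEXT_LENGTH` are hypothetical, and the check counts Python string length, which may differ from how the service counts characters.

```python
import json

# Limit taken from the "Required fields" table; how the service counts
# characters is an assumption (here: Python string length).
MAX_TEXT_LENGTH = 2048


def build_tts_task(text: str, language: str, translate_text: bool = False) -> str:
    """Return a JSON-encoded tts_task message (hypothetical helper)."""
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text exceeds {MAX_TEXT_LENGTH} characters")
    message = {
        "message_type": "tts_task",
        "data": {
            "text": text,
            "language": language,
            # Defaults to False, matching the documented default.
            "translate_text": translate_text,
        },
    }
    return json.dumps(message)
```

The returned string can be sent as-is over the WebSocket or the WebRTC data channel. Whether `language` is actually supported is left to the server to validate.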

Text-to-Translated-Speech

Set translate_text to true in tts_task. The text will be translated into every target_language configured in the translations section, and TTS will be generated for each of them.

All output_stream and translations options are supported. Use the following set_task command structure:

{
  "input_stream": null, // set to `null` or omit the field
  "output_stream": {/*...*/},
  "pipeline": {
    "transcription": null, // set to `null` or omit the field
    "translations": [{/*...*/}, {/*...*/}], // translation and speech generation options for each language
    "allowed_message_types": [
      // you will only receive translated text and `output_audio_data` messages (when using WS transport)
      "translated_transcription"
    ]
  }
}
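
As a rough illustration, the structure above could be assembled in code as follows. The function name `build_translated_tts_settings` is hypothetical, the contents of `output_stream` and of each `translations` entry are passed through untouched (see the Recommended settings for their real values), and rejecting an empty `translations` list is an assumption of this sketch, not a documented rule.

```python
from typing import Any


def build_translated_tts_settings(
    output_stream: dict[str, Any],
    translations: list[dict[str, Any]],
) -> dict[str, Any]:
    """Assemble set_task settings for text-to-translated-speech (sketch)."""
    if not translations:
        # Assumption: with no target languages there is nothing to translate.
        raise ValueError("at least one translations entry is required")
    return {
        "input_stream": None,  # serialized as JSON null; may also be omitted
        "output_stream": output_stream,
        "pipeline": {
            "transcription": None,  # serialized as JSON null; may also be omitted
            "translations": translations,
            # Only translated text is requested as a text message type.
            "allowed_message_types": ["translated_transcription"],
        },
    }
```

Serializing the result with `json.dumps` turns the `None` values into `null`, which matches the structure shown above.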

Multiple target languages

When multiple languages are configured in the translations section, the system returns translations and TTS for all configured languages simultaneously.

Text-to-Speech without translation (deprecated)

Deprecated: for plain text-to-speech, use the dedicated Realtime TTS API instead.

Set translate_text to false in tts_task or omit the field. All standard task options are supported — see the Recommended settings.

{
  "input_stream": null, // set to `null` or omit the field
  "output_stream": {/*...*/},
  "pipeline": {
    "transcription": null, // set to `null`
    "translations": [{/*...*/}, {/*...*/}], // speech generation options for each language
    "allowed_message_types": [] // you will receive only `output_audio_data` messages (when using WS transport)
  }
}

Requirements in this mode:

  • language must be one of the supported languages.
  • language in tts_task must match one of the target_language values in translations so that the speech settings of that entry are applied (this will change in future versions).
  • You can configure multiple languages in the translations section, but a separate tts_task message with the corresponding language is required for each one.
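
The requirements above can be sketched as a small helper that emits one tts_task per language and rejects any language with no matching translations entry. The name `build_plain_tts_tasks` is hypothetical, and the sketch assumes each translations entry is an object with a `target_language` key.

```python
import json
from typing import Any


def build_plain_tts_tasks(
    texts_by_language: dict[str, str],
    translations: list[dict[str, Any]],
) -> list[str]:
    """Return one JSON-encoded tts_task per language, without translation."""
    # Assumption: every translations entry carries a "target_language" key.
    configured = {entry["target_language"] for entry in translations}
    missing = sorted(set(texts_by_language) - configured)
    if missing:
        raise ValueError(f"no translations entry with target_language in {missing}")
    return [
        json.dumps({
            "message_type": "tts_task",
            "data": {"text": text, "language": language, "translate_text": False},
        })
        for language, text in texts_by_language.items()
    ]
```

Each returned message is sent separately, since this mode requires one tts_task per language.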