
Realtime ASR API

Use the streaming API in ASR-only mode to receive real-time captions — with or without text translation, and without TTS audio.

Caption messages are delivered over WebSockets or the WebRTC data channel. Follow the general API connection flow described here, and refer to the Recommended settings for exact option values.

Captions only

Omit output_stream and translations, or set output_stream to null and translations to an empty list. All input_stream and transcription options are supported.

Example set_task command structure:

{
  "input_stream": {/*...*/},
  "output_stream": null, // set to `null` or omit the field
  "pipeline": {
    "transcription": {
      /*...*/
      "sentence_splitter": {
        "enabled": false // it's recommended to disable the advanced sentence splitter algorithm
      }
    },
    "translations": [], // set to an empty list or omit the field
    "allowed_message_types": [
      // you will only receive messages of these types
      "partial_transcription",
      "validated_transcription"
    ]
  }
}
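
The structure above can also be assembled programmatically. The following is a minimal Python sketch, not an official client: the helper name build_captions_only_task is hypothetical, and the contents of input_stream and transcription are left to the caller because the example above elides them. It only builds the task body; how that body is wrapped and sent over the connection is not shown here.

```python
from __future__ import annotations

import json
from typing import Any

# Message types listed in the captions-only example above.
CAPTION_MESSAGE_TYPES = ["partial_transcription", "validated_transcription"]


def build_captions_only_task(
    input_stream: dict[str, Any],
    transcription: dict[str, Any],
) -> dict[str, Any]:
    """Build a captions-only task body (hypothetical helper).

    `input_stream` and `transcription` are passed through unchanged, except
    that `sentence_splitter` is overwritten to disable the advanced splitter.
    """
    transcription_cfg = {**transcription, "sentence_splitter": {"enabled": False}}
    return {
        "input_stream": input_stream,
        "output_stream": None,  # serialized as JSON null
        "pipeline": {
            "transcription": transcription_cfg,
            "translations": [],  # no translation in captions-only mode
            "allowed_message_types": list(CAPTION_MESSAGE_TYPES),
        },
    }


if __name__ == "__main__":
    # Empty dicts are placeholders; take real option values from the
    # Recommended settings.
    task = build_captions_only_task(input_stream={}, transcription={})
    print(json.dumps(task, indent=2))
```

Setting output_stream to None rather than dropping the key keeps the intent explicit in the serialized JSON; omitting the field is equally valid according to the text above.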

Captions with translation

Omit the speech_generation field (or set it to null) in each translation config. All standard task options are supported — see the Recommended settings.

Example set_task command structure:

{
  "input_stream": {/*...*/},
  "output_stream": null, // set to `null` or omit the field
  "pipeline": {
    "transcription": {
      /*...*/
      "sentence_splitter": {
        "enabled": false // it's recommended to disable the advanced sentence splitter algorithm
      }
    },
    "translations": [{/*...*/}, {/*...*/}], // translation settings per language; set `speech_generation` to null or omit it
    "allowed_message_types": [
      // you will only receive messages of these types
      "partial_transcription",
      "validated_transcription",
      "translated_transcription"
    ]
  }
}
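
If you already have a full task body that produces TTS audio, one way to derive the captions-with-translation variant is to strip speech_generation from each translation config. The sketch below assumes the task layout shown above; the helper names are hypothetical, and the target_language key in the usage example is only an illustrative placeholder for whatever options your translation configs contain.

```python
from __future__ import annotations

import copy
from typing import Any

# Message types listed in the captions-with-translation example above.
TRANSLATED_CAPTION_TYPES = [
    "partial_transcription",
    "validated_transcription",
    "translated_transcription",
]


def strip_speech_generation(translations: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of the translation configs without `speech_generation`."""
    cleaned = []
    for cfg in translations:
        cfg = copy.deepcopy(cfg)  # leave the caller's config untouched
        cfg.pop("speech_generation", None)
        cleaned.append(cfg)
    return cleaned


def to_caption_task(task: dict[str, Any]) -> dict[str, Any]:
    """Derive a captions-with-translation task body from a full one (hypothetical helper)."""
    result = copy.deepcopy(task)
    result["output_stream"] = None  # no TTS audio output
    pipeline = result.setdefault("pipeline", {})
    pipeline["translations"] = strip_speech_generation(pipeline.get("translations") or [])
    # Disable the advanced sentence splitter, as recommended above.
    pipeline.setdefault("transcription", {})["sentence_splitter"] = {"enabled": False}
    pipeline["allowed_message_types"] = list(TRANSLATED_CAPTION_TYPES)
    return result


if __name__ == "__main__":
    # Placeholder configs; `target_language` is an assumed key for illustration.
    full_task = {
        "input_stream": {},
        "output_stream": {},
        "pipeline": {
            "transcription": {},
            "translations": [
                {"target_language": "es", "speech_generation": {}},
                {"target_language": "de", "speech_generation": {}},
            ],
        },
    }
    print(to_caption_task(full_task)["pipeline"]["translations"])
```

Working on deep copies means the original task body can still be reused for a session that needs TTS audio.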

Multiple target languages

If multiple target languages are set in translations, you will receive a separate translated_transcription message for each one.
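
Because each target language arrives as its own message, a client usually needs to route captions per language. The sketch below shows one way to do that. The message shape is an assumption, not taken from this page: it presumes JSON messages with a message_type field and the language and text nested under data.transcription. Both accessors are injectable, so they can be adapted to the actual schema.

```python
from __future__ import annotations

import json
from collections import defaultdict
from typing import Any, Callable

Message = dict[str, Any]


def assumed_language_of(message: Message) -> str:
    # ASSUMPTION: language code lives at data.transcription.language.
    return message["data"]["transcription"]["language"]


def assumed_text_of(message: Message) -> str:
    # ASSUMPTION: caption text lives at data.transcription.text.
    return message["data"]["transcription"]["text"]


class TranslatedCaptionRouter:
    """Collect translated captions per target language (hypothetical helper)."""

    def __init__(
        self,
        language_of: Callable[[Message], str] = assumed_language_of,
        text_of: Callable[[Message], str] = assumed_text_of,
    ) -> None:
        self._language_of = language_of
        self._text_of = text_of
        self.captions: dict[str, list[str]] = defaultdict(list)

    def handle(self, raw: str) -> None:
        """Process one raw JSON message; ignore everything but translations."""
        message = json.loads(raw)
        # ASSUMPTION: the message type is carried in a `message_type` field.
        if message.get("message_type") != "translated_transcription":
            return
        self.captions[self._language_of(message)].append(self._text_of(message))


if __name__ == "__main__":
    router = TranslatedCaptionRouter()
    for lang, text in [("es", "hola"), ("de", "hallo")]:
        router.handle(json.dumps({
            "message_type": "translated_transcription",
            "data": {"transcription": {"language": lang, "text": text}},
        }))
    print(dict(router.captions))
```

If the real messages use different field names, only the two accessor functions need to change.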