Translation management API
Control the Palabra real-time translation pipeline over WebSockets or a WebRTC data channel.
Prerequisites
Before you connect, make sure you have the following values. All three are returned when you create a streaming session:
- `webrtc_url`: the URL of the Palabra WebRTC server.
- `ws_url`: the URL of the Palabra WebSocket server.
- `publisher`: a JWT access token used to authorize your connection.
Choose a transport
Option 1. WebRTC
- Connect to `webrtc_url` with any LiveKit client using your `publisher` token.
- Once connected, send commands through the default (empty-topic) WebRTC data channel.
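A minimal sketch of the second step, assuming commands travel over the data channel as UTF-8 JSON in the same envelope described under Message format. `encodeCommand` is a hypothetical helper, not part of any SDK. The `publishData` call in the trailing comment comes from the livekit-client JavaScript SDK and should be checked against the version you install.

```javascript
// Encode one API command as UTF-8 JSON bytes for a data channel.
// Assumption: the server accepts the standard message envelope as JSON text.
function encodeCommand(messageType, data = {}) {
  const json = JSON.stringify({ message_type: messageType, data });
  return new TextEncoder().encode(json);
}

// Usage with an already connected livekit-client Room (not executed here).
// Omitting `topic` keeps the message on the default (empty) topic:
//   await room.localParticipant.publishData(
//     encodeCommand("get_task", { exclude_hidden: true }),
//     { reliable: true },
//   );
```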
Option 2. WebSockets
Connect to ws_url, passing your publisher token as a query parameter:
const endpoint = `${ws_url}?token=${publisher}`;
const socket = new WebSocket(endpoint);
Once the connection is open, you can start sending commands.
Message format
Every API message — request and response — has the same envelope:
{
"message_type": "<string>",
"data": { /* payload */ }
}
If message_type is "error", the data field contains diagnostic information.
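The envelope can be wrapped in two small helpers. This is a sketch, assuming messages are exchanged as JSON text. `buildMessage` and `parseMessage` are illustrative names, not part of the API.

```javascript
// Serialize one request using the documented envelope.
function buildMessage(messageType, data = {}) {
  return JSON.stringify({ message_type: messageType, data });
}

// Parse one incoming message; throw when the server reports an error.
function parseMessage(raw) {
  const { message_type: messageType, data } = JSON.parse(raw);
  if (messageType === "error") {
    const error = new Error(`Palabra API error: ${data?.code ?? "unknown"}`);
    error.details = data; // keep the full diagnostic payload for logging
    throw error;
  }
  return { messageType, data };
}
```

With a WebSocket, `socket.send(buildMessage("pause_task"))` sends a command, and `parseMessage(event.data)` inside a `message` handler decodes a response.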
Typical workflow
- Create a task: send `set_task` to start the translation.
- Update the task: send another `set_task` to change settings during an ongoing translation.
- Pause processing: send `pause_task` to pause the translation (billing stops). Resume with another `set_task`.
- Flush processing: send `flush_task` to instantly cancel transcription, translation, and speech of the current phrase without pausing the translation of subsequent phrases.
- Finish the task: send `end_task`. The server closes your connection automatically; the session is invalidated within 1 minute.
Settings reference
- What each field means — see the Translation settings breakdown.
- Recommended values — see the Recommended settings.
Rate limits
| Scope | Limit |
|---|---|
| WebSocket connections | 20 per minute per token |
| `set_task`, `get_task` | 1 per 2 seconds |
| `pause_task`, `end_task` | 1 per 3 seconds |
| `tts_task` | 60 per minute |
Exceeding a command limit returns an error message; exceeding the connection limit closes the connection with code 1008.
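One way to stay under the command limits is to space commands on the client. The sketch below is a hypothetical throttle, not part of the API. It assumes that commands listed on the same table row share one budget, and it spreads `tts_task` evenly at one per second; both are conservative readings of the table rather than documented behavior.

```javascript
// Minimum spacing per command, derived from the rate-limit table.
// Assumption: commands in the same group share a single budget.
const COMMAND_LIMITS = {
  set_task: { group: "task", minIntervalMs: 2000 },
  get_task: { group: "task", minIntervalMs: 2000 },
  pause_task: { group: "control", minIntervalMs: 3000 },
  end_task: { group: "control", minIntervalMs: 3000 },
  tts_task: { group: "tts", minIntervalMs: 1000 }, // 60 per minute, spaced evenly
};

function createCommandThrottle() {
  const nextAllowedAt = new Map(); // group -> earliest time (ms) for the next command

  // Returns how long to wait (ms) before sending, and reserves that slot.
  return function reserve(messageType, nowMs = Date.now()) {
    const limit = COMMAND_LIMITS[messageType];
    if (!limit) return 0; // commands without a listed limit are not throttled here
    const sendAt = Math.max(nowMs, nextAllowedAt.get(limit.group) ?? 0);
    nextAllowedAt.set(limit.group, sendAt + limit.minIntervalMs);
    return sendAt - nowMs;
  };
}
```

A caller would wait for the returned delay (for example with `setTimeout`) before sending the command.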
Streaming audio configuration
Option 1. WebRTC audio I/O
Use the following input/output configuration in your set_task:
{
"data": {
"input_stream": {
"content_type": "audio",
"source": {
"type": "webrtc"
}
},
"output_stream": {
"content_type": "audio",
"target": {
"type": "webrtc"
}
}
// ...
}
}
- Publish your microphone track to the LiveKit room.
- Subscribe to the translated tracks that Palabra publishes to the same room after you send `set_task`.
Option 2. WebSocket audio I/O
Use this configuration instead:
{
"data": {
"input_stream": {
"content_type": "audio",
"source": {
"type": "ws",
"format": "pcm_s16le", // or opus, wav
"sample_rate": 24000, // 16000 – 48000
"channels": 1 // 1 or 2
}
},
"output_stream": {
"content_type": "audio",
"target": {
"type": "ws",
"format": "pcm_s16le" // or zlib_pcm_s16le
}
}
}
}
The output is always 24000 Hz, mono (sample_rate and channels cannot be changed).
- Send base64-encoded audio chunks that exactly match the declared format.
- Receive base64-encoded TTS chunks in `output_audio_data` responses (see below).
API messages
Request message schema
Response message schema
Requests (client → server)
| Message type | Description |
|---|---|
| `set_task` | Create or update the translation task |
| `get_task` | Return the current task |
| `pause_task` | Pause the current task; resume with `set_task` |
| `flush_task` | Cancel processing of the current (ongoing) phrase |
| `end_task` | Finish the translation task |
| `tts_task` | Generate TTS from text |
| `input_audio_data` | Input audio data chunk (WebSocket audio transport only) |
set_task
Create a new task or modify the current one:
- Sent for the first time after creating a session: starts the translation.
- Sent after `pause_task`: resumes the translation.
- Sent during an ongoing translation: updates the settings in real time, no need to stop the translation.
{
"message_type": "set_task",
"data": {
"input_stream": { /* Depends on transport — see "Streaming audio configuration" above */ },
"output_stream": { /* Depends on transport — see "Streaming audio configuration" above */ },
"pipeline": {
"transcription": {
"source_language": "string",
"detectable_languages": ["string"],
"segment_confirmation_silence_threshold": "float",
"only_confirm_by_silence": "bool",
"sentence_splitter": {
"enabled": "bool"
},
"verification": {
"auto_transcription_correction": "bool",
"transcription_correction_style": "string"
}
},
// Translation and speech generation settings for one or more target languages
"translations": [
{
"target_language": "string",
"translate_partial_transcriptions": "bool",
"speech_generation": {
"voice_cloning": "bool",
"voice_id": "string",
"voice_timbre_detection": {
"enabled": "bool",
"high_timbre_voices": [],
"low_timbre_voices": []
}
}
}
// You can add more targets
],
"translation_queue_configs": {
"global": {
"desired_queue_level_ms": "int",
"max_queue_level_ms": "int",
"auto_tempo": "bool"
},
"es": {
"desired_queue_level_ms": "int",
"max_queue_level_ms": "int"
}
},
// Select response types to receive
"allowed_message_types": [
"translated_transcription",
"partial_transcription",
"partial_translated_transcription",
"validated_transcription"
]
}
}
}
See the Translation settings breakdown for details on each field and the Recommended settings for best-practice values.
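As an illustration, the sketch below assembles a small `set_task` message for either transport. `buildSetTask` is a hypothetical helper. It fills in only the language fields and `allowed_message_types`, on the assumption that omitted settings fall back to server defaults; this page does not say which fields are required, so verify that against the settings breakdown.

```javascript
// Build a set_task message for the chosen transport ("ws" or "webrtc").
// Assumption: settings left out here are optional on the server side.
function buildSetTask({ transport, sourceLanguage, targetLanguages, messageTypes }) {
  const streams =
    transport === "ws"
      ? {
          input_stream: {
            content_type: "audio",
            source: { type: "ws", format: "pcm_s16le", sample_rate: 24000, channels: 1 },
          },
          output_stream: { content_type: "audio", target: { type: "ws", format: "pcm_s16le" } },
        }
      : {
          input_stream: { content_type: "audio", source: { type: "webrtc" } },
          output_stream: { content_type: "audio", target: { type: "webrtc" } },
        };

  return {
    message_type: "set_task",
    data: {
      ...streams,
      pipeline: {
        transcription: { source_language: sourceLanguage },
        translations: targetLanguages.map((language) => ({ target_language: language })),
        allowed_message_types: messageTypes,
      },
    },
  };
}
```

Sending the same message again with different values updates the running task, as described above.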
ASR-only mode
To use this API in ASR-only mode, do either of the following:
- Set `output_stream` to `null`: you will still get text translations, but no TTS audio.
- Use an empty `translations` list: you will get neither text translations nor TTS audio.
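Both variants can be derived from an existing `set_task` message. The helper below is a hypothetical sketch that copies the message and applies one of the two changes listed above.

```javascript
// Return a copy of a set_task message switched to ASR-only mode.
// keepTextTranslations = true  -> output_stream is null (text translations, no TTS audio)
// keepTextTranslations = false -> translations is empty (no text translations, no TTS audio)
function toAsrOnly(setTask, keepTextTranslations) {
  const task = JSON.parse(JSON.stringify(setTask)); // deep copy of plain JSON data
  if (keepTextTranslations) {
    task.data.output_stream = null;
  } else {
    task.data.pipeline.translations = [];
  }
  return task;
}
```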
get_task
Return the current task (as a current_task response). Set exclude_hidden to false to include advanced (hidden) fields in the response.
{ "message_type": "get_task", "data": { "exclude_hidden": true } }
pause_task
Pause the current task. No audio is processed and no billing applies while the task is paused. Use set_task to resume.
{ "message_type": "pause_task", "data": {} }
flush_task
Flush the translation of already spoken phrases (cancel their processing) without pausing upcoming ones. Useful when the current phrase no longer needs finishing — for example, if your conversation partner interrupts you.
{
"message_type": "flush_task",
"data": {
"languages": ["global"], // 'global' (all languages, default) or target languages from `translations`
"pause_task": false // pause the task after flushing
}
}
end_task
Finish the current task. The server closes the connection after receiving end_task.
{
"message_type": "end_task",
"data": {
"eos_timeout": 4, // optional, 1–30 seconds
"force": false // set true to skip finalization of the last phrase
}
}
If eos_timeout is set, the server waits until that many seconds have passed since your last detected speech, then sends an eos response (see below) before closing. Use it to make sure the tail of the translation is fully processed and delivered.
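A small builder can enforce the documented 1 to 30 second range before the message is sent. `buildEndTask` is an illustrative name, not part of the API.

```javascript
// Build an end_task message, validating the optional eos_timeout (1-30 seconds).
function buildEndTask({ eosTimeout, force = false } = {}) {
  const data = { force };
  if (eosTimeout !== undefined) {
    if (!Number.isFinite(eosTimeout) || eosTimeout < 1 || eosTimeout > 30) {
      throw new RangeError("eos_timeout must be between 1 and 30 seconds");
    }
    data.eos_timeout = eosTimeout;
  }
  return { message_type: "end_task", data };
}
```

When `eos_timeout` is set, a client would typically keep reading messages until it sees `eos` or the socket closes, so that the final translations are not dropped.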
tts_task
Generates TTS from text within the active task. The behavior depends on `translate_text` (default `false`):
- `"translate_text": false`: TTS only. `language` is the language to speak in and must be one of the `target_language` values configured in the task's `translations` section (otherwise a `VALIDATION_ERROR` is returned).
- `"translate_text": true`: the text is first translated into every `target_language` of the task, then synthesized. `language` is the language of the input text and must be a valid source language.
text is limited to 2048 characters. See the Text-to-Translated-Speech guide for details.
{
"message_type": "tts_task",
"data": {
"text": "Hello, how are you?",
"language": "en", // text language
"translate_text": false // default
}
}
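The two rules above can be checked on the client before sending, which avoids a round trip that ends in `VALIDATION_ERROR`. This is a sketch with a hypothetical helper. It counts characters with JavaScript's `String.length` (UTF-16 code units), which may differ from how the server counts.

```javascript
// Build a tts_task message after checking the documented constraints.
// targetLanguages: the target_language values configured in the current task.
function buildTtsTask({ text, language, translateText = false }, targetLanguages) {
  if (text.length > 2048) {
    throw new RangeError("text is limited to 2048 characters");
  }
  if (!translateText && !targetLanguages.includes(language)) {
    throw new Error(`language "${language}" is not a configured target language`);
  }
  return {
    message_type: "tts_task",
    data: { text, language, translate_text: translateText },
  };
}
```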
input_audio_data
Send a base64-encoded input audio chunk when WebSockets is selected as the audio transport. The audio chunks you push must match the format / sample_rate / channels declared in your set_task command. The optimal chunk length is 320 ms. The base64 payload must be between 1 KB and 512 KB (≈384 KB of raw audio per message).
{
"message_type": "input_audio_data",
"data": {
"data": "base64 encoded data"
}
}
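For `pcm_s16le`, a 320 ms chunk at 24000 Hz mono is 7680 samples, or 15360 bytes, which encodes to 20480 base64 characters and sits well inside the payload limits. The Node.js sketch below splits a raw PCM buffer along those lines. The helper names are illustrative, and it does not handle a final chunk that falls under the 1 KB minimum.

```javascript
const BYTES_PER_SAMPLE = 2; // pcm_s16le: signed 16-bit samples

// Size in bytes of one chunk of the given duration.
function chunkByteLength(sampleRate, channels, chunkMs = 320) {
  return Math.round((sampleRate * chunkMs) / 1000) * channels * BYTES_PER_SAMPLE;
}

// Yield input_audio_data messages (JSON text) for a Uint8Array of raw PCM.
function* toInputAudioMessages(pcm, sampleRate, channels) {
  const size = chunkByteLength(sampleRate, channels);
  for (let offset = 0; offset < pcm.length; offset += size) {
    const chunk = pcm.subarray(offset, offset + size);
    yield JSON.stringify({
      message_type: "input_audio_data",
      data: { data: Buffer.from(chunk).toString("base64") },
    });
  }
}
```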
Responses (server → client)
| Message type | Description |
|---|---|
| `partial_transcription` | Unconfirmed ASR segment |
| `partial_translated_transcription` | Unconfirmed translation segment |
| `validated_transcription` | Final ASR segment |
| `translated_transcription` | Final translation |
| `output_audio_data` | Chunk of generated TTS audio (WebSocket audio transport only) |
| `current_task` | Response to the `get_task` command |
| `eos` | End-of-stream confirmation (after `end_task` with `eos_timeout`) |
| `warning` | Audio stream health warning |
| `error` | Validation or runtime error |
- To receive `partial_transcription`, `validated_transcription`, and `translated_transcription` messages, include these message types in the `allowed_message_types` field of your `set_task` command.
- To receive `partial_translated_transcription` messages, include it in `allowed_message_types` and set `translate_partial_transcriptions` to `true` in your `set_task` command.
partial_transcription
Transcription of an uncompleted segment:
{
"message_type": "partial_transcription",
"data": {
"transcription": {
"transcription_id": "190983855fe3404e",
"language": "en",
"text": "One, two"
}
}
}
partial_translated_transcription
Translation of an uncompleted segment:
{
"message_type": "partial_translated_transcription",
"data": {
"transcription": {
"transcription_id": "190983855fe3404e",
"language": "es",
"text": "Uno, dos,"
}
}
}
validated_transcription
Transcription of a completed segment:
{
"message_type": "validated_transcription",
"data": {
"transcription": {
"transcription_id": "190983855fe3404e",
"language": "en",
"text": "One, two, three, four, five."
}
}
}
translated_transcription
Translation of a completed segment:
{
"message_type": "translated_transcription",
"data": {
"transcription": {
"transcription_id": "190983855fe3404e",
"language": "es",
"text": "Uno, dos, tres, cuatro, cinco."
}
}
}
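The four transcription messages can be folded into one running transcript per language. The sketch below is a hypothetical client-side store. It assumes that the partial and final messages for a segment share a `transcription_id`, as in the examples above, and it ignores a partial that arrives after the final for the same segment.

```javascript
const FINAL_TYPES = new Set(["validated_transcription", "translated_transcription"]);
const PARTIAL_TYPES = new Set(["partial_transcription", "partial_translated_transcription"]);

function createTranscriptStore() {
  // "<transcription_id>:<language>" -> { language, text, final }
  const segments = new Map();
  return {
    // Apply one parsed server message; other message types are ignored.
    apply({ message_type: type, data }) {
      const final = FINAL_TYPES.has(type);
      if (!final && !PARTIAL_TYPES.has(type)) return;
      const { transcription_id: id, language, text } = data.transcription;
      const key = `${id}:${language}`;
      if (!final && segments.get(key)?.final) return; // late partial after the final
      segments.set(key, { language, text, final });
    },
    // Join the latest text of every segment in the given language, in arrival order.
    text(language) {
      return [...segments.values()]
        .filter((segment) => segment.language === language)
        .map((segment) => segment.text)
        .join(" ");
    },
  };
}
```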
output_audio_data
TTS audio chunk (when WebSockets is the audio transport):
{
"message_type": "output_audio_data",
"data": {
"transcription_id": "190983855fe3404e",
"translation_part_id": "0", // part index when a phrase is split into several TTS pieces
"language": "es", // TTS language
"last_chunk": false, // true on the last generated chunk for this `transcription_id`
"data": "base64 string", // ≤512 KB base64 (≈384 KB of raw audio)
"chunk_generation_delta": 120 // optional, ms between generation start and this chunk
}
}
The audio is always 24 kHz mono; the encoding (pcm_s16le or zlib_pcm_s16le) matches the output_stream.target.format of your task.
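A Node.js sketch of decoding one chunk into 16-bit samples. `decodeOutputAudio` is a hypothetical helper. For `zlib_pcm_s16le` it takes a caller-supplied `inflate` function, because the exact compression framing is not described here; passing `zlib.inflateSync` from `node:zlib` would be a reasonable first guess to verify.

```javascript
const OUTPUT_SAMPLE_RATE = 24000; // output is always 24 kHz mono

// Decode an output_audio_data message into Int16 samples.
// inflate (optional): bytes -> bytes, used only for zlib_pcm_s16le output.
function decodeOutputAudio(message, inflate) {
  const { data: base64, language, last_chunk: lastChunk } = message.data;
  let bytes = Buffer.from(base64, "base64");
  if (inflate) bytes = inflate(bytes);

  // Read little-endian samples through a DataView, so alignment does not matter.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const samples = new Int16Array(Math.floor(bytes.byteLength / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true);
  }
  const durationMs = (samples.length * 1000) / OUTPUT_SAMPLE_RATE;
  return { language, lastChunk, samples, durationMs };
}
```

Chunks for the same `language` can be appended in arrival order until one arrives with `last_chunk` set to `true`.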
current_task
The response to the get_task command. Contains the current task configuration plus its status:
{
"message_type": "current_task",
"data": {
"input_stream": { /* ... */ },
"output_stream": { /* ... */ },
"pipeline": { /* ... */ },
"task_status": "running" // running | paused | unknown
}
}
`current_task` is sent in response to `get_task`, and also automatically when you reconnect to an already running pipeline. The server does not send it as a confirmation of `set_task`. To verify that a new task has started, poll `get_task` (it returns a `NOT_FOUND` error until the pipeline is up).
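The polling step could look like the sketch below. `waitForTask` and its `requestTask` callback are hypothetical: `requestTask` is assumed to send `get_task` and resolve with the parsed reply, either `current_task` or `error`. The default 2-second interval follows the `get_task` rate limit, and it is assumed that the error payload carries `NOT_FOUND` in its `code` field.

```javascript
// Poll get_task until the pipeline is up, then return the current task data.
async function waitForTask(requestTask, options = {}) {
  const {
    intervalMs = 2000, // get_task is limited to 1 per 2 seconds
    maxAttempts = 10,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const reply = await requestTask();
    if (reply.message_type === "current_task") return reply.data;
    if (reply.data?.code !== "NOT_FOUND") {
      throw new Error(`unexpected reply: ${reply.message_type}`);
    }
    await sleep(intervalMs); // pipeline not up yet; wait before asking again
  }
  throw new Error(`task did not start after ${maxAttempts} attempts`);
}
```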
eos
End-of-stream confirmation. Sent after an end_task with eos_timeout, once the requested silence period has elapsed; the connection is closed right after.
{ "message_type": "eos", "data": {} }
warning
Audio stream health warnings. The server monitors the pace of your input_audio_data stream:
{
"message_type": "warning",
"data": {
"code": "AUDIO_STREAM_TOO_FAST", // AUDIO_STREAM_TOO_FAST | AUDIO_STREAM_TOO_SLOW | AUDIO_STREAM_STALLED
"message": "Human-readable description"
}
}
Send audio at real-time rate (e.g., one 320 ms chunk every 320 ms): pushing faster, slower, or stopping mid-stream triggers these warnings and degrades translation quality.
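To hold a real-time rate without drift, each chunk can be scheduled against the stream's start time rather than against the previous send. A minimal sketch, with a hypothetical helper name:

```javascript
// Milliseconds to wait before sending chunk number `chunkIndex` (0-based),
// measured against the time the stream started. Returns 0 when the chunk is due or late.
function delayUntilChunk(startMs, chunkIndex, nowMs, chunkMs = 320) {
  const dueAt = startMs + chunkIndex * chunkMs;
  return Math.max(0, dueAt - nowMs);
}

// Usage sketch (not executed here), where `messages` holds input_audio_data
// messages and `socket` is an open WebSocket:
//   const start = Date.now();
//   for (let i = 0; i < messages.length; i++) {
//     const wait = delayUntilChunk(start, i, Date.now());
//     await new Promise((resolve) => setTimeout(resolve, wait));
//     socket.send(messages[i]);
//   }
```

Because every deadline is computed from `startMs`, a timer that fires a few milliseconds late shortens the next wait and the error does not accumulate.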
error
Validation, authorization, or other kinds of errors:
{
"message_type": "error",
"data": {
"code": "VALIDATION_ERROR",
"desc": "ValidationError(model='SetTaskMessage', errors=[{'loc': ('input_stream', 'content_type')",
"msg": "value is not a valid enumeration member; permitted: 'audio'\", 'type': 'type_error.enum'",
"param": null
}
}