Translation management API

Control the Palabra real-time translation pipeline over WebSockets or a WebRTC data channel.

Prerequisites

Before you connect, make sure you have the following values. All three are returned when you create a streaming session:

  • webrtc_url — the URL of the Palabra WebRTC server.
  • ws_url — the URL of the Palabra WebSocket server.
  • publisher — a JWT access token used to authorize your connection.

Choose a transport

Option 1. WebRTC

  1. Connect to webrtc_url with any LiveKit client using your publisher token.
  2. Once connected, send commands through the default (empty-topic) WebRTC data channel.

Option 2. WebSockets

Connect to ws_url, passing your publisher token as a query parameter:

const endpoint = `${ws_url}?token=${publisher}`;
const socket = new WebSocket(endpoint);

Once the connection is open, you can start sending commands.
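
As a slightly more defensive variant of the snippet above, the endpoint can be built with the standard URL API so that the token is percent-encoded if it ever contains reserved characters. This is only a sketch: buildEndpoint is a hypothetical helper name, and the URL and token shown are placeholders, not real Palabra values.

```javascript
// Hypothetical helper: attach the publisher token as the `token` query parameter.
function buildEndpoint(wsUrl, publisher) {
  const url = new URL(wsUrl);
  // searchParams.set percent-encodes the value when needed.
  url.searchParams.set("token", publisher);
  return url.toString();
}

// Placeholder values; the real ones come from the session-creation response.
const endpoint = buildEndpoint("wss://example.invalid/stream", "header.payload.signature");
console.log(endpoint);

// In a browser (or a runtime with a global WebSocket) you would then do:
//   const socket = new WebSocket(endpoint);
//   socket.addEventListener("open", () => { /* start sending commands */ });
```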

Message format

Every API message — request and response — has the same envelope:

{
  "message_type": "<string>",
  "data": { /* payload */ }
}

If message_type is "error", the data field contains diagnostic information.
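
A minimal sketch of wrapping and unwrapping this envelope on the client. The helper names encodeMessage and decodeMessage are illustrative, and treating a missing data field as an empty object is an assumption of this sketch rather than documented behavior.

```javascript
// Wrap a payload in the { message_type, data } envelope and serialize it.
function encodeMessage(messageType, data = {}) {
  return JSON.stringify({ message_type: messageType, data });
}

// Parse an incoming frame and flag error messages.
function decodeMessage(raw) {
  const parsed = JSON.parse(raw);
  if (typeof parsed.message_type !== "string") {
    throw new TypeError("message_type is missing or not a string");
  }
  return {
    type: parsed.message_type,
    data: parsed.data ?? {}, // assumption: fall back to an empty payload
    isError: parsed.message_type === "error",
  };
}

const frame = encodeMessage("pause_task");
console.log(frame); // {"message_type":"pause_task","data":{}}
console.log(decodeMessage(frame).type); // pause_task
```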

Typical workflow

  1. Create a task — send set_task to start the translation.
  2. Update the task — send another set_task to change settings during an ongoing translation.
  3. Pause processing — send pause_task to pause the translation (billing stops). Resume with another set_task.
  4. Flush processing — send flush_task to instantly cancel transcription, translation, and speech of the current phrase without pausing the translation of subsequent phrases.
  5. Finish the task — send end_task. The server closes your connection automatically; the session is invalidated within 1 minute.
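
The workflow above can be mirrored by a small client-side state tracker, which helps catch out-of-order commands before they are sent. This is local bookkeeping only, not server behavior: the state names (idle, running, paused, ended) and the nextState helper are hypothetical, and the pause_task option of flush_task is not modeled.

```javascript
// Allowed commands per local state, following the workflow steps above.
const TRANSITIONS = {
  idle: { set_task: "running" },
  running: {
    set_task: "running", // update settings mid-translation
    pause_task: "paused",
    flush_task: "running", // cancels the current phrase only
    end_task: "ended",
  },
  paused: { set_task: "running", end_task: "ended" },
  ended: {},
};

function nextState(state, command) {
  const next = TRANSITIONS[state]?.[command];
  if (!next) throw new Error(`${command} is not expected in state "${state}"`);
  return next;
}

// Walk through create, update, pause, resume, flush, finish.
const commands = ["set_task", "set_task", "pause_task", "set_task", "flush_task", "end_task"];
let state = "idle";
const trace = [];
for (const command of commands) {
  state = nextState(state, command);
  trace.push(state);
}
console.log(trace.join(" -> "));
```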

Settings reference

Rate limits

Scope                    Limit
WebSocket connections    20 per minute per token
set_task, get_task       1 per 2 seconds
pause_task, end_task     1 per 3 seconds
tts_task                 60 per minute

Exceeding a command limit returns an error message; exceeding the connection limit closes the connection with code 1008.
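
One way to stay under the command limits is to throttle on the client. The sketch below makes two assumptions that the table does not confirm: commands listed on the same row share one budget, and the 60-per-minute tts_task limit is approximated as one command per second. createCommandThrottle and tryAcquire are hypothetical names.

```javascript
// Minimum spacing per command group, in milliseconds (derived from the table above).
const COMMAND_GROUPS = [
  { commands: ["set_task", "get_task"], intervalMs: 2000 },
  { commands: ["pause_task", "end_task"], intervalMs: 3000 },
  { commands: ["tts_task"], intervalMs: 1000 }, // approximation of 60 per minute
];

// `now` is injectable so the throttle can be exercised with a fake clock.
function createCommandThrottle(now = () => Date.now()) {
  const lastSent = new Map(); // group index -> time of the last accepted command

  return {
    // Returns 0 (and records the send) when allowed, otherwise the ms left to wait.
    tryAcquire(command) {
      const group = COMMAND_GROUPS.findIndex((g) => g.commands.includes(command));
      if (group === -1) return 0; // no limit is listed for this command
      const last = lastSent.get(group);
      const elapsed = last === undefined ? Infinity : now() - last;
      const wait = COMMAND_GROUPS[group].intervalMs - elapsed;
      if (wait > 0) return wait;
      lastSent.set(group, now());
      return 0;
    },
  };
}

// Usage: send only when tryAcquire returns 0, otherwise retry after the delay.
const throttle = createCommandThrottle();
console.log(throttle.tryAcquire("set_task")); // 0 on the first call
```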

Streaming audio configuration

Option 1. WebRTC audio I/O

Use the following input/output configuration in your set_task:

{
  "data": {
    "input_stream": {
      "content_type": "audio",
      "source": {
        "type": "webrtc"
      }
    },
    "output_stream": {
      "content_type": "audio",
      "target": {
        "type": "webrtc"
      }
    }
    // ...
  }
}

  1. Publish your microphone track to the LiveKit room.
  2. Subscribe to the translated tracks that Palabra publishes to the same room after you send set_task.

Option 2. WebSocket audio I/O

Use this configuration instead:

{
  "data": {
    "input_stream": {
      "content_type": "audio",
      "source": {
        "type": "ws",
        "format": "pcm_s16le", // or opus, wav
        "sample_rate": 24000, // 16000 – 48000
        "channels": 1 // 1 or 2
      }
    },
    "output_stream": {
      "content_type": "audio",
      "target": {
        "type": "ws",
        "format": "pcm_s16le" // or zlib_pcm_s16le
      }
    }
  }
}

The output is always 24000 Hz, mono (sample_rate and channels cannot be changed).

  1. Send base64-encoded audio chunks that exactly match the declared format.
  2. Receive base64-encoded TTS chunks in output_audio_data responses (see below).
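
Because the server expects chunks that exactly match the declared format, it can help to check the stream settings locally before sending set_task. The sketch below encodes only the constraints stated in the comments of the configuration above; checkWsStreams is a hypothetical helper, and the server may apply further rules not listed here.

```javascript
const INPUT_FORMATS = ["pcm_s16le", "opus", "wav"];
const OUTPUT_FORMATS = ["pcm_s16le", "zlib_pcm_s16le"];

// Returns a list of human-readable problems; an empty list means the
// settings are consistent with the documented ranges.
function checkWsStreams(source, target) {
  const problems = [];
  if (!INPUT_FORMATS.includes(source.format)) {
    problems.push(`unsupported input format: ${source.format}`);
  }
  if (!(source.sample_rate >= 16000 && source.sample_rate <= 48000)) {
    problems.push("input sample_rate must be within 16000 to 48000 Hz");
  }
  if (source.channels !== 1 && source.channels !== 2) {
    problems.push("input channels must be 1 or 2");
  }
  if (!OUTPUT_FORMATS.includes(target.format)) {
    problems.push(`unsupported output format: ${target.format}`);
  }
  if ("sample_rate" in target || "channels" in target) {
    problems.push("output is fixed at 24000 Hz mono; sample_rate and channels cannot be set");
  }
  return problems;
}

console.log(
  checkWsStreams(
    { type: "ws", format: "pcm_s16le", sample_rate: 24000, channels: 1 },
    { type: "ws", format: "pcm_s16le" }
  )
); // []
```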

API messages

Requests (client → server)

Message type        Description
set_task            Create or update the translation task
get_task            Return the current task
pause_task          Pause the current task; resume with set_task
flush_task          Cancel processing of the current (ongoing) phrase
end_task            Finish the translation task
tts_task            Generate TTS from text
input_audio_data    Input audio data chunk (WebSocket audio transport only)

set_task

Create a new task or modify the current one:

  • Sent for the first time after creating a session: starts the translation.
  • Sent after pause_task: resumes the translation.
  • Sent during an ongoing translation: updates the settings in real time, no need to stop the translation.

{
  "message_type": "set_task",
  "data": {
    "input_stream": { /* Depends on transport; see "Streaming audio configuration" above */ },
    "output_stream": { /* Depends on transport; see "Streaming audio configuration" above */ },
    "pipeline": {
      "transcription": {
        "source_language": "string",
        "detectable_languages": ["string"],
        "segment_confirmation_silence_threshold": "float",
        "only_confirm_by_silence": "bool",
        "sentence_splitter": {
          "enabled": "bool"
        },
        "verification": {
          "auto_transcription_correction": "bool",
          "transcription_correction_style": "string"
        }
      },
      // Translation and speech generation settings for one or more target languages
      "translations": [
        {
          "target_language": "string",
          "translate_partial_transcriptions": "bool",
          "speech_generation": {
            "voice_cloning": "bool",
            "voice_id": "string",
            "voice_timbre_detection": {
              "enabled": "bool",
              "high_timbre_voices": [],
              "low_timbre_voices": []
            }
          }
        }
        // You can add more targets
      ],
      "translation_queue_configs": {
        "global": {
          "desired_queue_level_ms": "int",
          "max_queue_level_ms": "int",
          "auto_tempo": "bool"
        },
        "es": {
          "desired_queue_level_ms": "int",
          "max_queue_level_ms": "int"
        }
      },
      // Select response types to receive
      "allowed_message_types": [
        "translated_transcription",
        "partial_transcription",
        "partial_translated_transcription",
        "validated_transcription"
      ]
    }
  }
}

See the Translation settings breakdown for details on each field and the Recommended settings for best-practice values.
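
To make the schema more concrete, here is a sketch that assembles a small set_task message for either transport. It assumes that fields left out fall back to server defaults, which this page does not confirm, so check the Translation settings breakdown before relying on it. buildSetTask and its parameter names are hypothetical.

```javascript
// Build a set_task message with only a handful of fields filled in.
// Assumption: omitted fields (voice settings, queue configs, ...) are optional.
function buildSetTask({ sourceLanguage, targetLanguages, transport = "ws" }) {
  const isWs = transport === "ws";
  return {
    message_type: "set_task",
    data: {
      input_stream: {
        content_type: "audio",
        source: isWs
          ? { type: "ws", format: "pcm_s16le", sample_rate: 24000, channels: 1 }
          : { type: "webrtc" },
      },
      output_stream: {
        content_type: "audio",
        target: isWs ? { type: "ws", format: "pcm_s16le" } : { type: "webrtc" },
      },
      pipeline: {
        transcription: { source_language: sourceLanguage },
        // One translations entry per target language.
        translations: targetLanguages.map((lang) => ({ target_language: lang })),
        allowed_message_types: ["validated_transcription", "translated_transcription"],
      },
    },
  };
}

const setTask = buildSetTask({ sourceLanguage: "en", targetLanguages: ["es"] });
console.log(JSON.stringify(setTask, null, 2));
// To send it: socket.send(JSON.stringify(setTask));
```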

ASR-only mode

To use this API in ASR-only mode, do either of the following:

  • Set output_stream to null — you will still get text translations, but no TTS audio.
  • Use an empty translations list — you will get neither text translations nor TTS audio.
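
The two options can be expressed as small transformations of an existing set_task object. This is a sketch only; withoutSpeech and withoutTranslations are hypothetical helpers, and the sample task contains just enough fields to show the change.

```javascript
// Option 1: keep text translations, drop TTS audio.
function withoutSpeech(setTask) {
  return { ...setTask, data: { ...setTask.data, output_stream: null } };
}

// Option 2: transcription only, with no translations and no TTS audio.
function withoutTranslations(setTask) {
  return {
    ...setTask,
    data: {
      ...setTask.data,
      pipeline: { ...setTask.data.pipeline, translations: [] },
    },
  };
}

// Illustrative task with placeholder settings.
const baseTask = {
  message_type: "set_task",
  data: {
    input_stream: { content_type: "audio", source: { type: "webrtc" } },
    output_stream: { content_type: "audio", target: { type: "webrtc" } },
    pipeline: {
      transcription: { source_language: "en" },
      translations: [{ target_language: "es" }],
    },
  },
};

console.log(withoutSpeech(baseTask).data.output_stream); // null
console.log(withoutTranslations(baseTask).data.pipeline.translations); // []
```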

get_task

Return the current task (as a current_task response). Set exclude_hidden to false to include advanced (hidden) fields in the response.

{ "message_type": "get_task", "data": { "exclude_hidden": true } }

pause_task

Pause the current task. No audio is processed and no billing applies while the task is paused. Use set_task to resume.

{ "message_type": "pause_task", "data": {} }

flush_task

Flush the translation of already spoken phrases (cancel their processing) without pausing upcoming ones. Useful when the current phrase no longer needs finishing — for example, if your conversation partner interrupts you.

{
  "message_type": "flush_task",
  "data": {
    "languages": ["global"], // 'global' (all languages, default) or target languages from `translations`
    "pause_task": false // pause the task after flushing
  }
}

end_task

Finish the current task. The server closes the connection after receiving end_task.

{
  "message_type": "end_task",
  "data": {
    "eos_timeout": 4, // optional, 1–30 seconds
    "force": false // set true to skip finalization of the last phrase
  }
}

If eos_timeout is set, the server waits until that many seconds have passed since your last detected speech, then sends an eos response (see below) before closing. Use it to make sure the tail of the translation is fully processed and delivered.
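
A sketch of building end_task with a range check on eos_timeout, plus a tiny tracker that records whether eos arrived before the connection closed. Both buildEndTask and createShutdownTracker are hypothetical helpers, and the two outcome labels are local conventions, not API values.

```javascript
// Build end_task; eos_timeout is optional and must be within 1 to 30 seconds.
function buildEndTask({ eosTimeout, force = false } = {}) {
  const data = { force };
  if (eosTimeout !== undefined) {
    if (!(eosTimeout >= 1 && eosTimeout <= 30)) {
      throw new RangeError("eos_timeout must be between 1 and 30 seconds");
    }
    data.eos_timeout = eosTimeout;
  }
  return { message_type: "end_task", data };
}

// Feed every incoming frame to onMessage and call onClose from the socket's
// close handler to learn whether the tail of the translation was delivered.
function createShutdownTracker() {
  let sawEos = false;
  return {
    onMessage(raw) {
      if (JSON.parse(raw).message_type === "eos") sawEos = true;
    },
    onClose() {
      return sawEos ? "drained" : "closed-without-eos";
    },
  };
}

console.log(JSON.stringify(buildEndTask({ eosTimeout: 4 })));
```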

tts_task

Generate TTS from text within the active task. The behavior depends on translate_text (default: false):

  • "translate_text": false — TTS only. language is the language to speak in and must be one of the target_language values configured in the task's translations section (otherwise a VALIDATION_ERROR is returned).
  • "translate_text": true — the text is first translated into every target_language of the task, then synthesized. language is the language of the input text and must be a valid source language.

text is limited to 2048 characters. See the Text-to-Translated-Speech guide for details.

{
  "message_type": "tts_task",
  "data": {
    "text": "Hello, how are you?",
    "language": "en", // text language
    "translate_text": false // default
  }
}
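
The rules above can be checked on the client before sending, which avoids a round trip that would end in a VALIDATION_ERROR. In this sketch, buildTtsTask is a hypothetical helper, targetLanguages is the list of target_language values you configured in set_task, and counting the 2048-character limit in Unicode code points is an assumption (the server may count differently).

```javascript
const MAX_TTS_TEXT_LENGTH = 2048;

function buildTtsTask({ text, language, translateText = false, targetLanguages }) {
  // Assumption: the limit is counted in code points, not UTF-16 units.
  if ([...text].length > MAX_TTS_TEXT_LENGTH) {
    throw new RangeError(`text exceeds ${MAX_TTS_TEXT_LENGTH} characters`);
  }
  // Without translation, `language` is the spoken language and must be a configured target.
  if (!translateText && !targetLanguages.includes(language)) {
    throw new Error(`"${language}" is not one of the task's target languages`);
  }
  return {
    message_type: "tts_task",
    data: { text, language, translate_text: translateText },
  };
}

// Speak Spanish text as-is in a task that translates into Spanish.
console.log(JSON.stringify(buildTtsTask({ text: "Hola", language: "es", targetLanguages: ["es"] })));
```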

input_audio_data

Send a base64-encoded input audio chunk when WebSockets is selected as the audio transport. The audio chunks you push must match the format / sample_rate / channels declared in your set_task command. The optimal chunk length is 320 ms. The base64 payload must be between 1 KB and 512 KB (≈384 KB of raw audio per message).

{
  "message_type": "input_audio_data",
  "data": {
    "data": "base64 encoded data"
  }
}
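
For pcm_s16le input, a 320 ms chunk is sample_rate × 0.32 × channels × 2 bytes, which is 15360 bytes at 24000 Hz mono. The sketch below splits a raw PCM buffer into chunks of that size and wraps each one in an input_audio_data message. It runs under Node.js (it uses Buffer for base64), the helper names are hypothetical, and padding a short final chunk with silence is a design choice of this sketch, not a documented requirement.

```javascript
const BYTES_PER_SAMPLE = 2; // pcm_s16le: 16-bit samples

function chunkSizeBytes(sampleRate, channels, chunkMs = 320) {
  return Math.round((sampleRate * chunkMs) / 1000) * channels * BYTES_PER_SAMPLE;
}

// Yields one serialized input_audio_data message per chunk of `pcm` (a Uint8Array).
function* toInputAudioMessages(pcm, sampleRate, channels, chunkMs = 320) {
  const size = chunkSizeBytes(sampleRate, channels, chunkMs);
  for (let offset = 0; offset < pcm.length; offset += size) {
    const chunk = new Uint8Array(size); // zero-filled, so a short tail is padded with silence
    chunk.set(pcm.subarray(offset, offset + size));
    yield JSON.stringify({
      message_type: "input_audio_data",
      data: { data: Buffer.from(chunk).toString("base64") },
    });
  }
}

// One second of silence at 24 kHz mono becomes four 320 ms messages.
const oneSecond = new Uint8Array(24000 * BYTES_PER_SAMPLE);
const audioMessages = [...toInputAudioMessages(oneSecond, 24000, 1)];
console.log(chunkSizeBytes(24000, 1), audioMessages.length); // 15360 4
```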

Responses (server → client)

Message type                        Description
partial_transcription               Unconfirmed ASR segment
partial_translated_transcription    Unconfirmed translation segment
validated_transcription             Final ASR segment
translated_transcription            Final translation
output_audio_data                   Chunk of generated TTS audio (WebSocket audio transport only)
current_task                        Response to the get_task command
eos                                 End-of-stream confirmation (after end_task with eos_timeout)
warning                             Audio stream health warning
error                               Validation or runtime error

partial_transcription

Transcription of an uncompleted segment:

{
  "message_type": "partial_transcription",
  "data": {
    "transcription": {
      "transcription_id": "190983855fe3404e",
      "language": "en",
      "text": "One, two"
    }
  }
}

partial_translated_transcription

Translation of an uncompleted segment:

{
  "message_type": "partial_translated_transcription",
  "data": {
    "transcription": {
      "transcription_id": "190983855fe3404e",
      "language": "es",
      "text": "Uno, dos,"
    }
  }
}

validated_transcription

Transcription of a completed segment:

{
  "message_type": "validated_transcription",
  "data": {
    "transcription": {
      "transcription_id": "190983855fe3404e",
      "language": "en",
      "text": "One, two, three, four, five."
    }
  }
}

translated_transcription

Translation of a completed segment:

{
  "message_type": "translated_transcription",
  "data": {
    "transcription": {
      "transcription_id": "190983855fe3404e",
      "language": "es",
      "text": "Uno, dos, tres, cuatro, cinco."
    }
  }
}
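
The four transcription messages share one shape, so a client can keep a single store in which partial text is overwritten until the final version arrives. The sketch assumes, as the examples suggest, that partial and final messages for the same segment carry the same transcription_id. createTranscriptStore is a hypothetical helper.

```javascript
const PARTIAL_TYPES = ["partial_transcription", "partial_translated_transcription"];
const FINAL_TYPES = ["validated_transcription", "translated_transcription"];

function createTranscriptStore() {
  const segments = new Map(); // key: "<transcription_id>:<language>"

  return {
    // Accepts a parsed message; ignores anything that is not a transcription.
    apply(message) {
      const isPartial = PARTIAL_TYPES.includes(message.message_type);
      const isFinal = FINAL_TYPES.includes(message.message_type);
      if (!isPartial && !isFinal) return;
      const { transcription_id: id, language, text } = message.data.transcription;
      const key = `${id}:${language}`;
      // A late partial must not overwrite an already confirmed segment.
      if (isPartial && segments.get(key)?.final) return;
      segments.set(key, { id, language, text, final: isFinal });
    },
    get(id, language) {
      return segments.get(`${id}:${language}`);
    },
  };
}

const store = createTranscriptStore();
store.apply({
  message_type: "partial_transcription",
  data: { transcription: { transcription_id: "190983855fe3404e", language: "en", text: "One, two" } },
});
console.log(store.get("190983855fe3404e", "en").text); // One, two
```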

output_audio_data

TTS audio chunk (when WebSockets is the audio transport):

{
  "message_type": "output_audio_data",
  "data": {
    "transcription_id": "190983855fe3404e",
    "translation_part_id": "0", // part index when a phrase is split into several TTS pieces
    "language": "es", // TTS language
    "last_chunk": false, // true on the last generated chunk for this `transcription_id`
    "data": "base64 string", // ≤512 KB base64 (≈384 KB of raw audio)
    "chunk_generation_delta": 120 // optional, ms between generation start and this chunk
  }
}

The audio is always 24 kHz mono; the encoding (pcm_s16le or zlib_pcm_s16le) matches the output_stream.target.format of your task.
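
A sketch of decoding pcm_s16le chunks and joining them into one phrase once last_chunk arrives. It runs under Node.js (Buffer), does not handle zlib_pcm_s16le, and groups chunks by transcription_id plus language, which is an assumption about how chunks for several target languages interleave. The helper names are hypothetical.

```javascript
// Decode one output_audio_data message into signed 16-bit samples.
function decodePcmChunk(message) {
  const bytes = Buffer.from(message.data.data, "base64");
  const samples = new Int16Array(bytes.length >> 1);
  for (let i = 0; i < samples.length; i++) {
    samples[i] = bytes.readInt16LE(i * 2); // explicit little-endian read
  }
  return samples;
}

function createPhraseCollector() {
  const pending = new Map(); // key -> array of decoded chunks

  return {
    // Returns the whole phrase as one Int16Array on last_chunk, otherwise null.
    push(message) {
      const { transcription_id: id, language, last_chunk: isLast } = message.data;
      const key = `${id}:${language}`;
      const parts = pending.get(key) ?? [];
      parts.push(decodePcmChunk(message));
      if (!isLast) {
        pending.set(key, parts);
        return null;
      }
      pending.delete(key);
      const total = parts.reduce((sum, part) => sum + part.length, 0);
      const phrase = new Int16Array(total);
      let offset = 0;
      for (const part of parts) {
        phrase.set(part, offset);
        offset += part.length;
      }
      return phrase;
    },
  };
}

// Duration of a decoded phrase in seconds: samples / 24000 (24 kHz mono).
const phraseSeconds = (phrase) => phrase.length / 24000;
```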

current_task

The response to the get_task command. Contains the current task configuration plus its status:

{
  "message_type": "current_task",
  "data": {
    "input_stream": { /* ... */ },
    "output_stream": { /* ... */ },
    "pipeline": { /* ... */ },
    "task_status": "running" // running | paused | unknown
  }
}

current_task is sent in response to get_task, and also automatically when you reconnect to an already running pipeline. The server does not send it as a confirmation of set_task — to verify that a new task has started, poll get_task (it returns a NOT_FOUND error until the pipeline is up).
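
When polling get_task after set_task, each reply falls into one of a few cases. The sketch below classifies a parsed reply; the outcome labels (ready, not-running, retry, failed, ignore) and the interpretTaskReply name are local conventions, not API values. The polling interval reflects the 1-per-2-seconds limit listed for get_task.

```javascript
const GET_TASK_MIN_INTERVAL_MS = 2000; // do not poll faster than the rate limit

function interpretTaskReply(message) {
  if (message.message_type === "current_task") {
    // task_status is one of: running | paused | unknown
    return message.data.task_status === "running" ? "ready" : "not-running";
  }
  if (message.message_type === "error") {
    // NOT_FOUND means the pipeline is not up yet, so poll again later.
    return message.data.code === "NOT_FOUND" ? "retry" : "failed";
  }
  return "ignore"; // unrelated message, for example a transcription
}

console.log(interpretTaskReply({ message_type: "error", data: { code: "NOT_FOUND" } })); // retry
// On "retry", wait GET_TASK_MIN_INTERVAL_MS and then send:
//   { "message_type": "get_task", "data": { "exclude_hidden": true } }
```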

eos

End-of-stream confirmation. Sent after an end_task with eos_timeout, once the requested silence period has elapsed; the connection is closed right after.

{ "message_type": "eos", "data": {} }

warning

Audio stream health warnings. The server monitors the pace of your input_audio_data stream:

{
  "message_type": "warning",
  "data": {
    "code": "AUDIO_STREAM_TOO_FAST", // AUDIO_STREAM_TOO_FAST | AUDIO_STREAM_TOO_SLOW | AUDIO_STREAM_STALLED
    "message": "Human-readable description"
  }
}

Send audio at real-time rate (e.g., one 320 ms chunk every 320 ms): pushing faster, slower, or stopping mid-stream triggers these warnings and degrades translation quality.
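
One way to hold a real-time rate is to schedule every chunk against the stream's start time instead of sleeping a fixed interval after each send, so timer jitter does not accumulate. This is a sketch with hypothetical helper names; note that if the sender falls behind, it sends the overdue chunks immediately to get back on schedule.

```javascript
// Milliseconds to wait before chunk number `chunkIndex` (0-based) is due.
function msUntilChunkDue(startMs, chunkIndex, nowMs, chunkMs = 320) {
  const dueAt = startMs + chunkIndex * chunkMs;
  return Math.max(0, dueAt - nowMs);
}

// Send pre-built input_audio_data messages at the pace of the audio itself.
// `send` is any function that transmits one message, e.g. (m) => socket.send(m).
async function sendPaced(messages, send, chunkMs = 320) {
  const start = Date.now();
  let index = 0;
  for (const message of messages) {
    const wait = msUntilChunkDue(start, index, Date.now(), chunkMs);
    if (wait > 0) await new Promise((resolve) => setTimeout(resolve, wait));
    send(message);
    index += 1;
  }
}

console.log(msUntilChunkDue(1000, 3, 1500)); // 460
```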

error

Validation, authorization, or other kinds of errors:

{
  "message_type": "error",
  "data": {
    "code": "VALIDATION_ERROR",
    "desc": "ValidationError(model='SetTaskMessage', errors=[{'loc': ('input_stream', 'content_type')",
    "msg": "value is not a valid enumeration member; permitted: 'audio'\", 'type': 'type_error.enum'",
    "param": null
  }
}