Translation management API
Control Palabra's real-time translation pipeline over WebSockets or a WebRTC data channel.
1. Prerequisites
Before you connect, make sure you have the following:
- webrtc_url – the URL of the Palabra WebRTC server.
- ws_url – the URL of the Palabra WebSocket server.
- publisher access token – a JWT used to authorize your connection.
All three values are returned when you create a streaming session.
2. Choose a transport
2.1 Option 1. WebRTC
- Connect with any LiveKit client to webrtc_url using your publisher token.
- Once the connection is open, you can start sending commands through the default (empty-topic) WebRTC data channel.
2.2 Option 2. WebSockets
Connect to ws_url, passing your publisher token as a query parameter:
// WebSocket control URL
const endpoint = `${ws_url}?token=${publisher}`;
const socket = new WebSocket(endpoint);
Once the connection is open, you can start sending commands.
3. Message format (WebRTC & WebSockets)
Every API packet, request and response alike, has the same envelope:
{
"message_type": "<string>",
"data": { /* payload */ }
}
If message_type is "error", the data field contains diagnostic information.
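A minimal sketch of working with this envelope in JavaScript (the helper names are ours, not part of the API):

```javascript
// Build an outgoing packet in the shared envelope.
function makeMessage(messageType, data) {
  return JSON.stringify({ message_type: messageType, data });
}

// Dispatch an incoming packet by its message_type.
function handlePacket(raw) {
  const { message_type, data } = JSON.parse(raw);
  if (message_type === "error") {
    return { error: data }; // data carries diagnostic information
  }
  return { type: message_type, payload: data };
}
```

The same helpers apply to both transports, since WebRTC data-channel messages and WebSocket frames carry the identical JSON envelope.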
4. Typical workflow
- Create task – send a set_task message to start the translation.
- Update task – send another set_task message to update the translation settings during an ongoing translation.
- Pause processing – send a pause_task message to pause the translation (stops billing). Resume with another set_task.
- Finish task – send an end_task message (the server will close your connection automatically; the session will be invalidated in 1 minute).
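The workflow above can be sketched as small helpers over either transport; here `send` stands for whatever delivers a JSON string (socket.send, the WebRTC data channel, ...), and the function names are ours:

```javascript
// Start a task, or update/resume one that already exists.
function startTask(send, taskSettings) {
  send(JSON.stringify({ message_type: "set_task", data: taskSettings }));
}

// Pause processing (stops billing); resume with startTask.
function pauseTask(send) {
  send(JSON.stringify({ message_type: "pause_task", data: {} }));
}

// Finish; the server then closes the connection.
function finishTask(send, force = false) {
  send(JSON.stringify({ message_type: "end_task", data: { force } }));
}
```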
5. Settings reference
- What each field means – see the translation settings breakdown.
- Recommended values – see the best-practice settings.
6. Streaming audio configuration
6.1 Option 1. WebRTC audio I/O configuration
Use the following input/output stream configuration in your set_task message:
{
"data": {
"input_stream": {
"content_type": "audio",
"source": {
"type": "webrtc"
}
},
"output_stream": {
"content_type": "audio",
"target": {
"type": "webrtc"
}
}
// ...
}
}
- Publish your microphone track to the LiveKit Room.
- Subscribe to the translation tracks that Palabra will publish in the same LiveKit Room after you send the set_task message.
6.2 Option 2. WebSocket audio I/O configuration
Use this configuration instead:
{
"data": {
"input_stream": {
"content_type": "audio",
"source": {
"type": "ws",
"format": "opus", // or pcm_s16le, wav
"sample_rate": 24000, // 16000 - 24000
"channels": 1 // 1 or 2
}
},
"output_stream": {
"content_type": "audio",
"target": {
"type": "ws",
"format": "pcm_s16le" // or zlib_pcm_s16le
}
}
}
}
- Send base64 audio chunks that exactly match the declared format.
- Receive base64 TTS chunks in output_audio_data responses:
{
"message_type": "output_audio_data",
"data": {
"transcription_id": "190983855fe3404e",
"language": "es",
"last_chunk": false,
"data": "<base64-encoded audio>"
}
}
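A single translated phrase usually arrives as several chunks, so you typically buffer them per transcription_id until last_chunk is true. A Node.js sketch, following the field layout of the output_audio_data example above (helper names are ours):

```javascript
// Returns a handler for output_audio_data messages that assembles
// the base64 chunks of each transcription into one Buffer.
function makeAudioCollector(onComplete) {
  const buffers = new Map(); // transcription_id -> [Buffer, ...]
  return function onOutputAudio(msg) {
    const { transcription_id, last_chunk, data } = msg.data;
    const parts = buffers.get(transcription_id) || [];
    parts.push(Buffer.from(data, "base64"));
    buffers.set(transcription_id, parts);
    if (last_chunk) {
      buffers.delete(transcription_id);
      onComplete(transcription_id, Buffer.concat(parts));
    }
  };
}
```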
7. API messages
Request message schema
Response message schema
7.1 Requests (client → server)
Message type | Short description |
---|---|
set_task | Create/update translation task |
end_task | Finish translation task |
get_task | Return current task |
pause_task | Pause current task, use set_task to continue |
tts_task | Generate TTS from text |
input_audio_data | Input audio data chunk (WebSockets audio transport only) |
set_task
Create a new task or modify the current one.
- Sending it for the first time after creating a session starts the translation.
- Sending it after pause_task resumes the translation.
- Sending another set_task message during an ongoing translation updates the current translation settings in real time, with no need to stop the translation.
{
"message_type": "set_task",
"data": {
"input_stream": { /* Depending on transport, see the audio I/O section above */ },
"output_stream": { /* Depending on transport, see the audio I/O section above */ },
"pipeline": {
"transcription": {
"source_language": "string",
"detectable_languages": ["string"],
"segment_confirmation_silence_threshold": "float",
"only_confirm_by_silence": "bool",
"sentence_splitter": {
"enabled": "bool"
},
"verification": {
"auto_transcription_correction": "bool",
"transcription_correction_style": "string"
}
},
// Translation and speech generation settings for one or more target languages
"translations": [
{
"target_language": "string",
"translate_partial_transcriptions": "bool",
"speech_generation": {
"voice_cloning": "bool",
"voice_id": "string",
"voice_timbre_detection": {
"enabled": "bool",
"high_timbre_voices": [],
"low_timbre_voices": []
}
}
}
// You can add more targets
],
"translation_queue_configs": {
"global": {
"desired_queue_level_ms": "int",
"max_queue_level_ms": "int",
"auto_tempo": "bool"
},
"es": {
"desired_queue_level_ms": "int",
"max_queue_level_ms": "int"
}
},
// Select response types to receive
"allowed_message_types": [
"translated_transcription",
"partial_transcription",
"partial_translated_transcription",
"validated_transcription"
]
}
}
}
See the translation settings breakdown for details on each field and recommended settings for the task's translation pipeline.
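As orientation, a minimal English-to-Spanish task could look like the following (illustrative fragment only; the stream blocks depend on your transport, as described in the audio I/O section, and real deployments should consult the settings breakdown):

```json
{
  "message_type": "set_task",
  "data": {
    "input_stream": { /* per your transport, see the audio I/O section */ },
    "output_stream": { /* per your transport, see the audio I/O section */ },
    "pipeline": {
      "transcription": { "source_language": "en" },
      "translations": [{ "target_language": "es" }],
      "allowed_message_types": ["validated_transcription", "translated_transcription"]
    }
  }
}
```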
ASR Only Mode
To use Palabra AI in ASR-only mode, do either of the following:
- Set output_stream to null. (You will still get text translations, but no TTS audio.)
- Use an empty translations list. (You will get neither text translations nor TTS audio.)
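For example, the first variant as a set_task fragment (illustrative):

```json
{
  "message_type": "set_task",
  "data": {
    "input_stream": { /* per your transport, see the audio I/O section */ },
    "output_stream": null,
    "pipeline": { /* transcription and translations as usual */ }
  }
}
```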
end_task
Finish the current task.
The server closes the connection after receiving end_task.
{
"message_type": "end_task",
"data": { "force": false } // set true to skip finalization of the last phrase
}
pause_task
Pause the current task.
No audio data is processed and no billing while the task is paused.
Use set_task to resume translation.
{ "message_type": "pause_task", "data": {} }
get_task
Return the current task.
{ "message_type": "get_task", "data": {} }
tts_task
Generate TTS from text.
The text will be translated into every target_language listed in the translations section of the task.
{
"message_type": "tts_task",
"data": {
"text": "Hello, how are you?",
"language": "en" // text language
}
}
input_audio_data
Sends an input audio data chunk, base64-encoded, when WebSockets is selected as the audio transport.
The audio chunks you push must match the format / sample_rate / channels you declared in your set_task command.
The optimal chunk length is 320 ms.
{
"message_type": "input_audio_data",
"data": {
"data": "base64 encoded data"
}
}
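For pcm_s16le at 24000 Hz mono, a 320 ms chunk is 24000 × 0.32 × 2 = 15,360 bytes. A Node.js sketch of slicing raw PCM into input_audio_data messages (the constant and function names are ours; sizes assume the format declared above):

```javascript
// Chunk sizing for pcm_s16le audio; must match your set_task declaration.
const SAMPLE_RATE = 24000;
const CHANNELS = 1;
const BYTES_PER_SAMPLE = 2; // pcm_s16le
const CHUNK_MS = 320;       // recommended chunk length
const CHUNK_BYTES = (SAMPLE_RATE * CHUNK_MS / 1000) * CHANNELS * BYTES_PER_SAMPLE;

// Yield one input_audio_data message per ~320 ms slice of raw PCM.
function* audioMessages(pcm) {
  for (let off = 0; off < pcm.length; off += CHUNK_BYTES) {
    const chunk = pcm.subarray(off, off + CHUNK_BYTES);
    yield JSON.stringify({
      message_type: "input_audio_data",
      data: { data: chunk.toString("base64") },
    });
  }
}
```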
7.2 Responses (server → client)
Message type | Short description |
---|---|
partial_transcription | Unconfirmed ASR segment |
partial_translated_transcription | Unconfirmed translation segment |
validated_transcription | Final ASR segment |
translated_transcription | Final translation |
output_audio_data | Chunk of generated TTS audio (WebSockets audio transport only) |
current_task | get_task command response |
error | Validation or runtime error |
To receive partial_transcription, validated_transcription, and translated_transcription messages, you must include these message types in the allowed_message_types field of your set_task command.
To receive partial_translated_transcription messages, you must include it in the allowed_message_types field AND set translate_partial_transcriptions to true in your set_task command.
partial_transcription
Unconfirmed (in-progress) segment transcription:
{
"message_type": "partial_transcription",
"data": {
"transcription": {
"transcription_id": "190983855fe3404e",
"language": "en",
"text": "One, two"
}
}
}
partial_translated_transcription
Unconfirmed (in-progress) segment translation:
{
"message_type": "partial_translated_transcription",
"data": {
"transcription": {
"transcription_id": "190983855fe3404e",
"language": "es",
"text": "Uno, dos,"
}
}
}
validated_transcription
Confirmed (final) segment transcription:
{
"message_type": "validated_transcription",
"data": {
"transcription": {
"transcription_id": "190983855fe3404e",
"language": "en",
"text": "One, two, three, four, five."
}
}
}
translated_transcription
Confirmed (final) segment translation:
{
"message_type": "translated_transcription",
"data": {
"transcription": {
"transcription_id": "190983855fe3404e",
"language": "es",
"text": "Uno, dos, tres, cuatro, cinco."
}
}
}
output_audio_data
TTS audio chunk (if you use WebSockets as the audio transport).
{
"message_type": "output_audio_data",
"data": {
"transcription_id": "190983855fe3404e",
"language": "es", // TTS language
"last_chunk": false, // Last generated chunk for this `transcription_id`
"data": "base64 string"
}
}
current_task
The get_task command response.
{
"message_type": "current_task",
"data": { /* the current task settings, as supplied via set_task */ }
}
error
Validation, authorization, or other kinds of errors.
{
"message_type": "error",
"data": {
"code": "VALIDATION_ERROR",
"desc": "ValidationError(model='SetTaskMessage', errors=[{'loc': ('input_stream', 'content_type')",
"param": null
}
}