Translation management API

Control the Palabra real-time translation pipeline over WebSockets or a WebRTC data channel.

1. Prerequisites

Before you connect, make sure you have the following:

webrtc_url – The URL of the Palabra WebRTC server.
ws_url – The URL of the Palabra WebSocket server.
publisher access token – A JWT used to authorize your connection.

All three values are returned when you create a streaming session.

2. Choose a transport

2.1 Option 1. WebRTC

Connect with any LiveKit client to webrtc_url using your publisher token.
Once the connection is open, you can start sending commands through the default (empty-topic) WebRTC data channel.

2.2 Option 2. WebSockets

Connect to ws_url, passing your publisher token as a query parameter:

// WebSocket control URL
const endpoint = `${ws_url}?token=${publisher}`;
const socket = new WebSocket(endpoint);

Once the connection is open, you can start sending commands.
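A slightly more defensive sketch of the same connection step, assuming the token might contain characters that need URL encoding. The helper name buildControlUrl is hypothetical and not part of the Palabra API; ws_url and publisher are the values from the prerequisites.

```javascript
// Hypothetical helper: builds the WebSocket control URL from the session values.
function buildControlUrl(wsUrl, publisherToken) {
  const url = new URL(wsUrl);
  // searchParams.set percent-encodes the token value.
  url.searchParams.set("token", publisherToken);
  return url.toString();
}

// Usage (browser, or any runtime with a global WebSocket):
// const socket = new WebSocket(buildControlUrl(ws_url, publisher));
// socket.addEventListener("open", () => { /* start sending commands */ });
```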

3. Message format (WebRTC & WebSockets)

Every API packet, whether request or response, uses the same envelope:

{
  "message_type": "<string>",
  "data": { /* payload */ }
}

If message_type is "error", the data field contains diagnostic information.
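The envelope can be wrapped in two small helpers. This is a minimal sketch; the function names are hypothetical, and throwing on an "error" packet is one possible design choice rather than required behavior.

```javascript
// Wrap a payload in the envelope described above and serialize it.
function encodeMessage(messageType, data = {}) {
  return JSON.stringify({ message_type: messageType, data });
}

// Parse an incoming packet; throws if it is an "error" message.
function decodeMessage(raw) {
  const packet = JSON.parse(raw);
  if (typeof packet.message_type !== "string") {
    throw new TypeError("packet has no message_type");
  }
  if (packet.message_type === "error") {
    const err = new Error(`Palabra error: ${JSON.stringify(packet.data)}`);
    err.details = packet.data; // keep the diagnostic payload for the caller
    throw err;
  }
  return packet;
}
```

For example, socket.send(encodeMessage("pause_task")) sends a pause command with an empty payload.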

4. Typical workflow

Create task – Send a set_task message to start the translation.
Update task – Send another set_task message to update the translation settings during an ongoing translation.
Pause processing – Send a pause_task message to pause the translation (this also stops billing). Resume with another set_task.
Flush processing – Send a flush_task message to immediately stop transcription, translation, and speech of the current phrase without pausing the translation of subsequent phrases.
Finish task – Send an end_task message. The server closes your connection automatically, and the session is invalidated 1 minute later.
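The workflow above can be sketched as a thin controller around a send function. The names createTaskController and send are hypothetical; send stands for whatever delivers a string over your open control channel, for example (s) => socket.send(s). The flush_task and end_task payloads follow the examples in section 7.1.

```javascript
// `send` delivers a serialized message over the already-open control channel.
function createTaskController(send) {
  const sendMessage = (message_type, data = {}) =>
    send(JSON.stringify({ message_type, data }));
  return {
    start: (settings) => sendMessage("set_task", settings),  // create the task
    update: (settings) => sendMessage("set_task", settings), // change settings live
    pause: () => sendMessage("pause_task"),
    resume: (settings) => sendMessage("set_task", settings), // resume after a pause
    flush: (languages = ["global"]) =>
      sendMessage("flush_task", { languages, pause_task: false }),
    finish: (force = false) => sendMessage("end_task", { force }),
  };
}
```

Starting, updating, and resuming all map to set_task, so the three methods differ only in name; they are kept separate to make calling code read like the workflow.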

5. Message Settings reference

What each field means - see the translation settings breakdown.
Recommended values - see the best-practice settings.

6. Streaming audio configuration

6.1 Option 1. WebRTC audio I/O configuration

Use the following input/output streams configuration in your set_task:

{
  "data": {
    "input_stream": {
      "content_type": "audio",
      "source": {
        "type": "webrtc"
      }
    },
    "output_stream": {
      "content_type": "audio",
      "target": {
        "type": "webrtc"
      }
    }
    // ...
  }
}

Publish your microphone track to the LiveKit Room.
Subscribe to the translation tracks that Palabra will publish in the same LiveKit Room after you send the set_task message.

6.2 Option 2. WebSocket audio I/O configuration

Use this configuration instead:

{
  "data": {
    "input_stream": {
      "content_type": "audio",
      "source": {
        "type": "ws",
        "format": "pcm_s16le", // or opus, wav
        "sample_rate": 16000, // 16000 - 24000
        "channels": 1         // 1 or 2
      }
    },
    "output_stream": {
      "content_type": "audio",
      "target": {
        "type": "ws",
        "format": "pcm_s16le" // or zlib_pcm_s16le
      }
    }
  }
}

Send base64-encoded audio chunks that exactly match the declared format.
Receive base64-encoded TTS chunks in output_audio_data responses:

{
  "message_type": "output_audio_data",
  "data": {
    "transcription_id": "190983855fe3404e",
    "language":        "es",
    "last_chunk":      false,
    "data":            "<base64-encoded audio>"
  }
}

7. API messages

7.1 Requests (client → server)

Message type	Short description
`set_task`	Create/update translation task
`end_task`	Finish translation task
`get_task`	Return current task
`pause_task`	Pause current task, use `set_task` to continue
`flush_task`	Cancel processing of the current (ongoing) phrase.
`tts_task`	Generate TTS from text
`input_audio_data`	Input audio data chunk (WebSockets audio transport only)

`set_task`

Create a new task or modify the current one.

Sent for the first time after creating a session, it starts the translation.
Sent after pause_task, it resumes the translation.
Sent during an ongoing translation, it updates the current translation settings in real time, with no need to stop the translation.

{
  "message_type": "set_task",
  "data": {
    "input_stream":  { /* Depending on transport, see the audio I/O section above */ },
    "output_stream": { /* Depending on transport, see the audio I/O section above */ },
    "pipeline": {
      "transcription": {
        "source_language": "string",
        "detectable_languages": ["string"],
        "segment_confirmation_silence_threshold": "float",
        "only_confirm_by_silence": "bool",
        "sentence_splitter": {
          "enabled": "bool"
        },
        "verification": {
          "auto_transcription_correction": "bool",
          "transcription_correction_style": "string"
        }
      },
      // Translation and speech generation settings for one or more target languages
      "translations": [
        {
          "target_language": "string",
          "translate_partial_transcriptions": "bool",
          "speech_generation": {
            "voice_cloning": "bool",
            "voice_id": "string",
            "voice_timbre_detection": {
              "enabled": "bool",
              "high_timbre_voices": [],
              "low_timbre_voices": []
            }
          }
        }
        // You can add more targets
      ],
      "translation_queue_configs": {
        "global": {
          "desired_queue_level_ms": "int",
          "max_queue_level_ms": "int",
          "auto_tempo": "bool"
        },
        "es": {
          "desired_queue_level_ms": "int",
          "max_queue_level_ms": "int"
        }
      },
      // Select response types to receive
      "allowed_message_types": [
        "translated_transcription",
        "partial_transcription",
        "partial_translated_transcription",
        "validated_transcription"
      ]
    }
  }
}

See the translation settings breakdown for details on each field and for recommended settings of the task's translation pipeline.

ASR Only Mode

To use Palabra AI in ASR-only mode, do either of the following:

Set output_stream to null. (You will still get text translations, but no TTS audio.)

Use an empty translations list. (You will get neither text translations nor TTS audio.)
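Both variants can be derived from an existing set_task payload. The sketch below assumes the payload is plain JSON; the function name toAsrOnly and its keepTextTranslations flag are hypothetical.

```javascript
// Returns a modified copy of a set_task `data` payload for ASR-only use.
// keepTextTranslations = true  -> output_stream: null (text translations, no TTS audio)
// keepTextTranslations = false -> empty translations list (no translations, no TTS audio)
function toAsrOnly(taskData, keepTextTranslations = false) {
  const copy = JSON.parse(JSON.stringify(taskData)); // deep copy of plain JSON
  if (keepTextTranslations) {
    copy.output_stream = null;
  } else {
    copy.pipeline = { ...copy.pipeline, translations: [] };
  }
  return copy;
}
```

The original payload is left untouched, so the full configuration can be sent again later with another set_task.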

`end_task`

Finish the current task. The server closes the connection after receiving end_task.

{
  "message_type": "end_task",
  "data": { "force": false } // set true to skip finalization of the last phrase
}

`pause_task`

Pause the current task. While the task is paused, no audio data is processed and nothing is billed. Use set_task to resume translation.

{ "message_type": "pause_task", "data": {} }

`flush_task`

Flush the translation of already spoken phrases (cancel their processing) without pausing upcoming ones. Useful when the current phrase no longer needs finishing - for example, if your conversation partner interrupts you.

{ 
  "message_type": "flush_task", 
  "data": {
    "languages": ["global"], // must be either 'global' or match the other languages listed in `translation_queue_configs`.
    "pause_task": false
  } 
}
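Because languages must be "global" or one of the keys in translation_queue_configs, it can help to check this on the client before sending. A minimal sketch, with the hypothetical helper buildFlushTask taking the data payload of your last set_task:

```javascript
// Builds a flush_task message, rejecting languages that are not configured.
function buildFlushTask(taskData, languages = ["global"]) {
  const configs = (taskData.pipeline || {}).translation_queue_configs || {};
  const configured = Object.keys(configs);
  const unknown = languages.filter(
    (lang) => lang !== "global" && !configured.includes(lang)
  );
  if (unknown.length > 0) {
    throw new RangeError(`languages not in translation_queue_configs: ${unknown.join(", ")}`);
  }
  return { message_type: "flush_task", data: { languages, pause_task: false } };
}
```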

`get_task`

Return the current task.

{ "message_type": "get_task", "data": {} }

`tts_task`

Generate TTS from text. The text is translated into every target_language listed in the translations section of the task.

{
  "message_type": "tts_task",
  "data": {
    "text": "Hello, how are you?",
    "language": "en" // text language
  }
}

`input_audio_data`

Sends a base64-encoded input audio chunk when WebSockets is selected as the audio transport. The audio chunks you push must match the format, sample_rate, and channels you declare in your set_task command. The optimal chunk length is 320 ms.

{
  "message_type": "input_audio_data",
  "data": {
    "data": "base64 encoded data"
  }
}
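As a worked example of the 320 ms figure: for pcm_s16le at 16000 Hz mono, one chunk is 16000 × 0.32 × 2 bytes = 10240 bytes. The sketch below splits raw PCM bytes into input_audio_data messages of that size. All function names are hypothetical, and it assumes the input is already raw pcm_s16le in a Uint8Array.

```javascript
// Bytes needed for `ms` milliseconds of 16-bit PCM (2 bytes per sample).
function pcmChunkBytes(sampleRate, channels, ms = 320) {
  return Math.round((sampleRate * ms) / 1000) * channels * 2;
}

// Base64-encode a Uint8Array using only btoa (browsers and recent Node.js).
function toBase64(bytes) {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Yields serialized input_audio_data messages, one per chunk.
function* audioMessages(pcmBytes, sampleRate = 16000, channels = 1, ms = 320) {
  const size = pcmChunkBytes(sampleRate, channels, ms);
  for (let offset = 0; offset < pcmBytes.length; offset += size) {
    const chunk = pcmBytes.subarray(offset, offset + size);
    yield JSON.stringify({
      message_type: "input_audio_data",
      data: { data: toBase64(chunk) },
    });
  }
}
```

The final chunk may be shorter than the rest. For live input you would typically send each message as the audio becomes available rather than all at once.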

7.2 Responses (server → client)

Message type	Short description
`partial_transcription`	Unconfirmed ASR segment
`partial_translated_transcription`	Unconfirmed translation segment
`validated_transcription`	Final ASR segment
`translated_transcription`	Final translation
`output_audio_data`	Chunk of generated TTS audio (WebSockets audio transport only)
`current_task`	`get_task` command response
`error`	Validation or runtime error

To receive partial_transcription, validated_transcription, and translated_transcription messages, you must include these message types in the allowed_message_types field of your set_task command.

To receive partial_translated_transcription messages, you must include it in the allowed_message_types field AND set translate_partial_transcriptions to true in your set_task command.
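These two rules can be checked on the client before sending a task. The sketch below uses the hypothetical helper expectedTextResponses and assumes that enabling translate_partial_transcriptions on at least one target is enough for the second rule.

```javascript
// Lists the text response types that a set_task `data` payload opts into.
function expectedTextResponses(taskData) {
  const pipeline = taskData.pipeline || {};
  const allowed = new Set(pipeline.allowed_message_types || []);
  const partialTranslations = (pipeline.translations || []).some(
    (t) => t.translate_partial_transcriptions === true
  );
  return [
    "partial_transcription",
    "validated_transcription",
    "translated_transcription",
    "partial_translated_transcription",
  ].filter((type) =>
    type === "partial_translated_transcription"
      ? allowed.has(type) && partialTranslations
      : allowed.has(type)
  );
}
```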

`partial_transcription`

Incomplete segment transcription:

{
  "message_type": "partial_transcription",
  "data": {
    "transcription": {
      "transcription_id": "190983855fe3404e",
      "language": "en",
      "text": "One, two"
    }
  }
}

`partial_translated_transcription`

Incomplete segment translation.

{
  "message_type": "partial_translated_transcription",
  "data": {
    "transcription": {
      "transcription_id": "190983855fe3404e",
      "language": "es",
      "text": "Uno, dos,"
    }
  }
}

`validated_transcription`

Completed segment transcription.

{
  "message_type": "validated_transcription",
  "data": {
    "transcription": {
      "transcription_id": "190983855fe3404e",
      "language": "en",
      "text": "One, two, three, four, five."
    }
  }
}

`translated_transcription`

Completed segment translation:

{
  "message_type": "translated_transcription",
  "data": {
    "transcription": {
      "transcription_id": "190983855fe3404e",
      "language": "es",
      "text": "Uno, dos, tres, cuatro, cinco."
    }
  }
}

`output_audio_data`

TTS audio chunk (sent only when you use WebSockets as the audio transport).

{
  "message_type": "output_audio_data",
  "data": {
    "transcription": {
      "transcription_id": "190983855fe3404e",
      "language": "es", // TTS language
      "last_chunk": false, // Last generated chunk for this `transcription_id`
      "data": "base64 string"
    }
  }
}
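Chunks that share a transcription_id and language can be joined once last_chunk arrives. The following is a sketch only: the names are hypothetical, it assumes the target format is plain pcm_s16le (no zlib decompression), and it accepts the fields either directly under data or under data.transcription, since both shapes appear in the examples in this document.

```javascript
// Decode a base64 string into bytes.
function fromBase64(b64) {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

// Returns a handler that buffers output_audio_data chunks and yields the
// joined audio when last_chunk is true (otherwise it returns null).
function createAudioCollector() {
  const pending = new Map();
  return function onOutputAudio(message) {
    // Assumption: accept both payload shapes shown in this document.
    const p = message.data.transcription || message.data;
    const key = `${p.transcription_id}:${p.language}`;
    const chunks = pending.get(key) || [];
    chunks.push(fromBase64(p.data));
    if (!p.last_chunk) {
      pending.set(key, chunks);
      return null;
    }
    pending.delete(key);
    const total = chunks.reduce((n, c) => n + c.length, 0);
    const audio = new Uint8Array(total);
    let offset = 0;
    for (const c of chunks) {
      audio.set(c, offset);
      offset += c.length;
    }
    return { transcriptionId: p.transcription_id, language: p.language, audio };
  };
}
```

For low-latency playback you would more likely feed each decoded chunk to the audio output as it arrives; buffering until last_chunk is shown here only because it is the simplest way to illustrate the field.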

`current_task`

The get_task command response.

{
  "message_type": "current_task",
  "data": { /* the current task */ }
}

`error`

Validation, authorization or other kinds of errors.

{
  "message_type": "error",
  "data": {
    "code": "VALIDATION_ERROR",
    "desc": "ValidationError(model='SetTaskMessage', errors=[{'loc': ('input_stream', 'content_type')",
    "param": null
  }
}
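Since every response shares the same envelope, incoming packets can be routed by message_type. A minimal sketch with the hypothetical helper createDispatcher; the handler keys are the response types listed in the table above.

```javascript
// Routes a raw incoming packet to the handler registered for its message_type.
function createDispatcher(handlers, onUnhandled = () => {}) {
  return function dispatch(raw) {
    const packet = JSON.parse(raw);
    const known = Object.prototype.hasOwnProperty.call(handlers, packet.message_type);
    if (known && typeof handlers[packet.message_type] === "function") {
      handlers[packet.message_type](packet.data);
    } else {
      onUnhandled(packet);
    }
  };
}

// Usage with a WebSocket control connection:
// const dispatch = createDispatcher({
//   validated_transcription: (d) => console.log(d.transcription.text),
//   error: (d) => console.error(d.code, d.desc),
// });
// socket.addEventListener("message", (event) => dispatch(event.data));
```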
