
Publishing and receiving audio

After you start a streaming session, Palabra returns two URLs and a JWT access token:

Purpose                   | Field      | Typical value
WebRTC (audio/control)    | webrtc_url | https://<STREAMING_SERVER>.palabra.ai/livekit/
WebSocket (audio/control) | ws_url     | wss://<STREAMING_SERVER>.palabra.ai/streaming-api/v1/speech-to-speech/stream
Authentication            | publisher  | eyJhbGciOiJIUzI1NiIsInR5cCI6…

You can publish audio, receive translated audio, and control the translation through either transport:

  • WebRTC — best for client applications (browsers, mobile apps). Handled by LiveKit.
  • WebSockets — convenient for server-side integrations.

Important
  • Regardless of the transport, you control the translation by sending JSON text messages — through the WebRTC data channel or the WebSocket connection respectively. See the Translation management API.
  • If you choose WebSockets as the audio transport, the audio chunks you push must match the format / sample_rate / channels declared in your set_task command.

Using the WebRTC transport

Use any LiveKit client library to publish your audio track, then create a translation task using the Translation management API. Palabra will publish a translated audio track for each target language.

Code examples

See the Quick Start Guide for code examples of publishing original audio (Step 4) and receiving translated audio (Step 5).

Using the WebSocket transport

Connect to ws_url using your publisher access token, create a translation task using the Translation management API, then start sending and receiving audio chunks as described below.

Publishing

Send base64-encoded audio chunks over the WebSocket. The chunks must match the format, sample_rate, and channels declared in your set_task command. The optimal chunk length is 320 ms.

Message format example:

{
  "message_type": "input_audio_data",
  "data": {
    "data": "base64 encoded data"
  }
}
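
As a rough illustration of the publishing step, the sketch below splits a raw PCM buffer into 320 ms chunks and wraps each one in an input_audio_data message. It assumes 16-bit PCM (2 bytes per sample); the helper names chunk_size_bytes and build_input_messages are hypothetical and not part of the Palabra API. The sample rate and channel count passed in must be the same values you declared in set_task.

```python
import base64
import json

CHUNK_MS = 320  # chunk length recommended in the text above


def chunk_size_bytes(sample_rate: int, channels: int,
                     bytes_per_sample: int = 2, chunk_ms: int = CHUNK_MS) -> int:
    """Bytes in one chunk of raw PCM (assumption: 16-bit samples by default)."""
    frames = sample_rate * chunk_ms // 1000
    return frames * channels * bytes_per_sample


def build_input_messages(pcm: bytes, sample_rate: int, channels: int) -> list[str]:
    """Split raw PCM into chunks and wrap each one as a JSON text message."""
    size = chunk_size_bytes(sample_rate, channels)
    messages = []
    for start in range(0, len(pcm), size):
        chunk = pcm[start:start + size]  # the final chunk may be shorter
        messages.append(json.dumps({
            "message_type": "input_audio_data",
            "data": {"data": base64.b64encode(chunk).decode("ascii")},
        }))
    return messages
```

For example, one second of 16 kHz mono audio (32,000 bytes) yields three full chunks of 10,240 bytes and a shorter final chunk. Each returned string can be sent as a text frame with whichever WebSocket client you use.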

Receiving

Palabra sends TTS audio chunks as output_audio_data messages over the same WebSocket connection. The chunks are base64-encoded. The default output format is 24 kHz, 16-bit, mono PCM; you can change it with the set_task command.

Message format example:

{
  "message_type": "output_audio_data",
  "data": {
    "transcription_id": "190983855fe3404e",
    "language": "es", // TTS language
    "last_chunk": false, // true on the last generated chunk for this `transcription_id`
    "data": "base64 string"
  }
}
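
One possible way to consume these messages is sketched below: decode each chunk, buffer it by transcription_id, and hand back the complete audio once last_chunk is true. The class name OutputAudioAssembler is hypothetical, and the sketch assumes that real messages arrive as plain JSON (without the explanatory // comments shown above) and that a transcription_id is not shared between target languages.

```python
import base64
import json
from collections import defaultdict


class OutputAudioAssembler:
    """Collects output_audio_data chunks until the last chunk arrives."""

    def __init__(self):
        # transcription_id -> decoded audio bytes received so far
        self._buffers = defaultdict(bytearray)

    def feed(self, raw_message: str):
        """Return (transcription_id, language, audio) on the last chunk, else None."""
        msg = json.loads(raw_message)
        if msg.get("message_type") != "output_audio_data":
            return None  # other message types are ignored in this sketch
        data = msg["data"]
        tid = data["transcription_id"]
        self._buffers[tid] += base64.b64decode(data["data"])
        if data["last_chunk"]:
            return tid, data["language"], bytes(self._buffers.pop(tid))
        return None
```

With the default output format (24 kHz, 16-bit, mono), the returned bytes are raw PCM at 48,000 bytes per second, so they can be written to a file or passed to an audio player that accepts raw PCM.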