Publishing and receiving audio
After you start a streaming session, Palabra returns two URLs and a JWT access token:
| Purpose | Field | Typical value |
|---|---|---|
| WebRTC (audio/control) | webrtc_url | https://<STREAMING_SERVER>.palabra.ai/livekit/ |
| WebSocket (audio/control) | ws_url | wss://<STREAMING_SERVER>.palabra.ai/streaming-api/v1/speech-to-speech/stream |
| Authentication | publisher | eyJhbGciOiJIUzI1NiIsInR5cCI6… |
You can publish audio, receive translated audio, and control the translation through either transport:
- WebRTC — best for client applications (browsers, mobile apps). Handled by LiveKit.
- WebSockets — convenient for server-side integrations.
- Regardless of the transport, you control the translation by sending JSON text messages — through the WebRTC data channel or the WebSocket connection respectively. See the Translation management API.
- If you choose WebSockets as the audio transport, the audio chunks you push must match the format, sample_rate, and channels declared in your set_task command.
Using the WebRTC transport
Use any LiveKit client library to publish your audio track, then create a translation task using the Translation management API. Palabra will publish a translated audio track for each target language.
- LiveKit Python SDK
- LiveKit Golang SDK
- LiveKit JS SDK
- See other SDKs here.
Code examples
See the Quick Start Guide for code examples of publishing original audio (Step 4) and receiving translated audio (Step 5).
Using the WebSocket transport
Connect to ws_url using your publisher access token, create a translation task using the Translation management API, then start sending and receiving audio chunks as described below.
Publishing
Send base64-encoded audio chunks over the WebSocket. The chunks must match the format, sample_rate, and channels declared in your set_task command. The optimal chunk length is 320 ms.
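As a rough illustration of the 320 ms recommendation, the sketch below computes how many bytes of raw PCM make up one chunk and splits a buffer accordingly. It assumes uncompressed 16-bit PCM input; the helper names `chunk_size_bytes` and `split_pcm` are hypothetical and not part of the Palabra API, and the sample rate and channel count must be whatever you declared in set_task.

```python
def chunk_size_bytes(sample_rate: int, channels: int,
                     bytes_per_sample: int = 2, chunk_ms: int = 320) -> int:
    """Return the number of raw PCM bytes in one chunk of chunk_ms milliseconds.

    Assumption: uncompressed PCM, 2 bytes per sample (16-bit) by default.
    """
    frames = sample_rate * chunk_ms // 1000
    return frames * channels * bytes_per_sample


def split_pcm(pcm: bytes, size: int) -> list[bytes]:
    """Split a PCM buffer into consecutive chunks; the last one may be shorter."""
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]


# Example: 24 kHz mono 16-bit audio gives 7680 frames, i.e. 15360 bytes per chunk.
size = chunk_size_bytes(sample_rate=24000, channels=1)
chunks = split_pcm(b"\x00" * 40000, size)
```

With these inputs the buffer splits into two full chunks and one shorter trailing chunk; whether a short final chunk should be padded or sent as-is is not specified here.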
Message format example:
{
  "message_type": "input_audio_data",
  "data": {
    "data": "base64 encoded data"
  }
}
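A minimal sketch of building this message in Python, using only the standard library. The function name `make_input_audio_message` is hypothetical; the field names are taken from the example above, and the resulting string is what you would send as a text frame over the WebSocket.

```python
import base64
import json


def make_input_audio_message(chunk: bytes) -> str:
    """Wrap one raw audio chunk in an input_audio_data JSON message."""
    payload = {
        "message_type": "input_audio_data",
        "data": {
            # The audio bytes are base64-encoded and sent as an ASCII string.
            "data": base64.b64encode(chunk).decode("ascii"),
        },
    }
    return json.dumps(payload)


# Example: encode a tiny 4-byte chunk.
message = make_input_audio_message(b"\x01\x02\x03\x04")
```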
Receiving
Palabra sends TTS audio chunks as output_audio_data messages over the same WebSocket connection. The chunks are base64-encoded. The default format is 24 kHz, 16-bit, mono PCM, which you can change with the set_task command.
Message format example:
{
  "message_type": "output_audio_data",
  "data": {
    "transcription_id": "190983855fe3404e",
    "language": "es", // TTS language
    "last_chunk": false, // true on the last generated chunk for this `transcription_id`
    "data": "base64 string"
  }
}
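One way to consume these messages is to accumulate the decoded chunks per transcription_id and hand back the full audio once last_chunk is true. The sketch below assumes each incoming text frame is plain JSON shaped like the example above (without the explanatory comments) and that the audio is the default 24 kHz 16-bit mono PCM. `OutputAudioCollector` and `pcm_to_wav` are hypothetical helpers, not part of the Palabra API.

```python
import base64
import io
import json
import wave
from collections import defaultdict
from typing import Optional


class OutputAudioCollector:
    """Accumulates output_audio_data chunks until the last chunk arrives."""

    def __init__(self) -> None:
        self._buffers = defaultdict(bytearray)

    def feed(self, raw_message: str) -> Optional[tuple[str, str, bytes]]:
        """Return (transcription_id, language, pcm) on the last chunk, else None."""
        msg = json.loads(raw_message)
        if msg.get("message_type") != "output_audio_data":
            return None  # not an audio message; ignore it here
        data = msg["data"]
        tid = data["transcription_id"]
        self._buffers[tid] += base64.b64decode(data["data"])
        if data.get("last_chunk"):
            return tid, data["language"], bytes(self._buffers.pop(tid))
        return None


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit PCM in a WAV container (defaults assume 24 kHz mono)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Buffering a whole utterance is the simplest approach for saving files. For live playback you would more likely pass each decoded chunk to the audio output as it arrives and use last_chunk only as an end-of-utterance marker.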