
Quick Start (WebRTC)

Best for client-side apps

The following steps explain how to use the WebRTC-based architecture, which is recommended for client-side applications. If you are looking for a backend solution, refer to our WebSocket Quick Start Guide instead.

Introduction

Palabra's API solution enables real-time speech translation through a WebRTC-based architecture using LiveKit.

The process involves creating a secure session, establishing a connection to a Palabra Translation Room, publishing your original audio stream into the Room, and configuring the translation pipeline with your desired language settings.

Once connected, your speech is automatically transcribed, translated, and synthesized into the target language in real time. Palabra then publishes the translated audio track to the same room, allowing you to subscribe to it and play it back in your application instantly.

Step 1. Get API Credentials

Visit the Palabra API Keys section to obtain your Client ID and Client Secret.

Step 2. Create a Session

Use your credentials to call the POST /session-storage/session endpoint. You'll receive the webrtc_url and a publisher JWT token, which are required for the following steps.

Request Example

import axios from "axios";

const { data } = await axios.post(
  "https://api.palabra.ai/session-storage/session",
  {
    data: {
      subscriber_count: 0,
      publisher_can_subscribe: true,
    },
  },
  {
    headers: {
      ClientId: "<API_CLIENT_ID>",
      ClientSecret: "<API_CLIENT_SECRET>",
    },
  }
);

Response Example

{
  "publisher": "eyJhbGciOiJIU...Gxr2gjWSA4",
  "subscriber": [],
  "webrtc_room_name": "50ff0fa2",
  "webrtc_url": "https://streaming-0.palabra.ai/livekit/",
  "ws_url": "wss://streaming-0.palabra.ai/streaming-api/v1/speech-to-speech/stream",
  "id": "7f99b553-4697...7d450728"
}

webrtc_url - the WebRTC (LiveKit) server URL used to connect to the Translation Room.

publisher - the JWT token that authenticates your connection to the WebRTC server.

Step 3. Connect to the Translation Room

Use the LiveKit SDK to join the room at webrtc_url with the publisher token you received in Step 2.

npm install livekit-client

import { Room } from "livekit-client";

const connectTranslationRoom = async (WEBRTC_URL, PUBLISHER) => {
  try {
    const room = new Room();
    await room.connect(WEBRTC_URL, PUBLISHER, { autoSubscribe: true });
    return room;
  } catch (e) {
    console.error(e);
    throw e;
  }
};

Step 4. Publish the Original Audio Stream

Get the audio stream from your microphone and publish it to the room using the LiveKit SDK.

Example

import { LocalAudioTrack } from "livekit-client";

const publishAudioTrack = async (room) => {
  try {
    // Capture mono audio from the microphone
    const stream = await navigator.mediaDevices.getUserMedia({
      audio: { channelCount: 1 },
    });
    const localTrack = new LocalAudioTrack(stream.getAudioTracks()[0]);
    await room.localParticipant.publishTrack(localTrack, {
      dtx: false,
      red: false,
      audioPreset: {
        maxBitrate: 32000,
        priority: "high",
      },
    });
  } catch (e) {
    console.error("Error while publishing audio track:", e);
    throw e;
  }
};

Step 5. Handle Translated Audio Track

As soon as the translated audio track is published in the room, you will be auto-subscribed to it. You can handle it within a callback and play it through the speakers.

Example

import { RoomEvent } from "livekit-client";

const playTranslationInBrowser = (track) => {
  if (track.kind === "audio") {
    const mediaStream = new MediaStream([track.mediaStreamTrack]);
    const audioElement = document.getElementById("remote-audio"); // Your HTML audio element

    if (audioElement) {
      audioElement.srcObject = mediaStream;
      audioElement.play();
    } else {
      console.error("Audio element not found!");
    }
  }
};

// Add a handler for the TrackSubscribed event
room.on(RoomEvent.TrackSubscribed, playTranslationInBrowser);

Step 6. Start the Translation

Use the WebRTC connection's data channel to publish a set_task message with the translation settings to start the translation process.

Example

const startTranslation = (room, translationSettings) => {
  const payload = JSON.stringify({
    message_type: "set_task",
    data: translationSettings,
  });
  const encoder = new TextEncoder();
  const message = encoder.encode(payload);

  // Send the set_task message through the data channel
  room.localParticipant.publishData(message, { reliable: true });
};
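The data-channel message is plain UTF-8 encoded JSON. Factoring the encoding into a small pure helper makes it easy to test independently of a live room connection. Note that buildSetTaskMessage is a name introduced here for illustration, and the contents of translationSettings follow Palabra's task schema, which is documented separately — this helper only handles the message envelope.

```javascript
// Builds the UTF-8 encoded set_task message sent over the LiveKit data channel.
// The shape of `translationSettings` is defined by Palabra's task schema;
// this helper is only responsible for the envelope and encoding.
const buildSetTaskMessage = (translationSettings) => {
  const payload = JSON.stringify({
    message_type: "set_task",
    data: translationSettings,
  });
  return new TextEncoder().encode(payload);
};
```

With this helper, starting the translation reduces to `room.localParticipant.publishData(buildSetTaskMessage(settings), { reliable: true });`.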

Summary

As soon as you send the set_task message, Palabra will take your published original audio track, translate it into the target language specified in the settings, and publish the translated track to the same room. The LiveKit SDK will auto-subscribe you to this translated audio stream, making it available for real-time playback through the speakers.
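Putting the steps together, a minimal end-to-end flow might look like the sketch below. It assumes the helper functions from Steps 3-6 (connectTranslationRoom, publishAudioTrack, playTranslationInBrowser, startTranslation) are in scope, and that fetchSessionData is a hypothetical wrapper around the Step 2 session request.

```javascript
import { RoomEvent } from "livekit-client";

// Sketch of the full pipeline, assuming the helpers from Steps 2-6 are in scope.
// Run this from a user gesture (e.g. a "Start" button click), not on page load.
const startPipeline = async (translationSettings) => {
  // Step 2: create a session (fetchSessionData wraps the POST request above)
  const { webrtc_url, publisher } = await fetchSessionData();

  // Step 3: join the Translation Room
  const room = await connectTranslationRoom(webrtc_url, publisher);

  // Step 5: register the handler before publishing, so the translated
  // track is played as soon as it appears in the room
  room.on(RoomEvent.TrackSubscribed, playTranslationInBrowser);

  // Step 4: publish the microphone audio
  await publishAudioTrack(room);

  // Step 6: configure and start the translation
  startTranslation(room, translationSettings);

  return room;
};
```

Registering the TrackSubscribed handler before publishing audio avoids a race where the translated track arrives before the callback is attached.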

Good to know

  • Read more about pausing and stopping the translation in the Translation Management section.
  • Unused sessions remain active for at least 1 minute. To avoid hitting the limit of simultaneously active sessions, it's best practice to delete unused sessions when you stop the translation or when the page is unmounted. Learn more in the Sessions Lifecycle section.
  • Due to browser security restrictions, audio cannot be played until the user has interacted with the page. Therefore, do not start the entire pipeline automatically when the page loads. Instead, wait for the user to perform an action (like pressing a 'Start' button) before activating audio playback and related processes.
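For the session-cleanup advice above, a sketch might look like the following. The DELETE route shown here is an assumption for illustration only — check the Sessions Lifecycle section for the actual endpoint — and sessionId is the id returned when the session was created in Step 2.

```javascript
// Deletes a session when translation stops or the page is closed.
// NOTE: the DELETE route below is an assumption for illustration;
// see the Sessions Lifecycle section for the actual endpoint.
const deleteSession = (sessionId) =>
  fetch(`https://api.palabra.ai/session-storage/session/${sessionId}`, {
    method: "DELETE",
    // keepalive lets the request complete even while the page is unloading
    keepalive: true,
    headers: {
      ClientId: "<API_CLIENT_ID>",
      ClientSecret: "<API_CLIENT_SECRET>",
    },
  });

// Clean up when the page is closed; `sessionId` is the `id` from Step 2
window.addEventListener("pagehide", () => deleteSession(sessionId));
```

The pagehide event with a keepalive request is more reliable than firing an ordinary request from beforeunload, which the browser may cancel mid-flight.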

Need help?

If you have any questions or need assistance, don't hesitate to contact us at [email protected].