Quick Start
Welcome to the Quick Start guide for real-time speech-to-speech translation using the Palabra API. In this guide, you'll learn how to create a streaming session, configure translation tasks, and publish and receive audio streams using WebSockets and WebRTC.
Real-Time Speech-to-Speech Translation Stream Guide
This guide begins by walking you through the process of connecting to the Palabra WebSocket server, where you'll manage streaming configurations. After that, we'll explain how to use the Palabra WebRTC server to send your speech as an audio stream and receive the translated audio in real time.
Step 1: Obtain your Credentials
First, sign up for a Palabra account and follow these instructions to generate your API token. This token contains your `Client ID` and `Client Secret`, which you'll need for authenticating API requests.
Step 2: Create a Streaming Session
Before connecting to the WebSocket and WebRTC servers, you need to create a streaming session.
Send a request to the `POST /session-storage/sessions` endpoint using your `Client ID` and `Client Secret` as headers. This will return the session details you'll need:

- `room_name`: The name of the WebRTC server room to join.
- `stream_url`: The Streaming API URL for sending your audio stream.
- `control_url`: The WebSocket API URL for managing translation settings.
- `publisher`: An array of JWT tokens for the publisher user, including the access token used for managing translation tasks via the WebSocket API (step 3) and connecting to the WebRTC server (step 4).
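As a rough illustration, the request could be made with `fetch`. Note that the base URL, the header names, and the position of the access token inside `publisher` are assumptions here; check the Streaming Session documentation for the exact values.

```javascript
// Minimal sketch of creating a streaming session.
// ASSUMPTIONS: base URL and header names are placeholders, not confirmed values.
const BASE_URL = "https://api.palabra.ai"; // assumed base URL

async function createSession(clientId, clientSecret) {
  const response = await fetch(`${BASE_URL}/session-storage/sessions`, {
    method: "POST",
    headers: {
      "ClientId": clientId,         // assumed header name
      "ClientSecret": clientSecret, // assumed header name
    },
  });
  if (!response.ok) {
    throw new Error(`Session creation failed: ${response.status}`);
  }
  return response.json();
}

// Pull out the fields the next steps need from the session response.
function extractSessionInfo(session) {
  return {
    roomName: session.room_name,
    streamUrl: session.stream_url,
    controlUrl: session.control_url,
    // publisher is an array of JWT tokens; using the first entry as the
    // access token is an assumption for this sketch.
    accessToken: session.publisher[0],
  };
}
```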
Learn more:
To learn more about Palabra's Streaming API, check out the following documentation:
- Speech-to-Speech Real-Time Translation Streaming: Learn about the core components of the Palabra API, including WebRTC for audio streaming and the WebSocket API for managing translation settings.
- Streaming Session: Learn more about the Streaming Session endpoint, which handles the creation of rooms for audio publishing and translation management.
Step 3: Connect to the WebSocket Server
The WebSocket connection allows you to manage translation settings in real-time.
3.1. Initialize the WebSocket Connection
Connect to the `control_url` using your `access_token` value as a `token` GET parameter:

```javascript
// Palabra WebSocket endpoint
const endpoint = "{control_url}?token={access_token}";

// Initialize the WebSocket connection
const ws = new WebSocket(endpoint);
```
Optionally, you can also set the `onmessage` WebSocket listener to receive text transcriptions of recognized and translated speech:

```javascript
ws.onmessage = function (event) {
  const data = JSON.parse(event.data);

  // Handle transcriptions
  if (data.transcription) {
    console.log("Transcription: ", data.transcription);
  }

  // Handle translations
  if (data.translation) {
    console.log("Translation: ", data.translation);
  }
};
```
With this listener, you can display the transcribed or translated text in your application, providing real-time feedback on the audio translation.
3.2. Configure the Translation
To manage translation settings, send a `set_task` message through the WebSocket. This message allows you to specify fields like the `room_name`, `source_language`, and `target_language` using the configuration options.

```javascript
// Example: Setting translation from English to Spanish via WebSocket
const setTaskMessage = {
  action: "set_task",
  room_name: "{room_name}",
  source_language: "en",
  target_language: "es"
};

ws.send(JSON.stringify(setTaskMessage));
```
This request configures the translation task with your desired language settings.
Learn more:
To learn more about managing translation tasks through the WebSocket, check out the Translation Management WebSocket API.
Step 4: Connect to the WebRTC Server
The WebRTC connection allows you to stream your speech as a local audio stream in real time and receive the translated audio stream back.
To easily manage WebRTC connections, use the LiveKit SDK.
4.1. Set Up the Room
Create a new LiveKit Room and add a listener for the `trackSubscribed` event to capture the audio track from the server. Once available, you can connect this track to your device's audio output to hear the translated speech.
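A sketch of this setup using the livekit-client SDK (`import { Room, RoomEvent } from "livekit-client"`) might look like the following; the handler takes the target container as a parameter so it stays easy to test:

```javascript
// Attach a subscribed audio track to the page so the translated speech plays.
// Non-audio tracks are ignored.
function handleTrackSubscribed(track, container) {
  if (track.kind !== "audio") return null;
  const audioElement = track.attach(); // attach() returns an <audio> element
  container.appendChild(audioElement);
  return audioElement;
}

// Wiring it up in the browser:
// const livekitRoom = new Room();
// livekitRoom.on(RoomEvent.TrackSubscribed, (track) =>
//   handleTrackSubscribed(track, document.body));
```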
4.2. Join the Room
Connect to the LiveKit room using the `stream_url` and `access_token`:

```javascript
livekitRoom.connect(stream_url, access_token, { autoSubscribe: true });
```

Here, `autoSubscribe: true` automatically subscribes to all tracks published to the room, ensuring you receive the translated audio.
4.3. Publish Your Audio
First, obtain your local audio track using a function like `getUserMedia()`, then publish it to the LiveKit room:

```javascript
navigator.mediaDevices.getUserMedia({ audio: true })
  .then(stream => {
    const audioTrack = stream.getAudioTracks()[0];
    // Publish the captured track to the LiveKit room
    livekitRoom.localParticipant.publishTrack(audioTrack);
  })
  .catch(err => console.error("Error accessing local audio: ", err));
```
Learn more:
For more information about audio streaming with Palabra, check out How to publish an audio stream and How to receive an audio stream.
Summary
Once your streaming session is set up correctly:
- Your local audio track will be published to the server's WebRTC room for translation.
- The translated audio stream will be available to you as an incoming track via WebRTC.
- You can play this track through your device's sound output to hear the translation.
- You can send a new `set_task` WebSocket message to update translation settings without needing to reconnect to the streaming session.
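For instance, that last point could be sketched as a small helper that builds a fresh `set_task` message, reusing the field names from step 3.2 (the exact schema may differ):

```javascript
// Build a set_task message to switch translation settings mid-session.
// Field names follow the set_task example in step 3.2.
function buildSetTask(roomName, sourceLang, targetLang) {
  return JSON.stringify({
    action: "set_task",
    room_name: roomName,
    source_language: sourceLang,
    target_language: targetLang,
  });
}

// Switch the target language on the already-open socket, no reconnect needed:
// ws.send(buildSetTask("{room_name}", "en", "fr"));
```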
By following these steps, you’ll be able to publish your local audio stream, receive real-time translations, and manage translation tasks dynamically, all with the Palabra API.