Quick Start

Welcome to the Quick Start guide for real-time speech-to-speech translation using the Palabra API. In this guide, you'll learn how to create a streaming session, configure translation tasks, and publish and receive audio streams using WebSockets and WebRTC.

Real-Time Speech-to-Speech Translation Stream Guide

This guide first walks you through connecting to the Palabra WebSocket server, where you'll manage streaming configurations. After that, we'll explain how to use the Palabra WebRTC server to send your speech as an audio stream and receive the translated audio in real time.

Step 1: Obtain Your Credentials

First, sign up for a Palabra Account and follow these instructions to generate your API token. This token contains your Client ID and Client Secret, which you'll need for authenticating API requests.

Step 2: Create a Streaming Session

Before connecting to the WebSocket and WebRTC servers, you need to create a streaming session.

Send a request to the POST /session-storage/sessions endpoint using your Client ID and Client Secret as headers. This will return the session details you'll need:

  • room_name: The name of the WebRTC server room to join.
  • stream_url: The Streaming API URL for sending your audio stream.
  • control_url: The WebSocket API URL for managing translation settings.
  • publisher: An array of JWT tokens for the publisher user, including the access token used for managing translation tasks via the WebSocket API (step 3) and connecting to the WebRTC server (step 4).
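
As a rough sketch, the session request might look like this in JavaScript. Note that the base URL and header names below are assumptions for illustration, not confirmed API details; check the API reference for the exact values:

// Create a streaming session.
// NOTE: the base URL and header names are assumptions, not confirmed API details.
const response = await fetch("https://api.palabra.ai/session-storage/sessions", {
  method: "POST",
  headers: {
    "ClientId": "{client_id}",        // assumed header name for your Client ID
    "ClientSecret": "{client_secret}" // assumed header name for your Client Secret
  }
});

const session = await response.json();
const { room_name, stream_url, control_url, publisher } = session;
const access_token = publisher[0]; // publisher is an array of JWT tokens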

Learn more:

To learn more about Palabra's Streaming API, check out the following documentation:

  • Speech-to-Speech Real-Time Translation Streaming: Learn about the core components of the Palabra API, including WebRTC for audio streaming and the WebSocket API for managing translation settings.
  • Streaming Session: Learn more about the Streaming Session endpoint, which handles the creation of rooms for audio publishing and translation management.

Step 3: Connect to the WebSocket Server

The WebSocket connection allows you to manage translation settings in real time.

3.1. Initialize the WebSocket Connection

Connect to the control_url, passing your access_token value as the token query parameter:

// Palabra WebSocket endpoint: substitute the control_url and
// access_token values returned in step 2
const endpoint = "{control_url}?token={access_token}";

// Initialize the WebSocket connection
const ws = new WebSocket(endpoint);

Optionally, you can set an onmessage listener on the WebSocket to receive text transcriptions of recognized and translated speech:

ws.onmessage = function (event) {
  const data = JSON.parse(event.data);

  // Handle transcriptions
  if (data.transcription) {
    console.log("Transcription: ", data.transcription);
  }

  // Handle translations
  if (data.translation) {
    console.log("Translation: ", data.translation);
  }
};

With this listener, you can display the transcribed or translated text in your application, providing real-time feedback on the audio translation.

3.2. Configure the Translation

To manage translation settings, send a set_task message through the WebSocket. This message allows you to specify fields like room_name, source_language, and target_language using the configuration options.

// Example: Setting translation from English to Spanish via WebSocket
const setTaskMessage = {
action: "set_task",
room_name: "{room_name}",
source_language: "en",
target_language: "es"
};

ws.send(JSON.stringify(setTaskMessage));

This request configures the translation task with your desired language settings.

Learn more:

To learn more about managing translation tasks through the WebSocket, check out the Translation Management WebSocket API.

Step 4: Connect to the WebRTC Server

The WebRTC connection allows you to stream your speech as a local audio track in real time and receive the translated audio stream back.

To easily manage WebRTC connections, use the LiveKit SDK.

4.1. Set Up the Room

Create a new LiveKit Room and add a listener for the trackSubscribed event to capture the audio track from the server. Once available, you can connect this track to your device's audio output to hear the translated speech.
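
A minimal sketch of this setup using the livekit-client JavaScript SDK (attaching the track to an <audio> element is one common way to route it to your speakers):

import { Room, RoomEvent, Track } from "livekit-client";

const livekitRoom = new Room();

// Play each subscribed audio track (the translated speech) through the
// device's sound output by attaching it to an <audio> element.
livekitRoom.on(RoomEvent.TrackSubscribed, (track) => {
  if (track.kind === Track.Kind.Audio) {
    const audioElement = track.attach();
    document.body.appendChild(audioElement);
  }
});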

4.2. Join the Room

Connect to the LiveKit room using the stream_url and access_token:

await livekitRoom.connect(stream_url, access_token, { autoSubscribe: true });

Here, autoSubscribe: true automatically subscribes to all tracks published to the room, ensuring you receive the translated audio.

4.3. Publish Your Audio

Before publishing, obtain your local audio track using the browser's getUserMedia() API:

// Capture the local microphone as an audio track
let audioTrack;
try {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  audioTrack = stream.getAudioTracks()[0];
} catch (err) {
  console.error("Error accessing local audio: ", err);
}

After capturing your local audio, you can publish it to the LiveKit room:

livekitRoom.localParticipant.publishTrack(audioTrack);

Learn more:

For more information about audio streaming with Palabra, check out How to publish an audio stream and How to receive an audio stream.

Summary

Once your streaming session is set up correctly:

  • Your local audio track will be published to the server's WebRTC room for translation.
  • The translated audio stream will be available to you as an incoming track via WebRTC.
  • You can play this track through your device's sound output to hear the translation.
  • You can send a new set_task WebSocket message to update translation settings without needing to reconnect to the streaming session (see the sketch below).
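
For example, reusing the WebSocket connection and the set_task message shape from step 3.2, switching the target language mid-session could look like this:

// Switch the translation target from Spanish to French on the fly
ws.send(JSON.stringify({
  action: "set_task",
  room_name: "{room_name}",
  source_language: "en",
  target_language: "fr"
}));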

By following these steps, you’ll be able to publish your local audio stream, receive real-time translations, and manage translation tasks dynamically, all with the Palabra API.