
Palabra.ai Integration with Agora

Introduction

Palabra's API solution enables real-time speech translation through a WebRTC-based architecture built on the Agora SDK.

The process involves creating a secure session with multiple access tokens, establishing a connection to an Agora Translation Channel, publishing your original audio stream into the Channel, and configuring the translation pipeline with your desired language settings.

Once connected, your speech is automatically transcribed, translated, and synthesized into the target language in real time. Palabra then publishes the translated audio track to the same Channel, allowing you to subscribe to it and play it back in your application instantly.

Prerequisites

You need to create multiple Agora Tokens to process the audio stream translation:

  • 1st token – for connecting to the Agora channel and publishing your original audio stream
  • 2nd token – for the Palabra Translation Task to retrieve your original audio stream from the Agora channel
  • 3rd token – for the Palabra Translation Task to publish the translated audio stream to the Agora channel
  • Optional (4th+ tokens) – to translate your original speech into multiple languages in parallel, you need one additional token per extra target language, in addition to the 3rd token used for the first target language.

You must have the Agora App ID and at least 3 Token Data sets, created on the Agora side:

interface AgoraTokenData {
  token: string
  channel: string
  uid: number
}

// Created on the Agora side (see the guide referenced below)
declare const channelTokenData: AgoraTokenData
declare const receiverTokenData: AgoraTokenData
declare const translatorOneTokenData: AgoraTokenData
// declare const translatorTwoTokenData: AgoraTokenData
// declare const translatorThreeTokenData: AgoraTokenData

const agoraAppID = "<YOUR_AGORA_APP_ID>"

Use the "Authenticate Your Users with Tokens" Agora guide to create Token Data sets.

Step 1. Connect to the Agora Channel

Use the Agora SDK to join the Translation Channel with your agoraAppID and channelTokenData (see the "Prerequisites" section).

npm install agora-rtc-sdk-ng

import AgoraRTC from 'agora-rtc-sdk-ng';

// Create the Agora client instance
const client = AgoraRTC.createClient({ mode: 'rtc', codec: 'vp8' });

// Set up the `user-published` event listener to play new tracks from the channel
client.on('user-published', async (user, mediaType) => {
  await client.subscribe(user, mediaType);
  if (mediaType === 'audio') {
    user.audioTrack?.play();
  }
});

// Set up the `user-unpublished` event listener to remove HTML players that are no longer used
client.on('user-unpublished', (user) => {
  const remotePlayerContainer = document.getElementById(String(user.uid));
  remotePlayerContainer?.remove();
});

// Join the Agora channel
await client.join(
  agoraAppID,
  channelTokenData.channel,
  channelTokenData.token,
  Number(channelTokenData.uid)
);

// Publish your local mic audio track to the Agora channel
const localAudioTrack = await AgoraRTC.createMicrophoneAudioTrack();
await client.publish([localAudioTrack]);

As a result, you will be connected to the Agora channel, your microphone audio stream will be published to it, and you will be ready to automatically play the translation audio stream as soon as Palabra publishes it (see next step).

Step 2. Start the Translation

To translate your published original speech from English to Spanish in real time, use your channelTokenData, receiverTokenData and translatorOneTokenData to call the POST https://streaming-demo.palabra.site/agora/translations endpoint and create a Translation Task from en to es.

You also have to generate a random UUID and provide it as the Authorization header for your request. Use the same Authorization header later if you need to get or update your translation tasks.

Request Example

import axios from 'axios';
import { v4 as uuidv4 } from 'uuid';

const authUUID = uuidv4(); // Any random UUID: https://www.npmjs.com/package/uuid

const { data } = await axios.post(
  "https://streaming-demo.palabra.site/agora/translations",
  {
    "channel": channelTokenData.channel,
    "remote_uid": channelTokenData.uid,
    "local_uid": receiverTokenData.uid,
    "token": receiverTokenData.token,
    "speech_recognition": {
      "source_language": "en", // Translate FROM
      "options": {}
    },
    "translations": [
      {
        "token": translatorOneTokenData.token,
        "local_uid": translatorOneTokenData.uid,
        "target_language": "es", // Translate TO
        "options": {}
      }
    ]
  },
  {
    headers: {
      Authorization: authUUID
    }
  }
);

Multiple-language translation

To translate the original audio into multiple languages, you need to add additional objects to the translations array, providing an extra Token and UID for each additional language.

// ...
"translations": [
  {
    "token": translatorOneTokenData.token,
    "local_uid": translatorOneTokenData.uid,
    "target_language": "es", // Spanish
    "options": {}
  },
  {
    "token": translatorTwoTokenData.token, // New token
    "local_uid": translatorTwoTokenData.uid, // New unique UID
    "target_language": "fr", // French
    "options": {}
  },
]
// ...
// ...

Note: The user-published event (from Step 1) provides access to the user entity, which includes the related UID (sent in Step 2). You can use this UID to distinguish which track corresponds to which language.
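For example, a minimal sketch of such a mapping, assuming the translator Token Data sets from the Prerequisites and the client from Step 1, could look like this:

// Map each translator UID (the local_uid values sent in Step 2) to its target language
const languageByUid: Record<number, string> = {
  [translatorOneTokenData.uid]: "es",
  [translatorTwoTokenData.uid]: "fr",
};

client.on('user-published', async (user, mediaType) => {
  await client.subscribe(user, mediaType);
  if (mediaType === 'audio') {
    const language = languageByUid[Number(user.uid)]; // e.g. "es" or "fr"
    console.log(`Translated track received (${language ?? 'unknown language'})`);
    user.audioTrack?.play();
  }
});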

Speech Recognition Configuration

By default, the recommended settings are applied to the speech recognition pipeline if you do not provide any additional configuration.

However, you can customize the speech recognition by passing extra settings through the options object:

// ...
"speech_recognition": {
  "source_language": "en",
  "options": { // Manual Configuration
    "segment_confirmation_silence_threshold": 0.7,
    "sentence_splitter": {
      "enabled": true
    }
  }
},
// ...

Check out the list of settings numbered as 2.3.2.X in the Translation Settings Breakdown to see which options can be applied to speech_recognition.

Translation Configuration

By default, the recommended settings are applied to the translation pipeline if you do not provide any additional configuration.

However, you can customize the translation by passing extra settings through the options object:

// ...
"translations": [
  {
    "token": translatorOneTokenData.token,
    "local_uid": translatorOneTokenData.uid,
    "target_language": "es",
    "options": { // Manual Configuration
      "speech_generation": {
        "voice_cloning": true,
        "voice_timbre_detection": {
          "enabled": true,
          "high_timbre_voices": ["default_high"],
          "low_timbre_voices": ["default_low"]
        }
      }
    }
  },
]
// ...

Check out the list of settings numbered as 2.3.3.X in the Translation Settings Breakdown to see which options can be applied to translations.

Optional Step 3. Handle Transcriptions (Captions)

If you wish, you can also receive transcriptions (captions) for both the original and translated speech in real time. All you need to do is connect to the Palabra Centrifuge server via WebSockets.

npm install centrifuge

import { Centrifuge } from 'centrifuge';

const taskID = "<TRANSLATION_TASK_ID>"; // Get it from the response in Step 2
const serverURL = "https://streaming-demo.palabra.site/agora/translations/subtitles";
const connectionURL = `${serverURL}/${taskID}`;

// Fetch the Translation Task's metadata by its taskID
const { data } = await axios.get(connectionURL, {
  headers: {
    Authorization: authUUID // The same value as used during task creation in Step 2
  }
});

// Create the Centrifuge client instance
const centrifuge = new Centrifuge(data.websocket_url, { token: data.connection_token });

// Configure your subscription for the Centrifuge data
const subscription = centrifuge.newSubscription(taskID, { token: data.subscription_token });

// Set up the `publication` event listener to handle the captions
subscription.on('publication', (ctx) => {
  const { data } = ctx;
  console.log('captions record', data); // Here is your captions message
});

// Init your subscription and connect to the server
subscription.subscribe();
centrifuge.connect();

Summary

As soon as you create the Translation Task in Step 2, Palabra will take your published original audio track from the Agora channel, translate it into the target language specified in the settings, and publish the translated track to the same channel.

Since you set up the user-published event listener in Step 1, the new translated track will be automatically played in your browser.

If you have configured transcription handling in Step 3, you will receive real-time messages containing partial and validated captions of your original speech, as well as translated captions of the translated speech.

Good to know

  • Due to browser security restrictions, audio cannot be played until the user has interacted with the page. Therefore, do not start the entire pipeline automatically when the page loads. Instead, wait for the user to perform an action (like pressing a 'Start' button) before activating audio playback and related processes (see the sketch after this list).
  • Each Translation Task created in Step 2 has a 30-second Idle TTL. If the original audio track is not published to the Agora channel within 30 seconds of task creation, the task will be automatically deleted.
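A minimal sketch of such gating, assuming hypothetical joinChannel and startTranslation helpers that wrap the code from Steps 1 and 2:

// Hypothetical helpers wrapping Step 1 (join + publish) and Step 2 (create the Translation Task)
declare function joinChannel(): Promise<void>;
declare function startTranslation(): Promise<void>;

const startButton = document.getElementById('start-button');

startButton?.addEventListener('click', async () => {
  await joinChannel();      // Audio playback is allowed because this runs after a user gesture
  await startTranslation(); // Create the task right away: it is deleted if idle for 30 seconds
});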

Need help?

If you have any questions or need assistance, please don't hesitate to contact us at [email protected].