
Palabra.ai Integration with Agora

Introduction

Palabra's API solution enables real-time speech translation through a WebRTC-based architecture built on the Agora SDK.

The process involves creating a secure session with multiple access tokens, establishing a connection to an Agora Translation Channel, publishing your original audio stream into the Channel, and configuring the translation pipeline with your desired language settings.

Once connected, your speech is automatically transcribed, translated, and synthesized into the target language in real time. Palabra then publishes the translated audio track to the same Channel, allowing you to subscribe to it and play it back in your application instantly.

Prerequisites

You need to create multiple Agora Tokens to process the audio stream translation:

  • 1st token – for connecting to the Agora channel and publishing your original audio stream
  • 2nd token – for the Palabra Translation Task to retrieve your original audio stream from the Agora channel
  • 3rd token – for the Palabra Translation Task to publish the translated audio stream to the Agora channel
  • Optional (4th+ tokens) – to translate your original speech into multiple languages in parallel, you need one additional token per extra target language, in addition to the 3rd token used for the first target language.

You must have the Agora App ID and at least 3 Token Data sets, created on the Agora side:

interface AgoraTokenData {
  token: string
  channel: string
  uid: number
}

// Created on the Agora side (see the guide referenced below)
declare const channelTokenData: AgoraTokenData
declare const receiverTokenData: AgoraTokenData
declare const translatorOneTokenData: AgoraTokenData
// declare const translatorTwoTokenData: AgoraTokenData
// declare const translatorThreeTokenData: AgoraTokenData

const agoraAppID = "<YOUR_AGORA_APP_ID>"

Use the "Authenticate Your Users with Tokens" Agora guide to create Token Data sets.

Step 1. Connect to the Agora Channel

Use the Agora SDK to join the Translation Channel with your agoraAppID and channelTokenData (see the "Prerequisites" section).

npm install agora-rtc-sdk-ng

import AgoraRTC from 'agora-rtc-sdk-ng';

// Create the Agora client instance
const client = AgoraRTC.createClient({ mode: 'rtc', codec: 'vp8' });

// Set up the `user-published` event listener to play new tracks from the channel
client.on('user-published', async (user, mediaType) => {
  await client.subscribe(user, mediaType);
  if (mediaType === 'audio') {
    user.audioTrack?.play();
  }
});

// Set up the `user-unpublished` event listener to remove HTML players that are no longer used
client.on('user-unpublished', (user) => {
  const remotePlayerContainer = document.getElementById(String(user.uid));
  remotePlayerContainer?.remove();
});

// Join the Agora channel
await client.join(
  agoraAppID,
  channelTokenData.channel,
  channelTokenData.token,
  Number(channelTokenData.uid)
);

// Publish your local mic audio track to the Agora channel
const localAudioTrack = await AgoraRTC.createMicrophoneAudioTrack();
await client.publish([localAudioTrack]);

As a result, you will be connected to the Agora channel, your microphone audio stream will be published to it, and you will be ready to automatically play the translation audio stream as soon as Palabra publishes it (see next step).

Step 2. Start the Translation

To translate your published original speech from English to Spanish in real time, use your channelTokenData, receiverTokenData and translatorOneTokenData to call the POST https://streaming-demo.palabra.site/agora/translations endpoint and create a Translation Task from en to es.

You also have to generate a random UUID and provide it as the Authorization header for your request. Use the same Authorization header later if you need to get or update your translation tasks.

Request Example

import axios from 'axios';
import { v4 as uuidv4 } from 'uuid';

const authUUID = uuidv4(); // Any random UUID: https://www.npmjs.com/package/uuid

const { data } = await axios.post(
  "https://streaming-demo.palabra.site/agora/translations",
  {
    "channel": channelTokenData.channel,
    "remote_uid": channelTokenData.uid,
    "local_uid": receiverTokenData.uid,
    "token": receiverTokenData.token,
    "speech_recognition": {
      "source_language": "en", // Translate FROM
      "options": {}
    },
    "translations": [
      {
        "token": translatorOneTokenData.token,
        "local_uid": translatorOneTokenData.uid,
        "target_language": "es", // Translate TO
        "options": {}
      }
    ]
  },
  {
    headers: {
      Authorization: authUUID
    }
  }
);

Multiple-language translation

To translate the original audio into multiple languages, you need to add additional objects to the translations array, providing an extra Token and UID for each additional language.

// ...
"translations": [
  {
    "token": translatorOneTokenData.token,
    "local_uid": translatorOneTokenData.uid,
    "target_language": "es", // Spanish
    "options": {}
  },
  {
    "token": translatorTwoTokenData.token, // New token
    "local_uid": translatorTwoTokenData.uid, // New unique UID
    "target_language": "fr", // French
    "options": {}
  },
]
// ...
// ...

Note: The user-published event (from Step 1) provides access to the user entity, which includes the related UID (sent in Step 2). You can use this UID to distinguish which track corresponds to which language.
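For example, a minimal sketch of such a mapping, assuming the translator Token Data sets from the Prerequisites and the client from Step 1, could look like this:

// Map each translator UID (the local_uid values sent in Step 2) to its target language
const languageByUid: Record<number, string> = {
  [translatorOneTokenData.uid]: "es",
  [translatorTwoTokenData.uid]: "fr",
};

client.on('user-published', async (user, mediaType) => {
  await client.subscribe(user, mediaType);
  if (mediaType === 'audio') {
    const language = languageByUid[Number(user.uid)]; // e.g. "es" or "fr"
    console.log(`Translated track received (${language ?? 'unknown language'})`);
    user.audioTrack?.play();
  }
});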

Speech Recognition Configuration

By default, the recommended settings are applied to the speech recognition pipeline if you do not provide any additional configuration.

However, you can customize the speech recognition by passing extra settings through the options object:

// ...
"speech_recognition": {
  "source_language": "en",
  "options": { // Manual Configuration
    "segment_confirmation_silence_threshold": 0.7,
    "sentence_splitter": {
      "enabled": true
    }
  }
},
// ...

Check out the list of settings numbered as 2.3.2.X in the Translation Settings Breakdown to see which options can be applied to speech_recognition.

Translation Configuration

By default, the recommended settings are applied to the translation pipeline if you do not provide any additional configuration.

However, you can customize the translation by passing extra settings through the options object:

// ...
"translations": [
  {
    "token": translatorOneTokenData.token,
    "local_uid": translatorOneTokenData.uid,
    "target_language": "es",
    "options": { // Manual Configuration
      "speech_generation": {
        "voice_cloning": true,
        "voice_timbre_detection": {
          "enabled": true,
          "high_timbre_voices": ["default_high"],
          "low_timbre_voices": ["default_low"]
        }
      }
    }
  },
]
// ...

Check out the list of settings numbered as 2.3.3.X in the Translation Settings Breakdown to see which options can be applied to translations.

Optional Step 3. Handle Transcriptions (Captions)

If you wish, you can also receive transcriptions (captions) for both the original and translated speech in real time. All you need to do is connect to the Palabra Centrifuge server via WebSockets.

npm install centrifuge

import { Centrifuge } from 'centrifuge';

const taskID = "<TRANSLATION_TASK_ID>"; // Get it from the response in Step 2
const serverURL = "https://streaming-demo.palabra.site/agora/translations/subtitles";
const connectionURL = `${serverURL}/${taskID}`;

// Fetch the Translation Task's metadata by its taskID
const { data } = await axios.get(connectionURL, {
  headers: {
    Authorization: authUUID // The same value as used during task creation in Step 2
  }
});

// Create the Centrifuge client instance
const centrifuge = new Centrifuge(data.websocket_url, { token: data.connection_token });

// Configure your subscription for the Centrifuge data
const subscription = centrifuge.newSubscription(taskID, { token: data.subscription_token });

// Set up the `publication` event listener to handle the captions
subscription.on('publication', (ctx) => {
  const { data } = ctx;
  console.log('captions record', data); // Here is your captions message
});

// Init your subscription and connect to the server
subscription.subscribe();
centrifuge.connect();

Summary

As soon as you create the Translation Task in Step 2, Palabra will take your published original audio track from the Agora channel, translate it into the target language specified in the settings, and publish the translated track to the same channel.

Since you set up the user-published event listener in Step 1, the new translated track will be automatically played in your browser.

If you have configured transcription handling in Step 3, you will receive real-time messages containing partial and validated captions of your original speech, as well as translated captions of the translated speech.

Good to know

  • Due to browser security restrictions, audio cannot be played until the user has interacted with the page. Therefore, do not start the entire pipeline automatically when the page loads. Instead, wait for the user to perform an action (like pressing a 'Start' button) before activating audio playback and related processes (see the sketch after this list).
  • Each Translation Task created in Step 2 has a 30-second Idle TTL. If the original audio track is not published to the Agora channel within 30 seconds of task creation, the task will be automatically deleted.
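A minimal sketch of such gating, assuming hypothetical joinChannel and startTranslation helpers that wrap the code from Steps 1 and 2:

// Hypothetical helpers wrapping Step 1 (join + publish) and Step 2 (create the Translation Task)
declare function joinChannel(): Promise<void>;
declare function startTranslation(): Promise<void>;

const startButton = document.getElementById('start-button');

startButton?.addEventListener('click', async () => {
  await joinChannel();      // Audio playback is allowed because this runs after a user gesture
  await startTranslation(); // Create the task right away: it is deleted if idle for 30 seconds
});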

Need help?

If you have any questions or need assistance, please don't hesitate to contact us at [email protected].