Quick Start (WebSockets)

Best for server-side apps

The following steps explain how to use the WebSocket-based architecture, which is recommended for server-side applications and backend integrations. If you are looking for client-side solutions, please refer to our WebRTC Quick Start Guide.

Palabra's API solution enables real-time speech translation through a WebSocket-based architecture.

The process involves creating a secure session, establishing a WebSocket connection to Palabra's streaming API, sending your audio data through the WebSocket connection, and configuring the translation pipeline with your desired language settings.

Once connected, your speech is automatically transcribed, translated, and synthesized into the target language in real time. Palabra then streams the translated audio back through the same WebSocket connection, allowing you to receive and process it in your application instantly.

Step 1. Get API Credentials

Visit the Palabra API keys section to obtain your Client ID and Client Secret.

Step 2. Create a Session

Use your credentials to call the POST /session-storage/session endpoint. You'll receive the ws_url and publisher token required for establishing the WebSocket connection.

Request Example

# Full working example: https://github.com/PalabraAI/palabra-ai-python/tree/main/examples/nanopalabra

import httpx

async def create_session(client_id: str, client_secret: str) -> dict:
    url = "https://api.palabra.ai/session-storage/session"
    headers = {"ClientId": client_id, "ClientSecret": client_secret}
    payload = {"data": {"subscriber_count": 0, "publisher_can_subscribe": True}}

    async with httpx.AsyncClient() as client:
        response = await client.post(url, json=payload, headers=headers)
        response.raise_for_status()
        return response.json()

Response Example

{
  "data": {
    "publisher": "eyJhbGciOiJIU...Gxr2gjWSA4",
    "subscriber": [],
    "webrtc_room_name": "50ff0fa2",
    "webrtc_url": "https://streaming-0.palabra.ai/livekit/",
    "ws_url": "wss://streaming-0.palabra.ai/streaming-api/v1/speech-to-speech/stream",
    "id": "7f99b553-4697...7d450728"
  }
}

ws_url - WebSocket endpoint to connect to for streaming audio data.

publisher - token used to authenticate the WebSocket connection (see Step 3).

Step 3. Connect to the WebSocket API

Establish a WebSocket connection to the ws_url you received in Step 2. You'll need to pass your publisher token as a query parameter.

Example

# Full working example: https://github.com/PalabraAI/palabra-ai-python/tree/main/examples/nanopalabra

import websockets

async def connect_websocket(ws_url: str, publisher_token: str):
    # Connect to WebSocket with the publisher token passed as a query parameter
    full_url = f"{ws_url}?token={publisher_token}"
    websocket = await websockets.connect(full_url, ping_interval=10, ping_timeout=30)
    print("🔌 Connected to WebSocket")
    return websocket
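
Long-running backend connections can occasionally drop. If you want basic retry behavior around the handshake, a minimal sketch might look like this (connect_with_retry is a hypothetical helper, not part of any Palabra SDK):

import asyncio
import websockets

async def connect_with_retry(ws_url: str, publisher_token: str, attempts: int = 3):
    # Retry the handshake a few times with a short backoff before giving up
    full_url = f"{ws_url}?token={publisher_token}"
    for attempt in range(1, attempts + 1):
        try:
            return await websockets.connect(full_url, ping_interval=10, ping_timeout=30)
        except (OSError, websockets.exceptions.InvalidHandshake) as exc:
            print(f"⚠️ Connect attempt {attempt} failed: {exc}")
            await asyncio.sleep(2 ** attempt)
    raise ConnectionError("Could not connect to the Palabra streaming API")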

Step 4. Send Audio Data

Capture audio from your microphone and send it through the WebSocket connection. Audio must be sent as base64-encoded data in JSON messages.
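
Each message wraps one chunk of raw PCM audio and has the following shape (the payload string below is a placeholder, not real audio):

{
  "message_type": "input_audio_data",
  "data": {
    "data": "<base64-encoded PCM bytes>"
  }
}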

Example

# Full working example: https://github.com/PalabraAI/palabra-ai-python/tree/main/examples/nanopalabra

import asyncio
import base64
import json
import queue
import threading
import time

import numpy as np
import sounddevice as sd

async def stream_microphone(websocket):
    sample_rate = 24000    # Palabra expects 24kHz audio
    chunk_duration = 0.32  # 320ms chunks recommended
    chunk_samples = int(sample_rate * chunk_duration)

    audio_queue = queue.Queue(maxsize=100)
    stop_event = threading.Event()

    def input_callback(indata, frames, time_info, status):
        try:
            audio_queue.put_nowait(np.frombuffer(indata, dtype=np.int16).copy())
        except queue.Full:
            pass  # Drop the chunk if the queue is backed up

    def recording_thread():
        with sd.RawInputStream(
            samplerate=sample_rate,
            channels=1,
            dtype='int16',
            callback=input_callback,
            blocksize=int(sample_rate * 0.02)  # 20ms callback
        ):
            print("🎤 Microphone started")
            while not stop_event.is_set():
                time.sleep(0.01)

    threading.Thread(target=recording_thread, daemon=True).start()

    # Send audio chunks
    buffer = np.array([], dtype=np.int16)
    while True:
        try:
            audio_data = audio_queue.get(timeout=0.1)
            buffer = np.concatenate([buffer, audio_data])

            while len(buffer) >= chunk_samples:
                chunk = buffer[:chunk_samples]
                buffer = buffer[chunk_samples:]

                # Send via WebSocket as base64-encoded JSON
                message = {
                    "message_type": "input_audio_data",
                    "data": {
                        "data": base64.b64encode(chunk.tobytes()).decode("utf-8")
                    }
                }
                await websocket.send(json.dumps(message))

                # Important: pace audio to real-time rate
                await asyncio.sleep(chunk_duration)

        except queue.Empty:
            await asyncio.sleep(0.001)
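
If your backend reads audio from a file instead of a microphone, the same chunking and pacing logic applies. Here's a minimal sketch, assuming the input is already a 24kHz, mono, 16-bit PCM WAV file (stream_wav_file and its path argument are illustrative, not part of the Palabra API):

import asyncio
import base64
import json
import wave

async def stream_wav_file(websocket, path: str):
    # Assumes the file is already 24kHz, mono, 16-bit PCM
    sample_rate = 24000
    chunk_duration = 0.32  # 320ms chunks, as recommended above
    chunk_frames = int(sample_rate * chunk_duration)

    with wave.open(path, "rb") as wav:
        while True:
            frames = wav.readframes(chunk_frames)
            if not frames:
                break
            message = {
                "message_type": "input_audio_data",
                "data": {"data": base64.b64encode(frames).decode("utf-8")}
            }
            await websocket.send(json.dumps(message))
            # Pace sending to real time so the pipeline isn't flooded
            await asyncio.sleep(chunk_duration)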

Step 5. Configure Translation Settings

Send a JSON message through the WebSocket to configure your translation settings. See the management documentation for the full settings reference.

Example

# Full working example: https://github.com/PalabraAI/palabra-ai-python/tree/main/examples/nanopalabra

import json

async def configure_translation(websocket, source_lang: str, target_langs: list):
    settings = {
        "message_type": "set_task",
        "data": {
            "input_stream": {
                "content_type": "audio",
                "source": {
                    "type": "ws",
                    "format": "pcm_s16le",
                    "sample_rate": 24000,
                    "channels": 1
                }
            },
            "output_stream": {
                "content_type": "audio",
                "target": {
                    "type": "ws",
                    "format": "pcm_s16le"
                }
            },
            "pipeline": {
                "preprocessing": {},
                "transcription": {
                    "source_language": source_lang
                },
                "translations": [
                    {
                        "target_language": lang,
                        "speech_generation": {}
                    } for lang in target_langs
                ]
            }
        }
    }

    await websocket.send(json.dumps(settings))
    print(f"⚙️ Translation configured: {source_lang} → {target_langs}")

Step 6. Receive and Play Translated Audio

Listen for messages from the WebSocket. Palabra sends different message types including transcriptions, translations, and audio data.
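
The example below handles four of them: current_task (confirmation that your settings were applied), output_audio_data (base64-encoded translated audio), partial_transcription (in-progress transcription of your speech), and final_transcription (the completed transcription).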

Example

# Full working example: https://github.com/PalabraAI/palabra-ai-python/tree/main/examples/nanopalabra

import json
import base64
import queue

import numpy as np
import sounddevice as sd

async def receive_and_play(websocket):
    # Audio playback setup
    sample_rate = 24000
    audio_queue = queue.Queue(maxsize=100)
    buffer = np.array([], dtype=np.int16)

    def audio_callback(outdata, frames, time_info, status):
        nonlocal buffer
        # Fill buffer if needed
        while len(buffer) < frames:
            try:
                buffer = np.concatenate([buffer, audio_queue.get_nowait()])
            except queue.Empty:
                break

        # Provide audio frames
        if len(buffer) >= frames:
            outdata[:] = buffer[:frames].reshape(-1, 1)
            buffer = buffer[frames:]
        else:
            outdata.fill(0)

    output_stream = sd.OutputStream(
        samplerate=sample_rate,
        channels=1,
        dtype='int16',
        callback=audio_callback,
        blocksize=int(sample_rate * 0.02)
    )
    output_stream.start()
    print("🔊 Audio playback started")

    # Receive messages
    async for message in websocket:
        data = json.loads(message)

        # Parse nested JSON if needed
        if isinstance(data.get("data"), str):
            data["data"] = json.loads(data["data"])

        msg_type = data.get("message_type")

        if msg_type == "current_task":
            print("📝 Task confirmed")
        elif msg_type == "output_audio_data":
            # Decode base64 audio
            audio_bytes = base64.b64decode(data["data"]["data"])
            audio_array = np.frombuffer(audio_bytes, dtype=np.int16)
            try:
                audio_queue.put_nowait(audio_array)
            except queue.Full:
                pass  # Drop audio if playback can't keep up
        elif msg_type == "partial_transcription":
            text = data["data"]["transcription"]["text"]
            lang = data["data"]["transcription"]["language"]
            print(f"\r\033[K💬 [{lang}] {text}", end="", flush=True)
        elif msg_type == "final_transcription":
            text = data["data"]["transcription"]["text"]
            lang = data["data"]["transcription"]["language"]
            print(f"\r\033[K✅ [{lang}] {text}")

Complete Example

Here's a minimal working example that ties together the functions from Steps 2-6. For the full implementation, see nanopalabra_ws.

import asyncio
import os
import signal

async def main():
    # Exit immediately on Ctrl+C
    signal.signal(signal.SIGINT, lambda s, f: os._exit(0))
    print("🚀 Palabra WebSocket Client")

    # Step 1: Your API credentials
    client_id = os.getenv("PALABRA_CLIENT_ID")
    client_secret = os.getenv("PALABRA_CLIENT_SECRET")

    # Step 2: Create session
    session = await create_session(client_id, client_secret)
    ws_url = session["data"]["ws_url"]
    publisher_token = session["data"]["publisher"]

    # Step 3: Connect to WebSocket
    websocket = await connect_websocket(ws_url, publisher_token)

    # Step 5: Configure translation
    await configure_translation(websocket, "en", ["es"])

    # Wait for settings to process
    await asyncio.sleep(3)

    # Steps 4 & 6: stream microphone audio and receive translated audio
    receive_task = asyncio.create_task(receive_and_play(websocket))
    stream_task = asyncio.create_task(stream_microphone(websocket))

    print("\n🎧 Listening... Press Ctrl+C to stop\n")

    # Run until interrupted (the SIGINT handler above exits the process)
    await asyncio.gather(receive_task, stream_task)

if __name__ == "__main__":
    asyncio.run(main())
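
Before running, install the dependencies used above (httpx, websockets, sounddevice, numpy) and set the PALABRA_CLIENT_ID and PALABRA_CLIENT_SECRET environment variables to the credentials from Step 1.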

Summary

Once you establish the WebSocket connection and send your translation settings, Palabra will process your audio stream in real time. The service transcribes your speech, translates it to the specified target languages, and streams back both the text translations and synthesized audio through the same WebSocket connection, enabling seamless real-time communication.

Need help?

If you have any questions or need assistance, please don't hesitate to contact us at [email protected].