Speech-to-Speech Real-Time Translation Streaming
Introduction
The Speech-to-Speech Streaming API enables real-time translation of speech from one language to another, or into multiple languages simultaneously. This allows you to deliver translations of live audio content to users with minimal latency, making it possible to stream live events in dozens of languages without the need for human interpreters.
Core concepts
WebRTC (LiveKit)
Web Real-Time Communication (WebRTC) facilitates real-time communication directly in web browsers without additional plugins or apps. For our translation API, WebRTC, extended by LiveKit, manages real-time audio streaming and distribution within a scalable, multi-user conferencing environment.
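As a rough sketch, a client could join the session's LiveKit room and play incoming translated audio as shown below, using the livekit-client SDK. The webRtcUrl and token values are assumed to come from the session initialization step described later, and how translated-language tracks are identified is an assumption for illustration.

```typescript
import { Room, RoomEvent, RemoteTrack, Track } from "livekit-client";

// Sketch: join the session's LiveKit room and play incoming translated audio.
// `webRtcUrl` and `token` are assumed to come from the session initialization
// response; how per-language tracks are labeled is not specified here.
async function joinTranslationRoom(webRtcUrl: string, token: string): Promise<Room> {
  const room = new Room();

  // Play each subscribed audio track (e.g. a translated language stream).
  room.on(RoomEvent.TrackSubscribed, (track: RemoteTrack) => {
    if (track.kind === Track.Kind.Audio) {
      const audioElement = track.attach(); // creates an <audio> element
      document.body.appendChild(audioElement);
    }
  });

  await room.connect(webRtcUrl, token);
  return room;
}
```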
WebSocket API
The WebSocket API enables two-way interactive communication sessions between the client's browser and the server. It allows clients to dynamically control translation settings and the audio processing pipeline during the session, ensuring that users can customize their translation experience as needed.
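For example, a client might adjust settings over the WebSocket connection roughly as sketched below. The message schema (the type and targetLanguages fields) and the token-in-query authentication are hypothetical illustrations, not the documented protocol; the URL and token are assumed to come from session initialization.

```typescript
// Sketch: controlling translation settings over the WebSocket API.
// The message schema (`type`, `targetLanguages`) is hypothetical; consult the
// protocol reference for the actual message format.
function openControlSocket(webSocketUrl: string, token: string): WebSocket {
  const socket = new WebSocket(`${webSocketUrl}?token=${encodeURIComponent(token)}`);

  socket.addEventListener("open", () => {
    // Example: switch the set of target languages mid-session.
    socket.send(JSON.stringify({
      type: "update_settings",
      targetLanguages: ["es", "de"],
    }));
  });

  socket.addEventListener("message", (event: MessageEvent) => {
    console.log("Server message:", event.data);
  });

  return socket;
}
```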
Streaming Session Initialization
Before clients can translate in real time, they must initiate a streaming session by calling the POST /session-storage/sessions endpoint. This call generates a JWT for authentication, establishes a room for the session, and provides the addresses for both the WebSocket API and WebRTC (LiveKit) connections.
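A minimal sketch of this call follows. The base URL, the Authorization header scheme, and the response field names (token, webSocketUrl, webRtcUrl, roomId) are assumptions for illustration, not the documented schema.

```typescript
// Hypothetical sketch: initializing a streaming session.
// Field names and the authentication scheme below are assumptions.
interface SessionResponse {
  token: string;        // JWT used to authenticate both connections
  webSocketUrl: string; // address of the WebSocket API
  webRtcUrl: string;    // address of the WebRTC (LiveKit) server
  roomId: string;       // room created for this session
}

async function createSession(apiKey: string): Promise<SessionResponse> {
  const response = await fetch("https://api.example.com/session-storage/sessions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${apiKey}`, // authentication scheme is an assumption
    },
  });
  if (!response.ok) {
    throw new Error(`Session creation failed: ${response.status}`);
  }
  return (await response.json()) as SessionResponse;
}
```

The returned JWT and addresses can then be passed to the LiveKit and WebSocket sketches shown above to receive translated audio and control the session.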