Translation management WebSocket API

The WebSocket API is used to control translation in real time and to receive text transcriptions.

Connection

Connect to the Palabra WebSocket server at the control_url endpoint, passing your access_token value as the token GET parameter:

// Palabra WebSocket endpoint
endpoint = "{control_url}?token={access_token}"
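
The snippet below expands this into a minimal connection sketch in Python, assuming the third-party websockets library; the URL and token strings are placeholders for the control_url and access_token values you obtained earlier.

# Minimal connection sketch (assumes: pip install websockets)
import asyncio
import websockets

async def main(control_url: str, access_token: str) -> None:
    # Pass the access token as the "token" GET parameter
    endpoint = f"{control_url}?token={access_token}"
    async with websockets.connect(endpoint) as ws:
        print("connected")  # ready for set_task / end_task messages

asyncio.run(main("wss://control.example.com/ws", "YOUR_ACCESS_TOKEN"))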

WebSocket configuration messages

Send WebSocket task messages to configure your translation pipeline, specifying the source and target languages, along with any other necessary settings.

Request/Response Structure

All request and response messages share the same structure: an object with two fields, message_type and data. Error messages carry the string "error" in their message_type field.

{
  "message_type": string,
  "data": dict
}
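
For illustration, this envelope can be wrapped in two small Python helpers (the helper names are ours, not part of the API); the later sketches in this section reuse them.

import json

def make_message(message_type: str, data: dict) -> str:
    # All requests share the same two-field envelope
    return json.dumps({"message_type": message_type, "data": data})

def parse_message(raw: str) -> tuple[str, dict]:
    # Responses use the same envelope; errors arrive with message_type == "error"
    msg = json.loads(raw)
    return msg["message_type"], msg["data"]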

General use

  1. Send a set_task message to create a processing task. You can edit a running task by sending additional set_task messages after the first one.
  2. Send an end_task message to gracefully stop the task.
  3. Send a pause_task message to pause audio processing without deleting the task (see the sketch after this list).
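
A hedged lifecycle sketch in Python (ws is a connected websocket and make_message is the helper above; the empty pause_task payload is an assumption, not confirmed by the schema):

async def task_lifecycle(ws, task_settings: dict) -> None:
    # 1. Create the processing task (send set_task again later to edit it)
    await ws.send(make_message("set_task", task_settings))
    # 2. Pause audio processing without deleting the task
    #    (an empty data payload is assumed here)
    await ws.send(make_message("pause_task", {}))
    # 3. Gracefully stop the task
    await ws.send(make_message("end_task", {"force": False}))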

Settings description

For a detailed description of each setting, refer to our translation settings breakdown section. For optimal settings values, refer to our recommended settings section.


Examples

Request chain example:

  1. Send a set_task message to create a task.

    {
      "message_type": "set_task",
      "data": {
        // Input audio stream settings
        "input_stream": {
          "content_type": "string",
          "source": {
            "type": "string"
          }
        },
        // Translated audio stream settings
        "output_stream": {
          "content_type": "string",
          "target": {
            "type": "string"
          }
        },
        "pipeline": {
          // Preprocessing settings
          "preprocessing": {
            "enable_vad": "bool",
            "vad_threshold": "float",
            "vad_left_padding": "int",
            "vad_right_padding": "int",
            "pre_vad_denoise": "bool",
            "pre_vad_dsp": "bool"
          },
          // ASR settings
          "transcription": {
            "source_language": "string",
            "detectable_languages": ["string"],
            "asr_model": "string",
            "denoise": "string",
            "allow_hotwords_glossaries": "bool",
            "suppress_numeral_tokens": "bool",
            "diarize_speakers": "bool",
            "priority": "string",
            "min_alignment_score": "float",
            "max_alignment_cer": "float",
            "segment_confirmation_silence_threshold": "float",
            "only_confirm_by_silence": "bool",
            "batched_inference": "bool",
            "force_detect_language": "bool",
            "sentence_splitter": {
              "enabled": "bool",
              "splitter_model": "string"
            },
            "verification": {
              "verification_model": "string",
              "allow_verification_glossaries": "bool",
              "auto_transcription_correction": "bool",
              "transcription_correction_style": "null or string"
            }
          },
          // Translation and speech generation settings for one or more target languages
          "translations": [
            {
              "target_language": "string",
              "allowed_source_languages": ["string"],
              "translation_model": "string",
              "allow_translation_glossaries": "bool",
              "style": "null or string",
              "speech_generation": {
                "tts_model": "string",
                "voice_cloning": "bool",
                "voice_id": "null or string",
                "voice_timbre_detection": {
                  "enabled": "bool",
                  "high_timbre_voices": ["string"],
                  "low_timbre_voices": ["string"]
                },
                "denoise_voice_samples": "bool",
                "speech_tempo_auto": "bool",
                "speech_tempo_timings_factor": "float",
                "speech_tempo_adjustment_factor": "float"
              }
            },
            {
              // Settings for an additional language (one more audio track will be published in WebRTC)
              "target_language": "string",
              "allowed_source_languages": ["string"],
              "translation_model": "string",
              "allow_translation_glossaries": "bool",
              "style": null,
              "speech_generation": {
                "tts_model": "string",
                "voice_cloning": "bool",
                "voice_id": "null or string",
                "voice_timbre_detection": {
                  "enabled": "bool",
                  "high_timbre_voices": ["string"],
                  "low_timbre_voices": ["string"]
                },
                "denoise_voice_samples": "bool",
                "speech_tempo_auto": "bool",
                "speech_tempo_timings_factor": "float",
                "speech_tempo_adjustment_factor": "float"
              }
            }
          ],
          // TTS buffer settings
          "translation_queue_configs": {
            "global": { // global setting
              "desired_queue_level_ms": "int",
              "max_queue_level_ms": "int"
            },
            "es": { // per-language override
              "desired_queue_level_ms": "int",
              "max_queue_level_ms": "int"
            }
          },
          // Allowed WS messages
          "allowed_message_types": [
            "partial_transcription",
            "validated_transcription",
            "translated_transcription"
          ]
        }
      }
    }

    Note: If you want only ASR (Automatic Speech Recognition) without speech generation, you have two options (see the sketch after this request chain):

    • Set output_stream to null. You will still receive text translations, but no text-to-speech (TTS) audio will be generated.
    • Provide an empty list for translations. In this case, neither translations nor TTS will be produced.
  2. Send an end_task message to finish the task:

    {
      "message_type": "end_task",
      "data": {
        "force": false // Set to true to stop immediately without waiting for the last segments
      }
    }
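
The same chain in Python, including the two ASR-only variants from the note above. The field values are illustrative placeholders: only "audio" is known to be a valid content_type (see the validation error example below), and the source/target types and the minimal field subset are assumptions.

async def run_example_chain(ws) -> None:
    # Illustrative subset of the set_task fields shown above;
    # a real task may require more of them
    task_settings = {
        "input_stream": {"content_type": "audio", "source": {"type": "ws"}},
        "output_stream": {"content_type": "audio", "target": {"type": "ws"}},
        "pipeline": {
            "transcription": {"source_language": "en"},
            "translations": [{"target_language": "es"}],
        },
    }

    # ASR-only variant 1: null output_stream -> text translations, no TTS
    no_tts_settings = {**task_settings, "output_stream": None}
    # ASR-only variant 2: empty translations -> no translations and no TTS
    asr_only_settings = {
        **task_settings,
        "pipeline": {**task_settings["pipeline"], "translations": []},
    }

    await ws.send(make_message("set_task", task_settings))
    # ... stream audio and consume transcription messages here ...
    await ws.send(make_message("end_task", {"force": False}))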

Response message examples:

  • A partial_transcription message with an unconfirmed, in-progress segment transcription:
    {
      "message_type": "partial_transcription",
      "data": {
        "transcription": {
          "transcription_id": "19615fa2e341df9c",
          "language": "en",
          "text": "One, two, three, four, five...",
          "segments": [
            {
              "text": "One, two, three, four, five...",
              "start": 0.33999999999999986,
              "end": 1.58,
              "start_timestamp": 1744125438.080009,
              "end_timestamp": 1744125439.040023
            }
          ]
        }
      }
    }
  • A validated_transcription message with a confirmed, complete segment transcription:
    {
      "message_type": "validated_transcription",
      "data": {
        "transcription": {
          "transcription_id": "19615fa2e341df9c",
          "language": "en",
          "text": "One, two, three, four, five, six, seven.",
          "segments": [
            {
              "text": "One, two, three, four, five, six, seven.",
              "start": 0,
              "end": 2,
              "start_timestamp": 1744125437.760025,
              "end_timestamp": 1744125439.679354
            }
          ]
        }
      }
    }
  • A translated_transcription message with the translation of a confirmed segment:
    {
      "message_type": "translated_transcription",
      "data": {
        "transcription": {
          "transcription_id": "19615fa2e341df9c",
          "language": "es",
          "text": "Uno, dos, tres, cuatro, cinco, seis, siete. ",
          "segments": [
            {
              "text": "Uno, dos, tres, cuatro, cinco, seis, siete. ",
              "start": 0,
              "end": 2,
              "start_timestamp": 1744125437.760025,
              "end_timestamp": 1744125439.679354
            }
          ]
        }
      }
    }

Error message examples:

  • A validation error:
    {
      "message_type": "error",
      "data": {
        "code": "VALIDATION_ERROR",
        "desc": "ValidationError(model='SetTaskMessage', errors=[{'loc': ('input_stream', 'content_type'), 'msg': \"value is not a valid enumeration member; permitted: 'audio'\", 'type': 'type_error.enum', 'ctx': {'enum_values': [<StreamContentType.audio: 'audio'>]}}])",
        "param": null
      }
    }
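
Tying the response and error examples together, a minimal consumer loop in Python (reusing the parse_message helper from above):

async def consume(ws) -> None:
    async for raw in ws:
        message_type, data = parse_message(raw)
        if message_type == "error":
            # e.g. code == "VALIDATION_ERROR", with details in "desc"
            raise RuntimeError(f"{data['code']}: {data['desc']}")
        if message_type in ("partial_transcription",
                            "validated_transcription",
                            "translated_transcription"):
            transcription = data["transcription"]
            print(message_type, transcription["language"], transcription["text"])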