Translation Settings Breakdown

A comprehensive reference for all configuration options available in the set_task message.

What are translation settings?

Translation settings are configuration parameters that you send to Palabra's streaming API to control how your audio is processed, transcribed, translated, and synthesized. These settings are included in the data field of a set_task message that you send via WebRTC data channel or WebSocket connection.
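
For WebSocket transport, a minimal sketch of sending a set_task message, assuming the Python websockets package; the endpoint URL and token below are placeholders, not real Palabra values:

```python
import asyncio
import json

import websockets  # pip install websockets

# Placeholder endpoint; use the URL and credentials from your Palabra session.
WS_URL = "wss://example.palabra.ai/stream?token=YOUR_TOKEN"

async def send_set_task(task_data: dict) -> None:
    async with websockets.connect(WS_URL) as ws:
        # Every command uses the same envelope: a message_type
        # plus a data payload (here, the translation settings).
        await ws.send(json.dumps({
            "message_type": "set_task",
            "data": task_data,
        }))

# asyncio.run(send_set_task(task_data)) once task_data is built (see below).
```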

When to use these settings

You'll configure these settings in several scenarios:

  • Starting a new translation session - Send your initial set_task message to begin real-time translation
  • Updating an active session - Modify translation settings during an ongoing translation without interrupting the stream
  • Fine-tuning performance - Adjust parameters to optimize quality, latency, or specific use cases

Task object structure


1. message_type

  • Type: string
  • Description: Identifies the type of command being sent:
    • "set_task" - Create a new task or update an existing one.
    • "get_task" - Return the current task.
    • "end_task" - End the task (you will be disconnected automatically).
    • "pause_task" - Do not process new audio data, but keep the task alive. Use set_task to resume.
    • "tts_task"" - Generates text-to-speech from a text.
    • "input_audio_data"" - Websockets transport only. Contains audio data chunk.

2. data

  • Type: object
  • Description: Contains the main configuration details for this task.

Within data, there are five major sections:

  1. input_stream
  2. output_stream
  3. pipeline
  4. translation_queue_configs
  5. allowed_message_types
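
As a Python dict, the data payload therefore has this skeleton; a sketch of each section appears at the end of its subsection below:

```python
task_data = {
    "input_stream": {},               # 2.1: how audio reaches Palabra
    "output_stream": {},              # 2.2: how audio is returned to you
    "pipeline": {},                   # 2.3: transcription and translation steps
    "translation_queue_configs": {},  # 2.4: TTS buffering behavior
    "allowed_message_types": [],      # 2.5: which messages you receive back
}
```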

2.1. input_stream

  • Type: object
  • Description: Configures the input stream settings.

2.1.1. content_type

  • Type: string
  • Description: Describes the content type in the input stream. Currently, only audio is supported.

2.1.2. source

  • Type: object
  • Description: Indicates how and from where the input audio stream is sourced.
2.1.2.1. type
  • Type: string
  • Description: Specifies the input audio transport; webrtc and ws are supported. Must match the output target transport.
2.1.2.2. format
  • Type: string
  • Description: Required for WebSocket transport only. Supported input audio formats: opus, pcm_s16le, and wav.
2.1.2.3. sample_rate
  • Type: number
  • Description: Required for WebSocket transport only. Input audio sample rate. The allowed range is from 16 kHz to 48 kHz.
2.1.2.4. channels
  • Type: number
  • Description: Required for WebSocket transport only. Input channel count; one or two channels are supported.
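
A plausible input_stream for WebSocket transport, using only values named above (the concrete choices are illustrative):

```python
input_stream = {
    "content_type": "audio",
    "source": {
        "type": "ws",           # must match the output target transport
        "format": "pcm_s16le",  # ws only; opus and wav are also supported
        "sample_rate": 48000,   # ws only; 16 kHz to 48 kHz
        "channels": 1,          # ws only; 1 or 2
    },
}
```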

2.2. output_stream

  • Type: object
  • Description: Defines the output settings.

2.2.1. content_type

  • Type: string
  • Description: Describes the content type in the output. Currently, only audio is supported.

2.2.2. target

  • Type: object
  • Description: Indicates the destination of the output.
2.2.2.1. type
  • Type: string
  • Description: Specifies output audio transport. webrtc and ws are supported. Must match source transport.
2.2.2.2. format
  • Type: string
  • Description: Required for WebSocket transport only. Supported output audio formats: pcm_s16le and zlib_pcm_s16le (zlib-compressed PCM).
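
And a matching output_stream sketch for WebSocket transport:

```python
output_stream = {
    "content_type": "audio",
    "target": {
        "type": "ws",           # must match the input source transport
        "format": "pcm_s16le",  # or "zlib_pcm_s16le" for compressed output
    },
}
```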

2.3. pipeline

  • Type: object
  • Description: Holds the configuration for all processing steps, including preprocessing, transcription, and translation.

2.3.2. transcription

  • Type: object
  • Description: Settings for Automatic Speech Recognition (ASR).
2.3.2.1. source_language
  • Type: string
  • Description: Language code representing the input audio language (e.g., "en", "es", "fr"). Can be set to "auto" for automatic language detection, which also lets you set detectable_languages.
2.3.2.2. detectable_languages
  • Type: array of strings
  • Description: When source_language is set to "auto", only languages from this list will be detected.
2.3.2.3. segment_confirmation_silence_threshold
  • Type: float
  • Description: The time in seconds of silence needed to confirm the end of a segment. The recommended value is between 0.5s and 0.9s, depending on the average speech tempo and pauses. Increase this value if a speaker frequently pauses between words. If it is set too low, it can lead to unwanted sentence splitting.
2.3.2.4. sentence_splitter
  • Type: object
  • Description: Controls how longer sentences are split into smaller parts (sometimes with slight rephrasing, but without losing the meaning) to speed up processing.
2.3.2.4.1. enabled
  • Type: bool
  • Description: Whether to enable automatic sentence splitting.
2.3.2.5. verification
  • Type: object
  • Description: Controls transcription verification settings.
2.3.2.5.1. auto_transcription_correction
  • Type: bool
  • Description: Work in progress. Enables automatic transcription verification and correction using an LLM.
2.3.2.5.2. transcription_correction_style
  • Type: string or null
  • Description: Style of LLM correction.
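
Putting the transcription options together, one plausible configuration (the threshold and language choices are illustrative):

```python
transcription = {
    "source_language": "auto",
    # With "auto", detection is restricted to these languages.
    "detectable_languages": ["en", "es", "fr"],
    # ~0.7 s of silence confirms a segment end (recommended: 0.5-0.9 s).
    "segment_confirmation_silence_threshold": 0.7,
    "sentence_splitter": {"enabled": True},
    "verification": {
        "auto_transcription_correction": False,  # WIP feature, off here
        "transcription_correction_style": None,
    },
}
```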

2.3.3. translations

  • Type: array of objects
  • Description: An array of translation targets. Each object defines translation settings for a specific target language.

Note: translations is an array of objects, each representing a required language. Below is an example of an object for a single language.

2.3.3.1.1. target_language
  • Type: string
  • Description: The language into which the text should be translated (e.g., "en-us", "es", "fr").
2.3.3.1.2. translate_partial_transcriptions
  • Type: bool
  • Description: Allows translating partial transcriptions.
2.3.3.1.3. speech_generation
  • Type: object
  • Description: Configures text-to-speech (TTS) settings.
2.3.3.1.3.1. voice_cloning
  • Type: bool
  • Description: Experimental. Enables voice cloning to mimic the original speaker's voice. It usually takes 10-20 seconds of speech before the voice changes are applied.
2.3.3.1.3.2. voice_id
  • Type: null or string
  • Description: A specific voice ID can be set; voice cloning must be disabled. If set to "default_low" or "default_high", the best default voice for the selected language will be used automatically. You can create or manage voices in the Palabra web portal.
2.3.3.1.3.3. voice_timbre_detection
  • Type: object
  • Description: Allows automatically detecting and assigning voice IDs to different voice timbres.
2.3.3.1.3.3.1. enabled
  • Type: bool
  • Description: Enables voice timbre detection. Voice cloning must be disabled.
2.3.3.1.3.3.2. high_timbre_voices
  • Type: array of strings
  • Description: Specifies which voice ID to use for high timbre voices. (Currently, only one ID is supported, or use "default_high".)
2.3.3.1.3.3.3. low_timbre_voices
  • Type: array of strings
  • Description: Specifies which voice ID to use for low timbre voices. (Currently, only one ID is supported, or use "default_low".)
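
One plausible entry for the translations array, using a default voice; whether a fixed voice_id and timbre detection can be combined is not stated above, so timbre detection is shown disabled:

```python
translations = [
    {
        "target_language": "es",
        "translate_partial_transcriptions": False,
        "speech_generation": {
            "voice_cloning": False,
            # With cloning off, use the best default voice for the language.
            "voice_id": "default_low",
            "voice_timbre_detection": {
                "enabled": False,  # requires voice cloning disabled when on
                "high_timbre_voices": ["default_high"],
                "low_timbre_voices": ["default_low"],
            },
        },
    },
    # Add one object per additional target language.
]
```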

2.4. translation_queue_configs

  • Type: object
  • Description: Configures the behavior of buffered TTS audio that has not yet been played back.

2.4.1. global

  • Type: object
  • Description: Global/default settings for TTS queue behavior. You can add language-specific overrides by using the language code as a key (for example, "es" for Spanish).
2.4.1.1. desired_queue_level_ms
  • Type: number
  • Description: Desired average TTS buffer size in milliseconds. If auto_tempo is enabled, it will try to keep the buffer at this level. A recommended value is between 5000 and 8000 milliseconds (5-8 seconds).
2.4.1.2. max_queue_level_ms
  • Type: number
  • Description: The maximum TTS queue size in milliseconds. If the queue grows beyond this limit, it will be reduced to desired_queue_level_ms by dropping older queued audio. It should be at least two or three times larger than desired_queue_level_ms.
2.4.1.3. auto_tempo
  • Type: bool
  • Description: Automatically corrects the speech tempo based on the queue state. It is recommended to keep this enabled.
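
A translation_queue_configs sketch following the sizing guidance above; placing the override key ("es") as a sibling of global is an assumption based on the description in 2.4.1:

```python
translation_queue_configs = {
    "global": {
        "desired_queue_level_ms": 6000,  # target ~6 s of buffered TTS audio
        "max_queue_level_ms": 18000,     # ~3x the desired level
        "auto_tempo": True,
    },
    # Per-language override, keyed by language code.
    "es": {
        "desired_queue_level_ms": 8000,
        "max_queue_level_ms": 24000,
        "auto_tempo": True,
    },
}
```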

2.5. allowed_message_types

  • Type: array of strings

  • Description: Specifies the types of messages you will receive back via WebSocket. The same messages are also sent in the WebRTC data channel.

    • "partial_transcription" - Emitted for partial transcription segments as they are recognized.
    • "partial_translated_transcription" - Emitted for partial translated transcriptions if translate_partial_transcriptions is enabled.
    • "validated_transcription" - Emitted when a transcription segment is fully confirmed.
    • "translated_transcription" - Emitted when a transcription segment has been translated.