Features
Learn about configuration parameters for the Voice SDK.
Basic parameters
language (str, default: "en")
Language code for transcription (e.g., "en", "es", "fr").
See supported languages.
operating_point (OperatingPoint, default: ENHANCED)
Balance accuracy vs latency.
Options: STANDARD or ENHANCED.
domain (str, default: None)
Domain-specific model (e.g., "finance", "medical").
See supported languages and domains.
output_locale (str, default: None)
Output locale for formatting (e.g., "en-GB", "en-US").
See supported languages and locales.
enable_diarization (bool, default: False)
Enable speaker diarization to identify and label different speakers.
Turn detection
end_of_utterance_mode (EndOfUtteranceMode, default: FIXED)
Controls how turn endings are detected:
FIXED: Uses fixed silence threshold. Fast but may split slow speech.
ADAPTIVE: Adjusts delay based on speech rate, pauses, and disfluencies. Best for natural conversation.
SMART_TURN: Uses ML model to detect acoustic turn-taking cues. Requires [smart] extras.
EXTERNAL: Manual control via client.finalize(). For custom turn logic.
end_of_utterance_silence_trigger (float, default: 0.2)
Silence duration in seconds to trigger turn end.
end_of_utterance_max_delay (float, default: 10.0)
Maximum delay before forcing turn end.
max_delay (float, default: 0.7)
Maximum transcription delay for word emission.
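To make the FIXED mode trade-off concrete, the following is a minimal sketch of silence-threshold turn splitting. It is not the SDK's implementation: the function split_turns_fixed and the (text, start, end) word tuples are hypothetical names chosen for illustration, and only the 0.2 second default is taken from end_of_utterance_silence_trigger above.

```python
from typing import List, Optional, Tuple

# Hypothetical word shape for illustration: (text, start_time_s, end_time_s).
Word = Tuple[str, float, float]


def split_turns_fixed(words: List[Word], silence_trigger: float = 0.2) -> List[List[str]]:
    """Group words into turns, closing a turn when the silence gap exceeds the trigger."""
    turns: List[List[str]] = []
    current: List[str] = []
    previous_end: Optional[float] = None
    for text, start, end in words:
        # A gap longer than the trigger ends the current turn.
        if previous_end is not None and start - previous_end > silence_trigger:
            turns.append(current)
            current = []
        current.append(text)
        previous_end = end
    if current:
        turns.append(current)
    return turns


words = [
    ("hello", 0.0, 0.4),
    ("there", 0.5, 0.9),
    ("how", 1.5, 1.7),
    ("are", 1.75, 1.9),
    ("you", 1.95, 2.2),
]
turns = split_turns_fixed(words, silence_trigger=0.2)
```

With the 0.6 second pause before "how", the sample input splits into two turns. A slow speaker who pauses that long mid-sentence would be split the same way, which is the weakness the FIXED description mentions.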
Speaker configuration
speaker_sensitivity (float, default: 0.5)
Diarization sensitivity between 0.0 and 1.0.
Higher values detect more speakers.
max_speakers (int, default: None)
Limit maximum number of speakers to detect.
prefer_current_speaker (bool, default: False)
Give extra weight to current speaker for word grouping.
speaker_config (SpeakerFocusConfig, default: SpeakerFocusConfig())
Configure speaker focus/ignore rules.
from speechmatics.voice import SpeakerFocusConfig, SpeakerFocusMode, VoiceAgentConfig
# Focus only on specific speakers
config = VoiceAgentConfig(
    enable_diarization=True,
    speaker_config=SpeakerFocusConfig(
        focus_speakers=["S1", "S2"],
        focus_mode=SpeakerFocusMode.RETAIN
    )
)
# Ignore specific speakers
config = VoiceAgentConfig(
    enable_diarization=True,
    speaker_config=SpeakerFocusConfig(
        ignore_speakers=["S3"],
        focus_mode=SpeakerFocusMode.IGNORE
    )
)
known_speakers (list[SpeakerIdentifier], default: [])
Pre-enrolled speaker identifiers for speaker identification.
from speechmatics.voice import SpeakerIdentifier, VoiceAgentConfig
config = VoiceAgentConfig(
    enable_diarization=True,
    known_speakers=[
        SpeakerIdentifier(label="Alice", speaker_identifiers=["XX...XX"]),
        SpeakerIdentifier(label="Bob", speaker_identifiers=["YY...YY"])
    ]
)
Language and vocabulary
additional_vocab (list[AdditionalVocabEntry], default: [])
Custom vocabulary for domain-specific terms.
from speechmatics.voice import AdditionalVocabEntry, VoiceAgentConfig
config = VoiceAgentConfig(
    language="en",
    additional_vocab=[
        AdditionalVocabEntry(
            content="Speechmatics",
            sounds_like=["speech matters", "speech matics"]
        ),
        AdditionalVocabEntry(content="API"),
    ]
)
punctuation_overrides (dict, default: None)
Custom punctuation rules.
Audio parameters
sample_rate (int, default: 16000)
Audio sample rate in Hz.
audio_encoding (AudioEncoding, default: PCM_S16LE)
Audio encoding format.
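When streaming audio with the defaults above, it can help to know how many bytes a chunk of a given duration occupies. The sketch below assumes mono audio and relies only on the fact that PCM_S16LE stores each sample as a signed 16-bit little-endian integer (2 bytes). The helper names chunk_size_bytes and silence_chunk are hypothetical and not part of the SDK.

```python
import struct

BYTES_PER_SAMPLE = 2  # PCM_S16LE: signed 16-bit little-endian samples


def chunk_size_bytes(sample_rate: int = 16000, chunk_ms: int = 20) -> int:
    """Return the byte size of a mono PCM_S16LE chunk lasting chunk_ms milliseconds."""
    samples = sample_rate * chunk_ms // 1000
    return samples * BYTES_PER_SAMPLE


def silence_chunk(sample_rate: int = 16000, chunk_ms: int = 20) -> bytes:
    """Build a chunk of digital silence (all-zero samples) in PCM_S16LE."""
    samples = sample_rate * chunk_ms // 1000
    # "<h" packs one little-endian signed 16-bit integer per sample.
    return struct.pack("<" + "h" * samples, *([0] * samples))


size = chunk_size_bytes(16000, 20)
silence = silence_chunk(16000, 20)
```

At 16000 Hz, a 20 ms chunk holds 320 samples, so it occupies 640 bytes.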
Advanced parameters
transcription_update_preset (TranscriptionUpdatePreset, default: COMPLETE)
Controls when to emit updates: COMPLETE, COMPLETE_PLUS_TIMING, WORDS, WORDS_PLUS_TIMING, or TIMING.
speech_segment_config (SpeechSegmentConfig, default: SpeechSegmentConfig())
Fine-tune segment generation and post-processing.
smart_turn_config (SmartTurnConfig, default: None)
Configure SMART_TURN behavior (buffer length, threshold).
include_results (bool, default: False)
Include word-level timing data in segments.
include_partials (bool, default: True)
Emit partial segments. Set to False for final-only output.
Configuration with overlays
Use presets as a starting point and customize with overlays:
from speechmatics.voice import VoiceAgentConfigPreset, VoiceAgentConfig
# Use preset with custom overrides
config = VoiceAgentConfigPreset.SCRIBE(
    VoiceAgentConfig(
        language="es",
        max_delay=0.8
    )
)
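The overlay idea can be pictured as a merge in which overlay values replace base values. The sketch below illustrates that idea with plain dictionaries. It does not reproduce how the SDK merges configurations: apply_overlay is a hypothetical helper, treating None as "not set" is an assumption, and the base values are the documented parameter defaults rather than the contents of any preset.

```python
from typing import Any, Dict


def apply_overlay(base: Dict[str, Any], overlay: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base with every non-None overlay value applied on top."""
    merged = dict(base)  # copy, so the base mapping is left untouched
    for key, value in overlay.items():
        # Assumption for this sketch: None means "not set by the caller".
        if value is not None:
            merged[key] = value
    return merged


# Documented defaults used as a stand-in base (not a real preset).
base = {"language": "en", "max_delay": 0.7, "enable_diarization": False}
overlay = {"language": "es", "max_delay": 0.8, "enable_diarization": None}
merged = apply_overlay(base, overlay)
```

Here language and max_delay take the overlay values, while enable_diarization keeps the base value because the overlay leaves it unset.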
Available presets
presets = VoiceAgentConfigPreset.list_presets()
# Output: ['low_latency', 'conversation_adaptive', 'conversation_smart_turn', 'scribe', 'captions']
Configuration serialization
Export and import configurations as JSON:
from speechmatics.voice import VoiceAgentConfigPreset, VoiceAgentConfig
# Export preset to JSON
config_json = VoiceAgentConfigPreset.SCRIBE().to_json()
# Load from JSON
config = VoiceAgentConfig.from_json(config_json)
# Or create from JSON string
config = VoiceAgentConfig.from_json('{"language": "en", "enable_diarization": true}')
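When configuration JSON is written by hand, a misspelled key is easy to miss. A possible pre-check using only the standard library is sketched below. The helper unknown_config_keys is hypothetical and not part of the SDK, and the set of names is copied from the parameters listed on this page, so it may not match every field the SDK accepts.

```python
import json
from typing import List

# Parameter names as documented on this page (an assumption about completeness).
DOCUMENTED_PARAMETERS = {
    "language", "operating_point", "domain", "output_locale", "enable_diarization",
    "end_of_utterance_mode", "end_of_utterance_silence_trigger",
    "end_of_utterance_max_delay", "max_delay",
    "speaker_sensitivity", "max_speakers", "prefer_current_speaker",
    "speaker_config", "known_speakers",
    "additional_vocab", "punctuation_overrides",
    "sample_rate", "audio_encoding",
    "transcription_update_preset", "speech_segment_config", "smart_turn_config",
    "include_results", "include_partials",
}


def unknown_config_keys(config_json: str) -> List[str]:
    """Return the sorted top-level keys that are not documented parameters."""
    data = json.loads(config_json)
    if not isinstance(data, dict):
        raise ValueError("configuration JSON must be an object")
    return sorted(set(data) - DOCUMENTED_PARAMETERS)


ok = unknown_config_keys('{"language": "en", "enable_diarization": true}')
typos = unknown_config_keys('{"langauge": "en", "max_delay": 0.8}')
```

An empty result means every key matched a documented parameter. Any names returned are candidates for typos to fix before the string is passed to from_json.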
For more information, see the Voice SDK on GitHub.