The active transcription engine is selected through the transcription.engine key in ~/.whisper/config.toml, the macOS Settings app, the menu bar, or wh engine <name>. Engines are registered in src/whisper_voice/engines/__init__.py. Engines that are not selected must stay lazy: they must not load model weights at startup.
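
The lazy-loading rule can be illustrated with a minimal sketch. It assumes the registry maps engine names to zero-argument factories and that instances are cached after first use; the real layout of ENGINE_REGISTRY may differ. The Engine stand-in class, the get_engine helper, and the names "parakeet" and "qwen3" are hypothetical (only "whisperkit" appears in this document).

```python
from typing import Callable, Dict, List


class Engine:
    """Hypothetical stand-in for a real engine; constructing it represents loading weights."""

    def __init__(self, name: str) -> None:
        self.name = name

    def transcribe(self, audio_path: str) -> str:
        return f"[{self.name}] transcript of {audio_path}"


# Records which engines were actually constructed, so laziness can be observed.
constructed: List[str] = []


def _factory(name: str) -> Callable[[], Engine]:
    """Build a zero-argument factory; nothing is constructed until it is called."""

    def make() -> Engine:
        constructed.append(name)
        return Engine(name)

    return make


# Assumed shape: name -> factory. Registering an entry does not load anything.
ENGINE_REGISTRY: Dict[str, Callable[[], Engine]] = {
    "parakeet": _factory("parakeet"),      # hypothetical name
    "qwen3": _factory("qwen3"),            # hypothetical name
    "whisperkit": _factory("whisperkit"),  # name taken from `wh engine whisperkit`
}

_instances: Dict[str, Engine] = {}


def get_engine(name: str) -> Engine:
    """Return the selected engine, constructing it on first use only."""
    if name not in ENGINE_REGISTRY:
        raise KeyError(f"unknown engine: {name!r}")
    if name not in _instances:
        _instances[name] = ENGINE_REGISTRY[name]()
    return _instances[name]
```

With this shape, importing the registry constructs nothing, and calling get_engine("whisperkit") constructs only that engine; the other entries stay untouched until they are selected.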

Parakeet-TDT v3

Parakeet-TDT v3 is the default. It runs in-process through MLX, uses the mlx-community/parakeet-tdt-0.6b-v3 checkpoint, supports English plus 24 European languages, and handles long audio with overlapping chunks.

| Setting | Default | Notes |
| --- | --- | --- |
| model | mlx-community/parakeet-tdt-0.6b-v3 | Downloaded by setup for fresh installs. |
| timeout | 0 | No limit. |
| chunk_duration | 120.0 | Seconds per chunk. 0 disables chunking. |
| overlap_duration | 15.0 | Overlap between chunks, in seconds. |
| decoding | greedy | Use beam for a small quality bump at higher cost. |
| beam_size | 5 | Beam-only. |
| local_attention | false | Reduces peak RAM for unchunked long audio. |
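
To make the interaction of chunk_duration and overlap_duration concrete, here is a small sketch of how overlapping chunk boundaries could be computed. It assumes each chunk starts chunk_duration minus overlap_duration seconds after the previous one; the function name chunk_spans is hypothetical, and the engine's actual splitting and stitching logic is not described in this document.

```python
from typing import List, Tuple


def chunk_spans(
    total: float,
    chunk_duration: float = 120.0,
    overlap_duration: float = 15.0,
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds covering `total` seconds of audio."""
    if total <= 0:
        return []
    # chunk_duration == 0 disables chunking; short audio needs only one span.
    if chunk_duration <= 0 or total <= chunk_duration:
        return [(0.0, float(total))]
    if not 0 <= overlap_duration < chunk_duration:
        raise ValueError("overlap_duration must be >= 0 and < chunk_duration")

    step = chunk_duration - overlap_duration  # assumed stride between chunk starts
    spans: List[Tuple[float, float]] = []
    start = 0.0
    while True:
        end = min(start + chunk_duration, float(total))
        spans.append((start, end))
        if end >= total:
            break
        start += step
    return spans


# With the defaults, 300 seconds of audio yields three chunks,
# each sharing 15 seconds with its neighbour.
print(chunk_spans(300.0))  # [(0.0, 120.0), (105.0, 225.0), (210.0, 300.0)]
```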

Qwen3-ASR

Qwen3-ASR runs in-process through MLX. It uses model-side language detection and supports long audio through a large internal chunk duration.

| Setting | Default | Notes |
| --- | --- | --- |
| model | mlx-community/Qwen3-ASR-1.7B-bf16 | Downloaded when selected. |
| timeout | 0 | No limit. |
| temperature | 0.0 | Deterministic decode. |
| repetition_penalty | 1.2 | Suppresses repetition loops. |
| chunk_duration | 1200.0 | Max chunk length in seconds. |
| max_tokens | 0 | Auto-scale from duration. |
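
As background on what a repetition_penalty of 1.2 typically does, the sketch below shows one widely used formulation: logits of tokens that were already generated are divided by the penalty when positive and multiplied by it when negative, which makes repeats less likely. Whether the Qwen3-ASR MLX decoder applies exactly this rule is an assumption, and the function name is hypothetical.

```python
from typing import Iterable, List


def apply_repetition_penalty(
    logits: List[float],
    generated_ids: Iterable[int],
    penalty: float = 1.2,
) -> List[float]:
    """Return a copy of `logits` with already-generated token ids made less likely."""
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    out = list(logits)
    for token_id in set(generated_ids):
        if out[token_id] > 0:
            out[token_id] /= penalty  # shrink positive scores
        else:
            out[token_id] *= penalty  # push negative scores further down
    return out


# Tokens 0 and 1 were already emitted; token 2 is left unchanged.
penalized = apply_repetition_penalty([2.4, -1.0, 0.5], generated_ids=[0, 1])
```

A penalty of 1.0 leaves the logits unchanged, so values above 1.0 are what discourage loops.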

WhisperKit

WhisperKit uses a local server at localhost:50060. Install it separately:

    brew install whisperkit-cli
    wh engine whisperkit

| Setting | Default | Notes |
| --- | --- | --- |
| model | whisper-large-v3-v20240930 | Accuracy-focused default. |
| language | auto | Auto-detect unless set. |
| url | http://localhost:50060/v1/audio/transcriptions | Local transcription endpoint. |
| check_url | http://localhost:50060/ | Health check. |
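
A health check against check_url could look roughly like the following sketch, which uses only the Python standard library. It assumes that any 2xx response means the server is ready; the function name server_is_up and the timeout value are hypothetical, and the project's own check may behave differently.

```python
import http.client
import urllib.error
import urllib.request


def server_is_up(check_url: str = "http://localhost:50060/", timeout: float = 2.0) -> bool:
    """Return True if a GET to `check_url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(check_url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, non-2xx status, or malformed reply:
        # treat all of these as "not available".
        return False
```

Calling server_is_up() before sending audio to the transcription endpoint makes it possible to report a missing WhisperKit server early instead of failing mid-request.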

Adding an engine

  1. Implement TranscriptionEngine in src/whisper_voice/engines/.
  2. Add an ENGINE_REGISTRY entry in src/whisper_voice/engines/__init__.py.
  3. Add config schema/defaults if needed.
  4. Verify menu bar, CLI, Settings, startup, and lazy loading.
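
The steps above can be sketched as follows. The shape of TranscriptionEngine is not shown in this document, so the abstract base class here, with a single transcribe method, is an assumption, as are EchoEngine, its is_loaded property, and the registry mapping names to classes. Check the real interface in src/whisper_voice/engines/ before copying anything.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional, Type


class TranscriptionEngine(ABC):
    """Assumed interface; the real base class may define more methods."""

    name: str

    @abstractmethod
    def transcribe(self, audio_path: str) -> str:
        """Return the transcript for the audio file at `audio_path`."""


class EchoEngine(TranscriptionEngine):
    """Step 1: a toy engine that defers its (pretend) model load to first use."""

    name = "echo"

    def __init__(self) -> None:
        # Nothing heavy happens in the constructor.
        self._model: Optional[Callable[[str], str]] = None

    @property
    def is_loaded(self) -> bool:
        return self._model is not None

    def _ensure_loaded(self) -> None:
        if self._model is None:
            # A real engine would load weights here.
            self._model = lambda path: f"transcript of {path}"

    def transcribe(self, audio_path: str) -> str:
        self._ensure_loaded()
        assert self._model is not None
        return self._model(audio_path)


# Step 2: register the class under its name (assumed registry shape).
ENGINE_REGISTRY: Dict[str, Type[TranscriptionEngine]] = {
    EchoEngine.name: EchoEngine,
}
```

For step 4, a quick check in this style is to instantiate the engine, confirm is_loaded is still false, call transcribe once, and confirm it then becomes true.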