Input path
VoiceInput records 16 kHz mono audio through sounddevice.InputStream. Silero VAD receives 512-sample frames, about 32 ms each. A lookback deque holds the most recent frames from before VAD triggers, so the onset of speech is not clipped.
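The frame arithmetic and the lookback idea can be sketched as follows. This is a minimal illustration, not Eyra's actual code: LookbackRecorder, LOOKBACK_FRAMES, and the push() signature are hypothetical names, and the is_speech flag stands in for whatever the VAD reports per frame.

```python
from collections import deque

SAMPLE_RATE = 16_000   # Hz, from the text
FRAME_SAMPLES = 512    # samples per VAD frame, from the text
FRAME_MS = FRAME_SAMPLES * 1000 / SAMPLE_RATE  # 32.0 ms per frame

LOOKBACK_FRAMES = 10   # assumed value, roughly 320 ms of pre-roll


class LookbackRecorder:
    """Hypothetical sketch: keep recent frames so speech onset survives."""

    def __init__(self, lookback_frames=LOOKBACK_FRAMES):
        # A bounded deque silently drops the oldest frame when full.
        self.lookback = deque(maxlen=lookback_frames)
        self.recording = []
        self.in_speech = False

    def push(self, frame, is_speech):
        if self.in_speech:
            # Already recording: keep every frame until the caller stops.
            self.recording.append(frame)
        elif is_speech:
            # Speech just started: seed the clip with the pre-roll frames.
            self.in_speech = True
            self.recording = list(self.lookback) + [frame]
            self.lookback.clear()
        else:
            # Silence: only remember the most recent frames.
            self.lookback.append(frame)
```

With a lookback of two frames, five silent frames followed by one speech frame yield a clip holding the last two silent frames plus the speech frame.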
When VAD emits speech_end, Eyra stops recording if enough frames were captured. Very short clips are treated as false triggers.
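A minimal sketch of the short-clip filter, assuming a frame-count threshold. The name on_speech_end and the value of MIN_SPEECH_FRAMES are hypothetical; the text does not state the real threshold.

```python
from typing import List, Optional

MIN_SPEECH_FRAMES = 8  # assumed threshold, about 256 ms at 32 ms per frame


def on_speech_end(frames: List[bytes],
                  min_frames: int = MIN_SPEECH_FRAMES) -> Optional[List[bytes]]:
    """Return the clip if it is long enough, else None (false trigger)."""
    if len(frames) < min_frames:
        # Too short to be real speech: discard and keep listening.
        return None
    return frames
```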
Transcription
Transcription prefers Local Whisper’s Unix socket and falls back to the wh binary resolved during preflight. Runtime code uses state.wh_bin, not a bare wh looked up on PATH.
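The point of resolving the binary once can be sketched like this. VoiceState, preflight(), and wh_command() are hypothetical names for illustration; only the wh_bin attribute comes from the text, and no wh arguments are assumed here (the caller supplies them).

```python
import shutil
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VoiceState:
    # Absolute path found during preflight, or None if wh is missing.
    wh_bin: Optional[str] = None


def preflight(state: VoiceState, candidate: str = "wh") -> bool:
    """Resolve the binary once and remember the full path."""
    state.wh_bin = shutil.which(candidate)
    return state.wh_bin is not None


def wh_command(state: VoiceState, *args: str) -> List[str]:
    """Build an argv that always starts with the resolved path."""
    if state.wh_bin is None:
        raise RuntimeError("wh was not resolved during preflight")
    return [state.wh_bin, *args]
```

Resolving once means a later PATH change cannot silently swap in a different binary, and a missing binary shows up at preflight instead of mid-session.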
Speech output
SpeechController calls Local Whisper TTS through the resolved wh_bin. interrupt() kills active TTS so the user can cut in.
listen() waits for speech output to finish before opening the microphone.
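A rough sketch of the interrupt-and-wait behavior, assuming TTS runs as a child process. SpeechSketch, is_speaking(), wait_until_done(), and the open_microphone callback are hypothetical stand-ins; the real SpeechController may be structured differently.

```python
import subprocess
import sys
import threading


class SpeechSketch:
    """Hypothetical stand-in for a controller that owns one TTS process."""

    def __init__(self):
        self._proc = None
        self._lock = threading.Lock()

    def speak(self, argv):
        # argv would start with the resolved wh_bin in the real system.
        with self._lock:
            self._proc = subprocess.Popen(argv)

    def is_speaking(self):
        with self._lock:
            proc = self._proc
        return proc is not None and proc.poll() is None

    def interrupt(self):
        # Kill the active TTS process so the user can cut in.
        with self._lock:
            proc = self._proc
        if proc is not None and proc.poll() is None:
            proc.kill()
            proc.wait()

    def wait_until_done(self, timeout=None):
        with self._lock:
            proc = self._proc
        if proc is not None:
            proc.wait(timeout=timeout)


def listen(controller, open_microphone):
    # Block until speech output has finished, then open the microphone.
    controller.wait_until_done()
    return open_microphone()
```

Waiting before the microphone opens keeps the assistant's own voice out of the recording; interrupt() is the escape hatch when the user does not want to wait.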
Diagnostics
VoiceDiagnostics checks device selection, non-zero input, VAD behavior, Local Whisper ASR, generated WAV transcription, and barge-in flows.
/voice-diagnose is the normal first check.
Recovery
/voice on re-runs the Local Whisper check and enables whichever side is available. ASR and TTS are tracked separately.
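Tracking the two sides separately might look like the sketch below. VoiceAvailability, voice_on(), and the check callback (returning an ASR flag and a TTS flag) are hypothetical; the actual Local Whisper check is not shown in the text.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class VoiceAvailability:
    asr_enabled: bool = False  # speech-to-text side
    tts_enabled: bool = False  # text-to-speech side


def voice_on(state: VoiceAvailability,
             check: Callable[[], Tuple[bool, bool]]) -> str:
    """Re-run the availability check and enable whichever side works."""
    asr_ok, tts_ok = check()
    state.asr_enabled = asr_ok
    state.tts_enabled = tts_ok
    if asr_ok and tts_ok:
        return "voice input and output enabled"
    if asr_ok:
        return "voice input enabled, speech output unavailable"
    if tts_ok:
        return "speech output enabled, voice input unavailable"
    return "voice unavailable"
```

Keeping two flags lets one side keep working when the other fails, for example typed input with spoken replies.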