Voice is a first-class input path. Eyra records audio locally, uses Silero VAD to detect speech boundaries, transcribes the audio through Local Whisper, and routes the resulting text through the same runtime as typed input. It then speaks a short response back through Local Whisper TTS.

Voice stack

microphone
  → sounddevice input stream
  → 32 ms Silero VAD frames
  → local WAV buffer
  → Local Whisper socket or wh CLI
  → Eyra runtime
  → Local Whisper TTS
Voice input and speech output are tracked separately. If one side fails, Eyra can keep the other side available.
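
As a rough illustration of the "32 ms Silero VAD frames" step in the stack above: at a 16000 Hz sample rate, one 32 ms frame holds 512 samples. The helper names below are hypothetical and are not part of Eyra. This is a sketch of the arithmetic only, not the actual capture code.

```python
def samples_per_frame(sample_rate_hz: int, frame_ms: int = 32) -> int:
    """Number of samples in one VAD frame (hypothetical helper)."""
    return sample_rate_hz * frame_ms // 1000


def split_into_frames(samples: list, frame_len: int) -> list:
    """Split a buffer into full frames, dropping any trailing partial frame."""
    usable = len(samples) - len(samples) % frame_len
    return [samples[i:i + frame_len] for i in range(0, usable, frame_len)]


frame_len = samples_per_frame(16000)               # 512 samples per frame
one_second = [0] * 16000                           # one second of silence
frames = split_into_frames(one_second, frame_len)  # 31 full frames
```

One second of audio does not divide evenly into 512-sample frames, so a real capture loop would typically carry the leftover samples into the next read rather than drop them.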

Commands

/voice on
/voice off
/mute
/unmute
/voice-diagnose
/voice-test
/handsfree on
/handsfree off
/voice on rechecks Local Whisper at runtime, so a session can recover after Local Whisper starts without restarting Eyra.
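
A runtime recheck of this kind could be sketched as a connection probe. The sketch assumes Local Whisper listens on a Unix domain socket, which this page does not confirm. The function name and the socket path you pass in are hypothetical.

```python
import socket


def local_whisper_reachable(socket_path: str, timeout_s: float = 0.5) -> bool:
    """Return True if something accepts connections on the given Unix socket.

    Hypothetical helper: assumes a Unix domain stream socket.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout_s)
        try:
            sock.connect(socket_path)
        except OSError:
            # Missing path, refused connection, or timeout: not reachable.
            return False
        return True
```

Because the probe runs each time it is called, a session that started without Local Whisper can pick it up later without a restart.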

Spoken control

These phrases are handled locally:
Phrase              Behavior
stop                Interrupt speech output
show status         Show runtime status
what are you doing  Show current task state
what changed        Show recent operation ledger entries
approve that        Approve one pending action when unambiguous
reject that         Reject one pending action when unambiguous
choose number two   Select a numbered local option
read the options    Repeat numbered options
start dictation     Start local dictation
end dictation       End and save dictation
cancel dictation    Discard dictation
If more than one approval is pending, Eyra reads the ids and asks you to choose.
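
The "when unambiguous" rule for approve that and reject that can be sketched as a small decision function. This is an illustration of the rule as described here, not Eyra's implementation. The function name and the action labels are hypothetical.

```python
from __future__ import annotations


def resolve_approval(pending_ids: list[str]) -> tuple[str, str | None]:
    """Decide what "approve that" should do given the pending approval ids.

    Returns (action, approval_id). Hypothetical sketch.
    """
    if not pending_ids:
        return ("none_pending", None)
    if len(pending_ids) == 1:
        # Exactly one pending action: the phrase is unambiguous.
        return ("approve", pending_ids[0])
    # More than one: the caller reads the ids aloud and asks the user to choose.
    return ("ask_user", None)
```

For example, resolve_approval(["a1"]) returns ("approve", "a1"), while resolve_approval(["a1", "a2"]) returns ("ask_user", None).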

Dictation

Start a transient dictation buffer:
Start dictation.
Save dictation to a sandboxed file:
Start dictation to a file named note.txt in my Documents.
End or cancel:
End dictation.
Cancel dictation.
Use Literal ... when you need exact filenames, codes, or punctuation-sensitive text.

Interruption

Eyra can stop TTS immediately through SpeechController.interrupt(). /voice-test runs the manual voice interruption diagnostic. Physical acoustic barge-in is hardware-dependent, so treat code-path tests, synthetic loopback tests, and human microphone challenge tests as separate evidence: passing one does not establish the others.

Diagnostics

Run:
/voice-diagnose
The report checks:
  • Input device selection.
  • All-zero microphone audio.
  • VAD speech detection.
  • Local Whisper transcription.
  • Generated WAV transcription.
  • Local socket path.
  • Barge-in behavior when requested.
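
The all-zero microphone check in the list above can be illustrated with a minimal sketch. It assumes the recording is available as raw PCM bytes. The function name is hypothetical, and the real diagnostic may work differently.

```python
def is_all_zero(pcm: bytes) -> bool:
    """True if a raw PCM buffer contains only zero bytes (or is empty).

    Every sample is zero exactly when every byte is zero, so the
    sample width does not matter for this check.
    """
    return not any(pcm)
```

A buffer that is entirely zero usually means the device delivered no real signal, for example because the input is muted or access was not granted. That is different from a quiet room, which still produces small non-zero noise.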

Configuration

LIVE_LISTENING_ENABLED=true
LIVE_SPEECH_ENABLED=true
SPEECH_COOLDOWN_MS=3000
VOICE_INPUT_DEVICE=
VOICE_SAMPLE_RATE=16000
VOICE_DEBUG_RECORD_SECONDS=3
VOICE_DIAGNOSTIC_SAVE_AUDIO=false
VOICE_SILENCE_MS=1500
VOICE_VAD_THRESHOLD=0.15
Raise VOICE_VAD_THRESHOLD for stricter speech detection. Lower it when the microphone is quiet or speech is not detected.
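
To show how VOICE_VAD_THRESHOLD and VOICE_SILENCE_MS might interact, here is a minimal endpointing sketch over per-frame speech probabilities. It assumes a frame counts as speech when its probability is at or above the threshold, and that an utterance ends after VOICE_SILENCE_MS of consecutive non-speech frames. The function is hypothetical and is not Eyra's actual logic.

```python
import math


def find_utterance(probs, threshold=0.15, silence_ms=1500, frame_ms=32):
    """Return (start, end) frame indices of the first utterance, or None.

    probs: per-frame speech probabilities in [0, 1].
    end is exclusive: one past the last frame classified as speech.
    """
    silence_frames = math.ceil(silence_ms / frame_ms)  # 47 for the defaults
    start = None
    last_speech = None
    for i, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = i
            last_speech = i
        elif start is not None and i - last_speech >= silence_frames:
            # Enough trailing silence: the utterance is complete.
            return (start, last_speech + 1)
    if start is None:
        return None
    # Stream ended before the silence timeout elapsed.
    return (start, last_speech + 1)
```

With the defaults, a frame scoring 0.1 is ignored. Lowering the threshold to 0.05 would classify it as speech, which is the trade-off described above: a lower value catches quiet speech but also admits more noise.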