370 ms Near-instant voice-to-text — 100% on-device, forever free

Hold ⌥ Space, speak, release. Your words appear formatted and pasted in 370 milliseconds — every AI model running on your Mac. Startling accuracy. Zero cloud. Zero subscription. Zero data collection. Ever.

VoiceFlow · REC
"Hey team, I wanted to follow up on the Q4 roadmap. Can we schedule a sync for Thursday afternoon? I think we need to reprioritize the API migration before the launch deadline."
Hold ⌥ Space to record · release to paste · ⚡ 340 ms total

Lightning fast

370 ms speech-to-text — speak, release, done

Startlingly accurate

Cloud-grade AI quality, entirely on your Mac

Fully private

Every model runs on-device — nothing leaves your Mac

Forever free

No subscription, no limits, no account ever

No cloud required. No corners cut.

Many other dictation tools send your voice to remote servers. VoiceFlow was built from scratch in Rust and Swift to prove you don't have to sacrifice privacy for quality.

Hold-to-Record

Hold ⌥ Space anywhere on your Mac to record. Release to transcribe and paste instantly. No app switching, no extra steps.

Zero Data Leaves Your Mac

Every AI model runs on your hardware. Your voice is processed locally and never recorded, stored, or transmitted. Not even anonymized telemetry.

370 ms End-to-End

A Rust-powered pipeline transcribes, formats, and pastes in under four-tenths of a second. No network round-trips — just raw local speed.

Smart Formatting

A local LLM cleans up grammar, adds punctuation, and removes filler words. Post-processing normalizes numbers ($50,000), percentages (25%), times (3:30 PM), phone numbers, dates, and abbreviations.

Say It Again to Fix It

Re-say a sentence, or just the word you got wrong, and VoiceFlow replaces your last dictation instead of appending. Correction cues like "scratch that", together with a local LLM, decide whether an utterance is a redo or new text. Hold ⌥⇧ Space to force a replace or to speak an edit.
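
One way to picture the redo-versus-new decision is a cheap text heuristic that runs before any model is consulted. The sketch below is an assumption for illustration, not VoiceFlow's actual logic: the cue list beyond "scratch that", the 0.6 similarity threshold, and the name `is_redo` are all hypothetical, and the local LLM step is left out.

```python
from difflib import SequenceMatcher

# Hypothetical cue phrases; the page itself only names "scratch that".
CORRECTION_CUES = ("scratch that", "no wait", "i meant")

def is_redo(previous: str, current: str, threshold: float = 0.6) -> bool:
    """Guess whether `current` re-says `previous` instead of adding new text."""
    spoken = current.lower().strip()
    # An utterance that opens with a correction cue always counts as a redo.
    if any(spoken.startswith(cue) for cue in CORRECTION_CUES):
        return True
    # Otherwise compare word sequences; a re-said sentence overlaps heavily.
    matcher = SequenceMatcher(None, previous.lower().split(), spoken.split())
    return matcher.ratio() >= threshold
```

With these assumed settings, re-saying "Can we meet on Tuesday afternoon" after "Can we meet on Thursday afternoon" is treated as a redo, while an unrelated sentence is treated as new text.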

App-Aware Context

Detects your frontmost app and adjusts tone automatically. Professional for email, casual for Slack, technical for code editors. Custom prompts per app.
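
Per-app tone can be thought of as a lookup keyed by the frontmost app. The sketch below is only an illustration under stated assumptions: the bundle identifiers are examples, the prompt strings and the names `TONE_BY_APP`, `DEFAULT_TONE`, and `tone_prompt` are hypothetical, and how VoiceFlow actually stores per-app prompts is not described on this page.

```python
from typing import Dict, Optional

# Hypothetical mapping from macOS bundle identifiers to tone instructions.
TONE_BY_APP = {
    "com.apple.mail": "Write in a professional tone.",
    "com.tinyspeck.slackmacgap": "Write in a casual tone.",
    "com.microsoft.VSCode": "Write in a technical tone.",
}
DEFAULT_TONE = "Write in a neutral tone."  # assumed fallback for unknown apps

def tone_prompt(bundle_id: str, custom_prompts: Optional[Dict[str, str]] = None) -> str:
    """Return the formatting instruction for the frontmost app."""
    # A user-defined prompt for this app wins over the built-in default.
    if custom_prompts and bundle_id in custom_prompts:
        return custom_prompts[bundle_id]
    return TONE_BY_APP.get(bundle_id, DEFAULT_TONE)
```

Checking the custom prompts first keeps user overrides authoritative while unknown apps still get a sensible default.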

Visual Context (VLM)

A local vision model reads your screen to extract names, project terms, and writing context — so proper nouns are spelled correctly without a cloud lookup.

Learns Your Corrections

Fix a word once and VoiceFlow remembers. It monitors your edits and applies learned corrections to future dictations automatically.
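
A minimal sketch of the idea, assuming corrections are learned word by word: it compares the dictated text with the edited text, stores each changed word, and reapplies it later. The class `CorrectionMemory` and its methods are hypothetical, and it only handles edits that keep the word count unchanged. VoiceFlow's real edit monitoring is not documented here.

```python
import re

WORD = re.compile(r"[A-Za-z0-9']+")  # assumed definition of a "word"

class CorrectionMemory:
    """Remembers word-level fixes and reapplies them to later dictations."""

    def __init__(self):
        self.fixes = {}  # lowercased dictated word -> corrected word

    def learn(self, dictated, edited):
        before, after = WORD.findall(dictated), WORD.findall(edited)
        # Only the simple case is handled: the edit kept the word count.
        if len(before) != len(after):
            return
        for old, new in zip(before, after):
            if old != new:
                self.fixes[old.lower()] = new

    def apply(self, text):
        # Whole words only, so a fix for "jon" never rewrites "jonathan".
        return WORD.sub(
            lambda m: self.fixes.get(m.group(0).lower(), m.group(0)), text
        )
```

After `learn("Ask Jon about the release", "Ask John about the release")`, later text containing "jon" or "Jon" comes back as "John".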

Voice Snippets

Say "my signature" and it expands to your full sign-off. Create unlimited custom trigger phrases that expand into any text you want.
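
Snippet expansion amounts to a dictionary lookup on trigger phrases. The following is a hedged sketch rather than VoiceFlow's implementation: the function name `expand_snippets` and the sample sign-off used in the note below are made up for illustration.

```python
import re

def expand_snippets(text: str, snippets: dict) -> str:
    """Replace each spoken trigger phrase with its stored expansion."""
    if not snippets:
        return text
    # Longer triggers come first so "my work signature" beats "my signature".
    triggers = sorted(snippets, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in triggers) + r")\b",
        re.IGNORECASE,
    )
    lookup = {trigger.lower(): value for trigger, value in snippets.items()}
    # A single pass means an expansion is never expanded a second time.
    return pattern.sub(lambda m: lookup[m.group(1).lower()], text)
```

For example, with the hypothetical snippet `{"my signature": "Best regards,\nAlex"}`, the dictation "Thanks. My signature" expands to the full sign-off.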

Summarize This

Say "summarize this" and a local LLM reads the current text field and appends a bullet-point summary. A live status pill shows progress through each step.

Cloud-grade accuracy, without the cloud

Multiple AI models run in tandem on your Mac: speech-to-text captures every word, a language model formats it, and a vision model reads your screen for context. The same quality you'd expect from a cloud API.

  • Strips filler words, false starts, and verbal corrections
  • Numbers, currency, times, percentages, and phone numbers formatted automatically
  • Per-persona vocabulary lists bias the LLM toward your domain terms and proper nouns
  • Vision model reads your screen to spell names and terms correctly
  • Learns from your corrections and improves over time
Raw speech
um so the budget is like fifty thousand dollars and we need it done by january fifteenth uh that's about ninety five percent of what we asked for period
✓ Formatted output
The budget is $50,000 and we need it done by January 15. That's about 95% of what we asked for.
Filler words removed · Currency formatted · Date normalized · Percentage converted · Punctuation added
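
To make the rule-based part of that transformation concrete, here is a small sketch covering only filler removal, currency, and percentages. It is an assumption-laden illustration, not the product's pipeline: the filler list, the helper names `words_to_int` and `normalize`, and the regular expressions are hypothetical, and dates, times, phone numbers, capitalization, and punctuation are left out.

```python
import re

# Hypothetical rule-based sketch; VoiceFlow's real pipeline is not shown here.
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
FILLERS = {"um", "uh", "like"}  # assumed list; "like" is not always filler

def words_to_int(words):
    """Convert e.g. ['fifty', 'thousand'] to 50000 (values below one million)."""
    total, current = 0, 0
    for w in words:
        if w in UNITS:
            current += UNITS[w]
        elif w in TENS:
            current += TENS[w]
        elif w == "hundred":
            current *= 100
        elif w == "thousand":
            total, current = total + current * 1000, 0
    return total + current

# One or more number words, each followed by whitespace.
NUM = r"\b((?:(?:%s)\s+)+)" % "|".join(
    list(UNITS) + list(TENS) + ["hundred", "thousand"])

def normalize(text):
    """Drop filler words, then rewrite spoken currency and percentages."""
    text = " ".join(w for w in text.split() if w not in FILLERS)
    text = re.sub(NUM + r"dollars\b",
                  lambda m: "$" + format(words_to_int(m.group(1).split()), ","),
                  text)
    text = re.sub(NUM + r"percent\b",
                  lambda m: str(words_to_int(m.group(1).split())) + "%",
                  text)
    return text
```

Applied to "um the budget is like fifty thousand dollars", this sketch yields "the budget is $50,000". A word list like this cannot tell a filler "like" from a meaningful one, which is one reason to pair rules with a language model.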

Your workflow, your rules

Unlike cloud dictation tools that give you a single mode, VoiceFlow lets you control everything — AI models, visual context, per-app formatting, and more.

  • Toggle visual context (VLM) for screen-aware dictation
  • Edit per-persona vocabulary lists with a chip-style tag picker
  • Swap AI models to balance speed vs. quality
  • Customize formatting prompts per app
  • Create voice snippets, manage corrections, and tune spacing
Active Persona: Software Engineer
Vocabulary Bias: 42 terms
Formatting Level: Moderate
Spacing Mode: Context-Aware
Voice Commands
Visual Context (VLM)
Correction Learning
Speech-to-Text: Parakeet 0.6B
Formatting LLM: Bonsai 8B

Near-instant by design

VoiceFlow runs a 16K-token context window with the persona, vocabulary, on-screen text, and formatting rules permanently warm in the prompt prefix. Every dictation re-uses that cache — only your newly-spoken words get evaluated. The chart below shows real prompt-eval latency from this build: even long utterances stay well under a second.
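
The saving from a warm prefix comes down to simple accounting: tokens shared with the cached prompt are skipped, and only the remainder is evaluated. The sketch below illustrates that accounting and nothing more. It is not how any particular inference runtime manages its cache, and the function name and placeholder tokens are hypothetical.

```python
def tokens_to_evaluate(cached: list, prompt: list) -> int:
    """Count the prompt tokens that fall outside the shared cached prefix."""
    shared = 0
    for cached_token, prompt_token in zip(cached, prompt):
        if cached_token != prompt_token:
            break  # the first mismatch ends the reusable prefix
        shared += 1
    return len(prompt) - shared

# Hypothetical placeholder tokens standing in for the warm prefix.
prefix = ["<persona>", "<vocabulary>", "<screen-text>", "<rules>"]
prompt = prefix + "can we sync thursday".split()
new_tokens = tokens_to_evaluate(prefix, prompt)  # 4: only the spoken words
```

If anything in the prefix changes, such as the persona, everything after the change has to be evaluated again. That is consistent with the one-time warm-up cost after launch.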

  • ~370 ms: typical end-to-end (speaking → typed text)
  • 96 ms: AI formatting (with memorized settings)
  • 147 ms: voice recognition (entirely on-device)
  • ~12K: words of context held in memory (persona, vocabulary, rules)
Processing time grows slowly with dictation length
Real measurements from 31 consecutive dictations on Apple Silicon (v2.0.1). VoiceFlow keeps your persona, vocabulary, and formatting rules memorized in advance — so the only time spent processing is the words you actually said.
[Chart: milliseconds to process (0–500) vs. words spoken in your dictation (0–200)]
~50 ms to start · ~1.6 ms per word. A 190-word dictation finishes in ~350 ms. Memorized settings cost no extra time.
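
The chart's two numbers describe a straight line, so the estimate for any dictation length is easy to reproduce. This is a sketch using only the rounded figures quoted above (~50 ms to start, ~1.6 ms per word). The function name is hypothetical, and real timings will vary by machine.

```python
def estimated_processing_ms(words: int, start_ms: float = 50.0,
                            per_word_ms: float = 1.6) -> float:
    """Linear estimate: a fixed start-up cost plus a per-word cost."""
    return start_ms + per_word_ms * words
```

For a 190-word dictation this gives about 354 ms, in line with the ~350 ms quoted for the chart.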

The first dictation after launch takes ~5 seconds while VoiceFlow memorizes your settings once. Every dictation after that runs in well under a second, from the moment you stop speaking to formatted text in your app. The AI generates roughly 100 words per second; voice recognition completes in about 150 milliseconds. Everything happens entirely on your Mac.

Speak it. Save days.

Words leave your mouth at ~140 wpm. They leave your fingers at ~40 wpm. VoiceFlow adds about 370 ms of processing per dictation, and that's it. The gap compounds into days of your life every year, and into weeks for heavy writers, all formatted and pasted with startling accuracy.

  • 3.5×: faster than typing (140 vs 40 wpm)
  • 0.37 s: from voice to typed text (formatted, ready to use)
  • 4.5 days: saved every year at 1,000 words/day
  • 22.6 days: saved every year at 5,000 words/day
Days reclaimed per year, by daily dictation volume
Calculated as (words/day ÷ 40 wpm typing) minus (words/day ÷ 140 wpm speaking), annualized over 365 days and expressed in 24-hour days. VoiceFlow's roughly 370 ms of processing per utterance is too small to materially affect the math.
[Chart: days saved per year (0–50) by daily dictation volume, in words per day (wpd)]
  • 500 wpd: 2.3 days
  • 1,000 wpd: 4.5 days (about 109 hours)
  • 2,500 wpd: 11.3 days
  • 5,000 wpd: 22.6 days (just over three calendar weeks)
  • 10,000 wpd: 45.3 days (about a month and a half)
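
The chart figures follow directly from the stated formula. As a sketch, using the page's 40 wpm and 140 wpm figures as defaults, and treating a "day" as 24 hours of saved time rather than an 8-hour working day (the function name is hypothetical):

```python
def days_saved_per_year(words_per_day: float, typing_wpm: float = 40.0,
                        speaking_wpm: float = 140.0) -> float:
    """Return the 24-hour days saved per year by speaking instead of typing."""
    minutes_saved_daily = words_per_day / typing_wpm - words_per_day / speaking_wpm
    return minutes_saved_daily * 365 / (60 * 24)
```

Rounded to one decimal place, 1,000 words a day gives 4.5 days and 5,000 words a day gives 22.6 days, matching the chart.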

1,000 words a day is one focused email or a page of notes. Dictate that volume and VoiceFlow gives you back about 4.5 full days (roughly 109 hours) every year. Heavy writers reclaim several times that. Every word is transcribed, punctuated, formatted, and pasted on-device with no cloud round-trip in the loop.

Every dictation, every day, at a glance

VoiceFlow v2.0 ships a complete settings rebuild on a new "Liquid Glass" design system, paired with the Parakeet + Bonsai model defaults. The new Insights dashboard tracks your dictation pace, the apps you dictate into most, and your daily activity streak. Personas carry editable vocabulary lists that bias the LLM toward your domain terms — everything is computed and stored on-device.

[Screenshot: VoiceFlow Insights dashboard showing words-per-minute, total words dictated, app usage breakdown, and a multi-week activity streak]

Insights, personas, and full control

A smile-shaped speed gauge labels your dictation pace as Steady, Fast, or Top. Per-app bars show where you dictate most. A streak heatmap tracks every active day. Built-in personas come seeded with domain vocabulary — Software Engineer ships with kubectl, Postgres, Terraform, gRPC and more, and you can add your own with one click. Every number is computed locally; nothing syncs anywhere.

The dictation tool that doesn't compromise

See how VoiceFlow stacks up against popular cloud-based alternatives.

| Feature | VoiceFlow | Wispr Flow | macOS Dictation |
|---|---|---|---|
| Price | Free forever | $12–15/mo | Free (built-in) |
| Privacy | 100% on-device | Cloud-processed | Partial (sends to Apple) |
| Internet required | Never | Always | For enhanced mode |
| Smart formatting | Local LLM | Cloud AI | No |
| App-aware context | Email, Slack, code | Email, Slack, code | No |
| Voice punctuation | Full command set | Yes | Basic |
| Voice snippets | Yes | Yes | No |
| Visual context (VLM) | Free (local VLM) | Pro plan only | No |
| Correction learning | Learns from edits | Auto-learns | No |
| Number formatting | Currency, %, time, dates | Yes | No |
| Summarize text | Voice-triggered | No | No |
| Custom AI models | Swap anytime | Locked to vendor | No |
| Open-source | Fully auditable | Closed-source | Closed-source |
| Data collection | None | Voice sent to cloud | Voice sent to Apple |

Competitor information based on publicly available documentation as of February 2026.

Start dictating within minutes

Download VoiceFlow, run the setup wizard, pick your AI models, and you're ready. No account. No API key. No subscription. Free forever.

Download for macOS

Requires macOS 15 Sequoia or later · Apple Silicon (M1 or newer) · 16 GB RAM recommended

Already have VoiceFlow? This release adds automatic updates — re-download once, and future versions install themselves in the background.