HyperWhisper doesn’t lock you into one engine. It ships a library of models because no single model wins on everything — each trades off privacy, language coverage, speed, accuracy, and cost differently. This page lists everything on offer and explains why each one is there. There are two kinds of model:
  • Speech-to-text — turns your voice into text (the transcription step).
  • Post-processing — an optional LLM pass that cleans up, punctuates, and formats the transcript afterwards.
You choose both in the app under Model Library, and per-mode in the mode editor.

Three ways to run a model

Every model — speech or post-processing — falls into one of three buckets:

On-device

Runs entirely on your machine. Audio never leaves your device, works offline, and costs nothing per minute. The strongest privacy guarantee.

HyperWhisper Cloud

Built-in, no API key, no separate account. The most accurate option, billed per minute of actual speech with no markup.

Bring your own key

Plug in your own provider API key and pay that provider directly — useful if you already have credits or want a specific model.
There is no universally “best” model. On-device models are unbeatable for privacy and offline use; cloud models are noticeably more accurate on accents, noise, and technical vocabulary. The library exists so you can make that trade yourself.

Speech-to-text models

On-device

These run locally with no network calls. Once downloaded (where applicable) they work fully offline — your audio is never uploaded anywhere. See Data Privacy for details.
Model | Runs on | Languages | Why it’s here
Apple Speech | macOS (built-in) | Auto-detect | Zero download, instant, private. The fastest way to start dictating on a Mac with nothing to install.
NVIDIA Parakeet | macOS · Windows | English (V2) · 25 European (V3) | The fastest accurate on-device transcription for English and European languages.
NVIDIA Nemotron 3.5 | macOS | 6 (Latin) · ~40 incl. Chinese, Japanese, Korean, Arabic (Multilingual) | Best on-device accuracy, with offline coverage that reaches well beyond European languages.
Whisper | macOS · Windows | 100 languages | OpenAI’s general-purpose model in many sizes (Tiny → Large). The universal fallback: runs on almost any hardware, including CPU-only and older machines.
Qwen3 ASR | macOS | Multilingual | An additional multilingual on-device option for users who want to try Alibaba’s ASR model.

Whisper models

OpenAI’s general-purpose multilingual models. The VRAM values are the recommended GPU memory for full acceleration — with less, the model still runs on CPU (or partial GPU), just slower.
Model | Size | Recommended VRAM | Languages | Best for
Tiny | 78 MB | ~1 GB | Multilingual | Lowest-end machines, quick drafts
Tiny (English) | 78 MB | ~1 GB | English only | Same as Tiny, slightly better English
Base | 148 MB | ~1 GB | Multilingual | Light hardware, basic dictation
Base (English) | 148 MB | ~1 GB | English only | Same as Base, slightly better English
Small | 488 MB | ~2 GB | Multilingual | Best balance for most users
Small (English) | 488 MB | ~2 GB | English only | Same as Small, slightly better English
Medium | 1.5 GB | ~5 GB | Multilingual | Higher accuracy, mid-range GPUs
Medium (English) | 1.5 GB | ~5 GB | English only | Same as Medium, slightly better English
Large v3 Turbo | 1.5 GB | ~6 GB | Multilingual | Near-Large accuracy, much faster
Large v2 | 3.1 GB | ~10 GB | Multilingual | Highest Whisper accuracy (older)
Large v3 | 3.1 GB | ~10 GB | Multilingual | Highest Whisper accuracy (latest)
English-only variants (.en) use the same architecture trained only on English data. If you only ever dictate in English, they’re slightly more accurate at the same size — but you lose multilingual support entirely.
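To make the VRAM column concrete, here is a minimal Python sketch that picks the largest Whisper model whose recommended VRAM fits your GPU, preferring the English-only variant when you only dictate in English. The helper name `pick_whisper_model` and the fallback to Tiny are illustrative assumptions, not HyperWhisper’s own selection logic. The numbers are copied from the table; Large v2 is left out because Large v3 has the same size and VRAM.

```python
from typing import NamedTuple


class WhisperModel(NamedTuple):
    name: str
    size_mb: int         # approximate download size from the table
    vram_gb: float       # recommended VRAM for full GPU acceleration
    english_only: bool


# Copied from the Whisper table; Large v2 omitted (same size and VRAM as v3).
WHISPER_MODELS = [
    WhisperModel("Tiny", 78, 1, False),
    WhisperModel("Tiny (English)", 78, 1, True),
    WhisperModel("Base", 148, 1, False),
    WhisperModel("Base (English)", 148, 1, True),
    WhisperModel("Small", 488, 2, False),
    WhisperModel("Small (English)", 488, 2, True),
    WhisperModel("Medium", 1500, 5, False),
    WhisperModel("Medium (English)", 1500, 5, True),
    WhisperModel("Large v3 Turbo", 1500, 6, False),
    WhisperModel("Large v3", 3100, 10, False),
]


def pick_whisper_model(vram_gb: float, english_only: bool) -> str:
    """Return the largest model whose recommended VRAM fits (hypothetical helper)."""
    candidates = [
        m for m in WHISPER_MODELS
        # Multilingual dictation rules out the English-only variants.
        if m.vram_gb <= vram_gb and (english_only or not m.english_only)
    ]
    if not candidates:
        # Below ~1 GB the app still runs the model on CPU; Tiny is the lightest.
        return "Tiny (English)" if english_only else "Tiny"
    # Rank by VRAM, then size; on a tie, prefer the English-only variant.
    best = max(candidates, key=lambda m: (m.vram_gb, m.size_mb, m.english_only))
    return best.name
```

For example, 8 GB of VRAM with multilingual dictation yields Large v3 Turbo, while 5 GB with English-only dictation yields Medium (English). Remember that a model above your VRAM still works; it just runs slower on CPU.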

NVIDIA Parakeet models

NVIDIA Parakeet models are typically faster than equivalent-size Whisper models and very accurate for the languages they support.
Model | Size | Languages | Best for
Parakeet V2 (English) | 474 MB | English only | Fastest accurate English transcription
Parakeet V3 (Multilingual) | 494 MB | 25 European languages | Multilingual European dictation
Parakeet V3 covers: English, German, Spanish, French, Italian, Portuguese, Dutch, Polish, Russian, Ukrainian, Czech, Slovak, Hungarian, Romanian, Bulgarian, Croatian, Slovenian, Serbian, Danish, Swedish, Norwegian, Finnish, Estonian, Latvian, Lithuanian.
On Windows, Parakeet runs on both x64 and ARM64, while Whisper is currently x64-only. If you’re on a Snapdragon / ARM Windows device, choose Parakeet.
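The Windows rule above reduces to a check on CPU architecture. This sketch uses Python’s `platform.machine()`, which on Windows typically reports `AMD64` for x64 and `ARM64` for a native ARM64 Python; an x64 Python running under emulation on an ARM device may still report `AMD64`. The function name `pick_windows_engine` is hypothetical.

```python
import platform


def pick_windows_engine(machine: str) -> str:
    """Suggest a local speech engine for Windows from the CPU architecture.

    Hypothetical helper encoding the note above: Parakeet runs on x64 and
    ARM64, while Whisper is currently x64-only.
    """
    arch = machine.upper()
    if arch in ("ARM64", "AARCH64"):
        return "Parakeet"  # Whisper is not available on ARM64 Windows
    if arch in ("AMD64", "X86_64"):
        return "Parakeet or Whisper"
    raise ValueError(f"architecture not covered by this note: {machine}")


if __name__ == "__main__" and platform.system() == "Windows":
    # May report AMD64 when an x64 Python runs under emulation on ARM.
    print(pick_windows_engine(platform.machine()))
```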

NVIDIA Nemotron 3.5 models

NVIDIA’s Nemotron 3.5 ASR is the newest on-device option (macOS). It edges out the other local models on accuracy and reaches well beyond European languages: the Multilingual variant handles Chinese, Japanese, Korean, and Arabic offline.
Model | Size | Languages | Best for
Nemotron 3.5 (Latin) | ~350 MB | English, Spanish, French, Italian, Portuguese, German | Smaller, faster Latin-script transcription
Nemotron 3.5 (Multilingual) | ~1.3 GB | ~40 languages incl. Chinese, Japanese, Korean, Arabic | Offline dictation beyond European languages
Want non-European languages offline? Nemotron 3.5 (Multilingual) is the pick. Choose the Latin variant if you only speak English/Spanish/French/Italian/Portuguese/German and want it smaller and faster.
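The choice between the two variants comes down to whether every language you speak is among the Latin model’s six. A minimal sketch, assuming you can list your languages by English name; `pick_nemotron_variant` is a hypothetical helper, and it does not validate against the Multilingual model’s ~40 languages because this page does not enumerate them.

```python
from collections.abc import Iterable

# The six languages listed for Nemotron 3.5 (Latin) in the table above.
NEMOTRON_LATIN = {"english", "spanish", "french", "italian", "portuguese", "german"}


def pick_nemotron_variant(languages: Iterable[str]) -> str:
    """Prefer the smaller Latin model when it covers every language you speak."""
    wanted = {lang.strip().lower() for lang in languages}
    if wanted and wanted <= NEMOTRON_LATIN:
        return "Nemotron 3.5 (Latin)"
    # Unknown (empty) input, or any language outside the six: use Multilingual.
    return "Nemotron 3.5 (Multilingual)"
```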

Apple Speech & Qwen3 ASR

  • Apple Speech is built into macOS — no download, available the moment you launch the app. It’s the quickest private option for everyday Mac dictation. (Requires a recent macOS version.)
  • Qwen3 ASR is an additional multilingual on-device model (macOS) for users who want to try Alibaba’s ASR.

HyperWhisper Cloud

HyperWhisper Cloud is built-in — no API key, no separate account. It routes to best-in-class providers behind four accuracy tiers, and you only pay for actual speech (silence and empty recordings cost 0 credits). Use it when you want the highest accuracy without any setup.
Tier | Powered by | Best for
Highest | ElevenLabs Scribe v2 | Accents, noisy audio, technical vocabulary
High | Deepgram Nova-3 | Strong English accuracy, low latency
Medium | Grok STT (xAI) | Solid multilingual accuracy at low cost
Fast | Groq Whisper Large v3 | Sub-second latency for English & major European languages
See Providers for pricing, cost examples, and per-language guidance.
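The “pay only for actual speech” rule can be modelled as summing detected speech segments and ignoring the silence between them. This is a sketch under stated assumptions, not HyperWhisper’s meter: the `(start, end)` segments are hypothetical voice-activity output in seconds, overlapping segments are counted once, no per-request rounding is applied, and `rate_per_minute` is a value you would take from the Providers page, not a built-in price.

```python
def billable_seconds(segments: list[tuple[float, float]]) -> float:
    """Total speech time in seconds, counting overlapping segments once."""
    total = 0.0
    run_start = run_end = None  # current merged run of speech
    for start, end in sorted(segments):
        if end <= start:
            continue  # ignore empty or malformed segments
        if run_end is None or start > run_end:
            # A gap of silence: close the previous run (silence is never billed).
            if run_end is not None:
                total += run_end - run_start
            run_start, run_end = start, end
        else:
            run_end = max(run_end, end)  # overlapping or touching: extend the run
    if run_end is not None:
        total += run_end - run_start
    return total


def estimate_cost(segments: list[tuple[float, float]], rate_per_minute: float) -> float:
    """Hypothetical estimate: minutes of speech times a per-minute rate."""
    return billable_seconds(segments) / 60 * rate_per_minute
```

Under these assumptions, a 30-second recording containing two 5-second utterances bills 10 seconds, and an empty recording bills nothing.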

Bring your own key

If you already hold API credits, want a provider’s free starting credit (Deepgram $200, AssemblyAI $50), or need a specific model, plug in your own key under API Keys. You pay the provider directly at their published rate.
Supported providers for bring-your-own-key transcription: OpenAI · Groq · Deepgram · AssemblyAI · ElevenLabs · Fireworks AI · Mistral · Soniox · Google Gemini
When you bring your own key, opting your audio out of model training is your responsibility — each provider has its own setting. See Data Privacy for a copy-pasteable prompt that finds the current opt-out for any provider.

Post-processing models

Post-processing is an optional second step: after transcription, an LLM cleans up filler words, fixes punctuation and capitalization, and applies any formatting your mode asks for. It’s separate from the speech model — you can mix any speech model with any post-processing model.
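Because the two steps are independent, a mode can be thought of as composing any speech model with an optional post-processor. The sketch below shows that composition with hypothetical callables. The regex-based `toy_cleanup` is a deliberately simple stand-in for the LLM pass (it only strips a few filler words, normalizes spacing, capitalizes, and adds a final period) and is not how HyperWhisper’s post-processing works.

```python
import re
from typing import Callable, Optional

# Hypothetical stage types: speech-to-text maps audio to raw text,
# post-processing maps raw text to cleaned text.
SpeechModel = Callable[[bytes], str]
PostProcessor = Callable[[str], str]

# Toy filler-word pattern; a real LLM pass does far more than this.
_FILLERS = re.compile(r"\b(?:um+|uh+|erm)\b,?\s*", re.IGNORECASE)


def toy_cleanup(text: str) -> str:
    """Stand-in for the LLM pass: drop fillers, fix spacing, capitalize, end with a period."""
    cleaned = re.sub(r"\s+", " ", _FILLERS.sub("", text)).strip()
    if not cleaned:
        return cleaned
    cleaned = cleaned[0].upper() + cleaned[1:]
    if cleaned[-1] not in ".!?":
        cleaned += "."
    return cleaned


def transcribe(audio: bytes, speech: SpeechModel,
               post: Optional[PostProcessor] = None) -> str:
    """Run any speech model, then the optional post-processing step."""
    raw = speech(audio)
    return post(raw) if post is not None else raw
```

With a fake speech model that returns "um so the build uh passed", `transcribe` returns the raw text unchanged when no post-processor is given, and "So the build passed." with `toy_cleanup`.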

Cloud post-processing

Available built-in through HyperWhisper Cloud (no key) or with your own API key. Providers: OpenAI · Anthropic (Claude) · Google (Gemini) · Groq · xAI (Grok) · Cerebras
These range from very fast, low-cost cleanup models to high-accuracy reasoning models. The app labels each with a relative speed and accuracy rating so you can pick the trade-off you want.

Local post-processing (Gemma)

Local Gemma 4 models clean up and format transcript text fully offline after download. They’re separate from speech models and are currently available only in the Windows x64 app.
Model | Size | Recommended VRAM | Best for
Gemma 4 E2B (Q4_K_M) | 3.1 GB | ~4 GB | Recommended local cleanup model
Gemma 4 E4B (Q4_K_M) | 5.0 GB | ~6 GB | Higher quality local cleanup
Gemma 4 26B A4B MoE (UD-Q4_K_M) | 16.9 GB | ~18 GB | High-memory workstations
Gemma 4 31B Dense (Q4_K_M) | 18.3 GB | ~20 GB | Highest local quality, slowest
On AMD and Intel GPUs, local Gemma post-processing uses CPU fallback in the current Windows build. Transcription can still use GPU acceleration independently.
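The two constraints in this section, recommended VRAM and CUDA-only GPU acceleration in the Windows x64 build, can be combined into one planning step. This minimal sketch assumes you already know your GPU vendor and its VRAM; `plan_local_cleanup` is a hypothetical name, and choosing E2B whenever the GPU cannot be used is an assumption (it is the smallest and the recommended model), not documented app behavior.

```python
from typing import NamedTuple, Optional


class GemmaModel(NamedTuple):
    name: str
    size_gb: float               # download size from the table
    recommended_vram_gb: float


# Copied from the Gemma table above, smallest first.
GEMMA_MODELS = [
    GemmaModel("Gemma 4 E2B (Q4_K_M)", 3.1, 4),
    GemmaModel("Gemma 4 E4B (Q4_K_M)", 5.0, 6),
    GemmaModel("Gemma 4 26B A4B MoE (UD-Q4_K_M)", 16.9, 18),
    GemmaModel("Gemma 4 31B Dense (Q4_K_M)", 18.3, 20),
]


def plan_local_cleanup(gpu_vendor: Optional[str], vram_gb: float) -> tuple[str, str]:
    """Return (model, device) for local Gemma on Windows x64 (hypothetical helper)."""
    # Only NVIDIA (CUDA) GPUs accelerate local Gemma; AMD and Intel use CPU.
    if gpu_vendor is None or gpu_vendor.lower() != "nvidia":
        return GEMMA_MODELS[0].name, "cpu"
    fitting = [m for m in GEMMA_MODELS if m.recommended_vram_gb <= vram_gb]
    if not fitting:
        return GEMMA_MODELS[0].name, "cpu"  # not enough VRAM: CPU fallback
    return fitting[-1].name, "cuda"  # largest model that fits
```

Transcription is planned separately: an AMD or Intel GPU that runs Gemma on CPU can still accelerate Whisper or Parakeet.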

Using on-device models

Downloading & storage

Open Model Library in the app, click Download on any entry, and watch the circular progress indicator. You can cancel mid-stream with the × button. Downloaded models stay on disk until you remove them.
Platform | Storage location
Windows | %LOCALAPPDATA%\HyperWhisper\Models\
macOS | ~/Library/Application Support/HyperWhisper/Models/
Apple Speech is built into macOS and needs no download.
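If you want to check how much disk space your downloaded models use, the folders above can be resolved from a script. This Python sketch hard-codes the two documented locations; the function names are hypothetical, and it only measures files on disk without talking to the app.

```python
import os
import platform
from pathlib import Path, PurePath, PurePosixPath, PureWindowsPath
from typing import Mapping


def models_dir(system: str, env: Mapping[str, str], home: str) -> PurePath:
    """Documented model folder for a platform.system() value."""
    if system == "Windows":
        return PureWindowsPath(env["LOCALAPPDATA"], "HyperWhisper", "Models")
    if system == "Darwin":  # macOS
        return PurePosixPath(home, "Library", "Application Support",
                             "HyperWhisper", "Models")
    raise ValueError(f"no documented model folder for {system}")


def total_size_bytes(directory: Path) -> int:
    """Sum the sizes of all files under `directory` (0 if it does not exist)."""
    if not directory.is_dir():
        return 0
    return sum(p.stat().st_size for p in directory.rglob("*") if p.is_file())


if __name__ == "__main__" and platform.system() in ("Windows", "Darwin"):
    folder = Path(models_dir(platform.system(), os.environ, str(Path.home())))
    print(f"{folder}: {total_size_bytes(folder) / 1e9:.2f} GB")
```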

GPU vs CPU

Local engines use your GPU when available and fall back to CPU automatically if you don’t have a dedicated GPU or there isn’t enough VRAM. The model still runs on CPU — it’s just slower.
Engine | Backend | GPU support | CPU fallback
Whisper (Windows) | WhisperNet / DirectCompute | NVIDIA, AMD, Intel (any DirectX 11 GPU) | Yes
Whisper (macOS) | libwhisper / Metal | Apple Silicon GPU + Neural Engine | Yes
Parakeet (Windows) | sherpa-onnx / DirectML | NVIDIA, AMD, Intel | Yes
Parakeet (macOS) | sherpa-onnx / CoreML | Apple Silicon | Yes
Nemotron (macOS) | FluidAudio / CoreML | Apple Silicon | Yes
Local Gemma post-processing (Windows) | LLamaSharp / GGUF | NVIDIA CUDA only on x64 | Yes

Removing models

To free up disk space, click the trash icon next to any downloaded model in Model Library. The file is removed immediately and can be re-downloaded any time.

Which should I pick?

Privacy is non-negotiable / offline

An on-device speech model. Apple Speech for instant Mac dictation, or Parakeet / Nemotron for higher accuracy. Audio never leaves your machine.

I want the best accuracy, no setup

HyperWhisper Cloud — Highest (ElevenLabs Scribe v2). No API key, pay only for speech.

I speak a non-European language, offline

Nemotron 3.5 (Multilingual) — on-device coverage for Chinese, Japanese, Korean, Arabic, and ~40 languages total.

Older laptop / no dedicated GPU

Whisper Tiny or Small — both run comfortably on CPU. For longer audio, switch to HyperWhisper Cloud.

English only, want it fast & local

Parakeet V2 (English) — typically faster than a Whisper model of similar size, with comparable accuracy.

I already have a provider key

Bring your own key — plug it in and pay the provider directly. See API Keys.
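The guide above can be read as a small decision procedure. This sketch encodes it in Python; the `Needs` fields, the order in which rules are checked, and the non-Mac fallbacks (Apple Speech and Nemotron 3.5 are macOS-only, so it suggests Whisper or Parakeet instead) are simplifying assumptions rather than the app’s logic. It also ignores finer points such as Whisper being x64-only on Windows.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    """Hypothetical summary of what you care about."""
    offline: bool = False           # privacy is non-negotiable, or no network
    on_mac: bool = False            # Apple Speech and Nemotron 3.5 are macOS-only
    non_european: bool = False      # e.g. Chinese, Japanese, Korean, Arabic
    english_only: bool = False
    low_end_hardware: bool = False  # older laptop, no dedicated GPU
    has_provider_key: bool = False


def recommend(n: Needs) -> str:
    """Encode the 'Which should I pick?' cards; the rule order is an assumption."""
    if n.offline:
        if n.non_european:
            # Off macOS, fall back to Whisper, which covers 100 languages.
            return "Nemotron 3.5 (Multilingual)" if n.on_mac else "Whisper"
        if n.low_end_hardware:
            return "Whisper Tiny or Small"
        if n.english_only:
            return "Parakeet V2 (English)"
        return "Apple Speech" if n.on_mac else "Parakeet"
    if n.has_provider_key:
        return "Bring your own key"
    return "HyperWhisper Cloud (Highest)"
```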

Boost accuracy on any model

  • Custom vocabulary — add product names, jargon, and colleagues’ names. The single biggest improvement for technical or professional use. (Support varies by model — Apple Speech and Whisper support it locally; among cloud providers most do, a few don’t.)
  • Low-noise environment — every model degrades with background noise. See Best Practices.
  • Natural pace — speaking either too fast or too slow hurts accuracy.

Go deeper

Providers

HyperWhisper Cloud tiers, per-minute pricing, cost examples, and accuracy by language.

API Keys

Set up bring-your-own-key access for any supported provider.