Local ASR model
Granite Speech 4.1 2B
Compact Apache 2.0 speech-language model for multilingual ASR and bidirectional speech translation. Adds punctuation, truecasing, keyword biasing and Japanese ASR improvements.
GPU recommended
speech-to-text transcription
6 languages
Apache 2.0
Quality
9.2/10
Speed
8/10
Model size
4 GB
Voices
N/A (ASR/AST: outputs text or translation)
Can Granite Speech 4.1 2B run locally?
Granite Speech 4.1 2B can run locally for offline speech-to-text. Start with pip install transformers torchaudio soundfile.
Apache 2.0 license. Still verify upstream usage notes before shipping.
pip install transformers torchaudio soundfile
Upstream source
streaming, realtime, multilingual
Audio profile
Best fit
Granite Speech 4.1 2B is best for offline transcription, speech indexing and local voice pipelines.
Hardware: gpu, apple
Model details
Type
Local ASR model
Family
granite-speech
Latency
low
Formats
pytorch, safetensors
Languages
en, fr, de, es, pt, ja
Context
2B speech-language model for ASR, AST, punctuation, truecasing and keyword-biased recognition
Install locally
01
Check runtime
Confirm the backend supports pytorch, safetensors on your machine.
02
Install model
Use the upstream command or repository instructions.
03
Test locally
Run a short private audio prompt before moving into production workflows.
pip install transformers torchaudio soundfile
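The "Check runtime" step can be sketched as a small script that reports whether the required packages are importable and which accelerator is visible. This is a minimal sketch, not an upstream tool: `check_runtime` and `pick_device` are hypothetical helper names, and including `torch` and `safetensors` in the list is an assumption based on the formats listed for this model.

```python
import importlib.util

# Import names for the packages in the pip command, plus torch and safetensors.
# Assumption: these two back the "pytorch, safetensors" formats listed above.
REQUIRED = ["transformers", "torchaudio", "soundfile", "torch", "safetensors"]


def check_runtime(packages):
    """Return {package: bool} saying whether each package can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


def pick_device():
    """Return "cuda", "mps" or "cpu"; falls back to "cpu" if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    if torch.cuda.is_available():
        return "cuda"
    # Older torch builds have no MPS backend, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    for name, found in check_runtime(REQUIRED).items():
        print(f"{name}: {'ok' if found else 'missing'}")
    print("device:", pick_device())
```

Running it before downloading the weights gives a quick signal of whether the GPU or Apple Silicon path is available, or whether the machine would fall back to CPU.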
Good for
- speech-to-text transcription
- GPU recommended local workflows
- streaming, realtime, multilingual
Watch before shipping
- Validate transcription accuracy, latency and artifacts with your own audio samples.
- Review the upstream license and acceptable-use notes.
- Benchmark on your target CPU, Apple Silicon or GPU setup.
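Two of the checks above can be put into numbers: accuracy as word error rate (WER) against a transcript you trust, and speed as real-time factor (processing time divided by audio duration). The following is a minimal sketch under stated assumptions. `word_error_rate` and `real_time_factor` are hypothetical helper names, and `transcribe` stands in for whatever function wraps the model on your machine; nothing here is part of the upstream API.

```python
import time


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Both strings are lowercased first, so casing differences are ignored.
    Punctuation is left in place and counts as part of each word.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] is the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                cur[j - 1] + 1,      # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


def real_time_factor(transcribe, audio, audio_seconds, runs=3):
    """Best-of-`runs` wall-clock time for transcribe(audio), over audio length.

    Values below 1.0 mean the audio is processed faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        transcribe(audio)
        best = min(best, time.perf_counter() - start)
    return best / audio_seconds
```

For example, `word_error_rate("the cat sat down", "the bat sat down")` gives 0.25, one substitution over four reference words. Measuring both on the same clips across CPU, Apple Silicon and GPU makes the hardware comparison concrete.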
Related TTS and speech models
Kyutai
Kyutai STT 2.6B
Local ASR model · Q 9.4 · Speed 9.5
Alibaba Cloud (Qwen Team)
Qwen3-ASR
Local ASR model · Q 9.5 · Speed 9
OpenAI
Whisper v3 Turbo
Local ASR model · Q 9.1 · Speed 9.5
NVIDIA
Canary 1B v2
Local ASR model · Q 9.3 · Speed 9
Microsoft Research
VibeVoice ASR
Local ASR model · Q 9.3 · Speed 7.5
Cohere
Cohere Transcribe 03-2026
Local ASR model · Q 9 · Speed 8
NVIDIA
Parakeet TDT 0.6B v2
Local ASR model · Q 9.4 · Speed 10
hexgrad
Kokoro TTS
Local TTS model · Q 9.2 · Speed 9.8