Local ASR model
Qwen3-ASR
Open-source ASR family with 0.6B and 1.7B models. Supports language identification and speech recognition for 52 languages and dialects, streaming/offline inference and long audio transcription.
GPU recommended
speech-to-text transcription
52 languages
Apache 2.0
Quality
9.5/10
Speed
9/10
Model size
3.4 GB
Voices
N/A (ASR: outputs text)
Can Qwen3-ASR run locally?
Qwen3-ASR can run locally for offline speech-to-text. Start with pip install -U qwen-asr.
Apache 2.0 license. Still verify upstream usage notes before shipping.
pip install -U qwen-asr
Upstream source
streaming, realtime, multilingual, low-latency
Audio profile
Best fit
Qwen3-ASR is best for offline transcription, speech indexing and local voice pipelines.
Hardware: gpu, apple
Model details
Type
Local ASR model
Family
qwen
Latency
low
Formats
pytorch, safetensors
Languages
zh, en, yue, ar, de, fr, es, pt, id, it, ko, ru, th, vi, ja, tr, hi, ms, nl, sv, da, fi, pl, cs, fil, fa, el, hu, mk, ro
Context
0.6B / 1.7B ASR models, 52 languages and dialects, streaming + offline inference
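As a rough sanity check on download size, the listed 3.4 GB is what the 1.7B model would occupy if each parameter were stored in 16 bits. The page does not state the weight precision, so the 2 bytes per parameter below is an assumption, and the parameter counts are the nominal 0.6B and 1.7B figures rather than exact totals. A minimal sketch of the arithmetic:

```python
def estimate_weight_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight size in decimal gigabytes.

    bytes_per_param=2 assumes 16-bit weights (an assumption, not a figure
    stated on this page); use 4 for 32-bit weights.
    """
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    # Nominal parameter counts taken from the model names.
    for name, params in [("0.6B", 0.6e9), ("1.7B", 1.7e9)]:
        size = estimate_weight_size_gb(params)
        print(f"{name}: ~{size:.1f} GB at 16-bit precision")
```

Runtime memory use will typically be higher than the weight size alone, since activations and audio buffers also need room.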
Install locally
01 Check runtime
Confirm the backend supports PyTorch and safetensors on your machine.
02 Install model
Use the upstream command or repository instructions.
03 Test locally
Run a short private audio clip before moving into production workflows.
pip install -U qwen-asr
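The "Check runtime" step can be approximated with a small script that reports whether the PyTorch and safetensors packages are visible to the current Python environment. This is only a sketch: check_runtime is a hypothetical helper name, and finding a package does not prove that a GPU or Apple backend is usable.

```python
import importlib.util


def check_runtime(modules=("torch", "safetensors")):
    """Return {module_name: bool} showing which top-level modules are installed.

    Uses find_spec so nothing heavy is imported just to run the check.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_runtime().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If either package is reported missing, installing the model package with the command above may pull it in as a dependency; rerun the check afterwards.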
Good for
- speech-to-text transcription
- GPU recommended local workflows
- streaming, realtime, multilingual
Watch before shipping
- Validate transcription accuracy, latency and error patterns with your own audio samples.
- Review the upstream license and acceptable-use notes.
- Benchmark on your target CPU, Apple Silicon or GPU setup.
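One common way to benchmark a transcription setup is the real-time factor: processing time divided by audio duration, where values below 1 mean faster than real time. The sketch below is model-agnostic. The transcribe argument is a placeholder for whatever callable wraps your local model, not a function from the qwen-asr package, and measure_rtf and wav_duration_seconds are hypothetical helper names.

```python
import time
import wave
from typing import Callable


def wav_duration_seconds(path: str) -> float:
    """Length of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def measure_rtf(
    transcribe: Callable[[str], str],
    audio_path: str,
    audio_seconds: float,
    runs: int = 3,
) -> float:
    """Best-of-N real-time factor for one audio file.

    transcribe is a placeholder: any callable that takes a file path and
    returns text. Taking the minimum over several runs reduces the effect
    of one-off costs such as model warm-up.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        transcribe(audio_path)
        timings.append(time.perf_counter() - start)
    return min(timings) / audio_seconds
```

Run it once per target machine (CPU, Apple Silicon, GPU) with the same audio file so the numbers are comparable.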
Related TTS and speech models
Alibaba Cloud (Qwen Team)
Qwen3 TTS
Local TTS model · Q 9.5 · Speed 8.5
Kyutai
Kyutai STT 2.6B
Local ASR model · Q 9.4 · Speed 9.5
OpenAI
Whisper v3 Turbo
Local ASR model · Q 9.1 · Speed 9.5
NVIDIA
Parakeet TDT 0.6B v2
Local ASR model · Q 9.4 · Speed 10
NVIDIA
Canary 1B v2
Local ASR model · Q 9.3 · Speed 9
IBM Granite Team
Granite Speech 4.1 2B
Local ASR model · Q 9.2 · Speed 8
Microsoft Research
VibeVoice ASR
Local ASR model · Q 9.3 · Speed 7.5
Cohere
Cohere Transcribe 03-2026
Local ASR model · Q 9 · Speed 8