Local ASR model

Qwen3-ASR

Open-source ASR family with 0.6B and 1.7B models. Supports language identification and speech recognition for 52 languages and dialects, streaming/offline inference and long audio transcription.

GPU recommended, speech-to-text transcription, 52 languages, Apache 2.0
Quality: 9.5/10
Speed: 9/10
Model size: 3.4 GB
Voices: N/A (ASR outputs text)
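
The 3.4 GB figure is consistent with the 1.7B model stored at 16-bit precision. That is an assumption here, since the page does not state which model or precision the size refers to. A rough sketch of the arithmetic, with an illustrative function name and 1 GB taken as 10^9 bytes:

```python
def weight_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the model weights in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: weights are stored as 16-bit floats (2 bytes per parameter).
for name, params in [("0.6B", 0.6e9), ("1.7B", 1.7e9)]:
    print(f"{name}: ~{weight_size_gb(params):.1f} GB")
```

Under that assumption the 1.7B model comes to about 3.4 GB and the 0.6B model to about 1.2 GB. Actual memory use at inference time will be higher, because activations and runtime buffers are not counted.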

Can Qwen3-ASR run locally?

Qwen3-ASR can run locally for offline speech-to-text. Start with pip install -U qwen-asr.

Released under the Apache 2.0 license. Verify the upstream usage notes before shipping.

streaming, realtime, multilingual, low-latency

Audio profile

Quality: 9.5
Speed: 9
Local: 9.1

Best fit

Qwen3-ASR is best for offline transcription, speech indexing and local voice pipelines.

Hardware: GPU, Apple Silicon

Model details

Type: Local ASR model
Family: qwen
Latency: low
Formats: pytorch, safetensors
Languages: zh, en, yue, ar, de, fr, es, pt, id, it, ko, ru, th, vi, ja, tr, hi, ms, nl, sv, da, fi, pl, cs, fil, fa, el, hu, mk, ro
Context: 0.6B / 1.7B ASR models, 52 languages and dialects, streaming + offline inference

Install locally

01 Check runtime: Confirm the backend supports pytorch and safetensors on your machine.
02 Install model: Use the upstream command or repository instructions.
03 Test locally: Run a short private audio sample before moving into production workflows.

pip install -U qwen-asr
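
The runtime check in step 01 can be sketched with the standard library plus PyTorch's own device queries. This is a minimal sketch and not part of the qwen-asr package: the helper names `check_runtime` and `pick_device` are hypothetical, and the import names `torch` and `safetensors` are assumed to correspond to the formats listed above.

```python
import importlib.util


def check_runtime(packages=("torch", "safetensors")):
    """Return {package_name: bool} saying whether each package is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


def pick_device():
    """Return 'cuda', 'mps' or 'cpu' based on PyTorch, or None if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if torch.cuda.is_available():
        return "cuda"
    # The MPS backend (Apple Silicon) is absent in older PyTorch builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(check_runtime())
    print("device:", pick_device())
```

A result of `cpu` does not necessarily rule out local use, but given the GPU recommendation it is worth measuring speed before relying on it.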

Good for

  • speech-to-text transcription
  • local workflows, ideally on a GPU
  • streaming, real-time and multilingual use

Watch before shipping

  • Validate transcription accuracy, latency and recognition errors with your own audio samples.
  • Review the upstream license and acceptable-use notes.
  • Benchmark on your target CPU, Apple Silicon or GPU setup.
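
The accuracy and benchmarking checks above can be sketched with two common measures: word error rate (WER) against a reference transcript, and real-time factor (processing time divided by audio duration). The code is independent of any ASR library. The `transcribe` argument is a hypothetical callable standing in for whatever function wraps the model in your setup.

```python
import time


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(transcribe, audio_path: str, audio_seconds: float) -> float:
    """Wall-clock transcription time divided by audio duration."""
    start = time.perf_counter()
    transcribe(audio_path)  # hypothetical callable wrapping the model
    return (time.perf_counter() - start) / audio_seconds


# One substituted word out of four reference words.
print(word_error_rate("turn the lights off", "turn the light off"))  # 0.25
```

A real-time factor below 1 means transcription runs faster than the audio plays. Whitespace-based WER does not suit languages written without spaces between words, such as zh or ja, where a character-level error rate is the usual substitute.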
