
Qwen 3 MoE (30B/3B active)

An efficient MoE model with only 3B parameters active per token, delivering near-large-model quality at small-model speed. Supports a hybrid thinking mode. Apache 2.0 licensed.

Parameters: 30B (3B active)

Minimum RAM: 24 GB

Model size: 18 GB

Quantization: Q4_K_M
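The listed model size follows directly from the parameter count and the quantization level. A minimal sketch of the arithmetic, assuming an effective ~4.85 bits per weight for Q4_K_M (an approximation, not a figure from this page):

```python
def gguf_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough GGUF file size: total parameters times effective bits per weight."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# Q4_K_M averages roughly 4.85 bits per weight across tensors (assumption).
size = gguf_size_gb(30, 4.85)  # ~18 GB, consistent with the listed model size
# RAM needed ~= model size + KV cache + runtime overhead, hence the 24 GB floor.
```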

Can Qwen 3 MoE (30B/3B active) run locally?

Qwen 3 MoE (30B/3B active) is best suited for power-user machines with 32 GB RAM. LocalClaw recommends Q4_K_M as the default quantization, with at least 24 GB RAM.

Search term for LM Studio or compatible runtimes: qwen3-30b-a3b

Hugging Face repository: lmstudio-community/Qwen3-30B-A3B-GGUF
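Once the model is loaded in LM Studio, its local server exposes an OpenAI-compatible endpoint (by default at http://localhost:1234/v1). A minimal sketch of building a chat request against it; the model identifier matches the search term above:

```python
import json

def chat_request(prompt: str, model: str = "qwen3-30b-a3b") -> dict:
    """Build an OpenAI-style chat payload for a local LM Studio server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

payload = json.dumps(chat_request("Explain MoE routing in one sentence."))
# POST this to http://localhost:1234/v1/chat/completions while the server runs.
```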

Tags: chat, code, reasoning, power, speed

Strengths

  • Only 3B active params — blazing fast
  • MoE efficiency at 30B quality level
  • Hybrid thinking mode
  • Apache 2.0
  • Great balance of speed and quality
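Qwen 3's hybrid thinking mode can reportedly be toggled per turn with soft switches appended to the user message (a documented Qwen3 convention; confirm your runtime honors it). A minimal sketch:

```python
def with_thinking(prompt: str, think: bool) -> str:
    """Append Qwen3's soft switch to enable or disable thinking for this turn."""
    return f"{prompt} {'/think' if think else '/no_think'}"

with_thinking("Summarize this file.", think=False)
# -> 'Summarize this file. /no_think'
```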

Limitations

  • MoE architecture needs more disk space: all 30B weights are stored even though only 3B are active per token
  • Not as strong as a dense 32B model on pure output quality
  • Still needs 24 GB RAM despite its efficient inference

Best use cases

  • Fast local inference
  • Real-time chat applications
  • Code completion
  • Speed-sensitive deployments
  • Multi-user serving

Benchmarks

Speed: 8/10

Quality: 8/10

Coding: 8/10

Reasoning: 8/10

Technical details

Developer: Alibaba Cloud (Qwen Team)

License: Apache 2.0

Context window: 131,072 tokens

Architecture: Mixture of Experts (MoE) — 30B total, only 3B active per token

Released: 2025-04
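The "30B total, 3B active" figure comes from top-k expert routing: a gating network scores all experts per token, and only the highest-scoring few run. A minimal sketch of that selection step (the 128-expert / 8-active numbers are illustrative assumptions, not confirmed config values from this page):

```python
import math
import random

def top_k_route(logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in idx]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(idx, exps)]

# With e.g. 128 experts and 8 active per token, only a small fraction of the
# expert weights participates in any forward pass -- hence 3B of 30B active.
random.seed(0)
gate_logits = [random.gauss(0, 1) for _ in range(128)]
active_experts = top_k_route(gate_logits, k=8)
```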