Qwen 3.5 MoE (397B/17B active)
Flagship open-source model of the Qwen 3.5 family. Only 17B of its 397B parameters are active per token, delivering world-class quality at the inference cost of a far smaller dense model. Matches GPT-4o on major benchmarks. Requires multi-GPU or server-grade hardware. Apache 2.0.
Can Qwen 3.5 MoE (397B/17B active) run locally?
Qwen 3.5 MoE (397B/17B active) is best suited for server-grade or multi-GPU systems. LocalClaw recommends Q4_K_M as the default quantization, with at least 256 GB RAM.
Search term for LM Studio or compatible runtimes: qwen3.5-397b-a17b
Hugging Face repository: Qwen/Qwen3.5-397B-A17B
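A minimal loading sketch using llama-cpp-python and huggingface_hub, assuming a GGUF build of the repository above exists. The GGUF filename below is hypothetical (large builds are usually split into shards), so check the repository for the real file names:

```python
# Sketch: download and load a Q4_K_M GGUF build with llama-cpp-python.
# The repo_id comes from this page; the GGUF filename is hypothetical,
# and a file this large is normally split into shards.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="Qwen/Qwen3.5-397B-A17B",          # repository listed above
    filename="qwen3.5-397b-a17b-Q4_K_M.gguf",  # hypothetical filename
)

llm = Llama(
    model_path=model_path,
    n_ctx=32768,      # raise toward 262144 only if RAM allows
    n_gpu_layers=-1,  # offload all layers; lower this on smaller GPUs
)

out = llm("Summarize mixture-of-experts routing.", max_tokens=200)
print(out["choices"][0]["text"])
```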
Strengths
- 🏆 Flagship open-source Qwen 3.5 model: the highest-quality release in the family
- Only 17B active params despite 397B total, so per-token compute scales like a 17B model
- Matches GPT-4o on major benchmarks (MMLU, HumanEval, MATH)
- 256K context window
- Hybrid thinking: toggle deep reasoning on/off per request (see the sketch after this list)
- Apache 2.0: fully open source, including commercial use
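A sketch of the hybrid-thinking toggle, assuming Qwen 3.5 keeps the enable_thinking flag that Qwen3's chat template exposes through Transformers; confirm against the model card before relying on it:

```python
# Sketch: per-request thinking toggle via the chat template.
# enable_thinking is Qwen3's documented flag; assuming Qwen 3.5
# keeps it is our guess here, so verify against the model card.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-397B-A17B")
messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]

# Deep reasoning on: the model emits a <think>...</think> trace first.
prompt_thinking = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True,
    enable_thinking=True,
)

# Deep reasoning off: faster, direct answers for routine requests.
prompt_direct = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True,
    enable_thinking=False,
)
```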
Limitations
- Requires ~256 GB of RAM (multi-GPU server or a Mac with an Ultra-class chip)
- Model files are 200 GB+ even heavily quantized (see the estimate after this list)
- Not suitable for consumer hardware
- Practical only on multi-GPU rigs or systems streaming weights from fast PCIe 4.0 NVMe storage
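A back-of-envelope check on the file-size figure, assuming Q4_K_M averages roughly 4.85 bits per weight (an assumption; the true average varies by quant recipe):

```python
# Sketch: rough on-disk size of a Q4_K_M quantization.
# 4.85 bits/weight is a typical Q4_K_M average (an assumption here);
# KV cache and runtime overhead are not included.
total_params = 397e9
bits_per_weight = 4.85

size_gib = total_params * bits_per_weight / 8 / 2**30
print(f"~{size_gib:.0f} GiB on disk")  # ~224 GiB, matching the 200 GB+ figure
```

KV cache, activations, and runtime overhead sit on top of the weights, which is why 256 GB of RAM is a floor rather than a comfortable target.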
Best use cases
- Server-grade AI deployment (API serving)
- Maximum quality research tasks
- Open-source frontier AI applications
- Complex long-context analysis
- Replacing GPT-4o/Claude on local infrastructure
- Enterprise AI on-premise
Benchmarks
Speed: 2/10
Quality: 10/10
Coding: 10/10
Reasoning: 10/10
Technical details
Developer: Alibaba Cloud (Qwen Team)
License: Apache 2.0
Context window: 262,144 tokens
Architecture: Flagship MoE with 397B total parameters and only 17B active per token; among the largest open-source MoE models released (see the routing sketch below).
Released: 2025-08
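For intuition on the 397B-total/17B-active split, here is a generic top-k expert-routing sketch. The expert count and k are illustrative placeholders, not Qwen 3.5's actual configuration:

```python
# Sketch: generic top-k MoE routing, for intuition only.
# n_experts and k are illustrative placeholders, not Qwen 3.5's
# real configuration; the point is that each token runs k experts.
import torch

n_experts, k, d = 128, 8, 4096
gate = torch.nn.Linear(d, n_experts, bias=False)
experts = torch.nn.ModuleList(torch.nn.Linear(d, d) for _ in range(n_experts))

def moe_forward(x: torch.Tensor) -> torch.Tensor:
    scores = gate(x)                                  # (tokens, n_experts)
    weights, idx = torch.topk(scores.softmax(-1), k)  # top-k per token
    out = torch.zeros_like(x)
    for t in range(x.shape[0]):
        for w, e in zip(weights[t], idx[t]):
            out[t] += w * experts[int(e)](x[t])       # only k experts fire
    return out

print(moe_forward(torch.randn(2, d)).shape)  # torch.Size([2, 4096])
```

All 397B parameters must stay resident in memory even though each token exercises only the 17B active slice, which is why the RAM requirement tracks the total parameter count while per-token compute does not.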