Local LLM model page

Qwen 3.5 MoE (397B/17B active)

Flagship open-source model of the Qwen 3.5 family. Only 17B parameters are active per token despite 397B total, giving frontier quality at the inference cost of a much smaller dense model. Matches GPT-4o on major benchmarks. Requires multi-GPU or server-grade hardware. Apache 2.0 licensed.

Parameters
397B (17B active)
Minimum RAM
256 GB
Model size
200 GB
Quantization
Q4_K_M

Can Qwen 3.5 MoE (397B/17B active) run locally?

Qwen 3.5 MoE (397B/17B active) is best suited for server-grade or multi-GPU systems. LocalClaw recommends Q4_K_M as the default quantization, with at least 256 GB RAM.
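The RAM figure tracks the quantized weight size. As a rough rule of thumb, a GGUF file needs about bits-per-weight / 8 bytes per parameter. A minimal sketch; the 4.0 and ~4.85 bits-per-weight figures are typical values for 4-bit quants, not numbers taken from this card:

```python
def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough quantized model size in decimal GB: params * bits / 8 bytes."""
    return n_params * bits_per_weight / 8 / 1e9

# Plain 4-bit lands near the 200 GB listed above; Q4_K_M mixes
# precisions and averages somewhat higher (~4.85 bits per weight).
print(round(gguf_size_gb(397e9, 4.0)))   # ≈ 198 GB
print(round(gguf_size_gb(397e9, 4.85)))  # ≈ 241 GB
```

Actual file sizes vary with the exact tensor-by-tensor quant mix, so treat these as lower-bound estimates before adding KV-cache and runtime overhead.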

Search term for LM Studio or compatible runtimes: qwen3.5-397b-a17b

Hugging Face repository: Qwen/Qwen3.5-397B-A17B

Tags: chat · code · reasoning · quality

Strengths

  • 🏆 Flagship of the open-source Qwen 3.5 family, its highest-quality model
  • Only 17B params active per token despite 397B total, so per-token compute matches a much smaller dense model
  • Matches GPT-4o on major benchmarks (MMLU, HumanEval, MATH)
  • 256K context window
  • Hybrid thinking: toggle deep reasoning on/off per request
  • Apache 2.0 — fully open-source and commercial
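The 17B-active figure comes from top-k expert routing: a gate scores every expert for each token, but only the best-scoring few actually run. A toy sketch of that gating step in pure Python (the expert count and k in the example are illustrative, not this model's real configuration):

```python
import math

def route_top_k(gate_scores: list[float], k: int = 2) -> dict[int, float]:
    """Softmax over expert gate scores, then keep only the top-k experts.

    Returns {expert_index: weight}, renormalized to sum to 1. All other
    experts are skipped entirely, which is why per-token compute scales
    with the active parameters rather than the total.
    """
    m = max(gate_scores)
    exps = [math.exp(s - m) for s in gate_scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    z = sum(probs[i] for i in top)
    return {i: probs[i] / z for i in top}

# 8 experts scored for one token; only 2 are dispatched.
weights = route_top_k([0.1, 2.3, -1.0, 0.7, 1.9, -0.2, 0.0, 0.4], k=2)
print(weights)  # two expert indices with weights summing to 1
```

Note that the full 397B of weights must still be resident in memory, since any expert can be selected on any token; routing saves compute, not RAM.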

Limitations

  • Requires ~256 GB RAM (multi-GPU server or top-spec Apple silicon)
  • Files are ~200 GB+ even when heavily quantized
  • Not suitable for consumer hardware
  • Practical only on multi-GPU rigs, or with fast NVMe (PCIe 4.0) storage for weight offloading

Best use cases

  • Server-grade AI deployment (API serving)
  • Maximum quality research tasks
  • Open-source frontier AI applications
  • Complex long-context analysis
  • Replacing GPT-4o/Claude on local infrastructure
  • Enterprise AI on-premise

Benchmarks

Speed: 2/10

Quality: 10/10

Coding: 10/10

Reasoning: 10/10

Technical details

Developer: Alibaba Cloud (Qwen Team)

License: Apache 2.0

Context window: 262,144 tokens
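At 262,144 tokens the KV cache adds substantial memory on top of the weights. A generic sizing formula; the layer and head counts in the example are placeholder assumptions, since this card does not publish the model's attention configuration:

```python
def kv_cache_gb(seq_len: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in decimal GB: two tensors (K and V) per layer,
    each holding seq_len * n_kv_heads * head_dim elements."""
    return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_elem / 1e9

# Hypothetical dims (NOT from this card): 60 layers, 8 KV heads,
# head dim 128, fp16 cache, at the full 256K context.
print(kv_cache_gb(262_144, 60, 8, 128, 2))  # ≈ 64 GB with these placeholders
```

Grouped-query attention and a quantized (8-bit or 4-bit) KV cache shrink this considerably, but budgeting tens of extra GB for long-context runs is prudent.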

Architecture: Mixture-of-Experts with 397B total parameters and only 17B active per token; among the largest open-source MoE models released to date.

Released: 2025-08