Local LLM model page

Qwen 3.5 MoE (397B/17B active)

Flagship open-source model of the Qwen 3.5 family. Only 17B parameters are active per token despite 397B total, giving frontier quality at the inference cost of a much smaller dense model. Matches GPT-4o on major benchmarks. Requires multi-GPU or server-grade hardware. Apache 2.0 licensed.

Parameters
397B (17B active)
Minimum RAM
256 GB
Model size
200 GB
Quantization
Q4_K_M

Can Qwen 3.5 MoE (397B/17B active) run locally?

Qwen 3.5 MoE (397B/17B active) is best suited for server-grade or multi-GPU systems. LocalClaw recommends Q4_K_M as the default quantization, with at least 256 GB RAM.
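The RAM figure tracks the quantized weight size. As a rough rule of thumb, a GGUF file needs about bits-per-weight / 8 bytes per parameter. A minimal sketch; the 4.0 and ~4.85 bits-per-weight figures are typical values for 4-bit quants, not numbers taken from this card:

```python
def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough quantized model size in decimal GB: params * bits / 8 bytes."""
    return n_params * bits_per_weight / 8 / 1e9

# Plain 4-bit lands near the 200 GB listed above; Q4_K_M mixes
# precisions and averages somewhat higher (~4.85 bits per weight).
print(round(gguf_size_gb(397e9, 4.0)))   # ≈ 198 GB
print(round(gguf_size_gb(397e9, 4.85)))  # ≈ 241 GB
```

Actual file sizes vary with the exact tensor-by-tensor quant mix, so treat these as lower-bound estimates before adding KV-cache and runtime overhead.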

Search term for LM Studio or compatible runtimes: qwen3.5-397b-a17b

Hugging Face repository: Qwen/Qwen3.5-397B-A17B

Tags: chat · code · reasoning · quality

Strengths

  • 🏆 Flagship of the open-source Qwen 3.5 family, its highest-quality model
  • Only 17B params active per token despite 397B total, so per-token compute matches a much smaller dense model
  • Matches GPT-4o on major benchmarks (MMLU, HumanEval, MATH)
  • 256K context window
  • Hybrid thinking: toggle deep reasoning on/off per request
  • Apache 2.0 — fully open-source and commercial
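The 17B-active figure comes from top-k expert routing: a gate scores every expert for each token, but only the best-scoring few actually run. A toy sketch of that gating step in pure Python (the expert count and k in the example are illustrative, not this model's real configuration):

```python
import math

def route_top_k(gate_scores: list[float], k: int = 2) -> dict[int, float]:
    """Softmax over expert gate scores, then keep only the top-k experts.

    Returns {expert_index: weight}, renormalized to sum to 1. All other
    experts are skipped entirely, which is why per-token compute scales
    with the active parameters rather than the total.
    """
    m = max(gate_scores)
    exps = [math.exp(s - m) for s in gate_scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    z = sum(probs[i] for i in top)
    return {i: probs[i] / z for i in top}

# 8 experts scored for one token; only 2 are dispatched.
weights = route_top_k([0.1, 2.3, -1.0, 0.7, 1.9, -0.2, 0.0, 0.4], k=2)
print(weights)  # two expert indices with weights summing to 1
```

Note that the full 397B of weights must still be resident in memory, since any expert can be selected on any token; routing saves compute, not RAM.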

Limitations

  • Requires ~256 GB RAM (multi-GPU server or top-spec Apple silicon)
  • Files are ~200 GB+ even when heavily quantized
  • Not suitable for consumer hardware
  • Practical only on multi-GPU rigs, or with fast NVMe (PCIe 4.0) storage for weight offloading

Best use cases

  • Server-grade AI deployment (API serving)
  • Maximum quality research tasks
  • Open-source frontier AI applications
  • Complex long-context analysis
  • Replacing GPT-4o/Claude on local infrastructure
  • Enterprise AI on-premise

Benchmarks

Speed: 2/10

Quality: 10/10

Coding: 10/10

Reasoning: 10/10

Technical details

Developer: Alibaba Cloud (Qwen Team)

License: Apache 2.0

Context window: 262,144 tokens
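At 262,144 tokens the KV cache adds substantial memory on top of the weights. A generic sizing formula; the layer and head counts in the example are placeholder assumptions, since this card does not publish the model's attention configuration:

```python
def kv_cache_gb(seq_len: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in decimal GB: two tensors (K and V) per layer,
    each holding seq_len * n_kv_heads * head_dim elements."""
    return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_elem / 1e9

# Hypothetical dims (NOT from this card): 60 layers, 8 KV heads,
# head dim 128, fp16 cache, at the full 256K context.
print(kv_cache_gb(262_144, 60, 8, 128, 2))  # ≈ 64 GB with these placeholders
```

Grouped-query attention and a quantized (8-bit or 4-bit) KV cache shrink this considerably, but budgeting tens of extra GB for long-context runs is prudent.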

Architecture: Mixture-of-Experts with 397B total parameters and only 17B active per token; among the largest open-source MoE models released to date.

Released: 2025-08