
Qwen 3.5 MoE (122B/10B active)

Large MoE model with only 10B active parameters, roughly 60% cheaper to run than Qwen3-Max. 256K context window. Top-tier reasoning, coding, and multilingual quality. Hybrid thinking/non-thinking modes. Apache 2.0 license.

Parameters: 122B (10B active)
Minimum RAM: 80 GB
Model size: 65 GB
Quantization: Q4_K_M

Can Qwen 3.5 MoE (122B/10B active) run locally?

Qwen 3.5 MoE (122B/10B active) is best suited for large-memory workstations. LocalClaw recommends Q4_K_M as the default quantization, with at least 80 GB RAM.
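
As a rough sanity check on those numbers, the file size follows from the parameter count and the effective bits per weight of the quantization. The ~4.5 bits/weight figure and the 20% runtime overhead used below are approximations, not exact properties of Q4_K_M:

    # Back-of-the-envelope memory estimate; bits/weight and overhead are rough assumptions.
    total_params = 122e9        # all 122B parameters are stored, even though only 10B are active
    bits_per_weight = 4.5       # approximate effective bits/weight for a Q4_K_M GGUF

    file_size_gb = total_params * bits_per_weight / 8 / 1e9
    print(f"estimated file size: {file_size_gb:.0f} GB")        # ~69 GB, near the listed 65 GB

    # Add headroom for the KV cache and runtime buffers (~20% is a common rule of thumb).
    print(f"estimated RAM needed: {file_size_gb * 1.2:.0f} GB")  # ~82 GB, near the 80 GB minimum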

Search term for LM Studio or compatible runtimes: qwen3.5-122b-a10b

Hugging Face repository: lmstudio-community/Qwen3.5-122B-A10B-GGUF
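
A minimal sketch of fetching the quantized weights and loading them with the llama-cpp-python bindings. The GGUF filename inside the repository is a guess for illustration; check the repository file list for the real name (files this large are often split into multiple parts):

    # Sketch: download the Q4_K_M GGUF and load it with llama-cpp-python.
    from huggingface_hub import hf_hub_download
    from llama_cpp import Llama

    model_path = hf_hub_download(
        repo_id="lmstudio-community/Qwen3.5-122B-A10B-GGUF",
        filename="Qwen3.5-122B-A10B-Q4_K_M.gguf",  # hypothetical filename
    )

    llm = Llama(
        model_path=model_path,
        n_ctx=32768,       # raise toward 262144 only if RAM allows the larger KV cache
        n_gpu_layers=-1,   # offload as many layers as fit onto the GPU(s)
    )

    reply = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain mixture-of-experts in two sentences."}]
    )
    print(reply["choices"][0]["message"]["content"])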

Tags: chat, code, reasoning, quality, power

Strengths

  • 122B total params with only 10B active — 60% cheaper to run than Qwen3-Max
  • 256K context window
  • Top-tier reasoning, coding and multilingual quality
  • Hybrid thinking mode (see the toggle sketch after this list)
  • Strong code generation rivaling specialized code models
  • Apache 2.0 license, fully usable commercially
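
A short sketch of switching the hybrid mode on and off, assuming Qwen 3.5 keeps the Qwen3 convention of an enable_thinking flag in the chat template; the Transformers repository id below is a placeholder for illustration:

    # Toggling thinking mode via the chat template (Qwen3-style convention assumed).
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-122B-A10B")  # placeholder repo id

    messages = [{"role": "user", "content": "Plan a migration from REST to gRPC."}]

    # Thinking on: the template emits a reasoning block before the final answer.
    prompt_think = tok.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
    )

    # Thinking off: direct answer, lower latency for simple queries.
    prompt_direct = tok.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
    )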

Limitations

  • Requires ~80 GB of RAM (multi-GPU setups or an Apple Silicon Ultra-class Mac Studio/Mac Pro)
  • MoE loading overhead: all 122B parameters must be resident even though only 10B are active per token
  • Model files are 65 GB+ even when quantized
  • Primarily for enthusiasts with serious hardware

Best use cases

  • Maximum quality AI tasks on local hardware
  • Complex multi-step reasoning chains
  • Enterprise-grade code generation
  • Large codebase analysis (256K context)
  • Multilingual professional tasks
  • Research requiring frontier-level quality

Benchmarks

Speed: 4/10

Quality: 10/10

Coding: 9/10

Reasoning: 10/10

Technical details

Developer: Alibaba Cloud (Qwen Team)

License: Apache 2.0

Context window: 262,144 tokens

Architecture: Mixture of Experts (MoE) — 122B total, 10B active per token. Large-scale sparse MoE with hybrid attention.
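
To make the 122B-total versus 10B-active distinction concrete, below is a generic top-k MoE routing sketch; the expert count, dimensions, and top-k value are illustrative placeholders, not the model's published configuration:

    # Generic top-k mixture-of-experts routing (illustrative sizes, not Qwen's real config).
    import numpy as np

    num_experts, top_k = 8, 2      # placeholders; the real model uses far more experts
    d_model, d_ff = 64, 256        # placeholder dimensions

    rng = np.random.default_rng(0)
    router = rng.standard_normal((d_model, num_experts))
    w_in = rng.standard_normal((num_experts, d_model, d_ff)) * 0.02
    w_out = rng.standard_normal((num_experts, d_ff, d_model)) * 0.02

    def moe_layer(x):
        """x: (d_model,) activation for one token. All experts are stored in memory,
        but only the top_k chosen by the router actually run, so only a small
        fraction of the layer's parameters are 'active' per token."""
        logits = x @ router
        chosen = np.argsort(logits)[-top_k:]              # indices of the top-k experts
        gate = np.exp(logits[chosen] - logits[chosen].max())
        gate /= gate.sum()                                # softmax over the chosen experts
        out = np.zeros(d_model)
        for g, e in zip(gate, chosen):
            out += g * (np.maximum(x @ w_in[e], 0.0) @ w_out[e])
        return out

    print(moe_layer(rng.standard_normal(d_model)).shape)  # (64,): only 2 of 8 experts ran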

Released: 2025-08