Local LLM model page
Qwen 3.5 MoE (122B/10B active)
Large MoE model with only 10B active params. 60% cheaper to run than Qwen3-Max. 256K context. Top-tier reasoning, coding, and multilingual quality. Hybrid thinking/non-thinking modes. Apache 2.0.
Parameters
122B (10B active)
Minimum RAM
80 GB
Model size
65 GB
Quantization
Q4_K_M
Can Qwen 3.5 MoE (122B/10B active) run locally?
Qwen 3.5 MoE (122B/10B active) is best suited for large-memory workstations. LocalClaw recommends Q4_K_M as the default quantization, with at least 80 GB RAM.
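As a rough sanity check on the 65 GB file size and 80 GB RAM figures, both can be estimated from the parameter count. In the sketch below, the ~4.5 bits-per-weight average for Q4_K_M and the KV-cache/runtime margin are assumptions for illustration, not measured values.

```python
# Back-of-the-envelope sizing for a 122B-parameter model at Q4_K_M.
# Assumptions: ~4.5 bits per weight on average for Q4_K_M (typical, not exact),
# plus an illustrative allowance for KV cache, activations, and runtime overhead.

params = 122e9
bits_per_weight = 4.5          # assumed Q4_K_M average
overhead_gb = 12               # illustrative KV cache + runtime margin

file_size_gb = params * bits_per_weight / 8 / 1e9
ram_floor_gb = file_size_gb + overhead_gb

print(f"Estimated GGUF size: ~{file_size_gb:.0f} GB")   # ~69 GB, close to the listed 65 GB
print(f"Suggested RAM floor: ~{ram_floor_gb:.0f} GB")   # ~81 GB, in line with the 80 GB guidance
```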
Search term for LM Studio or compatible runtimes: qwen3.5-122b-a10b
Hugging Face repository: lmstudio-community/Qwen3.5-122B-A10B-GGUF
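A minimal sketch for pulling only the Q4_K_M files from the repository above with the huggingface_hub client. The `*Q4_K_M*` glob is an assumption about how the GGUF files are named; check the repo's file listing if it matches nothing.

```python
from huggingface_hub import snapshot_download

# Download only the Q4_K_M GGUF files (~65 GB) from the LM Studio community repo.
# The allow_patterns glob assumes the files carry "Q4_K_M" in their names.
local_dir = snapshot_download(
    repo_id="lmstudio-community/Qwen3.5-122B-A10B-GGUF",
    allow_patterns=["*Q4_K_M*"],
)
print("Model files downloaded to:", local_dir)
```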
Tags: chat, code, reasoning, quality, power
Strengths
- 122B total params with only 10B active — 60% cheaper to run than Qwen3-Max
- 256K context window
- Top-tier reasoning, coding and multilingual quality
- Hybrid thinking mode (see the toggle sketch after this list)
- Strong code generation rivaling specialized code models
- Apache 2.0 fully commercial
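Qwen3 documents /think and /no_think soft switches for its hybrid thinking mode; the sketch below assumes Qwen 3.5 keeps that convention and that the model is loaded in a local OpenAI-compatible server such as LM Studio (default endpoint http://localhost:1234/v1). The model identifier is the search term listed above.

```python
import requests

# Toggle hybrid thinking per message via Qwen's documented soft switches.
# Assumptions: Qwen 3.5 keeps Qwen3's /think and /no_think behaviour, and the
# model is served through a local OpenAI-compatible endpoint (e.g. LM Studio).
URL = "http://localhost:1234/v1/chat/completions"

def ask(question: str, think: bool) -> str:
    switch = "/think" if think else "/no_think"
    resp = requests.post(URL, json={
        "model": "qwen3.5-122b-a10b",
        "messages": [{"role": "user", "content": f"{question} {switch}"}],
    })
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

print(ask("Summarise the CAP theorem in one sentence.", think=False))
print(ask("Prove that the square root of 2 is irrational.", think=True))
```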
Limitations
- Requires ~80 GB RAM (multi-GPU setups, or a Mac Studio/Mac Pro with an Ultra chip)
- MoE loading overhead: all 122B parameters must be resident in memory even though only 10B are active per token
- Files are 65 GB+ even when quantized
- Primarily for enthusiasts with serious hardware
Best use cases
- Maximum quality AI tasks on local hardware
- Complex multi-step reasoning chains
- Enterprise-grade code generation
- Large codebase analysis (256K context; see the sizing sketch after this list)
- Multilingual professional tasks
- Research requiring frontier-level quality
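Before sending a whole codebase to the model, it helps to check that it fits in the 262,144-token window. The sketch below uses a rough chars/4 heuristic, which is a common approximation rather than the model's actual tokenizer, and the ./my_project path and file extensions are placeholders.

```python
from pathlib import Path

# Rough check that a codebase fits in the 256K (262,144-token) context window.
# The chars/4 ratio is an approximation, not the model's exact tokenizer.
CONTEXT_LIMIT = 262_144

def estimate_tokens(root: str, exts=(".py", ".md", ".toml")) -> int:
    chars = sum(
        len(p.read_text(errors="ignore"))
        for p in Path(root).rglob("*")
        if p.suffix in exts and p.is_file()
    )
    return chars // 4

tokens = estimate_tokens("./my_project")
print(f"Estimated prompt size: ~{tokens:,} tokens "
      f"({tokens / CONTEXT_LIMIT:.0%} of the window)")
```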
Benchmarks
Speed: 4/10
Quality: 10/10
Coding: 9/10
Reasoning: 10/10
Technical details
Developer: Alibaba Cloud (Qwen Team)
License: Apache 2.0
Context window: 262,144 tokens
Architecture: Mixture of Experts (MoE) with 122B total parameters and 10B active per token. Large-scale sparse MoE with hybrid attention (see the routing sketch below).
Released: 2025-08
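A toy illustration of why only 10B of the 122B parameters are exercised per token: the router scores every expert, but only the top-k experts actually run. The expert count, top-k, and layer sizes below are small made-up numbers for readability, not Qwen 3.5's real configuration.

```python
import numpy as np

# Toy top-k MoE routing step. Sizes are illustrative, not Qwen 3.5's real config:
# the point is that only top_k of num_experts expert MLPs run for a given token.
hidden, num_experts, top_k = 64, 16, 2
rng = np.random.default_rng(0)

router_w = rng.standard_normal((hidden, num_experts))
experts = [rng.standard_normal((hidden, hidden)) for _ in range(num_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ router_w                       # score every expert
    top = np.argsort(logits)[-top_k:]           # keep only the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                    # softmax over the selected experts
    # Only the selected experts' weight matrices are touched for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(hidden)
out = moe_forward(token)
print(f"Ran {top_k} of {num_experts} experts; output shape {out.shape}")
```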