
Qwen 3.5 MoE (122B/10B active)

Large MoE model with only 10B active parameters, roughly 60% cheaper to run than Qwen3-Max. 256K context window. Top-tier reasoning, coding, and multilingual quality. Hybrid thinking/non-thinking modes. Apache 2.0 license.

Parameters: 122B (10B active)
Minimum RAM: 80 GB
Model size: 65 GB
Quantization: Q4_K_M

Can Qwen 3.5 MoE (122B/10B active) run locally?

Qwen 3.5 MoE (122B/10B active) is best suited for large-memory workstations. LocalClaw recommends Q4_K_M as the default quantization, with at least 80 GB RAM.
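
As a rough sanity check on those numbers, the file size follows from the parameter count and the effective bits per weight of the quantization. The ~4.5 bits/weight figure and the 20% runtime overhead used below are approximations, not exact properties of Q4_K_M:

    # Back-of-the-envelope memory estimate; bits/weight and overhead are rough assumptions.
    total_params = 122e9        # all 122B parameters are stored, even though only 10B are active
    bits_per_weight = 4.5       # approximate effective bits/weight for a Q4_K_M GGUF

    file_size_gb = total_params * bits_per_weight / 8 / 1e9
    print(f"estimated file size: {file_size_gb:.0f} GB")        # ~69 GB, near the listed 65 GB

    # Add headroom for the KV cache and runtime buffers (~20% is a common rule of thumb).
    print(f"estimated RAM needed: {file_size_gb * 1.2:.0f} GB")  # ~82 GB, near the 80 GB minimum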

Search term for LM Studio or compatible runtimes: qwen3.5-122b-a10b

Hugging Face repository: lmstudio-community/Qwen3.5-122B-A10B-GGUF
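
A minimal sketch of fetching the quantized weights and loading them with the llama-cpp-python bindings. The GGUF filename inside the repository is a guess for illustration; check the repository file list for the real name (files this large are often split into multiple parts):

    # Sketch: download the Q4_K_M GGUF and load it with llama-cpp-python.
    from huggingface_hub import hf_hub_download
    from llama_cpp import Llama

    model_path = hf_hub_download(
        repo_id="lmstudio-community/Qwen3.5-122B-A10B-GGUF",
        filename="Qwen3.5-122B-A10B-Q4_K_M.gguf",  # hypothetical filename
    )

    llm = Llama(
        model_path=model_path,
        n_ctx=32768,       # raise toward 262144 only if RAM allows the larger KV cache
        n_gpu_layers=-1,   # offload as many layers as fit onto the GPU(s)
    )

    reply = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain mixture-of-experts in two sentences."}]
    )
    print(reply["choices"][0]["message"]["content"])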

Tags: chat, code, reasoning, quality, power

Strengths

  • 122B total params with only 10B active — 60% cheaper to run than Qwen3-Max
  • 256K context window
  • Top-tier reasoning, coding and multilingual quality
  • Hybrid thinking mode (see the toggle sketch after this list)
  • Strong code generation rivaling specialized code models
  • Apache 2.0 license, fully usable commercially
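
A short sketch of switching the hybrid mode on and off, assuming Qwen 3.5 keeps the Qwen3 convention of an enable_thinking flag in the chat template; the Transformers repository id below is a placeholder for illustration:

    # Toggling thinking mode via the chat template (Qwen3-style convention assumed).
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-122B-A10B")  # placeholder repo id

    messages = [{"role": "user", "content": "Plan a migration from REST to gRPC."}]

    # Thinking on: the template emits a reasoning block before the final answer.
    prompt_think = tok.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
    )

    # Thinking off: direct answer, lower latency for simple queries.
    prompt_direct = tok.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
    )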

Limitations

  • Requires ~80 GB of RAM (multi-GPU setups or an Apple Silicon Ultra-class Mac Studio/Mac Pro)
  • MoE loading overhead: all 122B parameters must be resident even though only 10B are active per token
  • Model files are 65 GB+ even when quantized
  • Primarily for enthusiasts with serious hardware

Best use cases

  • Maximum quality AI tasks on local hardware
  • Complex multi-step reasoning chains
  • Enterprise-grade code generation
  • Large codebase analysis (256K context)
  • Multilingual professional tasks
  • Research requiring frontier-level quality

Benchmarks

Speed: 4/10

Quality: 10/10

Coding: 9/10

Reasoning: 10/10

Technical details

Developer: Alibaba Cloud (Qwen Team)

License: Apache 2.0

Context window: 262,144 tokens

Architecture: Mixture of Experts (MoE) — 122B total, 10B active per token. Large-scale sparse MoE with hybrid attention.
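
To make the 122B-total versus 10B-active distinction concrete, below is a generic top-k MoE routing sketch; the expert count, dimensions, and top-k value are illustrative placeholders, not the model's published configuration:

    # Generic top-k mixture-of-experts routing (illustrative sizes, not Qwen's real config).
    import numpy as np

    num_experts, top_k = 8, 2      # placeholders; the real model uses far more experts
    d_model, d_ff = 64, 256        # placeholder dimensions

    rng = np.random.default_rng(0)
    router = rng.standard_normal((d_model, num_experts))
    w_in = rng.standard_normal((num_experts, d_model, d_ff)) * 0.02
    w_out = rng.standard_normal((num_experts, d_ff, d_model)) * 0.02

    def moe_layer(x):
        """x: (d_model,) activation for one token. All experts are stored in memory,
        but only the top_k chosen by the router actually run, so only a small
        fraction of the layer's parameters are 'active' per token."""
        logits = x @ router
        chosen = np.argsort(logits)[-top_k:]              # indices of the top-k experts
        gate = np.exp(logits[chosen] - logits[chosen].max())
        gate /= gate.sum()                                # softmax over the chosen experts
        out = np.zeros(d_model)
        for g, e in zip(gate, chosen):
            out += g * (np.maximum(x @ w_in[e], 0.0) @ w_out[e])
        return out

    print(moe_layer(rng.standard_normal(d_model)).shape)  # (64,): only 2 of 8 experts ran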

Released: 2025-08