GLM 4.5 Air (MoE)
Zhipu AI's efficient Mixture-of-Experts model: 106B total parameters with only 14B active per token, giving near-14B-dense inference speed with quality closer to much larger dense models. Among the strongest options in the 16–24 GB RAM range, and outperforms Llama 3.3 70B. Apache 2.0 licensed.
Parameters
106B (14B active, MoE)
Minimum RAM
16 GB
Model size
9 GB (active weights at Q4_K_M; the full 106B checkpoint on disk is substantially larger)
Quantization
Q4_K_M
Can GLM 4.5 Air (MoE) run locally?
GLM 4.5 Air (MoE) is best suited for mainstream Macs and PCs with 16 GB RAM. LocalClaw recommends Q4_K_M as the default quantization, with at least 16 GB RAM.
Search term for LM Studio or compatible runtimes: glm-4.5-air
Hugging Face repository: THUDM/GLM-4.5-Air-GGUF
Tags: chat, code, power, quality, general
Strengths
- MoE efficiency: only 14B of 106B parameters are active at inference, so it decodes at close to dense-14B speed
- Quality competitive with much larger dense models; outperforms Llama 3.3 70B
- Strong coding and reasoning performance
- Permissive Apache 2.0 license
Limitations
- Performance depends heavily on quantization, RAM bandwidth and runtime support.
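The bandwidth point above can be made concrete: for an MoE model, generation speed is roughly capped by memory bandwidth divided by the bytes streamed per token, i.e. the active parameters at the chosen quantization. A minimal sketch, assuming ~4.85 bits/weight for Q4_K_M (an approximation) and purely illustrative bandwidth figures:

```python
def tokens_per_second(bandwidth_gb_s: float,
                      active_params_b: float = 14.0,
                      bits_per_weight: float = 4.85) -> float:
    """Rough decode-speed ceiling: bandwidth / bytes streamed per token.

    Assumes every active weight is read once per generated token and
    ignores KV-cache traffic and compute, so real throughput is lower.
    """
    bytes_per_token_gb = active_params_b * bits_per_weight / 8
    return bandwidth_gb_s / bytes_per_token_gb

# Illustrative bandwidth figures (assumptions, not measurements):
for name, bw in [("DDR5 dual-channel", 80), ("Apple M-series unified", 400)]:
    print(f"{name}: ~{tokens_per_second(bw):.1f} tok/s ceiling")
```

Doubling the active-parameter count would halve this ceiling, which is why a 106B MoE with 14B active parameters can decode far faster than a 70B dense model on the same hardware.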
Best use cases
- chat
- code
- power
- quality
- general
Benchmarks
Speed: 7/10
Quality: 9/10
Coding: 9/10
Reasoning: 9/10
Technical details
Developer: Zhipu AI
License: Apache 2.0
Context window: 128K tokens
Architecture: Mixture-of-Experts (MoE) transformer
Released: 2025-07