Local LLM model page

GLM 4.5 Air (MoE)

Zhipu AI's efficient MoE powerhouse: 106B total parameters with only 14B active per token, giving dense-model-class speed with the quality of a much larger model. One of the strongest options in the 16–24 GB RAM range, and reported to outperform Llama 3.3 70B. Apache 2.0 licensed.

Parameters
106B (14B active, MoE)
Minimum RAM
16 GB
Model size
9 GB
Quantization
Q4_K_M

Can GLM 4.5 Air (MoE) run locally?

GLM 4.5 Air (MoE) is best suited for mainstream Macs and PCs with 16 GB RAM. LocalClaw recommends Q4_K_M as the default quantization, with at least 16 GB RAM.

Search term for LM Studio or compatible runtimes: glm-4.5-air
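Once the model is loaded in LM Studio (or another runtime that exposes an OpenAI-compatible server, typically at http://localhost:1234/v1 by default), it can be queried over plain HTTP. A minimal sketch using only the standard library; the endpoint URL and the `glm-4.5-air` model identifier are assumptions based on LM Studio defaults and the search term above — adjust them for your runtime:

```python
import json
import urllib.request

# Assumed LM Studio default endpoint; change host/port for other runtimes.
BASE_URL = "http://localhost:1234/v1"

def build_chat_request(prompt: str, model: str = "glm-4.5-air") -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 256,
    }

def chat(prompt: str) -> str:
    """POST the prompt to the local server and return the reply text."""
    payload = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Requires a running local server with the model loaded.
    print(chat("Summarize what a mixture-of-experts model is."))
```

The same payload shape works against llama.cpp's `llama-server` or Ollama's OpenAI-compatible endpoint; only `BASE_URL` changes.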

Hugging Face repository: THUDM/GLM-4.5-Air-GGUF


Strengths

  • MoE efficiency: only 14B of the 106B parameters are active per token, so inference runs at roughly dense-14B speed.
  • Quality reported to rival much larger dense models, including Llama 3.3 70B.
  • Permissive Apache 2.0 license.

Limitations

  • Performance depends heavily on quantization, RAM bandwidth and runtime support.
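The memory-bandwidth limitation can be made concrete with a common back-of-the-envelope heuristic: per generated token, the runtime must read the active weights from memory roughly once, so decode speed is bounded by bandwidth divided by active-weight bytes. A rough sketch with illustrative (not measured) numbers; the ~4.5 effective bits per weight for Q4_K_M and the 100 GB/s bandwidth figure are assumptions:

```python
def est_tokens_per_sec(active_params_b: float,
                       bits_per_weight: float,
                       bandwidth_gbs: float) -> float:
    """Rough decode-speed ceiling: each token reads the active weights once.

    active_params_b: active parameters, in billions (14 for GLM 4.5 Air)
    bits_per_weight: effective bits after quantization (~4.5 for Q4_K_M, assumed)
    bandwidth_gbs:   memory bandwidth in GB/s (illustrative value)
    """
    gb_read_per_token = active_params_b * bits_per_weight / 8
    return bandwidth_gbs / gb_read_per_token

# MoE advantage: only the 14B active parameters are read per token,
# not all 106B, so decode speed tracks the active count.
print(round(est_tokens_per_sec(14, 4.5, 100), 1))   # MoE: 14B active
print(round(est_tokens_per_sec(106, 4.5, 100), 1))  # hypothetical dense 106B, for contrast
```

Real throughput will be lower than this ceiling due to compute, KV-cache reads, and runtime overhead, but the ratio explains why the MoE design feels like a much smaller dense model at generation time.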

Best use cases

  • chat
  • code
  • power
  • quality
  • general

Benchmarks

Speed: 7/10

Quality: 9/10

Coding: 9/10

Reasoning: 9/10

Technical details

Developer: Zhipu AI

License: Apache 2.0 (see model repository)

Context window: Unknown

Architecture: See model card

Released: 2025-07