Local LLM model page

GLM 4.5 Air (MoE)

Zhipu AI's efficient MoE powerhouse: 106B total parameters with only 14B active per token, giving dense-model-class speed with the quality of a much larger model. One of the strongest options in the 16–24 GB RAM range, and reported to outperform Llama 3.3 70B. Apache 2.0 licensed.

Parameters
106B (14B active, MoE)
Minimum RAM
16 GB
Model size
9 GB
Quantization
Q4_K_M

Can GLM 4.5 Air (MoE) run locally?

GLM 4.5 Air (MoE) is best suited for mainstream Macs and PCs with 16 GB RAM. LocalClaw recommends Q4_K_M as the default quantization, with at least 16 GB RAM.

Search term for LM Studio or compatible runtimes: glm-4.5-air
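Once the model is loaded in LM Studio (or another runtime that exposes an OpenAI-compatible server, typically at http://localhost:1234/v1 by default), it can be queried over plain HTTP. A minimal sketch using only the standard library; the endpoint URL and the `glm-4.5-air` model identifier are assumptions based on LM Studio defaults and the search term above — adjust them for your runtime:

```python
import json
import urllib.request

# Assumed LM Studio default endpoint; change host/port for other runtimes.
BASE_URL = "http://localhost:1234/v1"

def build_chat_request(prompt: str, model: str = "glm-4.5-air") -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 256,
    }

def chat(prompt: str) -> str:
    """POST the prompt to the local server and return the reply text."""
    payload = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Requires a running local server with the model loaded.
    print(chat("Summarize what a mixture-of-experts model is."))
```

The same payload shape works against llama.cpp's `llama-server` or Ollama's OpenAI-compatible endpoint; only `BASE_URL` changes.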

Hugging Face repository: THUDM/GLM-4.5-Air-GGUF


Strengths

  • MoE efficiency: only 14B of the 106B parameters are active per token, so inference runs at roughly dense-14B speed.
  • Quality reported to rival much larger dense models, including Llama 3.3 70B.
  • Permissive Apache 2.0 license.

Limitations

  • Performance depends heavily on quantization, RAM bandwidth and runtime support.
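The memory-bandwidth limitation can be made concrete with a common back-of-the-envelope heuristic: per generated token, the runtime must read the active weights from memory roughly once, so decode speed is bounded by bandwidth divided by active-weight bytes. A rough sketch with illustrative (not measured) numbers; the ~4.5 effective bits per weight for Q4_K_M and the 100 GB/s bandwidth figure are assumptions:

```python
def est_tokens_per_sec(active_params_b: float,
                       bits_per_weight: float,
                       bandwidth_gbs: float) -> float:
    """Rough decode-speed ceiling: each token reads the active weights once.

    active_params_b: active parameters, in billions (14 for GLM 4.5 Air)
    bits_per_weight: effective bits after quantization (~4.5 for Q4_K_M, assumed)
    bandwidth_gbs:   memory bandwidth in GB/s (illustrative value)
    """
    gb_read_per_token = active_params_b * bits_per_weight / 8
    return bandwidth_gbs / gb_read_per_token

# MoE advantage: only the 14B active parameters are read per token,
# not all 106B, so decode speed tracks the active count.
print(round(est_tokens_per_sec(14, 4.5, 100), 1))   # MoE: 14B active
print(round(est_tokens_per_sec(106, 4.5, 100), 1))  # hypothetical dense 106B, for contrast
```

Real throughput will be lower than this ceiling due to compute, KV-cache reads, and runtime overhead, but the ratio explains why the MoE design feels like a much smaller dense model at generation time.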

Best use cases

  • chat
  • code
  • power
  • quality
  • general

Benchmarks

Speed: 7/10

Quality: 9/10

Coding: 9/10

Reasoning: 9/10

Technical details

Developer: Zhipu AI

License: Apache 2.0 (see model repository)

Context window: Unknown

Architecture: See model card

Released: 2025-07