Local LLM model page

GLM 4.7 Flash

Zhipu AI's fast GLM model: 14B parameters optimized for quick responses, with strong bilingual (Chinese/English) capabilities and efficient inference for everyday tasks. Apache 2.0 licensed.

Parameters
14B
Minimum RAM
16 GB
Model size
9 GB
Quantization
Q5_K_M

Can GLM 4.7 Flash run locally?

GLM 4.7 Flash is well suited to mainstream Macs and PCs. LocalClaw recommends Q5_K_M as the default quantization, with at least 16 GB of RAM.
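As a rough sanity check on the figures above, a GGUF file's size can be estimated from the parameter count and the quantization's effective bits per weight. This is an illustrative sketch, not LocalClaw tooling, and the ~5.5 bits-per-weight density for Q5_K_M is an approximation:

```python
def estimate_gguf_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough GGUF file-size estimate: parameters x bits-per-weight / 8, in GB (10^9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# 14B parameters at ~5.5 bits/weight (approximate Q5_K_M density)
print(f"{estimate_gguf_size_gb(14, 5.5):.1f} GB")  # -> 9.6 GB, close to the ~9 GB listed above
```

The remaining headroom between the ~9 GB file and the 16 GB minimum covers the KV cache, runtime overhead, and the operating system.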

Search term for LM Studio or compatible runtimes: glm-4.7-flash

Hugging Face repository: THUDM/GLM-4.7-Flash-GGUF

Tags: chat, code, power, speed

Strengths

  • 128K context
  • Fast inference
  • Strong bilingual CN/EN
  • Apache 2.0
  • Ranks #19 in global usage

Limitations

  • Less well known outside China
  • Community support smaller than Llama/Qwen

Best use cases

  • Chinese-English bilingual chat
  • Fast responses
  • Content generation
  • Enterprise use in the Chinese market

Benchmarks

Speed: 9/10

Quality: 7/10

Coding: 7/10

Reasoning: 7/10

Technical details

Developer: Zhipu AI / Tsinghua University

License: Apache 2.0

Context window: 131,072 tokens

Architecture: Transformer with 128K context
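The 131,072-token window is the maximum, not a free resource: KV-cache memory grows linearly with context length on top of the model weights. The sketch below shows the standard KV-cache size formula; the layer count, KV-head count, and head dimension are illustrative guesses for a 14B-class model, since the exact GLM 4.7 Flash architecture is not listed here:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: 2 (K and V) x layers x KV heads x head dim x tokens x element size."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical config: 40 layers, 8 KV heads (GQA), head_dim 128, fp16 cache
gb = kv_cache_bytes(40, 8, 128, 131_072) / 1e9
print(f"{gb:.1f} GB")  # -> 21.5 GB at the full 128K context under these assumptions
```

In practice, runtimes keep long-context usage manageable by quantizing the KV cache or capping the context below the maximum; with 16 GB of RAM, expect to run well short of the full 128K window.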

Released: 2025-12