GLM 4.7 Flash
Zhipu AI's fast GLM variant: 14B parameters optimized for quick responses, with strong bilingual (Chinese/English) capabilities and efficient inference for everyday tasks. Apache 2.0 licensed.
Parameters
14B
Minimum RAM
16 GB
Model size
9 GB
Quantization
Q5_K_M
Can GLM 4.7 Flash run locally?
GLM 4.7 Flash is well suited to mainstream Macs and PCs. LocalClaw recommends Q5_K_M as the default quantization, which calls for at least 16 GB of RAM.
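A quick sanity check on the numbers above: the 9 GB figure follows from 14B weights at roughly 5.5 bits per weight, an approximate average for Q5_K_M (actual GGUF sizes vary per model, so treat this as a ballpark estimate, not a spec):

```python
def gguf_size_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate on-disk size in GB for a quantized model.

    params_b: parameter count in billions.
    bits_per_weight: average bits per weight for the quantization
    (~5.5 for Q5_K_M is an assumption, not a published figure).
    """
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

# GLM 4.7 Flash (14B) at Q5_K_M
size = gguf_size_gb(14, 5.5)
print(f"~{size:.1f} GB")  # close to the 9 GB listed above
```

The gap between the ~9–10 GB file and the 16 GB RAM floor leaves headroom for the KV cache, the OS, and other running apps.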
Search term for LM Studio or compatible runtimes: glm-4.7-flash
Hugging Face repository: THUDM/GLM-4.7-Flash-GGUF
Tags: chat, code, power, speed
Strengths
- 128K context
- Fast inference
- Strong bilingual CN/EN
- Apache 2.0
- Ranks #19 in global usage
Limitations
- Less well known outside China
- Community support smaller than Llama/Qwen
Best use cases
- Chinese-English bilingual chat
- Fast responses
- Content generation
- Enterprise China market
Benchmarks
Speed: 9/10
Quality: 7/10
Coding: 7/10
Reasoning: 7/10
Technical details
Developer: Zhipu AI / Tsinghua University
License: Apache 2.0
Context window: 131,072 tokens
Architecture: Transformer with 128K context
Released: 2025-12
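The 131,072-token context window is generous, but filling it costs memory beyond the model weights. A minimal sketch of the usual KV-cache estimate (2 tensors, K and V, per layer, fp16); the layer, head, and dimension numbers below are hypothetical, since this page does not publish the model's architecture details:

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                seq_len: int, bytes_per_el: int = 2) -> float:
    """Approximate KV-cache size in GB at a given context length.

    2x accounts for the separate K and V tensors per layer;
    bytes_per_el=2 assumes fp16/bf16 cache entries.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_el / 1e9

# Hypothetical numbers for a GQA 14B model: 40 layers, 8 KV heads, head_dim 128
print(f"{kv_cache_gb(40, 8, 128, 131072):.1f} GB at full 128K context")
```

Under these assumptions a full 128K context would not fit alongside the weights in 16 GB, which is why runtimes default to much shorter contexts (and why quantized KV caches exist); everyday chats at a few thousand tokens stay well within budget.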