Open-weight local LLM
GLM-5.2 (744B MoE)
Z.ai flagship open model for long-horizon coding, reasoning and agentic work. 744B total, 40B active, 1M-token context, MIT license. Unsloth Dynamic GGUF makes it technically local, but it needs workstation/server-class memory: ~245GB total memory for 2-bit and 372GB+ for 4-bit.
Server-grade
256 GB RAM
UD-IQ2_M
Private long-horizon coding agents on large workstations
Parameters
744B (40B active)
Minimum RAM
256 GB
Model size
239 GB
Quantization
UD-IQ2_M
Can GLM-5.2 (744B MoE) run locally?
GLM-5.2 (744B MoE) is server-grade locally. Keep it for comparison unless you have very large unified memory, multiple GPUs or remote inference.
Search for glm-5.2 in LM Studio or another GGUF-compatible runtime.
unsloth/GLM-5.2-GGUFchatcodereasoningqualityagenticlong-contextgeneral
Install path
01
Check RAM fitMinimum 256 GB RAM. Start with the UD-IQ2_M quant.02
Load the modelSearch glm-5.2 in LM Studio.03
Control locallyUse LocalClaw to manage models, agents, chat, channels and scheduled OpenClaw work.Strengths
- MIT open-source license with official Z.ai weights
- 1,048,576 token context window for very large repositories and documents
- Strong benchmark positioning across coding, reasoning and agentic tasks
- Unsloth Dynamic GGUF provides 1-bit, 2-bit, 3-bit, 4-bit and higher quantization paths
- Thinking modes can be controlled through reasoning effort / chat template flags
Limitations
- Not suitable for normal laptops or small desktops
- 2-bit Unsloth GGUF still needs roughly 245GB total memory for inference
- 4-bit requires roughly 372-475GB total memory depending on quantization
- Full precision / high precision variants are multi-hundred-GB to TB scale
- Requires modern llama.cpp, Unsloth Studio, SGLang, vLLM or advanced runtime support
Best use cases
- Private long-horizon coding agents on large workstations
- Repository-scale reasoning and refactoring
- Large document analysis with local data control
- Research on ultra-large open MoE models
- Benchmarking local open models against frontier hosted systems
Capability profile
Technical notes
This model fits these next steps
Hardware fit is based on LocalClaw's RAM tier, model size and quantization metadata. Always leave memory headroom for your OS and runtime.