Open-weight local LLM

GLM-5.2 (744B MoE)

Z.ai flagship open model for long-horizon coding, reasoning and agentic work. 744B total, 40B active, 1M-token context, MIT license. Unsloth Dynamic GGUF makes it technically local, but it needs workstation/server-class memory: ~245GB total memory for 2-bit and 372GB+ for 4-bit.

Server-grade 256 GB RAM UD-IQ2_M Private long-horizon coding agents on large workstations
Parameters
744B (40B active)
Minimum RAM
256 GB
Model size
239 GB
Quantization
UD-IQ2_M

Can GLM-5.2 (744B MoE) run locally?

GLM-5.2 (744B MoE) is server-grade locally. Keep it for comparison unless you have very large unified memory, multiple GPUs or remote inference.

Search for glm-5.2 in LM Studio or another GGUF-compatible runtime.

chatcodereasoningqualityagenticlong-contextgeneral

Install path

01
Check RAM fitMinimum 256 GB RAM. Start with the UD-IQ2_M quant.
02
Load the modelSearch glm-5.2 in LM Studio.
03
Control locallyUse LocalClaw to manage models, agents, chat, channels and scheduled OpenClaw work.

Strengths

  • MIT open-source license with official Z.ai weights
  • 1,048,576 token context window for very large repositories and documents
  • Strong benchmark positioning across coding, reasoning and agentic tasks
  • Unsloth Dynamic GGUF provides 1-bit, 2-bit, 3-bit, 4-bit and higher quantization paths
  • Thinking modes can be controlled through reasoning effort / chat template flags

Limitations

  • Not suitable for normal laptops or small desktops
  • 2-bit Unsloth GGUF still needs roughly 245GB total memory for inference
  • 4-bit requires roughly 372-475GB total memory depending on quantization
  • Full precision / high precision variants are multi-hundred-GB to TB scale
  • Requires modern llama.cpp, Unsloth Studio, SGLang, vLLM or advanced runtime support

Best use cases

  • Private long-horizon coding agents on large workstations
  • Repository-scale reasoning and refactoring
  • Large document analysis with local data control
  • Research on ultra-large open MoE models
  • Benchmarking local open models against frontier hosted systems

Capability profile

speed
2
quality
10
coding
10
reasoning
10

Technical notes

Developer
Z.ai
License
MIT
Context window
1,048,576 tokens
Architecture
GLM MoE DSA with 744B total parameters, about 40B active parameters, 8 experts per token, IndexShare sparse attention and a 1M-token context window. The practical local path is the Unsloth Dynamic GGUF release.

This model fits these next steps

Hardware fit is based on LocalClaw's RAM tier, model size and quantization metadata. Always leave memory headroom for your OS and runtime.

Similar models to compare

Where to go next