Open-weight local LLM
MiniMax M3 (428B/23B active)
MiniMax native multimodal MoE with 1M context and MiniMax Sparse Attention. Around 428B parameters with 23B active. Built for long-context coding, cowork and agentic workflows, with local deployment via SGLang, vLLM or Transformers. Server-grade only.
Server-grade
2048 GB RAM
BF16 / custom runtime
Million-token repository analysis
Parameters
428B (23B active)
Minimum RAM
2048 GB
Model size
1700 GB
Quantization
BF16 / custom runtime
Can MiniMax M3 (428B/23B active) run locally?
MiniMax M3 (428B/23B active) is server-grade locally. Keep it for comparison unless you have very large unified memory, multiple GPUs or remote inference.
Search for minimax-m3 in LM Studio or another GGUF-compatible runtime.
MiniMaxAI/MiniMax-M3chatcodereasoningagenticlong-contextmultimodalquality
Install path
01
Check RAM fitMinimum 2048 GB RAM. Start with the BF16 / custom runtime quant.02
Load the modelSearch minimax-m3 in LM Studio.03
Control locallyUse LocalClaw to manage models, agents, chat, channels and scheduled OpenClaw work.Strengths
- 1M context window for huge codebases and document sets
- MiniMax Sparse Attention reduces long-context compute and memory pressure versus dense attention
- Native text, image and video multimodality from training
- Strong focus on coding, cowork and long-horizon agentic tasks
- Official local deployment notes mention SGLang, vLLM and Transformers
Limitations
- Server-grade only — far beyond normal Mac/consumer GPU setups
- License is MiniMax Community, not Apache/MIT
- Custom model code and sparse-attention runtime support are required for best results
- Not a simple LM Studio GGUF install at listing time
Best use cases
- Million-token repository analysis
- Long-horizon coding agents
- Multimodal document/video reasoning
- Research on sparse attention and agentic model serving
- Server-side local/private AI deployments
Capability profile
Technical notes
This model fits these next steps
Hardware fit is based on LocalClaw's RAM tier, model size and quantization metadata. Always leave memory headroom for your OS and runtime.