Open-weight local LLM

MiniMax M3 (428B/23B active)

MiniMax native multimodal MoE with 1M context and MiniMax Sparse Attention. Around 428B parameters with 23B active. Built for long-context coding, cowork and agentic workflows, with local deployment via SGLang, vLLM or Transformers. Server-grade only.

Server-grade 2048 GB RAM BF16 / custom runtime Million-token repository analysis
Parameters
428B (23B active)
Minimum RAM
2048 GB
Model size
1700 GB
Quantization
BF16 / custom runtime

Can MiniMax M3 (428B/23B active) run locally?

MiniMax M3 (428B/23B active) is server-grade locally. Keep it for comparison unless you have very large unified memory, multiple GPUs or remote inference.

Search for minimax-m3 in LM Studio or another GGUF-compatible runtime.

chatcodereasoningagenticlong-contextmultimodalquality

Install path

01
Check RAM fitMinimum 2048 GB RAM. Start with the BF16 / custom runtime quant.
02
Load the modelSearch minimax-m3 in LM Studio.
03
Control locallyUse LocalClaw to manage models, agents, chat, channels and scheduled OpenClaw work.

Strengths

  • 1M context window for huge codebases and document sets
  • MiniMax Sparse Attention reduces long-context compute and memory pressure versus dense attention
  • Native text, image and video multimodality from training
  • Strong focus on coding, cowork and long-horizon agentic tasks
  • Official local deployment notes mention SGLang, vLLM and Transformers

Limitations

  • Server-grade only — far beyond normal Mac/consumer GPU setups
  • License is MiniMax Community, not Apache/MIT
  • Custom model code and sparse-attention runtime support are required for best results
  • Not a simple LM Studio GGUF install at listing time

Best use cases

  • Million-token repository analysis
  • Long-horizon coding agents
  • Multimodal document/video reasoning
  • Research on sparse attention and agentic model serving
  • Server-side local/private AI deployments

Capability profile

speed
3
quality
10
coding
10
reasoning
10

Technical notes

Developer
MiniMax AI
License
MiniMax Community License
Context window
1,000,000 tokens
Architecture
Native multimodal sparse MoE — about 428B parameters, about 23B active, MiniMax Sparse Attention for million-token context

This model fits these next steps

Hardware fit is based on LocalClaw's RAM tier, model size and quantization metadata. Always leave memory headroom for your OS and runtime.

Similar models to compare

Where to go next