What is MiniMax M3 (428B/23B active) best for?

MiniMax M3 (428B/23B active) is best used for Million-token repository analysis.

Open-weight local LLM

MiniMax M3 (428B/23B active)

Q: Can MiniMax M3 (428B/23B active) run locally?

MiniMax M3 (428B/23B active) can run locally with at least 2048 GB RAM. LocalClaw recommends BF16 / custom runtime quantization.

MiniMax native multimodal MoE with 1M context and MiniMax Sparse Attention. Around 428B parameters with 23B active. Built for long-context coding, cowork and agentic workflows, with local deployment via SGLang, vLLM or Transformers. Server-grade only.

Server-grade 2048 GB RAM BF16 / custom runtime Million-token repository analysis

Run with LocalClaw Compare all models

Parameters

428B (23B active)

Minimum RAM

2048 GB

Model size

1700 GB

Quantization

BF16 / custom runtime

Can MiniMax M3 (428B/23B active) run locally?

MiniMax M3 (428B/23B active) is server-grade locally. Keep it for comparison unless you have very large unified memory, multiple GPUs or remote inference.

Search for minimax-m3 in LM Studio or another GGUF-compatible runtime.

MiniMaxAI/MiniMax-M3

chatcodereasoningagenticlong-contextmultimodalquality

Install path

Check RAM fitMinimum 2048 GB RAM. Start with the BF16 / custom runtime quant.

Load the modelSearch minimax-m3 in LM Studio.

Control locallyUse LocalClaw to manage models, agents, chat, channels and scheduled OpenClaw work.

Strengths

1M context window for huge codebases and document sets
MiniMax Sparse Attention reduces long-context compute and memory pressure versus dense attention
Native text, image and video multimodality from training
Strong focus on coding, cowork and long-horizon agentic tasks
Official local deployment notes mention SGLang, vLLM and Transformers

Limitations

Server-grade only — far beyond normal Mac/consumer GPU setups
License is MiniMax Community, not Apache/MIT
Custom model code and sparse-attention runtime support are required for best results
Not a simple LM Studio GGUF install at listing time

Best use cases

Million-token repository analysis
Long-horizon coding agents
Multimodal document/video reasoning
Research on sparse attention and agentic model serving
Server-side local/private AI deployments

Capability profile

speed

quality

coding

reasoning

Technical notes

Developer

MiniMax AI

License

MiniMax Community License

Context window

1,000,000 tokens

Architecture

Native multimodal sparse MoE — about 428B parameters, about 23B active, MiniMax Sparse Attention for million-token context

This model fits these next steps

Hardware fit is based on LocalClaw's RAM tier, model size and quantization metadata. Always leave memory headroom for your OS and runtime.

Very large memoryMac Studio Ultra class Check model size firstNVIDIA GB10 / server options More practical alternativesCompare smaller models

Similar models to compare

Kimi K2.7 Code (1T MoE) 1T (32B active)MiniMax M2 (230B MoE) 230B (10B active)DeepSeek V3.2 (37B/671B MoE) 37B (671B MoE)Ling-2.6-flash (104B MoE) 104B (7.4B active)Qwen 3.5 MoE (397B/17B active) 397B (17B active)

Where to go next

RAM guideFind models for this memory tier HardwareSee computers for local AI LocalClawControl OpenClaw from one native app