
Llama 4 Maverick (17B/128E MoE)

Meta's largest openly released MoE model: 17B active parameters routed across 128 experts (~400B total). Natively multimodal with strong image understanding. Server-grade hardware required. Llama 4 Community License.

Parameters
17B active (400B total, 128 experts)
Minimum RAM
320 GB
Model size
220 GB
Quantization
Q4_K_M
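
As a rough sanity check on the figures above: Q4_K_M averages about 4.8 bits per weight, so the file size follows almost directly from the total parameter count. A minimal sketch (the bits-per-weight figure is an approximation; the exact size depends on the tensor mix):

  # Back-of-the-envelope file size for a Q4_K_M quantization.
  # Assumption: Q4_K_M averages roughly 4.8 bits per weight.
  total_params = 400e9               # ~400B total parameters (all experts)
  bits_per_weight = 4.8              # approximate Q4_K_M average
  size_gb = total_params * bits_per_weight / 8 / 1e9
  print(f"~{size_gb:.0f} GB on disk")  # ~240 GB, in the ballpark of the 220 GB listed

Runtime memory needs headroom beyond the file itself (KV cache, activations, runtime buffers), which is why the RAM floor is 320 GB rather than ~220 GB.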

Can Llama 4 Maverick (17B/128E MoE) run locally?

Llama 4 Maverick (17B/128E MoE) is best suited for server-grade or multi-GPU systems. LocalClaw recommends Q4_K_M as the default quantization, with at least 320 GB RAM.

Search term for LM Studio or compatible runtimes: llama-4-maverick

Hugging Face repository: meta-llama/Llama-4-Maverick-17B-128E-Instruct (GGUF builds are community conversions of these weights)
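
Once a GGUF build is downloaded, a minimal smoke test with llama-cpp-python might look like the sketch below. The filename is a hypothetical placeholder (files this large ship as split GGUF shards, loaded via the first shard), and n_ctx / n_gpu_layers should be tuned to your hardware:

  # Minimal local smoke test using llama-cpp-python (pip install llama-cpp-python).
  # The model path below is illustrative, not an actual filename.
  from llama_cpp import Llama

  llm = Llama(
      model_path="Llama-4-Maverick-17B-128E-Instruct-Q4_K_M-00001-of-00005.gguf",
      n_ctx=8192,        # keep context modest; the KV cache grows with n_ctx
      n_gpu_layers=-1,   # offload as many layers as fit; lower this if VRAM is tight
  )

  out = llm.create_chat_completion(
      messages=[{"role": "user", "content": "In one paragraph, what is a mixture-of-experts model?"}],
      max_tokens=128,
  )
  print(out["choices"][0]["message"]["content"])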

Tags: chat, vision, quality

Strengths

  • Meta's largest openly released MoE model
  • Strong native multimodal (text and image) capabilities
  • Competitive with frontier models across standard benchmarks

Limitations

  • Requires 320 GB+ RAM
  • Server-grade hardware only
  • Very slow on consumer hardware

Best use cases

  • Maximum quality outputs
  • Research
  • Enterprise multimodal AI
  • Frontier tasks

Benchmarks

Speed: 1/10

Quality: 10/10

Coding: 10/10

Reasoning: 10/10

Technical details

Developer: Meta AI

License: Llama 4 Community License

Context window: 1,048,576 tokens (1M)

Architecture: Mixture of Experts (MoE), ~400B total parameters, with native early-fusion vision (see the routing sketch below)

Released: 2025-04
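
To make the 17B-active / 400B-total split concrete, here is a toy sketch of the top-k expert routing an MoE layer performs per token. Dimensions, expert count, and k are illustrative, not Maverick's real configuration:

  # Toy top-k expert routing, the core mechanism of an MoE layer.
  # Shapes and k are illustrative placeholders.
  import numpy as np

  def moe_layer(x, experts, router_w, k=1):
      logits = x @ router_w                      # one routing score per expert
      top = np.argsort(logits)[-k:]              # indices of the k highest-scoring experts
      weights = np.exp(logits[top] - logits[top].max())
      weights /= weights.sum()                   # softmax over just the selected experts
      # Only k experts execute per token: that is why active params << total params.
      return sum(w * experts[i](x) for w, i in zip(weights, top))

  rng = np.random.default_rng(0)
  d, n_experts = 16, 8
  experts = [(lambda W: (lambda x: x @ W))(rng.standard_normal((d, d)))
             for _ in range(n_experts)]
  router_w = rng.standard_normal((d, n_experts))
  print(moe_layer(rng.standard_normal(d), experts, router_w, k=1).shape)  # (16,)

With 128 experts but only a couple consulted per token, compute per token scales with the active parameters while memory scales with the total, which is the trade-off behind the hardware requirements above.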