
Llama 4 Maverick (17B/128E MoE)

Meta's largest openly released MoE model: 17B active parameters routed across 128 experts (~400B total). Natively multimodal with strong image understanding. Server-grade hardware required. Llama 4 Community License.

Parameters
17B active (400B total, 128 experts)
Minimum RAM
320 GB
Model size
220 GB
Quantization
Q4_K_M
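
As a rough sanity check on the figures above: Q4_K_M averages about 4.8 bits per weight, so the file size follows almost directly from the total parameter count. A minimal sketch (the bits-per-weight figure is an approximation; the exact size depends on the tensor mix):

  # Back-of-the-envelope file size for a Q4_K_M quantization.
  # Assumption: Q4_K_M averages roughly 4.8 bits per weight.
  total_params = 400e9               # ~400B total parameters (all experts)
  bits_per_weight = 4.8              # approximate Q4_K_M average
  size_gb = total_params * bits_per_weight / 8 / 1e9
  print(f"~{size_gb:.0f} GB on disk")  # ~240 GB, in the ballpark of the 220 GB listed

Runtime memory needs headroom beyond the file itself (KV cache, activations, runtime buffers), which is why the RAM floor is 320 GB rather than ~220 GB.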

Can Llama 4 Maverick (17B/128E MoE) run locally?

Llama 4 Maverick (17B/128E MoE) is best suited for server-grade or multi-GPU systems. LocalClaw recommends Q4_K_M as the default quantization, with at least 320 GB RAM.

Search term for LM Studio or compatible runtimes: llama-4-maverick

Hugging Face repository: meta-llama/Llama-4-Maverick-17B-128E-Instruct (GGUF builds are community conversions of these weights)
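
Once a GGUF build is downloaded, a minimal smoke test with llama-cpp-python might look like the sketch below. The filename is a hypothetical placeholder (files this large ship as split GGUF shards, loaded via the first shard), and n_ctx / n_gpu_layers should be tuned to your hardware:

  # Minimal local smoke test using llama-cpp-python (pip install llama-cpp-python).
  # The model path below is illustrative, not an actual filename.
  from llama_cpp import Llama

  llm = Llama(
      model_path="Llama-4-Maverick-17B-128E-Instruct-Q4_K_M-00001-of-00005.gguf",
      n_ctx=8192,        # keep context modest; the KV cache grows with n_ctx
      n_gpu_layers=-1,   # offload as many layers as fit; lower this if VRAM is tight
  )

  out = llm.create_chat_completion(
      messages=[{"role": "user", "content": "In one paragraph, what is a mixture-of-experts model?"}],
      max_tokens=128,
  )
  print(out["choices"][0]["message"]["content"])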

Tags: chat, vision, quality

Strengths

  • Meta's largest openly released MoE model
  • Strong native multimodal (text and image) capabilities
  • Competitive with frontier models across standard benchmarks

Limitations

  • Requires 320 GB+ RAM
  • Server-grade hardware only
  • Very slow on consumer hardware

Best use cases

  • Maximum quality outputs
  • Research
  • Enterprise multimodal AI
  • Frontier tasks

Benchmarks

Speed: 1/10

Quality: 10/10

Coding: 10/10

Reasoning: 10/10

Technical details

Developer: Meta AI

License: Llama 4 Community License

Context window: 1,048,576 tokens (1M)

Architecture: Mixture of Experts (MoE), ~400B total parameters, with native early-fusion vision (see the routing sketch below)

Released: 2025-04
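
To make the 17B-active / 400B-total split concrete, here is a toy sketch of the top-k expert routing an MoE layer performs per token. Dimensions, expert count, and k are illustrative, not Maverick's real configuration:

  # Toy top-k expert routing, the core mechanism of an MoE layer.
  # Shapes and k are illustrative placeholders.
  import numpy as np

  def moe_layer(x, experts, router_w, k=1):
      logits = x @ router_w                      # one routing score per expert
      top = np.argsort(logits)[-k:]              # indices of the k highest-scoring experts
      weights = np.exp(logits[top] - logits[top].max())
      weights /= weights.sum()                   # softmax over just the selected experts
      # Only k experts execute per token: that is why active params << total params.
      return sum(w * experts[i](x) for w, i in zip(weights, top))

  rng = np.random.default_rng(0)
  d, n_experts = 16, 8
  experts = [(lambda W: (lambda x: x @ W))(rng.standard_normal((d, d)))
             for _ in range(n_experts)]
  router_w = rng.standard_normal((d, n_experts))
  print(moe_layer(rng.standard_normal(d), experts, router_w, k=1).shape)  # (16,)

With 128 experts but only a couple consulted per token, compute per token scales with the active parameters while memory scales with the total, which is the trade-off behind the hardware requirements above.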