Llama 4 Maverick (17B/128E MoE)
Meta's largest open-weight MoE model: 17B active parameters routed across 128 experts (~400B total). Natively multimodal, with strong image understanding and reasoning. Requires server-grade hardware. Released under the Llama 4 Community License.
Parameters
17B active (400B total, 128 experts)
Minimum RAM
320 GB
Model size
220 GB
Quantization
Q4_K_M
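As a sanity check on the figures above, here is a back-of-the-envelope size estimate in Python. The ~4.5 bits per weight used for Q4_K_M is an assumed average (the real value depends on the quant mix), so treat the result as a rough estimate rather than an exact figure.

```python
# Rough size math for a ~400B-parameter model quantized to Q4_K_M.
# Q4_K_M averages roughly 4.5-4.8 bits per weight depending on the
# quant mix, so this is an estimate, not an exact figure.
total_params = 400e9       # ~400B total parameters (all 128 experts)
bits_per_weight = 4.5      # assumed average for Q4_K_M

weights_gb = total_params * bits_per_weight / 8 / 1e9
print(f"Quantized weights: ~{weights_gb:.0f} GB")  # ~225 GB, close to the 220 GB listed

# The 320 GB minimum adds headroom for the KV cache, activations,
# and the runtime itself on top of the raw weights.
```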
Can Llama 4 Maverick (17B/128E MoE) run locally?
Llama 4 Maverick (17B/128E MoE) is best suited for server-grade or multi-GPU systems. LocalClaw recommends Q4_K_M as the default quantization, with at least 320 GB RAM.
Search term for LM Studio or compatible runtimes: llama-4-maverick
Hugging Face repository: meta-llama/Llama-4-Maverick-17B-128E-Instruct-GGUF
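Below is a minimal sketch of fetching the quantized weights and loading them with llama-cpp-python, assuming a llama.cpp build recent enough to support the Llama 4 architecture. The repo id comes from this page; the file pattern and shard filename are assumptions, so check the repository's actual file names before running.

```python
# Sketch: download the Q4_K_M shards and load them locally.
from huggingface_hub import snapshot_download
from llama_cpp import Llama

local_dir = snapshot_download(
    repo_id="meta-llama/Llama-4-Maverick-17B-128E-Instruct-GGUF",
    allow_patterns=["*Q4_K_M*"],  # pull only the Q4_K_M files (assumed naming)
)

# For split GGUF files, point llama.cpp at the first shard; it finds the rest.
llm = Llama(
    model_path=f"{local_dir}/Llama-4-Maverick-Q4_K_M-00001-of-00005.gguf",  # hypothetical shard name
    n_ctx=8192,       # start well below 131,072; long contexts need far more RAM for the KV cache
    n_gpu_layers=-1,  # offload as many layers as your GPUs can hold
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain mixture-of-experts in two sentences."}]
)
print(out["choices"][0]["message"]["content"])
```

On hardware below the 320 GB threshold, expect heavy swapping or outright load failures; the speed rating below reflects that.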
Tags: chat, vision, quality
Strengths
- Largest open MoE model from Meta
- Natively multimodal, with strong image understanding
- Competitive with frontier models on standard benchmarks
Limitations
- Requires 320 GB+ RAM
- Server-grade hardware only
- Very slow on consumer hardware
Best use cases
- Maximum quality outputs
- Research
- Enterprise multimodal AI
- Frontier tasks
Benchmarks
Speed: 1/10
Quality: 10/10
Coding: 10/10
Reasoning: 10/10
Technical details
Developer: Meta AI
License: Llama 4 Community License
Context window: 131,072 tokens
Architecture: Mixture of Experts (MoE), ~400B total parameters with native vision (see the routing sketch below)
Released: 2025-04
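To make the active-versus-total parameter distinction concrete, here is a toy top-k routing sketch in NumPy. The shapes, router, and expert layers are illustrative stand-ins, not Meta's implementation; Llama 4 also uses a shared expert, which is omitted here.

```python
# Toy top-k MoE routing: each token is sent to only top_k of the
# n_experts expert networks, so most of the ~400B weights sit idle
# on any given forward pass; that is why only ~17B are "active".
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 128, 1  # Llama 4 routes each token to a single expert

router_w = rng.standard_normal((d_model, n_experts))
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """x: (tokens, d_model) -> (tokens, d_model), using top_k experts per token."""
    logits = x @ router_w                        # router score per expert
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    chosen = np.argsort(-logits, axis=1)[:, :top_k]
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e in chosen[t]:
            out[t] += probs[t, e] * (x[t] @ experts[e])  # gate-weighted expert output
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_layer(tokens).shape)  # (4, 64); at most 4 of the 128 experts ran
```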