Local LLM model page

Llama 4 Scout (17B/109B MoE)

Meta Llama 4 Scout — natively multimodal MoE with 16 experts. 10M-token context window. Outperforms Gemma 3 and Mistral Small on most benchmarks at similar active cost. Llama 4 Community License.

Find the best model for my hardware Browse all 183 LLMs

Parameters

109B (17B active, 16 experts)

Minimum RAM

96 GB

Model size

65 GB

Quantization

Q4_K_M

Can Llama 4 Scout (17B/109B MoE) run locally?

Llama 4 Scout (17B/109B MoE) is best suited for large-memory workstations. LocalClaw recommends Q4_K_M as the default quantization, with at least 96 GB RAM.

Search term for LM Studio or compatible runtimes: llama-4-scout-17b-16e-instruct

Hugging Face repository: meta-llama/Llama-4-Scout-17B-16E-Instruct

chatvisionreasoningmultimodalpower

Strengths

Meta Llama 4 Scout — natively multimodal MoE with 16 experts. 10M-token context window. Outperforms Gemma 3 and Mistral Small on most benchmarks at similar active cost. Llama 4 Community License.

Limitations

Performance depends heavily on quantization, RAM bandwidth and runtime support.

Best use cases

chat
vision
reasoning
multimodal
power

Benchmarks

Speed: 5/10

Quality: 9/10

Coding: 8/10

Reasoning: 9/10

Technical details

Developer: llama

License: See model repository

Context window: Unknown tokens

Architecture: See model card

Released: 2025-04