Local LLM model page

Llama 4 Scout (17B/16E MoE)

Meta's multimodal Mixture of Experts (MoE) model: 17B active parameters across 16 experts (~109B total), with built-in image understanding. Meta advertises a context window of up to 10M tokens. Released under the Llama 4 Community License. 728K downloads.


Parameters: 17B active (109B total, 16 experts)

Minimum RAM: 16 GB

Model size: 10 GB

Quantization: Q4_K_M

Can Llama 4 Scout (17B/16E MoE) run locally?

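According to LocalClaw, Llama 4 Scout (17B/16E MoE) is best suited for mainstream Macs and PCs with at least 16 GB RAM, with Q4_K_M as the default quantization. The 10 GB model size and 16 GB RAM figures appear to track the 17B active parameters; the full 109B-parameter weights are several times larger, so treat 16 GB as a lower bound rather than a comfortable target.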
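As a rough sanity check on the size figures, a quantized model's footprint can be approximated as parameter count times average bits per weight. The sketch below assumes about 4.85 bits per weight for Q4_K_M, which is an approximation and not a number from this page; the function and constant names are illustrative only.

```python
# Rough size estimate for a quantized model.
# ASSUMPTION: Q4_K_M averages roughly 4.85 bits per weight; the real
# figure varies by model because different tensors use different formats.
Q4_K_M_BITS_PER_WEIGHT = 4.85


def quantized_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (10^9 bytes), ignoring file metadata."""
    # billions of params * bits per weight / 8 bits per byte = billions of bytes
    return params_billions * bits_per_weight / 8


active_only_gb = quantized_size_gb(17, Q4_K_M_BITS_PER_WEIGHT)    # active params
all_experts_gb = quantized_size_gb(109, Q4_K_M_BITS_PER_WEIGHT)   # total params

print(f"17B active:  ~{active_only_gb:.1f} GB")
print(f"109B total:  ~{all_experts_gb:.1f} GB")
```

Under that assumption, 17B parameters come to roughly 10 GB and 109B parameters to roughly 66 GB, before any memory for the context cache or the runtime itself.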

Search term for LM Studio or compatible runtimes: llama-4-scout

Hugging Face repository: meta-llama/Llama-4-Scout-17B-16E-Instruct-GGUF
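One way to check which files a repository actually offers for a given quantization is to list its contents and filter by name. This is a minimal sketch, assuming the `huggingface_hub` package is installed and that GGUF files carry the quantization label in their filename; the helper names are hypothetical, and the example filenames in the comments are made up for illustration.

```python
def pick_quant_files(filenames, quant="Q4_K_M"):
    """Return .gguf filenames that mention the requested quantization label."""
    wanted = quant.lower()
    return sorted(
        name for name in filenames
        if name.lower().endswith(".gguf") and wanted in name.lower()
    )


def list_repo_quant_files(repo_id, quant="Q4_K_M"):
    """List matching GGUF files in a Hugging Face repository (needs network)."""
    # Imported lazily so pick_quant_files works without the package installed.
    from huggingface_hub import list_repo_files
    return pick_quant_files(list_repo_files(repo_id), quant)


# Offline illustration with made-up filenames:
example = ["README.md", "model-Q4_K_M.gguf", "model-Q8_0.gguf"]
print(pick_quant_files(example))
```

Calling `list_repo_quant_files("meta-llama/Llama-4-Scout-17B-16E-Instruct-GGUF")` would query the repository named above. It requires network access, and gated repositories may also require an access token.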

Tags: chat, vision, power, general

Strengths

Latest Meta multimodal model
Built-in vision
MoE for efficient inference
728K downloads

Limitations

Needs at least 16 GB RAM
New model, so less community support
MoE complexity: all experts must be stored even though only some are active per token

Best use cases

Multimodal chat
Image understanding
General AI tasks
Content creation

Benchmarks

Speed: 6/10

Quality: 8/10

Coding: 8/10

Reasoning: 8/10

Technical details

Developer: Meta AI

License: Llama 4 Community License

Context window: 131,072 tokens (as listed here; Meta advertises up to 10M tokens)

Architecture: Mixture of Experts (MoE) with native vision

Released: 2025-04
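The context window also affects memory, because the key/value cache grows linearly with the number of tokens. The sketch below estimates that cache for a plain full-attention transformer. The layer count, KV head count, and head dimension are placeholder values chosen for illustration, not figures from this page, and real runtimes may use less memory through cache quantization or attention variants.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_elem=2):
    """Estimate KV cache size for a standard full-attention transformer.

    Each token stores one key and one value vector per layer and KV head.
    """
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem  # 2 = K and V
    return tokens * per_token


# PLACEHOLDER architecture values, assumed for illustration only.
cache = kv_cache_bytes(
    tokens=131_072,    # context window listed above
    layers=48,         # assumed
    kv_heads=8,        # assumed
    head_dim=128,      # assumed
    bytes_per_elem=2,  # 16-bit cache entries
)
print(f"KV cache at full context: {cache / 2**30:.0f} GiB")
```

With these placeholder values, a full 131,072-token cache comes to 24 GiB at 16-bit precision, which is why many local setups run with a much shorter context than the maximum.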