Local LLM model page

Llama 4 Scout (17B/109B MoE)

Meta's Llama 4 Scout is a natively multimodal mixture-of-experts (MoE) model with 16 experts and a 10M-token context window. It is reported to outperform Gemma 3 and Mistral Small on most benchmarks at a similar active-parameter cost. It is released under the Llama 4 Community License.

Parameters: 109B (17B active, 16 experts)

Minimum RAM: 96 GB

Model size: 65 GB

Quantization: Q4_K_M

Can Llama 4 Scout (17B/109B MoE) run locally?

Llama 4 Scout (17B/109B MoE) is best suited for large-memory workstations. LocalClaw recommends Q4_K_M as the default quantization, with at least 96 GB RAM.
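
The 65 GB model size and 96 GB RAM recommendation can be sanity-checked with rough arithmetic. The sketch below is only an estimate: it assumes Q4_K_M averages about 4.8 bits per weight (an approximation, not a figure from this page), and the 16 GB allowance for KV cache, runtime, and operating system is a hypothetical value chosen for illustration.

```python
# Rough memory estimate for a quantized model.
# All constants are assumptions for illustration; real files vary with the
# tensor mix and metadata.

Q4_K_M_BITS_PER_WEIGHT = 4.8  # assumed average for Q4_K_M, not an exact value


def estimate_weights_gb(total_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in decimal gigabytes."""
    return total_params_billion * bits_per_weight / 8


def fits_in_ram(weights_gb: float, ram_gb: float, overhead_gb: float = 16.0) -> bool:
    """True if the weights plus a hypothetical overhead allowance fit in RAM."""
    return weights_gb + overhead_gb <= ram_gb


# Use the total parameter count (109B), not the active count (17B):
# every expert must be resident in memory even if only some run per token.
weights_gb = estimate_weights_gb(109, Q4_K_M_BITS_PER_WEIGHT)
print(f"Estimated weights: {weights_gb:.1f} GB")
print(f"Fits in 96 GB: {fits_in_ram(weights_gb, 96)}")
print(f"Fits in 64 GB: {fits_in_ram(weights_gb, 64)}")
```

Under these assumptions the estimate lands close to the listed 65 GB, which leaves headroom on a 96 GB machine but not on a 64 GB one. Long contexts grow the KV cache, so the overhead allowance should be raised accordingly.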

Search term for LM Studio or compatible runtimes: llama-4-scout-17b-16e-instruct

Hugging Face repository: meta-llama/Llama-4-Scout-17B-16E-Instruct

Tags: chat, vision, reasoning, multimodal, power

Strengths

  • Natively multimodal mixture-of-experts (MoE) design with 16 experts.
  • 10M-token context window.
  • Reported to outperform Gemma 3 and Mistral Small on most benchmarks at a similar active-parameter cost.
  • Released under the Llama 4 Community License.

Limitations

  • Performance depends heavily on quantization, RAM bandwidth and runtime support.
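
To illustrate why RAM bandwidth matters, a common back-of-envelope bound treats token generation as memory-bound: each generated token requires reading roughly the active weights once. The sketch below assumes about 4.8 bits per weight for Q4_K_M and uses hypothetical bandwidth figures. It gives an optimistic upper bound only, ignoring KV-cache reads, compute time, and runtime overhead.

```python
def max_tokens_per_second(
    active_params_billion: float,
    bits_per_weight: float,
    bandwidth_gb_s: float,
) -> float:
    """Upper bound on decode speed if reading weights is the only cost."""
    # Gigabytes of weights read per generated token (active parameters only).
    gb_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_per_token


# Hypothetical memory systems; the bandwidth numbers are illustrative assumptions.
systems = [
    ("desktop DDR5, assumed 80 GB/s", 80.0),
    ("unified memory, assumed 400 GB/s", 400.0),
]

for label, bandwidth in systems:
    bound = max_tokens_per_second(17, 4.8, bandwidth)
    print(f"{label}: at most about {bound:.0f} tokens/s")
```

Because only 17B of the 109B parameters are active per token, the bound depends on the active count, while the memory footprint depends on the total count. Measured speeds will be lower than this bound.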

Best use cases

  • chat
  • vision
  • reasoning
  • multimodal
  • power

Benchmarks

Speed: 5/10

Quality: 9/10

Coding: 8/10

Reasoning: 9/10

Technical details

Developer: Meta

License: Llama 4 Community License

Context window: 10M tokens

Architecture: Natively multimodal mixture of experts (MoE), 16 experts, 17B active of 109B total parameters

Released: 2025-04