Local LLM model page
Gemma 4 26B A4B
Gemma 4's Mixture-of-Experts flagship for workstations: 26B total parameters with ~4B active per token. 256K context and excellent quality-per-watt for local inference. Apache 2.0 license.
Parameters
26B total (~4B active)
Minimum RAM
24 GB
Model size
16 GB
Quantization
Q4_K_M
Can Gemma 4 26B A4B run locally?
Yes, on power-user machines. Gemma 4 26B A4B runs comfortably with 32 GB of RAM; LocalClaw recommends Q4_K_M as the default quantization, which requires at least 24 GB of RAM.
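As a rough sanity check on the figures above, the on-disk size of a quantized model can be estimated from its parameter count. The ~4.85 bits-per-weight average used here is an assumption: Q4_K_M is a mixed format that keeps some tensors at higher precision, so its effective rate sits a bit above 4 bits.

```python
def quantized_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Estimated on-disk size in GB: parameters * bits per weight / 8 bits per byte."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# 26B parameters at an assumed ~4.85 bits/weight average for Q4_K_M,
# in line with the 16 GB model size listed above.
print(round(quantized_size_gb(26, 4.85), 1))  # → 15.8
```

The gap between the ~16 GB file and the 24 GB RAM floor leaves headroom for the KV cache and runtime overhead.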
Search term for LM Studio or compatible runtimes: gemma-4-26b-a4b-it
Hugging Face repository: google/gemma-4-26B-A4B-it
chat · code · reasoning · power · multimodal · general
Strengths
- Excellent quality-per-watt
- Large-model quality with reduced active compute
- 256K context
- Strong coding and reasoning
Limitations
- Needs workstation-class RAM/VRAM for comfortable local inference
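The RAM/VRAM requirement is driven partly by the KV cache, which grows linearly with context length. A generic estimate is sketched below; the layer count, KV-head count, and head dimension are hypothetical placeholders, not published figures for this model.

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GB: 2 (K and V) * layers * KV heads * head dim * tokens * bytes."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_elem / 1e9

# Hypothetical config: 48 layers, 8 KV heads (GQA), head_dim 128, fp16 cache.
full_window = kv_cache_gb(48, 8, 128, 262_144)   # full 256K context
short_chat = kv_cache_gb(48, 8, 128, 8_192)      # typical 8K session
print(round(full_window, 1), round(short_chat, 2))  # → 51.5 1.61
```

Under these assumptions, a full 262,144-token window would dwarf the 24 GB RAM floor, which is why long-context runtimes lean on KV-cache quantization or sliding-window attention; everyday 8K sessions stay well within budget.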
Best use cases
- Advanced assistant
- Agent workflows
- Coding support
- Research and analysis
Benchmarks
Speed: 7/10
Quality: 9/10
Coding: 8/10
Reasoning: 9/10
Technical details
Developer: Google DeepMind
License: Apache 2.0
Context window: 262,144 tokens
Architecture: Gemma 4 Mixture-of-Experts (26B total, ~4B active per token)
Released: 2026-03