Local LLM model page
Gemma 4 E2B
Gemma 4 E2B is a compact multimodal model for on-device use. It supports text, image, audio, and video understanding with a 256K context window and is released under the Apache 2.0 license.
Parameters
E2B
Minimum RAM
6 GB
Model size
2.3 GB
Quantization
Q5_K_M
Can Gemma 4 E2B run locally?
Yes. Gemma 4 E2B is well suited to entry-level laptops and desktops. LocalClaw recommends Q5_K_M as the default quantization, with at least 6 GB of RAM.
Search term for LM Studio or compatible runtimes: gemma-4-e2b-it
Hugging Face repository: google/gemma-4-E2B-it
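As a rough sanity check on the 6 GB figure, the RAM requirement can be estimated as the quantized model file plus KV-cache and runtime overhead. The sketch below is a back-of-the-envelope estimate, not LocalClaw's sizing method; the overhead figures are illustrative assumptions, not measured values for Gemma 4 E2B.

```python
def estimated_ram_gb(model_file_gb: float,
                     kv_cache_gb: float = 1.5,
                     runtime_overhead_gb: float = 1.0) -> float:
    """Rough RAM estimate for running a quantized model locally.

    The quantized weights stay resident in memory, plus a KV-cache
    for the active context and runtime/OS overhead. Both overhead
    defaults are assumptions for illustration only.
    """
    return model_file_gb + kv_cache_gb + runtime_overhead_gb

# 2.3 GB Q5_K_M file -> roughly 4.8 GB, which fits within the
# 6 GB minimum recommended above with some headroom.
print(round(estimated_ram_gb(2.3), 1))
```

Actual usage depends on context length, the runtime, and GPU offloading, so treat the 6 GB minimum as the safer figure.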
Tags: chat, vision, speed, edge, multimodal, general
Strengths
- Designed for edge/mobile hardware
- Native multimodal understanding
- 256K context window
- Open Apache 2.0 license
Limitations
- Lower quality ceiling than larger Gemma 4 variants
- Best for lightweight to mid-complexity tasks
Best use cases
- On-device assistant
- Multimodal mobile apps
- Quick reasoning and summarization
- Low-power deployment
Benchmarks
Speed: 9/10
Quality: 6/10
Coding: 5/10
Reasoning: 6/10
Technical details
Developer: Google DeepMind
License: Apache 2.0
Context window: 262,144 tokens
Architecture: Gemma 4 multimodal Transformer (edge tier)
Released: 2026-03