Local LLM model page

Gemma 4 E2B

Gemma 4 E2B is a compact multimodal Gemma 4 model for on-device use. It supports text, image, audio, and video understanding with a 256K context window, and is released under the Apache 2.0 license.

Parameters: E2B

Minimum RAM: 6 GB

Model size: 2.3 GB

Quantization: Q5_K_M

Can Gemma 4 E2B run locally?

Yes. Gemma 4 E2B is best suited to entry-level laptops and desktops. LocalClaw recommends the Q5_K_M quantization as the default, with at least 6 GB of RAM.
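To see why 6 GB is a sensible floor, here is a back-of-envelope sketch of the model's memory footprint: quantized weights plus KV cache plus runtime overhead. The model-file size comes from this page; the per-token KV cost and overhead figures are rough assumptions for illustration, not published numbers for Gemma 4 E2B.

```python
# Back-of-envelope RAM estimate: weights + KV cache + fixed overhead.
MODEL_FILE_GB = 2.3              # Q5_K_M download size from this page
KV_COST_MB_PER_1K_TOKENS = 40    # assumed; varies with architecture and cache dtype
RUNTIME_OVERHEAD_GB = 1.0        # assumed headroom for the runtime and OS buffers

def estimated_ram_gb(context_tokens: int) -> float:
    """Rough total RAM needed to serve the model at a given context length."""
    kv_gb = context_tokens / 1000 * KV_COST_MB_PER_1K_TOKENS / 1024
    return MODEL_FILE_GB + kv_gb + RUNTIME_OVERHEAD_GB

for ctx in (8_000, 32_000, 262_144):
    print(f"{ctx:>7} tokens ~= {estimated_ram_gb(ctx):.1f} GB")
```

Under these assumptions, 6 GB comfortably covers short-to-medium contexts, while pushing toward the full 256K window requires substantially more memory.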

Search term for LM Studio or compatible runtimes: gemma-4-e2b-it

Hugging Face repository: google/gemma-4-E2B-it
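Once the model is loaded in a runtime such as LM Studio, it is typically served through an OpenAI-compatible local HTTP endpoint. The sketch below shows one way to query it from Python; the base URL (LM Studio's default port) and the model name are assumptions, so adjust them to your runtime's settings.

```python
import json
import urllib.request

# Assumed endpoint: LM Studio's default local server address.
BASE_URL = "http://localhost:1234/v1/chat/completions"

def build_request(prompt: str, model: str = "gemma-4-e2b-it") -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def ask(prompt: str) -> str:
    """POST the prompt to the local server and return the reply text."""
    payload = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        BASE_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# With a server running locally:
# reply = ask("Summarize why small models suit on-device use.")
```

Any client that speaks the OpenAI chat-completions format should work the same way against this endpoint.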

Tags: chat, vision, speed, edge, multimodal, general

Strengths

  • Designed for edge/mobile hardware
  • Native multimodal understanding
  • 256K context window
  • Open Apache 2.0 license

Limitations

  • Lower quality ceiling than larger Gemma 4 variants
  • Best reserved for lightweight to mid-complexity tasks

Best use cases

  • On-device assistant
  • Multimodal mobile apps
  • Quick reasoning and summarization
  • Low-power deployment

Benchmarks

Speed: 9/10

Quality: 6/10

Coding: 5/10

Reasoning: 6/10

Technical details

Developer: Google DeepMind

License: Apache 2.0

Context window: 262,144 tokens

Architecture: Gemma 4 multimodal Transformer (edge tier)

Released: 2026-03