Local LLM model page

Gemma 4 E2B

Gemma 4 E2B is a compact multimodal Gemma 4 model for on-device use. It supports text, image, audio, and video understanding with a 256K context window, and is released under the Apache 2.0 license.

Parameters: E2B

Minimum RAM: 6 GB

Model size: 2.3 GB

Quantization: Q5_K_M

Can Gemma 4 E2B run locally?

Yes. Gemma 4 E2B is best suited to entry-level laptops and desktops. LocalClaw recommends the Q5_K_M quantization as the default, with at least 6 GB of RAM.
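To see why 6 GB is a sensible floor, here is a back-of-envelope sketch of the model's memory footprint: quantized weights plus KV cache plus runtime overhead. The model-file size comes from this page; the per-token KV cost and overhead figures are rough assumptions for illustration, not published numbers for Gemma 4 E2B.

```python
# Back-of-envelope RAM estimate: weights + KV cache + fixed overhead.
MODEL_FILE_GB = 2.3              # Q5_K_M download size from this page
KV_COST_MB_PER_1K_TOKENS = 40    # assumed; varies with architecture and cache dtype
RUNTIME_OVERHEAD_GB = 1.0        # assumed headroom for the runtime and OS buffers

def estimated_ram_gb(context_tokens: int) -> float:
    """Rough total RAM needed to serve the model at a given context length."""
    kv_gb = context_tokens / 1000 * KV_COST_MB_PER_1K_TOKENS / 1024
    return MODEL_FILE_GB + kv_gb + RUNTIME_OVERHEAD_GB

for ctx in (8_000, 32_000, 262_144):
    print(f"{ctx:>7} tokens ~= {estimated_ram_gb(ctx):.1f} GB")
```

Under these assumptions, 6 GB comfortably covers short-to-medium contexts, while pushing toward the full 256K window requires substantially more memory.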

Search term for LM Studio or compatible runtimes: gemma-4-e2b-it

Hugging Face repository: google/gemma-4-E2B-it
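Once the model is loaded in a runtime such as LM Studio, it is typically served through an OpenAI-compatible local HTTP endpoint. The sketch below shows one way to query it from Python; the base URL (LM Studio's default port) and the model name are assumptions, so adjust them to your runtime's settings.

```python
import json
import urllib.request

# Assumed endpoint: LM Studio's default local server address.
BASE_URL = "http://localhost:1234/v1/chat/completions"

def build_request(prompt: str, model: str = "gemma-4-e2b-it") -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def ask(prompt: str) -> str:
    """POST the prompt to the local server and return the reply text."""
    payload = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        BASE_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# With a server running locally:
# reply = ask("Summarize why small models suit on-device use.")
```

Any client that speaks the OpenAI chat-completions format should work the same way against this endpoint.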

Tags: chat, vision, speed, edge, multimodal, general

Strengths

  • Designed for edge/mobile hardware
  • Native multimodal understanding
  • 256K context window
  • Open Apache 2.0 license

Limitations

  • Lower quality ceiling than larger Gemma 4 variants
  • Best reserved for lightweight to mid-complexity tasks

Best use cases

  • On-device assistant
  • Multimodal mobile apps
  • Quick reasoning and summarization
  • Low-power deployment

Benchmarks

Speed: 9/10

Quality: 6/10

Coding: 5/10

Reasoning: 6/10

Technical details

Developer: Google DeepMind

License: Apache 2.0

Context window: 262,144 tokens

Architecture: Gemma 4 multimodal Transformer (edge tier)

Released: 2026-03