Local LLM model page

Gemma 3n (8B)

Google's on-device powerhouse with vision. Designed for phones, tablets, and laptops, yet it punches far above its weight, with per-layer memory management for constrained devices. Distributed under the Gemma license.

Parameters
8B
Minimum RAM
8 GB
Model size
5 GB
Quantization
Q4_K_M

Can Gemma 3n (8B) run locally?

Gemma 3n (8B) is best suited for entry-level laptops and desktops. LocalClaw recommends Q4_K_M as the default quantization, with at least 8 GB RAM.
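As a sanity check, the listed figures line up with the usual rule of thumb for GGUF files: size ≈ parameter count × average bits per weight. A minimal sketch (the ~4.85 bits/weight figure for Q4_K_M and the 1.5× runtime overhead factor are rough approximations, not official numbers):

```python
def gguf_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough GGUF file size in GB: parameters x average bits per weight."""
    return params_billions * bits_per_weight / 8

def min_ram_gb(model_gb: float, overhead: float = 1.5) -> float:
    """Crude RAM estimate: model weights plus context/runtime overhead."""
    return model_gb * overhead

model = gguf_size_gb(8, 4.85)  # Q4_K_M averages roughly 4.8-4.9 bits/weight
print(f"model file: ~{model:.1f} GB")                 # ~4.8 GB, close to the listed 5 GB
print(f"suggested RAM: ~{min_ram_gb(model):.0f} GB")  # ~7 GB, under the 8 GB minimum
```

Lower quantizations (Q3, Q2) shrink the file further but cost quality; with only 8 GB of total RAM, Q4_K_M is a reasonable ceiling.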

Search term for LM Studio or compatible runtimes: gemma-3n-e8b-it

Hugging Face repository: google/gemma-3n-E8B-it-GGUF

Tags: chat · vision · standard · general

Strengths

  • Built-in vision capabilities
  • Optimized for on-device deployment
  • Per-layer memory management for constrained devices
  • Strong quality-to-size ratio
  • Runs on phones, tablets, and laptops

Limitations

  • Gemma license restrictions
  • Not optimized for server-side deployment
  • Vision capabilities less powerful than dedicated VLMs

Best use cases

  • On-device AI assistant
  • Mobile vision apps
  • Edge computing
  • Multimodal chat on laptops
  • Embedded AI systems

Benchmarks

Speed: 7/10

Quality: 7/10

Coding: 6/10

Reasoning: 7/10

Technical details

Developer: Google DeepMind

License: Gemma License

Context window: 32,768 tokens
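The 32,768-token window also has a memory cost: the KV cache grows linearly with context length. A rough sketch of the arithmetic (the layer count, KV-head count, and head dimension below are illustrative placeholders, not Gemma 3n's actual dimensions):

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                ctx_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache: 2 (K and V) x layers x kv_heads x head_dim x tokens x dtype bytes."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 1e9

# Hypothetical dimensions, for illustration only
print(f"~{kv_cache_gb(30, 8, 128, 32_768):.1f} GB at full context")
```

This is why filling the entire window can rival the quantized weights themselves in memory; most runtimes default to a shorter context or quantize the KV cache.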

Architecture: Transformer (decoder-only) with per-layer embeddings (PLE) to reduce memory on constrained devices

Released: 2025-06