Local LLM model page
Gemma 3n (8B)
Google's on-device powerhouse with vision. Designed for phones, tablets, and laptops, yet it punches far above its weight. Per-layer embeddings (PLE) reduce memory use on constrained devices. Gemma license.
Parameters
8B
Minimum RAM
8 GB
Model size
5 GB
Quantization
Q4_K_M
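The "Model size" figure follows roughly from the parameter count and the quantization. A minimal sketch, assuming Q4_K_M averages about 4.8 bits per weight (it mixes 4-bit and 6-bit blocks; the exact effective rate varies by model):

```python
def estimate_gguf_size_gb(params_billions: float, bits_per_weight: float = 4.8) -> float:
    """Rough on-disk size of a quantized GGUF file.

    bits_per_weight=4.8 is an assumed average for Q4_K_M,
    not an official figure; other quants differ.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

print(round(estimate_gguf_size_gb(8), 1))  # 8B parameters -> roughly 5 GB
```

The estimate lands near the 5 GB listed above; actual files are slightly larger because embeddings and some tensors are kept at higher precision.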
Can Gemma 3n (8B) run locally?
Gemma 3n (8B) is best suited for entry-level laptops and desktops. LocalClaw recommends Q4_K_M as the default quantization, with at least 8 GB RAM.
Search term for LM Studio or compatible runtimes: gemma-3n-e8b-it
Hugging Face repository: google/gemma-3n-E8B-it-GGUF
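Putting the identifiers above together, a minimal sketch of fetching and running the model with llama.cpp. The repo name and search term come from this page; the direct `-hf` invocation assumes a recent llama.cpp build, and the exact .gguf filename inside the repo should be confirmed before downloading:

```shell
# Identifiers from this page
REPO="google/gemma-3n-E8B-it-GGUF"
SEARCH_TERM="gemma-3n-e8b-it"

# Recent llama.cpp builds can pull straight from Hugging Face
# (kept commented so this sketch runs without the binary installed):
#   llama-cli -hf "$REPO" -p "Summarize this model in one sentence."

# In LM Studio, paste the search term into the model search box:
echo "Search LM Studio for: $SEARCH_TERM"
```

With 8 GB of RAM, close other large applications before loading; the Q4_K_M weights alone occupy most of that budget.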
Tags: chat, vision, standard, general
Strengths
- Built-in vision capabilities
- Optimized for on-device deployment
- Per-layer embeddings (PLE) reduce memory use on constrained devices
- Strong quality-to-size ratio
- Runs on phones, tablets, and laptops
Limitations
- Gemma license restrictions
- Not optimized for server-side deployment
- Vision capabilities less powerful than dedicated VLMs
Best use cases
- On-device AI assistant
- Mobile vision apps
- Edge computing
- Multimodal chat on laptops
- Embedded AI systems
Benchmarks
Speed: 7/10
Quality: 7/10
Coding: 6/10
Reasoning: 7/10
Technical details
Developer: Google DeepMind
License: Gemma License
Context window: 32,768 tokens
Architecture: Transformer (decoder-only) with per-layer embeddings (PLE) to reduce memory use on constrained devices
Released: 2025-06