Local LLM model page

Gemma 3n (8B)

Google's on-device powerhouse with vision. Designed for phones, tablets, and laptops, yet it punches far above its weight, with per-layer memory management for constrained devices. Distributed under the Gemma license.

Parameters
8B
Minimum RAM
8 GB
Model size
5 GB
Quantization
Q4_K_M

Can Gemma 3n (8B) run locally?

Gemma 3n (8B) is best suited for entry-level laptops and desktops. LocalClaw recommends Q4_K_M as the default quantization, with at least 8 GB RAM.
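As a sanity check, the listed figures line up with the usual rule of thumb for GGUF files: size ≈ parameter count × average bits per weight. A minimal sketch (the ~4.85 bits/weight figure for Q4_K_M and the 1.5× runtime overhead factor are rough approximations, not official numbers):

```python
def gguf_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough GGUF file size in GB: parameters x average bits per weight."""
    return params_billions * bits_per_weight / 8

def min_ram_gb(model_gb: float, overhead: float = 1.5) -> float:
    """Crude RAM estimate: model weights plus context/runtime overhead."""
    return model_gb * overhead

model = gguf_size_gb(8, 4.85)  # Q4_K_M averages roughly 4.8-4.9 bits/weight
print(f"model file: ~{model:.1f} GB")                 # ~4.8 GB, close to the listed 5 GB
print(f"suggested RAM: ~{min_ram_gb(model):.0f} GB")  # ~7 GB, under the 8 GB minimum
```

Lower quantizations (Q3, Q2) shrink the file further but cost quality; with only 8 GB of total RAM, Q4_K_M is a reasonable ceiling.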

Search term for LM Studio or compatible runtimes: gemma-3n-e8b-it

Hugging Face repository: google/gemma-3n-E8B-it-GGUF

Tags: chat · vision · standard · general

Strengths

  • Built-in vision capabilities
  • Optimized for on-device deployment
  • Per-layer memory management for constrained devices
  • Strong quality-to-size ratio
  • Runs on phones, tablets, and laptops

Limitations

  • Gemma license restrictions
  • Not optimized for server-side deployment
  • Vision capabilities less powerful than dedicated VLMs

Best use cases

  • On-device AI assistant
  • Mobile vision apps
  • Edge computing
  • Multimodal chat on laptops
  • Embedded AI systems

Benchmarks

Speed: 7/10

Quality: 7/10

Coding: 6/10

Reasoning: 7/10

Technical details

Developer: Google DeepMind

License: Gemma License

Context window: 32,768 tokens
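The 32,768-token window also has a memory cost: the KV cache grows linearly with context length. A rough sketch of the arithmetic (the layer count, KV-head count, and head dimension below are illustrative placeholders, not Gemma 3n's actual dimensions):

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                ctx_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache: 2 (K and V) x layers x kv_heads x head_dim x tokens x dtype bytes."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 1e9

# Hypothetical dimensions, for illustration only
print(f"~{kv_cache_gb(30, 8, 128, 32_768):.1f} GB at full context")
```

This is why filling the entire window can rival the quantized weights themselves in memory; most runtimes default to a shorter context or quantize the KV cache.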

Architecture: Transformer (decoder-only) with per-layer embeddings (PLE) to reduce memory on constrained devices

Released: 2025-06