Local LLM model page

Gemma 2 (9B)

Google's second-generation Gemma model. Excellent quality-to-size ratio, with 8.1M downloads; a great all-around model.

Parameters
9B
Minimum RAM
8 GB
Model size
5.5 GB
Quantization
Q5_K_M

Can Gemma 2 (9B) run locally?

Yes. Gemma 2 (9B) is well suited to entry-level laptops and desktops. LocalClaw recommends Q5_K_M as the default quantization, with at least 8 GB of RAM.
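The relationship between parameter count, quantization, and file size is simple arithmetic. The sketch below estimates a quantized model's on-disk size; the parameter count (~9.24B for Gemma 2 9B) and the effective bits per weight for Q5_K_M (~5.7) are ballpark assumptions, not figures from this page, and real GGUF files vary because some tensors use different quant types.

```python
# Rough size estimate for a quantized model file.
# Assumptions (not from this page): ~9.24e9 weights for Gemma 2 9B,
# ~5.7 effective bits per weight for Q5_K_M.

def estimate_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate on-disk size in GB: params * bits / 8 bits per byte."""
    return n_params * bits_per_weight / 8 / 1e9

size = estimate_size_gb(9.24e9, 5.7)  # ~6.6 GB by this formula
```

Runtime RAM use is higher than the file size alone: the runtime also allocates a KV cache (which grows with context length) plus some overhead, which is why an 8 GB minimum is quoted for a ~5.5 GB file.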

Search term for LM Studio or compatible runtimes: gemma-2-9b-it

Hugging Face repository: lmstudio-community/gemma-2-9B-it-GGUF
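If you want to fetch the file outside LM Studio, Hugging Face serves raw files at a predictable `resolve` URL. A minimal sketch, assuming a filename that follows the repo's usual naming convention (verify against the repo's file list before relying on it):

```python
# Sketch: build a direct download URL for a GGUF file on Hugging Face.
REPO = "lmstudio-community/gemma-2-9B-it-GGUF"
FILENAME = "gemma-2-9b-it-Q5_K_M.gguf"  # assumed filename; check the repo

def hf_resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Hugging Face serves raw files at /{repo}/resolve/{revision}/{file}."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

url = hf_resolve_url(REPO, FILENAME)
```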

Tags: chat, code, standard, general

Strengths

  • Excellent quality-to-size ratio
  • 8.1M+ downloads; battle-tested
  • Strong reasoning for 9B
  • Good coding abilities

Limitations

  • Only 8K context window
  • Gemma license more restrictive than Apache/MIT
  • Not the best for non-English tasks

Best use cases

  • General chat assistant
  • Code completion
  • Content writing
  • Summarization
  • Light reasoning tasks

Benchmarks

Speed: 8/10

Quality: 7/10

Coding: 7/10

Reasoning: 7/10

Technical details

Developer: Google DeepMind

License: Gemma License

Context window: 8,192 tokens

Architecture: Transformer (decoder-only) with sliding window attention

Released: 2024-06
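The sliding window attention mentioned in the architecture line restricts each token to attending only to a fixed number of recent positions. A minimal sketch of such a mask, using a window of 4 for illustration (Gemma 2's actual local window is much larger, and the model interleaves local and global attention layers):

```python
# Sliding-window causal attention mask (illustrative toy window of 4).
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query i may attend to key j:
    causal (j <= i) and within the last `window` positions."""
    return [[(i - window < j <= i) for j in range(seq_len)]
            for i in range(seq_len)]

mask = sliding_window_mask(6, 4)
# Row 5 attends only to positions 2..5, not the whole prefix,
# which keeps attention cost linear in window size rather than sequence length.
```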