Gemma 4 31B

The largest Gemma 4 model, offering premium quality for local deployment. Strong coding and reasoning with a 256K context window and broad multilingual support. Apache 2.0.

Parameters: 31B

Minimum RAM: 32 GB

Model size: 19 GB

Quantization: Q4_K_M

Can Gemma 4 31B run locally?

Gemma 4 31B is best suited to power-user machines. LocalClaw recommends Q4_K_M as the default quantization, which brings the weights to roughly 19 GB, with at least 32 GB of system RAM for headroom.
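
As a rough sanity check, the 19 GB model size follows directly from the quantization bit-width. Below is a minimal sketch of the arithmetic in Python, assuming an average of about 4.9 bits per weight for Q4_K_M (the exact per-tensor figure varies):

    # Rough weight-memory estimate for a Q4_K_M quantization.
    # Assumption: ~4.9 bits per weight on average (varies by tensor type).
    params = 31e9                 # 31B parameters
    bits_per_weight = 4.9

    weights_gb = params * bits_per_weight / 8 / 1e9
    print(f"weights: ~{weights_gb:.0f} GB")  # ~19 GB, matching the listed model size

    # The 32 GB RAM minimum leaves headroom for the KV cache (which grows
    # with context length) plus runtime and operating-system overhead.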

Search term for LM Studio or compatible runtimes: gemma-4-31b-it
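
Once the model is loaded, LM Studio serves an OpenAI-compatible API on localhost (port 1234 by default). The sketch below queries it from Python using the openai client package; the model identifier is an assumption and should match whatever name your runtime reports:

    from openai import OpenAI

    # LM Studio's local server speaks the OpenAI API; the key is required
    # by the client but not checked by the server.
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    response = client.chat.completions.create(
        model="gemma-4-31b-it",  # assumed identifier; check your runtime's model list
        messages=[{"role": "user", "content": "Explain Q4_K_M quantization briefly."}],
        temperature=0.7,
    )
    print(response.choices[0].message.content)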

Hugging Face repository: google/gemma-4-31B-it
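
To fetch the weights directly from Hugging Face instead, the sketch below uses huggingface_hub. Google's model repositories are typically gated, so accepting the license on the repository page and authenticating (for example via huggingface-cli login) may be required first:

    from huggingface_hub import snapshot_download

    # Downloads the full repository listed above into the local HF cache
    # and returns the path to the downloaded snapshot.
    local_path = snapshot_download(repo_id="google/gemma-4-31B-it")
    print(local_path)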

Tags: chat, code, reasoning, quality, multimodal, general

Strengths

  • Highest quality in Gemma 4 family
  • Strong coding + reasoning
  • 256K context for long documents
  • Multimodal support

Limitations

  • Requires high-end local hardware
  • Heavier inference cost than the smaller E2B/E4B variants

Best use cases

  • Premium local assistant
  • Complex coding tasks
  • Long-context research
  • Multimodal enterprise workflows

Benchmarks

Speed: 5/10

Quality: 9/10

Coding: 9/10

Reasoning: 9/10

Technical details

Developer: Google DeepMind

License: Apache 2.0

Context window: 262,144 tokens

Architecture: Gemma 4 dense high-capacity multimodal Transformer (31B)

Released: 2026-03