Local LLM model page

Granite 3.3 (2B Instruct)

IBM's ultra-efficient 2B-parameter model. Best-in-class among small models for tool calling and structured output. Well suited to on-device RAG and agents. 128K context window. Apache 2.0 license.

Parameters
2B
Minimum RAM
4 GB
Model size
1.4 GB
Quantization
Q5_K_M

Can Granite 3.3 (2B Instruct) run locally?

Granite 3.3 (2B Instruct) is best suited for entry-level laptops and desktops. LocalClaw recommends Q5_K_M as the default quantization, with at least 4 GB RAM.

Search term for LM Studio or compatible runtimes: granite-3.3-2b-instruct

Hugging Face repository: ibm-granite/granite-3.3-2b-instruct-GGUF
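
If you prefer a scriptable runtime over LM Studio, llama-cpp-python can pull the GGUF directly from the repository above. The snippet below is a minimal sketch rather than a verified recipe: the Q5_K_M filename pattern and the reduced 8K context (chosen to stay near the 4 GB RAM guideline) are assumptions, not values confirmed on this page.

    # Minimal sketch (untested): run the Q5_K_M GGUF with llama-cpp-python.
    # Requires `pip install llama-cpp-python huggingface_hub`.
    from llama_cpp import Llama

    llm = Llama.from_pretrained(
        repo_id="ibm-granite/granite-3.3-2b-instruct-GGUF",
        filename="*Q5_K_M.gguf",   # assumed naming pattern for the Q5_K_M file
        n_ctx=8192,                # well under the 128K maximum, to stay near 4 GB RAM
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain what a GGUF file is in one sentence."}],
        max_tokens=128,
    )
    print(out["choices"][0]["message"]["content"])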


Strengths

  • Ultra-efficient 2B-parameter model from IBM that fits on entry-level hardware.
  • Best-in-class among small models for tool calling and structured output (see the sketch after this list).
  • Well suited to on-device RAG and agent workloads.
  • 128K context window.
  • Permissive Apache 2.0 license.
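
One way to exercise the tool-calling strength is through an OpenAI-compatible local server such as the one LM Studio provides. The sketch below is illustrative only: the localhost:1234 endpoint is LM Studio's usual default, the model identifier mirrors the search term above, and the get_weather tool is a hypothetical example.

    # Rough sketch (assumptions flagged): exercise tool calling against a local
    # OpenAI-compatible server. Endpoint and model id are assumptions; the
    # get_weather tool is purely illustrative.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="granite-3.3-2b-instruct",  # assumed local model id
        messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
        tools=tools,
    )

    # If tool calling works, the reply carries a structured tool call
    # rather than free text.
    print(resp.choices[0].message.tool_calls)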

Limitations

  • Performance depends heavily on quantization, RAM bandwidth and runtime support.

Best use cases

  • Chat and general assistance
  • Lightweight, low-resource deployments
  • Edge and on-device use
  • Speed-sensitive workloads
  • Code assistance

Benchmarks

Speed: 10/10

Quality: 6/10

Coding: 6/10

Reasoning: 5/10

Technical details

Developer: IBM (Granite team)

License: Apache 2.0

Context window: 128K tokens

Architecture: Decoder-only transformer (see model card for details)

Released: 2025-10