Local LLM model page
Granite 3.3 (2B Instruct)
IBM's ultra-efficient 2B model. Best-in-class among small models for tool calling and structured output. Perfect for on-device RAG and agents. 128K context window. Apache 2.0 license.
Parameters
2B
Minimum RAM
4 GB
Model size
1.4 GB
Quantization
Q5_K_M
Can Granite 3.3 (2B Instruct) run locally?
Yes. Granite 3.3 (2B Instruct) runs comfortably even on entry-level laptops and desktops. LocalClaw recommends Q5_K_M as the default quantization, with at least 4 GB of RAM.
Search term for LM Studio or compatible runtimes: granite-3.3-2b-instruct
Hugging Face repository: ibm-granite/granite-3.3-2b-instruct-GGUF
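A minimal sketch of running the model from Python, assuming the huggingface_hub and llama-cpp-python packages are installed; the exact GGUF filename below is an assumption, so check the repository's file list:

    from huggingface_hub import hf_hub_download
    from llama_cpp import Llama

    # Fetch the Q5_K_M build from the repository listed above.
    # NOTE: the filename is assumed; verify it against the repo's file list.
    model_path = hf_hub_download(
        repo_id="ibm-granite/granite-3.3-2b-instruct-GGUF",
        filename="granite-3.3-2b-instruct-Q5_K_M.gguf",
    )

    # A modest context size keeps memory use near the 4 GB floor;
    # raise n_ctx toward the 128K limit as RAM allows.
    llm = Llama(model_path=model_path, n_ctx=8192)

    reply = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Name three good uses for a 2B local model."}]
    )
    print(reply["choices"][0]["message"]["content"])

LM Studio users can skip the script and simply search for granite-3.3-2b-instruct in the app instead.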
Strengths
- Ultra-efficient 2B model from IBM that runs on modest hardware
- Best-in-class among small models for tool calling and structured output (see the sketch below and the structured-output sketch after the best-use-case list)
- Perfect for on-device RAG and agents
- 128K context window
- Permissive Apache 2.0 license
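As a hedged illustration of the tool-calling strength, the sketch below uses the openai Python client against an OpenAI-compatible local server; the base URL (LM Studio's default port) and the model identifier are assumptions to adjust for your runtime:

    import json
    from openai import OpenAI

    # Assumed local OpenAI-compatible endpoint (e.g. LM Studio's server).
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

    # A hypothetical tool definition, used only for illustration.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="granite-3.3-2b-instruct",
        messages=[{"role": "user", "content": "What's the weather in Oslo right now?"}],
        tools=tools,
    )

    # When the model opts to call the tool, the arguments arrive as JSON text.
    for call in resp.choices[0].message.tool_calls or []:
        print(call.function.name, json.loads(call.function.arguments))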
Limitations
- Performance depends heavily on quantization level, RAM bandwidth, and runtime support.
- As a 2B model, it trails larger models on deep reasoning and complex coding tasks (see the benchmarks below).
Best use cases
- Chat
- Lightweight, low-resource deployments
- Edge and on-device use
- Speed-critical tasks
- Code assistance
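The structured-output strength listed above can be exercised the same way. Below is a sketch that requests schema-constrained JSON, assuming the same local OpenAI-compatible endpoint and that the runtime honors the OpenAI-style response_format parameter:

    from openai import OpenAI

    # Same assumed local endpoint and model name as in the tool-calling sketch.
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

    # Illustrative schema: extract a contact as strict JSON.
    schema = {
        "name": "contact",
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string"},
            },
            "required": ["name", "email"],
        },
    }

    resp = client.chat.completions.create(
        model="granite-3.3-2b-instruct",
        messages=[{"role": "user", "content": "Extract the contact: 'Reach Ada at ada@example.org.'"}],
        response_format={"type": "json_schema", "json_schema": schema},
    )
    print(resp.choices[0].message.content)  # a JSON string matching the schema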
Benchmarks
Speed: 10/10
Quality: 6/10
Coding: 6/10
Reasoning: 5/10
Technical details
Developer: IBM
License: Apache 2.0
Context window: 128K tokens
Architecture: See model card
Released: 2025-10