Local LLM model page
Nemotron Nano 9B v2
NVIDIA's hybrid Mamba-Transformer model with 9B parameters. Up to 6x the throughput of comparable dense models, a 128K-token context window, and strong math and code performance, with efficient toggleable reasoning. Released under the NVIDIA Open Model License.
Parameters
9B
Minimum RAM
10 GB
Model size
5.5 GB
Quantization
Q5_K_M
Can Nemotron Nano 9B v2 run locally?
Nemotron Nano 9B v2 is best suited to mainstream Macs and PCs with 16 GB of RAM. LocalClaw recommends Q5_K_M as the default quantization, which needs at least 10 GB of RAM.
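The RAM figure above can be sanity-checked with a rough back-of-the-envelope calculation: a quantized model's weight file is roughly parameter count times bits per weight, plus runtime overhead for the KV cache, activations and buffers. The bits-per-weight average and overhead below are illustrative assumptions, not exact GGUF accounting.

```python
def estimate_ram_gb(params_billion: float, bits_per_weight: float,
                    overhead_gb: float = 2.0) -> float:
    """Rough RAM estimate for running a quantized model locally.

    weights_gb: parameter count times bits per weight, in gigabytes.
    overhead_gb: assumed headroom for KV cache, activations and the
    runtime itself (a guess, not a measured value).
    """
    weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

# Q5_K_M mixes 5- and 6-bit blocks; ~5 bits/weight is a crude average
# used here for illustration only.
print(round(estimate_ram_gb(9, 5.0), 1))
```

That lands near the listed 5.5 GB file size for the weights alone, and within the 10 GB minimum once operating-system memory is accounted for; actual usage varies by runtime and context length.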
Search term for LM Studio or compatible runtimes: nvidia-nemotron-nano-9b-v2
Hugging Face repository: nvidia/NVIDIA-Nemotron-Nano-9B-v2
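Once the model is loaded in LM Studio or another runtime that exposes an OpenAI-compatible endpoint, reasoning can be toggled per request. The sketch below only builds a chat-completions payload; the local URL, the model identifier, and the `/think` / `/no_think` system-prompt toggle are assumptions to verify against your runtime's settings and the model card.

```python
import json

# Assumed local endpoint; LM Studio's server defaults to port 1234,
# but check your runtime's configuration.
URL = "http://localhost:1234/v1/chat/completions"

def build_payload(prompt: str, reasoning: bool,
                  model: str = "nvidia-nemotron-nano-9b-v2") -> dict:
    """Build an OpenAI-style chat-completions payload.

    Using "/think" vs "/no_think" in the system prompt to toggle
    Nemotron's reasoning mode is an assumption here -- verify against
    the current nvidia/NVIDIA-Nemotron-Nano-9B-v2 model card.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "/think" if reasoning else "/no_think"},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.6,
    }

payload = build_payload("Sum the primes below 20.", reasoning=True)
print(json.dumps(payload, indent=2))
# Send it with urllib.request or the `requests` library once the
# server is running; no request is made in this sketch.
```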
Strengths
- Hybrid Mamba-Transformer architecture delivers up to 6x the throughput of comparable dense models
- 128K-token context window
- Strong math and code performance
- Reasoning can be toggled on or off for efficiency
Limitations
- Performance depends heavily on quantization, RAM bandwidth and runtime support.
Best use cases
- chat
- reasoning
- code
- standard
- general
Benchmarks
Speed: 9/10
Quality: 7/10
Coding: 8/10
Reasoning: 8/10
Technical details
Developer: NVIDIA
License: NVIDIA Open Model License
Context window: 128K tokens
Architecture: Hybrid Mamba-Transformer
Released: 2025-08