Local LLM model page

Nemotron 3 Nano (4B)

⭐ Mac Mini M4 16GB top pick! NVIDIA's hybrid model, distilled from the 9B variant and retaining roughly 95% of its quality. Hybrid attention + SSM layers reach ~80–120 tok/s on Apple Silicon. Blazing fast, minimal RAM. NVIDIA Open Model License.

Parameters: 4B
Minimum RAM: 6 GB
Model size: 2.8 GB
Quantization: Q5_K_M
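The 2.8 GB download size follows from the quantization: Q5_K_M averages roughly 5.7 bits per weight in llama.cpp (an approximate figure, not an exact spec). A quick back-of-envelope sketch:

```python
def gguf_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough GGUF file size: parameter count x average bits per weight,
    in decimal GB. Ignores metadata overhead, which is small."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# Q5_K_M averages roughly 5.68 bits/weight (approximation)
size = gguf_size_gb(4.0, 5.68)
print(f"{size:.1f} GB")  # ~2.8 GB, matching the listed model size
```

The same arithmetic explains why the model fits comfortably in 6 GB RAM: the weights take under 3 GB, leaving headroom for the KV cache and the runtime.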

Can Nemotron 3 Nano (4B) run locally?

Nemotron 3 Nano (4B) is best suited for entry-level laptops and desktops. LocalClaw recommends Q5_K_M as the default quantization, with at least 6 GB RAM.

Search term for LM Studio or compatible runtimes: nvidia-nemotron-3-nano-4b

Hugging Face repository: nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF
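With llama.cpp installed, the GGUF can be fetched and run straight from the Hugging Face repository. This is a sketch; the exact quant filenames inside the repo may differ, so check the repo's file list:

```shell
# Download and chat with the model directly from Hugging Face.
# llama.cpp resolves the repo and downloads a GGUF file; a specific
# quant can be requested with a :QUANT suffix (naming varies by repo).
llama-cli -hf nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF

# Optionally pin the context size to the model's 4,096-token window:
llama-cli -hf nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF -c 4096
```

In LM Studio, pasting the search term above into the model browser should surface the same repository.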

Tags: chat · lightspeed · reasoning

Strengths

  • ⭐ Top pick for Mac Mini M4 16GB
  • Hybrid architecture (attention + SSM) — very fast on Apple Silicon
  • Distilled from 9B — retains most quality at 4B
  • Only 2.8 GB download — fits in 6GB RAM
  • Exceptional speed/quality ratio for its size
  • GGUF available on HuggingFace (nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF)

Limitations

  • Short 4K context window (not suited for long documents)
  • NVIDIA Open Model License — not fully open-source
  • English only
  • Older architecture compared to 2025 models

Best use cases

  • Fast chat on Mac Mini M4 / MacBook
  • Quick Q&A and summarisation
  • Code assistance for short snippets
  • Edge and offline applications
  • RAG pipelines with short chunks
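With only a 4,096-token window, a RAG prompt budget needs planning: system prompt, retrieved chunks, and the reply all share the same window. A minimal sketch, where the per-chunk and reserve token counts are illustrative assumptions, not values from the model card:

```python
def max_chunks(context_window: int, chunk_tokens: int,
               system_tokens: int, answer_reserve: int) -> int:
    """How many retrieved chunks fit alongside the prompt and the reply."""
    available = context_window - system_tokens - answer_reserve
    return max(0, available // chunk_tokens)

# Illustrative budget for the 4,096-token window:
# 300 tokens of system prompt/instructions, 512 reserved for the
# answer, retrieved chunks of ~256 tokens each.
n = max_chunks(4096, 256, 300, 512)
print(n)  # 12 chunks: (4096 - 300 - 512) // 256
```

Keeping chunks short (here ~256 tokens) is what makes RAG workable at this window size; at 1,000-token chunks the same budget fits only three.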

Benchmarks

Speed: 10/10

Quality: 7/10

Coding: 6/10

Reasoning: 7/10

Technical details

Developer: NVIDIA

License: NVIDIA Open Model License

Context window: 4,096 tokens

Architecture: Hybrid Transformer + SSM (Mamba-style layers) — distilled from 9B

Released: 2025-02