Local LLM model page
Llama-3.1-Nemotron (70B)
NVIDIA's fine-tune of Llama 3.1 70B, focused on instruction following and alignment. Ranked #1 on MT-Bench at release, with strong helpfulness and safety. Runs on Apple Silicon Macs with 64 GB of unified memory. Released under the NVIDIA Open Model License.
Parameters
70B
Minimum RAM
48 GB
Model size
42 GB
Quantization
Q4_K_M
Can Llama-3.1-Nemotron (70B) run locally?
Llama-3.1-Nemotron (70B) is best suited for high-end workstations with 64 GB RAM. LocalClaw recommends Q4_K_M as the default quantization, with at least 48 GB RAM.
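As a sanity check on the figures above, a back-of-the-envelope calculation reproduces the listed 42 GB model size. This assumes Q4_K_M averages roughly 4.85 bits per weight, a commonly cited llama.cpp figure; the exact value varies with the tensor mix.

```python
# Rough weights-only size estimate for a quantized GGUF model.
# Assumption: Q4_K_M ~ 4.85 bits per weight on average (llama.cpp rule of thumb).

def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate GGUF file size in decimal GB for the given quantization."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

size = model_size_gb(70, 4.85)
print(f"{size:.1f} GB")  # ~42 GB, matching the listed model size
```

Actual RAM needs run higher than the file size because of the KV cache and runtime overhead, which is why the page recommends at least 48 GB.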
Search term for LM Studio or compatible runtimes: llama-3.1-nemotron-70b-instruct
Hugging Face repository: nvidia/Llama-3.1-Nemotron-70B-Instruct-GGUF
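A minimal sketch of fetching the quantized weights from the repository above and running them with llama.cpp. The exact GGUF filename inside the repo is an assumption here; check the repository's file list for the real Q4_K_M file name.

```shell
# Download only the Q4_K_M file(s) from the Hugging Face repo
# (the --include pattern is an assumed example; verify against the repo).
huggingface-cli download nvidia/Llama-3.1-Nemotron-70B-Instruct-GGUF \
  --include "*Q4_K_M*" --local-dir ./nemotron-70b

# Start an interactive chat with llama.cpp; -ngl 99 offloads all layers
# to the GPU / Metal backend where available.
llama-cli -m ./nemotron-70b/<q4_k_m-file>.gguf -cnv -ngl 99
```

LM Studio users can skip the CLI entirely and use the search term listed above inside the app.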
Tags: chat, reasoning, power, quality, general
Strengths
- Best-in-class instruction following and alignment
- Ranked #1 on MT-Bench at release
- Exceptional helpfulness and safety
- Runs on Apple Silicon Macs with 64 GB of unified memory
Limitations
- Performance depends heavily on quantization level, RAM bandwidth, and runtime support.
Best use cases
- chat
- reasoning
- power
- quality
- general
Benchmarks
Speed: 3/10
Quality: 9/10
Coding: 8/10
Reasoning: 9/10
Technical details
Developer: NVIDIA
License: NVIDIA Open Model License (see model repository)
Context window: 128K tokens (inherited from Llama 3.1)
Architecture: Llama 3.1 70B (decoder-only transformer); see model card
Released: 2024-11