Local LLM model page

Llama-3.1-Nemotron (70B)

An NVIDIA fine-tune of Llama 3.1 70B focused on instruction following, alignment, helpfulness and safety. It ranked #1 on MT-Bench at release. It runs on Apple Silicon Macs with 64 GB of memory and is distributed under the NVIDIA Open Model License.

Parameters: 70B

Minimum RAM: 48 GB

Model size: 42 GB

Quantization: Q4_K_M

Can Llama-3.1-Nemotron (70B) run locally?

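Yes, on high-end hardware. At the default Q4_K_M quantization recommended by LocalClaw, the model file is about 42 GB, so 48 GB of RAM is the minimum and leaves roughly 6 GB for the operating system, the runtime and the context cache. A workstation with 64 GB of RAM is the better fit.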
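The sizing figures can be sanity-checked with quick arithmetic. The sketch below is illustrative only: it assumes Q4_K_M averages roughly 4.8 bits per weight (a rough figure that is not stated on this page), and the function names are hypothetical.

```python
def estimate_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate model file size in GB (10^9 bytes) from parameter count."""
    return params_billion * bits_per_weight / 8


def headroom_gb(ram_gb: float, model_gb: float) -> float:
    """RAM left over after loading the weights; negative means it will not fit."""
    return ram_gb - model_gb


if __name__ == "__main__":
    # Assumption: ~4.8 bits per weight on average for Q4_K_M.
    estimated = estimate_size_gb(70, 4.8)
    print(f"Estimated Q4_K_M size: {estimated:.1f} GB")

    model_gb = 42.0  # model size listed on this page
    for ram_gb in (32, 48, 64):
        spare = headroom_gb(ram_gb, model_gb)
        verdict = "fits" if spare > 0 else "does not fit"
        print(f"{ram_gb} GB RAM: {verdict} ({spare:+.0f} GB headroom)")
```

Under that assumption the estimate lands close to the listed 42 GB. The headroom figure matters because the context cache and the operating system also need memory alongside the weights.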

Search term for LM Studio or compatible runtimes: llama-3.1-nemotron-70b-instruct

Hugging Face repository: nvidia/Llama-3.1-Nemotron-70B-Instruct-GGUF
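For runtimes that load a local GGUF file, the repository can also be fetched from a script. This is a minimal sketch, assuming the `huggingface_hub` Python package is installed and that the repository stores its quantizations as `.gguf` files with the quantization name in the filename. The helper names are hypothetical, and the actual file names in the repository are not known from this page.

```python
from typing import Iterable, List


def pick_quant_files(filenames: Iterable[str], quant: str = "Q4_K_M") -> List[str]:
    """Return the .gguf files whose name contains the quantization tag."""
    tag = quant.lower()
    return sorted(
        name for name in filenames
        if name.lower().endswith(".gguf") and tag in name.lower()
    )


def download_quant(repo_id: str, quant: str = "Q4_K_M") -> List[str]:
    """Download every matching GGUF file and return the local paths."""
    # Imported here so pick_quant_files works without the package installed.
    from huggingface_hub import hf_hub_download, list_repo_files

    matches = pick_quant_files(list_repo_files(repo_id), quant)
    if not matches:
        raise FileNotFoundError(f"No {quant} GGUF files found in {repo_id}")
    # Large models are sometimes split into several parts, so fetch all matches.
    return [hf_hub_download(repo_id=repo_id, filename=name) for name in matches]
```

Calling `download_quant("nvidia/Llama-3.1-Nemotron-70B-Instruct-GGUF")` would then download roughly 42 GB, so check free disk space first.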

Strengths

NVIDIA fine-tune of Llama 3.1 70B
Best-in-class instruction following and alignment
Ranked #1 on MT-Bench at release
Exceptional helpfulness and safety
Compatible with Mac (Apple Silicon, 64 GB)
NVIDIA Open Model License

Limitations

Performance depends heavily on quantization, RAM bandwidth and runtime support.
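To make the bandwidth dependence concrete: a common rule of thumb is that generating each token requires reading roughly all of the weights from memory once, so memory bandwidth divided by model size gives an upper bound on generation speed. The sketch below applies that rule to the 42 GB model size. The bandwidth values are illustrative assumptions rather than measurements, and real throughput will be lower.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on decode speed when generation is memory-bandwidth bound."""
    if model_gb <= 0:
        raise ValueError("model_gb must be positive")
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    model_gb = 42.0  # Q4_K_M model size listed on this page

    # Hypothetical systems; the bandwidth numbers are assumptions.
    example_bandwidths = {
        "dual-channel desktop RAM (assumed 80 GB/s)": 80.0,
        "high-bandwidth unified memory (assumed 400 GB/s)": 400.0,
    }
    for label, bandwidth in example_bandwidths.items():
        bound = max_tokens_per_second(bandwidth, model_gb)
        print(f"{label}: at most {bound:.1f} tokens/s")
```

A smaller quantization shrinks the file and raises this ceiling, at some cost in output quality.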

Best use cases

chat
reasoning
power
quality
general

Benchmarks

Speed: 3/10

Quality: 9/10

Coding: 8/10

Reasoning: 9/10

Technical details

Developer: NVIDIA

License: NVIDIA Open Model License (see the model repository for the full terms)

Context window: Not listed. The Llama 3.1 base model supports up to 128K tokens; check the model card and your runtime settings.

Architecture: See model card

Released: 2024-10