
Llama-3.1-Nemotron (70B)

NVIDIA's fine-tune of Llama 3.1 70B, tuned for instruction following and alignment. It ranked #1 on MT-Bench at release and is noted for helpful, safety-conscious responses. Runs on Apple Silicon Macs with 64 GB of unified memory. Released under the NVIDIA Open Model License.

Parameters
70B
Minimum RAM
48 GB
Model size
42 GB
Quantization
Q4_K_M

Can Llama-3.1-Nemotron (70B) run locally?

Llama-3.1-Nemotron (70B) is best suited to high-end workstations or Macs with 64 GB of RAM (unified memory on Apple Silicon). LocalClaw recommends Q4_K_M as the default quantization; at that level the model needs at least 48 GB of RAM.
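A rough way to see why about 42 GB of weights calls for 48 GB at minimum and 64 GB in practice: a GGUF file is approximately parameters × average bits per weight ÷ 8. The sketch below assumes about 70.6 billion parameters for the Llama 3.1 70B base and an average of about 4.8 bits per weight for Q4_K_M (an approximation, since Q4_K_M mixes block types); neither figure comes from this page.

```python
# Rough sizing sketch for Llama-3.1-Nemotron (70B) at Q4_K_M.
# Both constants are assumptions, not values published by NVIDIA or LocalClaw.

PARAMS = 70.6e9     # assumed parameter count of the Llama 3.1 70B base
Q4_K_M_BPW = 4.8    # assumed average bits per weight for Q4_K_M


def weights_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


def headroom_gb(ram_gb: float, weights: float) -> float:
    """Memory left for the OS, the runtime and the KV cache after loading the weights."""
    return ram_gb - weights


if __name__ == "__main__":
    w = weights_gb(PARAMS, Q4_K_M_BPW)
    print(f"Q4_K_M weights: ~{w:.1f} GB")
    for ram in (48, 64):
        print(f"{ram} GB RAM: ~{headroom_gb(ram, w):.1f} GB left over")
```

Under these assumptions a 48 GB machine keeps only about 5.6 GB free after loading the weights, versus about 21.6 GB on a 64 GB machine, which is roughly the gap between the minimum and the recommended configuration.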

Search term for LM Studio or compatible runtimes: llama-3.1-nemotron-70b-instruct

Hugging Face repository: nvidia/Llama-3.1-Nemotron-70B-Instruct-GGUF


Strengths

  • Tuned for instruction following and alignment.
  • Ranked #1 on MT-Bench at release.
  • Helpful, safety-conscious responses.
  • Runs on Apple Silicon Macs with 64 GB of unified memory.

Limitations

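  • Slow on most hardware: at Q4_K_M the weights alone take about 42 GB, so generation speed is bounded by RAM bandwidth, and a 48 GB machine leaves little room for long contexts. Results also depend on how well the runtime supports the chosen quantization.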
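Memory bandwidth matters because of a common rule of thumb: when a dense model generates text, it reads roughly all of its weights once per token, so bandwidth divided by the weight size gives an upper bound on tokens per second. The bandwidth figures in this sketch are illustrative assumptions, not measurements, and real throughput will be lower because of compute, KV-cache reads and runtime overhead.

```python
# Upper-bound decode speed for a dense, memory-bandwidth-bound model.
# Assumption: every generated token streams the full weight file from memory once.

MODEL_GB = 42.0  # Q4_K_M model size listed on this page


def max_tokens_per_second(bandwidth_gb_s: float, model_gb: float = MODEL_GB) -> float:
    """Theoretical ceiling; ignores compute, KV-cache traffic and runtime overhead."""
    return bandwidth_gb_s / model_gb


# Illustrative bandwidth figures (assumed, not benchmarked).
EXAMPLES = {
    "dual-channel DDR5 desktop (~80 GB/s)": 80.0,
    "Apple Silicon Max-class chip (~400 GB/s)": 400.0,
    "Apple Silicon Ultra-class chip (~800 GB/s)": 800.0,
}

if __name__ == "__main__":
    for name, bw in EXAMPLES.items():
        print(f"{name}: <= {max_tokens_per_second(bw):.1f} tokens/s")
```

Even the optimistic ceiling stays in the single or low double digits of tokens per second, which is consistent with the low speed rating below.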

Best use cases

  • chat
  • reasoning
  • power
  • quality
  • general

Benchmarks

Speed: 3/10

Quality: 9/10

Coding: 8/10

Reasoning: 9/10

Technical details

Developer: NVIDIA

License: See model repository

Context window: 128K tokens (inherited from Llama 3.1)

Architecture: Llama 3.1 (dense decoder-only transformer)

Released: 2024-11
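The context length you can actually use locally is limited by the KV cache, which grows linearly with the number of tokens held in context. This sketch assumes the Llama 3.1 70B attention layout (80 layers, 8 key/value heads from grouped-query attention, head dimension 128) and an unquantized 16-bit cache; runtimes that quantize the cache will need less.

```python
# KV-cache size estimate. Architecture values are assumptions for Llama 3.1 70B.

N_LAYERS = 80
N_KV_HEADS = 8       # grouped-query attention
HEAD_DIM = 128
BYTES_PER_VALUE = 2  # fp16/bf16 cache, no cache quantization


def kv_cache_bytes(n_tokens: int) -> int:
    """Bytes needed to hold keys and values for n_tokens of context."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE  # 2 = K and V
    return per_token * n_tokens


if __name__ == "__main__":
    for ctx in (8_192, 32_768, 131_072):
        gib = kv_cache_bytes(ctx) / 2**30
        print(f"{ctx:>7} tokens: {gib:.1f} GiB")
```

Under these assumptions an 8K-token context costs about 2.5 GiB and the full 128K-token window of Llama 3.1 about 40 GiB, so on top of roughly 42 GB of Q4_K_M weights, the full window does not fit even in 64 GB without cache quantization.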