Local LLM model page

Hermes 4 (405B)

Nous Research flagship 405B with hybrid thinking. Matches Claude 3.5 Sonnet and GPT-4o on reasoning benchmarks. Server-grade hardware only. Llama 3.1 Community License.

Parameters
405B
Minimum RAM
384 GB
Model size
230 GB
Quantization
Q4_K_M

Can Hermes 4 (405B) run locally?

Hermes 4 (405B) is best suited for server-grade or multi-GPU systems. LocalClaw recommends Q4_K_M as the default quantization, with at least 384 GB RAM.
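The 230 GB figure is consistent with Q4_K_M's effective bit rate. A rough sizing sketch, assuming roughly 4.5 bits per weight for Q4_K_M and an illustrative 1.5x headroom factor for KV cache and runtime overhead (both figures are rule-of-thumb assumptions, not from the model card):

```python
def gguf_size_gb(params_b: float, bits_per_weight: float = 4.5) -> float:
    """Approximate GGUF file size in GB for a quantized model.

    params_b: parameter count in billions.
    bits_per_weight: effective bits per weight (Q4_K_M is ~4.5 in practice).
    """
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

file_size = gguf_size_gb(405)   # ~228 GB, close to the listed 230 GB
ram_estimate = file_size * 1.5  # illustrative headroom for KV cache/runtime
```

The same arithmetic explains why a 384 GB RAM floor is reasonable: the weights alone fill well over half of it before context and runtime buffers are counted.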

Search term for LM Studio or compatible runtimes: hermes-4-405b

Hugging Face repository: NousResearch/Hermes-4-Llama-3.1-405B-GGUF


Strengths

  • Hybrid thinking: can interleave explicit reasoning with direct answers
  • Reasoning benchmark results reported as competitive with Claude 3.5 Sonnet and GPT-4o
  • Flagship-scale (405B) general chat quality from Nous Research

Limitations

  • Performance depends heavily on quantization, RAM bandwidth and runtime support.

Best use cases

  • chat
  • reasoning
  • quality
  • general

Benchmarks

Speed: 1/10

Quality: 10/10

Coding: 9/10

Reasoning: 10/10
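The 1/10 speed score follows from a common rule of thumb (an assumption here, not a figure from this page): decode speed on a large dense model is roughly memory-bandwidth bound, because generating each token streams the entire quantized weight file through memory once. A minimal sketch with illustrative numbers:

```python
def est_tokens_per_sec(model_gb: float, bandwidth_gbps: float) -> float:
    """Rough decode-speed ceiling: each token reads all weights once,
    so throughput ~ memory bandwidth / model size."""
    return bandwidth_gbps / model_gb

# Hypothetical server with ~400 GB/s memory bandwidth
# running the 230 GB Q4_K_M file:
speed = est_tokens_per_sec(230, 400)  # under 2 tokens/s
```

At under 2 tokens per second even on strong server hardware, the low speed score is expected for a 405B dense model.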

Technical details

Developer: Nous Research

License: Llama 3.1 Community License

Context window: Unknown

Architecture: Llama 3.1 (dense transformer); see model card for details

Released: 2025-09