Local LLM model page

Hermes 4 (405B)

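Hermes 4 (405B) is Nous Research's flagship 405B-parameter model with a hybrid thinking mode. This listing rates it as matching Claude 3.5 Sonnet and GPT-4o on reasoning benchmarks. It requires server-grade hardware and is released under the Llama 3.1 Community License.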


Parameters

405B

Minimum RAM

384 GB

Model size

230 GB

Quantization

Q4_K_M

Can Hermes 4 (405B) run locally?

Hermes 4 (405B) is best suited for server-grade or multi-GPU systems. LocalClaw recommends Q4_K_M as the default quantization, with at least 384 GB RAM.
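The figures on this page can be cross-checked with a little arithmetic. The sketch below takes the parameter count, model size, and minimum RAM listed here and derives the average bits stored per weight and the memory left over once the weights are loaded. Treating 1 GB as 10^9 bytes and attributing the leftover memory to KV cache and runtime overhead are assumptions, not statements from the listing; the function names are illustrative.

```python
# Figures taken from this page.
PARAMS = 405e9        # parameter count
MODEL_SIZE_GB = 230   # listed size at Q4_K_M
MIN_RAM_GB = 384      # listed minimum RAM


def effective_bits_per_weight(size_gb: float, params: float) -> float:
    """Average bits stored per parameter (assumes 1 GB = 1e9 bytes)."""
    return size_gb * 1e9 * 8 / params


def fits_in_ram(available_gb: float, required_gb: float = MIN_RAM_GB) -> bool:
    """True if the machine meets the listed minimum RAM."""
    return available_gb >= required_gb


bpw = effective_bits_per_weight(MODEL_SIZE_GB, PARAMS)
# Memory left after loading the weights; assumed to cover KV cache and runtime overhead.
headroom_gb = MIN_RAM_GB - MODEL_SIZE_GB

print(f"~{bpw:.2f} bits per weight, {headroom_gb} GB headroom")
print("512 GB host fits:", fits_in_ram(512))
print("256 GB host fits:", fits_in_ram(256))
```

With the listed numbers this works out to roughly 4.5 bits per weight and 154 GB of headroom, which is why a 256 GB machine falls short even though the weights alone are 230 GB.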

Search term for LM Studio or compatible runtimes: hermes-4-405b

Hugging Face repository: NousResearch/Hermes-4-Llama-3.1-405B-GGUF
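For runtimes that load GGUF files from disk, one way to fetch only the recommended quantization is `snapshot_download` from the `huggingface_hub` package with an `allow_patterns` filter. This is a minimal sketch: it assumes the repository's file names contain the quantization label (for example `Q4_K_M`), which should be confirmed against the repository's file list. The helper names, the local directory, and the sample file names are hypothetical.

```python
from fnmatch import fnmatchcase

REPO_ID = "NousResearch/Hermes-4-Llama-3.1-405B-GGUF"  # repository named on this page
QUANT = "Q4_K_M"                                        # recommended quantization


def quant_patterns(quant: str) -> list[str]:
    """Glob patterns selecting files whose names contain the quantization label."""
    return [f"*{quant}*"]


def download_quant(repo_id: str, quant: str, local_dir: str) -> str:
    """Download only the matching files; needs `pip install huggingface_hub`."""
    from huggingface_hub import snapshot_download  # imported lazily: optional dependency

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=quant_patterns(quant),
        local_dir=local_dir,
    )


# Dry run of the filter on hypothetical file names (not taken from the repository).
sample_files = [
    "hermes-4-405b-Q4_K_M-00001-of-00005.gguf",
    "hermes-4-405b-Q8_0-00001-of-00009.gguf",
    "README.md",
]
selected = [
    name for name in sample_files
    if any(fnmatchcase(name, pattern) for pattern in quant_patterns(QUANT))
]
print(selected)

# Actual download (roughly 230 GB at Q4_K_M according to this page):
# path = download_quant(REPO_ID, QUANT, "./models/hermes-4-405b")
```

The download call is left commented out because of the file size; the dry run only shows which names the pattern would keep.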

Tags: chat, reasoning, quality, general

Strengths

Hybrid thinking mode, and reasoning performance that this listing rates as comparable to Claude 3.5 Sonnet and GPT-4o.

Limitations

Performance depends heavily on quantization, RAM bandwidth and runtime support.

Best use cases

chat
reasoning
quality
general

Benchmarks

Speed: 1/10

Quality: 10/10

Coding: 9/10

Reasoning: 10/10

Technical details

Developer: Nous Research

License: Llama 3.1 Community License (see model repository for terms)

Context window: Unknown

Architecture: Llama 3.1 based, per the repository name (see model card)

Released: 2025-09