Open-weight MoE

Ornith 1.0 (397B MoE)

DeepReinforce MIT-licensed open-weight MoE derived from DeepSeek-V3.1-Terminus, tuned for agentic tool use, coding and reasoning. Official local serving examples target vLLM/SGLang on 8x80GB GPU nodes, so this is server-grade only.

Server-grade 640 GB RAM BF16 / FP8 serving Private server-grade agentic AI research
Parameters
397B MoE
Minimum RAM
640 GB
Model size
800 GB
Quantization
BF16 / FP8 serving

Can Ornith 1.0 (397B MoE) run locally?

Ornith 1.0 (397B MoE) is server-grade locally. Keep it for comparison unless you have very large unified memory, multiple GPUs or remote inference.

Use Ornith-1.0-397B with a server runtime such as vLLM, SGLang or Transformers. This is not a one-click GGUF/LM Studio listing.

chatcodereasoningqualityagentictool-callinggeneral

Install path

01
Check RAM fitServer-grade target. Plan for 640 GB class multi-GPU memory.
02
Load the modelServe Ornith-1.0-397B with vLLM, SGLang or Transformers.
03
Control locallyUse LocalClaw to manage models, agents, chat, channels and scheduled OpenClaw work.

Strengths

  • MIT licensed open-weight release
  • Agentic and tool-calling focus
  • Coding and reasoning oriented evaluation positioning
  • Official examples cover Transformers, vLLM and SGLang serving
  • Built from the DeepSeek-V3.1-Terminus base model lineage

Limitations

  • Server-grade only; not suitable for normal laptops, Mac mini, Mac Studio or single consumer GPUs
  • Official serving example targets an 8x80GB GPU node
  • No official GGUF or LM Studio friendly quantization was listed on the model card at review time
  • Full-weight local inference requires serious multi-GPU operations work

Best use cases

  • Private server-grade agentic AI research
  • Tool-calling and multi-step coding experiments
  • Benchmarking large open MoE systems
  • Advanced vLLM or SGLang deployments
  • Comparing frontier open weights against smaller practical local models

Capability profile

speed
1
quality
9
coding
9
reasoning
9

Technical notes

Developer
DeepReinforce
License
MIT
Context window
Unknown tokens
Architecture
Open-weight Mixture-of-Experts model derived from DeepSeek-V3.1-Terminus. The official release is distributed as safetensors and is intended for Transformers, vLLM and SGLang style serving rather than one-click GGUF desktop use.

This model fits these next steps

Hardware fit is based on LocalClaw's RAM tier, model size and quantization metadata. Always leave memory headroom for your OS and runtime.

Similar models to compare

Where to go next