ZAYA1-8B
Zyphra's Apache-2.0 reasoning MoE: 8.4B total parameters with only ~760M active per token, 16 experts, a 131K context window, Compressed Convolutional Attention, and strong math/code benchmarks. Experimental for local use today: it currently requires Zyphra's vLLM/Transformers forks, and LM Studio/GGUF/MLX support is not yet verified.
Can ZAYA1-8B run locally?
ZAYA1-8B is best suited to power-user machines with 32 GB of RAM. LocalClaw recommends the BF16 weights (via the Zyphra fork) as the default precision, with at least 24 GB of RAM.
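The RAM floor follows directly from the weight footprint. A back-of-envelope sketch, using only the card's 8.4B-parameter figure:

```python
# Rough memory estimate for ZAYA1-8B's BF16 weights.
# Figures from the card: 8.4B total parameters, bfloat16 = 2 bytes/param.
total_params = 8.4e9
bytes_per_param = 2  # bfloat16

weights_gib = total_params * bytes_per_param / 1024**3
print(f"BF16 weights alone: ~{weights_gib:.1f} GiB")
```

Weights alone come to roughly 15.6 GiB, which leaves no headroom on a 16 GB machine once KV cache and runtime overhead are added; hence the 24 GB practical floor.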
Search term for compatible runtimes (LM Studio support not yet verified): Zyphra/ZAYA1-8B
Hugging Face repository: Zyphra/ZAYA1-8B
Strengths
- Very high intelligence density: 8.4B total parameters with only ~760M active per token
- Strong mathematics, coding and long-form reasoning benchmarks
- 131K context window
- Apache 2.0 license
- Designed for test-time-compute workflows such as Markovian RSA
Limitations
- Experimental local runtime support today
- Currently documented with Zyphra forks of vLLM and Transformers
- No verified LM Studio, Ollama, llama.cpp, GGUF or MLX support yet
- BF16 weights are too heavy for a clean Mac mini M4 16 GB setup
Best use cases
- Mathematical reasoning research
- Coding and algorithmic problem solving
- Reasoning benchmark experimentation
- Server/local lab evaluation with Zyphra runtime forks
- Future compact on-device MoE experiments once runtimes catch up
Benchmarks
Speed: 7/10
Quality: 8/10
Coding: 8/10
Reasoning: 9/10
Technical details
Developer: Zyphra
License: Apache 2.0
Context window: 131,072 tokens
Architecture: Sparse MoE with Compressed Convolutional Attention (CCA), 16 experts, top-1 MLP router and learned residual scaling
Released: 2026-05
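The top-1 MLP router mentioned above is the part that keeps only ~760M of the 8.4B parameters active per token: each token's hidden state is scored against every expert and dispatched to the single highest-scoring one. A toy sketch of that mechanism (shapes, names, and the softmax gating detail are illustrative assumptions, not Zyphra's implementation):

```python
import numpy as np

def top1_route(x, router_w, experts):
    """Toy top-1 MoE routing: each token runs through exactly one expert.

    x        : (tokens, d_model) hidden states
    router_w : (d_model, n_experts) router weights
    experts  : list of callables, one per expert MLP
    """
    logits = x @ router_w                 # (tokens, n_experts) router scores
    choice = logits.argmax(axis=-1)       # one expert index per token
    # Softmax over the scores gives the gate weight for the chosen expert.
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    out = np.empty_like(x)
    for e, expert in enumerate(experts):
        mask = choice == e                # tokens routed to expert e
        if mask.any():
            out[mask] = expert(x[mask]) * probs[mask, e][:, None]
    return out, choice

# Toy usage: 16 "experts" that each just scale their inputs.
rng = np.random.default_rng(0)
d_model, n_experts = 8, 16
experts = [lambda h, s=s: h * s for s in range(1, n_experts + 1)]
x = rng.normal(size=(4, d_model))
router_w = rng.normal(size=(d_model, n_experts))
out, choice = top1_route(x, router_w, experts)
```

Only one expert MLP is evaluated per token, so compute scales with the active-parameter count rather than the total; the learned residual scaling noted above is a separate detail not modeled here.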