Local LLM model page
Kimi K2 Thinking (1T MoE)
Moonshot AI's K2 with an extended reasoning mode: the model emits chain-of-thought traces before its final answer. Top-5 on GPQA, AIME, and SWE-bench. Requires datacenter-grade hardware or distributed inference. Modified MIT license.
Parameters
1T (32B active, 384 experts)
Minimum RAM
1024 GB
Model size
600 GB
Quantization
Q4_K_M
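The 32B-active / 1T-total split comes from mixture-of-experts routing: a gating network scores all 384 experts for each token and only a small top-k subset is actually run (k=8 here is an illustrative assumption, not a confirmed K2 config value). A minimal NumPy sketch of top-k routing:

```python
import numpy as np

def route_token(hidden, gate_weights, top_k=8):
    """Score all experts, keep only the top_k for this token.

    hidden: (d,) token hidden state
    gate_weights: (num_experts, d) gating matrix
    Returns (expert indices, normalized mixing weights) for the active experts.
    """
    scores = gate_weights @ hidden               # one score per expert
    top = np.argsort(scores)[-top_k:]            # indices of the top_k experts
    w = np.exp(scores[top] - scores[top].max())  # softmax over selected scores
    w /= w.sum()
    return top, w

rng = np.random.default_rng(0)
ids, w = route_token(rng.standard_normal(64), rng.standard_normal((384, 64)))
print(len(ids), round(float(w.sum()), 6))  # 8 experts, mixing weights sum to 1
```

Because only the selected experts' weights are touched per token, compute per token scales with the 32B active parameters, not the full 1T.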
Can Kimi K2 Thinking (1T MoE) run locally?
Kimi K2 Thinking (1T MoE) is best suited for server-grade or multi-GPU systems. LocalClaw recommends Q4_K_M as the default quantization, with at least 1024 GB RAM.
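The RAM figure follows from the quantized weight size. At roughly 4.5 bits per weight for Q4_K_M (an approximation; the exact figure depends on the GGUF tensor layout), 1T parameters come to around 560 GB of weights alone, consistent with the 600 GB model size above, with the 1024 GB floor leaving headroom for KV cache and runtime overhead. A back-of-envelope sketch:

```python
def quantized_size_gb(params, bits_per_weight=4.5):
    """Approximate weight size (GB) for a quantized model."""
    return params * bits_per_weight / 8 / 1e9

total = quantized_size_gb(1e12)    # all 1T parameters
active = quantized_size_gb(32e9)   # weights touched per token (32B active)
print(f"weights: ~{total:.0f} GB, active per token: ~{active:.0f} GB")
```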
Search term for LM Studio or compatible runtimes: kimi-k2-thinking
Hugging Face repository: moonshotai/Kimi-K2-Thinking
Strengths
- Extended reasoning mode: chain-of-thought traces before the final answer
- Top-5 results on GPQA, AIME, and SWE-bench
- Sparse MoE design: only 32B of 1T parameters are active per token
Limitations
- Performance depends heavily on quantization, RAM bandwidth and runtime support.
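The bandwidth dependence can be made concrete: each decoded token must stream roughly the active weights from memory (about 18 GB at an assumed ~4.5 bits/weight for Q4_K_M), so peak decode speed is bounded by memory bandwidth divided by that figure. A rough sketch:

```python
def max_decode_tps(bandwidth_gbps, active_params=32e9, bits_per_weight=4.5):
    """Upper bound on decode tokens/s: bandwidth / bytes read per token."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gbps * 1e9 / bytes_per_token

for bw in (100, 400, 800):  # rough DDR5-server vs. HBM-class bandwidths, GB/s
    print(f"{bw} GB/s -> ~{max_decode_tps(bw):.1f} tok/s")
```

Single-digit tokens/s on CPU-class bandwidth is consistent with the 2/10 speed score below.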
Best use cases
- reasoning
- code
- quality
Benchmarks
Speed: 2/10
Quality: 10/10
Coding: 10/10
Reasoning: 10/10
Technical details
Developer: Moonshot AI
License: See model repository
Context window: Unknown
Architecture: See model card
Released: 2025-11