Local LLM model page

Kimi K2 Thinking (1T MoE)

Moonshot AI's K2 with an extended reasoning mode that emits chain-of-thought traces before the final answer. Top-5 on GPQA, AIME, and SWE-bench. Requires datacenter-grade hardware or distributed inference. Modified MIT license.

Parameters
1T (32B active, 384 experts)
Minimum RAM
1024 GB
Model size
600 GB
Quantization
Q4_K_M
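As a rough sanity check on the figures above, the listed 600 GB model size follows from 1T total parameters at roughly 4.8 bits per weight (a typical effective rate for Q4_K_M; the exact rate varies by tensor and is an assumption here):

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate on-disk size of a quantized model in GB."""
    # params * bits / 8 gives bytes; divide by 1e9 for GB
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# 1T parameters at ~4.8 bits/weight (assumed Q4_K_M average)
size = quantized_size_gb(1000, 4.8)  # close to the listed 600 GB
```

Note that only 32B parameters are active per token, but all 1T must reside in memory, which is why the RAM floor is 1024 GB rather than something proportional to the active parameter count.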

Can Kimi K2 Thinking (1T MoE) run locally?

Kimi K2 Thinking (1T MoE) is best suited for server-grade or multi-GPU systems. LocalClaw recommends Q4_K_M as the default quantization, with at least 1024 GB RAM.

Search term for LM Studio or compatible runtimes: kimi-k2-thinking

Hugging Face repository: moonshotai/Kimi-K2-Thinking
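Once the model is loaded in a local runtime, it can be queried over an OpenAI-compatible API. A minimal sketch, assuming LM Studio's usual default endpoint of `http://localhost:1234/v1` (adjust the base URL for your runtime):

```python
import json
from urllib import request

# Assumption: LM Studio serves an OpenAI-compatible API at this address
# by default; other runtimes use different ports or paths.
BASE_URL = "http://localhost:1234/v1"

def build_chat_request(prompt: str, model: str = "kimi-k2-thinking") -> dict:
    """Build an OpenAI-compatible chat-completion payload."""
    return {
        "model": model,  # matches the LM Studio search term above
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ask(prompt: str) -> str:
    """Send a prompt to the local server and return the final answer text."""
    payload = build_chat_request(prompt)
    req = request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    # Thinking models produce reasoning traces before answering; the final
    # answer arrives in the standard message content field.
    return body["choices"][0]["message"]["content"]
```

How reasoning traces are exposed (inline in the content, or in a separate field) depends on the runtime and its chat template, so inspect the raw response before parsing it.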

Tags: reasoning · code · quality

Strengths

  • Extended reasoning mode with chain-of-thought traces before the final answer
  • Top-5 results on GPQA, AIME, and SWE-bench
  • Sparse MoE design: only 32B of 1T parameters are active per token

Limitations

  • Performance depends heavily on quantization, RAM bandwidth and runtime support.

Best use cases

  • Complex multi-step reasoning
  • Code generation and repair (SWE-bench-style tasks)
  • Quality-critical work where speed is secondary

Benchmarks

Speed: 2/10

Quality: 10/10

Coding: 10/10

Reasoning: 10/10

Technical details

Developer: Moonshot AI

License: Modified MIT (see model repository)

Context window: Not specified

Architecture: Mixture-of-Experts (384 experts, 32B active; see model card)

Released: 2025-11