Local LLM model page

Kimi K2 Thinking (1T MoE)

Moonshot AI's K2 with an extended reasoning mode that emits chain-of-thought traces before the final answer. Top-5 on GPQA, AIME, and SWE-bench. Requires datacenter-grade hardware or distributed inference. Modified MIT license.

Parameters
1T (32B active, 384 experts)
Minimum RAM
1024 GB
Model size
600 GB
Quantization
Q4_K_M
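As a rough sanity check on the figures above, the listed 600 GB model size follows from 1T total parameters at roughly 4.8 bits per weight (a typical effective rate for Q4_K_M; the exact rate varies by tensor and is an assumption here):

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate on-disk size of a quantized model in GB."""
    # params * bits / 8 gives bytes; divide by 1e9 for GB
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# 1T parameters at ~4.8 bits/weight (assumed Q4_K_M average)
size = quantized_size_gb(1000, 4.8)  # close to the listed 600 GB
```

Note that only 32B parameters are active per token, but all 1T must reside in memory, which is why the RAM floor is 1024 GB rather than something proportional to the active parameter count.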

Can Kimi K2 Thinking (1T MoE) run locally?

Kimi K2 Thinking (1T MoE) is best suited for server-grade or multi-GPU systems. LocalClaw recommends Q4_K_M as the default quantization, with at least 1024 GB RAM.

Search term for LM Studio or compatible runtimes: kimi-k2-thinking

Hugging Face repository: moonshotai/Kimi-K2-Thinking
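Once the model is loaded in a local runtime, it can be queried over an OpenAI-compatible API. A minimal sketch, assuming LM Studio's usual default endpoint of `http://localhost:1234/v1` (adjust the base URL for your runtime):

```python
import json
from urllib import request

# Assumption: LM Studio serves an OpenAI-compatible API at this address
# by default; other runtimes use different ports or paths.
BASE_URL = "http://localhost:1234/v1"

def build_chat_request(prompt: str, model: str = "kimi-k2-thinking") -> dict:
    """Build an OpenAI-compatible chat-completion payload."""
    return {
        "model": model,  # matches the LM Studio search term above
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ask(prompt: str) -> str:
    """Send a prompt to the local server and return the final answer text."""
    payload = build_chat_request(prompt)
    req = request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    # Thinking models produce reasoning traces before answering; the final
    # answer arrives in the standard message content field.
    return body["choices"][0]["message"]["content"]
```

How reasoning traces are exposed (inline in the content, or in a separate field) depends on the runtime and its chat template, so inspect the raw response before parsing it.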

Tags: reasoning · code · quality

Strengths

  • Extended reasoning mode with chain-of-thought traces before the final answer
  • Top-5 results on GPQA, AIME, and SWE-bench
  • Sparse MoE design: only 32B of 1T parameters are active per token

Limitations

  • Performance depends heavily on quantization, RAM bandwidth and runtime support.

Best use cases

  • Complex multi-step reasoning
  • Code generation and repair (SWE-bench-style tasks)
  • Quality-critical work where speed is secondary

Benchmarks

Speed: 2/10

Quality: 10/10

Coding: 10/10

Reasoning: 10/10

Technical details

Developer: Moonshot AI

License: Modified MIT (see model repository)

Context window: Not specified

Architecture: Mixture-of-Experts (384 experts, 32B active; see model card)

Released: 2025-11