Kimi K2 Instruct (1T MoE)
Moonshot AI trillion-parameter MoE flagship. 32B active params per token with 384 experts. Matches or beats GPT-4 Turbo on MMLU, GSM8K, HumanEval. Agentic & tool-use specialist. Server-grade only. Modified MIT.
Local coding model picks for repository work, debugging, agents and private software engineering with LM Studio or a local runtime.
For coding, prioritize coding score, reasoning score and runtime fit. A slightly smaller model that stays responsive is usually better than a larger model that starves memory.
Moonshot AI trillion-parameter MoE flagship. 32B active params per token with 384 experts. Matches or beats GPT-4 Turbo on MMLU, GSM8K, HumanEval. Agentic & tool-use specialist. Server-grade only. Modified MIT.
Flagship open-source Qwen 3.5. Only 17B active params despite 397B total — world-class quality at MoE efficiency. Matches GPT-4o on major benchmarks. Requires multi-GPU or server-grade hardware. Apache 2.0.
Moonshot AI K2 with extended reasoning mode. Chain-of-thought traces before final answer. Top-5 on GPQA, AIME, SWE-bench. Requires datacenter-grade hardware or distributed inference. Modified MIT.
DeepSeek frontier MoE with 1M-token context, hybrid compressed attention and top-tier coding/reasoning. MIT licensed. Datacenter-grade only.
Z.ai next-generation flagship for agentic engineering. Stronger coding, long-horizon tool use, SWE-Bench Pro, Terminal-Bench and repo generation. MIT licensed.
Experimental V3.2 with DeepSeek Sparse Attention (DSA) — halves inference cost vs V3.1 on long context while keeping quality. 128K context, improved coding & tool-use. MIT licensed. Server-grade.
Zhipu AI flagship — full GLM 4.6. 200K context, strong tool-calling & agentic workflows. Competes with Claude 3.5 Sonnet on reasoning and code. MIT licensed. Server-grade hardware.
Updated flagship DeepSeek R1 with improved reasoning chains and fewer hallucinations. Major upgrade to chain-of-thought quality. MIT licensed. Server-grade only.
Qwen flagship coding model. Designed for agentic coding with 256K context. Outperforms Claude 3.5 Sonnet on SWE-bench. Apache 2.0.
These guides use LocalClaw's internal model database for scoring, then avoid hard claims beyond public hardware and model availability signals checked before publishing.