Best local LLMs for coding

Local coding model picks for repository work, debugging, agents and private software engineering, run with LM Studio or another local runtime.

Quick answer

For coding, prioritize a model's coding score, its reasoning score and how well it fits your runtime. A slightly smaller model that stays responsive is usually better than a larger model that exhausts available memory.
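
The runtime-fit part of this advice can be sketched as a simple filter. This is a minimal sketch, not LocalClaw's scoring method: it copies the RAM and file-size figures from the entries in this guide and keeps only the picks whose listed RAM figure fits a given machine. `ModelPick`, `PICKS` and `picks_that_fit` are hypothetical names, and coding and reasoning scores are left out because this page does not publish them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPick:
    rank: int      # position in this guide's list
    name: str
    ram_gb: int    # RAM figure listed for the model
    file_gb: int   # listed file size at the stated quantization


# Figures copied from the entries in this guide.
PICKS = [
    ModelPick(1, "Kimi K2 Instruct (1T MoE)", 1024, 600),
    ModelPick(2, "Qwen 3.5 MoE (397B/17B active)", 256, 200),
    ModelPick(3, "Kimi K2 Thinking (1T MoE)", 1024, 600),
    ModelPick(4, "DeepSeek V4 Pro (1.6T MoE)", 1024, 850),
    ModelPick(5, "GLM-5.1", 640, 430),
    ModelPick(6, "DeepSeek V3.2 Exp (671B MoE)", 512, 380),
    ModelPick(7, "GLM 4.6 (355B MoE)", 320, 200),
    ModelPick(8, "DeepSeek R1 0528 (671B MoE)", 512, 360),
    ModelPick(9, "Qwen 3 Coder (30B)", 24, 18),
]


def picks_that_fit(available_ram_gb, picks=PICKS):
    """Return the picks whose listed RAM figure fits, in guide order."""
    return [p for p in picks if p.ram_gb <= available_ram_gb]


if __name__ == "__main__":
    # Example: a workstation with 320 GB of RAM.
    for pick in picks_that_fit(320):
        print(f"#{pick.rank} {pick.name} ({pick.ram_gb} GB RAM listed)")
```

Treating the listed RAM figure as a hard minimum is an assumption of this sketch; in practice, context length and other running applications also consume memory, so leaving some headroom is sensible.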

Recommended starting points

#1

Kimi K2 Instruct (1T MoE)

1T (32B active, 384 experts) · 1024GB RAM · Q4_K_M · 600GB

Moonshot AI trillion-parameter MoE flagship. 32B active params per token with 384 experts. Matches or beats GPT-4 Turbo on MMLU, GSM8K, HumanEval. Agentic & tool-use specialist. Server-grade only. Modified MIT.

chat · code · reasoning · quality · general

#2

Qwen 3.5 MoE (397B/17B active)

397B (17B active) · 256GB RAM · Q4_K_M · 200GB

Flagship open-source Qwen 3.5. Only 17B active params despite 397B total — world-class quality at MoE efficiency. Matches GPT-4o on major benchmarks. Requires multi-GPU or server-grade hardware. Apache 2.0.

chat · code · reasoning · quality

#3

Kimi K2 Thinking (1T MoE)

1T (32B active, 384 experts) · 1024GB RAM · Q4_K_M · 600GB

Moonshot AI K2 with extended reasoning mode. Chain-of-thought traces before final answer. Top-5 on GPQA, AIME, SWE-bench. Requires datacenter-grade hardware or distributed inference. Modified MIT.

reasoning · code · quality

#4

DeepSeek V4 Pro (1.6T MoE)

1.6T (49B active) · 1024GB RAM · FP4/FP8 · 850GB

DeepSeek frontier MoE with 1M-token context, hybrid compressed attention and top-tier coding/reasoning. MIT licensed. Datacenter-grade only.

chat · code · reasoning · quality · agentic · long-context

#5

GLM-5.1

754B MoE · 640GB RAM · Q4_K_M · 430GB

Z.ai next-generation flagship for agentic engineering. Stronger coding, long-horizon tool use, SWE-Bench Pro, Terminal-Bench and repo generation. MIT licensed.

chat · code · reasoning · quality · agentic · general

#6

DeepSeek V3.2 Exp (671B MoE)

671B (37B active) · 512GB RAM · Q4_K_M · 380GB

Experimental V3.2 with DeepSeek Sparse Attention (DSA) — halves inference cost vs V3.1 on long context while keeping quality. 128K context, improved coding & tool-use. MIT licensed. Server-grade.

chat · code · reasoning · quality

#7

GLM 4.6 (355B MoE)

355B (32B active) · 320GB RAM · Q4_K_M · 200GB

Zhipu AI flagship — full GLM 4.6. 200K context, strong tool-calling & agentic workflows. Competes with Claude 3.5 Sonnet on reasoning and code. MIT licensed. Server-grade hardware.

chat · code · reasoning · quality · general

#8

DeepSeek R1 0528 (671B MoE)

671B (37B active) · 512GB RAM · Q4_K_M · 360GB

Updated flagship DeepSeek R1 with improved reasoning chains and fewer hallucinations. Major upgrade to chain-of-thought quality. MIT licensed. Server-grade only.

reasoning · code · quality

#9

Qwen 3 Coder (30B)

30B · 24GB RAM · Q4_K_M · 18GB

Qwen flagship coding model. Designed for agentic coding with 256K context. Outperforms Claude 3.5 Sonnet on SWE-bench. Apache 2.0.

code · power · quality
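
As a rough sanity check on the sizes listed above, one can back out the average number of bits stored per weight from an entry's total parameter count and file size. This is a minimal sketch under stated assumptions: sizes are decimal gigabytes, the file is dominated by weights, and "1T" is read as 1000 billion parameters. `implied_bits_per_weight` and `LISTED` are hypothetical names, not part of LM Studio or any other tool.

```python
def implied_bits_per_weight(params_billion: float, file_gb: float) -> float:
    """Average bits per parameter implied by a file size.

    Assumes 1 GB = 1e9 bytes and ignores metadata overhead.
    """
    if params_billion <= 0:
        raise ValueError("params_billion must be positive")
    return file_gb * 8 / params_billion


# (total parameters in billions, listed file size in GB), copied from the entries above.
LISTED = {
    "Kimi K2 Instruct (1T MoE)": (1000, 600),
    "Qwen 3.5 MoE (397B/17B active)": (397, 200),
    "DeepSeek V3.2 Exp (671B MoE)": (671, 380),
    "Qwen 3 Coder (30B)": (30, 18),
}

if __name__ == "__main__":
    for name, (params_b, size_gb) in LISTED.items():
        bits = implied_bits_per_weight(params_b, size_gb)
        print(f"{name}: {bits:.2f} bits/weight")
```

For the four entries shown, the results fall between about 4.0 and 4.8 bits per weight. A figure far outside that range for a 4-bit quantization would be a reason to double-check the entry before planning a download.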

Source checks

These guides score models using LocalClaw's internal model database and avoid hard claims beyond the public hardware and model-availability signals checked before publishing.