
Best local LLMs for coding

Local coding model picks for repository work, debugging, agents and private software engineering, run with LM Studio or another local runtime.


Quick answer

For coding, prioritize coding score, reasoning score and runtime fit. A slightly smaller model that stays responsive is usually better than a larger model that exhausts available memory.
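As a rough illustration of the runtime-fit check, the sketch below filters a few of the entries on this page by their listed RAM figure. The `Model` class, the `models_that_fit` helper and the `headroom` factor of 0.8 are hypothetical choices for this example, not part of LocalClaw's recommender, and the scoring side is left out because the page does not publish scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    ram_gb: int   # RAM figure listed on this page
    file_gb: int  # download size listed on this page


# Figures copied from a subset of the entries on this page.
MODELS = [
    Model("Kimi K2 Instruct (1T MoE)", 1024, 600),
    Model("Qwen 3.5 MoE (397B/17B active)", 256, 200),
    Model("GLM 4.6 (355B MoE)", 320, 200),
    Model("DeepSeek R1 0528 (671B MoE)", 512, 360),
    Model("Qwen 3 Coder (30B)", 24, 18),
]


def models_that_fit(available_ram_gb, headroom=0.8):
    """Return models whose listed RAM fits the budget, largest first.

    headroom is an assumed safety factor (hypothetical): only that
    fraction of the machine's RAM is treated as usable by the model.
    """
    budget = available_ram_gb * headroom
    fitting = [m for m in MODELS if m.ram_gb <= budget]
    return sorted(fitting, key=lambda m: m.ram_gb, reverse=True)


if __name__ == "__main__":
    for m in models_that_fit(32):
        print(f"{m.name}: needs {m.ram_gb}GB RAM, {m.file_gb}GB download")
```

With these inputs, a 32GB machine is left with only Qwen 3 Coder (30B), which matches the advice to prefer a model that fits comfortably over one that does not.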

Recommended starting points

Kimi K2 Instruct (1T MoE)

1T (32B active, 384 experts) · 1024GB RAM · Q4_K_M · 600GB

Moonshot AI trillion-parameter MoE flagship. 32B active params per token with 384 experts. Matches or beats GPT-4 Turbo on MMLU, GSM8K, HumanEval. Agentic & tool-use specialist. Server-grade only. Modified MIT.

chat · code · reasoning · quality · general

Qwen 3.5 MoE (397B/17B active)

397B (17B active) · 256GB RAM · Q4_K_M · 200GB

Flagship open-source Qwen 3.5. Only 17B active params despite 397B total — world-class quality at MoE efficiency. Matches GPT-4o on major benchmarks. Requires multi-GPU or server-grade hardware. Apache 2.0.

chat · code · reasoning · quality

Kimi K2 Thinking (1T MoE)

1T (32B active, 384 experts) · 1024GB RAM · Q4_K_M · 600GB

Moonshot AI K2 with extended reasoning mode. Chain-of-thought traces before final answer. Top-5 on GPQA, AIME, SWE-bench. Requires datacenter-grade hardware or distributed inference. Modified MIT.

reasoning · code · quality

DeepSeek V4 Pro (1.6T MoE)

1.6T (49B active) · 1024GB RAM · FP4/FP8 · 850GB

DeepSeek frontier MoE with 1M-token context, hybrid compressed attention and top-tier coding/reasoning. MIT licensed. Datacenter-grade only.

chat · code · reasoning · quality · agentic · long-context

GLM-5.1

754B MoE · 640GB RAM · Q4_K_M · 430GB

Z.ai next-generation flagship for agentic engineering. Stronger coding and long-horizon tool use, with gains on SWE-Bench Pro, Terminal-Bench and repo generation. MIT licensed.

chat · code · reasoning · quality · agentic · general

DeepSeek V3.2 Exp (671B MoE)

671B (37B active) · 512GB RAM · Q4_K_M · 380GB

Experimental V3.2 with DeepSeek Sparse Attention (DSA) — halves inference cost vs V3.1 on long context while keeping quality. 128K context, improved coding & tool-use. MIT licensed. Server-grade.

chat · code · reasoning · quality

GLM 4.6 (355B MoE)

355B (32B active) · 320GB RAM · Q4_K_M · 200GB

Zhipu AI flagship — full GLM 4.6. 200K context, strong tool-calling & agentic workflows. Competes with Claude 3.5 Sonnet on reasoning and code. MIT licensed. Server-grade hardware.

chat · code · reasoning · quality · general

DeepSeek R1 0528 (671B MoE)

671B (37B active) · 512GB RAM · Q4_K_M · 360GB

Updated flagship DeepSeek R1 with improved reasoning chains and fewer hallucinations. Major upgrade to chain-of-thought quality. MIT licensed. Server-grade only.

reasoning · code · quality

Qwen 3 Coder (30B)

30B · 24GB RAM · Q4_K_M · 18GB

Qwen flagship coding model. Designed for agentic coding with 256K context. Outperforms Claude 3.5 Sonnet on SWE-bench. Apache 2.0.

code · power · quality
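The download sizes in the entries above can be sanity-checked against the parameter counts. The sketch below derives the implied bits per weight from the figures listed on this page; the helper name `implied_bits_per_weight` is hypothetical, and the calculation assumes 1GB = 10^9 bytes and that the listed size covers all parameters.

```python
def implied_bits_per_weight(total_params_billions, file_gb):
    """Bits stored per parameter, implied by a listed file size.

    Assumes 1 GB = 1e9 bytes, so the 1e9 factors in bytes and
    parameters cancel and only the factor of 8 bits per byte remains.
    """
    return file_gb * 8 / total_params_billions


# (total parameters in billions, listed download size in GB), from this page.
ENTRIES = {
    "Kimi K2 Instruct (1T MoE)": (1000, 600),
    "Qwen 3.5 MoE (397B/17B active)": (397, 200),
    "DeepSeek V3.2 Exp (671B MoE)": (671, 380),
    "Qwen 3 Coder (30B)": (30, 18),
}

if __name__ == "__main__":
    for name, (params_b, file_gb) in ENTRIES.items():
        bpw = implied_bits_per_weight(params_b, file_gb)
        print(f"{name}: {bpw:.2f} bits/weight")
```

For Kimi K2 Instruct and Qwen 3 Coder the listed figures work out to 4.8 bits per weight. An entry that lands far from the others may simply have a rounded size, so treat the result as a consistency check, not a measurement.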


Source checks

These guides use LocalClaw's internal model database for scoring and avoid hard claims beyond the public hardware and model-availability signals checked before publishing.

Qwen model releases
LM Studio local model catalogue