Qwen 3 Coder (30B)
Qwen's flagship coding model, designed for agentic coding with a 256K context window. Outperforms Claude 3.5 Sonnet on SWE-bench. Apache 2.0.
A static, Google-indexable guide to the best local AI models that fit in a 64GB RAM budget. Built from the LocalClaw model database and ranked by quality, reasoning, coding and speed.
With 64GB of RAM, prioritize models whose minimum RAM requirement is at or below 64GB, and leave headroom for the OS and other apps rather than filling memory completely. For most users: start with Qwen 3 Coder (30B), then try a smaller, faster model if latency matters.
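A minimal sketch of that sizing rule: quantized weight size is roughly total parameters times bits per weight, plus a fixed headroom allowance for the KV cache, OS, and other apps. The ~4.85 bits/weight figure for Q4_K_M-class quants and the 8 GB overhead are assumptions for illustration, not measured values.

```python
def fits_in_ram(params_b: float, bits_per_weight: float,
                overhead_gb: float = 8.0, budget_gb: float = 64.0) -> bool:
    """Rough check: quantized weight size plus a fixed overhead
    (KV cache, OS, apps) against the RAM budget."""
    weights_gb = params_b * bits_per_weight / 8  # GB per billion params
    return weights_gb + overhead_gb <= budget_gb

# Qwen 3 Coder (30B) at ~4.85 bits/weight (Q4_K_M-class quant):
# 30 * 4.85 / 8 ≈ 18.2 GB of weights — comfortably inside 64 GB.
print(fits_in_ram(30, 4.85))   # True
# A DeepSeek-scale 671B MoE at the same quant does not fit:
print(fits_in_ram(671, 4.85))  # False
```

Long contexts grow the KV cache well past the fixed overhead assumed here, so treat this as a floor, not a guarantee.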
Qwen 3.6 flagship dense model. Hybrid thinking mode with /think toggle for deep chain-of-thought reasoning. 128K context, 29+ languages. Significantly outperforms Qwen3.5-27B on reasoning, coding & math. Apache 2.0.
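The hybrid thinking toggle mentioned above is driven by a soft switch appended to the user turn. A hedged sketch of how a client might build such a prompt — the exact switch tokens (`/think`, `/no_think`) vary by release, so check the model card before relying on them:

```python
def make_prompt(question: str, deep_reasoning: bool) -> str:
    """Append Qwen-style soft switch to toggle chain-of-thought.
    Switch tokens are an assumption here — verify against the model card."""
    switch = "/think" if deep_reasoning else "/no_think"
    return f"{question} {switch}"

print(make_prompt("Prove that sqrt(2) is irrational.", True))
# "Prove that sqrt(2) is irrational. /think"
```

Thinking mode trades latency for accuracy, so reserve it for math, code, and multi-step reasoning rather than casual chat.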
Gemma 4 MoE flagship-for-workstations: 26B total with ~4B active parameters. 256K context and excellent quality-per-watt for local inference. Apache 2.0.
Near GPT-4 intelligence locally. Thinking mode demolishes hard problems. The local AI dream.
Moonshot AI's agentic flagship. 1T total MoE parameters with 32B active per forward pass. Unmatched long-context reasoning at 256K tokens. Designed for complex agentic tasks and tool use. Model License — check moonshotai.com for commercial terms.
Alibaba's next-gen MoE with hybrid-gated DeltaNet attention. Only 3B active params — runs at dense 7B speed with 70B quality. 256K native context (extensible to 1M). Hybrid thinking mode. Apache 2.0.
Largest Gemma 4 model for premium local quality. Strong coding and reasoning with 256K context and broad multilingual support. Apache 2.0.
DeepSeek's massive MoE flagship. 37B active out of 671B total. Exceptional coding, reasoning and general capabilities. Ranks #6 on global usage leaderboards with 29B monthly tokens. MIT licensed.
Arcee AI's massive MoE open model. ~400B total parameters, 70B active per forward pass. Ranks near the top of global usage leaderboards. Exceptional versatility across reasoning, coding and chat. Free and open-source. Apache 2.0.
Dense 27B powerhouse. Hybrid thinking/non-thinking mode. Strong multilingual (29+ languages). 256K context window. Excellent instruction-following and math. Apache 2.0.
MoE gem — only 3B params active at inference. 19x faster than Qwen3-Max at 256K context. Best quality-per-watt of the series. Hybrid thinking mode. Runs on Mac Studio 32GB. Agentic coding standout.
NVIDIA's super-efficient 49B distilled from DeepSeek-R1 + Llama. Outperforms Llama-3.3-70B at half the compute. Strong reasoning, coding & instruction following. Runs on Mac Studio 64GB. NVIDIA Open Model License.
Zhipu AI's efficient MoE powerhouse. 106B total parameters, only 14B active at inference — dense-model speed with much larger model quality. Clearly the best in the 16–24GB RAM range. Outperforms Llama 3.3 70B. Apache 2.0.
⚠️ Despite the small active count, this is a full 30B MoE model (Qwen3-30B-A3B base). ~82 GB full weights (Q4_K_M ≈18 GB). Deep-research agent with 256K context, tool calls, multilingual (EN/ZH). Requires H100 80 GB or serious multi-GPU. Not suitable for M1/M2 or consumer GPUs. Apache 2.0.
MiroMind AI second-gen deep-research agent. 30B MoE with stronger tool-use, 256K context, SOTA on BrowseComp-ZH (Chinese research). Designed for agentic workflows, not casual chat. Released March 2026. Apache 2.0.
⚠️ Despite the "Mini" name, this is a full 30B MoE model (Qwen3-30B-A3B). 3B = active params per forward pass, NOT model size. ~82 GB full weights. Requires H100 80GB or multi-GPU. 256K context, multilingual (EN/ZH+), deep-research agent with tool calls. Released 11 Mar 2026. Apache 2.0.
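The warning above comes down to a simple rule for MoE models: memory footprint tracks *total* parameters, because every expert must stay resident; active parameters only set per-token compute. A quick sketch, again assuming ~4.85 bits/weight for a Q4_K_M-class quant (an illustrative figure, not a spec):

```python
def moe_footprint_gb(total_params_b: float, bits_per_weight: float) -> float:
    """RAM needed for weights is driven by TOTAL parameters;
    active parameters do not reduce it."""
    return total_params_b * bits_per_weight / 8

# Qwen3-30B-A3B-class model: 30B total, 3B active per forward pass.
print(round(moe_footprint_gb(30, 4.85), 1))  # ≈18.2 GB — matches the Q4_K_M figure
print(round(moe_footprint_gb(3, 4.85), 1))   # ≈1.8 GB — the WRONG naive read from "3B"
```

Reading "3B active" as "3B model" underestimates the footprint by 10x, which is exactly the trap the entry warns about.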
Qwen 3 VL flagship open vision model. Competes with GPT-4o on MMMU, chart-QA and document reasoning. Native video understanding up to 1 hour. Apache 2.0.
Zyphra's Apache-2.0 reasoning MoE: 8.4B total parameters with only ~760M active, 16 experts, 131K context, Compressed Convolutional Attention and strong math/code benchmarks. Experimental for local use today: currently needs Zyphra vLLM/Transformers forks; LM Studio/GGUF/MLX support is not yet verified.