RAM tier guide

Best local LLMs for 64GB RAM

A static, Google-indexable guide to the best local AI models that fit in a 64GB RAM budget. Built from the LocalClaw model database and ranked by quality, reasoning, coding and speed.

Compatible models: 155
Best pick: Qwen 3 Coder (30B)
RAM tier: 64GB
Hardware fit: high-end Mac Studio, desktop workstations and local coding/reasoning setups

Quick answer

With 64GB RAM, prioritize models whose minimum RAM is at or below 64GB, and leave headroom rather than filling memory completely: the OS, the inference runtime and the KV cache for long contexts all compete for the same budget. For most users, start with Qwen 3 Coder (30B), then test a smaller, faster model if latency matters.
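
To sanity-check a pick before downloading, the fit test is simple arithmetic. A minimal Python sketch, assuming an 8GB OS/runtime reserve and a flat KV-cache allowance (both illustrative figures, not LocalClaw data):

```python
# Minimal sketch: does a quantized model fit a 64GB budget with headroom?
# The 8GB reserve and 4GB default KV-cache allowance are assumptions --
# tune them for your OS, runtime and typical context length.

RAM_GB = 64
OS_RESERVE_GB = 8                      # assumed: OS + apps + runtime overhead
USABLE_GB = RAM_GB - OS_RESERVE_GB

def fits(quantized_size_gb: float, kv_cache_gb: float = 4.0) -> bool:
    """True if quantized weights plus an estimated KV cache fit the budget."""
    return quantized_size_gb + kv_cache_gb <= USABLE_GB

print(fits(18))   # True -- Qwen 3 Coder (30B) leaves ~34GB of headroom
print(fits(48))   # True, but tight -- Qwen 3 Next leaves only ~4GB for KV cache
```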

Top models for 64GB RAM

#1

Qwen 3 Coder (30B)

30B · 24GB min · Q4_K_M · 18GB

Qwen flagship coding model. Designed for agentic coding with 256K context. Outperforms Claude 3.5 Sonnet on SWE-bench. Apache 2.0.

code · power · quality
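
The 18GB figure is not arbitrary: Q4_K_M averages roughly 4.8 bits per weight (its blocks mix 4- and 6-bit quantization), so file size scales almost linearly with parameter count. A rough estimator, with that bits-per-weight ratio as an approximation that varies slightly by architecture:

```python
# Minimal sketch: estimating a Q4_K_M file size from parameter count.
# ~4.8 bits/weight is an approximation for Q4_K_M's mixed 4/6-bit blocks.

def q4_k_m_size_gb(params_billions: float, bits_per_weight: float = 4.8) -> float:
    return params_billions * bits_per_weight / 8   # 1B params ~ 0.6GB at Q4_K_M

print(q4_k_m_size_gb(30))   # 18.0 -- matches the listed download above
print(q4_k_m_size_gb(32))   # 19.2 -- close to Qwen 3 (32B)'s listed 20GB
```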
#2

Qwen 3.6 (27B)

27B · 32GB min · Q4_K_M · 17GB

Qwen 3.6 flagship dense model. Hybrid thinking mode with /think toggle for deep chain-of-thought reasoning. 128K context, 29+ languages. Significantly outperforms Qwen3.5-27B on reasoning, coding & math. Apache 2.0.

chat · code · reasoning · power · quality
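
If you want to try the /think toggle, here is a minimal client sketch. It assumes the model is served behind an OpenAI-compatible local endpoint (LM Studio, Ollama and llama.cpp all provide one) and that the runtime passes the prefix through as described above; the model id is a hypothetical placeholder.

```python
# Minimal sketch: toggling hybrid thinking mode from a local client.
# Assumes an OpenAI-compatible server on localhost and that the /think
# prefix is honored as described above -- check your runtime's docs.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="qwen3.6-27b",   # hypothetical local model id; verify with your server
    messages=[{
        "role": "user",
        "content": "/think Prove that the sum of two odd numbers is even.",
    }],
)
print(resp.choices[0].message.content)
```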
#3

Gemma 4 26B A4B

26B (A4B active) · 24GB min · Q4_K_M · 16GB

Gemma 4 MoE flagship-for-workstations: 26B total with ~4B active parameters. 256K context and excellent quality-per-watt for local inference. Apache 2.0.

chat · code · reasoning · power · multimodal
#4

Qwen 3 (32B)

32B · 32GB min · Q4_K_M · 20GB

Near GPT-4 intelligence locally. Thinking mode demolishes hard problems. The local AI dream.

chat · code · reasoning · power · quality
#5

Kimi K2.5 (32B/1T MoE)

32B active (1T total MoE) · 32GB min · Q4_K_M · 22GB

Moonshot AI's agentic flagship. 1T total MoE parameters with 32B active per forward pass. Unmatched long-context reasoning at 256K tokens. Designed for complex agentic tasks and tool use. Model License — check moonshotai.com for commercial terms.

chat · code · reasoning · power · quality
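
A 256K window is only usable if the KV cache fits alongside the weights. The sketch below applies the standard KV-cache size formula with illustrative placeholder values, not Kimi K2.5's actual architecture; substitute the numbers from a model's config.json.

```python
# Minimal sketch: KV-cache RAM for long contexts.
# bytes = 2 (K and V) * layers * kv_heads * head_dim * tokens * bytes/elem
# All architecture numbers below are illustrative placeholders.

def kv_cache_gb(layers=60, kv_heads=8, head_dim=128, seq_len=256_000,
                bytes_per_elem=2):          # fp16/bf16 cache
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem / 1e9

print(kv_cache_gb())                  # ~62.9 -- a full 256K window can eat the whole budget
print(kv_cache_gb(seq_len=32_000))    # ~7.9 -- a 32K window is far more practical
```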
#6

Qwen 3 Next (80B/3B MoE)

80B (3B active) · 64GB min · Q4_K_M · 48GB

Alibaba's next-gen MoE with hybrid-gated DeltaNet attention. Only 3B active params — runs at dense 7B speed with 70B quality. 256K native context (extensible to 1M). Hybrid thinking mode. Apache 2.0.

chat · code · reasoning · power · quality
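
The speed claim follows from decode being memory-bandwidth bound: each generated token reads roughly the active weights once, so tokens/sec scales with active parameters, not total. A back-of-envelope sketch, with the 400GB/s bandwidth figure as an assumed workstation-class ballpark:

```python
# Minimal sketch: decode-speed ceiling from memory bandwidth.
# Ignores attention/KV traffic and assumes every active weight is read
# once per token; the bandwidth figure is an assumption.

def decode_ceiling_tok_s(active_params_b: float, bits_per_weight: float = 4.8,
                         bandwidth_gb_s: float = 400) -> float:
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

print(decode_ceiling_tok_s(3))    # ~222 tok/s ceiling -- 3B active (Qwen 3 Next)
print(decode_ceiling_tok_s(70))   # ~9.5 tok/s ceiling -- a dense 70B
```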
#7

Gemma 4 31B

31B · 32GB min · Q4_K_M · 19GB

Largest Gemma 4 model for premium local quality. Strong coding and reasoning with 256K context and broad multilingual support. Apache 2.0.

chat · code · reasoning · quality · multimodal
#8

DeepSeek V3.2 (37B/671B MoE)

37B (671B MoE) · 48GB min · Q4_K_M · 40GB

DeepSeek's massive MoE flagship. 37B active out of 671B total. Exceptional coding, reasoning and general capabilities. Ranks #6 on global usage leaderboards with 29B monthly tokens. MIT licensed.

chat · code · reasoning · power · quality
#9

Trinity Large Preview (70B MoE)

70B (MoE, ~400B total) · 48GB min · Q4_K_M · 45GB

Arcee AI's massive MoE open model. ~400B total parameters, 70B active per forward pass. Ranks near the top of global usage leaderboards. Exceptional versatility across reasoning, coding and chat. Free and open-source. Apache 2.0.

chat · code · reasoning · power · quality
#10

Qwen 3.5 (27B)

27B · 32GB min · Q4_K_M · 17GB

Dense 27B powerhouse. Hybrid thinking/non-thinking mode. Strong multilingual (29+ languages). 256K context window. Excellent instruction-following and math. Apache 2.0.

chat · code · reasoning · power · general
#11

Qwen 3.5 MoE (35B/3B active)

35B (3B active) · 24GB min · Q4_K_M · 20GB

MoE gem — only 3B params active at inference. 19x faster than Qwen3-Max at 256K context. Best quality-per-watt of the series. Hybrid thinking mode. Runs on Mac Studio 32GB. Agentic coding standout.

chat · code · reasoning · power · speed
#12

Llama-3.3-Nemotron-Super (49B)

49B · 40GB min · Q4_K_M · 30GB

NVIDIA's super-efficient 49B distilled from DeepSeek-R1 + Llama. Outperforms Llama-3.3-70B at half the compute. Strong reasoning, coding & instruction following. Runs on Mac Studio 64GB. NVIDIA Open Model License.

chat · reasoning · code · power · quality
#13

GLM 4.5 Air (MoE)

106B (14B active, MoE) · 16GB min · Q4_K_M · 9GB

Zhipu AI's efficient MoE powerhouse. 106B total parameters, only 14B active at inference — dense-model speed with much larger model quality. Clearly the best in the 16–24GB RAM range. Outperforms Llama 3.3 70B. Apache 2.0.

chat · code · power · quality · general
#14

MiroThinker v1.5 (30B MoE)

30B (3B active, MoE) · 48GB min · Q4_K_M · 18GB

⚠️ Despite the small active count, this is a full 30B MoE model (Qwen3-30B-A3B base). ~82 GB full weights (Q4_K_M ≈18 GB). Deep-research agent with 256K context, tool calls, multilingual (EN/ZH). Requires H100 80 GB or serious multi-GPU. Not suitable for M1/M2 or consumer GPUs. Apache 2.0.

reasoning · code · power · quality
#15

MiroThinker 1.7 (30B MoE)

30B (3B active, MoE) · 48GB min · Q4_K_M · 18GB

MiroMind AI's second-gen deep-research agent. 30B MoE with stronger tool use, 256K context, and SOTA results on BrowseComp-ZH (Chinese research). Designed for agentic workflows, not casual chat. Released March 2026. Apache 2.0.

reasoning · code · power · quality
#16

MiroThinker 1.7 Mini (30B MoE)

30B (3B active, MoE) · 48GB min · Q4_K_M · 18GB

⚠️ Despite the "Mini" name, this is a full 30B MoE model (Qwen3-30B-A3B). 3B = active params per forward pass, NOT model size. ~82 GB full weights. Requires H100 80GB or multi-GPU. 256K context, multilingual (EN/ZH+), deep-research agent with tool calls. Released 11 Mar 2026. Apache 2.0.

reasoning · code · power · quality
#17

Qwen 3 VL (32B)

32B · 32GB min · Q4_K_M · 19GB

Qwen 3 VL flagship open vision model. Competes with GPT-4o on MMMU, chart-QA and document reasoning. Native video understanding up to 1 hour. Apache 2.0.

vision · chat · multimodal · power · quality
#18

ZAYA1-8B

8.4B (760M active, MoE) · 24GB min · BF16 (Zyphra fork) · 17GB

Zyphra's Apache-2.0 reasoning MoE: 8.4B total parameters with only ~760M active, 16 experts, 131K context, Compressed Convolutional Attention and strong math/code benchmarks. Experimental for local use today: currently needs Zyphra vLLM/Transformers forks; LM Studio/GGUF/MLX support is not yet verified.

chat · code · reasoning · math · experimental
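
For the adventurous, a minimal loading sketch follows. It assumes Zyphra's Transformers fork is installed and that the model loads through the standard AutoModel path with remote code; the repo id is a guess from the model name, so verify it against Zyphra's model card.

```python
# Minimal sketch: trying ZAYA1-8B via Transformers, per the note above
# that Zyphra's forks are currently required. Repo id is an assumption.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zyphra/ZAYA1-8B"   # hypothetical repo id -- check the model card
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # BF16, matching the listing above
    trust_remote_code=True,
    device_map="auto",
)

inputs = tok("What is 17 * 23?", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```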

How to choose at 64GB