176 LLMs + 47 TTS — Updated April 25, 2026

Find the right
local LLM_

Stop sending your data to the cloud. Find the perfect open-source model for your hardware.

Clear pricing: the web model recommender is free to use. The optional LocalClaw Installer for macOS is a one-time $49 purchase if you want one-click setup, activation, updates, and support.

How LocalClaw Works

01 // INIT

Guided Mode

Simple questionnaire. OS, RAM level, use case. We handle the complexity.

Ex: MacBook Air 8 GB → Qwen 3 8B

02 // SPEC

Quick Spec

Direct input. Select RAM, GPU and priorities for an instant recommendation.

Ex: 32 GB RAM + RTX 4090 → DeepSeek R1 32B

03 // TERM

Terminal

Paste diagnostics. Auto-detection of OS/RAM/GPU for precision targeting.

Ex: Paste neofetch output → auto-detect & match

Database // Models

UPDATED: 2026-04-22 · v3.28
Kimi K2 — 1T MoE (Instruct + Thinking) ⭐ New!
DeepSeek V3.2 Exp — 671B w/ sparse attn ⭐ New!
Qwen 3 Next — 80B/3B MoE flagship ⭐ New!
GLM 4.6 — 355B MoE (full) ⭐ New!
MiniMax M2 — 230B MoE agentic ⭐ New!
Mistral Small 3.2 — 24B instruct ⭐ New!
Nemotron Nano — 9B v2 hybrid ⭐ New!
Ling 1T — InclusionAI trillion MoE ⭐ New!
Llama 4 — Scout 109B, Maverick 400B
Qwen 3 VL — 8B, 32B Vision
Gemma 4 — E2B, E4B, 26B-A4B, 31B
Hermes 4 — 70B, 405B
Qwen 3.5 — 0.8B, 2B, 4B, 9B, 397B-A17B
Command A — 111B (Cohere)
DeepSeek R1 0528 — 671B MoE
+ 158 more… See all →

// Latest Drops

LIVE · Apr 22, 2026
Moonshot · Jul 2025 🔥 Trending

Kimi K2 (1T MoE)

Trillion-param MoE with 32B active. Matches GPT-4 Turbo on MMLU/HumanEval. Thinking variant tops AIME & SWE-bench.

Details →
DeepSeek · Sep 2025 🔥 New

DeepSeek V3.2 Exp

Sparse attention (DSA) halves long-context inference cost vs V3.1 while keeping quality. 671B MoE, MIT licensed.

Details →
Alibaba · Sep 2025 🔥 Sweet spot

Qwen 3 Next (80B/3B MoE)

Hybrid-gated DeltaNet. Runs at dense 7B speed with 70B quality. 256K native context. Apache 2.0.

Details →
Zhipu · Sep 2025 Agentic

GLM 4.6 (355B MoE)

Full GLM 4.6 flagship. 200K context, strong tool-calling. Competes with Claude 3.5 Sonnet. MIT.

Details →
MiniMax · Oct 2025 Coding

MiniMax M2 (230B MoE)

10B active params, 4M-token context. Built for agentic coding & tool-use. MIT licensed.

Details →
Mistral · Jun 2025 Balanced

Mistral Small 3.2 (24B)

Refined instruction following, better function calling, less repetition. 128K context. Apache 2.0.

Details →
Neuphonic · Oct 2025 🔥 CPU TTS

NeuTTS Air

First super-realistic TTS LLM running real-time on CPU. 748M params, GGUF-native, 3s voice cloning.

Details →
StepFun · Aug 2025 Speech LLM

Step-Audio 2 Mini

Unified speech LLM: ASR + TTS + voice conv + dialogue in one model. Strong paralinguistic control.

Details →
Hume AI · Nov 2025 Emotion

OCTAVE 2

Emotion-aware speech LLM. Generate voice, style & personality from a text description — no reference audio.

Details →
Kyutai · Jun 2025 ASR

Kyutai STT 2.6B

Streaming ASR, 500ms latency, word timestamps & diarization. Top real-time EN/FR accuracy.

Details →
Meta · Apr 2025 Flagship

Llama 4 Scout & Maverick

109B/400B MoE, natively multimodal, 10M-token context. Beats Gemma 3 and rivals GPT-4o.

Details →
Alibaba · Dec 2025 Vision

Qwen 3 VL (8B & 32B)

Open vision-language SOTA. Chart-QA, OCR, 1 h video understanding. Apache 2.0.

Details →
Nous · Sep 2025 Reasoning

Hermes 4 (70B / 405B)

Hybrid thinking mode. Neutral & steerable. Matches Claude 3.5 Sonnet on reasoning.

Details →
Zhipu · Feb 2026 Lightweight

GLM 4.6 Air (12B)

Hybrid thinking, 200K context, strong CN/EN. Great alternative to Qwen 3.5 9B.

Details →
Cohere · Mar 2025 Agentic

Command A (111B)

Open-weight flagship for agents & long-context RAG. 256K context, 23 languages.

Details →
Boson AI · Jul 2025 TTS

Higgs Audio v2

SOTA expressive TTS. Natural laughter, whispers, BGM. Beats ElevenLabs on MOS.

Details →
OpenAI · Oct 2024 ASR

Whisper v3 Turbo

8× faster than Whisper Large v3. 99 languages. New gold standard for local STT.

Details →
NVIDIA · May 2025 ASR

Parakeet TDT 0.6B v2

#1 Open ASR Leaderboard. 50× faster than Whisper Large, real-time on GPU.

Details →
Kyutai · Sep 2024 Voice AI

Moshi (Full-duplex)

Listens and speaks simultaneously with 160 ms latency. Real-time voice-to-voice.

Details →

Local TTS Models — NEW!

View all TTS models →

Text-to-Speech and Speech-to-Text models that run 100% offline on your hardware. Voice cloning, real-time dialogue, 99-language transcription, audiobooks, accessibility & creative projects.

NeuTTS Air 🔥 New
First real-time TTS LLM on CPU
Step-Audio 2 🔥 New
Unified speech LLM (ASR+TTS)
OCTAVE 2 🔥 New
Voice from prompt, no reference
Kyutai STT 2.6B 🔥 ASR
500ms streaming, diarization
F5-TTS v1.1 🔥 New
Streaming flow-matching upgrade
Higgs Audio v2
SOTA expressive, beats ElevenLabs
Zonos
1.6B cloning, emotion control
IndexTTS 2
Separate voice + emotion ref
Whisper v3 Turbo ⭐ ASR
8× faster, 99 languages
Parakeet TDT 0.6B ⭐ ASR
#1 Open ASR leaderboard
Qwen3 TTS
30+ languages, streaming
Kokoro 82M
CPU real-time, 54 voices
Moshi
Full-duplex, 160ms latency
LLaSA 3B
LLaMA TTS, 250k hrs training
+ 33 more… View all →
Orpheus, XTTS v3, Dia, Chatterbox…
Real-time Voice Cloning · 50+ Languages · CPU/GPU/Edge
All-in-one

Optional macOS installer. $49. One-time.

The browser recommender stays free. Upgrade only if you want the native installer to install, update and manage your local AI stack from one dashboard.

macOS 13 Ventura or later required  ·  Apple Silicon or Intel  ·  8 GB RAM min.

View Pricing

Frequently Asked Questions

What is LM Studio?

LM Studio is a free desktop application that lets you run Large Language Models (LLMs) locally on your computer. No internet needed, no data sent anywhere. It provides a chat interface similar to ChatGPT, but everything runs on YOUR hardware.

What is quantization (Q4, Q5, Q8)?

Quantization is a compression technique that reduces model size while preserving most of the quality. Think of it like JPEG compression for images. Q4 = more compressed (smaller, slightly lower quality), Q8 = less compressed (larger, nearly original quality). Q5_K_M is the sweet spot for most users.
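The size math behind quantization can be sketched in a few lines of Python. This is an illustrative estimate only: the `approx_size_gb` helper and the bits-per-weight figures are assumptions for the sketch (real GGUF files vary by quant scheme and include embedding/metadata overhead), not LocalClaw's actual tables.

```python
# Rough size estimate: bytes ≈ parameters × bits-per-weight / 8.
# Bits-per-weight values are approximate effective rates for common
# GGUF quant types (illustrative, not exact).
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.5, "Q8_0": 8.5, "F16": 16.0}

def approx_size_gb(params_billion: float, quant: str) -> float:
    """Approximate on-disk size in GB for a model at a given quant level."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billion * 1e9 * bits / 8 / 1e9

for q in ("Q4_K_M", "Q5_K_M", "Q8_0"):
    print(f"8B model @ {q}: ~{approx_size_gb(8, q):.1f} GB")
```

This is why an 8B model that is ~16 GB at full F16 precision fits comfortably on an 8 GB machine at Q4, and why Q5_K_M is a reasonable middle ground.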

How much RAM do I need to run a local AI model?

Rule of thumb: you need the model's file size plus 2-3 GB for the system. A 5 GB model needs at least 8 GB of RAM. On macOS with Apple Silicon, unified memory makes this more efficient. On Windows/Linux with a GPU, VRAM lets you offload the model.
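The rule of thumb above is a one-line check. As a minimal sketch (`fits_in_ram` is a hypothetical helper for illustration, not part of LocalClaw):

```python
# Rule of thumb: model file size + 2-3 GB of system headroom must fit in RAM.
def fits_in_ram(model_gb: float, total_ram_gb: float, headroom_gb: float = 3.0) -> bool:
    """True if a model of model_gb fits on a machine with total_ram_gb of RAM."""
    return model_gb + headroom_gb <= total_ram_gb

print(fits_in_ram(5.0, 8.0))   # 5 GB model on an 8 GB machine: just fits
print(fits_in_ram(13.0, 8.0))  # a 13B-class quant on 8 GB: does not fit
```

On Apple Silicon, "total RAM" is the unified memory pool; on a discrete GPU you would run the same check against VRAM for the offloaded layers.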

Apple Silicon vs NVIDIA GPU for local AI?

Apple Silicon (M1-M4) uses unified memory, meaning most of your system RAM can be shared with the model. This is incredibly efficient. NVIDIA GPUs are faster for inference but limited by VRAM (typically 8-24 GB). Both are great choices.

Is my data private when using LocalClaw?

Yes! LocalClaw runs entirely in your browser — zero data is collected or sent anywhere. When using LM Studio with recommended models, everything runs locally on your machine. No cloud, no tracking, no API calls.

What are the best local AI models in 2026?

For 8 GB RAM: Qwen 3.5 4B or Gemma 4 E4B. For 16 GB: Qwen 3.5 9B, GLM 4.6 Air 12B or Mistral Small 3.2 24B (tight). For 32 GB+: Gemma 4 31B, Qwen 3 Next 80B/3B MoE or Qwen 3 Coder 30B. For reasoning: Kimi K2 Thinking, DeepSeek V3.2 Exp, or Hermes 4 70B. For coding: MiniMax M2 and Qwen 3 Coder. For vision: Qwen 3 VL 32B or Gemma 4 multimodal.

What is OpenClaw?

OpenClaw is the open-source, self-hosted AI assistant at the heart of the LocalClaw ecosystem. It connects to your local models running in LM Studio or Ollama and provides a unified chat interface on desktop, web, and CLI. It's 100% private — no telemetry, no cloud, no API keys required.

What is free and what is paid?

The LocalClaw web recommender is free: use it to choose the right LLM/TTS model for your hardware. LocalClaw Installer is the optional native macOS app that manages setup — install models, handle updates, switch versions, and launch everything with one click. No terminal needed. The installer is a one-time purchase at $49, no subscription, no recurring fees. Your license is valid forever. See pricing →

Find your model in 30 seconds

Answer a few questions about your hardware and get personalized AI model recommendations — instantly, privately, for free.

Find My Model