Guided Mode
Simple questionnaire. OS, RAM level, use case. We handle the complexity.
Ex: MacBook Air 8 GB → Qwen 3 8B
Stop sending your data to the cloud. Find the perfect open-source model for your hardware.
Clear pricing: the web model recommender is free to use. The optional LocalClaw Installer for macOS is $49 one-time if you want one-click setup, activation, updates and support.
Direct input. Select RAM, GPU, and priorities for an instant recommendation.
Ex: 32 GB RAM + RTX 4090 → DeepSeek R1 32B
Paste diagnostics. Auto-detection of OS/RAM/GPU for precision targeting.
Ex: Paste neofetch → auto-detect & match
Trillion-param MoE with 32B active. Matches GPT-4 Turbo on MMLU/HumanEval. Thinking variant tops AIME & SWE-bench.
Sparse attention (DSA) halves long-context inference cost vs V3.1 while keeping quality. 671B MoE, MIT licensed.
Hybrid-gated DeltaNet. Runs at dense-7B speed with 70B-class quality. 256K native context. Apache 2.0.
Full GLM 4.6 flagship. 200K context, strong tool calling. Competes with Claude 3.5 Sonnet. MIT.
10B active params, 4M-token context. Built for agentic coding and tool use. MIT licensed.
Refined instruction following, better function calling, less repetition. 128K context. Apache 2.0.
First super-realistic TTS LLM running in real time on CPU. 748M params, GGUF-native, 3-second voice cloning.
Unified speech LLM: ASR + TTS + voice conversion + dialogue in one model. Strong paralinguistic control.
Emotion-aware speech LLM. Generate voice, style, and personality from a text description — no reference audio.
Streaming ASR, 500 ms latency, word timestamps and diarization. Top real-time EN/FR accuracy.
109B/400B MoE, natively multimodal, 10M-token context. Beats Gemma 3 and rivals GPT-4o.
Open vision-language SOTA. Chart QA, OCR, 1-hour video understanding. Apache 2.0.
Hybrid thinking mode. Neutral and steerable. Matches Claude 3.5 Sonnet on reasoning.
Hybrid thinking, 200K context, strong CN/EN. Great alternative to Qwen 3.5 9B.
Open-weight flagship for agents and long-context RAG. 256K context, 23 languages.
SOTA expressive TTS. Natural laughter, whispers, background music. Beats ElevenLabs on MOS.
8× faster than Whisper Large v3. 99 languages. New gold standard for local STT.
#1 on the Open ASR Leaderboard. 50× faster than Whisper Large, real-time on GPU.
Listens and speaks simultaneously with 160 ms latency. Real-time voice-to-voice.
Details →Text-to-Speech and Speech-to-Text models that run 100% offline on your hardware. Voice cloning, real-time dialogue, 99-language transcription, audiobooks, accessibility & creative projects.
The browser recommender stays free. Upgrade only if you want the native installer to install, update and manage your local AI stack from one dashboard.
macOS 13 Ventura or later required · Apple Silicon or Intel · 8 GB RAM min.
LM Studio is a free desktop application that lets you run Large Language Models (LLMs) locally on your computer. No internet needed, no data sent anywhere. It provides a chat interface similar to ChatGPT, but everything runs on YOUR hardware.
Quantization is a compression technique that reduces model size while preserving most of the quality. Think of it like JPEG compression for images. Q4 = more compressed (smaller, slightly lower quality), Q8 = less compressed (larger, nearly original quality). Q5_K_M is the sweet spot for most users.
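As a rough back-of-envelope, you can estimate a quantized model's file size from its parameter count. This is a minimal sketch, not any tool's actual formula: the bits-per-weight figures below are approximate averages for common GGUF quantization levels, not exact format specs.

```python
# Approximate average bits per weight for common GGUF quant levels
# (illustrative values, including format overhead).
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5, "F16": 16.0}

def estimated_size_gb(params_billions: float, quant: str) -> float:
    """Rough on-disk size in GB: params x bits per weight / 8."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billions * bits / 8

# An 8B model at Q5_K_M lands around 5.7 GB; the same model
# at Q8_0 is closer to 8.5 GB, and at Q4_K_M under 5 GB.
```

The spread between Q4 and Q8 for the same model is why the quantization level matters as much as the parameter count when matching a model to your RAM.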
Rule of thumb: the model file size plus 2-3 GB for the system. A 5 GB model needs at least 8 GB RAM. On macOS with Apple Silicon, the unified memory makes things more efficient. On Windows/Linux with a GPU, VRAM helps offload the model.
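The rule of thumb above can be written as a one-line check. A hypothetical sketch, assuming a fixed 3 GB system overhead (the upper end of the 2-3 GB range):

```python
def fits_in_ram(model_size_gb: float, total_ram_gb: float,
                system_overhead_gb: float = 3.0) -> bool:
    """Rule of thumb: model file size plus 2-3 GB for the system."""
    return model_size_gb + system_overhead_gb <= total_ram_gb

# A 5 GB model on an 8 GB machine: 5 + 3 = 8, it just fits.
# A 7 GB model on the same machine does not.
```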
Apple Silicon (M1-M4) uses unified memory, meaning your entire RAM is available for the model. This is incredibly efficient. NVIDIA GPUs are faster for inference but limited by VRAM (typically 8-24 GB). Both are great choices.
Yes! LocalClaw runs entirely in your browser — zero data is collected or sent anywhere. When using LM Studio with recommended models, everything runs locally on your machine. No cloud, no tracking, no API calls.
For 8 GB RAM: Qwen 3.5 4B or Gemma 4 E4B. For 16 GB: Qwen 3.5 9B, GLM 4.6 Air 12B or Mistral Small 3.2 24B (tight). For 32 GB+: Gemma 4 31B, Qwen 3 Next 80B/3B MoE or Qwen 3 Coder 30B. For reasoning: Kimi K2 Thinking, DeepSeek V3.2 Exp, or Hermes 4 70B. For coding: MiniMax M2 and Qwen 3 Coder. For vision: Qwen 3 VL 32B or Gemma 4 multimodal.
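The RAM tiers above boil down to a simple lookup: pick the largest tier at or below your machine's RAM. This is a hypothetical sketch of that logic, not the recommender's actual code; the model names come straight from the answer above.

```python
# RAM tier (GB) -> general-purpose picks, mirroring the FAQ answer.
RECOMMENDATIONS = {
    8:  ["Qwen 3.5 4B", "Gemma 4 E4B"],
    16: ["Qwen 3.5 9B", "GLM 4.6 Air 12B", "Mistral Small 3.2 24B"],
    32: ["Gemma 4 31B", "Qwen 3 Next 80B/3B MoE", "Qwen 3 Coder 30B"],
}

def recommend(ram_gb: int) -> list[str]:
    """Return the picks for the largest tier the machine can hold."""
    tiers = [t for t in sorted(RECOMMENDATIONS) if t <= ram_gb]
    return RECOMMENDATIONS[tiers[-1]] if tiers else []
```

The real recommender also weighs use case (reasoning, coding, vision) and GPU, which is why the guided questionnaire asks more than just the RAM figure.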
OpenClaw is the open-source, self-hosted AI assistant at the heart of the LocalClaw ecosystem. It connects to your local models running in LM Studio or Ollama and provides a unified chat interface on desktop, web, and CLI. It's 100% private — no telemetry, no cloud, no API keys required.
The LocalClaw web recommender is free: use it to choose the right LLM/TTS model for your hardware. LocalClaw Installer is the optional native macOS app that manages setup — install models, handle updates, switch versions, and launch everything with one click. No terminal needed. The installer is a one-time purchase at $49, no subscription, no recurring fees. Your license is valid forever. See pricing →
Answer a few questions about your hardware and get personalized AI model recommendations — instantly, privately, for free.
Find My Model