LFM2.5-8B-A1B
Liquid AI hybrid model built for on-device assistants. 8.3B total / 1.5B active, 128K context, tool use, GGUF, ONNX, MLX, llama.cpp and LM Studio support. Open-weight under LFM 1.0.
Practical local AI picks for Mac mini M4 and M4 Pro machines, focused on unified memory, LM Studio fit and real desktop workflows.
For a Mac mini M4 with 16GB, start with compact 8B-14B class models and keep enough memory headroom for macOS and LM Studio. For M4 Pro 24GB or 48GB, larger 24B-32B class models become more comfortable.
Liquid AI hybrid model built for on-device assistants. 8.3B total / 1.5B active, 128K context, tool use, GGUF, ONNX, MLX, llama.cpp and LM Studio support. Open-weight under LFM 1.0.
IBM Granite 4.1 long-context instruct model. Apache 2.0, 131K context, tool calling, RAG, code tasks, multilingual dialog and business assistant workflows on normal 8-16 GB machines.
Zhipu AI lightweight flagship. Strong bilingual CN/EN with hybrid thinking mode, 200K context and tool calling. Apache 2.0 — excellent alternative to Qwen 3.5 9B on modest GPUs.
Zhipu AI's efficient MoE powerhouse. 106B total parameters, only 14B active at inference — dense-model speed with much larger model quality. Clearly the best in the 16–24GB RAM range. Outperforms Llama 3.3 70B. Apache 2.0.
NVIDIA hybrid Mamba-Transformer 9B. 6x throughput vs comparable dense models, 128K context, strong maths/code. Efficient toggle-able reasoning. NVIDIA Open Model License.
The sweet spot. Incredible reasoning, coding and chat quality. The best model you can run on 16GB.
ServiceNow x NVIDIA mid-size reasoner. Half the memory of 32B reasoners with comparable performance on MBPP, BFCL, GPQA. Strong enterprise fit. MIT licensed.
⭐ Mac Mini M4 16GB top pick! NVIDIA fine-tune of Llama 3.1. Hybrid /think • /no_think mode — deep reasoning on demand, instant chat otherwise. ~80–120 tok/s on Apple Silicon Metal. 128K context. Apache 2.0.
Alibaba's hybrid-thinking micro-flagship. Toggles between instant answers and deep chain-of-thought reasoning on demand. 128K context, 29 languages, outperforms Qwen3-8B on reasoning benchmarks. Apache 2.0.
These guides use LocalClaw's internal model database for scoring, then avoid hard claims beyond public hardware and model availability signals checked before publishing.