Open-weight local LLM

DANTE-Mosaic-3.5B

OdaxAI compact dense model based on SmolLM3-3B and distilled from Kimi K2. Strong small-model benchmark profile: GSM8K 74.45, HellaSwag 76.73 and MBPP 42.6. Apache 2.0, BF16 weights, practical for local Transformers/vLLM use.

Laptop ready 8 GB RAM BF16 Local chat on 8GB+ machines
Parameters
3.08B
Minimum RAM
8 GB
Model size
6.2 GB
Quantization
BF16

Can DANTE-Mosaic-3.5B run locally?

DANTE-Mosaic-3.5B is a good fit for normal laptops and compact desktops with 8 GB RAM or more.

Search for OdaxAI/DANTE-Mosaic-3.5B in LM Studio or another GGUF-compatible runtime.

chatreasoningcodelightmultilingual

Install path

01
Check RAM fitMinimum 8 GB RAM. Start with the BF16 quant.
02
Load the modelSearch OdaxAI/DANTE-Mosaic-3.5B in LM Studio.
03
Control locallyUse LocalClaw to manage models, agents, chat, channels and scheduled OpenClaw work.

Strengths

  • Compact 3.08B dense model that can run on modest local hardware
  • Apache 2.0 license with open weights, scripts, configs and evaluation assets
  • Distilled from Kimi K2 while retaining a practical small-model footprint
  • Strong reported small-model results: 74.45 GSM8K, 76.73 HellaSwag and 42.6 MBPP
  • Runs from standard Hugging Face Transformers and can be served locally with vLLM/SGLang-style stacks
  • Good candidate for laptop-friendly reasoning and coding experiments

Limitations

  • No official GGUF quantization in the main repository at listing time
  • BF16 weights are larger than a 3B Q4 GGUF would be
  • Not a frontier model; quality is bounded by small dense-model capacity
  • Context window is not clearly documented in the model card

Best use cases

  • Local chat on 8GB+ machines
  • Small-model reasoning experiments
  • Light coding help and MBPP-style programming tasks
  • Research on knowledge distillation from large MoE teachers
  • Multilingual assistants with a small memory footprint

Capability profile

speed
8
quality
7
coding
6
reasoning
7

Technical notes

Developer
OdaxAI
License
Apache 2.0
Context window
Unknown tokens
Architecture
Dense SmolLM3 causal language model fine-tune with 3.08B parameters. Distilled from Kimi K2 using generative cross-architecture / cross-tokenizer distillation.

Similar models to compare

Where to go next