Local LLM model page

TinyLlama (1.1B)

Compact 1.1B-parameter model trained on 3T tokens. Great for ultra-low-resource environments. 3M downloads.

Parameters
1.1B
Minimum RAM
4 GB
Model size
0.6 GB
Quantization
Q5_K_M

Can TinyLlama (1.1B) run locally?

TinyLlama (1.1B) is best suited for entry-level laptops and desktops. LocalClaw recommends Q5_K_M as the default quantization, with at least 4 GB RAM.
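As a rough sanity check on the numbers above, a quantized GGUF file's size is approximately parameter count times bits per weight, plus runtime overhead for the KV cache and buffers. A minimal sketch; the ~5.5 bits/weight figure for Q5_K_M and the 1 GB runtime overhead are approximations, not values from this page:

```python
def gguf_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate GGUF file size in decimal gigabytes.

    Ignores small overhead from metadata and tensors that are
    stored at higher precision than the main quantization type.
    """
    return params * bits_per_weight / 8 / 1e9


def ram_needed_gb(params: float, bits_per_weight: float,
                  overhead_gb: float = 1.0) -> float:
    # Weights plus a rough allowance (assumed here, ~1 GB) for the
    # KV cache, scratch buffers, and the runtime itself.
    return gguf_size_gb(params, bits_per_weight) + overhead_gb


# TinyLlama at roughly 5.5 bits/weight (approximate for Q5_K_M):
size = gguf_size_gb(1.1e9, 5.5)
```

Even with generous overhead, the result stays far below the 4 GB minimum, which is why this model fits entry-level hardware comfortably.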

Search term for LM Studio or compatible runtimes: tinyllama-1.1b-chat

Hugging Face repository: TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF
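A minimal loading sketch for that repository, using `huggingface_hub` and `llama-cpp-python`. Both libraries are assumptions on my part (this page doesn't prescribe a runtime), and the exact Q5_K_M filename follows TheBloke's usual naming convention, so confirm it on the repository page before use:

```python
REPO_ID = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
FILENAME = "tinyllama-1.1b-chat-v1.0.Q5_K_M.gguf"  # assumed name; verify on the repo


def load_model():
    # Imports kept local so the constants above can be used
    # without either dependency installed.
    from huggingface_hub import hf_hub_download
    from llama_cpp import Llama

    path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)
    # n_ctx capped at the model's native 2,048-token window.
    return Llama(model_path=path, n_ctx=2048)
```

Typical usage would be `llm = load_model()` followed by `llm("prompt", max_tokens=64)`; LM Studio users can skip all of this and search for the term given above.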

Strengths

  • Ultra-compact at 0.6GB
  • Trained on 3T tokens, exceptionally thorough for its size
  • Apache 2.0
  • Runs on anything

Limitations

  • Very limited capability
  • Only 2K context
  • English-only
  • Struggles with multi-step reasoning and complex instructions

Best use cases

  • IoT and edge devices
  • Experimentation
  • Chatbot prototyping
  • Learning

Benchmarks

Speed: 10/10

Quality: 3/10

Coding: 2/10

Reasoning: 2/10

Technical details

Developer: Peiyuan Zhang and the TinyLlama team

License: Apache 2.0

Context window: 2,048 tokens

Architecture: Transformer (same as Llama 2 at smaller scale)

Released: 2024-01
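The 2,048-token window means long chats must be trimmed before each request. A minimal sketch that drops the oldest turns first; the 4-characters-per-token heuristic is a rough assumption, not an exact tokenizer:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # A real implementation would use the model's tokenizer.
    return max(1, len(text) // 4)


def fit_to_window(turns: list[str], n_ctx: int = 2048,
                  reserve: int = 256) -> list[str]:
    """Keep the most recent turns that fit in the context window,
    reserving `reserve` tokens for the model's reply."""
    budget = n_ctx - reserve
    kept: list[str] = []
    for turn in reversed(turns):  # newest first
        cost = estimate_tokens(turn)
        if cost > budget:
            break  # oldest remaining turns are dropped
        kept.append(turn)
        budget -= cost
    return list(reversed(kept))  # restore chronological order
```

This keeps recent context intact at the expense of older history, which is usually the right trade-off for a chat model with a window this small.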