Local LLM model page

TinyLlama (1.1B)

Compact 1.1B-parameter model trained on 3T tokens. Great for ultra-low-resource environments. 3M downloads.

Parameters
1.1B
Minimum RAM
4 GB
Model size
0.6 GB
Quantization
Q5_K_M

Can TinyLlama (1.1B) run locally?

TinyLlama (1.1B) is best suited for entry-level laptops and desktops. LocalClaw recommends Q5_K_M as the default quantization, with at least 4 GB RAM.
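As a rough sanity check on the numbers above, a quantized GGUF file's size is approximately parameter count times bits per weight, plus runtime overhead for the KV cache and buffers. A minimal sketch; the ~5.5 bits/weight figure for Q5_K_M and the 1 GB runtime overhead are approximations, not values from this page:

```python
def gguf_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate GGUF file size in decimal gigabytes.

    Ignores small overhead from metadata and tensors that are
    stored at higher precision than the main quantization type.
    """
    return params * bits_per_weight / 8 / 1e9


def ram_needed_gb(params: float, bits_per_weight: float,
                  overhead_gb: float = 1.0) -> float:
    # Weights plus a rough allowance (assumed here, ~1 GB) for the
    # KV cache, scratch buffers, and the runtime itself.
    return gguf_size_gb(params, bits_per_weight) + overhead_gb


# TinyLlama at roughly 5.5 bits/weight (approximate for Q5_K_M):
size = gguf_size_gb(1.1e9, 5.5)
```

Even with generous overhead, the result stays far below the 4 GB minimum, which is why this model fits entry-level hardware comfortably.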

Search term for LM Studio or compatible runtimes: tinyllama-1.1b-chat

Hugging Face repository: TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF
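A minimal loading sketch for that repository, using `huggingface_hub` and `llama-cpp-python`. Both libraries are assumptions on my part (this page doesn't prescribe a runtime), and the exact Q5_K_M filename follows TheBloke's usual naming convention, so confirm it on the repository page before use:

```python
REPO_ID = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
FILENAME = "tinyllama-1.1b-chat-v1.0.Q5_K_M.gguf"  # assumed name; verify on the repo


def load_model():
    # Imports kept local so the constants above can be used
    # without either dependency installed.
    from huggingface_hub import hf_hub_download
    from llama_cpp import Llama

    path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)
    # n_ctx capped at the model's native 2,048-token window.
    return Llama(model_path=path, n_ctx=2048)
```

Typical usage would be `llm = load_model()` followed by `llm("prompt", max_tokens=64)`; LM Studio users can skip all of this and search for the term given above.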

Strengths

  • Ultra-compact at 0.6GB
  • Trained on 3T tokens, exceptionally thorough for its size
  • Apache 2.0
  • Runs on anything

Limitations

  • Very limited capability
  • Only 2K context
  • English-only
  • Struggles with multi-step reasoning and complex instructions

Best use cases

  • IoT and edge devices
  • Experimentation
  • Chatbot prototyping
  • Learning

Benchmarks

Speed: 10/10

Quality: 3/10

Coding: 2/10

Reasoning: 2/10

Technical details

Developer: Peiyuan Zhang and the TinyLlama team

License: Apache 2.0

Context window: 2,048 tokens

Architecture: Transformer (same as Llama 2 at smaller scale)

Released: 2024-01
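The 2,048-token window means long chats must be trimmed before each request. A minimal sketch that drops the oldest turns first; the 4-characters-per-token heuristic is a rough assumption, not an exact tokenizer:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # A real implementation would use the model's tokenizer.
    return max(1, len(text) // 4)


def fit_to_window(turns: list[str], n_ctx: int = 2048,
                  reserve: int = 256) -> list[str]:
    """Keep the most recent turns that fit in the context window,
    reserving `reserve` tokens for the model's reply."""
    budget = n_ctx - reserve
    kept: list[str] = []
    for turn in reversed(turns):  # newest first
        cost = estimate_tokens(turn)
        if cost > budget:
            break  # oldest remaining turns are dropped
        kept.append(turn)
        budget -= cost
    return list(reversed(kept))  # restore chronological order
```

This keeps recent context intact at the expense of older history, which is usually the right trade-off for a chat model with a window this small.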