Local LLM model page
TinyLlama (1.1B)
A compact 1.1B-parameter model trained on 3 trillion tokens. Great for ultra-low-resource environments. 3M downloads.
Parameters
1.1B
Minimum RAM
4 GB
Model size
0.6 GB
Quantization
Q5_K_M
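The spec-sheet numbers above can be sanity-checked with a back-of-the-envelope estimate: file size is roughly parameters × bits-per-weight ÷ 8. A minimal sketch, assuming approximate bits-per-weight figures for llama.cpp k-quants (the exact file size varies by quant variant and format overhead):

```python
# Rough GGUF size estimate: parameters x bits-per-weight / 8.
# Bits-per-weight values below are approximations, not exact figures.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.69,
    "Q8_0": 8.5,
}

def model_size_gb(params_billion: float, quant: str) -> float:
    """Approximate quantized file size in GB."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billion * 1e9 * bits / 8 / 1e9

def min_ram_gb(params_billion: float, quant: str, overhead_gb: float = 1.5) -> float:
    """Weights plus a rough allowance for KV cache and runtime overhead."""
    return model_size_gb(params_billion, quant) + overhead_gb

# 1.1B params at Q5_K_M comes out under 1 GB of weights,
# comfortably within the 4 GB minimum RAM listed above.
print(round(model_size_gb(1.1, "Q5_K_M"), 2))
print(round(min_ram_gb(1.1, "Q5_K_M"), 2))
```

The fixed overhead term is a simplification; real RAM use also depends on context length and batch size.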
Can TinyLlama (1.1B) run locally?
TinyLlama (1.1B) is best suited to entry-level laptops and desktops. LocalClaw recommends Q5_K_M as the default quantization, with at least 4 GB of RAM.
Search term for LM Studio or compatible runtimes: tinyllama-1.1b-chat
Hugging Face repository: TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF
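For command-line runtimes, a minimal sketch using `huggingface-cli` and llama.cpp's `llama-cli`. The exact GGUF filename is an assumption based on the repository's usual naming; check the repo's file list before downloading:

```shell
# Download one quantized file from the Hugging Face repo
# (filename assumed from the repo's naming convention -- verify first)
huggingface-cli download TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF \
    tinyllama-1.1b-chat-v1.0.Q5_K_M.gguf --local-dir .

# Run with llama.cpp, capping context at the model's 2,048-token window
./llama-cli -m tinyllama-1.1b-chat-v1.0.Q5_K_M.gguf \
    -c 2048 -n 256 -p "Explain what a GGUF file is in one sentence."
```

In LM Studio, searching for the term above surfaces the same GGUF files with a quantization picker.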
Strengths
- Ultra-compact at 0.6 GB
- Trained on 3T tokens — extremely well-trained for its size
- Permissive Apache 2.0 license
- Runs on virtually any hardware
Limitations
- Very limited capability overall
- Only a 2K-token context window
- English-only
- Struggles with complex tasks
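The 2K context window is the limitation most likely to bite in a chatbot loop, so history has to be trimmed before each request. A minimal sketch using a rough 4-characters-per-token heuristic rather than the model's real tokenizer (the real count differs, so the budget should be conservative):

```python
# Naive context-budget trimming for a 2,048-token window.
# CHARS_PER_TOKEN is a crude heuristic, not the actual tokenizer.
CONTEXT_TOKENS = 2048
CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // CHARS_PER_TOKEN)

def trim_history(messages: list[str], reserve_for_reply: int = 256) -> list[str]:
    """Drop the oldest messages until the rest fit the context budget."""
    budget = CONTEXT_TOKENS - reserve_for_reply
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):  # walk newest-first
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))    # restore chronological order
```

A production setup would count tokens with the model's own tokenizer and keep any system prompt pinned rather than trimmable.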
Best use cases
- IoT and edge devices
- Experimentation
- Chatbot prototyping
- Learning
Benchmarks
Speed: 10/10
Quality: 3/10
Coding: 2/10
Reasoning: 2/10
Technical details
Developer: Peiyuan Zhang et al. (StatNLP Research Group, Singapore University of Technology and Design)
License: Apache 2.0
Context window: 2,048 tokens
Architecture: Transformer (Llama 2 architecture at a smaller scale)
Released: 2024-01