Local TTS model
MisoTTS
An 8B-parameter, English-first, emotive conversational TTS model designed for natural dialogue, voice continuation from prompt audio, and private local speech experiments. Quality is excellent, but the model is heavier than small TTS models and runs best on CUDA GPUs or larger Apple Silicon setups.
GPU recommended
text-to-speech generation
1 language
Other / custom
Quality: 9.4/10
Speed: 5.8/10
Model size: 16 GB
Voices: Emotive English dialogue + prompt audio continuation
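The 16 GB size is consistent with 8 billion parameters stored at 2 bytes each (16-bit weights). The sketch below shows that arithmetic, assuming the listed size reflects bf16/fp16 weights. The int8 and int4 rows are for comparison only and do not imply that upstream ships quantized checkpoints. Runtime memory will be higher than these figures once activations, the KV cache and the audio tokenizer are loaded.

```python
# Rough weight-memory estimate for a model with a given parameter count.
# Assumption: the listed 16 GB corresponds to 16-bit (bf16/fp16) weights.
# Quantized rows are illustrative; upstream quantization support is not confirmed.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    params = 8e9  # "8B" from the model card
    for dtype in ("fp32", "bf16", "int8", "int4"):
        print(f"{dtype:>5}: {weight_memory_gb(params, dtype):.1f} GB")
```

For 8e9 parameters this prints 32.0 GB for fp32, 16.0 GB for bf16, 8.0 GB for int8 and 4.0 GB for int4.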
Can MisoTTS run locally?
MisoTTS can generate speech locally for private voice workflows. Start with git clone https://github.com/MisoLabsAI/MisoTTS.
Other / custom license. Review upstream restrictions before commercial use.
git clone https://github.com/MisoLabsAI/MisoTTS
Upstream source
emotion, dialogue, cloning, controllable
Audio profile
Best fit
MisoTTS is best for local voice cloning and expressive speech generation.
Hardware: GPU, Apple Silicon
Model details
Type: Local TTS model
Family: miso
Latency: medium
Formats: PyTorch, safetensors
Languages: en
Context: 8B text-to-dialogue RVQ Transformer, Mimi audio tokenizer, 2048 max sequence length
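To get a feel for what a 2048-position context means in audio time, the sketch below does a back-of-the-envelope budget. It rests on assumptions the upstream repository has not been checked against here: that each sequence position holds one Mimi frame (all RVQ codebooks for that frame predicted in one step), that Mimi runs at 12.5 frames per second (its published frame rate), and that text tokens and any prompt audio share the same context. `max_audio_seconds` is a hypothetical helper, not part of MisoTTS.

```python
# Back-of-the-envelope audio budget for a fixed-length context.
# Assumptions (not confirmed against the MisoTTS repo):
#   * one sequence position == one Mimi audio frame,
#   * Mimi produces 12.5 frames per second of audio,
#   * text tokens and prompt-audio frames consume the same context.

MIMI_FRAME_RATE_HZ = 12.5
MAX_SEQ_LEN = 2048


def max_audio_seconds(reserved_positions: int = 0,
                      max_seq_len: int = MAX_SEQ_LEN,
                      frame_rate_hz: float = MIMI_FRAME_RATE_HZ) -> float:
    """Seconds of generated audio that fit after reserving positions
    for text tokens and prompt audio."""
    audio_frames = max(max_seq_len - reserved_positions, 0)
    return audio_frames / frame_rate_hz


if __name__ == "__main__":
    print(f"empty context: {max_audio_seconds():.2f} s")
    print(f"200 positions reserved: {max_audio_seconds(200):.2f} s")
```

Under these assumptions the full context covers about 163.84 s of audio, and reserving 200 positions leaves about 147.84 s. If upstream interleaves text and audio differently, only the reservation logic needs to change.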
Install locally
01 Check runtime: Confirm the backend supports PyTorch and safetensors on your machine.
02 Install model: Use the upstream command or repository instructions.
03 Test locally: Run a short private audio prompt before moving into production workflows.
git clone https://github.com/MisoLabsAI/MisoTTS
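For the "Check runtime" step, a small Python probe can confirm that PyTorch and safetensors are importable and which accelerator PyTorch sees. This is a generic sketch using standard `importlib` and PyTorch calls (`torch.cuda.is_available()`, `torch.backends.mps.is_available()`). It does not check MisoTTS-specific requirements; `check_runtime` is a hypothetical helper name.

```python
import importlib.util


def check_runtime() -> dict:
    """Report whether PyTorch and safetensors are installed and which
    accelerator PyTorch can use: "cuda", "mps" (Apple Silicon) or "cpu".
    "device" is None when PyTorch itself is missing."""
    report = {
        "torch": importlib.util.find_spec("torch") is not None,
        "safetensors": importlib.util.find_spec("safetensors") is not None,
        "device": None,
    }
    if report["torch"]:
        import torch  # imported lazily so the probe runs without PyTorch

        mps = getattr(torch.backends, "mps", None)  # absent on old builds
        if torch.cuda.is_available():
            report["device"] = "cuda"
        elif mps is not None and mps.is_available():
            report["device"] = "mps"
        else:
            report["device"] = "cpu"
    return report


if __name__ == "__main__":
    for key, value in check_runtime().items():
        print(f"{key}: {value}")
```

A "cpu" result means the 8B model will likely be too slow for interactive use, given the GPU recommendation above.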
Good for
- text-to-speech generation
- GPU-accelerated local workflows
- emotion, dialogue, cloning
Watch before shipping
- Validate pronunciation, latency and artifacts with your own voice samples.
- Review the upstream license and acceptable-use notes.
- Benchmark on your target CPU, Apple Silicon or GPU setup.
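One simple way to benchmark latency on your own hardware is the real-time factor (RTF): wall-clock synthesis time divided by the duration of the audio produced, where values below 1.0 mean faster than real time. The sketch below assumes you wrap whatever MisoTTS inference call you use in a function that takes text and returns mono samples. `fake_synthesize` is a hypothetical stand-in, and the 24 kHz rate is an assumption based on Mimi's sample rate, so confirm the actual output rate upstream.

```python
import time
from typing import Callable, Sequence


def real_time_factor(synthesize: Callable[[str], Sequence[float]],
                     text: str,
                     sample_rate: int) -> float:
    """Wall-clock synthesis time divided by the duration of the output audio.

    `synthesize` is a placeholder for your own inference wrapper; it must
    return mono samples at `sample_rate` Hz.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    duration = len(samples) / sample_rate
    if duration == 0:
        raise ValueError("synthesizer returned no audio")
    return elapsed / duration


def fake_synthesize(text: str) -> list[float]:
    """Hypothetical stand-in: one second of silence at 24 kHz per call."""
    return [0.0] * 24_000


if __name__ == "__main__":
    rtf = real_time_factor(fake_synthesize, "Hello there!", sample_rate=24_000)
    print(f"RTF: {rtf:.4f}")
```

The first call on a GPU usually includes warm-up overhead, so discard it and average several runs with prompts that resemble your production text.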
Related TTS and speech models
Zyphra
Zonos v0.1
Local TTS model · Q 9.5 · Speed 8.5
OpenBMB
VoxCPM2
Local TTS model · Q 9.4 · Speed 8.3
Bilibili
IndexTTS 2
Local TTS model · Q 9.4 · Speed 8
Boson AI
Higgs Audio v2
Local TTS model · Q 9.7 · Speed 7
StepFun
Step-Audio 2 Mini
Local TTS model · Q 9.3 · Speed 7.5
Nari Labs
Dia
Local TTS model · Q 9.3 · Speed 7
Kyutai
Moshi
Local TTS model · Q 9 · Speed 9.5
Alibaba FunAudioLLM
CosyVoice 2
Local TTS model · Q 9.3 · Speed 8.8