Local TTS model

Tortoise TTS

Quality-focused TTS with strong voice cloning. Inference is slow, but the speech it produces is very natural.

Quality: 9.1/10
Speed: 3/10
Size: 4.5GB
Languages: 1+

Quick answer

Tortoise TTS is a local speech model from James Betker. It is best suited for cloning workflows. Check the license before commercial use.

Model details

Hardware: GPU
Formats: PyTorch
Voices: Unlimited cloning
Latency: high
License: Apache 2.0
Release: 2022-05

Install command

pip install tortoise-tts
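
Once the package is installed, a cloning run might look like the sketch below. It follows the usage pattern shown in the Tortoise README (`TextToSpeech`, `load_audio`, `tts_with_preset`); the helper names `check_preset` and `clone_and_speak`, the file paths, and the choice of `high_quality` as the default preset are assumptions made for illustration, not part of the library. The model weights are fetched on first use, so expect a long first run.

```python
from pathlib import Path

# Preset names listed in the Tortoise README, fastest to slowest.
PRESETS = ("ultra_fast", "fast", "standard", "high_quality")
OUTPUT_SAMPLE_RATE = 24000     # the README saves generated audio at 24 kHz
REFERENCE_SAMPLE_RATE = 22050  # the README loads reference clips at 22.05 kHz


def check_preset(preset: str) -> str:
    """Return the preset unchanged, or raise if the name is not recognised."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {PRESETS}")
    return preset


def clone_and_speak(text, clip_paths, out_path, preset="high_quality"):
    """Hypothetical helper: speak `text` in the voice of the reference clips."""
    check_preset(preset)

    # Imported lazily so this module loads even where the model is not installed.
    import torchaudio
    from tortoise.api import TextToSpeech
    from tortoise.utils.audio import load_audio

    # Each reference clip becomes a tensor the model conditions on.
    clips = [load_audio(str(Path(p)), REFERENCE_SAMPLE_RATE) for p in clip_paths]

    tts = TextToSpeech()  # loads (and on first use downloads) the model weights
    audio = tts.tts_with_preset(text, voice_samples=clips, preset=preset)

    # Drop the batch dimension and move to CPU before writing the WAV file.
    torchaudio.save(str(out_path), audio.squeeze(0).cpu(), OUTPUT_SAMPLE_RATE)
    return out_path
```

Calling `clone_and_speak("Hello there.", ["me_1.wav", "me_2.wav"], "out.wav")` would write the result to `out.wav`; the clip names here are placeholders. Switching `preset` to `fast` or `ultra_fast` trades quality for shorter generation time.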

Features

cloning

Languages: en

Context: High quality mode

Related TTS models

Speech Research (SWivid)

F5-TTS v1.1

Quality 9.5 · Speed 9.2 · 1.6GB · MIT

Iterative upgrade over the original F5-TTS. Faster convergence via improved flow-matching schedule, better Chinese prosody, cross-lingual cloning. Now with streaming inference and improved CFM sampler.

realtime · cloning · streaming · multilingual

Neuphonic

NeuTTS Air

Quality 9 · Speed 9.5 · 0.75GB · Apache 2.0

First super-realistic TTS LLM that runs in real-time on CPU. 748M params, LLaMA 3.2 backbone + NeuCodec audio tokenizer. GGUF-native - perfect for on-device agents and offline apps. Instant 3s voice cloning.

cloning · realtime · streaming · low-latency

Speech Research

F5-TTS

Quality 9.4 · Speed 9 · 1.5GB · MIT

Flow-matching based TTS with SOTA quality and extremely fast inference. Simple and efficient architecture.

realtime · cloning · streaming

Amphion Team

MaskGCT

Quality 9.4 · Speed 9 · 2.8GB · MIT

Fully non-autoregressive TTS - no text-phone alignment needed. Achieves human parity on naturalness and similarity metrics. Incredibly fast inference.

cloning · realtime · streaming

Alibaba FunAudioLLM

CosyVoice 2

Quality 9.3 · Speed 8.8 · 2.4GB · Apache 2.0

Industrial-grade multilingual TTS with streaming, voice cloning and emotion control. Exceptional Chinese + English quality. Used in production at Alibaba scale.

streaming · realtime · cloning · emotion · multilingual

MyShell

MeloTTS

Quality 9 · Speed 9 · 1.5GB · MIT

High-quality multilingual TTS with extremely natural voice cloning. Best for Chinese and English with fast inference.

cloning · realtime · multilingual