Local TTS model
WavTTS
Research-grade zero-shot TTS that generates speech directly in raw waveform space instead of mel spectrograms, codec tokens or VAE latents. It targets high-fidelity EN/ZH voice cloning, but the official 16 kHz checkpoint is large (10.8 GB) and runs best on CUDA GPU setups.
GPU recommended
text-to-speech generation
2 languages
CC-BY-NC 4.0 weights / MIT code
Quality
9.1/10
Speed
5.2/10
Model size
10.8 GB
Voices
Zero-shot EN/ZH voice cloning from reference audio
Can WavTTS run locally?
WavTTS can generate speech locally for private voice workflows. Start by cloning the upstream repository and installing it in editable mode with the command listed below.
License: CC-BY-NC 4.0 for the weights, MIT for the code. The weights license is non-commercial, so review upstream restrictions before commercial use.
git clone https://github.com/cwx-worst-one/WavTTS && cd WavTTS && pip install -e .
Upstream source
cloning, multilingual, controllable
Audio profile
Best fit
WavTTS is best for local voice cloning and expressive speech generation.
Hardware: GPU
Model details
Type
Local TTS model
Family
wavtts
Latency
high
Formats
pytorch
Languages
en, zh
Context
Official 16 kHz checkpoint, direct raw waveform generation with flow matching + DiT
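The Context entry mentions flow matching with a DiT. As a rough illustration of the general idea only, and not the WavTTS sampler, the sketch below integrates a velocity field from Gaussian noise toward a target waveform with fixed Euler steps. The names euler_flow_sample and make_toy_velocity are hypothetical, and the closed-form toy velocity stands in for what a trained network would predict.

```python
import math
import random


def euler_flow_sample(velocity_fn, num_samples, steps=8, seed=0):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 with Euler steps."""
    rng = random.Random(seed)
    # Start from Gaussian noise, one value per waveform sample.
    x = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target):
    """Hypothetical stand-in for a trained model (assumption, not WavTTS code).

    For the straight-line path x_t = (1 - t) * x_0 + t * x_1, the velocity
    that reaches x_1 at t = 1 is (x_1 - x_t) / (1 - t).
    """
    def velocity(x, t):
        return [(yi - xi) / (1.0 - t) for xi, yi in zip(x, target)]
    return velocity


# 10 ms of a 440 Hz tone at 16 kHz as a toy target waveform.
target = [math.sin(2 * math.pi * 440 * n / 16_000) for n in range(160)]
waveform = euler_flow_sample(make_toy_velocity(target), num_samples=len(target))
```

In a real system the velocity would come from a conditioned neural network and the result would only approximate natural speech; here the toy field knows the target, so the loop just shows the sampling mechanics.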
Install locally
01 Check runtime
Confirm the backend supports PyTorch on your machine.
02 Install model
Use the upstream command or repository instructions.
03 Test locally
Run a short private audio prompt before moving into production workflows.
git clone https://github.com/cwx-worst-one/WavTTS && cd WavTTS && pip install -e .
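The runtime check in step 01 can be sketched in Python as follows. This is a minimal sketch, assuming PyTorch is the backend and using the 10.8 GB model size from this page as the disk threshold; check_runtime is a hypothetical helper, not part of WavTTS.

```python
import shutil


def check_runtime(min_free_gb=10.8, path="."):
    """Report PyTorch, CUDA, and free disk space (hypothetical helper)."""
    report = {"torch_installed": False, "cuda_available": False, "device": "cpu"}
    try:
        import torch  # optional: the check still runs if PyTorch is missing
    except ImportError:
        torch = None
    if torch is not None:
        report["torch_installed"] = True
        report["torch_version"] = torch.__version__
        if torch.cuda.is_available():
            report["cuda_available"] = True
            report["device"] = torch.cuda.get_device_name(0)
    # Compare free disk space against the checkpoint size listed on this page.
    free_gb = shutil.disk_usage(path).free / 1e9
    report["free_disk_gb"] = round(free_gb, 1)
    report["enough_disk"] = free_gb >= min_free_gb
    return report


if __name__ == "__main__":
    for key, value in check_runtime().items():
        print(f"{key}: {value}")
```

If cuda_available comes back False, expect generation to be slow or impractical, since this page lists the model as GPU recommended with high latency.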
Good for
- text-to-speech generation
- local workflows on a GPU (recommended)
- cloning, multilingual, controllable
Watch before shipping
- Validate pronunciation, latency and artifacts with your own voice samples.
- Review the upstream license and acceptable-use notes.
- Benchmark on your target CPU, Apple Silicon or GPU setup.
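One way to approach the benchmarking item above is to measure the real-time factor (wall-clock time divided by seconds of audio produced). The sketch below assumes a synthesis callable that returns a sequence of samples at 16 kHz; real_time_factor and dummy_synthesize are hypothetical names, and the dummy function is a placeholder to swap for whatever inference entry point the upstream repository actually provides.

```python
import time

SAMPLE_RATE = 16_000  # this page lists a 16 kHz checkpoint


def real_time_factor(synthesize, text, sample_rate=SAMPLE_RATE):
    """Time one synthesis call and relate it to the audio length produced."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    if len(samples) == 0:
        raise ValueError("synthesize() returned no audio samples")
    audio_seconds = len(samples) / sample_rate
    return {
        "elapsed_s": elapsed,
        "audio_s": audio_seconds,
        "rtf": elapsed / audio_seconds,  # below 1.0 means faster than real time
    }


def dummy_synthesize(text):
    # Hypothetical placeholder: 0.1 s of silence per input character.
    return [0.0] * (len(text) * SAMPLE_RATE // 10)


result = real_time_factor(dummy_synthesize, "hello world")
print(f"audio: {result['audio_s']:.2f} s, RTF: {result['rtf']:.4f}")
```

Run it several times with your own prompts and reference voices, and discard the first call, which often includes one-time model loading or warm-up cost.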
Related TTS and speech models
OpenBMB
VoxCPM2
Local TTS model · Q 9.4 · Speed 8.3
MyShell
OpenVoice V2
Local TTS model · Q 8.9 · Speed 9
Speech Research (SWivid)
F5-TTS v1.1
Local TTS model · Q 9.5 · Speed 9.2
Zyphra
Zonos v0.1
Local TTS model · Q 9.5 · Speed 8.5
Alibaba FunAudioLLM
CosyVoice 2
Local TTS model · Q 9.3 · Speed 8.8
MyShell
MeloTTS
Local TTS model · Q 9 · Speed 9
Bilibili
IndexTTS 2
Local TTS model · Q 9.4 · Speed 8
OpenMOSS / MOSI.AI
MOSS-TTS-Nano
Local TTS model · Q 8.5 · Speed 9.7