Local TTS model

WavTTS

Research-grade zero-shot TTS that generates speech directly in raw waveform space instead of mel spectrograms, codec tokens or VAE latents. It targets high-fidelity English and Chinese (EN/ZH) voice cloning, but the official 16 kHz checkpoint is large (10.8 GB) and best suited to CUDA GPU setups.

GPU recommended, text-to-speech generation, 2 languages, CC-BY-NC 4.0 weights / MIT code
Quality: 9.1/10
Speed: 5.2/10
Model size: 10.8 GB
Voices: Zero-shot EN/ZH voice cloning from reference audio

Can WavTTS run locally?

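WavTTS can generate speech locally for private voice workflows. Start by cloning the repository and installing it in editable mode from inside the cloned directory:
git clone https://github.com/cwx-worst-one/WavTTS && cd WavTTS && pip install -e .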

License: CC-BY-NC 4.0 for the weights, MIT for the code. The weights license is non-commercial, so review upstream restrictions before commercial use.

cloning, multilingual, controllable

Audio profile

Quality: 9.1
Speed: 5.2
Local: 7.2

Best fit

WavTTS is best for local voice cloning and expressive speech generation.

Hardware: gpu

Model details

Type: Local TTS model
Family: wavtts
Latency: high
Formats: pytorch
Languages: en, zh
Context: Official 16 kHz checkpoint, direct raw waveform generation with flow matching + DiT
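
Because the official checkpoint works at 16 kHz, it can help to inspect a reference clip before using it for cloning. The sketch below uses only Python's standard `wave` module to report the sample rate, channel count and duration of a PCM WAV file. That reference audio should match the 16 kHz rate is an assumption here, not something this page or the upstream project confirms, and `inspect_reference` is a hypothetical helper name.

```python
import wave

TARGET_RATE = 16_000  # sample rate of the official checkpoint, per this page


def inspect_reference(path):
    """Return basic properties of a PCM WAV reference clip."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        channels = wav.getnchannels()
    return {
        "sample_rate": rate,
        "channels": channels,
        "seconds": frames / rate,
        # Assumption: a clip at the checkpoint rate needs no resampling.
        "matches_checkpoint_rate": rate == TARGET_RATE,
    }
```

If `matches_checkpoint_rate` is false, check the upstream repository for how it expects reference audio to be prepared before resampling anything yourself.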

Install locally

01 Check runtime: Confirm the backend supports pytorch on your machine.
02 Install model: Use the upstream command or repository instructions.
03 Test locally: Run a short private audio prompt before moving into production workflows.
git clone https://github.com/cwx-worst-one/WavTTS && cd WavTTS && pip install -e .
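
The "Check runtime" step can be sketched as a small script run before downloading anything. This is a minimal sketch, not part of WavTTS: it checks whether PyTorch is importable, whether a CUDA device is visible, and whether the current disk has room for the 10.8 GB checkpoint quoted on this page. `describe_runtime` and `CHECKPOINT_GB` are hypothetical names introduced for illustration.

```python
import importlib.util
import shutil

CHECKPOINT_GB = 10.8  # model size quoted on this page


def describe_runtime(path="."):
    """Report PyTorch/CUDA availability and free disk space at `path`."""
    info = {"torch": False, "cuda": False, "gpu_name": None}
    # Only import torch if it is installed, so the script runs anywhere.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = True
        info["cuda"] = torch.cuda.is_available()
        if info["cuda"]:
            info["gpu_name"] = torch.cuda.get_device_name(0)
    free_gb = shutil.disk_usage(path).free / 1e9
    info["free_disk_gb"] = round(free_gb, 1)
    info["disk_ok"] = free_gb > CHECKPOINT_GB
    return info


if __name__ == "__main__":
    for key, value in describe_runtime().items():
        print(f"{key}: {value}")
```

If `cuda` comes back false, expect slow generation at best, since this page recommends a CUDA GPU for the official checkpoint.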

Good for

  • text-to-speech generation
  • GPU recommended local workflows
  • cloning, multilingual, controllable

Watch before shipping

  • Validate pronunciation, latency and artifacts with your own voice samples.
  • Review the upstream license and acceptable-use notes.
  • Benchmark on your target CPU, Apple Silicon or GPU setup.
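
One way to put a number on the latency and benchmarking checks above is the real-time factor (RTF): synthesis time divided by the duration of the audio produced, where values below 1 mean faster than real time. The sketch below assumes a hypothetical `synthesize(text)` callable that returns a sequence of samples at 16 kHz. WavTTS's actual inference entry point is not documented on this page, so wrap whatever the upstream repository provides.

```python
import time

SAMPLE_RATE = 16_000  # output rate of the official checkpoint, per this page


def real_time_factor(synthesize, text, runs=3):
    """Average synthesis time divided by generated audio duration.

    `synthesize` is a hypothetical callable that takes text and returns
    a sequence of audio samples at SAMPLE_RATE.
    """
    factors = []
    for _ in range(runs):
        start = time.perf_counter()
        samples = synthesize(text)
        elapsed = time.perf_counter() - start
        audio_seconds = len(samples) / SAMPLE_RATE
        factors.append(elapsed / audio_seconds)
    return sum(factors) / len(factors)


def fake_synthesize(text):
    """Stand-in that returns one second of silence so the sketch runs."""
    return [0.0] * SAMPLE_RATE


if __name__ == "__main__":
    print(f"RTF: {real_time_factor(fake_synthesize, 'hello'):.4f}")
```

When timing a real model on a GPU, call `torch.cuda.synchronize()` inside the wrapped function before returning, since CUDA kernels run asynchronously and the timer can otherwise stop early. Discarding the first run as a warm-up is also worth considering.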
