F5-TTS v1.1
Iterative upgrade over the original F5-TTS. Faster convergence via an improved flow-matching schedule, better Chinese prosody, and cross-lingual cloning. Now with streaming inference and an improved conditional flow matching (CFM) sampler.
Style-based TTS with high naturalness and style diffusion. Academic research model with excellent quality.
StyleTTS 2 is a local speech model from Y.L. Ma et al. It is best suited for controllable synthesis and voice-cloning workflows. Check the license before commercial use.
gpu
pytorch
Style transfer
medium
MIT
2024-01
pip install styletts2controllablecloning
First super-realistic TTS LLM that runs in real-time on CPU. 748M params, LLaMA 3.2 backbone + NeuCodec audio tokenizer. GGUF-native - perfect for on-device agents and offline apps. Instant 3s voice cloning.
Flow-matching based TTS with SOTA quality and extremely fast inference. Simple and efficient architecture.
Fully non-autoregressive TTS - no text-phone alignment needed. Achieves human parity on naturalness and similarity metrics. Incredibly fast inference.
Industrial-grade multilingual TTS with streaming, voice cloning and emotion control. Exceptional Chinese + English quality. Used in production at Alibaba scale.
High-quality multilingual TTS with extremely natural voice cloning. Best for Chinese and English with fast inference.