Kokoro TTS
Ultra-lightweight yet stunning quality. 82M params only - runs on CPU in real-time. Best quality-to-size ratio of any TTS model.
Massively Multilingual Speech by Meta. Supports 1100+ languages, the most comprehensive language coverage available.
MMS (Meta) is a local speech model from Meta AI. It is best suited for multilingual workflows. Check the license before commercial use.
cpugpuedge
pytorch
Per-language
low
CC-BY-NC 4.0
2023-05
pip install transformersmultilingual
Ultra-lightweight yet stunning quality. 82M params only - runs on CPU in real-time. Best quality-to-size ratio of any TTS model.
Production-grade streaming ASR from Kyutai (makers of Moshi). Delay-streaming transformer with 500ms latency, word-level timestamps, speaker diarization. Top of Open ASR Leaderboard for real-time French + English.
Iterative upgrade over the original F5-TTS. Faster convergence via improved flow-matching schedule, better Chinese prosody, cross-lingual cloning. Now with streaming inference and improved CFM sampler.
OpenAI's optimized Whisper v3 with 4 decoder layers instead of 32. 8× faster than Whisper Large v3 with only minor accuracy trade-off. 99 languages supported. New gold standard for fast local transcription.
NVIDIA multilingual ASR + speech translation in a single model. 25 European languages, bidirectional EN↔XX translation. Tops Open ASR Leaderboard multilingual category. Word-level timestamps, punctuation & capitalization.
Industrial-grade multilingual TTS with streaming, voice cloning and emotion control. Exceptional Chinese + English quality. Used in production at Alibaba scale.