Local TTS model
Sesame CSM
Conversational Speech Model that generates speech with natural turn-taking, backchannels, and interruptions. Built specifically for multi-turn dialogue with real-time response generation.
GPU recommended
text-to-speech generation
1 language
Apache 2.0
Quality
9.5/10
Speed
7.5/10
Model size
3.5 GB
Voices
Built-in conversational voices
Can Sesame CSM run locally?
Sesame CSM can generate speech locally for private voice workflows. Start with pip install sesame-csm.
Apache 2.0 license. Still verify upstream usage notes before shipping.
pip install sesame-csm
Upstream source
dialogue, streaming, realtime, emotion
Audio profile
Best fit
Sesame CSM is best for fast on-device voice responses and local assistants.
Hardware: GPU, Apple Silicon
Model details
Type
Local TTS model
Family
sesame
Latency
low
Formats
pytorch, safetensors
Languages
en
Context
Turn-taking, backchannels, interruptions
Install locally
01 Check runtime: Confirm the backend supports pytorch and safetensors on your machine.
02 Install model: Use the upstream command or repository instructions.
03 Test locally: Run a short private audio prompt before moving into production workflows.
pip install sesame-csm
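The runtime check in step 01 can be sketched in Python. This is a minimal illustration, not part of the Sesame CSM tooling: it assumes the listed formats map to the `torch` and `safetensors` import names, and the helper names `missing_modules` and `pick_device` are hypothetical.

```python
import importlib.util

# Import names assumed for the listed formats; "torch" is PyTorch's import name.
REQUIRED_MODULES = ("torch", "safetensors")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that are not installed here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def pick_device():
    """Pick a device string: CUDA GPU first, then Apple Silicon (MPS), then CPU."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is absent, so no accelerator can be probed
    import torch

    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print("missing:", missing_modules() or "none")
    print("device:", pick_device())
```

An empty "missing" result and a `cuda` or `mps` device suggest the machine matches the GPU-recommended profile; a `cpu` result means generation may still run but is likely to be slower.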
Good for
- text-to-speech generation
- Local workflows where a GPU is recommended
- dialogue, streaming, realtime
Watch before shipping
- Validate pronunciation, latency and artifacts with your own voice samples.
- Review the upstream license and acceptable-use notes.
- Benchmark on your target CPU, Apple Silicon or GPU setup.
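One way to approach the benchmarking item above is a small timing harness that reports latency and real-time factor (synthesis time divided by audio duration). This is a sketch under stated assumptions: `synthesize` is a hypothetical callable that wraps whatever model call you use and returns the generated audio length in seconds, and `fake_synthesize` is a stand-in, not a Sesame CSM API.

```python
import statistics
import time


def benchmark(synthesize, text, runs=5):
    """Time a synthesis callable and report latency and real-time factor.

    `synthesize` is a hypothetical interface: it takes text and returns the
    duration, in seconds, of the audio it generated.
    """
    latencies = []
    audio_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        audio_seconds = synthesize(text)
        latencies.append(time.perf_counter() - start)
    if audio_seconds <= 0:
        raise ValueError("synthesize must return a positive audio duration")
    median = statistics.median(latencies)
    return {
        "median_latency_s": median,
        "worst_latency_s": max(latencies),
        # Below 1.0 means audio is produced faster than it plays back.
        "real_time_factor": median / audio_seconds,
    }


def fake_synthesize(text):
    """Stand-in for a real model call; replace with your own wrapper."""
    time.sleep(0.01)
    return 0.06 * len(text)  # pretend each character yields 60 ms of audio


if __name__ == "__main__":
    print(benchmark(fake_synthesize, "Hello there, how can I help?"))
```

Run it once per target machine (CPU, Apple Silicon, GPU) with the same prompts so the numbers are comparable across setups.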
Related TTS and speech models
Kyutai
Moshi
Local TTS model · Q 9 · Speed 9.5
Alibaba Cloud (Qwen Team)
Qwen3 TTS
Local TTS model · Q 9.5 · Speed 8.5
Zyphra
Zonos v0.1
Local TTS model · Q 9.5 · Speed 8.5
Alibaba FunAudioLLM
CosyVoice 2
Local TTS model · Q 9.3 · Speed 8.8
OpenBMB
VoxCPM2
Local TTS model · Q 9.4 · Speed 8.3
Boson AI
Higgs Audio v2
Local TTS model · Q 9.7 · Speed 7
StepFun
Step-Audio 2 Mini
Local TTS model · Q 9.3 · Speed 7.5
Nari Labs
Dia
Local TTS model · Q 9.3 · Speed 7