Local TTS model

Sesame CSM

Conversational Speech Model (CSM) that generates speech with natural turn-taking, backchannels, and interruptions. It is built specifically for multi-turn dialogue with real-time response generation.

GPU recommended | text-to-speech generation | 1 language | Apache 2.0
Quality: 9.5/10
Speed: 7.5/10
Model size: 3.5 GB
Voices: Built-in conversational voices

Can Sesame CSM run locally?

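Sesame CSM can generate speech locally, which keeps audio on your own machine for private voice workflows. The listed install command is pip install sesame-csm; confirm it against the upstream repository instructions before relying on it.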

Apache 2.0 license. Still verify upstream usage notes before shipping.

Tags: dialogue, streaming, realtime, emotion

Audio profile

Quality: 9.5
Speed: 7.5
Local: 8.7

Best fit

Sesame CSM is best for fast on-device voice responses and local assistants.

Hardware: GPU, Apple Silicon
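Because the listed hardware targets are a GPU and Apple Silicon, a local script usually has to decide which PyTorch device to use before loading anything. The following is a minimal sketch, not Sesame CSM's own loader: pick_device is a hypothetical helper name, and it falls back to the CPU when PyTorch is not installed. How well the model itself runs on each backend is not stated on this page and should be checked upstream.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch can see.

    pick_device is a hypothetical helper for illustration; it is not part
    of any Sesame CSM package.
    """
    try:
        import torch  # optional dependency: fall back to CPU if missing
    except ImportError:
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"

    # torch.backends.mps only exists in newer PyTorch builds, so guard it.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```

The returned string can be passed to whatever loading code the upstream instructions provide.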

Model details

Type: Local TTS model
Family: sesame
Latency: low
Formats: pytorch, safetensors
Languages: en
Context: Turn-taking, backchannels, interruptions

Install locally

01 Check runtime: Confirm the backend supports pytorch and safetensors on your machine.
02 Install model: Use the upstream command or repository instructions.
03 Test locally: Run a short private audio prompt before moving into production workflows.

pip install sesame-csm
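The "Check runtime" step can be sketched in Python as a quick look for the two listed formats' packages. This is an illustrative check only: check_runtime and REQUIRED are hypothetical names, and the assumption that the formats map to the importable packages torch and safetensors comes from the formats listed here, not from the model's own documentation.

```python
from importlib import metadata, util

# Assumption: the "pytorch" and "safetensors" formats correspond to these
# importable package names.
REQUIRED = ("torch", "safetensors")


def check_runtime(packages=REQUIRED) -> dict:
    """Map each package name to its installed version.

    The value is None when the package cannot be imported, and "unknown"
    when it is importable but has no distribution metadata.
    """
    report = {}
    for name in packages:
        if util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "unknown"
    return report


if __name__ == "__main__":
    for name, version in check_runtime().items():
        status = version if version is not None else "MISSING"
        print(f"{name}: {status}")
```

Any package reported as missing needs to be installed before moving on to the model install step.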

Good for

  • text-to-speech generation
  • Local workflows where a GPU is recommended
  • dialogue, streaming, realtime

Watch before shipping

  • Validate pronunciation, latency and artifacts with your own voice samples.
  • Review the upstream license and acceptable-use notes.
  • Benchmark on your target CPU, Apple Silicon or GPU setup.
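One way to approach the latency and benchmarking checks above is to time each synthesis call and compare it with the length of the audio produced (the real-time factor, where values below 1.0 mean faster than playback). The sketch below makes several assumptions: benchmark and fake_synthesize are hypothetical names, the synthesizer is any callable that takes text and returns a sequence of samples, and the 24000 Hz sample rate is a placeholder to replace with the rate your setup actually produces. The stand-in synthesizer returns silence so the script runs without the model.

```python
import statistics
import time
from typing import Callable, Sequence


def benchmark(
    synthesize: Callable[[str], Sequence[float]],
    prompts: Sequence[str],
    sample_rate: int,
    runs: int = 3,
) -> dict:
    """Time synthesize() on each prompt and summarize latency and RTF."""
    if not prompts or runs < 1:
        raise ValueError("need at least one prompt and one run")

    latencies, rtfs = [], []
    for text in prompts:
        for _ in range(runs):
            start = time.perf_counter()
            samples = synthesize(text)
            elapsed = time.perf_counter() - start

            latencies.append(elapsed)
            audio_seconds = len(samples) / sample_rate
            if audio_seconds > 0:
                # Real-time factor: processing time per second of audio.
                rtfs.append(elapsed / audio_seconds)

    return {
        "median_latency_s": statistics.median(latencies),
        "max_latency_s": max(latencies),
        "median_rtf": statistics.median(rtfs) if rtfs else None,
    }


def fake_synthesize(text: str) -> list:
    """Stand-in for a real model call: 0.1 s of silence per character."""
    return [0.0] * (len(text) * 2400)


if __name__ == "__main__":
    prompts = ["Hello there.", "Can you repeat that, please?"]
    print(benchmark(fake_synthesize, prompts, sample_rate=24000))
```

Swap fake_synthesize for your real synthesis call and run the same prompts on each target machine so the numbers are comparable.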
