Local TTS model

Moshi

Q: Can Moshi run locally?

Moshi is listed by LocalClaw as a local TTS option. Hardware fit depends on runtime, model size and backend support.

Full-duplex spoken dialogue model - listens and speaks simultaneously with ~160 ms latency. Not just a TTS but a real-time conversational speech model. Runs on a single L4 GPU or Mac M3 Pro.

GPU recommended text-to-speech generation 1 languages CC-BY-4.0

Compare TTS models Open source page

Quality

9/10

Speed

9.5/10

Model size

7.5 GB

Voices

Full-duplex conversational

Can Moshi run locally?

Moshi can generate speech locally for private voice workflows. Start with pip install moshi.

CC-BY-4.0 license. Review upstream restrictions before commercial use.

pip install moshi Upstream source

dialoguestreamingrealtimelow-latencyemotion

Audio profile

Quality

Speed

9.5

Local

8.7

Best fit

Moshi is best for fast on-device voice responses and local assistants.

Hardware: gpuapple

Model details

Type

Local TTS model

Family

moshi

Latency

ultra-low

Formats

pytorchsafetensorsmlx

Languages

Context

Listens + speaks simultaneously, 160 ms latency

Install locally

Check runtimeConfirm the backend supports pytorch, safetensors, mlx on your machine.

Install modelUse the upstream command or repository instructions.

Test locallyRun a short private audio prompt before moving into production workflows.

pip install moshi

Good for

text-to-speech generation
GPU recommended local workflows
dialogue, streaming, realtime

Watch before shipping

Validate pronunciation, latency and artifacts with your own voice samples.
Review the upstream license and acceptable-use notes.
Benchmark on your target CPU, Apple Silicon or GPU setup.

Related TTS and speech models

CompareBrowse all TTS models Local AIBrowse LLM models macOS appGet LocalClaw