Local TTS model

VoxCPM2

Q: Can VoxCPM2 run locally?

VoxCPM2 is listed by LocalClaw as a local TTS option. Hardware fit depends on runtime, model size and backend support.

VoxCPM2 is a tokenizer-free diffusion autoregressive TTS model with 2B parameters, 30 supported languages, 48 kHz output, voice design, controllable cloning and real-time streaming. It is released under Apache 2.0, which permits commercial use.

Tags: GPU recommended, text-to-speech generation, 30 languages, Apache 2.0


Quality: 9.4/10
Speed: 8.3/10
Model size: 4.2 GB
Voices: Voice design + short-clip controllable cloning

Can VoxCPM2 run locally?

VoxCPM2 can generate speech locally for private voice workflows. Start with pip install voxcpm.

Apache 2.0 license. Still verify upstream usage notes before shipping.


Tags: cloning, streaming, realtime, multilingual, controllable, emotion

Audio profile

Quality: 9.4
Speed: 8.3
Local: 8.8

Best fit

VoxCPM2 is best for local voice cloning and expressive speech generation.

Hardware: GPU, Apple Silicon

Model details

Type: Local TTS model
Family: voxcpm
Latency: low
Formats: pytorch, safetensors
Languages: zh, en, ar, my, da, nl, fi, fr, de, el, he, hi, id, it, ja, km, ko, lo, ms, no, pl, pt, ru, es, sw, sv, tl, th, tr, vi
Context: 2B params, 30 languages, 48 kHz output, streaming RTF as low as ~0.3 on RTX 4090
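
The RTF figure above is a real-time factor: synthesis time divided by the duration of the audio produced, so values below 1.0 mean speech is generated faster than it plays back. A minimal sketch of that arithmetic, assuming 48 kHz output; the function name `real_time_factor` is illustrative and not part of any VoxCPM2 API:

```python
def real_time_factor(synthesis_seconds: float, num_samples: int,
                     sample_rate: int = 48_000) -> float:
    """Return synthesis time divided by the duration of the generated audio."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


# Example: 10 s of 48 kHz audio (480,000 samples) generated in 3 s of wall-clock time.
rtf = real_time_factor(3.0, 480_000)
print(f"RTF = {rtf:.2f}")  # RTF = 0.30
```

At an RTF of about 0.3, one second of audio takes roughly 0.3 seconds to generate, which leaves headroom for streaming playback.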

Install locally

Check runtime: Confirm the backend supports PyTorch and safetensors on your machine.

Install model: Use the upstream command or repository instructions.

Test locally: Run a short private audio prompt before moving into production workflows.

pip install voxcpm
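
The "check runtime" step can be sketched as a short script that reports which PyTorch device is available and whether the relevant packages can be found. This is a hedged sketch rather than an official check: the helper names are illustrative, and it assumes the `voxcpm` pip package installs an importable module of the same name.

```python
import importlib.util
from typing import Optional


def has_package(name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


def detect_torch_device() -> Optional[str]:
    """Return "cuda", "mps" or "cpu", or None when PyTorch is not installed."""
    if not has_package("torch"):
        return None
    import torch

    if torch.cuda.is_available():
        return "cuda"
    # The MPS backend (Apple Silicon) is absent from older PyTorch builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


report = {
    "torch_device": detect_torch_device(),
    "safetensors": has_package("safetensors"),
    # Assumption: the module name matches the pip package name.
    "voxcpm": has_package("voxcpm"),
}
print(report)
```

A result of `None` for `torch_device` means PyTorch still needs to be installed; `"cpu"` means it is installed but no GPU or Apple Silicon backend was detected.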

Good for

text-to-speech generation
GPU-recommended local workflows
cloning, streaming, realtime

Watch before shipping

Validate pronunciation, latency and artifacts with your own voice samples.
Review the upstream license and acceptable-use notes.
Benchmark on your target CPU, Apple Silicon or GPU setup.
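
One way to start on the artifact check is to inspect generated WAV files for sample rate, duration, peak level and clipping before listening to them. The sketch below uses only the Python standard library and assumes the output was saved as 16-bit PCM WAV; `wav_stats` and `write_test_tone` are illustrative helpers, and the test tone stands in for real model output.

```python
import array
import math
import os
import sys
import tempfile
import wave


def wav_stats(path: str) -> dict:
    """Summarise a 16-bit PCM WAV file: rate, duration, peak and clipping."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wf.getframerate()
        channels = wf.getnchannels()
        frames = wf.getnframes()
        samples = array.array("h", wf.readframes(frames))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    peak = max((abs(s) for s in samples), default=0) / 32768.0
    clipped = sum(1 for s in samples if s in (32767, -32768))
    return {
        "sample_rate": rate,
        "channels": channels,
        "duration_s": frames / rate,
        "peak": peak,
        "clipped_fraction": clipped / len(samples) if samples else 0.0,
    }


def write_test_tone(path: str, seconds: float = 0.5, rate: int = 48_000,
                    amplitude: float = 0.5) -> None:
    """Write a mono 440 Hz sine tone as a stand-in for generated speech."""
    n = int(seconds * rate)
    tone = array.array("h", (
        int(amplitude * 32767 * math.sin(2 * math.pi * 440 * i / rate))
        for i in range(n)
    ))
    if sys.byteorder == "big":
        tone.byteswap()
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(tone.tobytes())


with tempfile.TemporaryDirectory() as tmp:
    wav_path = os.path.join(tmp, "tone.wav")
    write_test_tone(wav_path)
    stats = wav_stats(wav_path)
print(stats)
```

A nonzero `clipped_fraction` or an unexpected `sample_rate` is worth investigating first; pronunciation and subtler artifacts still need listening tests with your own voice samples.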
