Text-to-Speech Open Weights
Speech Arena Leaderboard
The text-to-speech models below are compared using third-party data from the Artificial Analysis leaderboard rankings (as of April 2026), production reliability metrics, and deployment flexibility, covering latency benchmarks, language coverage, and integration options.
Inworld AI TTS-1.5 Max ranks #1 overall with an ELO score of 1,208, based on thousands of blind user-preference comparisons, and delivers sub-250ms P90 latency. The open-weights rankings are below.
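P90 latency is the value that 90% of requests come in at or under, so a sub-250ms P90 still allows the slowest 10% of requests to take longer. A minimal Python sketch of checking measurements against such a target, using the nearest-rank percentile method (one of several common percentile definitions; vendors may compute P90 differently). The latency samples and helper names are hypothetical:

```python
import math


def percentile_nearest_rank(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_p90_target(samples_ms: list[float], target_ms: float = 250.0) -> bool:
    """True if the P90 of the measured latencies is within the target."""
    return percentile_nearest_rank(samples_ms, 90) <= target_ms


# Hypothetical per-request latency measurements in milliseconds.
latencies_ms = [120, 150, 160, 170, 180, 190, 200, 210, 230, 480]
print(percentile_nearest_rank(latencies_ms, 90))
print(meets_p90_target(latencies_ms))
```

Here the P90 is 230 ms, inside a 250 ms target, even though one request took 480 ms; that tail is exactly what a P90 figure does not describe.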
| Rank | Creator | Model | ELO | API Pricing |
|---|---|---|---|---|
| #1 | Fish Audio | Fish Audio S2 Pro | 1,124 | $15 /1M chars |
| #2 | StepFun | Step Audio EditX | 1,098 | N/A |
| #3 | NVIDIA | Magpie-Multilingual 357M | 1,063 | N/A |
| #4 | Kokoro | Kokoro 82M v1.0 | 1,055 | $0.7 /1M chars |
| #5 | Mistral | Voxtral TTS | 1,053 | $16 /1M chars |
| #6 | Maya Research | Maya1 | 1,049 | N/A |
| #7 | Fish Audio | Fish Audio 1.5 | 1,012 | $15 /1M chars |
| #8 | Resemble AI | Chatterbox | 1,006 | $25 /1M chars |
| #9 | Zyphra | Zonos-v0.1 | 1,000 | $20 /1M chars |
| #10 | Microsoft | VibeVoice 7B | 957 | N/A |
| #11 | OpenVoice | OpenVoice v2 | 948 | $8.3 /1M chars |
| #12 | Coqui | XTTS v2 | 885 | $40.4 /1M chars |
| #13 | StyleTTS | StyleTTS 2 | 877 | $2.8 /1M chars |
| #14 | MetaVoice | MetaVoice v1 | 764 | N/A |
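ELO gaps in the table translate into head-to-head preference rates. Assuming the standard Elo logistic model with a 400-point scale (Artificial Analysis may fit its ratings differently), the sketch below estimates how often listeners would prefer one model over another, plus a textbook single-comparison update with an illustrative K-factor of 32. Function names and K are assumptions, not the leaderboard's actual method:

```python
def expected_preference(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that listeners prefer A over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a: float, rating_b: float, a_preferred: bool,
               k: float = 32.0) -> tuple[float, float]:
    """Apply one blind A/B comparison; K=32 is an illustrative choice."""
    expected_a = expected_preference(rating_a, rating_b)
    score_a = 1.0 if a_preferred else 0.0
    delta = k * (score_a - expected_a)
    # Zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Ratings from the table: Fish Audio S2 Pro (1,124) vs Kokoro 82M v1.0 (1,055).
p = expected_preference(1124, 1055)
print(f"Expected preference for Fish Audio S2 Pro: {p:.1%}")
print(elo_update(1124, 1055, a_preferred=False))
```

Under these assumptions, a 69-point gap corresponds to Fish Audio S2 Pro being preferred in roughly 60% of blind comparisons, so even the spread between #1 and #4 is far from a rout.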
Sub-200ms latency is now achievable with modern neural architectures, and zero-shot voice cloning from 3 to 15 seconds of audio has become a standard feature rather than a premium one.
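Since zero-shot cloning works from a short reference clip, a pre-flight length check can catch unusable clips before they are sent anywhere. A minimal sketch using Python's standard-library `wave` module, assuming uncompressed PCM WAV input and using the 3 to 15 second window above as bounds (actual limits vary by provider; all helper names are hypothetical):

```python
import io
import wave

# Bounds taken from the 3-15 second range above; real providers may differ.
MIN_SECONDS = 3.0
MAX_SECONDS = 15.0


def clip_duration_seconds(wav_file) -> float:
    """Duration of a PCM WAV file (path or file-like object) in seconds."""
    with wave.open(wav_file, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def is_usable_reference(wav_file, min_s: float = MIN_SECONDS,
                        max_s: float = MAX_SECONDS) -> bool:
    """Hypothetical pre-flight check before submitting a clip for cloning."""
    return min_s <= clip_duration_seconds(wav_file) <= max_s


def make_silent_wav(seconds: float, rate: int = 16_000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, for demonstration."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)  # rewind so the clip can be read back
    return buf


for seconds in (2, 5, 20):
    print(seconds, is_usable_reference(make_silent_wav(seconds)))
```

Only the 5-second clip passes; a length check says nothing about audio quality, so noisy or silent clips still need separate handling.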
Top 27 TTS AI Models by ELO
4. Kokoro 82M
Quick Overview
Kokoro is the open-source pick. At 82 million parameters, it runs on mid-tier CPUs without a GPU and scores an ELO of 1,060 on Artificial Analysis (#16, ahead of OpenAI's TTS-1 HD). The tradeoffs: you host and maintain it yourself, there is no managed API, and language and voice selection are limited. It suits prototyping or cost-constrained teams with DevOps capacity.
Best For
Budget-constrained teams comfortable with self-hosting who want decent quality at minimal cost, or developers who need full control over the model for custom fine-tuning and edge deployment.
Pros
- Open-source under Apache 2.0 license
- ~$0.70/1M characters (self-hosted compute cost), making it the cheapest option by far
- 82M parameters runs on mid-tier CPUs with no GPU requirement
- Outranks OpenAI TTS-1 HD on Artificial Analysis despite costing roughly 40x less (TTS-1 HD is priced at $30/1M characters)
Cons
- Self-hosted only with no managed API or enterprise support
- Only six language options currently (English, British English, French, Korean, Japanese, and Mandarin)
- Lower overall quality than commercial options in the top 10
Pricing
~$0.70/1M characters based on self-hosted compute costs. No subscription or API fees.
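Per-character pricing makes workload costs easy to estimate. A sketch using the per-1M-character prices from the leaderboard table and Kokoro's ~$0.70 self-hosted estimate; the 20M-characters-per-month workload is hypothetical, and the Kokoro figure covers compute only, not the hosting and maintenance effort noted above:

```python
# Prices in USD per 1M characters, from the leaderboard table above.
PRICE_PER_MILLION_CHARS = {
    "Kokoro 82M v1.0 (self-hosted compute)": 0.70,
    "Fish Audio S2 Pro": 15.0,
    "Mistral Voxtral TTS": 16.0,
    "Resemble AI Chatterbox": 25.0,
}


def monthly_cost(chars_per_month: int, price_per_million: float) -> float:
    """USD cost of synthesizing chars_per_month characters at a per-1M price."""
    return chars_per_month / 1_000_000 * price_per_million


# Hypothetical workload: 20M characters of synthesized speech per month.
workload = 20_000_000
for model, price in sorted(PRICE_PER_MILLION_CHARS.items(), key=lambda kv: kv[1]):
    print(f"{model}: ${monthly_cost(workload, price):,.2f}/month")
```

At this volume the comparison is about $14 per month for Kokoro versus $300 to $500 for the managed options listed; whether that gap justifies self-hosting depends on the DevOps capacity discussed above.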