Cartesia Sonic is built around one thesis: latency is a feature. Sub-90ms time-to-first-audio makes it the only TTS that genuinely feels real-time, which is why voice-agent vendors keep landing on it.
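To check a latency figure like the one above against your own setup, it helps to time the gap between issuing a request and receiving the first audio chunk. The sketch below is a minimal, vendor-neutral harness: `time_to_first_chunk` and `fake_tts_stream` are hypothetical names introduced here for illustration, and the fake stream stands in for whatever streaming client you actually use. It does not call Cartesia's API.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def time_to_first_chunk(
    open_stream: Callable[[], Iterable[bytes]],
) -> Tuple[float, int]:
    """Return (seconds until the first non-empty chunk, total bytes received).

    `open_stream` is a zero-argument callable that issues the request and
    returns an iterable of audio chunks. The timer starts before it is called,
    so connection and request setup count toward the measured latency.
    """
    start = time.perf_counter()
    first_chunk_s = None
    total_bytes = 0
    for chunk in open_stream():
        if chunk and first_chunk_s is None:
            first_chunk_s = time.perf_counter() - start
        total_bytes += len(chunk)
    if first_chunk_s is None:
        raise ValueError("stream produced no audio")
    return first_chunk_s, total_bytes


def fake_tts_stream(
    first_delay_s: float = 0.05, chunks: int = 3, chunk_size: int = 320
) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS client (assumption, not a real API)."""
    time.sleep(first_delay_s)  # simulated network + model latency
    for _ in range(chunks):
        yield b"\x00" * chunk_size  # silent placeholder audio


if __name__ == "__main__":
    latency_s, size = time_to_first_chunk(lambda: fake_tts_stream())
    print(f"first chunk after {latency_s * 1000:.0f} ms, {size} bytes total")
```

Swap the lambda for a function that opens your real stream. Measured numbers will include your own network round trip, so expect them to sit above any vendor-quoted figure.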
For pre-rendered narration the speed advantage is invisible, but the per-character pricing ($0.0065) is competitive even ignoring the latency story.
Quality is high, just a notch below ElevenLabs/Play.ht on expressive range. Where Cartesia stands out is consistency: its run-to-run variance is the lowest in the category.
Verdict: the model to choose for voice agents and real-time UX. 8.7/10.