| Title: |
Voxtral TTS |
| Authors: |
Mistral-AI; Liu, Alexander H.; Tacnet, Alexis; Ehrenberg, Andy; Lo, Andy; Sun, Chen-Yo; Lample, Guillaume; Lagarde, Henry; Delignon, Jean-Malo; Kim, Jaeyoung; Harvill, John; Chandu, Khyathi Raghavi; Signoretti, Lorenzo; Jennings, Margaret; von Platen, Patrick; Muddireddy, Pavankumar Reddy; Arora, Rohin; Gandhi, Sanchit; Humeau, Samuel; Ghosh, Soham; Mishra, Srijan; Phung, Van; Bounhar, Abdelaziz; Rastogi, Abhinav; Sadé, Adrien; Jeffares, Alan; Jiang, Albert; Cahill, Alexandre; Gavaudan, Alexandre; Sablayrolles, Alexandre; Héliou, Amélie; You, Amos; Bai, Andrew; Zhao, Andrew; Lenglemetz, Angele; Agarwal, Anmol; Eliseev, Anton; Calvi, Antonia; Majumdar, Arjun; Fournier, Arthur; Joosen, Artjom; Sooriyarachchi, Avi; Utkur, Aysenur Karaduman; Bout, Baptiste; Rozière, Baptiste; De Monicault, Baudouin; Tibi, Benjamin; Yang, Bowen; Cronjäger, Charlotte; Lanfranchi, Clémence; Chen, Connor; Barreau, Corentin; Sautier, Corentin; Courtot, Cyprien; Dabert, Darius; Casas, Diego de las; Demyanenko, Elizaveta; Chane-Sane, Elliot; Gottlob, Emmanuel; Paquin, Enguerrand; Goffinet, Etienne; Niel, Fabien; Ahmed, Faruk; Baldassarre, Federico; Berrada, Gabrielle; Ecrepont, Gaëtan; Guinet, Gauthier; Hayes, Genevieve; Novikov, Georgii; Pistilli, Giada; Kunsch, Guillaume; Martin, Guillaume; Raille, Guillaume; Dhanuka, Gunjan; Gupta, Gunshi; Zhou, Han; Shah, Harshil; McGovern, Hope; Thimonier, Hugo; Mukherjee, Indraneel; Zhang, Irene; Sun, Jacques; Ludziejewski, Jan; Rute, Jason; Dentan, Jérémie; Studnia, Joachim; Amar, Jonas; Delas, Joséphine; Roberts, Josselin Somerville; Tauran, Julien; Yadav, Karmesh; Khandelwal, Kartik; Tep, Kilian; Jain, Kush; Aitchison, Laurence; Fainsin, Laurent; Blier, Léonard; Zhao, Lingxiao; Martin, Louis; Saulnier, Lucile; Gao, Luyu; Buyl, Maarten; Sharma, Manan; Pellat, Marie; Prins, Mark; Alexandre, Martin; Poirée, Mathieu; Schmitt, Mathieu; Guillaumin, Mathilde; Dinot, Matthieu; Futeral, Matthieu; Darrin, Maxime; Augustin, Maximilian; Unsal, Mert; Chiquier, 
Mia; Biriuchinskii, Mikhail; Pham, Minh-Quang; Lica, Mircea; Rivière, Morgane; Grinsztajn, Nathan; Gupta, Neha; Bousquet, Olivier; Duchenne, Olivier; Wang, Patricia; Jacob, Paul; Wambergue, Paul; Kurylowicz, Paula; Pinel, Philippe; Chagniot, Philomène; Stock, Pierre; Miłoś, Piotr; Gupta, Prateek; Agrawal, Pravesh; Torroba, Quentin; Ramrakhya, Ram; Isenhour, Randall; Shah, Rishi; Sauvestre, Romain; Soletskyi, Roman; Millner, Rosalie; Menneer, Rupert; Vaze, Sagar; Barry, Samuel; Belkadi, Samuel; Subramanian, Sandeep; Cha, Sean; Verma, Shashwat; Waghjale, Siddhant; Gandhi, Siddharth; Lepage, Simon; Aithal, Sumukh; Antoniak, Szymon; Vangani, Tarun Kumar; Scao, Teven Le; Cachet, Théo; Sorg, Theo Simon; Lavril, Thibaut; Chabal, Thomas; Foubert, Thomas; Robert, Thomas; Wang, Thomas; Lawson, Tim; Bewley, Tom; Edwards, Tom; Wang, Tyler; Jamil, Umar; Tomasini, Umberto; Nemychnikova, Valeriia; Nanda, Vedant; Jouault, Victor; Maladière, Vincent; Pfister, Vincent; Richard, Virgile; Bataev, Vladislav; Bouaziz, Wassim; Li, Wen-Ding; Havard, William; Marshall, William; Li, Xinghui; Guo, Xingran; Yang, Xinyu; Neuhaus, Yannic; Ouahidi, Yassine El; Bendou, Yassir; Wang, Yihan; Pan, Yimu; Ramzi, Zaccharie; Xu, Zhenlin |
| Publication Year: |
2026 |
| Collection: |
ArXiv.org (Cornell University Library) |
| Subject Terms: |
Artificial Intelligence |
| Description: |
We introduce Voxtral TTS, an expressive multilingual text-to-speech model that generates natural speech from as little as 3 seconds of reference audio. Voxtral TTS adopts a hybrid architecture that combines auto-regressive generation of semantic speech tokens with flow-matching for acoustic tokens. These tokens are encoded and decoded with Voxtral Codec, a speech tokenizer trained from scratch with a hybrid VQ-FSQ quantization scheme. In human evaluations conducted by native speakers, Voxtral TTS is preferred for multilingual voice cloning due to its naturalness and expressivity, achieving a 68.4% win rate over ElevenLabs Flash v2.5. We release the model weights under a CC BY-NC license. |
| Document Type: |
text |
| Language: |
unknown |
| Relation: |
http://arxiv.org/abs/2603.25551 |
| Availability: |
http://arxiv.org/abs/2603.25551 |
| Accession Number: |
edsbas.D413CB87 |
| Database: |
BASE |