Mistral has introduced Voxtral TTS, a new open-source text-to-speech model designed for enterprise voice applications, positioning the company in direct competition with ElevenLabs, Deepgram, and OpenAI. The model supports nine languages (English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic) and is built for deployment on edge devices such as smartphones, laptops, and wearables.
Pierre Stock, Vice President of Science Operations at Mistral, said the model was developed in response to enterprise demand for efficient, high-performance speech systems. Voxtral TTS can customize a voice quickly from minimal audio input while preserving accent, tone, and speech nuances, and it can switch between languages without losing voice consistency.
Designed for real-time use, the model generates audio quickly and with low latency. The launch advances Mistral’s broader strategy to develop a full multimodal AI platform spanning audio, text, and image processing.