Voice Selection Guide [TTS]

Overview

The UNITH interFace supports multiple text-to-speech (TTS) providers to give your Digital Human a natural, engaging voice. You can select voices from ElevenLabs or Microsoft Azure directly through the interface, or integrate custom voice providers using our connector framework.

For the voice connectors we currently support, see our Voice connectors documentation.

Need a different voice provider? You have full flexibility to create custom voice connectors. Please check out the following repository.

How Voice Selection Impacts Performance

A Digital Human response requires audio generation before video synthesis can begin, so audio generation speed directly affects the overall response time of your Digital Human.

Response Pipeline:

1. User query processed
2. Audio generated ← Voice model speed matters here
3. Video synthesized from audio
4. Complete response delivered

Faster audio generation means quicker responses and a more natural, engaging user experience.
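
To illustrate why the audio stage matters, the sketch below models the pipeline as a simple sum of stage durations. It assumes the stages run strictly one after another with no streaming overlap. The `StageTimings` class and all the numbers are hypothetical placeholders chosen for illustration, not measurements of any UNITH deployment.

```python
from dataclasses import dataclass


@dataclass
class StageTimings:
    """Hypothetical per-stage durations, in seconds, for one response."""
    query_processing: float
    audio_generation: float
    video_synthesis: float

    def total(self) -> float:
        # Assumption: stages are sequential, so their durations add up.
        return self.query_processing + self.audio_generation + self.video_synthesis


# Placeholder values for illustration only; only the audio stage differs.
slow_voice = StageTimings(query_processing=0.5, audio_generation=2.0, video_synthesis=1.0)
fast_voice = StageTimings(query_processing=0.5, audio_generation=0.5, video_synthesis=1.0)

saved = slow_voice.total() - fast_voice.total()
print(f"Seconds saved by the faster voice: {saved:.1f}")
```

Under this simplified model, any time saved in audio generation is removed directly from the total response time.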

Recommended Voices by Provider

ElevenLabs

ElevenLabs offers a wide variety of voices powered by different models, each optimized for specific use cases. For Digital Human applications, we recommend using voices powered by their speed-optimized models.

Recommended Models

Model	Characteristics	Use Case
`flash_v2`	Fastest generation, balanced quality	Real-time conversations
`flash_v2_5`	Enhanced flash model	Real-time conversations with improved quality
`turbo_v2`	High-speed generation	Low-latency interactions
`turbo_v2_5`	Latest turbo generation	Optimal balance of speed and quality

Best Practice: Select ElevenLabs voices that use the `flash_v2`, `flash_v2_5`, `turbo_v2`, or `turbo_v2_5` models for the fastest Digital Human response times.

Important Notes

All ElevenLabs models will function correctly with Digital Humans
Non-optimized models may result in longer response delays
Speed-optimized models are specifically designed for real-time conversational applications
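
If you keep your voice configuration in code, a small check like the following can flag voices that are not on a speed-optimized model. This is a minimal sketch: the `is_speed_optimized` helper is hypothetical, and the handling of an `eleven_` prefix is an assumption about how model identifiers may appear in your own configuration.

```python
# Model names recommended in the table above.
RECOMMENDED_MODELS = {"flash_v2", "flash_v2_5", "turbo_v2", "turbo_v2_5"}


def is_speed_optimized(model_name: str) -> bool:
    """Return True if the model name matches one of the recommended models."""
    name = model_name.strip().lower()
    # Assumption: identifiers may carry an "eleven_" prefix; strip it if present.
    prefix = "eleven_"
    if name.startswith(prefix):
        name = name[len(prefix):]
    return name in RECOMMENDED_MODELS


for candidate in ["flash_v2_5", "eleven_turbo_v2", "eleven_multilingual_v2"]:
    label = "speed-optimized" if is_speed_optimized(candidate) else "may respond more slowly"
    print(f"{candidate}: {label}")
```

A model that fails this check will still work with a Digital Human. It may only add response delay.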

For a complete list of available voices and their associated models, visit the ElevenLabs Voice Library.

Microsoft Azure

Microsoft Azure offers an extensive voice catalog across multiple performance tiers. For optimal Digital Human performance, we recommend selecting voices from their speed-optimized tiers.

Recommended Voice Types

Select voices that include one of these identifiers in their name:

Voice Type	Identifier	Language Support	Performance
Turbo Multilingual	TurboMultilingual	40+ languages	Fastest generation across multiple languages
HD Flash	HDFlash	English (US), Chinese (Mandarin)	Very fast with high-definition quality

Voices to Avoid

Avoid voices containing HDNeural in their name, as these prioritize audio quality over generation speed and will result in longer response times.
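
The naming rules above can be applied as a simple substring check. The sketch below is illustrative only: `classify_azure_voice` is a hypothetical helper, and the voice names in the example are made up to show the pattern rather than taken from the Azure catalog.

```python
# Identifiers taken from the guidance above.
RECOMMENDED_IDENTIFIERS = ("TurboMultilingual", "HDFlash")
AVOID_IDENTIFIERS = ("HDNeural",)


def classify_azure_voice(voice_name: str) -> str:
    """Classify a voice name as 'recommended', 'avoid', or 'unclassified'."""
    if any(identifier in voice_name for identifier in RECOMMENDED_IDENTIFIERS):
        return "recommended"
    if any(identifier in voice_name for identifier in AVOID_IDENTIFIERS):
        return "avoid"
    # Names matching neither rule need to be tested individually.
    return "unclassified"


# Hypothetical voice names, used only to demonstrate the matching.
examples = [
    "en-US-ExampleTurboMultilingualNeural",
    "en-US-ExampleHDFlash",
    "en-US-ExampleHDNeural",
    "en-US-ExampleNeural",
]
for name in examples:
    print(f"{name}: {classify_azure_voice(name)}")
```

The check is case-sensitive on purpose, since the identifiers are described as appearing literally in the voice name.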

Azure Voice Performance Tiers

The table below provides an overview of Microsoft Azure's voice catalog organized by performance characteristics:

Performance Tier	Language Coverage	Available Voices	Recommended
Turbo	English (US) only	7	✅ Yes - Fastest option
HD Flash	English (US), Mandarin Chinese	10	✅ Yes - Fast with HD quality
Multilingual	40+ languages	52	✅ Yes - Best for multilingual applications
HD Neural	Limited (10-15 languages)	54	⚠️ Not recommended - Slower generation
Standard Neural	150+ locales	500+	⚠️ Mixed performance

For multilingual Digital Humans, prioritize voices with TurboMultilingual in their name to maintain fast response times across all supported languages.

For the complete Azure voice catalog and detailed specifications, visit the Microsoft Azure TTS Documentation.

Voice Selection Best Practices

Prioritize Speed-Optimized Models: Choose voices specifically designed for low-latency applications
Test Before Deploying: Always test selected voices with your Digital Human to ensure they meet your quality and performance requirements
Consider Your Audience: Balance response speed with voice quality based on your use case
Language Requirements: If you need multilingual support, select voices that cover all required languages while maintaining performance
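
One way to approach the "Test Before Deploying" step is to time audio generation for each candidate voice over a few runs and compare the results. The sketch below is a generic timing harness, not part of any UNITH or provider SDK: `measure_latency` is a hypothetical helper, and `fake_synthesize` is a stand-in that you would replace with your own call to the TTS provider.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(synthesize: Callable[[str], bytes], text: str, runs: int = 5) -> Dict[str, float]:
    """Time several calls to a synthesis function and summarize the durations."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        synthesize(text)
        durations.append(time.perf_counter() - start)
    return {"median_s": statistics.median(durations), "max_s": max(durations)}


def fake_synthesize(text: str) -> bytes:
    """Stand-in for a real TTS request; sleeps briefly and returns dummy bytes."""
    time.sleep(0.01)
    return text.encode("utf-8")


result = measure_latency(fake_synthesize, "Hello, how can I help you today?", runs=3)
print(f"median: {result['median_s']:.3f}s, max: {result['max_s']:.3f}s")
```

Using the same sample text for every candidate voice, ideally a sentence typical of your Digital Human's replies, keeps the comparison fair.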

Last updated Mar 6, 2026