Voice Connectors
This documentation provides an overview of our Text-to-Speech (TTS) service and how you can manage TTS voices for your Digital Humans using our Voice Connectors.
For the list of connectors we currently support, please check our Voice Connectors documentation.
Need a different voice provider? You have full flexibility to create custom voice connectors. Please check out the following repository.
Text-to-Speech (TTS) Overview
Text-to-Speech (TTS) is a service that converts written text into synthesized human-like speech. Your Digital Human must have a TTS voice configured to function correctly. Our platform's Voice Connectors are designed to integrate with a variety of external TTS providers, offering you a wide selection of voice options. We are continuously adding support for new providers.
If you're interested in creating a custom Voice Connector, please use the following template as a starting point. Once you've completed your connector, please get in touch with us for final review and approval.
Supported TTS Providers
The platform relies on a number of third-party TTS providers.
Currently supported providers include:
- ElevenLabs
- Microsoft Azure
Each provider has its own set of voices, and a voice ID from one provider cannot be used with another. When creating or updating a Digital Human head, you need to configure both the ttsProvider and ttsVoice parameters. These values must correspond to a valid provider name and a voice ID that belongs to that provider. Use the /voice/all endpoint to look up valid provider and voice ID values.
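The pairing rule above can be checked before a Digital Human head is created or updated. The following is a minimal sketch, assuming the /voice/all response has already been parsed into a list of objects carrying the ttsProvider and ttsVoice fields. The sample voice IDs and the is_valid_voice helper are hypothetical and only illustrate the check.

```python
from typing import Iterable, Mapping


def is_valid_voice(voices: Iterable[Mapping[str, str]],
                   tts_provider: str,
                   tts_voice: str) -> bool:
    """Return True if the provider/voice pair appears in the voice list."""
    return any(
        v.get("ttsProvider") == tts_provider and v.get("ttsVoice") == tts_voice
        for v in voices
    )


# Hypothetical sample data; real IDs come from the /voice/all endpoint.
sample_voices = [
    {"ttsProvider": "azure", "ttsVoice": "example-azure-voice"},
    {"ttsProvider": "elevenlabs", "ttsVoice": "example-elevenlabs-voice"},
]

# A voice ID is only valid together with its own provider.
print(is_valid_voice(sample_voices, "azure", "example-azure-voice"))
print(is_valid_voice(sample_voices, "elevenlabs", "example-azure-voice"))
```

The first call matches an entry in the sample list, while the second pairs an Azure voice ID with a different provider and is therefore rejected.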
Regional Voice Compatibility
Voices available in one region may not be present in others. As a result, a Digital Human created in one region (e.g., the US) may not be fully functional in another (e.g., the EU or Australia).
To ensure that the same Digital Human is fully operational in any region, you can use our common voices file. The provided JSON file contains a list of voices that are guaranteed to be available and fully supported across all regions. We recommend using these voices for maximum compatibility and reliability.
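One way to use the common voices file is to check a configured voice against it before deploying to several regions. The sketch below assumes the file is a JSON array of objects with ttsProvider and ttsVoice fields. That schema is an assumption, as are the sample entries and helper names, so adapt the field names to the actual file.

```python
import json

# Hypothetical excerpt standing in for the common voices file.
# The real file's schema may differ from this assumed shape.
COMMON_VOICES_JSON = """
[
  {"ttsProvider": "azure", "ttsVoice": "example-common-voice"}
]
"""


def load_common_voices(raw_json: str) -> set:
    """Parse the file content into a set of (provider, voice) pairs."""
    return {(e["ttsProvider"], e["ttsVoice"]) for e in json.loads(raw_json)}


def is_region_safe(common: set, tts_provider: str, tts_voice: str) -> bool:
    """Return True if the pair is listed among the common voices."""
    return (tts_provider, tts_voice) in common


common = load_common_voices(COMMON_VOICES_JSON)
print(is_region_safe(common, "azure", "example-common-voice"))
print(is_region_safe(common, "azure", "example-regional-voice"))
```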
Voice Management Endpoints
You can retrieve the list of voices and their corresponding parameters using the following API endpoints.
1. Retrieving the Full List of Voices
Use the /voice/all endpoint to obtain the complete list of supported voices, including their respective providers and IDs.
Endpoint: /voice/all
Method: GET
curl -X 'GET' \
'https://platform-api.unith.ai/voice/all?provider=azure' \
-H 'accept: application/json' \
-H 'Authorization: Bearer yourBearerToken'

The response data contains the full configuration for each voice, which includes the following controllable parameters:
| Parameter | Description | Value |
|---|---|---|
| ttsProvider | The specific TTS provider name. | elevenlabs, azure, audiostack |
| ttsVoice | The unique voice identifier (Voice ID). | N/A |
| language | The human language name. | N/A |
| languageCode | The standard language code (e.g., en-US). | N/A |
| accent | The voice's regional accent. | N/A |
| gender | The intended gender characteristic of the voice. | MALE, FEMALE, NEUTRAL, UNKNOWN, CHARACTER |
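The same request can be issued from Python and the result narrowed by the parameters in the table. This is a sketch using only the standard library. It assumes the provider query parameter is optional and that the response body is a JSON array of voice objects. Both are assumptions, and the helper names and sample records are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://platform-api.unith.ai"


def build_voice_list_request(token, provider=None):
    """Build (but do not send) the GET request for /voice/all."""
    url = f"{BASE_URL}/voice/all"
    if provider:
        url += "?" + urllib.parse.urlencode({"provider": provider})
    headers = {"accept": "application/json", "Authorization": f"Bearer {token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_voices(token, provider=None):
    """Send the request; assumes the body is a JSON array of voices."""
    with urllib.request.urlopen(build_voice_list_request(token, provider)) as resp:
        return json.load(resp)


def filter_voices(voices, language_code=None, gender=None):
    """Keep voices matching the given languageCode and/or gender."""
    return [
        v for v in voices
        if (language_code is None or v.get("languageCode") == language_code)
        and (gender is None or v.get("gender") == gender)
    ]


# Hypothetical records shaped like the parameter table above.
sample = [
    {"ttsVoice": "example-voice-a", "languageCode": "en-US", "gender": "FEMALE"},
    {"ttsVoice": "example-voice-b", "languageCode": "en-US", "gender": "MALE"},
    {"ttsVoice": "example-voice-c", "languageCode": "es-ES", "gender": "FEMALE"},
]
matches = filter_voices(sample, language_code="en-US", gender="FEMALE")
print([v["ttsVoice"] for v in matches])
```

In real use, the list passed to filter_voices would come from fetch_voices("yourBearerToken", provider="azure") rather than from the sample records.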
2. Previewing a Voice
You can instantly preview the audio output for any given voice using the preview endpoint.
Endpoint: /voice/preview
Method: GET
curl -X 'GET' \
'https://platform-api.unith.ai/voice/preview?voiceId=yourVoiceId&provider=azure&text=hello%20world' \
-H 'accept: application/json' \
-H 'Authorization: Bearer yourBearerToken'

| Parameter | Description |
|---|---|
| voiceId | The unique ID of the voice to preview. |
| provider | The corresponding TTS provider name. |
| text | The text to be converted to speech. |
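Because the text parameter travels in the query string, it has to be percent-encoded. The sketch below builds the preview URL from the three parameters above using Python's standard library. The build_preview_url helper is hypothetical, and the code makes no assumption about the format of the response.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://platform-api.unith.ai"


def build_preview_url(voice_id: str, provider: str, text: str) -> str:
    """Return the /voice/preview URL with percent-encoded parameters."""
    # quote_via=quote encodes spaces as %20 instead of '+'.
    query = urlencode(
        {"voiceId": voice_id, "provider": provider, "text": text},
        quote_via=quote,
    )
    return f"{BASE_URL}/voice/preview?{query}"


print(build_preview_url("yourVoiceId", "azure", "hello world"))
```

The resulting URL can be sent with the same accept and Authorization headers as in the curl command.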
For convenience, you can also select and preview voices directly within the platform's user interface (frontend) when creating or editing a Digital Human head.