Generating Videos from Text
Overview
This document describes how to use the "Text to Video" feature to generate videos from text using an existing Digital Human head.
The "Text to Video" feature allows you to create video content by providing text and selecting a voice. The system will then generate a video of the Digital Human speaking the provided text with the chosen voice.
Pre-requisites
- Valid API Access: You must have a valid UNITH API key and appropriate permissions.
- Existing Head ID: You need the unique identifier (Head ID) of the Digital Human you want to use for video generation. You can obtain this from your UNITH interface dashboard.

The "view modal" from the interFace dashboard is available for the "doc qa" and "oc" operation modes only. Regardless of the operation mode, the URL for accessing a Digital Human follows this structure: chat.unith.ai/orgID/headID?api_key. Therefore, even if a Digital Human is created with the "ttt" operation mode, its Head ID can be found within this URL.
Please refer to Create Your First Digital Human before generating a video of a talking Digital Human.
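As an illustration of the URL structure described above (chat.unith.ai/orgID/headID?api_key), the following minimal Python sketch pulls the Head ID out of such a URL. The function name `head_id_from_url` is hypothetical and not part of any UNITH SDK; the sketch only assumes that the path always has the form /orgID/headID.

```python
from urllib.parse import urlparse


def head_id_from_url(url: str) -> str:
    """Return the Head ID from a URL shaped like chat.unith.ai/orgID/headID?api_key."""
    # urlparse only recognizes the host when a scheme is present, so add one if missing.
    if "://" not in url:
        url = "https://" + url
    # Split the path into its non-empty segments: ["orgID", "headID"].
    segments = [segment for segment in urlparse(url).path.split("/") if segment]
    if len(segments) < 2:
        raise ValueError(f"expected /orgID/headID in the path, got {url!r}")
    # Assumption: the Head ID is always the second path segment.
    return segments[1]
```

For example, `head_id_from_url("chat.unith.ai/myOrg/myHead?api_key=abc")` returns `"myHead"`.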
Process
The process involves using two API endpoints:
- /head/text-to-video: To generate the video from text.
- /head/talks/{id}: To retrieve the generated video.
Generate a Video
All Digital Humans, regardless of their operation mode, can be used to generate a video (in mp4 format) from text and voice input.
To generate an mp4 video of a Digital Human speaking, send the following request to the POST /head/text-to-video endpoint:
1. Generate Video from Text
- Endpoint: /head/text-to-video
- Method: POST
- Description: Generates a video of the Digital Human speaking the provided text.
- Request Headers:
Accept: application/json
Authorization: Bearer <yourBearerToken> (replace <yourBearerToken> with your actual bearer token)
Content-Type: application/json
- Request body
{
"id": "yourHeadID", // (Required) The ID of the Digital Human head.
"text": "Hello World" // (Required) The text the Digital Human will speak.
}
- Curl example:
curl -X 'POST' \
'https://platform-api.unith.ai/head/text-to-video' \
-H 'accept: application/json' \
-H 'Authorization: Bearer <uniqueToken>' \
-H 'Content-Type: application/json' \
-d '{
"id": "<exampleheadid>",
"text": "Hello World"
}'
- Error Handling:
- The API will return standard HTTP error codes for invalid requests:
- 400 Bad Request: Indicates an issue with the request, such as an invalid Head ID or voice name.
- 401 Unauthorized: Indicates an invalid or expired bearer token.
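The request and error handling described above can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the function names `build_text_to_video_request` and `generate_video` are hypothetical, and because the format of a successful response body is not documented here, the sketch returns it as raw text.

```python
import json
import urllib.error
import urllib.request

API_BASE = "https://platform-api.unith.ai"


def build_text_to_video_request(head_id: str, text: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST /head/text-to-video request."""
    body = json.dumps({"id": head_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/head/text-to-video",
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def generate_video(head_id: str, text: str, token: str) -> str:
    """Send the request and return the raw response body as text."""
    request = build_text_to_video_request(head_id, text, token)
    try:
        with urllib.request.urlopen(request) as response:
            # Assumption: the body is UTF-8 text; its structure is not documented here.
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Map the two documented error codes to clearer Python exceptions.
        if err.code == 400:
            raise ValueError("400 Bad Request: check the Head ID and request body") from err
        if err.code == 401:
            raise PermissionError("401 Unauthorized: invalid or expired bearer token") from err
        raise
```

Keeping the request builder separate from the network call makes it possible to inspect the URL, headers, and body before anything is sent.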
2. Retrieve Generated Video
Once the above request is made, fetch the generated videos from the GET /head/talks/{id} endpoint.
- Endpoint: /head/talks/{id}, where {id} is the Head ID of the Digital Human.
- Method: GET
- Description: Retrieves a list of videos generated for a specific Digital Human head.
- Request Headers:
Accept: application/json
Authorization: Bearer <yourBearerToken> (replace <yourBearerToken> with your actual bearer token)
- URL Parameters:
order (string, optional): The order in which to return videos. Use "ASC" for ascending or "DESC" for descending.
page (integer, optional): The page number of the results to return.
take (integer, optional): The number of videos to return per page.
- Curl example:
curl -X 'GET' \
'https://platform-api.unith.ai/head/talks/exampleheadid?order=DESC&page=1&take=10' \
-H 'accept: application/json' \
-H 'Authorization: Bearer <yourBearerToken>'
Replace "exampleheadid" with your actual Head ID and <yourBearerToken> with your bearer token. You can also modify the order, page, and take parameters as needed.
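The same GET request can be assembled in Python. The sketch below is illustrative only: `build_talks_request`, `video_urls`, and `has_next_page` are hypothetical helper names, and the two response helpers assume the response fields documented for this endpoint (`data[].url` and `meta.hasNextPage`).

```python
import urllib.request
from urllib.parse import quote, urlencode

API_BASE = "https://platform-api.unith.ai"


def build_talks_request(head_id, token, order="DESC", page=1, take=10):
    """Build (but do not send) the GET /head/talks/{id} request."""
    if order not in ("ASC", "DESC"):
        raise ValueError("order must be 'ASC' or 'DESC'")
    # urlencode keeps the dict's insertion order: order, page, take.
    query = urlencode({"order": order, "page": page, "take": take})
    url = f"{API_BASE}/head/talks/{quote(head_id, safe='')}?{query}"
    return urllib.request.Request(
        url,
        method="GET",
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def video_urls(payload):
    """Collect the "url" field of every video in a parsed response."""
    return [item["url"] for item in payload.get("data", [])]


def has_next_page(payload):
    """Report whether the "meta" block says another page is available."""
    return bool(payload.get("meta", {}).get("hasNextPage", False))
```

To walk through all pages, a caller could keep incrementing `page` while `has_next_page` returns `True`.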
Response:
{
"data": [
{
"id": "videoId",
"createdAt": "2025-05-15T07:00:19.827Z",
"updatedAt": "2025-05-15T07:00:23.025Z",
"voice": "voiceName",
"url": "videoURL",
"text": "Hello"
},
{
"id": "videoId2",
"createdAt": "2025-05-15T07:00:53.834Z",
"updatedAt": "2025-05-15T07:00:58.532Z",
"voice": "voiceName",
"url": "videoUrl2",
"text": "World"
}
],
"meta": {
"page": "1",
"take": "10",
"itemCount": 2,
"pageCount": 1,
"hasPreviousPage": false,
"hasNextPage": false
}
}
Creating a playback Digital Human
Digital Humans deployed with "operationMode": "ttt", as defined in Operation Mode Parameters, have the added benefit of a Digital Human UI in which the Digital Human is ready to repeat any text you pass to it.
To create a Text-To-Video Digital Human, follow the instructions in Create a Digital Human and set "operationMode": "ttt", as shown below:
{
"headVisualId": "yourHeadVisualId",
"name": "avatarepeat",
"alias": "Repeater",
"languageSpeechRecognition": "en-US",
"langCode": "en-US",
"ttsProvider": "audiostack",
"operationMode": "ttt",
"ocProvider": "playground",
"ttsVoice": "coco"
}
Response:
{
"id": "<generatedID>",
"publicURL": "https://chat.unith.ai/<generatedPath>"
}
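The request body and response above can be handled in Python roughly as follows. This sketch deliberately stops short of sending anything, because the creation endpoint is described in Create a Digital Human rather than here. The helper names are hypothetical, the default values simply mirror the example body, and treating the returned "id" as the Head ID for later text-to-video calls is an assumption.

```python
import json


def build_ttt_head_payload(head_visual_id, name, alias, tts_voice="coco", lang_code="en-US"):
    """Build the creation body for a Digital Human in "ttt" operation mode."""
    return {
        "headVisualId": head_visual_id,
        "name": name,
        "alias": alias,
        # The example body uses the same locale for both language fields.
        "languageSpeechRecognition": lang_code,
        "langCode": lang_code,
        "ttsProvider": "audiostack",
        "operationMode": "ttt",
        "ocProvider": "playground",
        "ttsVoice": tts_voice,
    }


def head_id_from_create_response(response_body: str) -> str:
    """Read the generated "id" from the creation response JSON."""
    # Assumption: this "id" is the Head ID accepted by /head/text-to-video.
    return json.loads(response_body)["id"]
```

Serialize the payload with `json.dumps(build_ttt_head_payload(...))` before sending it as the request body.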