Generating Videos from Text

Overview

This document describes how to use the "Text to Video" feature to generate videos from text using an existing Digital Human head.

The "Text to Video" feature allows you to create video content by providing text and selecting a voice. The system will then generate a video of the Digital Human speaking the provided text with the chosen voice.

Pre-requisites

Valid API Access: You must have a valid UNITH API key and appropriate permissions.
Existing Head ID: You need the unique identifier (Head ID) of the Digital Human you want to use for video generation. You can obtain this from your UNITH interface dashboard.

The "view modal" from the interFace dashboard is available for the "doc qa" and "oc" operation modes only. Regardless of the operation mode, the URL for accessing a Digital Human follows this structure: chat.unith.ai/orgID/headID?api_key. Therefore, even if a Digital Human is created with the "ttt" operation mode, its Head ID can be found within this URL.
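Given the URL structure above, the Head ID can also be pulled out of a Digital Human URL programmatically. The following is a minimal sketch, assuming the path always carries the orgID and headID as its first two segments; the function name extract_head_id and the sample URL values are hypothetical and not part of the UNITH API.

```python
from urllib.parse import urlparse


def extract_head_id(url: str) -> str:
    """Return the headID segment of a chat.unith.ai/orgID/headID?api_key URL."""
    # urlparse only separates host and path reliably when a scheme is present.
    if "://" not in url:
        url = "https://" + url
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) < 2:
        raise ValueError("expected a URL of the form chat.unith.ai/orgID/headID")
    # Assumption: segment 0 is the orgID and segment 1 is the headID.
    return segments[1]


# Hypothetical values, for illustration only.
print(extract_head_id("https://chat.unith.ai/my-org/my-head-123?api_key=abc"))
```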

Please refer to Create Your First Digital Human before generating a video of a talking Digital Human.

Process

The process involves using two API endpoints:

/head/text-to-video: To generate the video from text.
/head/talks/{id}: To retrieve the generated video.

Generate a Video

All Digital Humans, regardless of their operation mode, can be used to generate a video (in mp4 format) from text and voice input.

To generate an mp4 video of a Digital Human speaking, send the following request to the POST /head/text-to-video endpoint:

1. Generate Video from Text

Endpoint: /head/text-to-video
Method: POST
Description: Generates a video of the Digital Human speaking the provided text.
Request Headers:
- Accept: application/json
- Authorization: Bearer <yourBearerToken> (Replace <yourBearerToken> with your actual bearer token)
- Content-Type: application/json
Request body

json

{
  "id": "yourHeadID",     // (Required) The ID of the Digital Human head.
  "text": "Hello World"   // (Required) The text the Digital Human will speak.
}

curl -X 'POST' \
  'https://platform-api.unith.ai/head/text-to-video' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <yourBearerToken>' \
  -H 'Content-Type: application/json' \
  -d '{
  "id": "<exampleheadid>",
  "text": "Hello World"
}'
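The same request can be assembled in Python with only the standard library. This is a sketch rather than an official client: the helper name build_text_to_video_request is hypothetical, and the request is built but deliberately not sent, since sending it requires a valid bearer token.

```python
import json
import urllib.request

API_BASE = "https://platform-api.unith.ai"


def build_text_to_video_request(head_id: str, text: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST /head/text-to-video request."""
    body = json.dumps({"id": head_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/head/text-to-video",
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; replace them with your own Head ID and bearer token.
req = build_text_to_video_request("<exampleheadid>", "Hello World", "<yourBearerToken>")
# To send it: urllib.request.urlopen(req)
```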

Error Handling:
The API returns standard HTTP error codes for invalid requests:
- 400 Bad Request: Indicates an issue with the request, such as an invalid Head ID or voice name.
- 401 Unauthorized: Indicates an invalid or expired bearer token.
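One way to surface these codes in client code is sketched below. Only the 400 and 401 meanings come from this page; the helper names describe_status and send, and the wording of the messages, are hypothetical. Any other status is reported as unexpected rather than guessed at.

```python
import urllib.error
import urllib.request


def describe_status(status: int) -> str:
    """Map an HTTP status code to a short hint based on the codes listed above."""
    if 200 <= status < 300:
        return "success"
    if status == 400:
        return "bad request: check the Head ID and voice name"
    if status == 401:
        return "unauthorized: bearer token is invalid or expired"
    return f"unexpected HTTP status {status}"


def send(req: urllib.request.Request) -> bytes:
    """Send a prepared request and raise a readable error on failure."""
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        # HTTPError.code carries the numeric status returned by the server.
        raise RuntimeError(describe_status(err.code)) from err
```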

2. Retrieve Generated Video

Once the above request has been made, fetch the generated videos from the GET /head/talks/{id} endpoint.

Endpoint: /head/talks/{id} (where {id} is the Head ID of the Digital Human)
Method: GET
Description: Retrieves a list of videos generated for a specific Digital Human head.
Request Headers:
- Accept: application/json
- Authorization: Bearer <yourBearerToken> (Replace <yourBearerToken> with your actual bearer token)
URL Parameters:
- order (string, optional): The order in which to return videos. Use "ASC" for ascending or "DESC" for descending.
- page (integer, optional): The page number of the results to return.
- take (integer, optional): The number of videos to return per page.
Curl example:

curl

curl -X 'GET' \
  'https://platform-api.unith.ai/head/talks/exampleheadid?order=DESC&page=1&take=10' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <yourBearerToken>'

Replace "exampleheadid" with your actual Head ID. You can also modify the order, page, and take parameters as needed.
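The URL for this call, including the optional parameters, can be built in Python as sketched here. The helper name build_talks_url is hypothetical; rejecting values other than "ASC" and "DESC" on the client side is an assumption based on the parameter description above, not documented server behavior.

```python
from typing import Optional
from urllib.parse import quote, urlencode

API_BASE = "https://platform-api.unith.ai"


def build_talks_url(head_id: str, order: Optional[str] = None,
                    page: Optional[int] = None, take: Optional[int] = None) -> str:
    """Build the GET /head/talks/{id} URL, omitting parameters that are not set."""
    if order is not None and order not in ("ASC", "DESC"):
        raise ValueError('order must be "ASC" or "DESC"')
    candidates = (("order", order), ("page", page), ("take", take))
    params = {key: value for key, value in candidates if value is not None}
    # Percent-encode the Head ID so it is safe to place in the path.
    url = f"{API_BASE}/head/talks/{quote(head_id, safe='')}"
    if params:
        url += "?" + urlencode(params)
    return url


print(build_talks_url("exampleheadid", order="DESC", page=1, take=10))
```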

Response:

json

{
  "data": [
    {
      "id": "videoId",
      "createdAt": "2025-05-15T07:00:19.827Z",
      "updatedAt": "2025-05-15T07:00:23.025Z",
      "voice": "voiceName",
      "url": "videoURL",
      "text": "Hello"
    },
    {
      "id": "videoId2",
      "createdAt": "2025-05-15T07:00:53.834Z",
      "updatedAt": "2025-05-15T07:00:58.532Z",
      "voice": "voiceName",
      "url": "videoUrl2",
      "text": "World"
    }
  ],
  "meta": {
    "page": "1",
    "take": "10",
    "itemCount": 2,
    "pageCount": 1,
    "hasPreviousPage": false,
    "hasNextPage": false
  }
}
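A response of this shape can be processed as sketched below. The sample dictionary is a trimmed copy of the response above, and the helper names video_urls, latest_video, and has_next_page are hypothetical. Picking the newest video by comparing createdAt strings assumes all timestamps use the same UTC ISO 8601 format shown in the sample.

```python
sample = {
    "data": [
        {"id": "videoId", "createdAt": "2025-05-15T07:00:19.827Z",
         "url": "videoURL", "text": "Hello"},
        {"id": "videoId2", "createdAt": "2025-05-15T07:00:53.834Z",
         "url": "videoUrl2", "text": "World"},
    ],
    "meta": {"page": "1", "take": "10", "itemCount": 2, "pageCount": 1,
             "hasPreviousPage": False, "hasNextPage": False},
}


def video_urls(payload: dict) -> list:
    """Return the url of every video on this page, in response order."""
    return [item["url"] for item in payload["data"]]


def latest_video(payload: dict) -> dict:
    """Return the most recently created video on this page."""
    # Same-format ISO 8601 UTC timestamps sort correctly as plain strings.
    return max(payload["data"], key=lambda item: item["createdAt"])


def has_next_page(payload: dict) -> bool:
    """Tell whether another page should be requested."""
    return bool(payload["meta"]["hasNextPage"])


print(video_urls(sample), latest_video(sample)["id"], has_next_page(sample))
```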

Creating a playback Digital Human

Digital Humans deployed with "operationMode": "ttt", as defined in Operation Mode Parameters, have the added benefit of a Digital Human UI that is ready to repeat any text you pass to it.

To create a Text-To-Video Digital Human, follow the instructions in Create a Digital Human and set "operationMode": "ttt", as shown below:

json

{
  "headVisualId": "yourHeadVisualId",
  "name": "avatarepeat",
  "alias": "Repeater",
  "languageSpeechRecognition": "en-US",
  "langCode": "en-US",
  "ttsProvider": "audiostack",
  "operationMode": "ttt",
  "ocProvider": "playground",
  "ttsVoice": "coco"
}
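The request body above can also be assembled in code so that "operationMode" is always "ttt". This is a minimal sketch: the helper name build_ttt_payload is hypothetical, the defaults simply mirror the sample values, and reusing one language code for both languageSpeechRecognition and langCode is an assumption taken from the sample. The endpoint that accepts this body is described in Create a Digital Human and is not repeated here.

```python
import json


def build_ttt_payload(head_visual_id: str, name: str, alias: str, tts_voice: str,
                      lang_code: str = "en-US",
                      tts_provider: str = "audiostack",
                      oc_provider: str = "playground") -> dict:
    """Return a creation body with the operation mode fixed to "ttt"."""
    return {
        "headVisualId": head_visual_id,
        "name": name,
        "alias": alias,
        # Assumption: both language fields share one code, as in the sample.
        "languageSpeechRecognition": lang_code,
        "langCode": lang_code,
        "ttsProvider": tts_provider,
        "operationMode": "ttt",
        "ocProvider": oc_provider,
        "ttsVoice": tts_voice,
    }


payload = build_ttt_payload("yourHeadVisualId", "avatarepeat", "Repeater", "coco")
print(json.dumps(payload, indent=2))
```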

Response

json

{
  "id": "<generatedID>",
  "publicURL": "https://chat.unith.ai/<generatedPath>"
}

Last updated Oct 8, 2025