Creating Head Visuals
This document describes how to create head visuals—the personalized visual representations of Digital Humans.
A head visual is essential for deploying a Digital Human. It consists of a video asset that is preprocessed by the UNITH synthesis engine. This processed video becomes the foundation for the head visual, which serves as the face of your Digital Human.
Note: Once a head visual is created, multiple Digital Humans can share a single head visual ID.
Important Considerations:
- Content Policy: UNITH reserves the right to remove any head visual that is deemed offensive, harmful, or inappropriate.
- Video Best Practices: Before you begin, please refer to our separate documentation on the best practices for creating idle videos for Digital Humans. This will ensure optimal results.
- Maximum Video Length: The maximum supported video length for head visual creation is currently 20 seconds.
For more details on video best practices, please refer to our video guidelines.
Process Overview
The process of creating a custom head visual involves the following steps:
- Uploading the source video.
- Creating the head visual resource.
- Saving the head visual.
- Assigning the head visual to your organization.
API Endpoints
1. Upload Video
- Endpoint: /video/upload
- Method: POST
- Description: Uploads the video source for the head visual.
- Request Body:
- file: The video file to upload (e.g., video.mp4). The multipart form field must be named exactly file.
- Curl Example:
curl -X 'POST' \
'https://platform-api.unith.ai/video/upload' \
-H 'accept: application/json' \
-H 'Authorization: Bearer yourAuthBearerToken' \
-H 'Content-Type: multipart/form-data' \
-F 'file=@/path/to/your/video.mp4' # Replace the path with the location of your video file.
- Response:
- Status Code: 200 (OK)
- Response Body:
{
"token": "temporary_video_token" // The temporary token for the uploaded video.
}
- Response Parameters:
- token (string): A temporary, unique token representing the uploaded video. This token is required for the next step.
- Error Handling:
- The endpoint will return standard HTTP error codes for invalid requests, upload failures, or server errors. Ensure your request is correctly formatted and the video file is valid.
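The upload step can also be sketched in Python. This is a minimal illustration using only the standard library, not an official client: the helper names build_upload_request and extract_video_token are hypothetical, and the video/mp4 part type is an assumption. The request is built but not sent, so the structure can be inspected first.

```python
import json
import uuid
import urllib.request

API_BASE = "https://platform-api.unith.ai"  # base URL taken from the curl example


def build_upload_request(video_bytes: bytes, filename: str, bearer_token: str) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to /video/upload."""
    boundary = uuid.uuid4().hex
    # The form field must be named "file", as the endpoint description requires.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: video/mp4\r\n\r\n"  # assumed part type for an .mp4 upload
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/video/upload",
        data=head + video_bytes + tail,
        method="POST",
        headers={
            "accept": "application/json",
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def extract_video_token(response_body: str) -> str:
    """Return the temporary token from the JSON response body."""
    return json.loads(response_body)["token"]
```

To send the request, pass it to urllib.request.urlopen and feed the decoded response body to extract_video_token; the returned token is the value needed in the next step.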
2. Create Head Visual
- Endpoint: /head_visual/create
- Method: POST
- Description: Creates a new head visual resource from the uploaded video.
- Request Body:
{
"update": false,
"detector_version": "v2",
"detector_threshold": -0.2,
"mode": "default",
"cut_timestamp": 0.1,
"debug": false
}
- Request Parameters:
- update (boolean): Indicates whether to update an existing head visual (set to false for a new one). Leave at the value shown.
- detector_version (string): The version of the face detection algorithm to use. Use "v2" for best results.
- detector_threshold (number): The threshold for face detection. Leave at the value shown.
- mode (string): The processing mode. "default" is the standard mode. Leave at the value shown.
- cut_timestamp (number): The timestamp for cutting the video. Leave at the value shown.
- debug (boolean, optional): If set to true, the response will include a task_id. If video processing fails, a ZIP file containing frames and face detection results will be provided for debugging.
- Curl Example:
curl -X 'POST' \
'https://platform-api.unith.ai/head_visual/create' \
-H 'accept: application/json' \
-H 'x-head-video-token-id: yourTemporaryVideoToken' \
-H 'Authorization: Bearer yourAuthBearerToken' \
-H 'Content-Type: application/json' \
-d '{
"update": false,
"detector_version": "v2",
"detector_threshold": -0.2,
"mode": "default",
"cut_timestamp": 0.1,
"debug": false
}'
- Response:
- Status Code: 200 (OK)
- Response Body:
{
"data": {
"id": "yourNewHeadVisualId", // The unique ID of the new head visual.
"task_id": "yourTaskId" // The ID of the processing task (only if debug=true).
}
}
- Response Parameters:
- id (string): The unique identifier for the newly created head visual. This ID is used in subsequent steps.
- task_id (string, optional): The ID of the video processing task. This is only included if the debug parameter was set to true in the request.
- Error Handling:
- The endpoint will return standard HTTP error codes for invalid requests, missing headers, or server errors.
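As a rough Python counterpart to the curl example, the sketch below builds the create request and reads the head visual ID from a response body. The function names build_create_request and parse_create_response are hypothetical; the payload values and the x-head-video-token-id header are copied from the example above.

```python
import json
import urllib.request
from typing import Optional, Tuple

API_BASE = "https://platform-api.unith.ai"  # base URL taken from the curl example

# Values copied from the documented request body; only debug is meant to vary.
DEFAULT_CREATE_PAYLOAD = {
    "update": False,
    "detector_version": "v2",
    "detector_threshold": -0.2,
    "mode": "default",
    "cut_timestamp": 0.1,
    "debug": False,
}


def build_create_request(video_token: str, bearer_token: str, debug: bool = False) -> urllib.request.Request:
    """Build (but do not send) the POST to /head_visual/create."""
    payload = dict(DEFAULT_CREATE_PAYLOAD, debug=debug)
    return urllib.request.Request(
        f"{API_BASE}/head_visual/create",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "accept": "application/json",
            # Temporary token returned by /video/upload.
            "x-head-video-token-id": video_token,
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
    )


def parse_create_response(response_body: str) -> Tuple[str, Optional[str]]:
    """Return (id, task_id); task_id is None unless debug was enabled."""
    data = json.loads(response_body)["data"]
    return data["id"], data.get("task_id")
```

Keeping the payload in one constant makes it harder to change the fields that the parameter list says to leave alone.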
3. Save Head Visual
- Endpoint: /head_visual/save
- Method: POST
- Description: Saves the head visual resource with the specified metadata.
- Request Body:
{
"id": "yourNewHeadVisualId", // The head visual ID from the /head_visual/create response.
"name": "yourUniqueHeadVisualName", // A unique name for the head visual.
"gender": "FEMALE", // The gender of the Digital Human: "MALE" or "FEMALE".
"type": "TALK" // The type of head visual.
}
- Request Parameters:
- id (string, required): The ID of the head visual to save (obtained from the /head_visual/create response).
- name (string, required): A unique name for the head visual. This name must be unique within your organization.
- gender (string, required): The gender of the Digital Human. Use either "MALE" or "FEMALE".
- type (string, required): The type of head visual. Typically, this is "TALK".
- categoryId (string, optional): The category ID to assign to the head visual. Defaults to Unset if omitted. See the Categories section below.
- Curl Example:
curl -X 'POST' \
'https://platform-api.unith.ai/head_visual/save' \
-H 'accept: application/json' \
-H 'Authorization: Bearer yourAuthBearerToken' \
-H 'Content-Type: application/json' \
-d '{
"id": "yourNewHeadVisualId",
"name": "yourUniqueHeadVisualName",
"gender": "FEMALE",
"type": "TALK"
}'
- Response:
- Status Code: 200 (OK)
- Response Body: An empty string.
- Important Notes:
- The name parameter must be unique. Choose a descriptive and unique name for your head visual.
- This endpoint may take some time to process, depending on the length of the uploaded video.
- The head visual status will initially be "pending" until the video processing is complete. You may need to check the status of the head visual separately if you need to confirm processing is done.
- The pending state can last a few minutes; the longer the video, the longer processing takes.
- The url in the response body will be empty, unless debug was set to true, in which case the URL of the debug ZIP file is returned.
- Error Handling:
- The endpoint will return standard HTTP error codes for invalid requests, missing parameters, or if the head visual ID is invalid. It will also return an error if the chosen name is not unique.
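Some of these errors can be caught before the request is sent. The sketch below validates the documented fields locally and produces the JSON body for /head_visual/save. The function name build_save_payload is hypothetical, and uniqueness of the name can only be checked by the server.

```python
import json
from typing import Optional

ALLOWED_GENDERS = {"MALE", "FEMALE"}  # the two values listed in the parameter description


def build_save_payload(
    head_visual_id: str,
    name: str,
    gender: str,
    type_: str = "TALK",
    category_id: Optional[str] = None,
) -> str:
    """Return the JSON request body for /head_visual/save."""
    if not head_visual_id:
        raise ValueError("id is required")
    if not name.strip():
        raise ValueError("name is required and must not be blank")
    if gender not in ALLOWED_GENDERS:
        raise ValueError(f"gender must be one of {sorted(ALLOWED_GENDERS)}")
    payload = {"id": head_visual_id, "name": name, "gender": gender, "type": type_}
    # categoryId is optional, so it is included only when a value is given.
    if category_id is not None:
        payload["categoryId"] = category_id
    return json.dumps(payload)
```

For example, build_save_payload("yourNewHeadVisualId", "yourUniqueHeadVisualName", "FEMALE") yields the same fields as the curl example above.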
Categories
As the number of head visuals grows, the platform provides a filtering system to help you organize and locate them efficiently using gender (male, female) and categories. When saving a head visual, you can assign it to one of the following categories using its corresponding ID.
- Endpoint: /category/all
- Method: GET
- Description: Retrieves the list of existing categories.
curl -X 'GET' \
'https://platform-api.unith.ai/category/all' \
-H 'accept: application/json' \
-H 'Authorization: Bearer yourBearer'
To assign a category to a head visual, pass one of the existing category IDs as categoryId in the /head_visual/save request:
curl -X 'POST' \
'https://platform-api.unith.ai/head_visual/save' \
-H 'accept: application/json' \
-H 'Authorization: Bearer bearerToken' \
-H 'Content-Type: application/json' \
-d '{
"id": "yourHeadVisualId",
"name": "Test_video_category_Julie",
"gender": "MALE",
"type": "TALK",
"categoryId": "30000000-0000-0000-0000-000000000003"
}'
The categoryId shown is a real example; replace it with an ID returned by /category/all.
Simple Head Visual Post-Processing Guide
This document outlines an optional, step-by-step procedure for manually post-processing your source videos to create a custom idle video.
This documentation assumes you have recorded your model according to the best practices described in the "Creating Head Visuals" documentation and have a video with a green screen background.
General Post-Processing Procedure (Default Idle Loop)
This procedure focuses on creating a short, seamless idle loop for a default head visual.
1. Creating the Seamless Idle Loop
The goal is to create a short, natural-looking idle segment (under 5 seconds) that can be seamlessly looped. The total length of your final video must be shorter than 10 seconds.
- Select Software: Open your captured video in editing software (e.g., DaVinci Resolve, Adobe Premiere, etc.).
- Identify Loop Points: Find a brief segment of the video (ideally less than 5 seconds) where the model's movement (e.g., head movement, eye blinking) is natural and smooth. Avoid brisk or sudden movements, as these are highly visible when looping.
- Reverse and Duplicate: Cut the selected segment, duplicate it, and reverse the duplicated clip. Appending the reversed clip to the original creates a loop whose start and end frames match, which ensures a natural transition.
Include one blink in the segment, preferably near the middle rather than at the beginning or end. This helps create a more relaxed and natural appearance. Avoid body or head movement in the first or last frame, since such movement makes the loop transition more noticeable.
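The reverse-and-duplicate technique can be illustrated on a list of frames. This is a minimal sketch for reasoning about loop length, not a replacement for editing software: the function names are hypothetical, and the 25 fps default comes from the export specification in step 5.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

FPS = 25  # frame rate from the export specification


def ping_pong(frames: Sequence[T]) -> List[T]:
    """Append the reversed segment so the last frame equals the first."""
    return list(frames) + list(reversed(frames))


def loop_duration_seconds(segment_frames: int, fps: int = FPS) -> float:
    """Duration of the final loop: the segment played forward, then backward."""
    return 2 * segment_frames / fps
```

A 4-second segment is 100 frames at 25 fps, so loop_duration_seconds(100) gives 8.0 seconds, which stays under the 10-second limit. A 5-second segment would reach exactly 10 seconds, which is why the segment should be shorter than 5 seconds.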
2. Keying and Background Removal
- Key the Green Screen: Use keying tools (such as those available in After Effects or DaVinci Resolve) to accurately remove the green screen background from the subject.
3. Adding a Custom Background
- Select Background: Add a custom background of your choice behind the keyed subject.
- Avoid Distraction: If you use a video background, keep its movement or activity minimal. This prevents a noticeable change when transitioning from the static idle state to the active speaking state. The background video can also be turned into a loop so that it blends better.
4. Color Correction and Final Adjustments
- Color Match: Perform color correction on the foreground (the model) to ensure the lighting and color tone seamlessly match the new background layer. This can be done in any professional editing software.
5. Exporting the Final Video
The final video must adhere to the following specifications for processing by our synthesis pipeline as mentioned in the “head visual creation” documentation:
- Resolution: 1280 x 720p (16:9 HD)
- Frame Rate: 25 frames per second (25fps)
- Format: .mp4
- Duration: Less than 10 seconds total.
- Size: 3MB maximum
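A pre-upload check against these specifications might look like the following sketch. The VideoInfo type and check_export function are hypothetical, the metadata has to be read from the file with a tool of your choice, and treating 3MB as 3,000,000 bytes is an assumption.

```python
from dataclasses import dataclass
from typing import List

MAX_SIZE_BYTES = 3 * 1000 * 1000  # assumption: "3MB" read as 3,000,000 bytes


@dataclass
class VideoInfo:
    """Metadata of an exported video, gathered by whatever tool you use."""
    width: int
    height: int
    fps: float
    duration_s: float
    size_bytes: int
    filename: str


def check_export(info: VideoInfo) -> List[str]:
    """Return a list of specification violations; empty means the file passes."""
    problems = []
    if (info.width, info.height) != (1280, 720):
        problems.append("resolution must be 1280 x 720")
    if info.fps != 25:
        problems.append("frame rate must be 25 fps")
    if not info.filename.lower().endswith(".mp4"):
        problems.append("format must be .mp4")
    if info.duration_s >= 10:
        problems.append("duration must be less than 10 seconds")
    if info.size_bytes > MAX_SIZE_BYTES:
        problems.append("size must not exceed 3MB")
    return problems
```

Returning every violation at once, rather than stopping at the first, avoids repeated export attempts.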
Key Difference: Two Loops Video Input
The Two Loops (more expressive) head visual takes a single video as input, which makes the creation process simpler:
- Single Continuous Video: You do not need to manually cut and reverse the video (Step 1 is skipped).
- Video Structure: The input video is a single, continuous recording in which the first part is the still idle state and the second part is the expressive state (e.g., subject moving hands, changing facial expression). The boundary between the two parts is defined by cut_timestamp.
- System Handles Looping: When creating the head visual in two_loops mode, you specify the cut_timestamp where the transition between the idle and speaking states occurs. Our system automatically handles the necessary looping and inversion for both the idle and expressive segments to ensure seamless, non-jarring transitions.
This distinction is crucial: for Two Loops, your editing work is focused purely on keying, background, and color correction, as the platform manages the looping mechanism.
- In the first part of the video, follow the same recommendations as for the Default Idle Loop format to ensure a seamless infinite loop.
- Include one blink in the first 4 seconds, and another between second 4 and the end. This enhances realism and avoids a robotic look.
- As before, avoid noticeable body or head movement in the first or last frame to generate a smooth loop.
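For a Two Loops video, the create request differs from the default one in mode and cut_timestamp. The sketch below builds such a request body under two assumptions that are not confirmed here: that the mode value is the string "two_loops", and that cut_timestamp is expressed in seconds. The function name build_two_loops_payload is hypothetical, and the remaining fields are copied from the default create request.

```python
import json


def build_two_loops_payload(cut_timestamp: float, video_duration: float) -> str:
    """Return a JSON body for /head_visual/create in two_loops mode."""
    # The cut must fall inside the video, between the idle and expressive parts.
    if not 0 < cut_timestamp < video_duration:
        raise ValueError("cut_timestamp must lie strictly inside the video")
    payload = {
        "update": False,
        "detector_version": "v2",
        "detector_threshold": -0.2,
        "mode": "two_loops",             # assumed literal value for the mode
        "cut_timestamp": cut_timestamp,  # assumed to be in seconds
        "debug": False,
    }
    return json.dumps(payload)
```

For a 9-second recording whose idle part ends at second 4, build_two_loops_payload(4.0, 9.0) places the cut between the two blinks recommended above.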
The two_loops mode works slightly differently for legacy Digital Humans and for Digital Humans in streaming mode.
Please check this page for legacy and streaming digital humans.
Using AI Generation Tools
You have the freedom to use a variety of tools, including AI-based solutions for video creation, source image generation, or face swapping. However, please be aware:
- Training Data: Our model was trained on real human video footage, and real video may deliver better performance than visibly AI-generated content.
- Face Detection: Our synthesis model relies on accurate face detection in every frame of the video. Ensure that any post-processing or AI generation does not interfere with the clarity or consistency of the subject's face.
Happy creating!