Expressive Legacy Digital Humans

Feature Overview

The Idle & Active Video Segments feature lets you bring more life and realism to your Digital Human by using different visual states for moments of stillness versus moments of speech.

By providing distinct segments for “idle” and “speaking,” your Digital Human can display subtle, natural behavior when waiting, and more expressive, engaging behavior when responding.

Why This Matters

Using the same video segment for all states can look repetitive and reduce realism. This feature allows:

Smooth, natural idle behavior that feels alive, not frozen.
Expressive facial and body movements during speech.
A more human-like presence that keeps the viewer engaged.

How It Works for You (Configuration)

Upload a Source Video
- The video should include both a calm idle portion and a more active speaking portion.
- Make sure there is a clear moment where the idle segment ends and the active segment begins.
Mark the Transition Point
- Identify the exact timestamp (in seconds) where the change from idle to speaking occurs.
- This ensures that the first and last frames of each segment match, giving a continuous, natural visual flow.
Apply to Your Digital Human
- Once configured, the system will automatically use the idle segment when your Digital Human is waiting, and the active segment when it is speaking.

Video Guidelines

Start with minimal movement and a relaxed facial expression.
Transition naturally into speaking with expressive gestures or facial changes.
Keep the environment, lighting, and framing consistent for both segments.

Key Considerations

Timestamp Accuracy: The transition moment should be precise for seamless switching between states.
Video Quality: High-resolution, well-lit footage creates more lifelike results.
Preview: Test the setup to ensure smooth transitions and natural presentation.

Video Processing Details

Idle to Speaking State

This section describes how the idle and speech video segments are generated from the uploaded video and the provided cut_timestamp.

Idle Video Generation:
- Once the cut_timestamp (t) is set, the system takes the video from 0 to t.
- This segment (0 to t) is played forward and then in reverse, which produces a clip that can loop indefinitely without a visible jump.
Speech Video Generation:
- The speech video is generated based on the duration of the audio to be spoken.
- If the audio duration is, for example, 5 seconds, the system takes the segment of the video from t to t + 2.5 seconds (half the audio duration).
- This segment (t to t + 2.5s) is then reversed and appended to itself to create a 5-second video that starts and ends on the frame at the cut timestamp (t).
Ensuring Seamless Transition:
- The first and last frame of the idle video are the same as the first and last frame of the speech video.
- This ensures that there are no visible skips or jumps between the idle and speech states, resulting in a smooth transition.
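The segment arithmetic above can be sketched in a few lines of Python. This is only an illustration of the forward-then-reverse looping described here, not the platform's actual implementation: the function names, the 25 fps default, and the choice to drop the repeated turning-point frame are all assumptions.

```python
def ping_pong(start_s: float, end_s: float, fps: int = 25) -> list[int]:
    """Return frame indices that play start -> end, then back to start."""
    first = round(start_s * fps)
    last = round(end_s * fps)
    forward = list(range(first, last + 1))
    # Reversed copy without the turning-point frame, so it is not shown twice.
    backward = forward[-2::-1]
    return forward + backward


def idle_frames(cut_timestamp: float, fps: int = 25) -> list[int]:
    """Idle loop: source video from 0 to t, forward then reversed."""
    return ping_pong(0.0, cut_timestamp, fps)


def speech_frames(cut_timestamp: float, audio_duration: float, fps: int = 25) -> list[int]:
    """Speech clip: t to t + audio_duration / 2, forward then reversed."""
    return ping_pong(cut_timestamp, cut_timestamp + audio_duration / 2, fps)


if __name__ == "__main__":
    speech = speech_frames(cut_timestamp=3, audio_duration=5)
    # The clip starts and ends on the frame at the cut timestamp.
    print(speech[0], speech[-1])
```

Because each clip returns to the frame it started on, it can be repeated or followed by another clip that begins on that same frame without a visible cut.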

Prerequisites

Before implementing this feature, ensure you have:

API Access: Valid access to the UNITH API.

Video Recording: A video recorded according to the guidelines in our "Best Practices for Video Recording" documentation.

Video Requirements

Continuous Recording: The video should be a continuous recording that includes both the idle and active states.
Idle State: The video should begin with the actor in an idle state (e.g., looking at the camera with minimal movement).
Active State: The video should then transition to the active state, where the actor is speaking, raising their eyebrows, gesturing, or otherwise being expressive.
Clear Transition: The transition between the idle and active states should be clear and well-defined.
Timestamp: You must identify the precise timestamp (in seconds) where the transition from the idle state to the active state occurs. This timestamp is crucial for configuring the feature.
- Example Video Scenario:
  - Imagine a video where an actor:
    - Starts by looking directly at the camera, calmly and still.
    - At the 3-second mark, the actor begins to speak and uses hand gestures. In this case, the cut_timestamp would be "3".

Implementation Steps

The following steps outline how to implement this feature, which the API refers to as Two Loops (mode "two_loops"):

1. Upload Video

Endpoint: /video/upload
Method: POST
Description: Uploads the video containing both the idle and active states.
Request Headers:
- Accept: application/json
- Authorization: Bearer <yourBearerToken> (Replace <yourBearerToken> with your actual bearer token)
- Content-Type: multipart/form-data
Request Body:
- file: The video file to upload (e.g., my_video.mp4).
Curl example:

curl -X 'POST' \
  'https://platform-api.unith.ai/video/upload' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer yourBearerToken' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@/path/to/your/video.mp4'

Replace /path/to/your/video.mp4 with the actual path to your video file.

Response:
- Status Code: 200 (OK)
- Response body:

{
  "token": "temporary_video_token"
}

Response Parameters:
- token (string): A temporary token representing the uploaded video. This token is used in the next step.
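For readers who prefer a script to curl, the upload request can be assembled with the Python standard library. This is a minimal sketch, not an official client: the helper names are hypothetical, and the video/mp4 part type is an assumption about the uploaded file.

```python
import json
import urllib.request
import uuid

API_BASE = "https://platform-api.unith.ai"


def build_upload_request(filename: str, content: bytes, bearer_token: str) -> urllib.request.Request:
    """Build (but do not send) the multipart POST for /video/upload."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: video/mp4\r\n\r\n",  # assumed media type of the file part
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{API_BASE}/video/upload",
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def parse_upload_response(body: str) -> str:
    """Extract the temporary video token from the JSON response body."""
    return json.loads(body)["token"]
```

The request can then be sent with urllib.request.urlopen(request), and the decoded response body passed to parse_upload_response to obtain the token for the next step.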

2. Create Head Visual

Endpoint: /head_visual/create
Method: POST
Description: Creates a new head visual resource, configuring it for the Two Loops feature.
Request Headers:
- Accept: application/json
- x-head-video-token-id: <yourTemporaryVideoToken> (Replace <yourTemporaryVideoToken> with the token from the /video/upload response)
- Authorization: Bearer <yourBearerToken> (Replace <yourBearerToken> with your authorization token)
- Content-Type: application/json
Request body:

{
  "update": false,
  "detector_version": "v2",
  "detector_threshold": -0.2,
  "mode": "two_loops",
  "cut_timestamp": 3,
  "debug": false
}

Replace 3 with the actual timestamp in seconds.

Request Parameters:
- update (boolean): Set to false for creating a new head visual.
- detector_version (string): Use "v2" for optimal results.
- detector_threshold (number): The threshold for face detection.
- mode (string): Set to "two_loops" to enable the Two Loops feature.
- cut_timestamp (number, required): The timestamp (in seconds) where the video transitions from the idle state to the active state. This parameter determines where the idle and active segments are split, so it must be accurate.
- debug (boolean, optional): If set to true, the response will include a task_id. If video processing fails, a ZIP file containing frames and face detection results will be provided for debugging.
Curl example:

curl -X 'POST' \
  'https://platform-api.unith.ai/head_visual/create' \
  -H 'accept: application/json' \
  -H 'x-head-video-token-id: yourTemporaryVideoToken' \
  -H 'Authorization: Bearer yourBearerToken' \
  -H 'Content-Type: application/json' \
  -d '{
  "update": false,
  "detector_version": "v2",
  "detector_threshold": -0.2,
  "mode": "two_loops",
  "cut_timestamp": 3,
  "debug": false
}'

Replace yourTemporaryVideoToken with the token from the /video/upload response.

Replace 3 with the actual timestamp.

Response:
- Status Code: 200 (OK)
- Response body:

{
  "data": {
    "id": "yourNewHeadVisualId",
    "task_id": "yourTaskId"
  }
}

Response Parameters:
- id (string): The unique ID of the new head visual. This ID is used in the next step.
- task_id (string, optional): The ID of the video processing task (only included if debug is true).
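The request body and response for this step can be handled with a pair of small helpers. The sketch below is illustrative only: the helper names are hypothetical, and the check that cut_timestamp is positive is an assumption made here, not a documented API rule.

```python
import json
from typing import Optional


def build_create_payload(cut_timestamp: float, debug: bool = False) -> str:
    """Serialize the /head_visual/create body for Two Loops mode."""
    # Assumed sanity check: the idle segment (0 to t) must not be empty.
    if cut_timestamp <= 0:
        raise ValueError("cut_timestamp must be a positive number of seconds")
    payload = {
        "update": False,
        "detector_version": "v2",
        "detector_threshold": -0.2,
        "mode": "two_loops",
        "cut_timestamp": cut_timestamp,
        "debug": debug,
    }
    return json.dumps(payload)


def parse_create_response(body: str) -> tuple[str, Optional[str]]:
    """Return (head visual id, task_id); task_id is None unless debug was true."""
    data = json.loads(body)["data"]
    return data["id"], data.get("task_id")
```

Building the body from a dictionary avoids hand-editing JSON and keeps the timestamp a number rather than a string.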

3. Save Head Visual

Endpoint: /head_visual/save
Method: POST
Description: Saves the new head visual resource.
Request Headers:
- Accept: application/json
- Authorization: Bearer <yourAuthBearerToken> (Replace <yourAuthBearerToken> with your authorization token)
- Content-Type: application/json
Request body:

{
  "id": "yourNewHeadVisualId",
  "name": "my_new_visual",
  "gender": "MALE",
  "type": "TALK"
}

Request Parameters:
- id (string, required): The ID of the head visual to save (obtained from the /head_visual/create response).
- name (string, required): A unique name for the head visual.
- gender (string, required): The gender of the Digital Human (e.g., "MALE", "FEMALE").
- type (string, required): The type of head visual (e.g., "TALK").
Curl Example:

curl -X 'POST' \
  'https://platform-api.unith.ai/head_visual/save' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer yourAuthBearerToken' \
  -H 'Content-Type: application/json' \
  -d '{
  "id": "yourNewHeadVisualId",
  "name": "my_new_visual",
  "gender": "MALE",
  "type": "TALK"
}'

Replace yourNewHeadVisualId and my_new_visual with the actual values.

Response:
- Status Code: 200 (OK)
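Steps 1 to 3 form a chain: the upload returns a token, the token creates a head visual, and the resulting ID is saved. The sketch below shows only that data flow. The send callable is a hypothetical transport that you would supply (for example, a thin wrapper around your HTTP library that adds the Authorization header); it is not part of the UNITH API.

```python
from typing import Callable

# Hypothetical transport: (path, extra_headers, body) -> parsed JSON response.
Send = Callable[[str, dict, dict], dict]


def create_two_loops_head_visual(send: Send, video_path: str, cut_timestamp: float,
                                 name: str, gender: str = "MALE") -> str:
    """Run upload -> create -> save and return the new head visual ID."""
    # Step 1: upload the video and keep the temporary token.
    token = send("/video/upload", {}, {"file": video_path})["token"]

    # Step 2: create the head visual, passing the token as a request header.
    created = send(
        "/head_visual/create",
        {"x-head-video-token-id": token},
        {"update": False, "detector_version": "v2", "detector_threshold": -0.2,
         "mode": "two_loops", "cut_timestamp": cut_timestamp, "debug": False},
    )
    head_visual_id = created["data"]["id"]

    # Step 3: save the head visual under a unique name.
    send("/head_visual/save", {},
         {"id": head_visual_id, "name": name, "gender": gender, "type": "TALK"})
    return head_visual_id
```

Injecting the transport keeps the sequence easy to test with a stub before pointing it at the real endpoints.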

4. Use the New Head Visual

To use your new head visual, select it when creating a new Digital Human.

Important Considerations

Video Quality: The quality of your source video is critical for achieving good results with the Two Loops feature. Refer to our video recording best practices for guidelines.
Cut Timestamp Accuracy: The cut_timestamp parameter must be accurate. An incorrect timestamp will result in a jarring or unnatural transition between the idle and active states.
Testing: Thoroughly test your Digital Human with the Two Loops feature to ensure the transitions are smooth and the behavior is as expected.

5. Activate the Two Loops Head Visual (Frontend)

For the idle and active segments to be displayed, you must also explicitly enable the Two Loops feature on the client side when accessing your Digital Human.

5.1. Via Direct URL Parameter

Append the &two_loops=true query parameter to the URL of your Digital Human.

Example:

https://chat.unith.ai/yourOrgId/yourHeadId?api_key=yourApiKey&two_loops=true
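If the URL is generated in code, the parameter can be added with the standard library so that values are escaped correctly. This is a small sketch: the function name is hypothetical, and the URL shape simply mirrors the example above.

```python
from urllib.parse import urlencode


def digital_human_url(org_id: str, head_id: str, api_key: str, two_loops: bool = True) -> str:
    """Build the Digital Human URL, optionally enabling Two Loops."""
    query = {"api_key": api_key}
    if two_loops:
        query["two_loops"] = "true"
    return f"https://chat.unith.ai/{org_id}/{head_id}?{urlencode(query)}"


if __name__ == "__main__":
    print(digital_human_url("yourOrgId", "yourHeadId", "yourApiKey"))
```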

5.2. Via Embedded Snippet (Data Attribute)

You can also enable this from within your embedded snippet by adding the data-two_loops="true" attribute to the <body> tag.

Example:

<body
  id="talking-head"
  data-api_base="https://chat-origin.api.unith.live"
  data-api_key="yourApiKey"
  data-org_id="yourOrgId"
  data-head_id="yourHeadId"
  data-theme="demo"
  ...
  data-two_loops="true"
></body>

Last updated Feb 18, 2026