Create a Digital Human

To create a digital human, you need a user and an organization. Depending on your organization's type and privileges, you will have access to different head visuals. See User for more information.

Components

All Digital Humans are made up of the following required components:

Name
Alias
Face
Voice
Operating Mode
Video Streaming

Prior to starting the creation process, it is important to consider what operating mode your Digital Human should use. This is directly related to the intended use case.

Operating Modes

Digital Humans can operate in the following modes:

Text-to-Video / Video: Specify text for a given Digital Human, and a video will be generated with the digital human speaking the text. The output is an mp4 file.
Open Dialogue: Configure a prompt for a given Digital Human, and the Digital Human will be conversational.
Document / Knowledge base: Provide content and configure a prompt, and the Digital Human will be conversational about the content provided.
Plugin mode: Leverage a webhook to connect any custom conversational engine or LLM to power the conversation of the Digital Human.
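
As a rough illustration, the modes above can be mapped to the `operationMode` values listed in the Request Body Parameters table. The dictionary keys and the helper name `operation_mode_for` are hypothetical labels chosen for this sketch, not part of the API.

```python
# Illustrative labels (left) mapped to the documented `operationMode`
# values (right). Only the values come from the documentation.
OPERATION_MODES = {
    "text_to_video": "ttt",
    "open_dialogue": "oc",
    "document": "doc_qa",
    "plugin": "plugin",
}


def operation_mode_for(label: str) -> str:
    """Return the API value for a mode label, or raise on an unknown label."""
    try:
        return OPERATION_MODES[label]
    except KeyError:
        raise ValueError(f"unknown operating mode: {label!r}") from None
```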

Digital Human Creation Process

All digital humans need an existing head visual and voice. To see your available head visuals and voices, see the List Faces and List Voices documentation.

Names and aliases are free-text fields used for personalization. Both are required, but there are no other restrictions.

Use the POST /head/create endpoint to create Digital Humans. Each operating mode requires a different request body (see Operating Mode Parameters).

{
  "headVisualId": "<stringIdRetrievablefromListFaces>",
  "name": "test_qa",
  "alias": "AI responder",
  "languageSpeechRecognition": "en-US",
  "language": "en-US",
  "operationMode": "doc_qa",
  "promptConfig": {
    "system_prompt": "string"
  },
  "ttsProvider": "elevenlabs",
  "ocProvider": "playground",
  "ttsVoice": "Jessica_eleven_turbo_v2_5",
  "greetings": "Hi there!",
  "videoStreaming": true,
  "phrases": ["Unith", "Barcelona"],
  "customWords": {"Barcelona": "Barthelona", "AI": "A eye"}
}
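
A payload like the one above could be assembled programmatically. The following is a minimal sketch: the helper name `build_create_payload` is hypothetical, and treating exactly these five keys as the required set is an assumption based on the Components list rather than a documented validation rule.

```python
import json

# Keys taken from the example payload. Treating these as the required
# set is an assumption derived from the Components list.
REQUIRED_KEYS = ("headVisualId", "name", "alias", "ttsVoice", "operationMode")


def build_create_payload(**fields) -> str:
    """Return a JSON string for POST /head/create, rejecting missing keys."""
    missing = [key for key in REQUIRED_KEYS if not fields.get(key)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    # Streaming is the documented default; setting it makes the choice explicit.
    fields.setdefault("videoStreaming", True)
    return json.dumps(fields)
```

Calling `build_create_payload(headVisualId="abc123abc123", name="test_qa", alias="AI responder", ttsVoice="Jessica_eleven_turbo_v2_5", operationMode="doc_qa")` would yield a JSON body that can be sent as the request data.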

The payload can contain many additional properties of the Digital Human as described in the Configuration Parameters page.

If you have access to multiple organizations, you will need to add "orgId":"<orgId>" to the payload.

Please note that Voiceflow is no longer an actively maintained mode.

Request Body Parameters

Parameter	Data Type	Description
`headVisualId`	string	Head Visual ID: The unique identifier of the visual representation (the "look") chosen for the digital human, selected from a list of available head visuals.
`orgId`	string	Organization ID: The unique identifier of your organization. This parameter is used to associate the digital human with your organization account.
`suggestions`	string	Used in document-based question answering and open dialogue (`operationMode=doc_qa`, `operationMode=oc`). Allows manually overriding or adding to the suggestions automatically extracted from uploaded documents. Format as a JSON array string, e.g., `["example1", "example2"]`.
`name`	string	The system-generated name for the digital human, derived from the `alias`. It's generally recommended to leave this parameter untouched.
`alias`	string	The user-defined, short, and unique name for your digital human. This alias will be used to construct the public name of the digital human.
`languageSpeechRecognition`	string	Sets the language for speech recognition, using codes supported by Microsoft Azure Speech Services. Refer to the [Microsoft Azure documentation] for a full list of supported languages.
`language`	string	Frontend Language: Sets the language for the user interface and frontend elements of the digital human. To get the list of currently supported languages, use the endpoint `GET /languages/all`.
`ttsProvider`	string	Text-to-Speech Provider: Specifies the service used for text-to-speech generation. Out-of-the-box support for `"elevenlabs"`, `"azure"`, and `"audiostack"`.
`operationMode`	string	Operation Mode: Defines the operational behavior of the digital human. Possible values are `"oc"` (open conversation), `"doc_qa"` (document-based question answering), `"ttt"` (text-to-talk, video generation from text input), and `"plugin"` (custom plugin implementation).
`promptConfig`	object	Used to customize the behavior of the digital human for the `"doc_qa"` and `"oc"` operation modes. See nested parameters below for details.
`system_prompt (nested promptConfig)`	string	Sets the overall system prompt that guides the behavior and personality of the digital human in open conversation mode. The `"doc_qa"` operation mode also includes function calls for accurate retrieval.
`ttsVoice`	string	Text-to-Speech Voice: Selects a specific voice for text-to-speech output. Choose from a wide range of voices across different providers. See the [voice list documentation] for available voices.
`greetings`	string	Sets the initial greeting message that the digital human will use to start a conversation.
`voiceflowApiKey`	string	Required only when `operationMode` is set to `"voiceflow"`. Paste your Voiceflow conversation API key here to connect the digital human to your Voiceflow conversation flow.
`isRandomSuggestions`	boolean	Random Suggestions: Determines whether suggestions are displayed in a fixed order (`false`) or in a randomized order (`true`). Defaults to `true` if not specified.
`pluginOperationalModeConfig`	object	Plugin Configuration: Used when `operationMode` is set to `"plugin"`. Allows specifying plugin-specific configurations. Refer to the plugin documentation for detailed payload structure.
`customWords`	object	Custom Words: Allows defining custom pronunciations for specific words. Provided as a key-value object (dictionary) where keys are the words and values are their custom pronunciations (e.g., `{"unith": "iunit"}`). Case-sensitive.
`videoStreaming`	boolean	videoStreaming determines whether your digital human will be created in streaming or legacy (non streaming) mode.
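
A few of the rules in the table can be checked on the client side before sending a request. The sketch below is illustrative only: the function names `encode_suggestions` and `check_mode_fields` are hypothetical, and the checks reflect the table descriptions, not any server-side validation known to exist.

```python
import json


def encode_suggestions(suggestions) -> str:
    """Serialize suggestions as a JSON array string, as the table describes."""
    items = list(suggestions)
    if not all(isinstance(item, str) for item in items):
        raise TypeError("suggestions must all be strings")
    return json.dumps(items)


def check_mode_fields(payload: dict) -> list:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    mode = payload.get("operationMode")
    if mode == "voiceflow" and not payload.get("voiceflowApiKey"):
        problems.append("voiceflowApiKey is required when operationMode is voiceflow")
    if "customWords" in payload and not isinstance(payload["customWords"], dict):
        problems.append("customWords must be a key-value object")
    return problems
```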

By default, all digital humans are created in streaming mode. To create a legacy, non-streaming digital human, include "videoStreaming": false in your payload.

Example requests

curl

curl -X 'POST' \
  'https://platform-api.unith.ai/head/create' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <yourBearer>' \
  -H 'Content-Type: application/json' \
  -d '{
  "headVisualId": "abc123abc123",
  "name": "ttt",
  "alias": "Repeater",
  "languageSpeechRecognition": "en-US",
  "language": "en-US",
  "ttsProvider": "elevenlabs",
  "videoStreaming": true,
  "operationMode": "ttt",
  "ocProvider": "playground",
  "ttsVoice": "coco_eleven_turbo_v2_5"
}'
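
The same request can be prepared from Python using only the standard library. This is a sketch, not an official client: the function name `make_create_request` is hypothetical, while the URL and headers are copied from the curl example above. The function builds the request without sending it, so it can be inspected first.

```python
import json
import urllib.request

API_URL = "https://platform-api.unith.ai/head/create"  # from the curl example


def make_create_request(payload: dict, bearer_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "accept": "application/json",
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending is left to the caller, for example:
# with urllib.request.urlopen(make_create_request(payload, token)) as resp:
#     body = json.load(resp)
```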

Knowledge base (doc_qa) Digital Humans need an additional step to be functional: uploading a knowledge document. This is described in Create a Document Based Digital Human.

Interact with a Digital Human

After you create a Digital Human with a single call to the head/create endpoint, it is hosted by UNITH and reachable at a public URL.

This URL can be found in the POST /head/create response:

Public URL

{
  ...
  "publicId": "headId",
  "publicUrl": "https://stream.unith.ai/[org-id]/[head-id]?api_key=[org-api-key]",
  ...
}
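
Reading these two fields from the response body might look like the following sketch. The function name `public_link_from_response` is hypothetical, and the code assumes the response is a JSON object with `publicId` and `publicUrl` at the top level, as the excerpt above suggests.

```python
import json


def public_link_from_response(body: str) -> tuple:
    """Return (publicId, publicUrl) from a head/create response body."""
    data = json.loads(body)
    # Assumption: both keys sit at the top level of the response object.
    missing = [key for key in ("publicId", "publicUrl") if key not in data]
    if missing:
        raise KeyError(f"response is missing: {', '.join(missing)}")
    return data["publicId"], data["publicUrl"]
```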

Last updated Mar 16, 2026