Semantic Cache for Digital Human Responses
This document explains how to leverage the UNITH Digital Human platform's caching mechanisms, including the advanced Semantic Cache, to deliver faster and more efficient responses.
Overview
The caching layer makes your Digital Human more responsive by storing and reusing previously generated responses. The more your Digital Human is used, the more queries it can answer straight from the cache, bypassing the full synthesis pipeline.
By default, your Digital Human retrieves cached responses only for exact-match queries.
Default Caching: Exact Match
By default, the Digital Human platform supports caching for exact user inquiries. This means:
- If two or more users ask exactly the same question (character for character), the system will only generate the response once.
- For all subsequent identical queries, the system will retrieve the already generated response from the cache.
- This process effectively bypasses the entire speech synthesis and video generation pipeline, resulting in an instantaneous response delivery.
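The exact-match behavior described above can be illustrated with a minimal sketch. This is not the platform's implementation: the `ExactMatchCache` class and the `generate` callback (standing in for the speech synthesis and video generation pipeline) are hypothetical names used only for illustration.

```python
from typing import Callable, Dict


class ExactMatchCache:
    """Toy exact-match cache: the key is the raw query string."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate          # stand-in for the full synthesis pipeline
        self._store: Dict[str, str] = {}
        self.pipeline_calls = 0            # how often the pipeline actually ran

    def respond(self, query: str) -> str:
        if query not in self._store:       # character-for-character comparison
            self.pipeline_calls += 1
            self._store[query] = self._generate(query)
        return self._store[query]


cache = ExactMatchCache(lambda q: f"response to: {q}")
cache.respond("What are your opening hours?")   # generated
cache.respond("What are your opening hours?")   # identical: served from cache
cache.respond("what are your opening hours?")   # differs by one character: generated again
print(cache.pipeline_calls)  # 2
```

Note that the third query differs only in capitalization yet still misses the cache, which is the limitation the Semantic Cache is meant to address.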
Enhancing Responsiveness with Semantic Cache
While exact match caching is effective for identical queries, real-world user interactions often involve variations in phrasing for the same underlying intent.
The Semantic Cache feature extends this capability by triggering the same response mechanism for different user queries that share the same semantic meaning, even if their phrasing is not identical.
How Semantic Cache Works
The Semantic Cache analyzes the meaning (semantics) of user queries. When a new query comes in, the system compares its meaning to that of previously cached queries. If the semantic distance to a cached query falls within the configurable threshold, the cached response for that query is delivered.
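As a rough sketch of this lookup, the snippet below treats the threshold as a maximum allowed distance, in line with the distance ranges listed later in this document. The embedding model and distance metric used by the platform are not stated here, so cosine distance and the hand-written three-dimensional vectors are assumptions, and `cosine_distance` and `lookup` are hypothetical helper names.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def lookup(
    query_vec: Sequence[float],
    cached: List[Tuple[Sequence[float], str]],
    threshold: float,
) -> Optional[str]:
    """Return the response of the nearest cached query if it is close enough."""
    best_response, best_distance = None, float("inf")
    for cached_vec, response in cached:
        distance = cosine_distance(query_vec, cached_vec)
        if distance < best_distance:
            best_response, best_distance = response, distance
    return best_response if best_distance <= threshold else None


# Toy embeddings (assumed): each cached query is a vector plus its stored response.
cached = [
    ((1.0, 0.0, 0.0), "We open at 9 am."),
    ((0.0, 1.0, 0.0), "Shipping takes 3 days."),
]

close_query = (0.9, 0.1, 0.0)   # nearly the same direction as the first entry
far_query = (0.5, 0.5, 0.7)     # not close to either entry

print(lookup(close_query, cached, threshold=0.05))  # We open at 9 am.
print(lookup(far_query, cached, threshold=0.05))    # None
```

When `lookup` returns `None`, the query would go through the normal generation pipeline and its response could then be cached for future lookups.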
Configuring the Semantic Cache Threshold
You can set a semantic cache threshold value to control the level of semantic similarity required to trigger a cached response. This threshold is a crucial parameter that balances response speed with contextual accuracy:
- Lower Threshold Value (Higher Precision):
  - Setting a lower threshold (e.g., closer to 0) means the system requires a very high degree of semantic similarity between queries.
  - This results in a higher probability that the cached response is relevant and adequate for the new user query.
  - However, it may lead to fewer cache hits, as queries need to be very close in meaning.
- Higher Threshold Value (Higher Cache Hits):
  - Setting a higher threshold (e.g., closer to 1) allows for a broader range of semantic similarity.
  - This will most likely result in a higher number of instantaneous responses, as more varied queries will trigger cache hits.
  - However, it carries an increased risk that the semantic similarity between two user queries is low, potentially leading to a cached response that is less relevant or even inadequate for the new query.
In short: a lower value means higher precision, and a higher value means more cache hits.
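To make the trade-off concrete, the sketch below counts how many of five incoming queries would be served from the cache at different thresholds. The distances are invented for illustration and do not come from the platform; treating a distance equal to the threshold as a hit is also an assumption.

```python
# Hypothetical distances from five incoming queries to their nearest cached query.
nearest_distances = [0.005, 0.03, 0.08, 0.25, 0.6]


def count_hits(distances, threshold):
    """Number of queries whose nearest cached query is within the threshold."""
    return sum(1 for d in distances if d <= threshold)


hits = {t: count_hits(nearest_distances, t) for t in (0.01, 0.1, 0.3)}
print(hits)  # {0.01: 1, 0.1: 3, 0.3: 4}
```

Raising the threshold from 0.01 to 0.3 turns one cache hit into four, but the extra hits come from progressively less similar queries.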

You can set the semantic cache threshold to any value from 0 to 1. To do so, send a PUT request to the /head/update endpoint together with your existing head ID:
curl -X 'PUT' \
  'https://platform-api.unith.ai/head/update' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer yourBearerToken' \
  -H 'Content-Type: application/json' \
  -d '{
    "id": "yourHeadId",
    "semanticThreshold": 0.3
  }'

Here's a breakdown of how different distance ranges typically correspond to semantic similarity:
- Distance 0.0-0.01: Nearly Identical
  - Sentences are essentially saying the same thing with minor word variations.
  - Example: "The cat is sleeping" vs "A cat is asleep"
- Distance 0.01-0.05: Very Similar
  - Same core meaning with some structural or vocabulary differences.
  - Example: "I love pizza" vs "Pizza is my favorite food"
- Distance 0.05-0.1: Moderately Similar
  - Related topics or themes but different specific focus.
  - Example: "The weather is sunny today" vs "It's a beautiful day outside"
- Distance 0.2-0.3: Somewhat Similar
  - Share some conceptual overlap but clearly different meanings.
  - Example: "I'm cooking dinner" vs "The restaurant serves great food"
- Distance 0.5-0.8: Weakly Related
  - Some semantic connection, perhaps sharing a broad category.
  - Example: "My dog is barking" vs "I heard music playing"
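Once you have picked a threshold from the ranges above, the same PUT request as the curl command earlier in this document can be built in Python using only the standard library. The URL, headers, and JSON fields are taken from that curl command; the `build_threshold_update` helper and its client-side range check are assumptions added for illustration, not part of the platform API.

```python
import json
import urllib.request

API_URL = "https://platform-api.unith.ai/head/update"


def build_threshold_update(head_id: str, token: str, threshold: float) -> urllib.request.Request:
    """Build (but do not send) the PUT request that sets semanticThreshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("semanticThreshold must be between 0 and 1")
    body = json.dumps({"id": head_id, "semanticThreshold": threshold}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="PUT",
        headers={
            "accept": "application/json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_threshold_update("yourHeadId", "yourBearerToken", 0.3)
# To actually send it, replace the placeholders and call:
# urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the payload and catch out-of-range values before they reach the API.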