Override Legacy Digital Human's Microphone + Language Detection Template Guide

This example template lets users speak with a Unith Digital Human using a microphone button. It is intended for developers who want to build a bespoke microphone experience. It includes:

Voice Activity Detection (VAD)
Azure Speech SDK
Automatic language detection
Transcript preview
Message delivery via postMessage

This template assumes the Digital Human is configured to accept external events as defined here.

This documentation is relevant for legacy digital humans. If your digital human uses streaming mode, please refer to our SDK documentation instead.

To check whether your digital human is in legacy or streaming mode, see this page.

If the digital human configuration states that videoStreaming=true then your digital human is in streaming mode.

Features

Feature	Description
Click-to-activate mic	Manual start/stop mic via button
Voice Activity Detection	Only triggers when real speech is detected
Language Detection	Auto-detects up to 4 supported languages
Live Transcript	Displays recognized speech as text
Sends Message to DH	Final transcript sent to Unith iframe

How It Works

1. Embed UNITH Iframe

html

<iframe
  id="my-iframe"
  src="https://chat.unith.ai/ORG-ID/HEAD-ID?api_key=YOUR_API_KEY&mode=video"
  allow="microphone">
</iframe>

Use mode=video if you would like to hide the UNITH chat widget and use only the video component.
Insert the appropriate api_key for your org.
The iframe must include allow="microphone".
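
The URL parts listed above can also be assembled in code. The following is a minimal sketch, not part of the template: buildUnithSrc is a hypothetical helper name, and the ORG-ID, HEAD-ID, and API key values are placeholders for your own.

```javascript
// Hypothetical helper (not part of the template): assembles the iframe URL.
// orgId, headId and apiKey are placeholders for your own UNITH values.
function buildUnithSrc(orgId, headId, apiKey, videoOnly = true) {
  const url = new URL(
    `${encodeURIComponent(orgId)}/${encodeURIComponent(headId)}`,
    "https://chat.unith.ai/"
  );
  url.searchParams.set("api_key", apiKey);
  // mode=video hides the chat widget and keeps only the video component.
  if (videoOnly) url.searchParams.set("mode", "video");
  return url.toString();
}
```

In the page, the result could be assigned to the iframe, for example document.getElementById("my-iframe").src = buildUnithSrc("ORG-ID", "HEAD-ID", "YOUR_API_KEY").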

For more information on video-only mode, see this page.

2. Azure Speech Key Configuration

Step 1: Get Your Credentials

Log in to Azure Portal
Create a Speech resource (Cognitive Services)
Copy your:
- Key
- Region

Step 2: Add to Template

Replace these lines in your template (below):

const speechKey = "YOUR_AZURE_SPEECH_KEY";
const serviceRegion = "YOUR_AZURE_REGION"; // e.g. "eastus"

Do not expose real keys in production environments.
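
One common way to keep the key out of the page is to have your own backend exchange it for a short-lived authorization token. The sketch below assumes such a backend exists: the /api/speech-token endpoint and its JSON shape are hypothetical, and createSpeechConfigFromToken is an illustrative name. SpeechConfig.fromAuthorizationToken is the Speech SDK's token-based alternative to fromSubscription.

```javascript
// Sketch: build a SpeechConfig from a short-lived token instead of the raw key.
// "/api/speech-token" is a hypothetical endpoint on your own backend that is
// assumed to return JSON such as { "token": "...", "region": "eastus" }.
async function createSpeechConfigFromToken(sdk, fetchFn, tokenUrl = "/api/speech-token") {
  const response = await fetchFn(tokenUrl);
  if (!response.ok) {
    throw new Error("Token request failed with status " + response.status);
  }
  const { token, region } = await response.json();
  // The token expires, so a long-lived page would need to request a new one.
  return sdk.SpeechConfig.fromAuthorizationToken(token, region);
}
```

In the template, this could replace the fromSubscription call, for example const speechConfig = await createSpeechConfigFromToken(SpeechSDK, (url) => fetch(url)), which also requires startAzureRecognizer to become an async function.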

3. Auto Language Detection

Configure your supported languages:

const autoDetectSourceLanguageConfig = SpeechSDK.AutoDetectSourceLanguageConfig.fromLanguages([
  "en-US", "fr-FR", "es-ES"
]);

Limit: Azure allows a maximum of 4 candidate languages for auto-detection.
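
To catch a list that is too long before it reaches the SDK, the array can be checked first. This is a minimal sketch: normalizeLanguages and MAX_AUTO_DETECT_LANGUAGES are hypothetical names, and the limit of 4 is taken from this guide.

```javascript
// Limit stated in this guide for auto-detect candidate languages.
const MAX_AUTO_DETECT_LANGUAGES = 4;

// Hypothetical guard: cleans and validates the list before it is passed to
// AutoDetectSourceLanguageConfig.fromLanguages.
function normalizeLanguages(locales) {
  // Trim entries, drop empty ones, and remove duplicates while keeping order.
  const unique = [...new Set(locales.map((l) => l.trim()).filter(Boolean))];
  if (unique.length === 0) {
    throw new Error("At least one language is required.");
  }
  if (unique.length > MAX_AUTO_DETECT_LANGUAGES) {
    throw new Error(
      `Auto-detect accepts at most ${MAX_AUTO_DETECT_LANGUAGES} languages, got ${unique.length}.`
    );
  }
  return unique;
}
```

The result can then be passed on unchanged, for example SpeechSDK.AutoDetectSourceLanguageConfig.fromLanguages(normalizeLanguages(["en-US", "fr-FR", "es-ES"])).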

4. Transcript Display

(Optional) The transcript element is updated each time a phrase is recognized:

transcriptEl.innerText = "Transcript: " + transcriptBuffer;
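
The transcript line can also show which language was detected. The sketch below is an illustration rather than part of the template: formatTranscript is a hypothetical helper, and the wiring in the trailing comment assumes it runs inside the template's recognized handler, where the Speech SDK's AutoDetectSourceLanguageResult.fromResult can read the detected language from e.result.

```javascript
// Hypothetical formatter: builds the text shown in the transcript element.
function formatTranscript(text, language) {
  const body = text && text.trim() ? text.trim() : "...";
  return language ? `Transcript (${language}): ${body}` : `Transcript: ${body}`;
}

// Assumed wiring inside the `recognized` handler of the template:
//   const detected = SpeechSDK.AutoDetectSourceLanguageResult.fromResult(e.result);
//   transcriptEl.innerText = formatTranscript(e.result.text, detected.language);
```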

5. Message Delivery

After recognition ends, the final transcript is posted to the iframe:

iframe.contentWindow.postMessage({
  event: "DH_MESSAGE",
  payload: { message: finalMessage }
}, "https://chat.unith.ai");
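
The call above can be wrapped so that empty transcripts are never sent. This is a minimal sketch: sendToDigitalHuman and UNITH_ORIGIN are hypothetical names, while the event name, payload shape, and target origin are the ones shown in this guide.

```javascript
const UNITH_ORIGIN = "https://chat.unith.ai";

// Hypothetical wrapper around the postMessage call shown in this guide.
// Returns true if a message was posted, false if the transcript was empty.
function sendToDigitalHuman(iframe, transcript) {
  const message = (transcript || "").trim();
  if (!message) return false;
  iframe.contentWindow.postMessage(
    { event: "DH_MESSAGE", payload: { message } },
    UNITH_ORIGIN // deliver only if the iframe is still on the UNITH origin
  );
  return true;
}
```

Passing an explicit target origin instead of "*" means the browser drops the message if the iframe has navigated elsewhere.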

Configure the Digital Human to accept external events as defined here. This can also be done directly via the advanced modification window in interFace.

Silence Handling

After 2 seconds of silence, the recognizer will stop:

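function resetSilenceTimer() {
  // Cancel the previous timer so only the most recent one can fire.
  if (silenceTimer) clearTimeout(silenceTimer);
  silenceTimer = setTimeout(() => {
    status.innerText = "Status: Silence detected. Stopping recognition...";
    recognizer.stopContinuousRecognitionAsync();
  }, 2000);
}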

Modify this if you want always-on behavior.
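
One way to make the timeout configurable, including an always-on mode, is sketched below. createSilenceTimer is a hypothetical helper, not part of the template, and treating a timeout of 0 as "never stop" is an assumption of this sketch.

```javascript
// Default timer functions, wrapped so they are never called with a foreign `this`.
const defaultTimers = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (id) => clearTimeout(id),
};

// Hypothetical factory: timeoutMs <= 0 disables the timeout (always-on behavior).
function createSilenceTimer(onSilence, timeoutMs = 2000, timers = defaultTimers) {
  let handle = null;
  return {
    // Call on every recognizing/recognized event to push the deadline back.
    reset() {
      if (handle !== null) timers.clear(handle);
      handle = timeoutMs > 0 ? timers.set(onSilence, timeoutMs) : null;
    },
    // Call when recognition is stopped manually.
    cancel() {
      if (handle !== null) timers.clear(handle);
      handle = null;
    },
  };
}
```

In the template this could be used as const silence = createSilenceTimer(() => recognizer.stopContinuousRecognitionAsync(), 2000), with silence.reset() replacing the calls to resetSilenceTimer().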

Customization Options

Task	How
Change languages	Edit the `fromLanguages` array
Change UNITH Digital Human	Modify iframe `src`
Customize UI	Change button or layout
Disable silence timeout	Remove `resetSilenceTimer()` logic

Setup Checklist

Task	Complete?
Embed iframe with correct URL	⬜
Add Azure `speechKey` and `serviceRegion`	⬜
Configure languages	⬜
Replace placeholder `vad.js`	⬜
Test in browser	⬜

Template & Example Files

example template.html
vad.js

example template

<!DOCTYPE html>
<html>
<head>
  <title>Unith Mic + Language Detection</title>
  <script src="https://aka.ms/csspeech/jsbrowserpackageraw"></script>
  <script src="vad.js"></script>
</head>
<body>
  <iframe
    id="my-iframe"
    src="https://chat.unith.ai/example/example?api_key=example"
    allow="microphone"
    width="100%"
    height="400"
    style="border:none; border-radius:10px;">
  </iframe>

  <button id="recordBtn" disabled>Activate Mic</button>
  <p id="status">Status: Waiting...</p>
  <p id="transcript">Transcript: ...</p>

  <script>
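    // Placeholders: replace with your own Azure Speech key and region.
    const speechKey = "YOUR_AZURE_SPEECH_KEY";
    const serviceRegion = "YOUR_AZURE_REGION"; // e.g. "eastus"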

    const recordBtn = document.getElementById('recordBtn');
    const status = document.getElementById('status');
    const transcriptEl = document.getElementById('transcript');
    const iframe = document.getElementById("my-iframe");

    let recognizer;
    let silenceTimer;
    let digitalHumanReady = false;
    let transcriptBuffer = "";
    let isRecording = false;

    window.addEventListener("message", function (event) {
      if (event.origin !== "https://chat.unith.ai") return;
      const payload = event.data.payload;
      if (event.data.event === "DH_READY" && payload?.isReady) {
        digitalHumanReady = true;
        status.innerText = "Status: Digital human is ready.";
        recordBtn.disabled = false;
      }
    });

    recordBtn.onclick = async () => {
      if (!digitalHumanReady) {
        alert("Digital human is not ready yet.");
        return;
      }
      if (!isRecording) {
        isRecording = true;
        recordBtn.classList.add("recording");
        recordBtn.title = "Click to stop";
        status.innerText = "Status: Waiting for speech...";
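        let stream;
        try {
          stream = await navigator.mediaDevices.getUserMedia({ audio: true });
        } catch (err) {
          // Permission denied or no input device: return to the idle state.
          isRecording = false;
          recordBtn.classList.remove("recording");
          recordBtn.title = "Start Recording";
          status.innerText = "Status: Microphone unavailable - " + err.message;
          return;
        }
        VAD(stream, {
          onVoiceStart: () => {
            status.innerText = "Status: Detected voice. Starting recognition...";
            startAzureRecognizer();
          },
          interval: 100,
          play: false
        });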
      } else {
        isRecording = false;
        recordBtn.classList.remove("recording");
        recordBtn.title = "Start Recording";
        status.innerText = "Status: Recording stopped.";
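        // Cancel the pending silence timeout so it does not fire after a manual stop.
        if (silenceTimer) clearTimeout(silenceTimer);
        if (recognizer) {
          recognizer.stopContinuousRecognitionAsync(() => {
            console.log("Manually stopped recognition.");
          });
        }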
        transcriptBuffer = "";
        transcriptEl.innerText = "Transcript: ...";
      }
    };

    function startAzureRecognizer() {
      const audioConfig = SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();
      const autoDetectSourceLanguageConfig = SpeechSDK.AutoDetectSourceLanguageConfig.fromLanguages([
        "en-US", "fr-FR", "es-ES"
      ]);
      const speechConfig = SpeechSDK.SpeechConfig.fromSubscription(speechKey, serviceRegion);
      recognizer = SpeechSDK.SpeechRecognizer.FromConfig(speechConfig, autoDetectSourceLanguageConfig, audioConfig);

      recognizer.recognizing = (_, e) => {
        status.innerText = "Status: Listening...";
        resetSilenceTimer();
      };

      recognizer.recognized = (_, e) => {
        if (e.result.reason === SpeechSDK.ResultReason.RecognizedSpeech) {
          const part = e.result.text.trim();
          // Append each new phrase so earlier phrases are kept; skip repeats.
          if (part && !transcriptBuffer.includes(part)) {
            transcriptBuffer = transcriptBuffer ? transcriptBuffer + " " + part : part;
            transcriptEl.innerText = "Transcript: " + transcriptBuffer;
          }
        }
        resetSilenceTimer();
      };

      recognizer.canceled = (_, e) => {
        status.innerText = "Status: Canceled - " + e.errorDetails;
        recognizer.stopContinuousRecognitionAsync();
        recordBtn.disabled = false;
        isRecording = false;
        recordBtn.classList.remove("recording");
        recordBtn.title = "Start Recording";
        transcriptBuffer = "";
      };

      recognizer.sessionStopped = () => {
        status.innerText = "Status: Done";
        recordBtn.disabled = false;
        const finalMessage = transcriptBuffer.trim();
        if (digitalHumanReady && finalMessage) {
          console.log("📤 Sending FINAL buffered transcript to DH:", finalMessage);
          iframe.contentWindow.postMessage({
            event: "DH_MESSAGE",
            payload: { message: finalMessage }
          }, "https://chat.unith.ai");
        }
        transcriptBuffer = "";
        transcriptEl.innerText = "Transcript: ...";

        // Reactivate VAD to continue listening
        if (isRecording) {
          navigator.mediaDevices.getUserMedia({ audio: true }).then(stream => {
            VAD(stream, {
              onVoiceStart: () => {
                status.innerText = "Status: Detected voice. Starting recognition...";
                startAzureRecognizer();
              },
              interval: 100,
              play: false
            });
          });
        }
      };

      recognizer.startContinuousRecognitionAsync();
      resetSilenceTimer();
    }
  
    function resetSilenceTimer() {
      if (silenceTimer) clearTimeout(silenceTimer);
      silenceTimer = setTimeout(() => {
        status.innerText = "Status: Silence detected. Stopping recognition...";
        recognizer.stopContinuousRecognitionAsync();
      }, 2000);
    }
  </script>
</body>
</html>

vad.js


// Minimal VAD mock for offline testing (replace with actual implementation if needed)
function VAD(stream, options) {
  console.warn("VAD mock triggered. Replace with real vad.js.");
  if (options && typeof options.onVoiceStart === 'function') {
    setTimeout(() => {
      options.onVoiceStart(); // simulate voice start after delay
    }, 1000);
  }
}
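
As a starting point for replacing the mock, the sketch below shows a simple energy-based detector with the same VAD(stream, options) signature. It is an illustration under stated assumptions, not the implementation UNITH ships: rmsLevel and createVoiceDetector are hypothetical helpers, the threshold and minFrames defaults are untuned guesses, and the detector fires only once per call, which matches how the template restarts VAD after each session. The browser part uses the standard Web Audio AnalyserNode.

```javascript
// Root-mean-square level of one frame of samples in the range [-1, 1].
function rmsLevel(samples) {
  let sum = 0;
  for (let i = 0; i < samples.length; i++) sum += samples[i] * samples[i];
  return samples.length ? Math.sqrt(sum / samples.length) : 0;
}

// Reports voice start after `minFrames` consecutive frames above `threshold`.
// Both defaults are illustrative guesses and need tuning per microphone.
function createVoiceDetector({ threshold = 0.02, minFrames = 3 } = {}) {
  let loudFrames = 0;
  let fired = false;
  return function isVoiceStart(samples) {
    if (fired) return false; // one-shot: report the start only once
    loudFrames = rmsLevel(samples) > threshold ? loudFrames + 1 : 0;
    if (loudFrames >= minFrames) {
      fired = true;
      return true;
    }
    return false;
  };
}

// Browser wiring with the same signature as the mock. Not called here.
function VAD(stream, options = {}) {
  const audioContext = new AudioContext();
  const analyser = audioContext.createAnalyser();
  audioContext.createMediaStreamSource(stream).connect(analyser);
  const frame = new Float32Array(analyser.fftSize);
  const isVoiceStart = createVoiceDetector(options);
  const poll = setInterval(() => {
    analyser.getFloatTimeDomainData(frame); // copy the latest waveform samples
    if (isVoiceStart(frame)) {
      clearInterval(poll);
      audioContext.close();
      if (typeof options.onVoiceStart === "function") options.onVoiceStart();
    }
  }, options.interval || 100);
}
```

Requiring several consecutive loud frames is a cheap way to ignore clicks and other short noises; a production detector would likely need something more robust than a fixed energy threshold.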

Last updated Feb 18, 2026