Harden Deepgram parseData: drop transcript.split alignment, auto-detect diarization, clean ms conversion #304

@maboa


Background

parseData in js/hyperaudio-lite-editor-deepgram.js builds the editor's transcript HTML from a Deepgram response. It currently has a few fragile spots that bit a downstream pipeline using the same logic; the same issues apply in HLE.

What's fragile today

  1. transcript.split(' ') alignment with the words array. Word text comes from punctuatedWords[index], where punctuatedWords = alt.transcript.split(' '). That holds for clean transcripts but smart_format=true can break the 1:1 alignment on formatted numbers, hyphenations, etc. — words then silently shift.
  2. No detection of missing diarization. If a request goes out without diarize=true, every word's speaker is undefined and the output gets [speaker-undefined] markup throughout.
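  3. .toFixed(2) * 1000 for ms conversion. .toFixed(2) returns a string, so the expression relies on implicit string-to-number coercion, rounds every timing to 10 ms before scaling, and can still leave floating-point noise in the product. Math.round(s * 1000) is simpler and always yields an integer number of milliseconds.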
  4. Sentence-end check reads the split-transcript array (punctuatedWords[index - 1]), inheriting the same alignment fragility.
  5. No early guard for empty responses. A silent/truncated response fails cryptically deep in the loop instead of with a clear message.
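To make items 1 and 4 concrete, here is a sketch of how index-based lookup drifts once the transcript string and the words array disagree on token count. The response values are invented for illustration (the exact tokens smart_format produces are an assumption, not real Deepgram output); only the field names word and punctuated_word match what parseData already reads.

```javascript
// Hypothetical words array: two spoken words ("nine", "thirty") collapse
// into a single formatted token ("9:30") in the transcript string.
const words = [
  { word: "call", punctuated_word: "Call" },
  { word: "me", punctuated_word: "me" },
  { word: "at", punctuated_word: "at" },
  { word: "nine", punctuated_word: "9" },
  { word: "thirty", punctuated_word: "30" },
  { word: "tomorrow", punctuated_word: "tomorrow." },
];
const transcript = "Call me at 9:30 tomorrow.";

// Current approach: index into the split transcript by word position.
const punctuatedWords = transcript.split(" ");
const viaSplit = words.map((_, i) => punctuatedWords[i]);

// Proposed approach: read each word's own text.
const viaWords = words.map((w) => w.punctuated_word || w.word || "");

// viaSplit: "nine" gets "9:30", "thirty" gets "tomorrow.", the last word gets undefined.
console.log(viaSplit);
// viaWords: one entry per timed word, in step with the timings.
console.log(viaWords);
```

In this made-up case the per-word text reads "9 30" rather than "9:30", but every word keeps its own text and timing, and the sentence-end check on the previous word can no longer read a neighbour's token.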

Proposed changes

  • Read each word's text from element.punctuated_word || element.word directly. Drop the transcript.split(' ') array. Word data is canonical.
  • Add const showDiarization = wordData.some(w => w.speaker !== undefined); and gate speaker markup on that. No diarization → clean transcript with no speaker labels, instead of [speaker-undefined].
  • Add a small ms = s => Math.round(s * 1000) helper and use it everywhere ms appears in attributes. Replaces every .toFixed(2) * 1000.
  • Sentence-end check reads wordText(wordData[index - 1]) (using the same helper) so it can never go out of sync.
  • Early throw if alt.words is missing or empty, with a clear message identifying the failure.
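The helpers proposed above are small enough to check in isolation. This sketch copies ms and wordText from the reference implementation and wraps the showDiarization test in a function, detectDiarization, which is a hypothetical name used only here; the 1.234 s timing is a made-up value.

```javascript
// Helpers as proposed: seconds -> integer ms, and canonical word text.
const ms = (s) => Math.round(s * 1000);
const wordText = (w) => (w.punctuated_word || w.word || "");
// Hypothetical wrapper around the proposed showDiarization expression.
const detectDiarization = (words) => words.some((w) => w.speaker !== undefined);

const start = 1.234; // seconds (made-up value)
// Old conversion: toFixed(2) gives the string "1.23", which `* 1000` coerces back.
const oldMs = start.toFixed(2) * 1000; // 1230, 10 ms resolution
// New conversion keeps millisecond resolution.
const newMs = ms(start); // 1234

// Word text falls back to `word` when `punctuated_word` is absent.
const plain = wordText({ word: "hello" });

// An undiarized response simply has no `speaker` fields on its words.
const undiarized = [{ word: "hi", start: 0, end: 0.2 }];
const diarized = [{ word: "hi", start: 0, end: 0.2, speaker: 0 }];
console.log(oldMs, newMs, plain);
console.log(detectDiarization(undiarized), detectDiarization(diarized));
```

Note that the two conversions disagree by 4 ms here, so switching helpers changes data-m and data-d values slightly wherever Deepgram timings carry more than two decimal places.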

Reference implementation

A hardened version of the loop body (slightly adapted from an n8n node that does the same work):

```javascript
// Take the first alternative of the first channel; fail loudly if it has no words.
const alt = dg?.results?.channels?.[0]?.alternatives?.[0];
if (!alt || !Array.isArray(alt.words) || alt.words.length === 0) {
  throw new Error(`No transcribed words in Deepgram response`);
}
const wordData = alt.words;

// Paragraph-break and speaker-reassignment thresholds.
const maxWordsInPara = 100;
const significantGapInSeconds = 4.0;
const speakerReassignGap = 0.3;

// Seconds -> integer ms, canonical per-word text, and diarization presence.
const ms = (s) => Math.round(s * 1000);
const wordText = (w) => (w.punctuated_word || w.word || '');
const showDiarization = wordData.some((w) => w.speaker !== undefined);

// Diarization edge-case fix (unchanged from current code)
for (let i = 1; i < wordData.length - 1; i++) {
  const prev = wordData[i - 1];
  const cur = wordData[i];
  const next = wordData[i + 1];
  if (cur.speaker !== prev.speaker && next.speaker === cur.speaker) {
    const gapBefore = cur.start - prev.end;
    const gapAfter = next.start - cur.end;
    if (gapBefore < speakerReassignGap && gapAfter > speakerReassignGap) {
      cur.speaker = prev.speaker;
    }
  }
}

let hyperTranscript = "<article>\n <section>\n  <p>\n   ";
let previousElementEnd = 0;
let wordsInPara = 0;

wordData.forEach((element, index) => {
  const currentWord = wordText(element);
  wordsInPara++;

  // After a long pause or an over-long paragraph, break only at a sentence end.
  if ((previousElementEnd !== 0 && (element.start - previousElementEnd) > significantGapInSeconds) || wordsInPara > maxWordsInPara) {
    const previousWord = wordText(wordData[index - 1]);
    const lastChar = previousWord.charAt(previousWord.length - 1);
    if (lastChar === '.' || lastChar === '?' || lastChar === '!') {
      hyperTranscript += "\n  </p>\n  <p>\n   ";
      wordsInPara = 0;
    }
  }

  // With diarization: new paragraph plus a speaker label on every speaker change.
  if (showDiarization && index > 0 && element.speaker !== wordData[index - 1].speaker) {
    hyperTranscript += "\n  </p>\n  <p>\n   ";
    wordsInPara = 0;
  }
  if (showDiarization && (index === 0 || element.speaker !== wordData[index - 1].speaker)) {
    hyperTranscript += `<span class="speaker" data-m='${ms(element.start)}' data-d='0'>[speaker-${element.speaker}] </span>`;
  }

  // One span per word: data-m is the start and data-d the duration, both in ms.
  hyperTranscript += `<span data-m='${ms(element.start)}' data-d='${ms(element.end - element.start)}'>${currentWord} </span>`;
  previousElementEnd = element.end;
});

// Close the markup, then drop empty paragraphs left by back-to-back breaks.
hyperTranscript += "\n </p> \n </section>\n</article>\n ";
hyperTranscript = hyperTranscript.replace(/<p>\s*<\/p>\s*/g, '');
```

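This is structurally identical to today's parseData body (same paragraph-break rules, same diarization edge-case fix, same empty-<p> cleanup), so paragraph structure and word spans are unchanged on the cases that already work. The one expected difference is in the timing attributes: Math.round(s * 1000) keeps millisecond resolution, whereas .toFixed(2) * 1000 rounded to the nearest 10 ms, so data-m and data-d values can move by a few ms. Beyond that, it no longer relies on transcript.split(' '), no longer emits [speaker-undefined] when diarization is missing, and fails loudly on empty responses.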
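The diarization edge-case fix is easiest to follow on concrete timings. This sketch lifts that loop from the reference implementation into a function, reassignSpeakerBlips (a hypothetical name, not an existing HLE function), and runs it on two made-up three-word sequences.

```javascript
// Same rule as the "Diarization edge-case fix" loop above, wrapped for reuse.
function reassignSpeakerBlips(wordData, speakerReassignGap = 0.3) {
  for (let i = 1; i < wordData.length - 1; i++) {
    const prev = wordData[i - 1];
    const cur = wordData[i];
    const next = wordData[i + 1];
    // cur opens a new speaker turn that continues into next...
    if (cur.speaker !== prev.speaker && next.speaker === cur.speaker) {
      const gapBefore = cur.start - prev.end;
      const gapAfter = next.start - cur.end;
      // ...but hugs the previous word and is separated from the next one,
      // so it is treated as the tail of the previous speaker's turn.
      if (gapBefore < speakerReassignGap && gapAfter > speakerReassignGap) {
        cur.speaker = prev.speaker;
      }
    }
  }
  return wordData;
}

// "so" follows "okay" almost immediately, then a pause: reassigned to speaker 0.
const blip = reassignSpeakerBlips([
  { word: "okay", start: 0.0, end: 0.3, speaker: 0 },
  { word: "so", start: 0.35, end: 0.5, speaker: 1 },
  { word: "welcome", start: 1.2, end: 1.6, speaker: 1 },
]);
console.log(blip.map((w) => w.speaker));

// "so" starts well after "okay": a genuine turn change, left alone.
const realTurn = reassignSpeakerBlips([
  { word: "okay", start: 0.0, end: 0.3, speaker: 0 },
  { word: "so", start: 1.0, end: 1.2, speaker: 1 },
  { word: "welcome", start: 1.8, end: 2.2, speaker: 1 },
]);
console.log(realTurn.map((w) => w.speaker));
```

Because the rule compares speaker values, it never fires on an undiarized response where every speaker is undefined, which is why it can stay ungated by showDiarization.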

Acceptance criteria

  • parseData in js/hyperaudio-lite-editor-deepgram.js no longer reads alt.transcript for per-word text — uses punctuated_word/word directly.
  • punctuatedWords array is removed from parseData.
  • Speaker markup only emitted when at least one word in the response has a speaker field.
  • All (...).toFixed(2) * 1000 ms conversions in parseData are replaced with Math.round(seconds * 1000).
  • Empty alt.words throws a clear error before the loop runs.
  • Existing happy-path behaviour is unchanged (paragraph breaks, diarization edge-case fix, empty <p> cleanup all still work).
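The "code removed" criteria (no transcript.split(' '), no punctuatedWords, no .toFixed(2) * 1000) can be spot-checked with a crude regex scan of the source. This is a sketch only: findLeftovers is a hypothetical helper, and because it scans raw text it cannot tell whether a match sits inside parseData or elsewhere in the file. The behavioural criteria still need running against real responses.

```javascript
// Patterns that should no longer appear once the issue is resolved.
const forbiddenPatterns = [
  { name: "transcript.split(' ') alignment", re: /transcript\.split\(\s*['"] ['"]\s*\)/ },
  { name: "punctuatedWords array", re: /\bpunctuatedWords\b/ },
  { name: ".toFixed(2) * 1000 conversion", re: /\.toFixed\(\s*2\s*\)\s*\*\s*1000/ },
];

// Returns the names of any forbidden patterns found in the given source text.
function findLeftovers(source) {
  return forbiddenPatterns
    .filter(({ re }) => re.test(source))
    .map(({ name }) => name);
}

console.log(findLeftovers("const punctuatedWords = alt.transcript.split(' ');"));
console.log(findLeftovers("data-m='' + ms(element.start)"));
```

In Node it could be pointed at the real file with findLeftovers(require("fs").readFileSync("js/hyperaudio-lite-editor-deepgram.js", "utf8")); any hit is a prompt for review rather than proof of failure.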