Harden Deepgram parseData: drop transcript.split alignment, auto-detect diarization, clean ms conversion

### Background

`parseData` in `js/hyperaudio-lite-editor-deepgram.js` builds the editor's transcript HTML from a Deepgram response. It currently has a few fragile spots that bit a downstream pipeline using the same logic; the same issues apply in Hyperaudio Lite Editor (HLE).

### What's fragile today

1. **`transcript.split(' ')` alignment with the words array.** Word text comes from `punctuatedWords[index]`, where `punctuatedWords = alt.transcript.split(' ')`. That holds for clean transcripts, but `smart_format=true` can break the 1:1 alignment (formatted numbers, hyphenations, etc.), after which the following words silently shift against their timing data.
2. **No detection of missing diarization.** If a request goes out without `diarize=true`, every word's `speaker` is `undefined` and the output gets `[speaker-undefined]` markup throughout.
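3. **`.toFixed(2) * 1000` for ms conversion.** `.toFixed(2)` returns a string, which `*` then coerces back to a number, and it rounds the timestamp to 10 ms before the conversion. `Math.round(s * 1000)` is clearer and keeps millisecond precision.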
4. **Sentence-end check reads the split-transcript array** (`punctuatedWords[index - 1]`), inheriting the same alignment fragility.
5. **No early guard for empty responses.** A silent/truncated response fails cryptically deep in the loop instead of with a clear message.
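
To make items 1 and 2 concrete, here is a minimal sketch of the failure modes. The payload is invented for illustration (it is not captured from Deepgram); it assumes a formatted number that contains a space in the transcript, and a request sent without `diarize=true`.

```js
// Hypothetical response fragment, constructed only to show the drift.
const alt = {
  transcript: 'It costs 10 000 dollars.',
  words: [
    { word: 'it', punctuated_word: 'It' },
    { word: 'costs', punctuated_word: 'costs' },
    { word: '10000', punctuated_word: '10 000' },
    { word: 'dollars', punctuated_word: 'dollars.' },
  ],
};

// Current approach: index into the split transcript.
const punctuatedWords = alt.transcript.split(' ');
const bySplit = alt.words.map((w, i) => punctuatedWords[i]);

// Proposed approach: read the text from the word object itself.
const byWord = alt.words.map((w) => w.punctuated_word || w.word);

console.log(bySplit); // [ 'It', 'costs', '10', '000' ]
console.log(byWord);  // [ 'It', 'costs', '10 000', 'dollars.' ]

// No diarization: `speaker` is absent, so the label interpolates "undefined".
const label = `[speaker-${alt.words[0].speaker}]`;
console.log(label); // [speaker-undefined]
```

In the split-based result the fourth word's timing ends up attached to `000` and `dollars.` never appears, with no error raised.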

### Proposed changes

- Read each word's text from `element.punctuated_word || element.word` directly. Drop the `transcript.split(' ')` array. Word data is canonical.
- Add `const showDiarization = wordData.some(w => w.speaker !== undefined);` and gate speaker markup on that. No diarization → clean transcript with no speaker labels, instead of `[speaker-undefined]`.
- Add a small `ms = s => Math.round(s * 1000)` helper and use it everywhere ms appears in attributes. Replaces every `.toFixed(2) * 1000`.
- Sentence-end check reads `wordText(wordData[index - 1])` (using the same helper) so it can never go out of sync.
- Early throw if `alt.words` is missing or empty, with a clear message identifying the failure.
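
As a quick comparison of the two ms conversions, the sketch below runs both on one sample timestamp. The value `1.234` is an arbitrary example, not taken from a real response.

```js
const seconds = 1.234; // arbitrary sample timestamp

// Current conversion: toFixed(2) yields the string "1.23", then `*` coerces it.
const viaToFixed = seconds.toFixed(2) * 1000;

// Proposed helper: stays numeric throughout and keeps millisecond precision.
const ms = (s) => Math.round(s * 1000);

console.log(typeof seconds.toFixed(2)); // string
console.log(viaToFixed);                // 1230
console.log(ms(seconds));               // 1234
```

The 4 ms difference comes from rounding to two decimal places before multiplying.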

### Reference implementation

A hardened version of the `parseData` body, from the empty-response guard through the final cleanup (slightly adapted from an n8n node that does the same work):

```js
const alt = dg?.results?.channels?.[0]?.alternatives?.[0];
if (!alt || !Array.isArray(alt.words) || alt.words.length === 0) {
  throw new Error('No transcribed words in Deepgram response');
}
const wordData = alt.words;

const maxWordsInPara = 100;
const significantGapInSeconds = 4.0;
const speakerReassignGap = 0.3;

const ms = (s) => Math.round(s * 1000);
const wordText = (w) => (w.punctuated_word || w.word || '');
const showDiarization = wordData.some((w) => w.speaker !== undefined);

// Diarization edge-case fix (unchanged from current code)
for (let i = 1; i < wordData.length - 1; i++) {
  const prev = wordData[i - 1];
  const cur = wordData[i];
  const next = wordData[i + 1];
  if (cur.speaker !== prev.speaker && next.speaker === cur.speaker) {
    const gapBefore = cur.start - prev.end;
    const gapAfter = next.start - cur.end;
    if (gapBefore < speakerReassignGap && gapAfter > speakerReassignGap) {
      cur.speaker = prev.speaker;
    }
  }
}

let hyperTranscript = "<article>\n <section>\n  <p>\n   ";
let previousElementEnd = 0;
let wordsInPara = 0;

wordData.forEach((element, index) => {
  const currentWord = wordText(element);
  wordsInPara++;

  if ((previousElementEnd !== 0 && (element.start - previousElementEnd) > significantGapInSeconds) || wordsInPara > maxWordsInPara) {
    const previousWord = wordText(wordData[index - 1]);
    const lastChar = previousWord.charAt(previousWord.length - 1);
    if (lastChar === '.' || lastChar === '?' || lastChar === '!') {
      hyperTranscript += "\n  </p>\n  <p>\n   ";
      wordsInPara = 0;
    }
  }

  if (showDiarization && index > 0 && element.speaker !== wordData[index - 1].speaker) {
    hyperTranscript += "\n  </p>\n  <p>\n   ";
    wordsInPara = 0;
  }
  if (showDiarization && (index === 0 || element.speaker !== wordData[index - 1].speaker)) {
    hyperTranscript += `<span class="speaker" data-m='${ms(element.start)}' data-d='0'>[speaker-${element.speaker}] </span>`;
  }

  hyperTranscript += `<span data-m='${ms(element.start)}' data-d='${ms(element.end - element.start)}'>${currentWord} </span>`;
  previousElementEnd = element.end;
});

hyperTranscript += "\n </p> \n </section>\n</article>\n ";
hyperTranscript = hyperTranscript.replace(/<p>\s*<\/p>\s*/g, '');
```

This is structurally identical to today's `parseData` body (same paragraph-break rules, same diarization edge-case fix, same empty-`<p>` cleanup), so the markup structure is unchanged on the cases that already work. The `data-m` and `data-d` values can differ by a few milliseconds, because they are no longer rounded to 10 ms first. Beyond that, it no longer relies on `transcript.split(' ')`, no longer emits `[speaker-undefined]` when diarization is missing, uses a clean `ms()` helper, and fails loudly on empty responses.
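
If it helps with testing, the guard and the diarization check could be factored into small functions and exercised on their own. The sketch below is one possible shape; `extractWords`, `hasDiarization`, and `wrap` are hypothetical names that do not exist in the current file.

```js
// Hypothetical helper: returns the words array or throws a clear error.
function extractWords(dg) {
  const alt = dg?.results?.channels?.[0]?.alternatives?.[0];
  if (!alt || !Array.isArray(alt.words) || alt.words.length === 0) {
    throw new Error('No transcribed words in Deepgram response');
  }
  return alt.words;
}

// Hypothetical helper: true when at least one word carries a speaker field.
function hasDiarization(words) {
  return words.some((w) => w.speaker !== undefined);
}

// Builds a minimal response-shaped object around a words array (test fixture only).
const wrap = (words) => ({ results: { channels: [{ alternatives: [{ words }] }] } });

const plain = extractWords(wrap([{ word: 'hi', start: 0, end: 0.2 }]));
const diarized = extractWords(wrap([{ word: 'hi', start: 0, end: 0.2, speaker: 0 }]));
console.log(hasDiarization(plain));    // false
console.log(hasDiarization(diarized)); // true

// An empty response fails before any loop would run.
let message = null;
try {
  extractWords({ results: { channels: [] } });
} catch (e) {
  message = e.message;
}
console.log(message); // No transcribed words in Deepgram response
```

Note that speaker `0` still counts as diarized, since the check compares against `undefined` rather than testing truthiness.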

### Acceptance criteria

- [ ] `parseData` in `js/hyperaudio-lite-editor-deepgram.js` no longer reads `alt.transcript` for per-word text — uses `punctuated_word`/`word` directly.
- [ ] `punctuatedWords` array is removed from `parseData`.
- [ ] Speaker markup only emitted when at least one word in the response has a `speaker` field.
- [ ] All `(...).toFixed(2) * 1000` ms conversions in `parseData` are replaced with `Math.round(seconds * 1000)`.
- [ ] Empty `alt.words` throws a clear error before the loop runs.
- [ ] Existing happy-path behaviour is unchanged (paragraph breaks, diarization edge-case fix, empty `<p>` cleanup all still work).
