Speaker-naming simulator + same-voice consolidation to cut speakers-to-name by r3dbars · Pull Request #1114 · r3dbars/transcripted

r3dbars · 2026-06-13T21:02:20Z

What & why

User feedback on speaker naming reports that the post-meeting review sheet can ask the user to name 4-7 "people" after a 1-on-1 call. The likely cause is offline VBx over-segmenting one remote voice into several large clusters that each survive small-cluster absorption.
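To make that failure mode concrete, here is a minimal Python sketch of small-cluster absorption. It assumes each cluster is described by a total duration and a centroid embedding, and that anything under the 30s floor (the floor named in the first commit message) is folded into its most similar surviving cluster. `absorb_small_clusters` and the cluster shape are hypothetical, not the EmbeddingClusterer API.

```python
import math

# The PR names a 30s small-cluster floor; treating "< 30s" as "absorbed" is an assumption.
SMALL_CLUSTER_FLOOR_SECONDS = 30.0


def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def absorb_small_clusters(clusters, floor=SMALL_CLUSTER_FLOOR_SECONDS):
    """Fold clusters shorter than `floor` into their most similar surviving cluster.

    `clusters` maps a label to (duration_seconds, centroid). Returns a dict of
    surviving label -> total duration. Clusters at or above the floor always
    survive, regardless of how similar they are to each other.
    """
    survivors = {k: v for k, v in clusters.items() if v[0] >= floor}
    if not survivors:
        return {k: v[0] for k, v in clusters.items()}
    totals = {k: v[0] for k, v in survivors.items()}
    for label, (duration, centroid) in clusters.items():
        if label in survivors:
            continue
        target = max(survivors, key=lambda k: cosine(survivors[k][1], centroid))
        totals[target] += duration
    return totals


# One remote voice over-segmented into four long clusters plus one short fragment.
remote = {
    "S0": (45.0, [1.0, 0.10]),
    "S1": (60.0, [0.95, 0.20]),
    "S2": (38.0, [0.90, 0.30]),
    "S3": (52.0, [1.0, 0.05]),
    "S4": (12.0, [0.97, 0.15]),
}
rows = absorb_small_clusters(remote)
print(len(rows))  # 4: the fragment is absorbed, the four long clusters all survive
```

Because absorption only looks at duration, long fragments of the same voice pass through it untouched, which is the gap a similarity-based consolidation step has to cover.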

This PR does two things:

adds a conservative same-voice consolidation pass in EmbeddingClusterer
adds a deterministic speaker-naming simulator that measures the user-facing review-row count, not just raw diarizer cluster count

Changes

Same-voice consolidation

EmbeddingClusterer.postProcess now runs consolidateSameVoiceClusters after small-cluster absorption and before DB-informed split.
Offline system and opt-in mic diarization paths both get the pass through their existing postProcess(... pairwiseMergeThreshold: nil) calls.
The pass agglomeratively merges clusters whose centroid cosine similarity exceeds the SpeakerNamingPolicy auto-accept bar (> 0.88) and recomputes centroids after each merge.
Added a boundary test so clusters at exactly 0.88 similarity stay separate, matching the naming policy edge.
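A minimal Python sketch of the consolidation behavior described above, assuming clusters are plain lists of member embeddings and centroids are unweighted means (the real pass may weight by duration; that is unknown here). `consolidate_same_voice` is a hypothetical name. It shows the three properties the PR calls out: one over-segmented voice collapses to one row, exactly 0.88 stays split because the comparison is strict, and recomputing centroids after each merge stops a chain of pairwise-similar clusters from collapsing into one.

```python
import math

SAME_VOICE_CONSOLIDATION_THRESHOLD = 0.88  # strict ">" bar, per the PR


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid(members):
    """Unweighted mean of member embeddings (an assumption for this sketch)."""
    dim = len(members[0])
    return [sum(m[i] for m in members) / len(members) for i in range(dim)]


def consolidate_same_voice(clusters, threshold=SAME_VOICE_CONSOLIDATION_THRESHOLD):
    """Greedy agglomerative merge of clusters whose centroids are strictly above `threshold`.

    Each round merges only the single most similar pair, then recomputes all
    centroids, so a merged cluster must clear the bar again on the next round.
    """
    clusters = [list(members) for members in clusters]
    while len(clusters) > 1:
        cents = [centroid(c) for c in clusters]
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(cents[i], cents[j])
                if sim > threshold and (best is None or sim > best[0]):
                    best = (sim, i, j)
        if best is None:
            break
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected
    return clusters


def unit(deg):
    r = math.radians(deg)
    return [math.cos(r), math.sin(r)]


# Five fragments of one voice collapse to a single review row.
print(len(consolidate_same_voice([[[1.0, 0.0]] for _ in range(5)])))  # 1

# Exactly 0.88 similarity (22 / (5 * 5)) stays separate because the bar is strict.
print(len(consolidate_same_voice([[[4.0, 2.0, 2.0, 1.0]], [[4.0, 3.0, 0.0, 0.0]]])))  # 2

# 0-25-50 degree chain: neighbours are ~0.906 apart, but after one merge the new
# centroid is ~37.5 degrees from the far end (~0.79), so the chain stops at 2.
print(len(consolidate_same_voice([[unit(0)], [unit(25)], [unit(50)]])))  # 2
```

Single-linkage merging would collapse the last example to one cluster; the centroid recompute is what keeps distinct speakers in a crowded meeting apart.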

Speaker-naming simulator

scripts/ops/speaker-naming-simulator.py now reports:

review_before / review_after
expected review rows
expected labels
remote vs local role
false-merge flags
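A hypothetical Python sketch of how a per-scenario report with these fields might be computed, assuming each cluster is represented by the ground-truth speaker ids of its segments. It covers only review_before / review_after, expected review rows, and false-merge flags (expected labels and remote vs local role are omitted), and the non-zero exit on regression follows the first commit message. None of these names come from the actual script.

```python
from dataclasses import dataclass, field


@dataclass
class ScenarioReport:
    name: str
    review_before: int
    review_after: int
    expected_review_rows: int
    false_merges: list = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # Pass only if the row count matches and nothing was falsely merged.
        return self.review_after == self.expected_review_rows and not self.false_merges


def build_report(name, clusters_before, clusters_after, expected_review_rows):
    """Each cluster is a list of ground-truth speaker ids; every cluster counts as one row."""
    false_merges = [sorted(set(c)) for c in clusters_after if len(set(c)) > 1]
    return ScenarioReport(name, len(clusters_before), len(clusters_after),
                          expected_review_rows, false_merges)


def exit_code(reports):
    # Non-zero on any regression so a CI step can fail on it.
    return 0 if all(r.passed for r in reports) else 1


cold_1on1 = build_report(
    "cold unknown 1:1 over-segmentation",
    clusters_before=[["remote"]] * 5,
    clusters_after=[["remote"] * 5],
    expected_review_rows=1,
)
over_merged = build_report(
    "near-threshold distinct voices",
    clusters_before=[["alex"], ["sam"]],
    clusters_after=[["alex", "sam"]],
    expected_review_rows=2,
)
print(cold_1on1.review_before, cold_1on1.review_after, cold_1on1.passed)  # 5 1 True
print(over_merged.false_merges, over_merged.passed)  # [['alex', 'sam']] False
print(exit_code([cold_1on1, over_merged]))  # 1
```

Flagging false merges separately from the row count matters because a lower review_after is only a win if no true speaker was hidden inside a merged row.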

Coverage now includes:

cold unknown 1:1 over-segmentation: 5 review rows -> 1
named repeat speaker persistence: saved name auto-applies, 0 review rows
tentative known speaker: 1 confirmation row, not duplicate rows
remote small/crowded groups
near-threshold similar distinct voices
local mic default-off behavior: You, 0 review rows
opt-in local split behavior: room speakers become local review rows
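The coverage list above implies a per-cluster review-row rule. A minimal Python sketch of one such rule, under loudly stated assumptions: a saved name auto-applies above the 0.88 auto-accept bar, a profile match between the 0.70 DB match floor (mentioned in the later Codex follow-up) and 0.88 is treated as "tentative" and costs one confirmation row, an unmatched cluster costs one naming row, and the local mic is a single "You" with no row unless local splitting is opted in. The function name, cluster dict shape, and the exact tentative band are hypothetical.

```python
AUTO_ACCEPT = 0.88     # SpeakerNamingPolicy auto-accept bar (strict >), per the PR
DB_MATCH_FLOOR = 0.70  # DB match floor from the follow-up; the tentative band is assumed


def review_rows(clusters, local_mic_split=False):
    """Return (labels, review_row_count) for one meeting.

    Each cluster is a dict with a hypothetical shape:
      {"role": "remote" | "local", "profile": name or None, "similarity": float}
    """
    labels, rows = [], 0
    for c in clusters:
        if c["role"] == "local" and not local_mic_split:
            labels.append("You")  # default-off mic diarization: no review row
            continue
        profile, sim = c.get("profile"), c.get("similarity", 0.0)
        if profile and sim > AUTO_ACCEPT:
            labels.append(profile)  # saved name auto-applies
        elif profile and sim >= DB_MATCH_FLOOR:
            labels.append(profile + "?")  # tentative: one confirmation row
            rows += 1
        else:
            labels.append("Unknown")  # cold speaker: one naming row
            rows += 1
    return labels, rows


print(review_rows([{"role": "remote", "profile": "Alex", "similarity": 0.93}]))  # (['Alex'], 0)
print(review_rows([{"role": "remote", "profile": "Alex", "similarity": 0.80}]))  # (['Alex?'], 1)
print(review_rows([{"role": "local"}]))  # (['You'], 0)
```

With `local_mic_split=True`, each local cluster falls through to the profile checks, which is how room speakers become local review rows.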

Docs

docs/qa-test-bench.md documents the simulator as review-row/label/false-merge coverage.

Local verification

Passed on macOS in this worktree:

python3 -m py_compile scripts/ops/speaker-naming-simulator.py
scripts/ops/speaker-naming-simulator.py
scripts/ops/speaker-naming-simulator.py --sweep
git diff --check
bash scripts/dev/agent-preflight.sh
bash -n scripts/ops/transcripted-qa-bench.sh
bash -n scripts/ops/run-local-summary-fixture.sh
python3 -m py_compile scripts/ops/validate-meeting-corpus.py
python3 -m py_compile scripts/ops/compare-meeting-corpus.py
bash build-deps.sh --force
bash build.sh --no-open
bash run-tests.sh
bash run-integration-smoke.sh
swift test
bash scripts/ops/run-local-summary-fixture.sh
swift test --package-path Tools/TranscriptedQA
bash scripts/ops/transcripted-qa-bench.sh --mode quick
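The `--sweep` run in the list above is described in the first commit message as a threshold sweep that shows the merge tradeoff. A hypothetical Python sketch of that idea with just two fixed pairs: one voice drifted by 20 degrees (cosine ≈ 0.940) and two distinct voices 30 degrees apart (cosine ≈ 0.866). The angles are illustrative assumptions, not the simulator's scenarios.

```python
import math


def unit(deg):
    r = math.radians(deg)
    return [math.cos(r), math.sin(r)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


SAME_VOICE_PAIR = (unit(0), unit(20))  # one speaker, drifted embedding
DISTINCT_PAIR = (unit(0), unit(30))    # two different speakers


def sweep(thresholds):
    """For each threshold, report whether each pair would merge under a strict '>' bar."""
    return [(t, cosine(*SAME_VOICE_PAIR) > t, cosine(*DISTINCT_PAIR) > t) for t in thresholds]


for t, same, distinct in sweep([0.80, 0.85, 0.88, 0.92, 0.95]):
    status = "false merge" if distinct else ("ok" if same else "over-segmented")
    print(f"{t:.2f}  same-voice merged={same!s:5}  distinct merged={distinct!s:5}  {status}")
```

Too low a bar merges the distinct pair; too high a bar leaves the drifted same-voice pair as two review rows.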

Notable counts:

run-tests.sh: 5255 passed, 0 failed
swift test: 474 passed, 1 skipped, 0 failed
Tools/TranscriptedQA: 34 passed, 0 failed
quick QA bench: PASS, 6/6 checks

Review status / blockers

Keep this PR as a draft for now.

codex review --base origin/main was attempted but hit the Codex account usage limit before producing findings.
Claude Code second-opinion review was attempted through claude -p but hung without output and was interrupted.
CI should re-run on the pushed head 80ca3e7f.
No live audio/manual corpus proof has been run; this simulator is deterministic model coverage, not real speaker-ID accuracy proof on Justin's recordings.

Merge recommendation

Hold as draft until an independent review completes cleanly and, ideally, at least one real or corpus meeting verifies that the reduced review-row count does not hide a true distinct speaker.

r3dbars · 2026-06-15T13:19:51Z

Rebased onto `main` + cleared the merge-state CI failure

This branch was stale (forked at 0c72c29c, before SpeakerNamingSimulationRunner landed on main). On the merged tree, CI was red on a single test:
SpeakerNamingSimulationRunnerTests.testSimulationReportFlagsConfusionFalseMergeAndFalseSplit (XCTAssertFalse at :107 + XCTAssertTrue at :110).

Root cause (verified locally, not just assumed)

The negative-control suite built its false split from two clusters with identical "alex" embeddings. The new same-voice consolidation pass (cosine > 0.88) correctly merges identical voices, so the two clusters collapsed into one review row and the false split disappeared: falseSplitIndicators came back empty. Reproduced on the rebased tree; the report showed "False split indicators: none" while false-merge/confusion detection still fired. The detector was never broken; consolidation (the feature) legitimately removed the split the test depended on.
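The arithmetic behind the root cause and the fix in this comment can be checked directly: an identical embedding has cosine 1.0 and clears the 0.88 bar, while rotating a vector by 35 degrees gives cos(35°) ≈ 0.819, below it. A minimal Python sketch; `near` here is a hypothetical stand-in for the test helper `near(alex, degrees: 35)`, assuming it rotates the embedding within a plane (which preserves length), and the `alex` values are made up.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def near(v, degrees):
    """Rotate `v` (dimension >= 2, non-zero) by `degrees` toward an orthogonal direction.

    Hypothetical helper: because the result is cos(t)*u + sin(t)*w with u, w
    orthonormal, cosine(v, near(v, d)) equals cos(d).
    """
    norm = math.sqrt(sum(x * x for x in v))
    u = [x / norm for x in v]
    # Gram-Schmidt the basis axis least aligned with u into a unit vector orthogonal to u.
    k = min(range(len(u)), key=lambda i: abs(u[i]))
    e = [1.0 if i == k else 0.0 for i in range(len(u))]
    proj = sum(a * b for a, b in zip(e, u))
    w = [a - proj * b for a, b in zip(e, u)]
    w_norm = math.sqrt(sum(x * x for x in w))
    w = [x / w_norm for x in w]
    t = math.radians(degrees)
    return [math.cos(t) * a + math.sin(t) * b for a, b in zip(u, w)]


alex = [0.3, -0.5, 0.8, 0.1]
print(f"identical: {cosine(alex, alex):.3f}")          # 1.000, merged by consolidation
print(f"drifted:   {cosine(alex, near(alex, 35)):.3f}")  # 0.819, stays split below 0.88
```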

Fixes

Rebased the 2 PR commits onto current main (clean, no conflicts; main had not touched any PR file since the fork point).
Negative control now models a drifted same-voice over-segmentation (near(alex, degrees: 35) ≈ 0.82 cosine, below the 0.88 bar) instead of an identical voice. Consolidation correctly leaves it split, the user mislabels it, and the runner's unchanged detector flags a genuine false split. No assertion was weakened — the scenario now exercises the residual false-split risk that consolidation cannot fix.
Threshold drift guard: named both thresholds (SpeakerNamingPolicy.autoAcceptSimilarityThreshold and EmbeddingClusterer.sameVoiceConsolidationThreshold), wired the consolidation defaults to the constant, and added EmbeddingClustererTests.testConsolidationThresholdMatchesAutoAcceptBar asserting they stay equal. Pure refactor — both remain 0.88.
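The drift guard described above boils down to a single source of truth plus an equality test. A Python analogue, assuming hypothetical class and attribute names that only mirror the Swift identifiers (SpeakerNamingPolicy.autoAcceptSimilarityThreshold, EmbeddingClusterer.sameVoiceConsolidationThreshold); passing `None` stands in for the `consolidationThreshold: nil` opt-out from the first commit message.

```python
class SpeakerNamingPolicy:
    # Single source of truth for the auto-accept bar (0.88 per the PR).
    AUTO_ACCEPT_SIMILARITY_THRESHOLD = 0.88


class EmbeddingClusterer:
    # Wired to the policy constant instead of a second 0.88 literal, so the two cannot drift.
    SAME_VOICE_CONSOLIDATION_THRESHOLD = SpeakerNamingPolicy.AUTO_ACCEPT_SIMILARITY_THRESHOLD

    def effective_threshold(self, consolidation_threshold=SAME_VOICE_CONSOLIDATION_THRESHOLD):
        """Return the consolidation bar in use; None disables the pass."""
        return consolidation_threshold


def test_consolidation_threshold_matches_auto_accept_bar():
    assert (EmbeddingClusterer.SAME_VOICE_CONSOLIDATION_THRESHOLD
            == SpeakerNamingPolicy.AUTO_ACCEPT_SIMILARITY_THRESHOLD)
    assert EmbeddingClusterer().effective_threshold() == 0.88


test_consolidation_threshold_matches_auto_accept_bar()
```

Wiring the default to the named constant makes a future change propagate automatically, and the test catches anyone who reintroduces a separate literal.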

Tests (local `swift test`, the same path CI runs)

Full package: 486 executed, 1 skipped, 0 failures (was 1 failure pre-fix).
Speaker/clusterer/pipeline suites: 128 passed, 0 failed — incl. EmbeddingClustererTests (12, with the new guard) and SpeakerNamingSimulationRunnerTests (6, with the previously-failing test now green for the right reason).

Still manual / unproven (unchanged by this PR)

No real-audio or corpus validation was run — speaker attribution on real meetings stays manual and unknown. The honest claim remains guardrails against over-merging, not solved speaker attribution: aggressive same-voice consolidation can still over-merge genuinely distinct speakers (e.g. a 6-person call collapsing). Recommend a real-corpus pass before relying on the review-row reduction in production.

r3dbars · 2026-06-15T21:44:42Z

Codex review follow-up: known-profile consolidation guard

Pushed f77aa172 to keep same-voice consolidation from merging two plausibly different known speakers before review. The fix adds a lower known-profile conflict guard (aligned to the 0.70 DB match floor) plus regression tests for both exact known-profile conflicts and looser profile matches below the 0.88 consolidation bar.
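A minimal Python sketch of what such a guard could look like, assuming each cluster is matched to its best saved profile at or above the 0.70 floor, and a merge is refused when the two clusters match different profiles even though their centroids clear 0.88. `best_profile`, `may_consolidate`, and the inclusive floor are hypothetical choices, not the code in f77aa172.

```python
import math

DB_MATCH_FLOOR = 0.70        # known-profile conflict guard, aligned to the DB match floor
SAME_VOICE_THRESHOLD = 0.88  # consolidation bar (strict >)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def unit(deg):
    r = math.radians(deg)
    return [math.cos(r), math.sin(r)]


def best_profile(centroid, profiles, floor=DB_MATCH_FLOOR):
    """Name of the most similar saved profile at or above `floor`, else None."""
    best_name, best_sim = None, floor
    for name, emb in profiles.items():
        sim = cosine(centroid, emb)
        if sim >= best_sim:
            best_name, best_sim = name, sim
    return best_name


def may_consolidate(a, b, profiles):
    """Allow a same-voice merge only if it clears the bar and does not join two known people."""
    if cosine(a, b) <= SAME_VOICE_THRESHOLD:
        return False
    pa, pb = best_profile(a, profiles), best_profile(b, profiles)
    return not (pa is not None and pb is not None and pa != pb)


a, b = unit(5), unit(25)  # cos(20 deg) ~ 0.94: would merge on similarity alone
print(may_consolidate(a, b, {}))                                      # True
print(may_consolidate(a, b, {"alex": unit(-10), "sam": unit(40)}))    # False: strong matches conflict
print(may_consolidate(a, b, {"alex": unit(-30), "sam": unit(60)}))    # False: ~0.82 matches still conflict
```

The third call is the "looser profile matches" case: each cluster matches a different person at only about 0.82, below the consolidation bar but above the floor, and the guard still blocks the merge.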

Local proof on the final patch:

bash build-deps.sh --force
bash build.sh --no-open
bash run-tests.sh (5350/5350)
bash run-integration-smoke.sh
swift test (488 executed, 1 skipped)
swift test --filter EmbeddingClustererTests (14/14)
swift test --filter SpeakerNamingSimulationRunnerTests (6/6)
scripts/ops/speaker-naming-simulator.py + --sweep
bash scripts/ops/transcripted-qa-bench.sh --mode quick (6/6)
codex-review --mode branch: clean, no accepted/actionable findings

Hold reason: PR is still draft; GitHub repo-hygiene passed after push, but build-and-test was still pending when I stopped watching. Real-audio/corpus validation is still not proven, so keep this YELLOW until CI is green and a real meeting corpus/manual audio pass confirms attribution behavior.

Reduce how many speakers a meeting asks you to name.

Offline VBx clustering often splits one remote voice into several clusters that each exceed the 30s small-cluster floor, so a one-on-one call surfaces 4-7 "speakers" for a single person in the post-meeting naming sheet.

- EmbeddingClusterer gains a same-voice consolidation pass that agglomeratively merges clusters whose mean embeddings clear the SpeakerNamingPolicy auto-accept bar (0.88), recomputing centroids after each merge so distinct speakers in a crowded meeting do not chain-collapse. Runs for both the system and mic offline paths via postProcess; opt out with consolidationThreshold: nil.
- New scripts/ops/speaker-naming-simulator.py models the naming count without audio or ML models: synthetic over-segmented scenarios run through a faithful port of the post-processing, reporting names_before vs names_after and a threshold sweep that shows the merge tradeoff. Exits non-zero on regression.
- EmbeddingClustererTests cover consolidation, distinct-speaker preservation, chain-collapse resistance, and the end-to-end one-on-one case.
- Document the simulator in docs/qa-test-bench.md.

https://claude.ai/code/session_01NSNvZsNWRaqU1k27DTAXNZ

…plit post-consolidation

After rebasing onto main, the negative-control suite built its false split from two clusters with identical "alex" embeddings. The new same-voice consolidation pass (cosine > 0.88) correctly merges them, so only one review row survived and the false split vanished: testSimulationReportFlagsConfusionFalseMergeAndFalseSplit failed because falseSplitIndicators was empty.

Model the second "alex" cluster as a drifted same-voice over-segmentation (~0.82 cosine, below the 0.88 consolidation bar) instead. Consolidation legitimately leaves it split, the user mislabels it, and the runner's (unchanged) detector flags a real false split. No assertion was weakened; the scenario now exercises the residual false-split risk consolidation cannot fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The consolidation pass merges clusters above 0.88 (described as "the SpeakerNamingPolicy auto-accept bar"), but nothing tied the two values, so a future change to one could silently diverge from the other.

Name both thresholds (SpeakerNamingPolicy.autoAcceptSimilarityThreshold and EmbeddingClusterer.sameVoiceConsolidationThreshold), wire the consolidation defaults to the named constant, and add a regression test asserting the two stay equal. Pure refactor: both values remain 0.88, behavior is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

r3dbars mentioned this pull request on Jun 14, 2026 in Harden local summary failure feedback #1121 (merged).

r3dbars force-pushed the claude/speaker-naming-simulator-guviy7 branch from 80ca3e7 to d858f9c on June 15, 2026 13:19

This was referenced on Jun 16, 2026 by:

feat(speakers): play/pause toggle, 3-dot overflow menu, name autocomplete #1136 (draft)

feat(speaker): same-voice consolidation to cut over-segmented speakers #1140 (draft)

claude and others added 5 commits June 16, 2026 05:54, including:

Strengthen speaker naming simulator coverage (515507a)

Guard speaker consolidation against known profile conflicts (7768b03)

r3dbars force-pushed the claude/speaker-naming-simulator-guviy7 branch from f77aa17 to 7768b03 on June 16, 2026 11:01

r3dbars marked this pull request as ready for review June 16, 2026 11:01

r3dbars merged commit 8d75259 into main Jun 16, 2026
3 checks passed

r3dbars deleted the claude/speaker-naming-simulator-guviy7 branch June 16, 2026 14:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

r3dbars merged 5 commits into main from claude/speaker-naming-simulator-guviy7.
