[bug] Bailian/DashScope realtime ASR duplicates cumulative partial results #530

@MaxxxDong

Summary

When using the Bailian / Alibaba Cloud DashScope realtime ASR provider, the raw transcript can contain repeated cumulative prefixes. The client appears to append multiple interim result-generated events to the output as if each were a final text segment.

This is not an LLM polish issue: the duplication is already present in the raw ASR transcript before polishing.

Why this appears to happen

The Alibaba Cloud Fun-ASR realtime WebSocket documentation describes the result-generated event as carrying both interim and final sentence results. The documented finality flag is:

payload.output.sentence.sentence_end
  • sentence_end: false means the current sentence has not ended yet.
  • sentence_end: true means the current sentence is final.

The official Python SDK examples similarly use RecognitionResult.is_sentence_end(sentence) before treating a sentence as ended.

OpenLess currently appears to use the presence of end_time as the finality check in app/src-tauri/src/asr/bailian.rs:

let is_sentence_final = sentence.get("end_time").is_some();

st.last_result_text = trimmed.to_string();
if is_sentence_final && st.final_segments.last().map(|s| s.as_str()) != Some(trimmed) {
    st.final_segments.push(trimmed.to_string());
}

Then final output joins all collected segments:

st.final_segments.join("")

If DashScope sends cumulative/interim texts such as:

我看一下
我看一下阿里云这个
我看一下阿里云这个模型会不会...

OpenLess can produce duplicated raw transcript text by appending all of them.
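
To make the mechanism concrete, here is a minimal sketch that replays cumulative texts through the same "differs from the last segment" guard. It assumes the end_time-based check evaluates to true for every event, which is what the observed output suggests but has not been confirmed here. State and replay are hypothetical stand-ins, not the real OpenLess types.

```rust
/// Hypothetical stand-in for the state in bailian.rs; only the field used here.
#[derive(Default)]
struct State {
    final_segments: Vec<String>,
}

/// Mirrors the quoted logic. `is_sentence_final` is whatever the finality
/// check returned for this event.
fn record_result(st: &mut State, text: &str, is_sentence_final: bool) {
    let trimmed = text.trim();
    if is_sentence_final && st.final_segments.last().map(|s| s.as_str()) != Some(trimmed) {
        st.final_segments.push(trimmed.to_string());
    }
}

/// Feeds every text through record_result and joins the result.
fn replay(events: &[&str]) -> String {
    let mut st = State::default();
    for text in events {
        // Assumption: the end_time-based check reports true for every event.
        record_result(&mut st, text, true);
    }
    st.final_segments.join("")
}

fn main() {
    // Each interim text differs from the previous one, so the guard lets
    // all of them through and the cumulative prefixes pile up.
    let out = replay(&["我看一下", "我看一下阿里云这个"]);
    println!("{out}"); // prints 我看一下我看一下阿里云这个
}
```

The guard only filters an event whose text is identical to the previous segment, so it cannot catch a growing partial.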

Example observed output

Short dictation using Bailian/DashScope realtime ASR produced raw transcript patterns like:

那我试试看呗那我试试看呗,用阿里云的那我试试看呗,用阿里云的这个是不是可那我试试看呗,用阿里云的这个是不是可效果更那我试试看呗,用阿里云的这个是不是更效果更好一点?

Another example:

我看一下我看一下把阿里云这个我看一下把阿里云这个模型会不会输...

These are cumulative prefix repetitions, not normal acoustic recognition errors.
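
Because each repeated chunk extends the previous one, a prefix check is enough to collapse the simple cases. The sketch below is a hypothetical fallback guard (push_or_extend and collapse are illustrative names, not existing OpenLess functions): when the new text starts with the last stored segment, it replaces that segment instead of appending.

```rust
/// Hypothetical guard: replace the last segment when `text` extends it,
/// otherwise append `text` as a new segment.
fn push_or_extend(segments: &mut Vec<String>, text: &str) {
    let text = text.trim();
    if text.is_empty() {
        return;
    }
    let extends_last = segments
        .last()
        .map_or(false, |last| text.starts_with(last.as_str()));
    if extends_last {
        if let Some(last) = segments.last_mut() {
            *last = text.to_string();
        }
    } else {
        segments.push(text.to_string());
    }
}

/// Runs all texts through the guard and joins what remains.
fn collapse(events: &[&str]) -> String {
    let mut segments = Vec::new();
    for text in events {
        push_or_extend(&mut segments, text);
    }
    segments.join("")
}

fn main() {
    println!("{}", collapse(&["我看一下", "我看一下阿里云这个"]));
}
```

This is only a safety net. In the first example above, a later interim result revises earlier characters rather than strictly extending them, and a prefix check does not catch that case. It would also wrongly merge a genuinely new sentence that happens to begin with the previous one.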

Expected behavior

Only final sentence text should be committed once. Interim results should update the current partial sentence, not be appended to final output.

Suggested fix

  1. In record_result, skip heartbeat events:
let is_heartbeat = sentence
    .get("heartbeat")
    .and_then(Value::as_bool)
    .unwrap_or(false);
if is_heartbeat {
    return;
}
  2. Use the documented finality flag:
let is_sentence_final = sentence
    .get("sentence_end")
    .and_then(Value::as_bool)
    .unwrap_or(false);
  3. Track text by sentence_id instead of pushing every final-looking event into a Vec. Suggested shape:
final_segments: BTreeMap<i64, String>,
partial_segments: BTreeMap<i64, String>,
  4. For sentence_end == false, update the current partial segment only.
  5. For sentence_end == true, commit that sentence_id once and remove the partial.
  6. Keep a prefix/overlap merge guard to tolerate duplicate/replayed server events.
  7. Add tests for multiple partial results, duplicate final events, heartbeat events, and multiple sentence IDs assembled in order.
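
Put together, the sentence_id tracking could look roughly like the sketch below. It is not a patch against bailian.rs: SentenceEvent is a hypothetical, already-parsed view of payload.output.sentence (the real code reads these fields from JSON), and Transcript, ev, and assemble are illustrative names. The prefix/overlap guard is left out.

```rust
use std::collections::BTreeMap;

/// Hypothetical parsed form of one sentence payload. Field names follow
/// the issue text; they are assumptions about the wire format.
struct SentenceEvent<'a> {
    sentence_id: i64,
    text: &'a str,
    sentence_end: bool,
    heartbeat: bool,
}

#[derive(Default)]
struct Transcript {
    final_segments: BTreeMap<i64, String>,
    partial_segments: BTreeMap<i64, String>,
}

impl Transcript {
    fn record_result(&mut self, ev: &SentenceEvent) {
        if ev.heartbeat {
            return; // heartbeats carry no new text
        }
        let text = ev.text.trim();
        if ev.sentence_end {
            // Commit once: a replayed final event for the same id is ignored.
            self.partial_segments.remove(&ev.sentence_id);
            self.final_segments
                .entry(ev.sentence_id)
                .or_insert_with(|| text.to_string());
        } else if !self.final_segments.contains_key(&ev.sentence_id) {
            // Interim result: overwrite the partial, never append.
            self.partial_segments.insert(ev.sentence_id, text.to_string());
        }
    }

    /// Joins committed sentences in sentence_id order.
    fn final_text(&self) -> String {
        self.final_segments.values().map(String::as_str).collect()
    }
}

/// Convenience constructor for a non-heartbeat event.
fn ev(sentence_id: i64, text: &str, sentence_end: bool) -> SentenceEvent<'_> {
    SentenceEvent { sentence_id, text, sentence_end, heartbeat: false }
}

/// Replays events and returns the assembled final transcript.
fn assemble(events: &[SentenceEvent]) -> String {
    let mut t = Transcript::default();
    for e in events {
        t.record_result(e);
    }
    t.final_text()
}

fn main() {
    let events = [
        ev(1, "我看一下", false),
        ev(1, "我看一下阿里云这个", false),
        ev(1, "我看一下阿里云这个模型", true),
        ev(1, "我看一下阿里云这个模型", true), // replayed final event
    ];
    println!("{}", assemble(&events)); // prints 我看一下阿里云这个模型
}
```

One open design choice: this sketch drops a partial that never receives sentence_end == true. Whether a trailing partial should be flushed when the session ends is a separate decision.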
