
Source Chunk Highlight Is Inconsistent #4

@Uditaagarwal1

Description


Title

Source chunk highlight is inconsistent when clicking citation chips

Summary

Clicking a source citation chip in the chat (for example, 1, 2, or 3) does not always highlight the expected chunk in the book viewer: some citations highlight correctly, while others do nothing.

Why this is a problem

This app's key promise is source-grounded answers. If source jumps are unreliable, users cannot trust or verify cited content quickly.

Affected areas

  • climate_streamlit/app.py (citation click -> viewer jump)
  • climate_streamlit/html_sectioning.py (chunk generation and anchor stamping)

Steps to reproduce

  1. Run the app.
  2. Ask a question that returns multiple citations.
  3. Click each citation chip one by one.
  4. Observe: some highlights work, some fail.

Expected behavior

Every citation click should always:

  1. target one valid paragraph anchor,
  2. scroll to the anchor,
  3. apply paragraph highlight.
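A minimal Python sketch of that contract, assuming the viewer has access to the set of anchor ids stamped into the rendered HTML. resolve_citation_target, the section_id fallback, and the sec-N-p-M id scheme are hypothetical names, not taken from app.py. The point is that a miss is never silent: the exact-anchor path scrolls and highlights, and anything else is logged and degrades to a section-level jump.

```python
import logging

logger = logging.getLogger("citation_jump")  # hypothetical logger name


def resolve_citation_target(anchor_id, section_id, rendered_anchor_ids):
    """Pick the element the viewer should scroll to.

    Returns (element_id, highlight). highlight is True only when the exact
    paragraph anchor exists in the rendered HTML.
    """
    if anchor_id in rendered_anchor_ids:
        # Expected path: one valid paragraph anchor -> scroll + highlight.
        return anchor_id, True
    # Degraded path: record the miss so segmentation mismatches are visible,
    # then fall back to a section jump without a paragraph highlight.
    logger.warning(
        "anchor_id %r not found in rendered HTML; falling back to section %r",
        anchor_id,
        section_id,
    )
    return section_id, False


rendered = {"sec-2-p-0", "sec-2-p-1"}
print(resolve_citation_target("sec-2-p-1", "sec-2", rendered))  # ('sec-2-p-1', True)
print(resolve_citation_target("sec-2-p-3", "sec-2", rendered))  # ('sec-2', False)
```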

Actual behavior

  • Some citation clicks intermittently produce no highlight at all.
  • Sometimes the fallback section jump works, but the exact paragraph highlight does not.

Probable root cause

Chunking (at index time) and DOM annotation (at render time) appear to segment paragraphs differently in some code paths.
When they disagree, a citation can carry an anchor_id that is not present in the rendered HTML, so the viewer has nothing to scroll to or highlight.

Potential mismatches include:

  • Different block handling between format modes (p elements vs. lists, tables, and div blocks).
  • Merging of short paragraph fragments, which shifts paragraph indices in one path but not the other.
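To illustrate how these two mismatches combine, here is a toy Python model. Both segmenters, the 20-character merge threshold, and the sec-1-p-N id scheme are assumptions made for the example, not the actual logic in html_sectioning.py. The index-time view counts list items and merges short fragments; the render-time view anchors only p elements and never merges.

```python
MIN_CHARS = 20  # hypothetical merge threshold for short fragments


def chunker_paragraphs(blocks):
    """Index-time view (hypothetical): every block counts, short fragments merge."""
    paras = []
    for _tag, text in blocks:
        if paras and len(text) < MIN_CHARS:
            paras[-1] = paras[-1] + " " + text  # merge into previous paragraph
        else:
            paras.append(text)
    return paras


def annotator_paragraphs(blocks):
    """Render-time view (hypothetical): only <p> gets an anchor, no merging."""
    return [text for tag, text in blocks if tag == "p"]


def stamp(section_id, paras):
    """Assign index-based anchor ids, as both paths do in this model."""
    return {f"{section_id}-p-{i}": text for i, text in enumerate(paras)}


blocks = [
    ("p", "Coastal flooding is expected to increase."),
    ("p", "See below."),
    ("li", "Storm surges reach further inland."),
    ("li", "Drainage systems are overwhelmed more often."),
]

cited = stamp("sec-1", chunker_paragraphs(blocks))
rendered = stamp("sec-1", annotator_paragraphs(blocks))

# Ids a citation can carry that the viewer will never find.
missing = sorted(set(cited) - set(rendered))
# Ids that exist on both sides but point at different text.
misaligned = sorted(a for a in cited.keys() & rendered.keys() if cited[a] != rendered[a])
print("missing:", missing)
print("misaligned:", misaligned)
```

Running it prints missing: ['sec-1-p-2'] and misaligned: ['sec-1-p-0', 'sec-1-p-1']. The missing id reproduces the "nothing happens" symptom; in this model, the misaligned ids would highlight a different paragraph than the one cited.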

Proposed fix

  1. Centralize paragraph segmentation so indexing and annotation share the same source-of-truth logic.
  2. Align anchor stamping across format modes.
  3. Add a validation test: every generated anchor_id must exist in the annotated HTML.
  4. Add logging on the fallback path whenever an anchor_id lookup fails.
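A sketch of fixes 1 and 2, assuming paragraphs can be represented as (tag, text) blocks before rendering. segment_paragraphs, build_chunks, annotate_html, BLOCK_TAGS, and MIN_CHARS are hypothetical; the real split between app.py and html_sectioning.py may differ. Both the index-time chunks and the render-time anchors derive from one function, so their ids cannot drift apart. For brevity, the annotator here re-emits each segmented paragraph as one anchored p element; a real implementation would more likely stamp ids onto the existing DOM nodes.

```python
import html

BLOCK_TAGS = {"p", "li", "td", "div"}  # assumption: tags treated as paragraphs
MIN_CHARS = 20  # assumption: merge threshold for short fragments


def segment_paragraphs(blocks):
    """Single source of truth, used by BOTH chunking and annotation."""
    paras = []
    for tag, text in blocks:
        if tag not in BLOCK_TAGS or not text.strip():
            continue  # same skip rule on both sides
        if paras and len(text) < MIN_CHARS:
            paras[-1] = paras[-1] + " " + text  # same merge rule on both sides
        else:
            paras.append(text)
    return paras


def anchored_paragraphs(section_id, blocks):
    """Pair each segmented paragraph with its index-based anchor id."""
    return [
        (f"{section_id}-p-{i}", text)
        for i, text in enumerate(segment_paragraphs(blocks))
    ]


def build_chunks(section_id, blocks):
    """Index time: each chunk carries the anchor_id it will be cited with."""
    return [{"anchor_id": a, "text": t} for a, t in anchored_paragraphs(section_id, blocks)]


def annotate_html(section_id, blocks):
    """Render time: one anchored element per segmented paragraph."""
    return "\n".join(
        f'<p id="{a}" class="para">{html.escape(t)}</p>'
        for a, t in anchored_paragraphs(section_id, blocks)
    )
```

With this layout, changing BLOCK_TAGS or MIN_CHARS changes both sides at once, which is the property fix 1 is after.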

Acceptance criteria

  • Citation highlight success is consistent and reproducible.
  • No missing anchors for retrieved chunks.
  • Tests cover both format types and mixed block content.
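A possible shape for the validation test from fix 3, written as pytest-style test functions and using only the standard-library html.parser. find_missing_anchors and the inline fixtures are hypothetical; in the real suite, the chunks and annotated HTML would come from the project's own chunking and annotation code, run once per format mode and on mixed p/list/table/div content.

```python
from html.parser import HTMLParser


class _IdCollector(HTMLParser):
    """Collects every id attribute that appears in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def rendered_anchor_ids(annotated_html):
    parser = _IdCollector()
    parser.feed(annotated_html)
    parser.close()
    return parser.ids


def find_missing_anchors(chunks, annotated_html):
    """Return anchor_ids that retrieval could cite but the viewer cannot find."""
    present = rendered_anchor_ids(annotated_html)
    return sorted({c["anchor_id"] for c in chunks} - present)


def test_mixed_blocks_have_no_missing_anchors():
    # Hypothetical fixture; replace with real chunking + annotation output.
    annotated = (
        '<p id="sec-1-p-0">Intro.</p>'
        '<ul><li id="sec-1-p-1">First item.</li></ul>'
        '<table><tr><td id="sec-1-p-2">Cell.</td></tr></table>'
    )
    chunks = [{"anchor_id": f"sec-1-p-{i}"} for i in range(3)]
    assert find_missing_anchors(chunks, annotated) == []


def test_reports_missing_anchor():
    annotated = '<p id="sec-1-p-0">Only one anchor.</p>'
    chunks = [{"anchor_id": "sec-1-p-0"}, {"anchor_id": "sec-1-p-1"}]
    assert find_missing_anchors(chunks, annotated) == ["sec-1-p-1"]
```

Reporting the full sorted list of missing ids, rather than failing on the first one, makes it easier to see whether misses cluster around lists, tables, or merged fragments.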

Suggested labels

bug, rag, frontend, high priority, good first issue

Metadata


Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
