fix: fall back to catalog search when search_business_context returns empty (closes #61) #69

Merged: shirshanka merged 4 commits into main from fix/issue-61-catalog-fallback on May 31, 2026
@shirshanka

Contributor

Summary

  • When a dataset exists in DataHub but has no docs, glossary terms, domains, or data products, all four search_business_context sub-searches return empty, and the agent incorrectly told the user that the entity doesn't exist.
  • _search_business_context_impl now detects the all-empty case and automatically runs a general catalog search, returning the results as catalog_search plus a note explaining the gap, so the agent sees that the entity exists before drawing conclusions.
  • The context quality assessor prompt is updated to cap the score at 3 (Fair) when only catalog_search was found (no governed definition), while still allowing the assessor to score 1–2 if the entity description was unhelpful. Scores of 4–5 continue to require a governed definition (glossary, doc, domain, or data product).
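
For readers skimming the PR, the all-empty fallback described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the function name `search_with_catalog_fallback`, the result keys in `GOVERNED_KEYS`, the injected search callables, and the note wording are all hypothetical. Only the `catalog_search` key and the presence of a note come from the description.

```python
from typing import Any, Callable, Dict, List

# A search takes a query string and returns a list of hits.
Search = Callable[[str], List[Dict[str, Any]]]

# Hypothetical result keys for the four governed sub-searches.
GOVERNED_KEYS = ("documents", "glossary_terms", "domains", "data_products")


def search_with_catalog_fallback(
    query: str,
    governed_searches: Dict[str, Search],
    catalog_search: Search,
) -> Dict[str, Any]:
    """Run the governed sub-searches; if all are empty, try a catalog search."""
    results: Dict[str, Any] = {
        name: search(query) for name, search in governed_searches.items()
    }

    # At least one governed source produced hits: no fallback needed.
    if any(results.values()):
        return results

    # All-empty case: check whether the entity exists in the catalog at all.
    hits = catalog_search(query)
    if hits:  # Assumption: the key is added only when the fallback finds something.
        results["catalog_search"] = hits
        results["note"] = (
            "No governed business context was found; catalog_search shows "
            "matching entities that exist in the catalog."
        )
    return results
```

Passing the searches in as callables keeps the sketch free of any DataHub client calls, which also makes the all-empty path easy to exercise with stubs in a unit test.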

Test plan

  • 11 new unit tests in tests/unit/test_search_business_context.py — all passing (271 total)
  • Verified against a fresh datahub docker quickstart instance: _search_business_context_impl("SampleHiveDataset") returns catalog_search with 3 hits and a note, rather than four empty results
  • End-to-end: ask "tell me about SampleHiveDataset" — agent should acknowledge the dataset exists and describe its schema rather than saying it doesn't exist

🤖 Generated with Claude Code

shirshanka and others added 4 commits May 30, 2026 22:26
… empty (closes #61)

When a dataset exists in DataHub but has no docs, glossary terms, domains, or
data products, all four business-context sub-searches return empty and the LLM
was incorrectly telling the user the entity does not exist.

_search_business_context_impl now detects the all-empty case and automatically
runs a general catalog search, returning the results as `catalog_search` so the
agent can confirm entity existence before drawing conclusions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…core

The assessor LLM could see catalog_search hits in a search_business_context
result and score context as Fair when all governance searches (docs, glossary,
domains, data products) were actually empty. Add an explicit rule to the
assessment prompt clarifying that catalog_search is a last-resort existence
check only and must not raise the score.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A rich dataset description from catalog_search is genuinely useful, so
treating it as "empty" was wrong. The correct rule: no governed definition
(glossary, docs, domain, data product) → score cannot exceed 3 (Fair),
but within 1-3 the assessor should still use get_entities results to judge
how informative the context was.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
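
The scoring rule from the commits above is enforced through the assessor prompt, not through code. Purely as an illustration, the same rule could be written as a clamp on a 1-5 score. The function name `cap_context_score`, the result keys in `GOVERNED_KEYS`, and the idea of post-processing a raw score are assumptions made for this sketch.

```python
from typing import Any, Dict

# Hypothetical result keys for governed definitions.
GOVERNED_KEYS = ("documents", "glossary_terms", "domains", "data_products")
FAIR = 3  # Highest score allowed without a governed definition.


def cap_context_score(raw_score: int, context: Dict[str, Any]) -> int:
    """Clamp a 1-5 context score to at most Fair when nothing governed was found."""
    score = max(1, min(5, raw_score))
    has_governed = any(context.get(key) for key in GOVERNED_KEYS)
    # Scores of 4-5 require a governed definition; 1-3 pass through unchanged.
    return score if has_governed else min(score, FAIR)
```

With only `catalog_search` hits, a raw score of 5 becomes 3 and a raw score of 2 stays 2. With a glossary hit present, a raw score of 4 is left alone.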
@shirshanka shirshanka merged commit 173431b into main May 31, 2026
6 checks passed