fix: fall back to catalog search when search_business_context returns empty (closes #61) by shirshanka · Pull Request #69 · datahub-project/analytics-agent

shirshanka · 2026-05-31T06:51:41Z

Summary

When a dataset exists in DataHub but has no docs, glossary terms, domains, or data products, all four search_business_context sub-searches return empty, and the agent incorrectly told the user that the entity doesn't exist.
_search_business_context_impl now detects the all-empty case and automatically runs a general catalog search, returning the results as catalog_search plus a note explaining the gap, so the agent sees that the entity exists before drawing conclusions.
The context quality assessor prompt now caps the score at 3 (Fair) when only catalog_search results were found (no governed definition), while still allowing the assessor to score 1–2 if the entity description is unhelpful. Scores of 4–5 continue to require a governed definition (glossary, doc, domain, or data product).
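
For reviewers who want the fallback at a glance, here is a minimal sketch of the all-empty detection described above. It is not the code in this PR: the key names in `GOVERNED_KEYS`, the `search_with_catalog_fallback` function, the callable-based signature, and the wording of the note are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List

# Hypothetical key names for the four governed sub-searches (assumption).
GOVERNED_KEYS = ("docs", "glossary_terms", "domains", "data_products")

SearchFn = Callable[[str], List[Dict[str, Any]]]


def search_with_catalog_fallback(
    query: str,
    governed_searches: Dict[str, SearchFn],
    catalog_search: SearchFn,
) -> Dict[str, Any]:
    """Run the governed sub-searches; if all are empty, try a catalog search."""
    result: Dict[str, Any] = {
        key: search(query) for key, search in governed_searches.items()
    }

    # At least one governed source matched: no fallback needed.
    if any(result.values()):
        return result

    # All governed searches were empty: check whether the entity exists at all.
    hits = catalog_search(query)
    if hits:
        result["catalog_search"] = hits
        # Illustrative wording only; the real note text is not shown in this PR page.
        result["note"] = (
            "No governed business context was found, but the catalog "
            "contains matching entities."
        )
    return result
```

With stubbed search callables that all return empty lists and a catalog stub that returns one hit, the result carries `catalog_search` and `note` alongside the four empty governed lists. If the catalog search is also empty, neither key is added.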

Test plan

11 new unit tests in tests/unit/test_search_business_context.py — all passing (271 total)
Verified against a fresh datahub docker quickstart instance: _search_business_context_impl("SampleHiveDataset") returns catalog_search with 3 hits and a note, rather than four empty results
End-to-end: ask "tell me about SampleHiveDataset" — agent should acknowledge the dataset exists and describe its schema rather than saying it doesn't exist

🤖 Generated with Claude Code

… empty (closes #61)

When a dataset exists in DataHub but has no docs, glossary terms, domains, or data products, all four business-context sub-searches return empty and the LLM was incorrectly telling the user the entity does not exist. _search_business_context_impl now detects the all-empty case and automatically runs a general catalog search, returning the results as `catalog_search` so the agent can confirm entity existence before drawing conclusions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…core

The assessor LLM could see catalog_search hits in a search_business_context result and score context as Fair when all governance searches (docs, glossary, domains, data products) were actually empty. Add an explicit rule to the assessment prompt clarifying that catalog_search is a last-resort existence check only and must not raise the score.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

A rich dataset description from catalog_search is genuinely useful, so treating it as "empty" was wrong. The correct rule: no governed definition (glossary, docs, domain, data product) → score cannot exceed 3 (Fair), but within 1-3 the assessor should still use get_entities results to judge how informative the context was.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
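
The scoring rule that the commit messages above converge on is enforced through the assessor prompt, not through code. Expressed as a function, it would look roughly like the sketch below. The name `cap_context_score` and its parameters are hypothetical and do not appear in this PR.

```python
def cap_context_score(raw_score: int, has_governed_definition: bool) -> int:
    """Apply the 'no governed definition means at most 3 (Fair)' rule.

    raw_score is the assessor's 1-5 judgment of how informative the context was.
    has_governed_definition is True when a glossary term, doc, domain, or
    data product was found (hypothetical flag, for illustration only).
    """
    if not 1 <= raw_score <= 5:
        raise ValueError("raw_score must be between 1 and 5")

    if has_governed_definition:
        # Scores of 4-5 are only reachable with a governed definition.
        return raw_score

    # Catalog-only context: scores 1-3 pass through, anything higher is capped.
    return min(raw_score, 3)
```

Under this sketch, a catalog-only result judged at 5 is capped to 3, while a judgment of 1 or 2 for an unhelpful description passes through unchanged.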

shirshanka and others added 4 commits May 30, 2026 22:26

chore: fix import ordering in test_search_business_context

2a1455f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

shirshanka merged commit 173431b into main May 31, 2026
6 checks passed

shirshanka merged 4 commits into main from fix/issue-61-catalog-fallback