Expand component search indexing fields #2423

Open
Mbeaulne wants to merge 1 commit into master from 06-18-expand_component_search_indexing_fields

@Mbeaulne Mbeaulne commented Jun 18, 2026

Collaborator

Description

Expands the component search index to include richer input/output details and a new metadata match field.

Previously, the io searchable field contained only input and output names. It now also includes the description, type, and annotations of each input and output spec. A new metadata field indexes component-level metadata annotations, with a blocklist for noisy keys such as python_original_code, editor state, and similar large or irrelevant blobs. It also indexes the source label and the published_by value from the component reference.
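
As a rough illustration of the expanded io field, the sketch below flattens each input and output spec into one searchable string. The IoSpec shape, the helper names, and the choice to JSON-serialize non-string values are assumptions made for this example, not the code in this PR. It also assumes the 500-character cap applies to input/output annotation values.

```typescript
// Hypothetical spec shape; the real types in the codebase may differ.
interface IoSpec {
  name: string;
  description?: string;
  type?: unknown; // may be a plain string or a structured type object
  annotations?: Record<string, unknown>;
}

// Assumed to match the 500-character cap described in this PR.
const IO_ANNOTATION_VALUE_LIMIT = 500;

// Flatten a value to text; non-strings are JSON-serialized (an assumption).
function ioValueToText(value: unknown): string {
  if (value === undefined || value === null) return "";
  return typeof value === "string" ? value : (JSON.stringify(value) ?? "");
}

// Collect name, description, type, and short annotation values for one spec.
function ioSpecToText(spec: IoSpec): string {
  const parts: string[] = [
    spec.name,
    ioValueToText(spec.description),
    ioValueToText(spec.type),
  ];
  for (const value of Object.values(spec.annotations ?? {})) {
    const text = ioValueToText(value);
    if (text.length <= IO_ANNOTATION_VALUE_LIMIT) parts.push(text);
  }
  return parts.filter((part) => part.length > 0).join(" ");
}

// Build the single "io" searchable string for a component.
function buildIoField(inputs: IoSpec[] = [], outputs: IoSpec[] = []): string {
  return [...inputs, ...outputs]
    .map(ioSpecToText)
    .filter((text) => text.length > 0)
    .join(" ");
}
```

With a field built this way, a query such as parquet can match a component whose input type is parquet even when no input is named that.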

The MatchField type and all related scoring, labeling, and UI display logic have been updated to include metadata alongside the existing fields. Annotation values longer than 500 characters are excluded from indexing to avoid polluting search with large blobs.
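
One way to see why scoring, labeling, and display logic all change together is to key each of them by the MatchField union. In the sketch below, only io and metadata come from this PR. The name and description members, the labels, the weights, and the matchComponent helper are hypothetical stand-ins for whatever the codebase actually uses.

```typescript
// "io" and "metadata" are named in this PR; "name" and "description"
// are hypothetical placeholders for the pre-existing fields.
type MatchField = "name" | "description" | "io" | "metadata";

// Record<MatchField, ...> makes the compiler reject a map that omits a field,
// so adding "metadata" to the union flags every map that must be updated.
const MATCH_FIELD_LABELS: Record<MatchField, string> = {
  name: "Name",
  description: "Description",
  io: "Inputs/Outputs",
  metadata: "Metadata",
};

// Hypothetical weights, chosen only for illustration.
const MATCH_FIELD_WEIGHTS: Record<MatchField, number> = {
  name: 4,
  description: 2,
  io: 1,
  metadata: 1,
};

type SearchableFields = Record<MatchField, string>;

interface MatchResult {
  matchedFields: MatchField[];
  score: number;
}

// Case-insensitive substring match over every searchable field.
function matchComponent(fields: SearchableFields, query: string): MatchResult {
  const needle = query.trim().toLowerCase();
  const matchedFields: MatchField[] = [];
  let score = 0;
  if (needle.length === 0) return { matchedFields, score };
  for (const field of Object.keys(MATCH_FIELD_WEIGHTS) as MatchField[]) {
    if (fields[field].toLowerCase().includes(needle)) {
      matchedFields.push(field);
      score += MATCH_FIELD_WEIGHTS[field];
    }
  }
  return { matchedFields, score };
}
```

The UI can then render MATCH_FIELD_LABELS[field] for each entry in matchedFields, which is how a result would show metadata as a matched field.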

Related Issue and Pull requests

Type of Change

  • Bug fix
  • New feature
  • Improvement
  • Cleanup/Refactor
  • Breaking change
  • Documentation update

Checklist

  • I have tested this does not break current pipelines / runs functionality
  • I have tested the changes on staging

Screenshots (if applicable)

Test Instructions

  1. Open the component dashboard and search for a term that appears in a component's metadata annotations (e.g. a framework name like sklearn or lightgbm).
  2. Verify the result surfaces with metadata listed as a matched field.
  3. Search for a publisher email address and confirm the matching component appears.
  4. Search for a term that exists only in python_original_code or other excluded annotation keys and confirm it does not return results.
  5. Search for an input/output description or type (e.g. parquet, artifact) and confirm results appear with io as the matched field.

Additional Comments

The annotation exclusion list (ANNOTATION_KEYS_EXCLUDED_FROM_SEARCH) and the 500-character value length cap are the primary mechanisms for keeping the metadata index clean. These can be extended as new noisy annotation keys are identified.
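
For reviewers, here is a minimal sketch of how the blocklist and the length cap could combine when building the metadata field. Only ANNOTATION_KEYS_EXCLUDED_FROM_SEARCH, python_original_code, published_by, and the 500-character cap come from this PR. The second blocklist key, the ComponentReferenceLike shape, the sourceLabel parameter, and the choice to index values but not keys are assumptions.

```typescript
const ANNOTATION_KEYS_EXCLUDED_FROM_SEARCH: ReadonlySet<string> = new Set([
  "python_original_code",
  "editor.state", // hypothetical key name standing in for "editor state"
]);

const MAX_ANNOTATION_VALUE_LENGTH = 500;

// Hypothetical shape; the real component reference type may differ.
interface ComponentReferenceLike {
  published_by?: string;
  annotations?: Record<string, unknown>; // component-level metadata annotations
}

function buildMetadataField(
  ref: ComponentReferenceLike,
  sourceLabel?: string, // assumed to be supplied by the caller
): string {
  const parts: string[] = [];
  for (const [key, value] of Object.entries(ref.annotations ?? {})) {
    // Skip blocklisted keys and missing values.
    if (ANNOTATION_KEYS_EXCLUDED_FROM_SEARCH.has(key)) continue;
    if (value === undefined || value === null) continue;
    const text = typeof value === "string" ? value : (JSON.stringify(value) ?? "");
    // Skip empty values and values over the length cap.
    if (text.length === 0 || text.length > MAX_ANNOTATION_VALUE_LENGTH) continue;
    parts.push(text);
  }
  if (sourceLabel) parts.push(sourceLabel);
  if (ref.published_by) parts.push(ref.published_by);
  return parts.join(" ");
}
```

In this sketch, extending the blocklist means adding a string to the set. The length cap acts as a fallback for large values stored under keys that have not been blocklisted yet.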

@github-actions

🎩 Preview

A preview build has been created at: 06-18-expand_component_search_indexing_fields/327681f
