Add source-diverse AI rerank candidate pool #2429

Open
Mbeaulne wants to merge 1 commit into 06-18-parse_negative_constraints_without_not_no_exclude_ from 06-18-build_broader_ai_candidate_pools_for_component_search

@Mbeaulne Mbeaulne commented Jun 18, 2026

Collaborator

Description

The AI rerank candidate pool now uses a source-diverse selection strategy instead of relying purely on lexical hits. Previously, the candidate pool was capped at 50 results drawn entirely from lexical matches, falling back to a plain alphabetical slice when no matches were found. This meant components from underrepresented sources (e.g. user uploads) could be crowded out entirely when a query produced many strong lexical hits from a single source.

The new approach builds the candidate pool in three layers:

  1. Up to 60 of the strongest lexical hits for the query
  2. An evenly-sampled set of up to 8 candidates per source (source-diverse browse)
  3. An evenly-sampled alphabetical slice of the full index to fill remaining slots up to the new cap of 80

The rerank base is now always set to aiCandidateMatches rather than switching between lexical and AI candidate lists depending on whether lexical results existed.
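The three layers above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `IndexEntry` shape, the `LEXICAL_LIMIT` and `PER_SOURCE_LIMIT` constant names, the `buildAiCandidatePool` function, and the signatures given to `appendUniqueMatches` and `sampleEvenly` are all assumptions. Only the limits (60, 8, 80) and the layer order come from the description.

```typescript
// Hypothetical entry shape; the real index types are not shown in this PR text.
interface IndexEntry {
  digest: string;
  name: string;
  source: string;
}

const AI_CANDIDATE_LIMIT = 80; // overall cap from the description
const LEXICAL_LIMIT = 60; // assumed name for the layer-1 limit
const PER_SOURCE_LIMIT = 8; // assumed name for the layer-2 limit

// Pick up to `count` items at uniform intervals across `items`.
function sampleEvenly<T>(items: readonly T[], count: number): T[] {
  if (count <= 0) return [];
  if (items.length <= count) return items.slice();
  const step = items.length / count;
  const picked: T[] = [];
  for (let i = 0; i < count; i++) picked.push(items[Math.floor(i * step)]);
  return picked;
}

// Append entries whose digest has not been seen, stopping at `limit`.
function appendUniqueMatches(
  candidates: IndexEntry[],
  seenDigests: Set<string>,
  incoming: readonly IndexEntry[],
  limit: number,
): void {
  for (const entry of incoming) {
    if (candidates.length >= limit) return;
    if (seenDigests.has(entry.digest)) continue;
    seenDigests.add(entry.digest);
    candidates.push(entry);
  }
}

// `lexicalHits` is assumed to arrive sorted strongest-first.
function buildAiCandidatePool(
  lexicalHits: readonly IndexEntry[],
  index: readonly IndexEntry[],
): IndexEntry[] {
  const candidates: IndexEntry[] = [];
  const seenDigests = new Set<string>();
  const sortedIndex = index.slice().sort((a, b) => a.name.localeCompare(b.name));

  // Layer 1: strongest lexical hits.
  appendUniqueMatches(candidates, seenDigests, lexicalHits.slice(0, LEXICAL_LIMIT), AI_CANDIDATE_LIMIT);

  // Layer 2: an even sample from each source.
  const bySource = new Map<string, IndexEntry[]>();
  for (const entry of sortedIndex) {
    const group = bySource.get(entry.source) ?? [];
    group.push(entry);
    bySource.set(entry.source, group);
  }
  bySource.forEach((group) => {
    appendUniqueMatches(candidates, seenDigests, sampleEvenly(group, PER_SOURCE_LIMIT), AI_CANDIDATE_LIMIT);
  });

  // Layer 3: an even alphabetical sample of the full index fills what is left.
  appendUniqueMatches(candidates, seenDigests, sampleEvenly(sortedIndex, AI_CANDIDATE_LIMIT), AI_CANDIDATE_LIMIT);
  return candidates;
}
```

Because lexical hits are appended first and every layer shares the same `seenDigests` set, later layers can only add entries that are not already in the pool, and the cap is enforced at each append.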

Related Issue and Pull requests

Type of Change

  • Bug fix
  • New feature
  • Improvement
  • Cleanup/Refactor
  • Breaking change
  • Documentation update

Checklist

  • I have tested this does not break current pipelines / runs functionality
  • I have tested the changes on staging

Screenshots (if applicable)

Test Instructions

  1. Open the component search panel in the editor.
  2. Enter a query that matches many components from a single source (e.g. a library with 100+ entries).
  3. Click the AI rerank button and verify that components from other sources (e.g. user-uploaded files) still appear in the ranked results.
  4. Verify that the total candidate count sent to the reranker does not exceed 80.
  5. Run the unit tests in componentSearchV2Logic.test.ts to confirm the new source-diversity test passes.

Additional Comments

The new sampleEvenly helper picks items at uniform intervals so that the browse sample is representative of the full sorted list rather than just the top entries.
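As an illustration of the difference between an even sample and a top slice, here is a minimal sketch of a helper with the same name. The signature and the rounding rule (`Math.floor` of a fractional step) are assumptions; the actual `sampleEvenly` in this PR may pick indices differently.

```typescript
// Assumed signature: return up to `count` items spaced uniformly across `items`.
function sampleEvenly<T>(items: readonly T[], count: number): T[] {
  if (count <= 0) return [];
  if (items.length <= count) return items.slice();
  const step = items.length / count; // fractional stride through the list
  const picked: T[] = [];
  for (let i = 0; i < count; i++) picked.push(items[Math.floor(i * step)]);
  return picked;
}

const sortedNames = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"];

// A plain slice only ever sees the start of the sorted list.
const topSlice = sortedNames.slice(0, 4); // ["a", "b", "c", "d"]

// The even sample uses step 2.5, so it reads indices 0, 2, 5, 7.
const evenSample = sampleEvenly(sortedNames, 4); // ["a", "c", "f", "h"]
```

With this rounding rule the first element is always included and the last may not be; that trade-off is a property of the sketch, not a claim about the PR's helper.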

github-actions Bot commented Jun 18, 2026


🎩 Preview

A preview build has been created at: 06-18-build_broader_ai_candidate_pools_for_component_search/88f3546

@Mbeaulne Mbeaulne changed the title from "Build broader AI candidate pools for component search" to "Add source-diverse AI rerank candidate pool" Jun 18, 2026
@Mbeaulne Mbeaulne marked this pull request as ready for review June 18, 2026 17:56
@Mbeaulne Mbeaulne requested a review from a team as a code owner June 18, 2026 17:56
appendUniqueMatches(
candidates,
seenDigests,
sampleEvenly(sortedIndex, AI_CANDIDATE_LIMIT).map(indexEntryToLexicalMatch),

Collaborator Author

🤖 This is an AI-generated code review comment.

[MEDIUM] The source-diversity layer (buildSourceDiverseBrowseMatches) and this alphabetical-fill layer run unconditionally, padding the AI candidate pool toward the cap (AI_CANDIDATE_LIMIT = 80) with alphabetically-early, lexically-irrelevant components even when lexical search already returned a strong source-spanning set.

This is not a correctness bug — lexical hits are preserved first (appended before the fill layers), and RERANK_EXCLUSION_THRESHOLD keeps junk from being badged. But it sends more low-signal candidates to the billed reranker on every rerank (cost/latency), and can surface irrelevant items in the unbadged tail.

Optional fix: only run the fill layer when the pool is under a smaller floor, or skip the alphabetical fill when lexical + diversity already produced a source-diverse set. Worth confirming reranker cost at 80 vs 50 candidates is acceptable.
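One possible reading of the optional fix, sketched as a guard in front of the alphabetical fill. Everything here beyond `AI_CANDIDATE_LIMIT = 80` is hypothetical: the `Candidate` shape, the `FILL_FLOOR` and `MIN_DISTINCT_SOURCES` constants and their values, and the `shouldRunAlphabeticalFill` name. It combines both suggestions by skipping the fill only when the pool is already above the floor and spans more than one source.

```typescript
// Hypothetical candidate shape for illustration only.
interface Candidate {
  digest: string;
  source: string;
}

const AI_CANDIDATE_LIMIT = 80; // cap named in the review comment
const FILL_FLOOR = 50; // assumed floor; 50 mirrors the previous cap
const MIN_DISTINCT_SOURCES = 2; // assumed threshold for "source-diverse"

function countDistinctSources(pool: readonly Candidate[]): number {
  return new Set(pool.map((c) => c.source)).size;
}

// Decide whether the alphabetical fill layer should run for this pool.
function shouldRunAlphabeticalFill(pool: readonly Candidate[]): boolean {
  if (pool.length >= AI_CANDIDATE_LIMIT) return false; // nothing left to fill
  const alreadyDiverse = countDistinctSources(pool) >= MIN_DISTINCT_SOURCES;
  // Fill when the pool is small, or when it is still dominated by one source.
  return pool.length < FILL_FLOOR || !alreadyDiverse;
}
```

Under these assumed thresholds, a 55-candidate pool spanning two sources would go to the reranker as-is, while a 10-candidate pool or a single-source pool would still be padded. Whether the saved reranker calls justify the extra branch depends on the cost check suggested above.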

@Mbeaulne Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from d363ca7 to 60b076d Compare June 18, 2026 19:12
@Mbeaulne Mbeaulne force-pushed the 06-18-parse_negative_constraints_without_not_no_exclude_ branch from 97e37c0 to 4f20ff2 Compare June 18, 2026 19:12
@Mbeaulne Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from 60b076d to 455266e Compare June 18, 2026 20:28
@Mbeaulne Mbeaulne force-pushed the 06-18-parse_negative_constraints_without_not_no_exclude_ branch from 4f20ff2 to 638c7b7 Compare June 18, 2026 20:28
@Mbeaulne Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from 455266e to 4a246ee Compare June 18, 2026 20:49
@Mbeaulne Mbeaulne force-pushed the 06-18-parse_negative_constraints_without_not_no_exclude_ branch from 638c7b7 to 3f91762 Compare June 18, 2026 20:49
@Mbeaulne Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from 4a246ee to 8cc6222 Compare June 18, 2026 21:02
@Mbeaulne Mbeaulne force-pushed the 06-18-parse_negative_constraints_without_not_no_exclude_ branch from 3f91762 to 790c426 Compare June 18, 2026 21:02
@Mbeaulne Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from 8cc6222 to 88f3546 Compare June 18, 2026 21:16
@Mbeaulne Mbeaulne force-pushed the 06-18-parse_negative_constraints_without_not_no_exclude_ branch from 790c426 to 554c927 Compare June 18, 2026 21:16