
feat(embedding): add auto_rebuild for embedding dimension mismatch#2740

Open
wangling12 wants to merge 1 commit into volcengine:main from wangling12:main

@wangling12

Description

Add an embedding.auto_rebuild configuration option that automatically rebuilds the vector index when the embedding dimension in the configuration differs from that of the existing collection. Previously, a dimension mismatch between the config and the existing collection raised EmbeddingRebuildRequiredError with no automated recovery path. This change also extracts the duplicated auto-rebuild logic into a shared helper, handles drop_collection() failures explicitly (a failed drop previously caused infinite recursion), and adds a recursion guard.
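The decision described above can be sketched as a small pure function. This is a hypothetical illustration, not the code in collection_schemas.py: the EmbeddingConfig dataclass below models only a dimension field and the new auto_rebuild flag, and resolve_dimension is an invented name.

```python
from dataclasses import dataclass
from typing import Optional


class EmbeddingRebuildRequiredError(RuntimeError):
    """Stand-in for the project's exception of the same name."""


@dataclass
class EmbeddingConfig:
    # Hypothetical minimal config: only the fields this sketch needs.
    dimension: int
    auto_rebuild: bool = False  # the PR documents false as the default


def resolve_dimension(existing_dim: Optional[int], cfg: EmbeddingConfig) -> str:
    """Decide what to do with an existing collection.

    Returns "create" when no collection exists, "reuse" when the dimensions
    match, and "rebuild" when they differ and auto_rebuild is enabled.
    Raises EmbeddingRebuildRequiredError otherwise.
    """
    if existing_dim is None:
        return "create"
    if existing_dim == cfg.dimension:
        return "reuse"
    if cfg.auto_rebuild:
        return "rebuild"
    raise EmbeddingRebuildRequiredError(
        f"Collection dimension {existing_dim} != configured {cfg.dimension}; "
        "enable embedding.auto_rebuild or rebuild manually."
    )


print(resolve_dimension(1024, EmbeddingConfig(dimension=1024)))                    # reuse
print(resolve_dimension(768, EmbeddingConfig(dimension=1024, auto_rebuild=True)))  # rebuild
```

Keeping the decision separate from storage calls makes the "disabled" path trivially safe: nothing destructive can happen before the error is raised.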

Related Issue

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

  • Add auto_rebuild boolean field to EmbeddingConfig (default: false) in embedding_config.py
  • Extract _auto_rebuild_collection helper function in collection_schemas.py to deduplicate the auto-rebuild logic that was previously copy-pasted across two dimension-mismatch branches (old collections without embedding metadata, and collections with embedding metadata)
  • Check the return value of drop_collection() and raise EmbeddingRebuildRequiredError when the drop fails, instead of silently recursing into an infinite loop
  • Add _allow_recurse guard parameter to init_context_collection to prevent recursive rebuild cycles
  • Add 4 test cases in test_collection_schemas.py covering all auto-rebuild paths
  • Document embedding.auto_rebuild in both EN and ZH configuration guides (docs/en/guides/01-configuration.md, docs/zh/guides/01-configuration.md)

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

New test cases:

  • test_auto_rebuild_drops_and_recreates_on_dimension_mismatch — verifies collection is dropped and recreated when dimension mismatches and auto_rebuild=true
  • test_auto_rebuild_disabled_raises_error_on_dimension_mismatch — verifies EmbeddingRebuildRequiredError is raised when auto_rebuild is not set
  • test_auto_rebuild_raises_error_when_drop_fails — verifies the core bug fix: drop_collection() returning False raises an error instead of infinite recursion
  • test_auto_rebuild_old_collection_without_embedding_metadata — verifies auto-rebuild works for legacy collections that lack embedding metadata but have a Dimension field
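Three of these scenarios can be sketched as plain pytest-style test functions. FakeStorage and init_collection are hypothetical simplifications, not the project's fixtures or its init_context_collection; the sketch only assumes, as the PR describes, that drop_collection() is async and returns a bool, and that a recursion guard limits the rebuild to one nested call.

```python
import asyncio
from typing import Optional


class EmbeddingRebuildRequiredError(RuntimeError):
    """Stand-in for the project's exception of the same name."""


class FakeStorage:
    """Hypothetical in-memory storage; drop_collection() returns a bool."""

    def __init__(self, dimension: Optional[int], drop_succeeds: bool = True):
        self.dimension = dimension
        self.drop_succeeds = drop_succeeds
        self.drop_calls = 0

    async def drop_collection(self) -> bool:
        self.drop_calls += 1
        if self.drop_succeeds:
            self.dimension = None  # the collection no longer exists
        return self.drop_succeeds

    async def create_collection(self, dimension: int) -> None:
        self.dimension = dimension


async def init_collection(storage, dim, auto_rebuild=False, _allow_recurse=True):
    """Simplified, hypothetical stand-in for init_context_collection."""
    if storage.dimension is None:
        await storage.create_collection(dim)
        return
    if storage.dimension == dim:
        return  # dimensions match: reuse the collection
    if not auto_rebuild:
        raise EmbeddingRebuildRequiredError("dimension mismatch")
    if not _allow_recurse:
        raise EmbeddingRebuildRequiredError("recursion detected")
    if not await storage.drop_collection():
        raise EmbeddingRebuildRequiredError("drop failed")  # no retry loop
    await init_collection(storage, dim, auto_rebuild, _allow_recurse=False)


def expect_rebuild_error(coro) -> None:
    """Run coro and assert that it raises EmbeddingRebuildRequiredError."""
    try:
        asyncio.run(coro)
    except EmbeddingRebuildRequiredError:
        return
    raise AssertionError("expected EmbeddingRebuildRequiredError")


def test_rebuild_on_mismatch():
    storage = FakeStorage(dimension=768)
    asyncio.run(init_collection(storage, 1024, auto_rebuild=True))
    assert storage.dimension == 1024 and storage.drop_calls == 1


def test_disabled_raises():
    storage = FakeStorage(dimension=768)
    expect_rebuild_error(init_collection(storage, 1024))
    assert storage.drop_calls == 0  # nothing is dropped when disabled


def test_drop_failure_raises():
    storage = FakeStorage(dimension=768, drop_succeeds=False)
    expect_rebuild_error(init_collection(storage, 1024, auto_rebuild=True))
    assert storage.drop_calls == 1  # one attempt, no infinite recursion


if __name__ == "__main__":
    for test in (test_rebuild_on_mismatch, test_disabled_raises, test_drop_failure_raises):
        test()
    print("ok")
```

The drop-failure test asserts on the number of drop attempts rather than on a timeout, so a regression back to unbounded recursion would surface as a wrong count or a RecursionError instead of a hang.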

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Screenshots (if applicable)

N/A

Additional Notes

@github-actions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🏅 Score: 92
🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Logic Bug

The allow_recurse check happens after the collection is dropped. If allow_recurse=False, the helper drops the collection and then raises an error, leaving the collection deleted and never recreated. The check should be moved before the drop operation.

dropped = await storage.drop_collection()
if not dropped:
    raise EmbeddingRebuildRequiredError(
        "Failed to drop existing collection for auto-rebuild. "
        "Manual rebuild required."
    )
if not allow_recurse:
    raise EmbeddingRebuildRequiredError(
        "Auto-rebuild recursion detected. Manual rebuild required."
    )
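A minimal sketch of the reordering the reviewer suggests: check the recursion guard before the destructive drop. auto_rebuild_collection and RecordingStorage are hypothetical stand-ins; the real helper's full signature is not shown on this page, and the error messages are copied from the snippet above.

```python
import asyncio


class EmbeddingRebuildRequiredError(RuntimeError):
    """Stand-in for the project's exception of the same name."""


class RecordingStorage:
    """Hypothetical storage stub that records whether a drop happened."""

    def __init__(self):
        self.dropped = False

    async def drop_collection(self) -> bool:
        self.dropped = True
        return True


async def auto_rebuild_collection(storage, allow_recurse: bool) -> None:
    # 1. Refuse to recurse *before* touching storage, so a guarded call
    #    leaves the existing collection intact.
    if not allow_recurse:
        raise EmbeddingRebuildRequiredError(
            "Auto-rebuild recursion detected. Manual rebuild required."
        )
    # 2. Only now perform the destructive step, and treat failure as fatal.
    dropped = await storage.drop_collection()
    if not dropped:
        raise EmbeddingRebuildRequiredError(
            "Failed to drop existing collection for auto-rebuild. "
            "Manual rebuild required."
        )
    # 3. The caller would re-run collection initialisation here with
    #    recursion disabled (omitted in this sketch).


storage = RecordingStorage()
try:
    asyncio.run(auto_rebuild_collection(storage, allow_recurse=False))
except EmbeddingRebuildRequiredError:
    pass
print(storage.dropped)  # False: the guard fired before any drop
```

With allow_recurse=True the same call drops the collection, so the guarded path is the only one that can raise without side effects.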

@github-actions

PR Code Suggestions ✨

No code suggestions found for the PR.

Add embedding.auto_rebuild config to automatically rebuild vector index
when embedding dimension changes between config and existing collection.

- Add auto_rebuild field to EmbeddingConfig (default: false)
- Extract _auto_rebuild_collection helper with drop failure handling
  and recursion guard to prevent infinite loops
- Add 4 test cases covering success, disabled, drop-failure, and
  legacy-collection paths
- Document auto_rebuild in EN/ZH configuration guides
wangling12 force-pushed the main branch 4 times, most recently from 3d3848f to 35ccc64 on June 21, 2026 02:54

Projects

Status: Backlog
