
bug: incremental indexing can leave stale indexed locations when languages are disabled #142

@kapral18

Description


Blocked

This issue is blocked on PR #135 being merged or adopted.

If #135 is not merged, this issue likely needs re-scoping or closing.


Problem

Incremental indexing computes supportedExtensions from SEMANTIC_CODE_INDEXER_LANGUAGES. Previously, for modified files, stale indexed locations were removed only when the modified file's extension was supported by the current configuration.

As a result, changing SEMANTIC_CODE_INDEXER_LANGUAGES (or temporarily disabling a language) can leave stale documents or stale filePaths entries in Elasticsearch for files that were indexed earlier.
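To illustrate, the gating described above behaves roughly like the minimal TypeScript sketch below. The function name planModifiedBuggy, its input (a plain list of modified paths), and the assumption that supportedExtensions holds dot-prefixed extensions such as ".ts" are all hypothetical; this is not the indexer's actual code.

```ts
import { extname } from "node:path";

// Hypothetical reconstruction of the problematic handling of modified (M) files.
// Assumes supportedExtensions holds dot-prefixed extensions, e.g. ".ts".
function planModifiedBuggy(
  modifiedPaths: string[],
  supportedExtensions: ReadonlySet<string>,
): { toDelete: string[]; toIndex: string[] } {
  const toDelete: string[] = [];
  const toIndex: string[] = [];
  for (const filePath of modifiedPaths) {
    // Unsupported extensions are skipped entirely, so their previously
    // indexed documents/locations are never passed to the delete step.
    if (!supportedExtensions.has(extname(filePath))) continue;
    toDelete.push(filePath);
    toIndex.push(filePath);
  }
  return { toDelete, toIndex };
}

// Python files were indexed earlier, but Python has since been disabled.
const plan = planModifiedBuggy(["src/a.ts", "src/b.py"], new Set([".ts"]));
console.log(plan.toDelete); // [ 'src/a.ts' ]
```

Here src/b.py is neither deleted nor re-indexed, so whatever was stored for it before the language was disabled stays in the index.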

Suggested fix

  • Always call deleteDocumentsByFilePaths for any changed/deleted paths (M/D/R-old), regardless of whether the extension is currently supported.
  • Continue to only parse/enqueue files for indexing if the extension is supported.
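A minimal sketch of the suggested fix follows, assuming the changed paths come from plain (non -z) `git diff --name-status` output and that supportedExtensions holds dot-prefixed extensions. The names planIncrementalChanges and IncrementalPlan are hypothetical; the resulting pathsToDelete list is what would be handed to deleteDocumentsByFilePaths.

```ts
import { extname } from "node:path";

// Hypothetical result shape for one incremental run.
interface IncrementalPlan {
  pathsToDelete: string[]; // old documents/locations to remove from the index
  pathsToIndex: string[]; // files to parse and enqueue for (re)indexing
}

// Sketch: deletion is decided by git status alone; the extension check
// only gates parsing/enqueueing. Assumes dot-prefixed extensions (".ts").
function planIncrementalChanges(
  nameStatusOutput: string,
  supportedExtensions: ReadonlySet<string>,
): IncrementalPlan {
  const toDelete = new Set<string>();
  const toIndex = new Set<string>();
  const isSupported = (p: string) => supportedExtensions.has(extname(p));

  for (const line of nameStatusOutput.split("\n")) {
    if (line.trim() === "") continue;
    const [status, ...paths] = line.split("\t");
    const kind = status.charAt(0); // e.g. "R100" -> "R"

    switch (kind) {
      case "A": {
        if (isSupported(paths[0])) toIndex.add(paths[0]);
        break;
      }
      case "M": {
        toDelete.add(paths[0]); // always clear old entries, even for disabled languages
        if (isSupported(paths[0])) toIndex.add(paths[0]);
        break;
      }
      case "D": {
        toDelete.add(paths[0]);
        break;
      }
      case "R": {
        const [oldPath, newPath] = paths;
        toDelete.add(oldPath); // R-old is deleted unconditionally
        if (isSupported(newPath)) toIndex.add(newPath);
        break;
      }
      default:
        // Other statuses (C, T, U, ...) are out of scope for this sketch.
        break;
    }
  }
  return {
    pathsToDelete: Array.from(toDelete),
    pathsToIndex: Array.from(toIndex),
  };
}
```

For example, `planIncrementalChanges("M\tfoo.unsupported\nM\tsrc/a.ts\n", new Set([".ts"]))` returns pathsToDelete `["foo.unsupported", "src/a.ts"]` and pathsToIndex `["src/a.ts"]`, which matches the behavior the unit test below is meant to check. Using Sets avoids passing the same path to the delete call twice.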

Test plan

  • Unit test: feed git diff output containing an M\tfoo.unsupported line and assert that foo.unsupported is included in the deleteDocumentsByFilePaths arguments even though it is not enqueued.

Metadata


Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
