Blocked
This issue is blocked on PR #135 being merged/adopted: #135
If #135 is not merged, this issue likely needs re-scoping or closing.
Problem
Incremental indexing computes supportedExtensions from SEMANTIC_CODE_INDEXER_LANGUAGES. For modified files, stale indexed locations were previously only removed when the modified file's extension is currently supported.
That means changing SEMANTIC_CODE_INDEXER_LANGUAGES (or temporarily disabling a language) can leave stale documents / stale filePaths entries in Elasticsearch for files that were previously indexed.
Suggested fix
- Always call deleteDocumentsByFilePaths for any changed/deleted paths (M/D/R-old), regardless of whether the extension is currently supported.
- Continue to only parse/enqueue files for indexing if the extension is supported.
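A minimal sketch of the split described above, assuming the diff arrives as `git diff --name-status` text (tab-separated status and paths). Only deleteDocumentsByFilePaths and SEMANTIC_CODE_INDEXER_LANGUAGES come from this issue; `planIncrementalUpdate`, `extensionOf`, and the `IncrementalPlan` shape are hypothetical names used for illustration and may not match the indexer's actual code.

```typescript
// Hypothetical result shape: what to delete vs. what to (re)index.
interface IncrementalPlan {
  pathsToDelete: string[]; // would be passed to deleteDocumentsByFilePaths
  pathsToIndex: string[];  // would be parsed/enqueued for indexing
}

// Hypothetical helper: lower-cased extension including the dot, or '' if none.
function extensionOf(filePath: string): string {
  const base = filePath.slice(filePath.lastIndexOf('/') + 1);
  const dot = base.lastIndexOf('.');
  return dot > 0 ? base.slice(dot).toLowerCase() : '';
}

// Sketch only: handles A, M, D and R entries; other statuses are ignored here.
function planIncrementalUpdate(
  nameStatusOutput: string,
  supportedExtensions: Set<string>, // assumed to be derived from SEMANTIC_CODE_INDEXER_LANGUAGES
): IncrementalPlan {
  const pathsToDelete = new Set<string>();
  const pathsToIndex = new Set<string>();

  for (const line of nameStatusOutput.split(/\r?\n/)) {
    if (line.trim() === '') continue;
    const [status, firstPath, secondPath] = line.split('\t');
    if (!status || !firstPath) continue;
    const kind = status.charAt(0); // 'R100' -> 'R'

    // Deletion is unconditional: the extension is deliberately NOT checked.
    if (kind === 'M' || kind === 'D' || kind === 'R') {
      pathsToDelete.add(firstPath);
    }

    // Indexing is still gated on the currently supported extensions.
    let target: string | undefined;
    if (kind === 'A' || kind === 'M') target = firstPath;
    else if (kind === 'R') target = secondPath;
    if (target && supportedExtensions.has(extensionOf(target))) {
      pathsToIndex.add(target);
    }
  }

  return { pathsToDelete: [...pathsToDelete], pathsToIndex: [...pathsToIndex] };
}
```

With an input such as `M\tfoo.unsupported` and a supported set of only `.ts`, this sketch puts `foo.unsupported` in `pathsToDelete` and leaves it out of `pathsToIndex`, which is the behavior the fix asks for.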
Test plan
- Unit test: feed git diff output containing an M\tfoo.unsupported entry and assert that the path is included in the deleteDocumentsByFilePaths args even though it is not enqueued.