…f aborting the cycle

A single CSV file in the scan set that does not match any pattern in csv-config.yaml used to abort the entire vectorization cycle, preventing all other files (including brand new / modified markdown) from being indexed. In follow mode on production this meant every 5-minute cycle failed and no new documents were vectorized.

Introduce a sentinel error `csv.ErrNoCSVConfig` and teach `VectorizerService.expandCSVFiles` to recognise it: such files are now logged as a warning and skipped while the rest of the batch continues. All other CSV errors (parse failures, invalid headers, IO errors, etc.) still propagate unchanged.

- Add csv.ErrNoCSVConfig sentinel and wrap the four "no pattern matches" call sites in internal/ingestion/csv/reader.go with it.
- Update expandCSVFiles to errors.Is check, warn+skip, and emit a summary line when any files were skipped.
- Update the existing csv reader test to assert on ErrNoCSVConfig via errors.Is, and add expand_csv_test.go covering skip behaviour, error propagation for non-config errors, and the all-unmatched edge case.
Summary
Fixes an issue where, in follow mode, a single misconfigured CSV file made every vectorize cycle fail in its entirety, so that no new or updated files (including Markdown) were indexed.
Cause
`VectorizerService.expandCSVFiles`, introduced in #59 (18aefdf), propagated the "no pattern matches" error returned by `csvReader.Read{File,Content}` straight upstream, so the whole batch failed fast.
Changes
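- Add the sentinel `ErrNoCSVConfig` to `internal/ingestion/csv/errors.go`.
- In `reader.go`, rewrite the relevant error at four call sites (`ReadFile` / `ReadContent` / `GetDetectedColumns` / `GetDetectedColumnsFromContent`) to `fmt.Errorf("%w: %s", ErrNoCSVConfig, path)`. The error message still carries the file path.
- In `service.go`, add an `errors.Is(err, csv.ErrNoCSVConfig)` branch to `expandCSVFiles`: matching files are skipped with `log.Printf("Warning: Skipping CSV file %s: ...")` and the cycle continues. Other errors (IO / parse / invalid header, etc.) propagate as before. The number of skipped files is printed as a one-line summary.
- Update `TestReader_ReadFile_NoMatchingPattern` in `reader_test.go` to assert via `errors.Is(err, ErrNoCSVConfig)`, and also assert that the error contains the file path.
- Add `expand_csv_test.go` with three cases: (a) unmatched CSV files are skipped and only the matching ones are expanded, (b) errors other than `ErrNoCSVConfig` propagate, (c) Markdown files still pass through even when every CSV file is unmatched.
Non-goals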
… f0b8154/f22c0ec/f16c55d) are unrelated to this issue (they concern the search-response side).
Verification
Added three tests, `TestExpandCSVFiles_SkipsFilesWithoutMatchingConfig` / `_PropagatesNonConfigErrors` / `_AllUnmatchedStillSucceeds`, and updated `TestReader_ReadFile_NoMatchingPattern`; all pass.