feat: incremental folder scan - skip unchanged files via MD5 manifest #239
jwchmodx wants to merge 1 commit into HKUDS:main from
…KUDS#156)

When `incremental=True` is passed, `process_folder_complete` computes the MD5 digest of each discovered file and compares it against a per-folder manifest stored in `config.working_dir`. Files whose digest matches the manifest are skipped. New or changed files are (re-)processed. Failed files are removed from the manifest so they are automatically retried on the next run.

New helpers on `BatchMixin`:
- `_file_md5(path)` – compute the hex MD5 digest of a file
- `_manifest_path(folder_path)` – locate the JSON manifest for a folder
- `_load_manifest(path)` / `_save_manifest(path, data)` – read/write the manifest

New test file: `tests/test_incremental_folder_scan.py` (12 tests)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
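Below is a minimal standalone sketch of what the four helpers could look like. They are written as plain functions rather than `BatchMixin` methods. Several details are illustrative assumptions and not the PR's actual code: the names `file_md5`, `manifest_path`, `load_manifest` and `save_manifest`, the explicit `working_dir` argument, and the `incremental_manifest_<hash>.json` file-name pattern.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path

# Hypothetical standalone versions of the PR's BatchMixin helpers.


def file_md5(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_path(working_dir: str | Path, folder_path: str | Path) -> Path:
    """Locate the JSON manifest for one source folder.

    The file name is derived from a hash of the folder's resolved path, so
    several source folders can share one working_dir without clashing.
    The naming pattern is an assumption for illustration.
    """
    key = hashlib.md5(str(Path(folder_path).resolve()).encode("utf-8")).hexdigest()
    return Path(working_dir) / f"incremental_manifest_{key}.json"


def load_manifest(path: Path) -> dict[str, str]:
    """Read a {file_path: md5_hex} manifest; missing or corrupt means empty."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            data = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
    return data if isinstance(data, dict) else {}


def save_manifest(path: Path, data: dict[str, str]) -> None:
    """Write the manifest to a temp file, then rename it over the target,
    so an interrupted write does not leave a truncated manifest behind."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2, sort_keys=True)
    tmp.replace(path)
```

In this sketch, a missing or unreadable manifest is treated as empty. A damaged manifest therefore degrades to a full rescan instead of an error. MD5 is used here only for change detection, not as a security measure.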
Thanks for your contribution!
Summary
Closes #156: adds `incremental=True` to `process_folder_complete` so that unchanged files are skipped on subsequent runs.
How it works
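When `incremental=True` is passed:
- `process_folder_complete` computes the MD5 digest of every discovered file.
- Files whose digest matches the manifest are skipped. New or changed files are (re-)processed, and files that fail are removed from the manifest so they are retried on the next run.
- The manifest is stored as JSON in `config.working_dir` (one manifest per source folder, keyed by a hash of the folder path so multiple folders coexist safely).

The `incremental=False` default means existing code is completely unaffected.
New API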
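Here is a minimal sketch of one scan pass, showing how the `incremental` flag is meant to change behavior. It assumes the manifest is a plain `{resolved_path: md5_hex}` dict and that processing reports success as a boolean. `incremental_pass` and its `process` callback are hypothetical names, not part of RAG-Anything's API:

```python
from __future__ import annotations

import hashlib
from pathlib import Path
from typing import Callable, Iterable


def md5_of(path: Path) -> str:
    """Hex MD5 of the whole file (a chunked read would suit large files better)."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def incremental_pass(
    files: Iterable[Path],
    manifest: dict[str, str],
    process: Callable[[Path], bool],  # hypothetical: True on success
    incremental: bool = True,
) -> tuple[list[Path], list[Path], dict[str, str]]:
    """Return (processed, skipped, updated_manifest).

    With incremental=False every file is processed and the manifest is
    returned unchanged, so the caller has nothing new to persist.
    """
    processed: list[Path] = []
    skipped: list[Path] = []
    updated = dict(manifest)
    for path in files:
        key = str(path.resolve())
        # Hash before processing: if the file changes mid-run, the stored
        # digest is stale and the next run re-processes it.
        digest = md5_of(path) if incremental else None
        if incremental and updated.get(key) == digest:
            skipped.append(path)  # unchanged since the last successful run
            continue
        ok = process(path)
        processed.append(path)
        if not incremental:
            continue
        if ok:
            updated[key] = digest  # remember the content that was indexed
        else:
            updated.pop(key, None)  # forget it so the next run retries
    return processed, skipped, updated
```

In this sketch, the caller would write `updated` back to disk only when `incremental=True`. That choice is consistent with the "no side effects when `incremental=False`" behavior described in the test plan.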
Changes
- `raganything/batch.py`: `incremental` param + 4 helper methods + manifest update logic
- `tests/test_incremental_folder_scan.py`: new test file (12 tests)
Test plan
- `test_processes_all_files_on_first_run`: first run always processes everything
- `test_skips_unchanged_files_on_second_run`: unchanged files are skipped
- `test_reprocesses_changed_file`: a modified file is re-processed
- `test_processes_newly_added_file`: new files are picked up
- `test_failed_file_is_retried_next_run`: failed files are removed from the manifest
- `test_non_incremental_does_not_create_manifest`: no side effects when `incremental=False`

All 12 pass. The 4 pre-existing failures in `test_callbacks` and `test_chinese_cid_font` are unrelated to this change.

🤖 Generated with Claude Code