
Support repeated --glob/--exclude flags on download/delete/list#769

Merged
jjjake merged 3 commits into master from feature/repeatable-glob-flags on May 15, 2026


jjjake commented on May 14, 2026

Summary

  • ia download, ia delete, and ia list now accept --glob (and, on ia download, --exclude) repeated, pipe-separated, or mixed; all four forms below produce the union of their patterns. Existing single-flag and pipe usage is unchanged.
  • Item.get_files() now pipe-splits list-element patterns too, so API callers can pass mixed forms like glob_pattern=['*.mp4|*.xml', '*.jpg'].
    Form                     Before                   After
    --glob "a"               works                    works
    --glob "a|b"             works (split on |)       works (split on |)
    --glob a --glob b        silently keeps only b    works
    --glob "a|b" --glob c    silently keeps only c    works
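The "Before" column follows from argparse's defaults. This minimal standalone sketch (not the ia CLI's actual parser; only a lone --glob flag is mirrored) contrasts the default store action with nargs=1, action="extend":

```python
import argparse

# Before: the default action ("store") overwrites the value on each
# repetition, so only the last --glob survives.
before = argparse.ArgumentParser()
before.add_argument("--glob")

# After: nargs=1 wraps each occurrence in a one-item list and
# action="extend" (Python 3.8+) concatenates them into a single list.
after = argparse.ArgumentParser()
after.add_argument("--glob", nargs=1, action="extend")

argv = ["--glob", "a", "--glob", "b"]
print(before.parse_args(argv).glob)  # b
print(after.parse_args(argv).glob)   # ['a', 'b']
print(after.parse_args([]).glob)     # None (flag never given)
```

Two consequences of argparse's semantics may explain the nargs=1 choice, though the PR does not state its reasons. Without nargs, action="extend" would iterate over the string and add its individual characters. With nargs="+", --glob would greedily consume following positional arguments such as an item identifier.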

Implementation notes

  • A new _flatten_pipe_patterns() helper in item.py normalizes a glob argument (a str, a list, or a list whose elements contain |-separated patterns) into a flat list. Item.get_files() uses it for both glob_pattern and exclude_pattern.
  • The CLI flags switched to nargs=1, action="extend", the same pattern used by --format. args.glob / args.exclude are now list[str] | None; downstream code already accepted both shapes via Item.get_files().
  • ia_list's local pipe-split was updated to flatten the new list shape via itertools.chain.
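A minimal sketch of such a helper, assuming only the behavior described here (str, list, or mixed input, with | as the separator). It is not the PR's actual code, and returning an empty list for None or empty input is an assumption:

```python
from __future__ import annotations

from itertools import chain


def flatten_pipe_patterns(patterns: str | list[str] | None) -> list[str]:
    """Normalize a glob argument into a flat list of patterns.

    Accepts a single string, a list of strings, or None; each string
    may itself contain several patterns separated by "|".
    """
    if not patterns:
        return []  # assumption: no patterns means "no filter"
    if isinstance(patterns, str):
        patterns = [patterns]
    # Split every element on "|" and chain the pieces into one list.
    return list(chain.from_iterable(p.split("|") for p in patterns))


print(flatten_pipe_patterns(["*.mp4|*.xml", "*.jpg"]))  # ['*.mp4', '*.xml', '*.jpg']
```

Treating a bare str as a one-element list keeps the single-flag and pipe forms on the same code path as the new list shape.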

This PR also carries forward release scaffolding that was sitting uncommitted in the working tree before branching: changelog entries for #753/#767/#768 and the 5.9.0.dev1 version bump.

Test plan

  • pytest tests/test_item.py tests/cli/test_ia_download.py tests/cli/test_ia_list.py tests/cli/test_ia_delete.py: 63 passed
  • pytest tests/ (excluding live-network tests): 239 passed, 37 skipped
  • ruff check reports no issues on all touched files
  • ia download --help renders the new help text
  • Smoke test against a real archive.org item once CI passes:
    ia download nasa --glob "*.jpg" --glob "*meta.xml" --dry-run
    ia download nasa --glob "*.jpg|*meta.xml" --glob "*.gif" --dry-run
    ia list nasa --glob "*.jpg" --glob "*meta.xml"
    
  • Existing live-network tests (test_glob, test_exclude) continue to pass on CI; they exercise backward compatibility for the single-flag and pipe forms.

🤖 Generated with Claude Code

jjjake and others added 3 commits May 14, 2026 12:55
Previously --glob accepted a single string and split on `|` to form
multiple patterns. Users would naturally reach for `--glob A --glob B`,
which argparse silently truncated to the last value -- files were
missed without warning. This change accepts all four forms consistently:

    --glob "a"               -> match a
    --glob "a|b"             -> match a or b   (unchanged)
    --glob a --glob b        -> match a or b   (new)
    --glob "a|b" --glob c    -> match a, b, c  (new)

The same applies to --exclude on `ia download` and --glob on
`ia delete` / `ia list`.
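"Match a or b" means a file is kept if any pattern matches it. The sketch below assumes fnmatch-style matching against plain filenames and a rule that exclude patterns override glob matches; select_files is a hypothetical name, and none of this is taken from the library's source:

```python
from __future__ import annotations

from fnmatch import fnmatch


def select_files(names: list[str], globs: list[str], excludes: list[str]) -> list[str]:
    """Keep names matching any glob (all names if globs is empty) and no exclude."""
    selected = []
    for name in names:
        # Union semantics: one matching glob is enough to keep the file.
        if globs and not any(fnmatch(name, g) for g in globs):
            continue
        # Assumed rule: any matching exclude pattern drops the file.
        if any(fnmatch(name, e) for e in excludes):
            continue
        selected.append(name)
    return selected


files = ["a.jpg", "b.gif", "nasa_meta.xml", "c.mp4"]
print(select_files(files, ["*.jpg", "*meta.xml", "*.gif"], []))
# ['a.jpg', 'b.gif', 'nasa_meta.xml']
print(select_files(files, ["*.jpg", "*.gif"], ["b*"]))
# ['a.jpg']
```

Because the result depends only on the set of patterns, the order and grouping in which they were supplied on the command line cannot change which files are selected.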

Implementation:

- Added _flatten_pipe_patterns() in item.py to normalize a glob arg
  (str or list[str], elements optionally `|`-separated) into a flat
  list. Item.get_files() now uses it for both glob_pattern and
  exclude_pattern, so API callers passing mixed forms like
  glob_pattern=['*.mp4|*.xml', '*.jpg'] now work.

- Switched the three CLI flags to `nargs=1, action="extend"` -- the
  same pattern already used by --format, --source, --exclude-source.
  args.glob / args.exclude are now list[str] | None; downstream code
  was already accepting both shapes via Item.get_files().

- ia_list's local pipe-split was updated to flatten the new list shape
  via itertools.chain.

Tests cover the four call styles in tests/cli/test_ia_download.py
using the in-process ia_call + IaRequestsMock + --dry-run pattern
(offline). API tests in tests/test_item.py extend
test_get_files_by_glob{,_with_exclude} to cover the new mixed-form
inputs.
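The real tests use ia_call, IaRequestsMock and --dry-run, which are not reproduced here. As a lighter offline sketch, the four call styles can be checked end to end against a hypothetical parser that mirrors only --glob, combined with a |-splitting step like the one described above:

```python
from __future__ import annotations

import argparse
from itertools import chain

# Hypothetical parser mirroring only the --glob flag.
parser = argparse.ArgumentParser()
parser.add_argument("--glob", nargs=1, action="extend")


def patterns_from(argv: list[str]) -> list[str]:
    """Parse argv and flatten every --glob value on "|"."""
    globs = parser.parse_args(argv).glob or []
    return list(chain.from_iterable(g.split("|") for g in globs))


# The four call styles, each with the pattern set it should yield.
CASES = [
    (["--glob", "a"], {"a"}),
    (["--glob", "a|b"], {"a", "b"}),
    (["--glob", "a", "--glob", "b"], {"a", "b"}),
    (["--glob", "a|b", "--glob", "c"], {"a", "b", "c"}),
]

for argv, expected in CASES:
    assert set(patterns_from(argv)) == expected, argv
print("all four forms OK")
```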

Also includes carried-over 5.9.0 release scaffolding (changelog
entries for #753, #767, #768 and version bump to 5.9.0.dev1) that was
present in the working tree before branching from master.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the "patterns may also be combined with `|`" clause from the
--glob and --exclude help strings. Repetition (`--glob a --glob b`) is
the discoverable form; the pipe form keeps working but is documented
in docs/source/cli.rst rather than the per-flag help. Also normalize
"files whose filename matches" -> "files matching" on `ia download`
to match `ia delete` and `ia list`.

No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two cleanups from the review:

1. Promote `flatten_pipe_patterns` to `internetarchive/utils.py` (was
   a module-private `_flatten_pipe_patterns` in item.py) so the CLI
   can reuse it. `ia_list.filter_files` now calls the shared helper
   instead of duplicating the `chain.from_iterable(...)` algorithm
   inline.

2. Widen `glob_pattern` / `exclude_pattern` annotations from
   `str | None` to `str | list[str] | None` everywhere they were
   typed -- `Item.download()` and the `api.py` wrappers
   (`get_files`, `download`, `delete`). The runtime already accepted
   both shapes via `Item.get_files()`; only the annotations and
   docstrings were stale.

No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jjjake merged commit ecb129d into master on May 15, 2026
23 checks passed
jjjake deleted the feature/repeatable-glob-flags branch on May 15, 2026 at 20:29