v6.5.0#122

Merged
ajslater merged 790 commits into main from develop
May 5, 2026

Conversation

@ajslater
Owner

@ajslater ajslater commented May 5, 2026

  • Faster PDF detection
  • Faster metadata extraction
  • Defer the animated-image duration double-check to handler time, and only for
    animated WebP. This speeds up detection of other animated images.
  • Use rich_argparse to format CLI help.

ajslater added 30 commits April 25, 2025 16:00
archive gets an _optimize_in_place_on_disk flag
container copy_skipped_files turns into a hydrate_optimized paths
zip deletes files it would replace on disk and closes the file before
appending the new files.
ajslater and others added 29 commits April 20, 2026 15:29
commit 83a5253785fccc471a6dbd75b4d1eba3074c9e8c
Author: AJ Slater <aj@slater.net>
Date:   Wed Apr 29 02:18:19 2026 -0700

    bump version and news

commit 9b03aed
Author: AJ Slater <aj@slater.net>
Date:   Tue Apr 28 20:44:04 2026 -0700

    update treestamp logging

commit 1b7bdc7
Author: AJ Slater <aj@slater.net>
Date:   Tue Apr 28 20:32:59 2026 -0700

    fix implementation of timestamp logging to log after, not before

commit 8d25937
Author: AJ Slater <aj@slater.net>
Date:   Tue Apr 28 20:25:28 2026 -0700

    Log timestamp load/dump and surface INFO at default verbose=1 (#111)

    Treestamps 4 dropped its own load/dump prints, and picopt's new logger
    mapped INFO to verbose>=2, so default runs went silent for messages the
    old termcolor Printer always showed (config-style force_verbose=True).

    - _VERBOSE_LEVEL: bump so verbose=1 (the argparse default) emits INFO,
      matching the old printer's "force_verbose" tier.
    - walk.py: log "Loading timestamps for: …" before Grovestamps init and
      "Dumping timestamps for: …" before dumpf(). INFO renders cyan via the
      existing LEVEL_STYLES.

    Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
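
The verbosity change above can be illustrated with a minimal sketch. It uses the standard library ``logging`` levels rather than loguru, and the table name ``VERBOSE_LEVELS`` and the helper ``level_for`` are hypothetical; picopt's actual ``_VERBOSE_LEVEL`` mapping may be shaped differently.

```python
import logging

# Hypothetical table: the index is the verbose count, the value is the
# minimum level emitted. verbose=1 (the default) already includes INFO.
VERBOSE_LEVELS = (logging.WARNING, logging.INFO, logging.DEBUG)


def level_for(verbose: int) -> int:
    """Clamp the verbose count into the table and return its log level."""
    index = max(0, min(verbose, len(VERBOSE_LEVELS) - 1))
    return VERBOSE_LEVELS[index]
```

Under this assumed mapping, a default run emits INFO messages such as the timestamp load/dump lines, and only ``verbose=0`` silences them.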

commit 4415376
Author: AJ Slater <aj@slater.net>
Date:   Tue Apr 28 18:41:23 2026 -0700

    Pre-walk file count for a determinate progress bar (#110)

    * Pre-walk file count for a determinate progress bar

    Without a total the bar showed only an indeterminate spinner + count.
    Add ``Walk._count_total`` that mirrors ``walk_file``'s recursion gate
    (symlinks, timestamp filenames, ignore patterns, recurse flag) so each
    non-recursing visit contributes one mark — matching the events the
    scheduler dispatches through Reporter — and pass it as ``total=`` to
    ``make_progress``.

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

    * Use os.scandir for the pre-walk count

    DirEntry caches ``is_dir`` / ``is_symlink`` from the directory listing,
    so deep trees skip an extra ``stat`` per entry — meaningful on slow or
    network filesystems.

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

    ---------

    Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
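
A pre-walk count along the lines described above could look roughly like this. It is a sketch, not picopt's ``Walk._count_total``: the function name ``count_files`` is hypothetical, and it assumes symlinks are skipped and that directories contribute nothing when recursion is off. The timestamp-filename and ignore-pattern gates are omitted.

```python
from __future__ import annotations

import os


def count_files(root: str | os.PathLike[str], *, recurse: bool = True) -> int:
    """Count the non-directory entries a walk under root would visit."""
    total = 0
    with os.scandir(root) as entries:
        for entry in entries:
            # DirEntry can usually answer these from the directory listing,
            # which avoids an extra stat() call per entry.
            if entry.is_symlink():
                continue  # assumption: symlinks are never followed or counted
            if entry.is_dir(follow_symlinks=False):
                if recurse:
                    total += count_files(entry.path, recurse=True)
                continue  # assumption: a directory itself is not a mark
            total += 1
    return total
```

The result would then be passed as the ``total=`` of the progress bar so it can render as determinate.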

commit 6a46d43
Author: AJ Slater <aj@slater.net>
Date:   Tue Apr 28 17:48:59 2026 -0700

    Replace termcolor Printer with loguru + rich logger (#109)

    The old Printer class wrote dots and messages directly to stdout from
    both the main process and worker processes. This change discards it for
    a centralized logger modeled on nudebomb's progress branch:

    - New picopt/log/ package: shared rich Console, loguru sink, a streaming
      CharStreamColumn progress bar, a Stats + render() summary, and a
      Reporter that bundles them and dispatches each ReportStats outcome.
    - All call sites converted: Printer.{saved,converted,lost,error,skip,
      warn,config,...} → logger.* and progress.mark_*. Worker-side dot and
      lifecycle calls dropped — workers can't reach the parent's live region,
      so per-file progress is now driven from the scheduler when each result
      comes back.
    - Centralized MARKS table in picopt/log/styles.py drives the streaming
      chars, the loguru sink colors, the summary table row colors, and the
      --help epilogue legend, so the same outcome reads identically
      everywhere. Style choices mirror the old termcolor palette so longtime
      users see the same colors for the same outcomes.
    - Scheduler now takes a Reporter; Totals removed. report.py is a pure
      data class (ReportStats) with no printer dependency.
    - doctor.py and the cli help epilogue rewritten with rich (rich.markup
      escape() for path strings).
    - Grovestamps now constructed with verbose=0 so treestamps's internal
      printer doesn't bypass the rich Live region.
    - loguru~=0.7 added; termcolor dropped.

    Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

commit 03cdcb0
Author: AJ Slater <aj@slater.net>
Date:   Tue Apr 28 17:34:02 2026 -0700

    move set_jpeg_xmp into jpeg plugin

commit b283374
Author: AJ Slater <aj@slater.net>
Date:   Tue Apr 28 17:06:50 2026 -0700

    update devenv. treestamps 4, rich 15
) (#115)

Confuse 2.2.0 made ``AttrDict`` a properly-parameterized generic; an
unparameterized ``AttrDict`` resolves to ``AttrDict[str, object]``, so
every downstream access — ``cfg.bigger``, ``int(cfg.jobs)``,
``cfg.paths`` flowing into ``Iterable[str]``, etc. — now reads as
``object`` and stops type-checking. We had ~31 basedpyright errors after
the bump, all rooted at AttrDict consumers.

Fix: keep using confuse for what it's good at — YAML / env / CLI parsing,
type coercion, requiredness — and convert the validated ``AttrDict``
into a typed frozen dataclass once. Every downstream module then takes
``PicoptSettings`` instead of ``AttrDict``.

- ``picopt/config/settings.py``: new ``PicoptSettings`` (and nested
  ``ComputedSettings`` / ``IgnorePatterns``) frozen dataclasses mirroring
  the existing ``MappingTemplate`` schema. ``Sequence(...)`` fields are
  ``tuple[X, ...]`` because the dataclass is frozen. ``handler_stages``
  is typed ``dict[Any, Any]`` to avoid an import cycle with
  ``picopt.plugins.base``; consumers know its concrete shape.
- ``picopt/config/__init__.py``: ``PicoptConfig.get_config`` now returns
  ``PicoptSettings``. New ``_settings_from_attrdict`` does the
  field-by-field copy (typed ``Any`` so it can read AttrDict attributes
  directly — the conversion *is* the schema escape hatch).
- All consumers (``picopt/path.py``, ``picopt/report.py``, ``picopt/walk/*``,
  ``picopt/plugins/base/{handler,container}.py``) replace
  ``from confuse(.templates) import AttrDict`` with
  ``from picopt.config.settings import PicoptSettings``.
- ``Walk._init_timestamps`` builds a shallow ``Mapping[str, Any]`` for
  Grovestamps' ``program_config=`` so we don't have to serialize the
  whole frozen dataclass (especially the computed sub-fields, which
  carry ``re.Pattern`` and class-keyed ``dict`` objects).
- ``is_path_ignored`` switched from ``ignore and bool(...)`` to
  ``ignore is not None and bool(...)`` — its return type is now
  honestly ``bool``, not ``bool | re.Pattern | None``.
- ``Walk._dump_timestamps`` joins the dumpf return paths via ``str(p)``
  (treestamps' ``dumpf`` returns ``tuple[Path, ...]``).

pyproject pin: ``confuse~=2.1.0,<2.2.0`` → ``confuse~=2.2.0``.

Verification: ruff clean, basedpyright clean (0 errors, was 31),
``uv run pytest --deselect …test_timestamp_parents`` 149 passed
(deselected test is a pre-existing treestamps v4 regression),
``radon cc`` all A/B, CLI smoke run with env var + CLI flag still
layers correctly.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
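
The conversion pattern described in this commit can be sketched with a much-reduced, hypothetical settings class. ``Settings``, ``settings_from_mapping``, and the four fields shown are illustrative stand-ins, not the real ``PicoptSettings`` schema, and a plain mapping stands in for the validated ``AttrDict``.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Settings:
    """Hypothetical, reduced stand-in for a typed frozen settings object."""

    bigger: bool
    jobs: int
    paths: tuple[str, ...]  # a tuple, since the dataclass is frozen
    ignore: re.Pattern[str] | None = None


def settings_from_mapping(raw: Any) -> Settings:
    """Copy field by field; raw is Any because this is the one untyped seam."""
    pattern = raw.get("ignore")
    return Settings(
        bigger=bool(raw["bigger"]),
        jobs=int(raw["jobs"]),
        paths=tuple(str(p) for p in raw["paths"]),
        ignore=re.compile(pattern) if pattern else None,
    )


def is_path_ignored(settings: Settings, path: str) -> bool:
    ignore = settings.ignore
    # "ignore and bool(...)" would return None when no pattern is set;
    # the explicit None check keeps the return value a real bool.
    return ignore is not None and bool(ignore.search(path))
```

Downstream code then reads ``settings.jobs`` as an ``int`` without casts, and the type checker sees concrete field types instead of ``object``.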
Two warnings flagged by `make typecheck complexity`:

- ``picopt/config/__init__.py``: confuse 2.2.0 already types
  ``config.get(MappingTemplate)`` as ``AttrDict[str, object]``, so the
  defensive ``isinstance(ad, AttrDict)`` check after it was statically
  always-true (reportUnnecessaryIsInstance). Drop it and the now-unused
  ``AttrDict`` import.
- ``picopt/walk/scheduler.py``: ``_submit_ready_job`` was rank C in
  radon. Split into three:
  * ``_drop_cancelled_ready_job`` — counter cleanup for a leaf of a
    cancelled subtree (UnpackJob/RepackJob skip; only leaf jobs
    decrement the parent counter).
  * ``_track_submitted_job`` — record an in-flight future under the
    right map and update node state.
  * ``_submit_ready_job`` — orchestrate: pop, cancelled-skip, submit,
    track.

  ``_submit_ready_job`` is now rank A; the new helpers are A/B; behavior
  unchanged.

Verified: ``make typecheck`` 0/0/0; ``make complexity`` no findings;
``uv run ruff check picopt/`` clean; ``uv run pytest`` 150 passed,
6 skipped.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch picopt's CLI to RawDescriptionRichHelpFormatter so the help
output is colorized. A PicoptHelpFormatter subclass adds one extra
highlight regex matching every registered format string (PNG, ZIP,
WEBP, …) under the `metavar` named group, so format names mentioned
inside help text share the color of their corresponding FORMATS /
EXTRA_FORMATS / CONVERT_TO metavars.

Rewrite the dot-color-key / doctor-mode epilog as Rich markup instead
of capturing ANSI from a Console. rich-argparse renders descriptions
and epilogs through `console.use_theme(Theme(self.styles))`, so the
epilog can use [argparse.groups], [argparse.prog], and [argparse.args]
to keep its section header, program name, and `doctor` subcommand
visually consistent with how the rest of the help is rendered.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
detect_format eagerly iterated every frame of every animated image to
populate info["durations"], which only WebPMuxAnimatedLossless ever
consumed -- and that handler already had a webpmux-based fallback
(_read_durations) for when PIL's frame iteration was unreliable.

Removing the eager extraction speeds up format detection on animated
images (animated PNG ~11x, animated GIF ~4x in local microbenchmarks)
and simplifies the webp.walk path to always use the webpmux subprocess
fallback, which is the more reliable source per the original "PIL
frequently fails to populate per-frame durations on WebP" comment.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PdfDetector was opening every file and reading 1024 bytes for every
detection attempt -- including non-PDF files where it would read,
fail the magic check, and return None. On a directory of non-image
non-archive files, this was ~14us per file (88% of the non-PIL
detector chain cost).

PathInfo now lazily reads up to 4 KB of file head into _header_bytes
on first access. PdfDetector consumes path_info.header_bytes() instead
of reopening the file.

Local benchmark of running the full non-PIL detector chain on
non-image files (pyproject.toml, README.md, uv.lock, Makefile):
~16us before -> ~2us after (8x).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
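
The lazy header read described above might be sketched as follows. This is an illustration under assumptions, not picopt's code: the ``reads`` counter exists only to show the caching, ``detect_pdf`` is a hypothetical stand-in for ``PdfDetector``, and searching the first 1024 bytes for ``%PDF-`` is an assumed form of the magic check.

```python
from __future__ import annotations

import os
from pathlib import Path

HEADER_SIZE = 4096  # read at most 4 KB of file head
PDF_MAGIC = b"%PDF-"


class PathInfo:
    """Caches the head of a file so several detectors can share one read."""

    def __init__(self, path: str | os.PathLike[str]) -> None:
        self.path = Path(path)
        self._header_bytes: bytes | None = None
        self.reads = 0  # illustration only: counts real file reads

    def header_bytes(self) -> bytes:
        if self._header_bytes is None:
            with self.path.open("rb") as f:
                self._header_bytes = f.read(HEADER_SIZE)
            self.reads += 1
        return self._header_bytes


def detect_pdf(path_info: PathInfo) -> str | None:
    """Return "PDF" if the magic appears in the first 1024 bytes, else None."""
    head = path_info.header_bytes()[:1024]
    return "PDF" if PDF_MAGIC in head else None
```

Any other detector that only needs the file head can call ``header_bytes()`` on the same ``PathInfo`` and reuse the cached bytes instead of reopening the file.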
…#121)

PIL's verify() closes its internal fp for some Path-opened formats
(notably GIF), which then breaks lazy attrs like is_animated and
n_frames. The previous code worked around this by opening the file
twice -- once to verify, again to read format/info.

Reading the lazy attrs before verify() preserves the safety check
(verify still runs and propagates corruption errors via the same
exception path) while collapsing to a single Image.open. Confirmed
on Pillow 12.2 against GIF (still + animated), TIFF (single + multi-
frame), MPO, animated PNG, animated WebP, JPEG, BMP, PPM.

_extract_image_info per-file (warm cache, microbenchmark):
  test_animated_gif.gif    165us -> 95us
  test_animated_png.png    145us -> 113us
  test_animated_webp.webp   66us -> 30us
  test_png.png              74us -> 35us

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ajslater ajslater merged commit b2a8b4a into main May 5, 2026
3 checks passed