v3.0.0 #128

Merged
ajslater merged 321 commits into main from develop
May 4, 2026

@ajslater ajslater commented May 4, 2026

  • Breaking Changes
    • get_config() now returns a ComicboxSettings dataclass, not a Confuse
      AttrDict. The Comicbox constructor now accepts this dataclass instead of
      an AttrDict.
  • Fixes
    • Stop emitting metron.cloud/{genre,location,reprint,role,story,tag}/...
      URLs for Metron identifiers — those paths 404 because Metron has no public
      web pages for those types (only API endpoints). The numeric Metron ID is
      still preserved on the identifier.
    • Guard against suspicious archive paths when extracting pages and
      metadata to the filesystem.
  • Performance
    • Reduce startup time for new instances of comicbox.
    • General performance improvements for reading metadata from many files.
    • Add multiprocessing and async helpers,
      comicbox.process.iter_process_files() and
      comicbox.process.aread_metadata(), for reading large batches of files at
      once.
    • Comicbox.get_cover_page(skip_metadata=True) skips metadata parsing for
      callers that just need the first archive image as a thumbnail. Removes
      per-call schema instantiation and Union resolution overhead.
    • Drop the DEBUG-level emission of intentionally ignored Marshmallow
      validation errors ("Invalid input type." from Union variant misses,
      "Field may not be null." from sparse fields). These were context-free
      noise: ~50 lines per archive at DEBUG that read like real failures. Real
      schema errors still log at WARNING with full context.
  • Features
    • Add Age Rating conversion function
      comicbox.enums.maps.to_metron_age_rating(value: str | Enum) ->
      MetronAgeRatingEnum | None

ajslater and others added 29 commits April 19, 2026 19:31
Loguru's logger object isn't picklable into ProcessPoolExecutor
workers, so callers like codex couldn't get worker log output to
match their parent-process format. Adds a worker_log_config dict
({level, format, sink}) that runs through the executor initializer
and reconfigures loguru in each worker via init_logging. Also adds
enqueue=True to the default sink for thread-safe logging.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
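
The initializer pattern described in this commit can be sketched with the standard library alone. This is an illustration under assumptions, not comicbox's code: it uses stdlib `logging` in place of loguru, and `WORKER_LOG_CONFIG`, `init_worker_logging`, and `make_executor` are hypothetical names. The `sink` key from the commit message is left out for brevity.

```python
import logging
import pickle
from concurrent.futures import ProcessPoolExecutor

# Hypothetical stand-in for the worker_log_config dict described above.
# It holds only plain values, so it can be pickled into worker processes.
WORKER_LOG_CONFIG = {
    "level": "WARNING",
    "format": "%(processName)s %(levelname)s %(message)s",
}


def init_worker_logging(config: dict) -> None:
    """Reconfigure logging inside a worker from plain config values."""
    logging.basicConfig(level=config["level"], format=config["format"], force=True)


def make_executor(config: dict, max_workers: int = 2) -> ProcessPoolExecutor:
    """Build a pool whose workers each run init_worker_logging once at startup."""
    return ProcessPoolExecutor(
        max_workers=max_workers,
        initializer=init_worker_logging,
        initargs=(config,),
    )


# The dict survives a pickle round trip unchanged, which is what lets it
# cross the process boundary where a logger object could not.
round_tripped = pickle.loads(pickle.dumps(WORKER_LOG_CONFIG))
```

Passing data rather than a configured logger keeps the parent and workers decoupled: each worker rebuilds its own logging state from the same description.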
* upgrade confuse to 2.2.0; replace AttrDict with typed Settings dataclass

confuse 2.2.0 makes AttrDict properly generic, so per-key types resolve
to `object` and consumers across the box mixins fail typecheck. Convert
the validated AttrDict into a frozen `Settings` dataclass once in
get_config() and propagate that typed object everywhere; confuse stays
confined to comicbox/config.

- New comicbox/config/settings.py defines `Settings` and
  `ComputedSettings` (frozen, slots).
- get_config() returns Settings; new _build_settings() does the
  conversion. post_process_set_for_path() rebuilt around
  dataclasses.replace.
- FrozenAttrDict deleted — frozen dataclass enforces immutability.
- process.py passes Settings through pickle directly so workers skip
  re-running confuse.
- Drops dead `dest_path is None` checks now that the field is required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* rename Settings to ComicboxSettings

So that client programs that already define their own `Settings` type
don't collide on import.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* flatten ComputedSettings into ComicboxSettings

The hierarchical split was a confuse-template setup convenience, not a
logical grouping — there's no API benefit to keeping client code
chained through `cfg.computed.X`. Promote the six computed fields onto
ComicboxSettings under a clearly labeled comment block. The confuse
template's nested `computed` MappingTemplate is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`_get_source_config_metadata` early-returned an empty list whenever the
caller set `metadata_format`, because `fmt not in self._config.read`
compared a string against a frozenset of `MetadataFormats` enums —
always True. The conversion + correct membership check happens in the
try block on the next lines, so the early return was both wrong and
redundant.

Adds tests/unit/test_sources.py covering the four behavioral cases:
fmt-in-read, no-fmt, fmt-not-in-read, invalid-fmt.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
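
The membership bug described above is easy to reproduce in isolation. In this sketch `Fmt` is a hypothetical stand-in for `MetadataFormats`, assumed to be a plain `Enum` (not a `str` mixin), and both function names are illustrative.

```python
from enum import Enum


class Fmt(Enum):
    """Hypothetical stand-in for MetadataFormats."""

    COMIC_INFO = "comicinfo"
    COMET = "comet"


read_formats = frozenset({Fmt.COMIC_INFO})


def naive_is_enabled(fmt: str) -> bool:
    # Bug pattern: a str never equals a plain Enum member,
    # so this membership test is False for every input.
    return fmt in read_formats


def is_enabled(fmt: str) -> bool:
    # Convert to the enum first, then test membership.
    # Unknown names are treated as not enabled.
    try:
        return Fmt(fmt) in read_formats
    except ValueError:
        return False
```

The four cases named in the commit (format in the read set, no format, format not in the read set, invalid format) map onto the branches of `is_enabled`, apart from the no-format case, which a caller would handle before converting.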
)

read_config_sources used config.add() for the Mapping branch, which
appends to the BOTTOM of confuse's source priority stack — below the
config_default.yaml loaded by config.read() at the top of the
function. So any caller passing a dict / Mapping override (e.g.
`get_config({"comicbox": {"compute_pages": True}})`) silently got the
default instead. Switch to config.set() so Mapping args land on top,
matching set_args() for the Namespace branch.

Surfaced by a downstream Codex migration that hit dead Mapping
overrides; covered now by tests/unit/test_config_layering.py.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
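
The layering problem can be modeled without confuse. The sketch below uses `collections.ChainMap` as a stand-in for a source priority stack, where earlier maps win on lookup; it illustrates the ordering issue only and is not confuse's implementation.

```python
from collections import ChainMap

defaults = {"compute_pages": False}
override = {"compute_pages": True}

# Override appended below the defaults: the default value still wins,
# which mirrors the silent no-op the commit message describes.
appended_below = ChainMap(defaults, override)

# Override placed on top of the stack: the caller's value wins.
placed_on_top = ChainMap(override, defaults)
```

In both cases the override is present in the stack, which is why the bug was silent: nothing fails, the lookup simply resolves to the wrong layer.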
The template arms for `read`, `write`, `export`, `delete_keys`,
`read_ignore`, and `print` previously combined `frozenset` (a
pass-through marker) and `Sequence(str)` (list-of-strings coercion).
That works for the common YAML/CLI list path but rejects callers
passing a `set` / `tuple` / `frozenset` literal — which is logically
fine for fields whose post-compute value is always a frozenset.

Replaces the per-field unions with `OneOf((set, frozenset, tuple, list))`
(`print` also accepts `str` for the historical phase-char form). The
`_build_settings` boundary already calls `frozenset(...)` on these
values, so any of the four containers normalize correctly.

Also adapts `compute_config`'s helpers — Subview iteration only
supports dict/list source values, so user-supplied set/frozenset/tuple
inputs would error before reaching the template. New `_raw_or_empty`
pulls the Python value via `.get()` and explicitly rejects mappings
with a clear error (dict iteration would silently accept dict input
otherwise). `_parse_print` now accepts a phase-char string OR any
iterable of phase chars.

Path-list fields (`paths`, `import_paths`, `metadata_cli`) keep their
existing `Sequence(...)` form with element-type validation — that
trade-off felt worth keeping.

14 new tests in tests/unit/test_config_container_inputs.py cover the
four container types per field and assert mapping rejection.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
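
One way to picture the normalization boundary described above: accept any of the four container types, reject mappings explicitly, and always hand back a frozenset. `normalize_members` and its `chars_ok` flag are hypothetical names for illustration, not comicbox's helpers.

```python
from collections.abc import Iterable, Mapping


def normalize_members(value: object, *, chars_ok: bool = False) -> frozenset:
    """Coerce set/frozenset/tuple/list input to a frozenset.

    Mappings are rejected up front, because iterating a dict would
    otherwise quietly yield its keys. A str is accepted only when
    chars_ok is set, mirroring the phase-char form allowed for `print`.
    """
    if value is None:
        return frozenset()
    if isinstance(value, Mapping):
        raise TypeError("expected a set, frozenset, tuple, or list, not a mapping")
    if isinstance(value, str):
        if chars_ok:
            return frozenset(value)
        raise TypeError("expected a container of strings, not a bare string")
    if isinstance(value, Iterable):
        return frozenset(value)
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

Checking for `Mapping` before the generic `Iterable` branch matters: the order of the checks is what turns a silent acceptance into a clear error.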
Callers that only want a thumbnail (e.g. codex's CoverThread) don't
need the full ComicInfo/CoverImage hint resolution. Parsing the
metadata for every cover dominates the cost of cover extraction
and emits a flood of debug-bucket Union ValidationErrors that look
like real failures in DEBUG logs.

When skip_metadata=True, bypass generate_cover_paths entirely and
read archive index 0 directly. This drops per-call schema
instantiation, Union resolution, and path normalization.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
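
The fast path can be approximated with the standard library. This sketch assumes a zip-based archive and treats "index 0" as the first image by sorted filename; that ordering, the suffix list, and both function names are assumptions for illustration, not comicbox's implementation.

```python
from __future__ import annotations

import io
import zipfile

# Assumed set of image suffixes for this sketch.
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def first_image_bytes(archive: zipfile.ZipFile) -> bytes | None:
    """Return the first image by sorted name, without touching any metadata."""
    names = sorted(
        name for name in archive.namelist() if name.lower().endswith(IMAGE_SUFFIXES)
    )
    return archive.read(names[0]) if names else None


def make_demo_archive() -> zipfile.ZipFile:
    """Build a small in-memory archive with a metadata file and two pages."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("ComicInfo.xml", "<ComicInfo/>")
        zf.writestr("002.jpg", b"second")
        zf.writestr("001.jpg", b"first")
    return zipfile.ZipFile(buffer)
```

The metadata file is never opened or parsed here, which is the saving the commit describes: no schema instantiation happens on the thumbnail path.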
* compact news

* update deps
…122)

ClearingErrorStoreSchema previously split each schema's errors into
two buckets: ignored ones logged at DEBUG, real ones at WARNING.
The DEBUG bucket only ever held errors from ``_ignore_errors`` —
``Field may not be null.`` (sparse-field tolerance) and
``Invalid input type.`` (Union variant misses) — both of which are
internal mechanics, not operator-actionable signal. Each Union miss
emitted one ``ValidationError - {'_schema': ['Invalid input type.']}``
line per field per archive, drowning the genuinely useful per-source
DEBUG messages emitted by ``_except_on_load``.

Filter ignored errors at split time, log only WARNINGs. Real schema
failures still surface with full context (path, schema class,
normalized message). Collapses the dual-bucket _split_*_errors
methods into _filter_* + _log_warnings.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
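
The filter-then-warn shape described above might look roughly like this. It is a sketch with hypothetical names (`IGNORED_MESSAGES`, `filter_errors`, `log_warnings`) and stdlib `logging`; the error dict is assumed to map field names to lists of message strings.

```python
import logging

# The two messages the commit identifies as internal mechanics.
IGNORED_MESSAGES = frozenset({"Field may not be null.", "Invalid input type."})

log = logging.getLogger("example.schema")


def filter_errors(errors: dict[str, list[str]]) -> dict[str, list[str]]:
    """Drop ignored messages; keep only fields that still have real errors."""
    kept: dict[str, list[str]] = {}
    for field, messages in errors.items():
        real = [message for message in messages if message not in IGNORED_MESSAGES]
        if real:
            kept[field] = real
    return kept


def log_warnings(path: str, errors: dict[str, list[str]]) -> int:
    """Log each remaining error at WARNING with its path; return the count."""
    real = filter_errors(errors)
    for field, messages in real.items():
        log.warning("%s: %s: %s", path, field, "; ".join(messages))
    return len(real)
```

Ignored errors are discarded before any logging decision is made, so there is no second bucket left to emit at DEBUG.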
* compact news

* update deps

* metron: drop broken URL slugs for genre, location, reprint, role, story, tag

Metron has no public web pages for these types — only API endpoints — so
URLs like https://metron.cloud/genre/3 always 404. Stop emitting them.
The numeric Metron ID is still preserved on the identifier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Shortens the import path for the helper from
comicbox.enums.maps.age_rating to comicbox.enums.maps so downstream
callers can reach it without drilling into the submodule.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Remove unused module/class constants: _COMMENT_ARCHIVE_TYPES, SUFFIXES,
  _LOG_FORMAT, comet.py IDENTIFIER_TAG/IS_VERSION_OF_TAG, comictagger.py
  IDENTIFIER_TAG/PAGES_TAG, XmlCountryField (and now-orphaned imports
  RarFile, ZipFile, CountryField).
- Fix latent bug in TrapExceptionsMeta: `attr_name in "deserialize"` was a
  substring check that wrapped any callable whose name was a substring of
  "deserialize" (e.g. "er", "ize", "ali"). Use the existing _WRAP_METHODS
  tuple instead so only the exact `deserialize` method is wrapped.
- Simplify _get_pdf_enabled() to a plain `import pdffile` probe; the
  except-arm stub import had no effect.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
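
The TrapExceptionsMeta fix comes down to the difference between `in` on a string and `in` on a tuple. A small demonstration, using a hypothetical `WRAP_METHODS` tuple in place of the real `_WRAP_METHODS`:

```python
# Hypothetical stand-in for the _WRAP_METHODS tuple named in the commit.
WRAP_METHODS = ("deserialize",)


def wraps_by_substring(attr_name: str) -> bool:
    # `in` on a str is a substring test, so short names match by accident.
    return attr_name in "deserialize"


def wraps_by_tuple(attr_name: str) -> bool:
    # `in` on a tuple compares whole elements, so only exact names match.
    return attr_name in WRAP_METHODS
```

Note the trailing comma in `("deserialize",)`: without it the parentheses are just grouping and the substring behavior comes back.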
Consolidate the optional comicbox-pdffile integration into one module
(comicbox/_pdf.py) and delete the hand-maintained pdffile_stub.py.

Previously six call sites each duplicated a `try: from pdffile import X /
except: from pdffile_stub import X` block, and the stub class mirrored
the real PDFFile API method-for-method — silent drift risk every time
upstream pdffile shipped.

Now:
- comicbox/_pdf.py is the single source of truth for PDF_ENABLED,
  PDFFile, and PAGE_FORMAT_VALUES. When pdffile is absent, PDFFile is
  None at runtime; type checkers see the real class via TYPE_CHECKING.
- Every call site that touches PDFFile is gated by `if PDF_ENABLED`.
- The `case PDFFile():` arm in box/archive/archive.py is lifted to an
  `if PDF_ENABLED and isinstance(archive, PDFFile):` guard above the
  match (the match form would fail when PDFFile is None).
- config/__init__.py reads PAGE_FORMAT_VALUES instead of iterating an
  empty stub Enum.

Verified with `pdffile` installed (307/307 tests pass) and in a fresh
venv without it (PDF_ENABLED=False, CBZ archives still work, PDF files
raise UnsupportedArchiveTypeError, CLI shows the "not installed" hint).

Net: -70 lines across 9 files.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
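
A rough sketch of the single-module gating pattern this commit describes. The `from pdffile import PDFFile` import path is taken from the commit text; `is_pdf_archive` is a hypothetical helper, and the `TYPE_CHECKING` branch mentioned above is omitted to keep the example short.

```python
# One module owns the optional import; every caller reads these two names.
try:
    from pdffile import PDFFile  # optional dependency
except ImportError:
    PDFFile = None  # no stub class to keep in sync with upstream

PDF_ENABLED = PDFFile is not None


def is_pdf_archive(archive: object) -> bool:
    """Return True only when PDF support is installed and archive is a PDFFile.

    The PDF_ENABLED check short-circuits first, so isinstance() never
    receives None as its class argument.
    """
    return PDF_ENABLED and isinstance(archive, PDFFile)
```

This also shows why the `case PDFFile():` arm had to move: a class pattern needs an actual class at runtime, whereas the `and` guard simply skips the check when the name is None.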
* compact news

* update deps

* update news and version to alpha 4

* update deps

* rename function path in NEWS

* bump alpha version to 3.0.0a5
@ajslater ajslater merged commit beab7ad into main May 4, 2026
3 checks passed