comicbox 3 alpha 5 #123
Merged
ajslater added a commit that referenced this pull request May 4, 2026
* replace tag feature
* use const
* replace and delete options
* Squashed commit of the following:
zipfile with remove
* update deps
* update machine image for circleci
* update deps
* update deps and type hint
* expand linear yaml help to be more useful
* update deps
* update eslint
* update deps
* bump news and version
* update pdf pages. binary difference with new mupdf
* update docker images
* fix make install dependencies
* add jxl to image extensions
* fix ignoring macos resource forks
* resource fork test file
* update deps
* adjust news
* Squashed commit of the following:
type annotate magic metron field functions and make all params kwargs
use eslint outside of editor
update deps, new ruff rules. lint & format
* add venv upgrade script
* ignore PERF203
* update deps and install pdffile
* update deps. appease typechecker. new eslint.config
* Squashed commit of the following:
commit e27050fbd42f0cf8e549871cc06c70f041672306
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 7 21:36:49 2024 -0800
rename deserializeMeta class to TrapExceptionsMeta
fix type issues with field metaclass wrapper
* add eslint-plugin-json-schema-validator
* update deps and lint
* use mdx instead of markdown
* remove unused import
* remove superfluous plugins. remove first level globs
* update deps
* Squashed commit of the following:
fix notes parsing for metron and many variations
move notes parsing into another file.
add comicinfo metron origin test
rename modules to not shadow python builtins
fix binary pdf files for new mupdf
* bump version and news
* fix type errors
* format
* refactor dynamic class creation to appease typechecker
* add libmupdf docs
* Simplify Identifier URL construction for Metron pk ids.
* update deps
* fix story arc parsing. bump version
* update dockerfile with modern node
* Squashed commit of the following:
Comicbox 2.0
* Resolve circular import if not installed with \[pdf\] option.
* Make archive comments that aren't ComicBookInfo JSON log as debug comments
more often.
* update package links
* add more aliases for comicvine sources
* ensure datetimes from archives are timezone aware
* update deps and bump version
* bump news
* drop version back appropriately
* fix alias tree builder
* update deps, typecheck with ty
* alphabetize comicbox fields
* uv_build
* update pyproject, eslint config, deps
* update deps
* update deps
* normalize Trade Paper Back into Trade Paperback
* update deps
* update deps
* Squashed commit of the following:
update to xmltodict 1.0. remove special code for xmltodict #text type conversion bugs
compact code for xml_fields that get cdata
remove cdata mixin from xml lists
* update deps
* pyright ignore
* fix age rating coercion for CIX
* add github issue code example.
* update deps
* update deps
* replace poetry with uv for run script
* update deps
* no support for python 3.14
* explicitly build with 3.13 trixie
* remove ruamel.yaml.clib from test docker
* update deps
* update deps
* new version. fix comicbox.json dump crash
* remove unused typing exceptions. add typing exceptions for ty foolishness
* update deps add ty to makefile
* python 3.14 support
* bump version and news
* update deps
* ignore ty type ignores
* update deps
* update deps
* Squashed commit of the following:
commit 259e561
Author: AJ Slater <aj@slater.net>
Date: Sat Nov 22 19:51:31 2025 -0800
use released pdffile
commit 4136a3b
Author: AJ Slater <aj@slater.net>
Date: Sat Nov 22 19:41:28 2025 -0800
use a proper base RenderModule and clean loads for tabs because it breaks yaml
commit 3426cf0
Author: AJ Slater <aj@slater.net>
Date: Sat Nov 22 17:20:05 2025 -0800
bump deps
commit 9fcaded
Author: AJ Slater <aj@slater.net>
Date: Sat Nov 22 17:19:49 2025 -0800
reduce complexity of dump
commit f96d27a
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 19:12:05 2025 -0800
gate writing pdf metadata on delete all or data exists
commit 7415b82
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 19:08:26 2025 -0800
optimize pdf writing by writing pdf data in the same context and only saving once
commit 2bd0f2c
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 18:57:26 2025 -0800
rename legacy embedded variables to LEGACY_NESTED equivalents
commit 5222159
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 18:45:06 2025 -0800
lint
commit 5d38acb
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 18:40:34 2025 -0800
fix print test
commit 65410c7
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 18:37:18 2025 -0800
fix most tests
commit 19d2dfe
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 17:40:51 2025 -0800
fix pdf xml tests
commit f6bf854
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 16:50:08 2025 -0800
fix tests for pdf_json
commit 590ffb8
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 14:44:51 2025 -0800
fix accepting flexible datetimes from pdfs
commit e18925f
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 15:33:37 2025 -0800
fix pdf tests using removed params
commit 55725b5
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 15:33:19 2025 -0800
fix set subtraction
commit 2673e3a
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:57:35 2025 -0800
add bpepple to news
commit 3de741d
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:56:30 2025 -0800
update schemas doc for pdf embeds
commit 484737d
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:52:54 2025 -0800
add bpepple to news
commit 0b6cdaf
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:47:31 2025 -0800
bump version and news
commit bda414c
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:38:42 2025 -0800
pdf write to embed files. pdf metadata keywords write tags.
commit 29fd04b
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:38:12 2025 -0800
ty ignore
commit b795c49
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:34:20 2025 -0800
add ty ignores
commit ce3ef91
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:33:43 2025 -0800
update pdffile stub
commit fd2f4a0
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:31:38 2025 -0800
update deps
commit 267d9d0
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:31:30 2025 -0800
add alpha pdffile to sources
commit 041ce67
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:30:55 2025 -0800
add pythondevmode to test script
* fix typing
* update deps
* Squashed commit of the following:
commit b31f22e6d178fcc1a5896c0dd7f680c26bc91657
Author: AJ Slater <aj@slater.net>
Date: Mon Dec 1 20:03:13 2025 -0800
typecheck with ty
* update deps
* complexipy & group deps
* reduce complexity
* update py7z library
* remove unused ty ignores
* ty fixes and ignores
* update deps
* update deps
* remove unused ty ignore
* update deps
* remove unused ty ignores
* use OneOf instead of list syntax sugar for confuse
* update deps
* Raw yaml datetimes (#102)
* use OneOf instead of list syntax sugar for confuse
* update deps
* let yaml have raw yaml datetimes instead of strings
* use simplejson decode errors
* bump news and version
* fix test script
* fix lint backend groups
* remove unused groups
* fix test script
* really fix test script
* use group lint in tests for jsonschema
* tweak dep version ranges
* update deps. use dockerfmt. ruff changes inline ifs to ors
* update dockerfile base
* update deps, remove unused ty warning ignore
* update deps add eslint plugins
* add mbake
* update deps
* fix tests for new pymupdf
* Squashed commit of the following:
commit 1fb394e109263188a16c4addeaab87bbdfdf882e
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 11 17:09:25 2026 -0800
generate-schema scripts
commit fc9b4f5c27db827ae1592010b01708865cf3733e
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 11 17:09:08 2026 -0800
format schemas
commit 9ccdf70d8c2318220c443714e509b6746f19a90e
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 11 16:39:04 2026 -0800
fix schema
commit 1a082c52887571cd258ebbc467846461c8e9686f
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 11 13:29:02 2026 -0800
add marshmallow jsonschema
* bump version and news
* add script comment
* update deps
* ty ignores
* lots more type annotations. include py.typed sentinel
* remove unneeded ruff ignores
* prettier xml schema xsds
* convert to devenv
* update devenv and deps
* update devenv
* update devenv
* fix pytests. update pycountry
* fix cli help
* fix date serialization if already a string
* update devenv & deps
* import accepts quoted globs. bump version and news
* VALIDATE FEATURE
Squashed commit of the following:
commit 4f712ddc46859bb82eb6383d41a72502bf49f7be
Merge: 2b0b5db 06af8e3
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 14:01:25 2026 -0800
Merge branch 'develop' into validate
commit 2b0b5db77d073da699cdf26e9481e5efd69ad424
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 13:32:39 2026 -0800
better validate cli help
commit f78dd859c3c8c8adf44399f723de171da9d5467a
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 13:25:48 2026 -0800
xsd printWidth to 120. fixes CoMet xsd.
commit d1563e96bbc944dc0669e4df0d647c44cce8c7dd
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 13:03:47 2026 -0800
format test files with validator
commit 59350c9e3c13e9248368146e403a1cc05c755523
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 13:01:09 2026 -0800
no available validator is a warning
commit f80fc325bc1cfecc9a9286f7538ac02eb6391ad6
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 13:00:40 2026 -0800
use original schema definitions unreformatted
commit 8eb5d884136e215a19754f1d6ae2fdc9c0cd2cd3
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 11:51:35 2026 -0800
fix symlink
commit bffef02777ba01b6c4f54ba36df7f433c45841da
Merge: 3547d24 6478b78
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 11:26:18 2026 -0800
Merge branch 'develop' into validate
commit 3547d24639eed74841fb76b49aa49ab238b820a4
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 11:25:50 2026 -0800
update deps
commit 29dba04deaf029466ca6794060c55b81d5c0a054
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:44:30 2026 -0800
update deps
commit 273da7ab3e87d60eea56167199e466c61867c57c
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:43:55 2026 -0800
only catch and warn on validation errors
commit 5ec0ad1928c709388facb054b3f6915285a4e4a8
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:35:41 2026 -0800
move xmlschema and jsonschema into regular deps
commit 0887cf1e07daec89b59972e9cf8ffc59c143dba2
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:33:33 2026 -0800
fix getting format from input files. change validation exception to warning
commit 4e7be5f44225522398a407c94d76c26fbd22a925
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:32:54 2026 -0800
fix guess_format
commit 2342605b4b08ce641f056a38b5b634bae75bcfec
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:13:41 2026 -0800
fix script for new location of validate_cli
commit b2ab1995e543204d15e83c00e8596681e52b70f7
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:12:08 2026 -0800
move schema to schema_definitions
commit deacf119c2d0823af5c6405162d23b7e32f8fb37
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 22:23:05 2026 -0800
better validation logging
commit c9a615f5885b53dd8e8b81c9d735808f4eaa7736
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 22:15:10 2026 -0800
fix validation format assignment. validation info logging
commit 914d35d15f536a6a042cbe66346b2cb4a38d636a
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 22:05:04 2026 -0800
basically working validation with definitions dir
commit 7e860e8f6110dd868dbec2f724a8bff1bd0a980d
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 21:22:04 2026 -0800
ignore bad typecheck warnings
commit 3d5ae84354b772ce8fc08793a2c7db64e95c46ac
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 21:13:02 2026 -0800
fix validate tests
commit 324a0c6fd9935c153fc5a172020a8c02b6f901d0
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 18:26:12 2026 -0800
most tests pass. validate test fails. typecheck fails. schemas need moving into the package
commit 5c3d4cd77020b5318a0f45c8d72d432d50ad158e
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 17:16:27 2026 -0800
update deps
commit 112a71aece1adba12e4d380359da3a167456af8c
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 17:16:19 2026 -0800
pin comicbox-pdffile
* bump NEWS
* PDF2CBZ extract images
Squashed commit of the following:
commit b6296ee49b49556b04adaefb12bed332f4fee857
Merge: 5bf0007 bdd3879
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 14:07:16 2026 -0800
Merge branch 'develop' into pdf2cbz
commit 5bf0007
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 14:44:53 2026 -0800
bump news and version
commit 362123c
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 14:39:19 2026 -0800
update pdffile to released version
commit f09571c
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 13:36:29 2026 -0800
switch image_pdf to more powerful pdf_page_format
commit b1d2d1b
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 12:37:12 2026 -0800
fix pdf cover compare test
commit 5aaeae0
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 12:36:39 2026 -0800
move pdf format decision to _archive_readfile()
commit 2107241
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 10:32:34 2026 -0800
update deps
commit 566e426
Merge: cdc2250 38bcfe2
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 01:58:28 2026 -0800
Merge branch 'develop' into pdf2cbz
commit cdc2250
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 01:57:57 2026 -0800
fix cli help
commit 1190fe4
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 01:52:12 2026 -0800
fix cli option collision
commit 63bf418
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 01:49:48 2026 -0800
cli option for image_pdf
commit 1d7d852
Merge: db7061c f5f03b5
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 01:39:49 2026 -0800
Merge branch 'develop' into pdf2cbz
commit db7061c
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 01:36:11 2026 -0800
basic support for extract image from pdf
* move docker-compose.yaml to compose.yaml
* fix dockerfile for new devenv
* fix dockerfile for new devenv the kludgey way
* fix news
* update deps
* format dockerfile
* color and clarification for help
* fix colors for help
* update devenv
* fix prettierignore
* update devenv
* fix makefile
* v2.2.1 fix pdf datetimes
* update devenv
* update deps
* update devenv
* add ty ignores to match pyright ignores
* update devenv
* add node_root feature
* update devenv
* update devenv
* update devenv, deps and fix some ty typing
* update deepdiff and bump version
* fix news typo
* update pdfs for new pymupdf
* update deps & devenv
* update deps v2.2.3
* use usr/env for scripts
* gha workflow
* switch to github actions
* Add to_metron_age_rating() public conversion function (#108)
Provide a standalone function to convert any age rating enum or string
to a MetronAgeRatingEnum. Supports Marvel, DC, Generic, ComicInfo, and
Metron enums with fuzzy string matching (case/space-insensitive).
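A minimal sketch of case/space-insensitive matching onto an enum, in the spirit of the description above. `RatingEnum`, its members, and `to_rating()` are hypothetical stand-ins for illustration; they are not the real `MetronAgeRatingEnum` or the actual `to_metron_age_rating()` implementation.

```python
from __future__ import annotations

from enum import Enum


class RatingEnum(Enum):
    """Hypothetical rating enum, used only for this illustration."""

    EVERYONE = "Everyone"
    TEEN_PLUS = "Teen Plus"
    MATURE = "Mature"


def _normalize(text: str) -> str:
    # Drop all whitespace and fold case so "teen plus" and "TeenPlus" compare equal.
    return "".join(text.split()).casefold()


# Build the lookup table once, keyed by the normalized enum value.
_LOOKUP = {_normalize(member.value): member for member in RatingEnum}


def to_rating(value: RatingEnum | str) -> RatingEnum | None:
    """Return the matching enum member, or None when nothing matches."""
    if isinstance(value, RatingEnum):
        return value
    return _LOOKUP.get(_normalize(value))
```

Returning `None` for unknown input is a design choice made for this sketch; the real function may behave differently.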
* add claude.md
* bump news and version
* when extracting pages make path absolute
* use python convenience method
* rename variable
* Fix path traversal vulnerability in archive extraction (#109)
Validate that resolved output paths stay within the destination
directory before writing, preventing zip-slip attacks from crafted
archive member names.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
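The check described in #109 can be sketched roughly as follows. `safe_output_path()` is a hypothetical helper name, not comicbox's actual function; the sketch only assumes that the destination directory and the archive member name are both available before writing.

```python
from pathlib import Path


def safe_output_path(dest_dir: Path, member_name: str) -> Path:
    """Resolve member_name under dest_dir, refusing paths that escape it."""
    dest_root = dest_dir.resolve()
    # Joining an absolute member name replaces dest_root entirely, and
    # resolve() collapses any ".." segments, so both tricks are caught below.
    target = (dest_root / member_name).resolve()
    if not target.is_relative_to(dest_root):
        msg = f"Archive member escapes destination: {member_name!r}"
        raise ValueError(msg)
    return target
```

Resolving both paths before comparing matters: comparing unresolved strings would miss `..` segments and symlinked destination directories.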
* bump news
* Optimize for large-scale workloads (600K+ files) (#110)
Reduce per-file overhead for bulk metadata reading:
- Extension-hint archive detection: check file extension first to avoid
unnecessary magic-byte disk reads (saves ~1.2M file opens for CBZ collections)
- Cache marshmallow schema instances by (class, exclude_keys) to eliminate
~4.8M schema constructions at scale
- Cache transform instances per Comicbox instance to avoid redundant creation
- Skip FrozenAttrDict re-wrapping when pre-built config is passed
- Skip redundant logger init when loglevel hasn't changed
- Remove always-on glom_debug=True from transform calls
Add parallelization API (comicbox/process.py):
- process_files() for ProcessPoolExecutor-based batch processing
- aread_metadata() async wrapper for event loop integration
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
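As an illustration of caching schema instances by `(class, exclude_keys)`, here is a minimal sketch using `functools.lru_cache`. `ExampleSchema` and `get_schema()` are hypothetical names; the real code caches marshmallow schemas and may use a different mechanism.

```python
from functools import lru_cache


class ExampleSchema:
    """Hypothetical stand-in for a schema class that is costly to construct."""

    def __init__(self, exclude=()):
        self.exclude = frozenset(exclude)


@lru_cache(maxsize=None)
def get_schema(schema_class: type, exclude_keys: frozenset = frozenset()):
    # Both arguments are hashable, so together they form the cache key.
    # Each (class, exclude_keys) pair is constructed only once.
    return schema_class(exclude=exclude_keys)
```

The exclude keys must be a hashable container such as a `frozenset`; passing a `set` or `list` would raise `TypeError` inside the cache lookup.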
* sort ignorefiles
* bump news
* fix datetime ordering bug
* Code quality pass: match statements, pathlib, immutable constants (#111)
* Targeted code quality pass: match statements, pathlib, immutable constants
- Convert isinstance if/elif chains to match statements in archive.py,
archiveinfo.py, and time_fields.py
- Replace os.walk with Path.rglob in run.py, fixing a double-recursion
bug where recurse() re-walked subdirectories already visited by os.walk
- Wrap _HANDLE_MERGE dict in MappingProxyType in mergedeep.py
- Replace accumulator loop with list comprehension in config/computed.py
- Replace loop-append with extend + generator in box/sources.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Sort ignore files
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* use pdf group for tests
* update deps
* iterprocess files
* fix print test
* Righttyper Typing with corrections (#112)
* righttyper
* raw types commit
* Fix righttyper auto-generated type annotations
Correct ~535 basedpyright errors and 10 ruff errors introduced by
righttyper's runtime type capture, which used overly-literal types.
Key changes:
- Replace PosixPath annotations with Path throughout
- Simplify overly-specific dict union types to dict[str, Any]
- Remove broken self: "Module.ClassName" annotations in mixins
- Rename/remove rt_T1 TypeVars (N815/N816)
- Move Callable import to TYPE_CHECKING block (TC003)
- Make boolean params keyword-only in tests/util.py (FBT001)
- Add pyright: ignore on marshmallow method override incompatibilities
- Fix _path override annotations in archive write/dump_files
- Widen function signatures to accept Path | str | None where needed
- Fix circular import in transforms/spec.py (was referencing xml_reprints.MetaSpec)
- Guard None.items() calls in metroninfo identifiers with early returns
- Clean up various unused imports left by annotation removals
Result: 0 errors, 259 tests passing, make fix/lint/typecheck all clean.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* sort ignore files
* massively typed
* remove righttyper. back to python 3.10 req
* update devenv. switch to bun
* remove quoted self typing
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* remove self types
* remove self typing
* typing attempt
* fix typing errors
* fix circular import
* reorg news
* update devenv
* add bun to dockerfile
* only copy bun deps first for dockerfile
* update devenv & deps
* switch back to main marshmallow-jsonschema now that it's back from the dead
* update devenv
* fix process pool runs to deliver exceptions back and not break on passing in the logger
* test the process module
* comments
* enhance news for iterfiles
* decomplexify box init
* decomplexify process iterfiles
* allow callers to configure subprocess loguru via picklable dict (#113)
Loguru's logger object isn't picklable into ProcessPoolExecutor
workers, so callers like codex couldn't get worker log output to
match their parent-process format. Adds a worker_log_config dict
({level, format, sink}) that runs through the executor initializer
and reconfigures loguru in each worker via init_logging. Also adds
enqueue=True to the default sink for thread-safe logging.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
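The initializer pattern described in #113 can be sketched like this. To stay dependency-free, the sketch uses the standard library `logging` module as a stand-in for loguru, and every name in it (`init_worker_logging`, `_work`, `run_batch`) is hypothetical. Only `level` and `format` keys are handled here; the `sink` key is omitted.

```python
from __future__ import annotations

import logging
from concurrent.futures import ProcessPoolExecutor


def init_worker_logging(config: dict | None) -> None:
    """Runs once in each worker process; reconfigures logging from a plain dict."""
    if not config:
        return
    logging.basicConfig(
        level=config.get("level", "INFO"),
        format=config.get("format", "%(message)s"),
        force=True,  # replace any handlers inherited from the parent
    )


def _work(item: int) -> int:
    logging.getLogger(__name__).debug("processing %s", item)
    return item * 2


def run_batch(items: list[int], config: dict | None) -> list[int]:
    # A dict of strings pickles cleanly, unlike a live logger object,
    # so it can travel to the workers through initargs.
    with ProcessPoolExecutor(
        initializer=init_worker_logging, initargs=(config,)
    ) as pool:
        return list(pool.map(_work, items))
```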
* update devenv
* Upgrade confuse to 2.2.0; replace AttrDict with typed Settings (#114)
* upgrade confuse to 2.2.0; replace AttrDict with typed Settings dataclass
confuse 2.2.0 makes AttrDict properly generic, so per-key types resolve
to `object` and consumers across the box mixins fail typecheck. Convert
the validated AttrDict into a frozen `Settings` dataclass once in
get_config() and propagate that typed object everywhere; confuse stays
confined to comicbox/config.
- New comicbox/config/settings.py defines `Settings` and
`ComputedSettings` (frozen, slots).
- get_config() returns Settings; new _build_settings() does the
conversion. post_process_set_for_path() rebuilt around
dataclasses.replace.
- FrozenAttrDict deleted — frozen dataclass enforces immutability.
- process.py passes Settings through pickle directly so workers skip
re-running confuse.
- Drops dead `dest_path is None` checks now that the field is required.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* rename Settings to ComicboxSettings
So that client programs that already define their own `Settings` type
don't collide on import.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* flatten ComputedSettings into ComicboxSettings
The hierarchical split was a confuse-template setup convenience, not a
logical grouping — there's no API benefit to keeping client code
chained through `cfg.computed.X`. Promote the six computed fields onto
ComicboxSettings under a clearly labeled comment block. The confuse
template's nested `computed` MappingTemplate is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix metadata_format hint silently dropping all api metadata (#115)
`_get_source_config_metadata` early-returned an empty list whenever the
caller set `metadata_format`, because `fmt not in self._config.read`
compared a string against a frozenset of `MetadataFormats` enums —
always True. The conversion + correct membership check happens in the
try block on the next lines, so the early return was both wrong and
redundant.
Adds tests/unit/test_sources.py covering the four behavioral cases:
fmt-in-read, no-fmt, fmt-not-in-read, invalid-fmt.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
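The membership bug in #115 is easy to reproduce in isolation. In this sketch `Fmt` and its values are hypothetical stand-ins for `MetadataFormats`; the point is only that a plain `Enum` member never compares equal to its string value.

```python
from enum import Enum


class Fmt(Enum):
    """Hypothetical format enum for illustration."""

    COMIC_INFO = "cix"
    METRON_INFO = "metron"


ENABLED = frozenset({Fmt.COMIC_INFO})


def is_enabled_buggy(fmt: str) -> bool:
    # A str is compared against Enum members, so this is always False
    # (and the negated form "fmt not in ENABLED" is always True).
    return fmt in ENABLED


def is_enabled(fmt: str) -> bool:
    # Convert to the enum first, then test membership.
    try:
        member = Fmt(fmt)
    except ValueError:
        return False
    return member in ENABLED
```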
* update deps
* bump news and version to 3.0.0
* fix Mapping config args silently dropped under config_default.yaml (#116)
read_config_sources used config.add() for the Mapping branch, which
appends to the BOTTOM of confuse's source priority stack — below the
config_default.yaml loaded by config.read() at the top of the
function. So any caller passing a dict / Mapping override (e.g.
`get_config({"comicbox": {"compute_pages": True}})`) silently got the
default instead. Switch to config.set() so Mapping args land on top,
matching set_args() for the Namespace branch.
Surfaced by a downstream Codex migration that hit dead Mapping
overrides; covered now by tests/unit/test_config_layering.py.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
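The priority problem in #116 can be modeled without confuse at all. The sketch below uses `collections.ChainMap` as a stand-in for a source stack in which earlier maps win; appending mirrors the described behavior of `config.add()` and inserting at the front mirrors `config.set()`. This is an analogy under that assumption, not confuse's implementation.

```python
from collections import ChainMap

defaults = {"compute_pages": False}
override = {"compute_pages": True}

# Like add(): the override lands at the bottom, below the defaults, and is ignored.
stack_add = ChainMap(defaults)
stack_add.maps.append(override)

# Like set(): the override lands on top and wins.
stack_set = ChainMap(defaults)
stack_set.maps.insert(0, override)
```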
* update deps
* widen set-like config fields to accept any non-mapping container (#117)
The template arms for `read`, `write`, `export`, `delete_keys`,
`read_ignore`, and `print` previously combined `frozenset` (a
pass-through marker) and `Sequence(str)` (list-of-strings coercion).
That works for the common YAML/CLI list path but rejects callers
passing a `set` / `tuple` / `frozenset` literal — which is logically
fine for fields whose post-compute value is always a frozenset.
Replaces the per-field unions with `OneOf((set, frozenset, tuple, list))`
(`print` also accepts `str` for the historical phase-char form). The
`_build_settings` boundary already calls `frozenset(...)` on these
values, so any of the four containers normalize correctly.
Also adapts `compute_config`'s helpers — Subview iteration only
supports dict/list source values, so user-supplied set/frozenset/tuple
inputs would error before reaching the template. New `_raw_or_empty`
pulls the Python value via `.get()` and explicitly rejects mappings
with a clear error (dict iteration would silently accept dict input
otherwise). `_parse_print` now accepts a phase-char string OR any
iterable of phase chars.
Path-list fields (`paths`, `import_paths`, `metadata_cli`) keep their
existing `Sequence(...)` form with element-type validation — that
trade-off felt worth keeping.
14 new tests in tests/unit/test_config_container_inputs.py cover the
four container types per field and assert mapping rejection.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
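The normalization boundary described in #117 might look roughly like this. `to_frozenset()` is a hypothetical helper that combines the two ideas from the commit message: accept any of the four container types, and reject mappings explicitly so that dict input does not silently iterate over its keys.

```python
from collections.abc import Mapping

_CONTAINER_TYPES = (set, frozenset, tuple, list)


def to_frozenset(value: object, *, field: str) -> frozenset:
    """Normalize a set-like config value, rejecting mappings with a clear error."""
    if value is None:
        return frozenset()
    if isinstance(value, Mapping):
        msg = f"{field}: expected a list, tuple or set, got a mapping"
        raise TypeError(msg)
    if isinstance(value, _CONTAINER_TYPES):
        return frozenset(value)
    msg = f"{field}: unsupported type {type(value).__name__}"
    raise TypeError(msg)
```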
* reuse types tuple
* update deps
* 3.0.0 alpha version 0
* update compose for generic gha build
* ReadResults data structure for process functions
* compact news (#119)
* Add skip_metadata flag to get_cover_page (#120)
Callers that only want a thumbnail (e.g. codex's CoverThread) don't
need the full ComicInfo/CoverImage hint resolution. Parsing the
metadata for every cover dominates the cost of cover extraction
and emits a flood of debug-bucket Union ValidationErrors that look
like real failures in DEBUG logs.
When skip_metadata=True, bypass generate_cover_paths entirely and
read archive index 0 directly. This drops per-call schema
instantiation, Union resolution, and path normalization.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
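A rough sketch of the fast path in #120, using a plain zip archive from the standard library. `first_entry_bytes()` is a hypothetical function, and treating "index 0" as the first file name in sorted order is an assumption made for this illustration; comicbox's own page ordering may differ.

```python
import io
import zipfile


def first_entry_bytes(archive_bytes: bytes) -> bytes:
    """Return the first file in the archive without parsing any metadata."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        # Skip directory entries, then take the first name in sorted order.
        names = sorted(name for name in zf.namelist() if not name.endswith("/"))
        if not names:
            msg = "archive contains no files"
            raise ValueError(msg)
        return zf.read(names[0])
```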
* v3.0.0a2 (#121)
* compact news
* update deps
* Drop DEBUG-bucket logging of intentionally-ignored validation errors (#122)
ClearingErrorStoreSchema previously split each schema's errors into
two buckets: ignored ones logged at DEBUG, real ones at WARNING.
The DEBUG bucket only ever held errors from ``_ignore_errors`` —
``Field may not be null.`` (sparse-field tolerance) and
``Invalid input type.`` (Union variant misses) — both of which are
internal mechanics, not operator-actionable signal. Each Union miss
emitted one ``ValidationError - {'_schema': ['Invalid input type.']}``
line per field per archive, drowning the genuinely useful per-source
DEBUG messages emitted by ``_except_on_load``.
Filter ignored errors at split time, log only WARNINGs. Real schema
failures still surface with full context (path, schema class,
normalized message). Collapses the dual-bucket _split_*_errors
methods into _filter_* + _log_warnings.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
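The filter-then-log approach from #122 can be sketched as follows. The two ignored messages are quoted from the commit message above; `filter_errors()` and the shape of the errors dict (field name mapped to a list of messages) are assumptions for illustration.

```python
IGNORED_MESSAGES = frozenset({"Field may not be null.", "Invalid input type."})


def filter_errors(errors: dict[str, list[str]]) -> dict[str, list[str]]:
    """Drop intentionally ignored messages; keep only fields with real errors."""
    kept: dict[str, list[str]] = {}
    for field, messages in errors.items():
        remaining = [msg for msg in messages if msg not in IGNORED_MESSAGES]
        if remaining:
            kept[field] = remaining
    return kept
```

Whatever survives the filter would then be logged at WARNING; nothing is logged for the dropped messages.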
* update devenv
* metron: drop URL slugs for types with no public web page (#124)
* compact news
* update deps
* metron: drop broken URL slugs for genre, location, reprint, role, story, tag
Metron has no public web pages for these types — only API endpoints — so
URLs like https://metron.cloud/genre/3 always 404. Stop emitting them.
The numeric Metron ID is still preserved on the identifier.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Re-export to_metron_age_rating from comicbox.enums.maps (#125)
Shortens the import path for the helper from
comicbox.enums.maps.age_rating to comicbox.enums.maps so downstream
callers can reach it without drilling into the submodule.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* update devenv
* Drop dead code surfaced by skylos scan (#126)
- Remove unused module/class constants: _COMMENT_ARCHIVE_TYPES, SUFFIXES,
_LOG_FORMAT, comet.py IDENTIFIER_TAG/IS_VERSION_OF_TAG, comictagger.py
IDENTIFIER_TAG/PAGES_TAG, XmlCountryField (and now-orphaned imports
RarFile, ZipFile, CountryField).
- Fix latent bug in TrapExceptionsMeta: `attr_name in "deserialize"` was a
substring check that wrapped any callable whose name was a substring of
"deserialize" (e.g. "er", "serial", "ali"). Use the existing _WRAP_METHODS
tuple instead so only the exact `deserialize` method is wrapped.
- Simplify _get_pdf_enabled() to a plain `import pdffile` probe; the
except-arm stub import had no effect.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
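The substring bug fixed in #126 is worth seeing side by side with the fix. The function names below are hypothetical; `_WRAP_METHODS` follows the name used in the commit message, though its real contents are assumed.

```python
_WRAP_METHODS = ("deserialize",)


def should_wrap_buggy(attr_name: str) -> bool:
    # `in` on a str is a substring test, so "er", "ali", and even ""
    # all count as matches.
    return attr_name in "deserialize"


def should_wrap(attr_name: str) -> bool:
    # `in` on a tuple is an exact element test.
    return attr_name in _WRAP_METHODS
```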
* Replace pdffile_stub with single-shim optional integration (#127)
Consolidate the optional comicbox-pdffile integration into one module
(comicbox/_pdf.py) and delete the hand-maintained pdffile_stub.py.
Previously six call sites each duplicated a `try: from pdffile import X /
except: from pdffile_stub import X` block, and the stub class mirrored
the real PDFFile API method-for-method — silent drift risk every time
upstream pdffile shipped.
Now:
- comicbox/_pdf.py is the single source of truth for PDF_ENABLED,
PDFFile, and PAGE_FORMAT_VALUES. When pdffile is absent, PDFFile is
None at runtime; type checkers see the real class via TYPE_CHECKING.
- Every call site that touches PDFFile is gated by `if PDF_ENABLED`.
- The `case PDFFile():` arm in box/archive/archive.py is lifted to an
`if PDF_ENABLED and isinstance(archive, PDFFile):` guard above the
match (the match form would fail when PDFFile is None).
- config/__init__.py reads PAGE_FORMAT_VALUES instead of iterating an
empty stub Enum.
Verified with `pdffile` installed (307/307 tests pass) and in a fresh
venv without it (PDF_ENABLED=False, CBZ archives still work, PDF files
raise UnsupportedArchiveTypeError, CLI shows the "not installed" hint).
Net: -70 lines across 9 files.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
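A minimal sketch of the single-shim pattern from #127. It assumes only what the commit message states: the optional package is importable as `pdffile` and exposes `PDFFile`. The `is_pdf()` helper is hypothetical and stands in for the isinstance guard lifted above the match statement.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Type checkers always see the real class.
    from pdffile import PDFFile

    PDF_ENABLED = True
else:
    try:
        from pdffile import PDFFile
    except ImportError:
        PDFFile = None  # the optional dependency is not installed
    PDF_ENABLED = PDFFile is not None


def is_pdf(archive: object) -> bool:
    # Short-circuit on PDF_ENABLED so isinstance() never receives None
    # as its second argument.
    return PDF_ENABLED and isinstance(archive, PDFFile)
```

Call sites import `PDF_ENABLED` and `PDFFile` from this one module, so the fallback logic lives in a single place.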
* remove unused ty ignore
* comicbox 3 alpha 5 (#123)
* compact news
* update deps
* update news and version to alpha 4
* update deps
* rename function path in NEWS
* bump alpha version to 3.0.0a5
* version 3.0.0
* massage news
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
ajslater added a commit that referenced this pull request May 9, 2026
* update deps
* update deps and type hint
* expand linear yaml help to be more useful
* update deps
* update eslint
* update deps
* bump news and version
* update pdf pages. binary difference with new mupdf
* update docker images
* fix make install dependencies
* add jxl to image extensions
* fix ignoring macos resource forks
* resource fork test file
* update deps
* adjust news
* Squashed commit of the following:
type annotate magic metron field functions and make all params kwargs
use eslint outside of editor
update deps, new ruff rules. lint & format
* add venv upgrade script
* ignore PERF203
* update deps and install pdffile
* update deps. appease typechecker. new eslint.config
* Squashed commit of the following:
commit e27050fbd42f0cf8e549871cc06c70f041672306
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 7 21:36:49 2024 -0800
rename deserializeMeta class to TrapExceptionsMeta
fix type issues with field metaclass wrapper
* add eslint-plugin-json-schema-validator
* update deps and lint
* use mdx instead of markdown
* remove unused import
* remove superfluous plugins. remove first level globs
* update deps
* Squashed commit of the following:
fix notes parsing for metron and many variations
move notes parsing into another file.
add comicinfo metron origin test
rename modules to not shadow python builtins
fix binary pdf files for new mupdf
* bump version and news
* fix type errors
* format
* refactor dynamic class creation to appease typechecker
* add libmupdf docs
* Simplify Identifier URL construction for Metron pk ids.
* update deps
* fix story arc parsing. bump version
* update dockerfile with modern node
* Squashed commit of the following:
Comicbox 2.0
* Resolve circular import if not installed with \[pdf\] option.
* Make archive comments that aren't ComicBookInfo JSON log as debug comments
more often.
* update package links
* add more aliases for comicvine sources
* ensure datetimes from archives are timezone aware
* update deps and bump version
* bump news
* drop version back appropriately
* fix alias tree builder
* update deps, typecheck with ty
* alphabetize comicbox fields
* uv_build
* update pyproject, eslint config, deps
* update deps
* update deps
* normalize Trade Paper Back into Trade Paperback
* update deps
* update deps
* Squashed commit of the following:
update to xmltodict 1.0. remove special code for xmltodict #text type conversion bugs
compact code for xml_fields that get cdata
remove cdata mixin from xml lists
* update deps
* pyright ignore
* fix age rating coercion for CIX
* add github issue code example.
* update deps
* update deps
* replace poetry with uv for run script
* update deps
* no support for python 3.14
* explicitly build with 3.13 trixie
* remove ruamel.yaml.clib from test docker
* update deps
* update deps
* new version. fix comicbox.json dump crash
* remove unused typing exceptions. add typing exceptions for ty foolishness
* update deps add ty to makefile
* python 3.14 support
* bump version and news
* update deps
* ignore ty type ignores
* update deps
* update deps
* Squashed commit of the following:
commit 259e561
Author: AJ Slater <aj@slater.net>
Date: Sat Nov 22 19:51:31 2025 -0800
use released pdffile
commit 4136a3b
Author: AJ Slater <aj@slater.net>
Date: Sat Nov 22 19:41:28 2025 -0800
use a proper base RenderModule and clean loads for tabs because it breaks yaml
commit 3426cf0
Author: AJ Slater <aj@slater.net>
Date: Sat Nov 22 17:20:05 2025 -0800
bump deps
commit 9fcaded
Author: AJ Slater <aj@slater.net>
Date: Sat Nov 22 17:19:49 2025 -0800
reduce complexity of dump
commit f96d27a
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 19:12:05 2025 -0800
gate writing pdf metadata on delete all or data exists
commit 7415b82
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 19:08:26 2025 -0800
optimize pdf writing by writing pdf data in the same context and only saving once
commit 2bd0f2c
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 18:57:26 2025 -0800
rename legacy embedded variables to LEGACY_NESTED equivalents
commit 5222159
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 18:45:06 2025 -0800
lint
commit 5d38acb
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 18:40:34 2025 -0800
fix print test
commit 65410c7
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 18:37:18 2025 -0800
fix most tests
commit 19d2dfe
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 17:40:51 2025 -0800
fix pdf xml tests
commit f6bf854
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 16:50:08 2025 -0800
fix tests for pdf_json
commit 590ffb8
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 14:44:51 2025 -0800
fix accepting flexible datetimes from pdfs
commit e18925f
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 15:33:37 2025 -0800
fix pdf tests using removed params
commit 55725b5
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 15:33:19 2025 -0800
fix set subtraction
commit 2673e3a
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:57:35 2025 -0800
add bpepple to news
commit 3de741d
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:56:30 2025 -0800
update schemas doc for pdf embeds
commit 484737d
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:52:54 2025 -0800
add bpepple to news
commit 0b6cdaf
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:47:31 2025 -0800
bump version and news
commit bda414c
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:38:42 2025 -0800
pdf write to embed files. pdf metadata keywords write tags.
commit 29fd04b
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:38:12 2025 -0800
ty ignore
commit b795c49
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:34:20 2025 -0800
add ty ignores
commit ce3ef91
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:33:43 2025 -0800
update pdffile stub
commit fd2f4a0
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:31:38 2025 -0800
update deps
commit 267d9d0
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:31:30 2025 -0800
add alpha pdffile to sources
commit 041ce67
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:30:55 2025 -0800
add pythondevmode to test script
* fix typing
* update deps
* Squashed commit of the following:
commit b31f22e6d178fcc1a5896c0dd7f680c26bc91657
Author: AJ Slater <aj@slater.net>
Date: Mon Dec 1 20:03:13 2025 -0800
typecheck with ty
* update deps
* complexipy & group deps
* reduce complexity
* update py7z library
* remove unused ty ignores
* ty fixes and ignores
* update deps
* update deps
* remove unused ty ignore
* update deps
* remove unused ty ignores
* use OneOf instead of list syntax sugar for confuse
* update deps
* Raw yaml datetimes (#102)
* use OneOf instead of list syntax sugar for confuse
* update deps
* let yaml have raw yaml datetimes instead of strings
* use simplejson decode errors
* bump news and version
* fix test script
* fix lint backend groups
* remove unused groups
* fix test script
* really fix test script
* use group lint in tests for jsonschema
* tweak dep version ranges
* update deps. use dockerfmt. ruff changes inline ifs to ors
* update dockerfile base
* update deps, remove unused ty warning ignore
* update deps add eslint plugins
* add mbake
* update deps
* fix tests for new pymupdf
* Squashed commit of the following:
commit 1fb394e109263188a16c4addeaab87bbdfdf882e
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 11 17:09:25 2026 -0800
generate-schema scripts
commit fc9b4f5c27db827ae1592010b01708865cf3733e
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 11 17:09:08 2026 -0800
format schemas
commit 9ccdf70d8c2318220c443714e509b6746f19a90e
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 11 16:39:04 2026 -0800
fix schema
commit 1a082c52887571cd258ebbc467846461c8e9686f
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 11 13:29:02 2026 -0800
add marshmallow jsonschema
* bump version and news
* add script comment
* update deps
* ty ignores
* lots more type annotations. include py.typed sentinel
* remove unneeded ruff ignores
* prettier xml schema xsds
* convert to devenv
* update devenv and deps
* update devenv
* update devenv
* fix pytests. update pycountry
* fix cli help
* fix date serialization if already a string
* update devenv & deps
* import accepts quoted globs. bump version and news
* VALIDATE FEATURE
Squashed commit of the following:
commit 4f712ddc46859bb82eb6383d41a72502bf49f7be
Merge: 2b0b5db 06af8e3
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 14:01:25 2026 -0800
Merge branch 'develop' into validate
commit 2b0b5db77d073da699cdf26e9481e5efd69ad424
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 13:32:39 2026 -0800
better validate cli help
commit f78dd859c3c8c8adf44399f723de171da9d5467a
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 13:25:48 2026 -0800
xsd printWidth to 120. fixes CoMet xsd.
commit d1563e96bbc944dc0669e4df0d647c44cce8c7dd
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 13:03:47 2026 -0800
format test files with validator
commit 59350c9e3c13e9248368146e403a1cc05c755523
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 13:01:09 2026 -0800
no available validator is a warning
commit f80fc325bc1cfecc9a9286f7538ac02eb6391ad6
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 13:00:40 2026 -0800
use original schema definitions unreformatted
commit 8eb5d884136e215a19754f1d6ae2fdc9c0cd2cd3
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 11:51:35 2026 -0800
fix symlink
commit bffef02777ba01b6c4f54ba36df7f433c45841da
Merge: 3547d24 6478b78
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 11:26:18 2026 -0800
Merge branch 'develop' into validate
commit 3547d24639eed74841fb76b49aa49ab238b820a4
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 11:25:50 2026 -0800
update deps
commit 29dba04deaf029466ca6794060c55b81d5c0a054
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:44:30 2026 -0800
update deps
commit 273da7ab3e87d60eea56167199e466c61867c57c
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:43:55 2026 -0800
only catch and warn on validation errors
commit 5ec0ad1928c709388facb054b3f6915285a4e4a8
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:35:41 2026 -0800
move xmlschema and jsonschema into regular deps
commit 0887cf1e07daec89b59972e9cf8ffc59c143dba2
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:33:33 2026 -0800
fix getting format from input files. change validation exception to warning
commit 4e7be5f44225522398a407c94d76c26fbd22a925
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:32:54 2026 -0800
fix guess_format
commit 2342605b4b08ce641f056a38b5b634bae75bcfec
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:13:41 2026 -0800
fix script for new location of validate_cli
commit b2ab1995e543204d15e83c00e8596681e52b70f7
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:12:08 2026 -0800
move schema to schema_definitions
commit deacf119c2d0823af5c6405162d23b7e32f8fb37
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 22:23:05 2026 -0800
better validation logging
commit c9a615f5885b53dd8e8b81c9d735808f4eaa7736
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 22:15:10 2026 -0800
fix validation format assignment. validation info logging
commit 914d35d15f536a6a042cbe66346b2cb4a38d636a
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 22:05:04 2026 -0800
basically working validation with definitions dir
commit 7e860e8f6110dd868dbec2f724a8bff1bd0a980d
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 21:22:04 2026 -0800
ignore bad typecheck warnings
commit 3d5ae84354b772ce8fc08793a2c7db64e95c46ac
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 21:13:02 2026 -0800
fix validate tests
commit 324a0c6fd9935c153fc5a172020a8c02b6f901d0
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 18:26:12 2026 -0800
most tests pass. validate test fails. typecheck fails. schemas need moving into the package
commit 5c3d4cd77020b5318a0f45c8d72d432d50ad158e
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 17:16:27 2026 -0800
update deps
commit 112a71aece1adba12e4d380359da3a167456af8c
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 17:16:19 2026 -0800
pin comicbox-pdffile
* bump NEWS
* PDF2CBZ extract images
Squashed commit of the following:
commit b6296ee49b49556b04adaefb12bed332f4fee857
Merge: 5bf0007 bdd3879
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 14:07:16 2026 -0800
Merge branch 'develop' into pdf2cbz
commit 5bf0007
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 14:44:53 2026 -0800
bump news and version
commit 362123c
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 14:39:19 2026 -0800
update pdffile to released version
commit f09571c
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 13:36:29 2026 -0800
switch image_pdf to more powerful pdf_page_format
commit b1d2d1b
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 12:37:12 2026 -0800
fix pdf cover compare test
commit 5aaeae0
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 12:36:39 2026 -0800
move pdf format decision to _archive_readfile()
commit 2107241
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 10:32:34 2026 -0800
update deps
commit 566e426
Merge: cdc2250 38bcfe2
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 01:58:28 2026 -0800
Merge branch 'develop' into pdf2cbz
commit cdc2250
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 01:57:57 2026 -0800
fix cli help
commit 1190fe4
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 01:52:12 2026 -0800
fix cli option collision
commit 63bf418
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 01:49:48 2026 -0800
cli option for image_pdf
commit 1d7d852
Merge: db7061c f5f03b5
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 01:39:49 2026 -0800
Merge branch 'develop' into pdf2cbz
commit db7061c
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 01:36:11 2026 -0800
basic support for extract image from pdf
* move docker-compose.yaml to compose.yaml
* fix dockerfile for new devenv
* fix dockerfile for new devenv the kludgey way
* fix news
* update deps
* format dockerfile
* color and clarification for help
* fix colors for help
* update devenv
* fix prettierignore
* update devenv
* fix makefile
* v2.2.1 fix pdf datetimes
* update devenv
* update deps
* update devenv
* add ty ignores to match pyright ignores
* update devenv
* add node_root feature
* update devenv
* update devenv
* update devenv, deps and fix some ty typing
* update deepdiff and bump version
* fix news typo
* update pdfs for new pymupdf
* update deps & devenv
* update deps v2.2.3
* use usr/env for scripts
* gha workflow
* switch to github actions
* Add to_metron_age_rating() public conversion function (#108)
Provide a standalone function to convert any age rating enum or string
to a MetronAgeRatingEnum. Supports Marvel, DC, Generic, ComicInfo, and
Metron enums with fuzzy string matching (case/space-insensitive).
* add claude.md
* bump news and version
* when extracting pages make path absolute
* use python convenience method
* rename variable
* Fix path traversal vulnerability in archive extraction (#109)
Validate that resolved output paths stay within the destination
directory before writing, preventing zip-slip attacks from crafted
archive member names.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
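The containment check described in #109 can be sketched as follows. This is a minimal illustration, not comicbox's actual code: `safe_destination` and its parameters are hypothetical names chosen for the example.

```python
from pathlib import Path


def safe_destination(dest_dir: Path, member_name: str) -> Path:
    """Return the resolved output path, or raise if it escapes dest_dir.

    Hypothetical helper for illustration only.
    """
    root = dest_dir.resolve()
    target = (root / member_name).resolve()
    # is_relative_to() is True only when target is root or lies beneath it.
    if not target.is_relative_to(root):
        msg = f"archive member escapes destination: {member_name!r}"
        raise ValueError(msg)
    return target
```

Resolving both paths before comparing them is what catches `../` segments and absolute member names; comparing the raw strings would not.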
* bump news
* Optimize for large-scale workloads (600K+ files) (#110)
Reduce per-file overhead for bulk metadata reading:
- Extension-hint archive detection: check file extension first to avoid
unnecessary magic-byte disk reads (saves ~1.2M file opens for CBZ collections)
- Cache marshmallow schema instances by (class, exclude_keys) to eliminate
~4.8M schema constructions at scale
- Cache transform instances per Comicbox instance to avoid redundant creation
- Skip FrozenAttrDict re-wrapping when pre-built config is passed
- Skip redundant logger init when loglevel hasn't changed
- Remove always-on glom_debug=True from transform calls
Add parallelization API (comicbox/process.py):
- process_files() for ProcessPoolExecutor-based batch processing
- aread_metadata() async wrapper for event loop integration
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
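The schema caching idea in #110 (one instance per `(class, exclude_keys)` pair) might look roughly like this. `PageSchema` and `get_schema` are hypothetical stand-ins; the real code caches marshmallow schema instances and may be structured differently.

```python
from functools import lru_cache


class PageSchema:
    """Stand-in for an expensive-to-construct schema class (hypothetical)."""

    constructed = 0

    def __init__(self, exclude: frozenset[str] = frozenset()) -> None:
        type(self).constructed += 1
        self.exclude = exclude


@lru_cache(maxsize=None)
def get_schema(schema_class: type, exclude_keys: frozenset[str]):
    """Build each (class, exclude_keys) combination once and reuse it."""
    return schema_class(exclude=exclude_keys)
```

The key has to be hashable, which is why the excluded keys are passed as a `frozenset` rather than a list. Sharing instances like this assumes the schema objects are not mutated between uses.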
* sort ignorefiles
* bump news
* fix datetime ordering bug
* Code quality pass: match statements, pathlib, immutable constants (#111)
* Targeted code quality pass: match statements, pathlib, immutable constants
- Convert isinstance if/elif chains to match statements in archive.py,
archiveinfo.py, and time_fields.py
- Replace os.walk with Path.rglob in run.py, fixing a double-recursion
bug where recurse() re-walked subdirectories already visited by os.walk
- Wrap _HANDLE_MERGE dict in MappingProxyType in mergedeep.py
- Replace accumulator loop with list comprehension in config/computed.py
- Replace loop-append with extend + generator in box/sources.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Sort ignore files
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
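Two of the patterns named in #111, sketched under assumptions: `find_comics`, its suffix set, and the `HANDLERS` contents are hypothetical and only illustrate `Path.rglob` and `MappingProxyType`.

```python
from pathlib import Path
from types import MappingProxyType


def find_comics(
    root: Path, suffixes: frozenset[str] = frozenset({".cbz", ".cbr"})
) -> list[Path]:
    """Walk root recursively; rglob yields each path exactly once."""
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix.lower() in suffixes
    )


# A read-only view: lookups work, item assignment raises TypeError.
HANDLERS = MappingProxyType({list: "extend", dict: "merge"})
```

Because `rglob` already descends into subdirectories, no second recursive call is needed, which is the kind of double visit the commit describes removing.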
* use pdf group for tests
* update deps
* iterprocess files
* fix print test
* Righttyper Typing with corrections (#112)
* righttyper
* raw types commit
* Fix righttyper auto-generated type annotations
Correct ~535 basedpyright errors and 10 ruff errors introduced by
righttyper's runtime type capture, which used overly-literal types.
Key changes:
- Replace PosixPath annotations with Path throughout
- Simplify overly-specific dict union types to dict[str, Any]
- Remove broken self: "Module.ClassName" annotations in mixins
- Rename/remove rt_T1 TypeVars (N815/N816)
- Move Callable import to TYPE_CHECKING block (TC003)
- Make boolean params keyword-only in tests/util.py (FBT001)
- Add pyright: ignore on marshmallow method override incompatibilities
- Fix _path override annotations in archive write/dump_files
- Widen function signatures to accept Path | str | None where needed
- Fix circular import in transforms/spec.py (was referencing xml_reprints.MetaSpec)
- Guard None.items() calls in metroninfo identifiers with early returns
- Clean up various unused imports left by annotation removals
Result: 0 errors, 259 tests passing, make fix/lint/typecheck all clean.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* sort ignore files
* massively typed
* remove righttyper. back to python 3.10 req
* update devenv. switch to bun
* remove quoted self typing
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* remove self types
* remove self typing
* typing attempt
* fix typing errors
* fix circular import
* reorg news
* update devenv
* add bun to dockerfile
* only copy bun deps first for dockerfile
* update devenv & deps
* switch back to main marshmallow-jsonschema now that it's back from the dead
* update devenv
* fix process pool runs to deliver exceptions back and not break on passing in the logger
* test the process module
* comments
* enhance news for iterfiles
* decomplexify box init
* decomplexify process iterfiles
* allow callers to configure subprocess loguru via picklable dict (#113)
Loguru's logger object isn't picklable into ProcessPoolExecutor
workers, so callers like codex couldn't get worker log output to
match their parent-process format. Adds a worker_log_config dict
({level, format, sink}) that runs through the executor initializer
and reconfigures loguru in each worker via init_logging. Also adds
enqueue=True to the default sink for thread-safe logging.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
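The initializer approach in #113 can be sketched with the standard library. This example uses stdlib `logging` as a stand-in for loguru, and `init_worker_logging`, `work`, and `run` are hypothetical names; only the idea of passing a plain dict through the executor initializer comes from the commit.

```python
import logging
from concurrent.futures import ProcessPoolExecutor


def init_worker_logging(config: dict) -> None:
    """Runs once in each worker; rebuilds logging from plain, picklable values."""
    logging.basicConfig(level=config["level"], format=config["format"], force=True)


def work(n: int) -> int:
    logging.getLogger("worker").info("processing %s", n)
    return n * 2


def run(items: list[int], worker_log_config: dict) -> list[int]:
    # The dict crosses the process boundary via initargs; no logger object is pickled.
    with ProcessPoolExecutor(
        max_workers=2,
        initializer=init_worker_logging,
        initargs=(worker_log_config,),
    ) as pool:
        return list(pool.map(work, items))
```

A caller would invoke `run([1, 2, 3], {"level": "INFO", "format": "%(processName)s %(message)s"})` from under an `if __name__ == "__main__":` guard, which the spawn and forkserver start methods require.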
* update devenv
* Upgrade confuse to 2.2.0; replace AttrDict with typed Settings (#114)
* upgrade confuse to 2.2.0; replace AttrDict with typed Settings dataclass
confuse 2.2.0 makes AttrDict properly generic, so per-key types resolve
to `object` and consumers across the box mixins fail typecheck. Convert
the validated AttrDict into a frozen `Settings` dataclass once in
get_config() and propagate that typed object everywhere; confuse stays
confined to comicbox/config.
- New comicbox/config/settings.py defines `Settings` and
`ComputedSettings` (frozen, slots).
- get_config() returns Settings; new _build_settings() does the
conversion. post_process_set_for_path() rebuilt around
dataclasses.replace.
- FrozenAttrDict deleted — frozen dataclass enforces immutability.
- process.py passes Settings through pickle directly so workers skip
re-running confuse.
- Drops dead `dest_path is None` checks now that the field is required.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* rename Settings to ComicboxSettings
So that client programs that already define their own `Settings` type
don't collide on import.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* flatten ComputedSettings into ComicboxSettings
The hierarchical split was a confuse-template setup convenience, not a
logical grouping — there's no API benefit to keeping client code
chained through `cfg.computed.X`. Promote the six computed fields onto
ComicboxSettings under a clearly labeled comment block. The confuse
template's nested `computed` MappingTemplate is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix metadata_format hint silently dropping all api metadata (#115)
`_get_source_config_metadata` early-returned an empty list whenever the
caller set `metadata_format`, because `fmt not in self._config.read`
compared a string against a frozenset of `MetadataFormats` enums —
always True. The conversion + correct membership check happens in the
try block on the next lines, so the early return was both wrong and
redundant.
Adds tests/unit/test_sources.py covering the four behavioral cases:
fmt-in-read, no-fmt, fmt-not-in-read, invalid-fmt.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
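The bug class fixed in #115 is easy to reproduce in isolation. In this sketch `Fmt` is a hypothetical plain `Enum` standing in for `MetadataFormats`, and the variable names are illustrative.

```python
from enum import Enum


class Fmt(Enum):
    """Hypothetical stand-in for MetadataFormats."""

    COMIC_INFO = "cix"
    METRON_INFO = "metron"


read_formats = frozenset({Fmt.COMIC_INFO})

hint = "cix"
# Bug pattern: a plain str never equals a plain Enum member, so this is always True.
always_true = hint not in read_formats
# Fix pattern: convert to the enum first, then test membership.
converted_in_read = Fmt(hint) in read_formats
```

Converting with `Fmt(hint)` also surfaces invalid hints as a `ValueError` instead of silently treating them as "not in read".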
* update deps
* bump news and version to 3.0.0
* fix Mapping config args silently dropped under config_default.yaml (#116)
read_config_sources used config.add() for the Mapping branch, which
appends to the BOTTOM of confuse's source priority stack — below the
config_default.yaml loaded by config.read() at the top of the
function. So any caller passing a dict / Mapping override (e.g.
`get_config({"comicbox": {"compute_pages": True}})`) silently got the
default instead. Switch to config.set() so Mapping args land on top,
matching set_args() for the Namespace branch.
Surfaced by a downstream Codex migration that hit dead Mapping
overrides; covered now by tests/unit/test_config_layering.py.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
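The priority problem in #116 can be modeled with a toy source stack. This sketch uses `collections.ChainMap` purely as an analogy for a layered config; it is not how confuse is implemented, and the variable names are hypothetical.

```python
from collections import ChainMap

# ChainMap looks up keys left to right, so maps[0] has the highest priority.
defaults = {"compute_pages": False}
stack = ChainMap(defaults)

# "add"-style: append below everything already loaded; the default still wins.
added = ChainMap(*stack.maps, {"compute_pages": True})

# "set"-style: push on top; the override wins.
set_on_top = ChainMap({"compute_pages": True}, *stack.maps)
```

Here `added["compute_pages"]` stays `False` while `set_on_top["compute_pages"]` is `True`, mirroring the silently dropped override the commit describes.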
* update deps
* widen set-like config fields to accept any non-mapping container (#117)
The template arms for `read`, `write`, `export`, `delete_keys`,
`read_ignore`, and `print` previously combined `frozenset` (a
pass-through marker) and `Sequence(str)` (list-of-strings coercion).
That works for the common YAML/CLI list path but rejects callers
passing a `set` / `tuple` / `frozenset` literal — which is logically
fine for fields whose post-compute value is always a frozenset.
Replaces the per-field unions with `OneOf((set, frozenset, tuple, list))`
(`print` also accepts `str` for the historical phase-char form). The
`_build_settings` boundary already calls `frozenset(...)` on these
values, so any of the four containers normalize correctly.
Also adapts `compute_config`'s helpers — Subview iteration only
supports dict/list source values, so user-supplied set/frozenset/tuple
inputs would error before reaching the template. New `_raw_or_empty`
pulls the Python value via `.get()` and explicitly rejects mappings
with a clear error (dict iteration would silently accept dict input
otherwise). `_parse_print` now accepts a phase-char string OR any
iterable of phase chars.
Path-list fields (`paths`, `import_paths`, `metadata_cli`) keep their
existing `Sequence(...)` form with element-type validation — that
trade-off felt worth keeping.
14 new tests in tests/unit/test_config_container_inputs.py cover the
four container types per field and assert mapping rejection.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
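The normalization boundary described in #117 might be sketched like this for the non-`print` fields. `to_frozenset` is a hypothetical helper, not comicbox's `_raw_or_empty` or `_build_settings`.

```python
from collections.abc import Iterable, Mapping


def to_frozenset(value: object, field: str) -> frozenset:
    """Normalize set/frozenset/tuple/list input; reject mappings explicitly.

    Hypothetical helper for illustration only.
    """
    if value is None:
        return frozenset()
    if isinstance(value, Mapping):
        # Iterating a dict yields its keys, so it would otherwise be accepted silently.
        msg = f"{field}: expected a list, tuple, set or frozenset, not a mapping"
        raise TypeError(msg)
    if isinstance(value, str) or not isinstance(value, Iterable):
        msg = f"{field}: expected a container of strings, got {type(value).__name__}"
        raise TypeError(msg)
    return frozenset(value)
```

The explicit `Mapping` check comes before the generic iterable path for the reason the commit gives: a dict is iterable, so without the check its keys would be taken as the field value.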
* reuse types tuple
* update deps
* 3.0.0 alpha version 0
* update compose for generic gha build
* ReadResults data structure for process functions
* compact news (#119)
* Add skip_metadata flag to get_cover_page (#120)
Callers that only want a thumbnail (e.g. codex's CoverThread) don't
need the full ComicInfo/CoverImage hint resolution. Parsing the
metadata for every cover dominates the cost of cover extraction
and emits a flood of debug-bucket Union ValidationErrors that look
like real failures in DEBUG logs.
When skip_metadata=True, bypass generate_cover_paths entirely and
read archive index 0 directly. This drops per-call schema
instantiation, Union resolution, and path normalization.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* v3.0.0a2 (#121)
* compact news
* update deps
* Drop DEBUG-bucket logging of intentionally-ignored validation errors (#122)
ClearingErrorStoreSchema previously split each schema's errors into
two buckets: ignored ones logged at DEBUG, real ones at WARNING.
The DEBUG bucket only ever held errors from ``_ignore_errors`` —
``Field may not be null.`` (sparse-field tolerance) and
``Invalid input type.`` (Union variant misses) — both of which are
internal mechanics, not operator-actionable signal. Each Union miss
emitted one ``ValidationError - {'_schema': ['Invalid input type.']}``
line per field per archive, drowning the genuinely useful per-source
DEBUG messages emitted by ``_except_on_load``.
Filter ignored errors at split time, log only WARNINGs. Real schema
failures still surface with full context (path, schema class,
normalized message). Collapses the dual-bucket _split_*_errors
methods into _filter_* + _log_warnings.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* update devenv
* metron: drop URL slugs for types with no public web page (#124)
* compact news
* update deps
* metron: drop broken URL slugs for genre, location, reprint, role, story, tag
Metron has no public web pages for these types — only API endpoints — so
URLs like https://metron.cloud/genre/3 always 404. Stop emitting them.
The numeric Metron ID is still preserved on the identifier.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Re-export to_metron_age_rating from comicbox.enums.maps (#125)
Shortens the import path for the helper from
comicbox.enums.maps.age_rating to comicbox.enums.maps so downstream
callers can reach it without drilling into the submodule.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* update devenv
* Drop dead code surfaced by skylos scan (#126)
- Remove unused module/class constants: _COMMENT_ARCHIVE_TYPES, SUFFIXES,
_LOG_FORMAT, comet.py IDENTIFIER_TAG/IS_VERSION_OF_TAG, comictagger.py
IDENTIFIER_TAG/PAGES_TAG, XmlCountryField (and now-orphaned imports
RarFile, ZipFile, CountryField).
- Fix latent bug in TrapExceptionsMeta: `attr_name in "deserialize"` was a
substring check that wrapped any callable whose name was a substring of
"deserialize" (e.g. "er", "ize", "ali"). Use the existing _WRAP_METHODS
tuple instead so only the exact `deserialize` method is wrapped.
- Simplify _get_pdf_enabled() to a plain `import pdffile` probe; the
except-arm stub import had no effect.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
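The substring pitfall fixed in #126 can be shown in a few lines. The two function names are hypothetical; `_WRAP_METHODS` is named in the commit, but its contents here are an assumption.

```python
_WRAP_METHODS = ("deserialize",)  # assumed contents, for illustration


def buggy_should_wrap(attr_name: str) -> bool:
    # `in` against a str is a substring test, not an equality test.
    return attr_name in "deserialize"


def fixed_should_wrap(attr_name: str) -> bool:
    # `in` against a tuple compares whole elements.
    return attr_name in _WRAP_METHODS
```

With the buggy form, names such as `"er"` or `"ali"` match because they occur inside `"deserialize"`; with the tuple form only the exact method name matches.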
* Replace pdffile_stub with single-shim optional integration (#127)
Consolidate the optional comicbox-pdffile integration into one module
(comicbox/_pdf.py) and delete the hand-maintained pdffile_stub.py.
Previously six call sites each duplicated a `try: from pdffile import X /
except: from pdffile_stub import X` block, and the stub class mirrored
the real PDFFile API method-for-method — silent drift risk every time
upstream pdffile shipped.
Now:
- comicbox/_pdf.py is the single source of truth for PDF_ENABLED,
PDFFile, and PAGE_FORMAT_VALUES. When pdffile is absent, PDFFile is
None at runtime; type checkers see the real class via TYPE_CHECKING.
- Every call site that touches PDFFile is gated by `if PDF_ENABLED`.
- The `case PDFFile():` arm in box/archive/archive.py is lifted to an
`if PDF_ENABLED and isinstance(archive, PDFFile):` guard above the
match (the match form would fail when PDFFile is None).
- config/__init__.py reads PAGE_FORMAT_VALUES instead of iterating an
empty stub Enum.
Verified with `pdffile` installed (307/307 tests pass) and in a fresh
venv without it (PDF_ENABLED=False, CBZ archives still work, PDF files
raise UnsupportedArchiveTypeError, CLI shows the "not installed" hint).
Net: -70 lines across 9 files.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
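A sketch of the single-shim pattern that #127 describes. It follows the commit's description (`PDFFile` is `None` when `pdffile` is absent, and type checkers see the real class), but it is not the actual contents of `comicbox/_pdf.py`, and `is_pdf_archive` is a hypothetical helper.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Type checkers always see the real class.
    from pdffile import PDFFile
else:
    try:
        from pdffile import PDFFile
    except ImportError:
        # Optional dependency missing: expose a sentinel instead of a stub class.
        PDFFile = None

PDF_ENABLED = PDFFile is not None


def is_pdf_archive(archive: object) -> bool:
    """Guard first: isinstance() would raise TypeError if PDFFile were None."""
    return PDF_ENABLED and isinstance(archive, PDFFile)
```

The same reasoning explains why the commit lifts the `case PDFFile():` arm out of the `match`: a class pattern needs an actual class, so it cannot be evaluated when the name is bound to `None`.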
* remove unused ty ignore
* comicbox 3 alpha 5 (#123)
* compact news
* update deps
* update news and version to alpha 4
* update deps
* rename function path in NEWS
* bump alpha version to 3.0.0a5
* version 3.0.0
* massage news
* bump version and news and update deps
* require comicbox-pdffile 0.6.x for image-dominant page detection (#131)
* require comicbox-pdffile 0.6.x for image-dominant page detection
Widens the optional ``[pdf]`` extra to require comicbox-pdffile 0.6.x.
The new minor release adds image-dominant page detection (
``PDFFile.classify_page``, ``PDFFile.read_image_if_dominant``,
``PDFFile.read_full_pixmap_jpeg``) used by browser readers to serve
scanned-comic PDF pages as plain ``<img>`` instead of routing through
pdf.js on the client.
comicbox itself doesn't use the new API — the bump is purely a pin
update so downstream callers (Codex, OPDS readers) can adopt it.
The ``[tool.uv.sources]`` block is transient: it points at the
pdffile PR branch so this CI can resolve dependencies before
0.6.x lands on PyPI. Drop it once 0.6.x publishes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* just use the released pdffile
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* update deps
* regenerate pdf page fixtures for pdffile 0.6.x (#132)
Add bin/regenerate-pdf-test-pages.py — drives Comicbox.get_page_by_index
against tests/files/test_pdf.pdf to refresh tests/files/pdf/{N}.pdf when
pymupdf or pdffile change page-extraction output. Run on the next drift.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* bump pdffile to 0.6.1
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
ajslater
added a commit
that referenced
this pull request
May 10, 2026
* update deps and type hint
* expand linear yaml help to be more useful
* update deps
* update eslint
* update deps
* bump news and version
* update pdf pages. binary difference with new mupdf
* update docker images
* fix make install dependencies
* add jxl to image extensions
* fix ignoring macos resource forks
* resource fork test file
* update deps
* adjust news
* Squashed commit of the following:
type annotate magic metron field functions and make all params kwargs
use eslint outside of editor
update deps, new ruff rules. lint & format
* add venv upgrade script
* ignore PERF203
* update deps and install pdffile
* update deps. appease typechecker. new eslint.config
* Squashed commit of the following:
commit e27050fbd42f0cf8e549871cc06c70f041672306
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 7 21:36:49 2024 -0800
rename deserializeMeta class to TrapExceptionsMeta
fix type issues with field metaclass wrapper
* add eslint-plugin-json-schema-validator
* update deps and lint
* use mdx instead of markdown
* remove unused import
* remove superfluous plugins. remove first level globs
* update deps
* Squashed commit of the following:
fix notes parsing for metron and many variations
move notes parsing into another file.
add comicinfo metron origin test
rename modules to not shadow python builtins
fix binary pdf files for new mupdf
* bump version and news
* fix type errors
* format
* refactor dynamic class creation to appease typechecker
* add libmupdf docs
* Simplify Identifier URL construction for Metron pk ids.
* update deps
* fix story arc parsing. bump version
* update dockerfile with modern node
* Squashed commit of the following:
Comicbox 2.0
* Resolve circular import if not installed with \[pdf\] option.
* Make archive comments that aren't ComicBookInfo JSON log as debug comments
more often.
* update package links
* add more aliases for comicvine sources
* ensure datetimes from archives are timezone aware
* update deps and bump version
* bump news
* drop version back appropriately
* fix alias tree builder
* update deps, typecheck with ty
* alphabetize comicbox fields
* uv_build
* update pyproject, eslint config, deps
* update deps
* update deps
* normalize Trade Paper Back into Trade Paperback
* update deps
* update deps
* Squashed commit of the following:
update to xmltodict 1.0. remove special code for xmltodict #text type conversion bugs
compact code for xml_fields that get cdata
remove cdata mixin from xml lists
* update deps
* pyright ignore
* fix age rating coercion for CIX
* add github issue code example.
* update deps
* update deps
* replace poetry with uv for run script
* update deps
* no support for python 3.14
* explicitly build with 3.13 trixie
* remove ruamel.yaml.clib from test docker
* update deps
* update deps
* new version. fix comicbox.json dump crash
* remove unused typing exceptions. add typing exceptions for ty foolishness
* update deps add ty to makefile
* python 3.14 support
* bump version and news
* update deps
* ignore ty type ignores
* update deps
* update deps
* Squashed commit of the following:
commit 259e561
Author: AJ Slater <aj@slater.net>
Date: Sat Nov 22 19:51:31 2025 -0800
use released pdffile
commit 4136a3b
Author: AJ Slater <aj@slater.net>
Date: Sat Nov 22 19:41:28 2025 -0800
use a proper base RenderModule and clean loads for tabs because it breaks yaml
commit 3426cf0
Author: AJ Slater <aj@slater.net>
Date: Sat Nov 22 17:20:05 2025 -0800
bump deps
commit 9fcaded
Author: AJ Slater <aj@slater.net>
Date: Sat Nov 22 17:19:49 2025 -0800
reduce complexity of dump
commit f96d27a
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 19:12:05 2025 -0800
gate writing pdf metadata on delete all or data exists
commit 7415b82
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 19:08:26 2025 -0800
optimize pdf writing by writing pdf data in the same context and only saving once
commit 2bd0f2c
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 18:57:26 2025 -0800
rename legacy embedded variables to LEGACY_NESTED equivalents
commit 5222159
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 18:45:06 2025 -0800
lint
commit 5d38acb
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 18:40:34 2025 -0800
fix print test
commit 65410c7
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 18:37:18 2025 -0800
fix most tests
commit 19d2dfe
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 17:40:51 2025 -0800
fix pdf xml tests
commit f6bf854
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 16:50:08 2025 -0800
fix tests for pdf_json
commit 590ffb8
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 14:44:51 2025 -0800
fix accepting flexible datetimes from pdfs
commit e18925f
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 15:33:37 2025 -0800
fix pdf tests using removed params
commit 55725b5
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 15:33:19 2025 -0800
fix set subtraction
commit 2673e3a
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:57:35 2025 -0800
add bpepple to news
commit 3de741d
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:56:30 2025 -0800
update schemas doc for pdf embeds
commit 484737d
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:52:54 2025 -0800
add bpepple to news
commit 0b6cdaf
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:47:31 2025 -0800
bump version and news
commit bda414c
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:38:42 2025 -0800
pdf write to embed files. pdf metadata keywords write tags.
commit 29fd04b
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:38:12 2025 -0800
ty ignore
commit b795c49
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:34:20 2025 -0800
add ty ignores
commit ce3ef91
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:33:43 2025 -0800
update pdffile stub
commit fd2f4a0
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:31:38 2025 -0800
update deps
commit 267d9d0
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:31:30 2025 -0800
add alpha pdffile to sources
commit 041ce67
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:30:55 2025 -0800
add pythondevmode to test script
* fix typing
* update deps
* Squashed commit of the following:
commit b31f22e6d178fcc1a5896c0dd7f680c26bc91657
Author: AJ Slater <aj@slater.net>
Date: Mon Dec 1 20:03:13 2025 -0800
typecheck with ty
* update deps
* complexipy & group deps
* reduce complexity
* update py7z library
* remove unused ty ignores
* ty fixes and ignores
* update deps
* update deps
* remove unused ty ignore
* update deps
* remove unused ty ignores
* use OneOf instead of list syntax sugar for confuse
* update deps
* Raw yaml datetimes (#102)
* use OneOf instead of list syntax sugar for confuse
* update deps
* let yaml have raw yaml datetimes instead of strings
* use simplejson decode errors
* bump news and version
* fix test script
* fix lint backend groups
* remove unused groups
* fix test script
* really fix test script
* use group lint in tests for jsonschema
* tweak dep version ranges
* update deps. use dockerfmt. ruff changes inline ifs to ors
* update dockerfile base
* update deps, remove unused ty warning ignore
* update deps add eslint plugins
* add mbake
* update deps
* fix tests for new pymupdf
* Squashed commit of the following:
commit 1fb394e109263188a16c4addeaab87bbdfdf882e
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 11 17:09:25 2026 -0800
generate-schema scripts
commit fc9b4f5c27db827ae1592010b01708865cf3733e
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 11 17:09:08 2026 -0800
format schemas
commit 9ccdf70d8c2318220c443714e509b6746f19a90e
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 11 16:39:04 2026 -0800
fix schema
commit 1a082c52887571cd258ebbc467846461c8e9686f
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 11 13:29:02 2026 -0800
add marshmallow jsonschema
* bump version and news
* add script comment
* update deps
* ty ignores
* lots more type annotations. include py.typed sentinel
* remove unneeded ruff ignores
* prettier xml schema xsds
* convert to devenv
* update devenv and deps
* update devenv
* update devenv
* fix pytests. update pycountry
* fix cli help
* fix date serialization if already a string
* update devenv & deps
* import accepts quoted globs. bump version and news
* VALIDATE FEATURE
Squashed commit of the following:
commit 4f712ddc46859bb82eb6383d41a72502bf49f7be
Merge: 2b0b5db 06af8e3
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 14:01:25 2026 -0800
Merge branch 'develop' into validate
commit 2b0b5db77d073da699cdf26e9481e5efd69ad424
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 13:32:39 2026 -0800
better validate cli help
commit f78dd859c3c8c8adf44399f723de171da9d5467a
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 13:25:48 2026 -0800
xsd printWidth to 120. fixes CoMet xsd.
commit d1563e96bbc944dc0669e4df0d647c44cce8c7dd
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 13:03:47 2026 -0800
format test files with validator
commit 59350c9e3c13e9248368146e403a1cc05c755523
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 13:01:09 2026 -0800
no available validator is a warning
commit f80fc325bc1cfecc9a9286f7538ac02eb6391ad6
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 13:00:40 2026 -0800
use original schema definitions unreformatted
commit 8eb5d884136e215a19754f1d6ae2fdc9c0cd2cd3
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 11:51:35 2026 -0800
fix symlink
commit bffef02777ba01b6c4f54ba36df7f433c45841da
Merge: 3547d24 6478b78
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 11:26:18 2026 -0800
Merge branch 'develop' into validate
commit 3547d24639eed74841fb76b49aa49ab238b820a4
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 11:25:50 2026 -0800
update deps
commit 29dba04deaf029466ca6794060c55b81d5c0a054
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:44:30 2026 -0800
update deps
commit 273da7ab3e87d60eea56167199e466c61867c57c
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:43:55 2026 -0800
only catch and warn on validation errors
commit 5ec0ad1928c709388facb054b3f6915285a4e4a8
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:35:41 2026 -0800
move xmlschema and jsonschema into regular deps
commit 0887cf1e07daec89b59972e9cf8ffc59c143dba2
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:33:33 2026 -0800
fix getting format from input files. change validation exception to warning
commit 4e7be5f44225522398a407c94d76c26fbd22a925
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:32:54 2026 -0800
fix guess_format
commit 2342605b4b08ce641f056a38b5b634bae75bcfec
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:13:41 2026 -0800
fix script for new location of validate_cli
commit b2ab1995e543204d15e83c00e8596681e52b70f7
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:12:08 2026 -0800
move schema to schema_definitions
commit deacf119c2d0823af5c6405162d23b7e32f8fb37
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 22:23:05 2026 -0800
better validation logging
commit c9a615f5885b53dd8e8b81c9d735808f4eaa7736
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 22:15:10 2026 -0800
fix validation format assignment. validation info logging
commit 914d35d15f536a6a042cbe66346b2cb4a38d636a
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 22:05:04 2026 -0800
basically working validation with definitions dir
commit 7e860e8f6110dd868dbec2f724a8bff1bd0a980d
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 21:22:04 2026 -0800
ignore bad typecheck warnings
commit 3d5ae84354b772ce8fc08793a2c7db64e95c46ac
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 21:13:02 2026 -0800
fix validate tests
commit 324a0c6fd9935c153fc5a172020a8c02b6f901d0
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 18:26:12 2026 -0800
most tests pass. validate test fails. typecheck fails. schemas need moving into the package
commit 5c3d4cd77020b5318a0f45c8d72d432d50ad158e
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 17:16:27 2026 -0800
update deps
commit 112a71aece1adba12e4d380359da3a167456af8c
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 17:16:19 2026 -0800
pin comicbox-pdffile
* bump NEWS
* PDF2CBZ extract images
Squashed commit of the following:
commit b6296ee49b49556b04adaefb12bed332f4fee857
Merge: 5bf0007 bdd3879
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 14:07:16 2026 -0800
Merge branch 'develop' into pdf2cbz
commit 5bf0007
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 14:44:53 2026 -0800
bump news and version
commit 362123c
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 14:39:19 2026 -0800
update pdffile to released version
commit f09571c
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 13:36:29 2026 -0800
switch image_pdf to more powerful pdf_page_format
commit b1d2d1b
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 12:37:12 2026 -0800
fix pdf cover compare test
commit 5aaeae0
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 12:36:39 2026 -0800
move pdf format decision to _archive_readfile()
commit 2107241
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 10:32:34 2026 -0800
update deps
commit 566e426
Merge: cdc2250 38bcfe2
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 01:58:28 2026 -0800
Merge branch 'develop' into pdf2cbz
commit cdc2250
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 01:57:57 2026 -0800
fix cli help
commit 1190fe4
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 01:52:12 2026 -0800
fix cli option collision
commit 63bf418
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 01:49:48 2026 -0800
cli option for image_pdf
commit 1d7d852
Merge: db7061c f5f03b5
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 01:39:49 2026 -0800
Merge branch 'develop' into pdf2cbz
commit db7061c
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 01:36:11 2026 -0800
basic support for extract image from pdf
* move docker-compose.yaml to compose.yaml
* fix dockerfile for new devenv
* fix dockerfile for new devenv the kludgey way
* fix news
* update deps
* format dockerfile
* color and clarification for help
* fix colors for help
* update devenv
* fix prettierignore
* update devenv
* fix makefile
* v2.2.1 fix pdf datetimes
* update devenv
* update deps
* update devenv
* add ty ignores to match pyright ignores
* update devenv
* add node_root feature
* update devenv
* update devenv
* update devenv, deps and fix some ty typing
* update deepdiff and bump version
* fix news typo
* update pdfs for new pymupdf
* update deps & devenv
* update deps v2.2.3
* use usr/env for scripts
* gha workflow
* switch to github actions
* Add to_metron_age_rating() public conversion function (#108)
Provide a standalone function to convert any age rating enum or string
to a MetronAgeRatingEnum. Supports Marvel, DC, Generic, ComicInfo, and
Metron enums with fuzzy string matching (case/space-insensitive).
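The case/space-insensitive matching described above can be sketched as follows. This is a minimal illustration only: `ExampleRating`, `_key`, and `to_example_rating` are hypothetical names, the two enum members are placeholders, and the real `to_metron_age_rating()` also maps between publisher rating systems, which this sketch does not attempt.

```python
from __future__ import annotations

from enum import Enum


class ExampleRating(Enum):
    """Hypothetical stand-in for a target age rating enum."""

    EVERYONE = "Everyone"
    TEEN_PLUS = "Teen Plus"


def _key(text: str) -> str:
    """Normalize a label: drop all whitespace and ignore case."""
    return "".join(text.split()).casefold()


# Lookup table from normalized label to enum member, built once.
_LOOKUP = {_key(rating.value): rating for rating in ExampleRating}


def to_example_rating(value: object) -> ExampleRating | None:
    """Convert an enum member or a loosely formatted string to ExampleRating."""
    if isinstance(value, ExampleRating):
        return value
    if isinstance(value, Enum):
        # Assumption: other rating enums carry their label as the value.
        value = value.value
    return _LOOKUP.get(_key(str(value)))
```

With this normalization, `"teen plus"`, `"TeenPlus"`, and `" Teen  Plus "` all resolve to the same member, and an unrecognized label returns `None` rather than raising.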
* add claude.md
* bump news and version
* when extracting pages make path absolute
* use python convenience method
* rename variable
* Fix path traversal vulnerability in archive extraction (#109)
Validate that resolved output paths stay within the destination
directory before writing, preventing zip-slip attacks from crafted
archive member names.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
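The containment check described in that fix can be sketched roughly as below. `safe_member_path` is a hypothetical helper name, not the function comicbox ships; it only illustrates the idea of resolving the output path and refusing anything outside the destination directory.

```python
from pathlib import Path


def safe_member_path(dest_dir: Path, member_name: str) -> Path:
    """Resolve member_name under dest_dir, refusing paths that escape it."""
    root = dest_dir.resolve()
    # Resolving collapses ".." segments; an absolute member name replaces root.
    target = (root / member_name).resolve()
    # Path.is_relative_to is available from Python 3.9 onward.
    if not target.is_relative_to(root):
        raise ValueError(f"unsafe archive member path: {member_name!r}")
    return target
```

A caller would run every archive member name through the helper before opening the output file, so a crafted name such as `../../etc/passwd` raises instead of writing outside the destination.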
* bump news
* Optimize for large-scale workloads (600K+ files) (#110)
Reduce per-file overhead for bulk metadata reading:
- Extension-hint archive detection: check file extension first to avoid
unnecessary magic-byte disk reads (saves ~1.2M file opens for CBZ collections)
- Cache marshmallow schema instances by (class, exclude_keys) to eliminate
~4.8M schema constructions at scale
- Cache transform instances per Comicbox instance to avoid redundant creation
- Skip FrozenAttrDict re-wrapping when pre-built config is passed
- Skip redundant logger init when loglevel hasn't changed
- Remove always-on glom_debug=True from transform calls
Add parallelization API (comicbox/process.py):
- process_files() for ProcessPoolExecutor-based batch processing
- aread_metadata() async wrapper for event loop integration
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
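Caching schema instances by `(class, exclude_keys)` might look something like the sketch below. `ExampleSchema` and `get_schema` are hypothetical; the real code caches marshmallow schemas, and this stand-in only counts constructions to show the effect.

```python
from functools import lru_cache


class ExampleSchema:
    """Hypothetical stand-in for a schema that is expensive to construct."""

    constructions = 0

    def __init__(self, exclude: frozenset[str] = frozenset()) -> None:
        type(self).constructions += 1
        self.exclude = exclude


@lru_cache(maxsize=None)
def get_schema(schema_class: type, exclude_keys: frozenset[str] = frozenset()):
    """Return one shared instance per (class, exclude_keys) pair."""
    # Both arguments must be hashable, hence frozenset rather than set.
    return schema_class(exclude=exclude_keys)
```

Sharing instances this way assumes a schema holds no per-file state; if it did, the cache would leak state between files.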
* sort ignorefiles
* bump news
* fix datetime ordering bug
* Code quality pass: match statements, pathlib, immutable constants (#111)
* Targeted code quality pass: match statements, pathlib, immutable constants
- Convert isinstance if/elif chains to match statements in archive.py,
archiveinfo.py, and time_fields.py
- Replace os.walk with Path.rglob in run.py, fixing a double-recursion
bug where recurse() re-walked subdirectories already visited by os.walk
- Wrap _HANDLE_MERGE dict in MappingProxyType in mergedeep.py
- Replace accumulator loop with list comprehension in config/computed.py
- Replace loop-append with extend + generator in box/sources.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Sort ignore files
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
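As an illustration of the `os.walk` to `Path.rglob` change, a single recursive glob visits each file exactly once, so there is no second recursion step to get wrong. `find_files` is a hypothetical helper, not the code in run.py.

```python
from pathlib import Path


def find_files(root: Path) -> list[Path]:
    """Return every regular file under root, each listed exactly once."""
    # rglob already descends into subdirectories; no manual recursion needed.
    return sorted(path for path in root.rglob("*") if path.is_file())
```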
* use pdf group for tests
* update deps
* iterprocess files
* fix print test
* Righttyper Typing with corrections (#112)
* righttyper
* raw types commit
* Fix righttyper auto-generated type annotations
Correct ~535 basedpyright errors and 10 ruff errors introduced by
righttyper's runtime type capture, which used overly-literal types.
Key changes:
- Replace PosixPath annotations with Path throughout
- Simplify overly-specific dict union types to dict[str, Any]
- Remove broken self: "Module.ClassName" annotations in mixins
- Rename/remove rt_T1 TypeVars (N815/N816)
- Move Callable import to TYPE_CHECKING block (TC003)
- Make boolean params keyword-only in tests/util.py (FBT001)
- Add pyright: ignore on marshmallow method override incompatibilities
- Fix _path override annotations in archive write/dump_files
- Widen function signatures to accept Path | str | None where needed
- Fix circular import in transforms/spec.py (was referencing xml_reprints.MetaSpec)
- Guard None.items() calls in metroninfo identifiers with early returns
- Clean up various unused imports left by annotation removals
Result: 0 errors, 259 tests passing, make fix/lint/typecheck all clean.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* sort ignore files
* massively typed
* remove righttyper. back to python 3.10 req
* update devenv. switch to bun
* remove quoted self typing
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* remove self types
* remove self typing
* typing attempt
* fix typing errors
* fix circular import
* reorg news
* update devenv
* add bun to dockerfile
* only copy bun deps first for dockerfile
* update devenv & deps
* switch back to main marshmallow-jsonschema now that it's back from the dead
* update devenv
* fix process pool runs to deliver exceptions back and not break on passing in the logger
* test the process module
* comments
* enhance news for iterfiles
* decomplexify box init
* decomplexify process iterfiles
* allow callers to configure subprocess loguru via picklable dict (#113)
Loguru's logger object isn't picklable into ProcessPoolExecutor
workers, so callers like codex couldn't get worker log output to
match their parent-process format. Adds a worker_log_config dict
({level, format, sink}) that runs through the executor initializer
and reconfigures loguru in each worker via init_logging. Also adds
enqueue=True to the default sink for thread-safe logging.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
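The initializer pattern described above can be sketched with the standard library. This sketch swaps in stdlib `logging` for loguru, and `init_worker_logging` and `make_pool` are hypothetical names; only the shape (a plain picklable dict passed through `initializer`/`initargs`) mirrors the commit.

```python
import logging
from concurrent.futures import ProcessPoolExecutor


def init_worker_logging(log_config: dict) -> None:
    """Executor initializer: rebuild logging inside each worker process."""
    logging.basicConfig(
        level=log_config.get("level", "INFO"),
        format=log_config.get("format", "%(levelname)s %(message)s"),
        force=True,  # replace any handlers inherited from the parent
    )


def make_pool(log_config: dict, max_workers: int = 2) -> ProcessPoolExecutor:
    """Create a pool whose workers configure logging from a plain dict."""
    return ProcessPoolExecutor(
        max_workers=max_workers,
        initializer=init_worker_logging,
        initargs=(log_config,),  # a dict of strings pickles cleanly
    )
```

Passing a dict rather than a logger object sidesteps the pickling problem, because each worker rebuilds its own logging state from simple values.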
* update devenv
* Upgrade confuse to 2.2.0; replace AttrDict with typed Settings (#114)
* upgrade confuse to 2.2.0; replace AttrDict with typed Settings dataclass
confuse 2.2.0 makes AttrDict properly generic, so per-key types resolve
to `object` and consumers across the box mixins fail typecheck. Convert
the validated AttrDict into a frozen `Settings` dataclass once in
get_config() and propagate that typed object everywhere; confuse stays
confined to comicbox/config.
- New comicbox/config/settings.py defines `Settings` and
`ComputedSettings` (frozen, slots).
- get_config() returns Settings; new _build_settings() does the
conversion. post_process_set_for_path() rebuilt around
dataclasses.replace.
- FrozenAttrDict deleted — frozen dataclass enforces immutability.
- process.py passes Settings through pickle directly so workers skip
re-running confuse.
- Drops dead `dest_path is None` checks now that the field is required.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* rename Settings to ComicboxSettings
So that client programs that already define their own `Settings` type
don't collide on import.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* flatten ComputedSettings into ComicboxSettings
The hierarchical split was a confuse-template setup convenience, not a
logical grouping — there's no API benefit to keeping client code
chained through `cfg.computed.X`. Promote the six computed fields onto
ComicboxSettings under a clearly labeled comment block. The confuse
template's nested `computed` MappingTemplate is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix metadata_format hint silently dropping all api metadata (#115)
`_get_source_config_metadata` early-returned an empty list whenever the
caller set `metadata_format`, because `fmt not in self._config.read`
compared a string against a frozenset of `MetadataFormats` enums —
always True. The conversion + correct membership check happens in the
try block on the next lines, so the early return was both wrong and
redundant.
Adds tests/unit/test_sources.py covering the four behavioral cases:
fmt-in-read, no-fmt, fmt-not-in-read, invalid-fmt.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
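The failure mode is easy to reproduce in isolation. In this sketch `ExampleFormat`, `READ_FORMATS`, and `is_readable` are hypothetical stand-ins, and it assumes a plain `Enum` (not a string-mixin enum), for which a raw string never compares equal to a member.

```python
from enum import Enum


class ExampleFormat(Enum):
    """Hypothetical stand-in for the MetadataFormats enum."""

    COMIC_INFO = "cix"
    COMET = "comet"


READ_FORMATS = frozenset({ExampleFormat.COMIC_INFO})


def is_readable(fmt: str) -> bool:
    """Convert the string to an enum member before the membership test."""
    try:
        return ExampleFormat(fmt) in READ_FORMATS
    except ValueError:
        # Unknown format names are simply not readable.
        return False


# The buggy comparison: a str is never a member of a set of enum members.
string_is_missing = "cix" not in READ_FORMATS
```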
* update deps
* bump news and version to 3.0.0
* fix Mapping config args silently dropped under config_default.yaml (#116)
read_config_sources used config.add() for the Mapping branch, which
appends to the BOTTOM of confuse's source priority stack — below the
config_default.yaml loaded by config.read() at the top of the
function. So any caller passing a dict / Mapping override (e.g.
`get_config({"comicbox": {"compute_pages": True}})`) silently got the
default instead. Switch to config.set() so Mapping args land on top,
matching set_args() for the Namespace branch.
Surfaced by a downstream Codex migration that hit dead Mapping
overrides; covered now by tests/unit/test_config_layering.py.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
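The priority problem can be modeled without confuse at all. `LayeredConfig` below is a hypothetical toy stack that only mimics the add-to-bottom versus set-on-top distinction described above; it is not confuse's implementation.

```python
class LayeredConfig:
    """Hypothetical priority stack; index 0 is the highest-priority source."""

    def __init__(self) -> None:
        self.sources: list[dict] = []

    def add(self, source: dict) -> None:
        """Append at the bottom: lowest priority, loses to everything above."""
        self.sources.append(source)

    def set(self, source: dict) -> None:
        """Insert at the top: highest priority, overrides everything below."""
        self.sources.insert(0, source)

    def get(self, key: str):
        """Return the value from the first source that defines key."""
        for source in self.sources:
            if key in source:
                return source[key]
        raise KeyError(key)


defaults = {"compute_pages": False}

buggy = LayeredConfig()
buggy.add(defaults)
buggy.add({"compute_pages": True})  # lands below the defaults

fixed = LayeredConfig()
fixed.add(defaults)
fixed.set({"compute_pages": True})  # lands above the defaults
```

In the `buggy` stack the override is present but unreachable, which matches the "silently dropped" symptom: no error, just the default value.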
* update deps
* widen set-like config fields to accept any non-mapping container (#117)
The template arms for `read`, `write`, `export`, `delete_keys`,
`read_ignore`, and `print` previously combined `frozenset` (a
pass-through marker) and `Sequence(str)` (list-of-strings coercion).
That works for the common YAML/CLI list path but rejects callers
passing a `set` / `tuple` / `frozenset` literal — which is logically
fine for fields whose post-compute value is always a frozenset.
Replaces the per-field unions with `OneOf((set, frozenset, tuple, list))`
(`print` also accepts `str` for the historical phase-char form). The
`_build_settings` boundary already calls `frozenset(...)` on these
values, so any of the four containers normalize correctly.
Also adapts `compute_config`'s helpers — Subview iteration only
supports dict/list source values, so user-supplied set/frozenset/tuple
inputs would error before reaching the template. New `_raw_or_empty`
pulls the Python value via `.get()` and explicitly rejects mappings
with a clear error (dict iteration would silently accept dict input
otherwise). `_parse_print` now accepts a phase-char string OR any
iterable of phase chars.
Path-list fields (`paths`, `import_paths`, `metadata_cli`) keep their
existing `Sequence(...)` form with element-type validation — that
trade-off felt worth keeping.
14 new tests in tests/unit/test_config_container_inputs.py cover the
four container types per field and assert mapping rejection.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
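The normalization boundary could be sketched as below. `to_frozenset` and `_CONTAINER_TYPES` are hypothetical names; the sketch covers only the four container types and the explicit mapping rejection, not the string form that `print` additionally accepts.

```python
from collections.abc import Mapping

_CONTAINER_TYPES = (set, frozenset, tuple, list)


def to_frozenset(value: object) -> frozenset:
    """Normalize any accepted container to a frozenset, rejecting mappings."""
    if isinstance(value, Mapping):
        # Iterating a dict would silently yield its keys, so fail loudly.
        raise TypeError("expected a set, frozenset, tuple, or list, not a mapping")
    if not isinstance(value, _CONTAINER_TYPES):
        raise TypeError(f"unsupported container type: {type(value).__name__}")
    return frozenset(value)
```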
* reuse types tuple
* update deps
* 3.0.0 alpha version 0
* update compose for generic gha build
* ReadResults data structure for process functions
* compact news (#119)
* Add skip_metadata flag to get_cover_page (#120)
Callers that only want a thumbnail (e.g. codex's CoverThread) don't
need the full ComicInfo/CoverImage hint resolution. Parsing the
metadata for every cover dominates the cost of cover extraction
and emits a flood of debug-bucket Union ValidationErrors that look
like real failures in DEBUG logs.
When skip_metadata=True, bypass generate_cover_paths entirely and
read archive index 0 directly. This drops per-call schema
instantiation, Union resolution, and path normalization.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* v3.0.0a2 (#121)
* compact news
* update deps
* Drop DEBUG-bucket logging of intentionally-ignored validation errors (#122)
ClearingErrorStoreSchema previously split each schema's errors into
two buckets: ignored ones logged at DEBUG, real ones at WARNING.
The DEBUG bucket only ever held errors from ``_ignore_errors`` —
``Field may not be null.`` (sparse-field tolerance) and
``Invalid input type.`` (Union variant misses) — both of which are
internal mechanics, not operator-actionable signal. Each Union miss
emitted one ``ValidationError - {'_schema': ['Invalid input type.']}``
line per field per archive, drowning the genuinely useful per-source
DEBUG messages emitted by ``_except_on_load``.
Filter ignored errors at split time, log only WARNINGs. Real schema
failures still surface with full context (path, schema class,
normalized message). Collapses the dual-bucket _split_*_errors
methods into _filter_* + _log_warnings.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
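A rough sketch of the filter-then-warn shape follows. `filter_errors` and `log_warnings` are hypothetical names, and the sketch assumes a flat `{field: [messages]}` error dict; real marshmallow error structures can be nested.

```python
import logging

# The two messages the commit describes as internal mechanics.
IGNORED_MESSAGES = frozenset({"Field may not be null.", "Invalid input type."})


def filter_errors(errors: dict[str, list[str]]) -> dict[str, list[str]]:
    """Drop ignored messages; keep only fields that still have real errors."""
    kept: dict[str, list[str]] = {}
    for field, messages in errors.items():
        real = [msg for msg in messages if msg not in IGNORED_MESSAGES]
        if real:
            kept[field] = real
    return kept


def log_warnings(errors: dict[str, list[str]], logger: logging.Logger) -> None:
    """Log one WARNING per field that has a real validation failure."""
    for field, messages in filter_errors(errors).items():
        logger.warning("%s: %s", field, "; ".join(messages))
```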
* update devenv
* metron: drop URL slugs for types with no public web page (#124)
* compact news
* update deps
* metron: drop broken URL slugs for genre, location, reprint, role, story, tag
Metron has no public web pages for these types — only API endpoints — so
URLs like https://metron.cloud/genre/3 always 404. Stop emitting them.
The numeric Metron ID is still preserved on the identifier.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Re-export to_metron_age_rating from comicbox.enums.maps (#125)
Shortens the import path for the helper from
comicbox.enums.maps.age_rating to comicbox.enums.maps so downstream
callers can reach it without drilling into the submodule.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* update devenv
* Drop dead code surfaced by skylos scan (#126)
- Remove unused module/class constants: _COMMENT_ARCHIVE_TYPES, SUFFIXES,
_LOG_FORMAT, comet.py IDENTIFIER_TAG/IS_VERSION_OF_TAG, comictagger.py
IDENTIFIER_TAG/PAGES_TAG, XmlCountryField (and now-orphaned imports
RarFile, ZipFile, CountryField).
- Fix latent bug in TrapExceptionsMeta: `attr_name in "deserialize"` was a
substring check that wrapped any callable whose name was a substring of
"deserialize" (e.g. "er", "ize", "ali"). Use the existing _WRAP_METHODS
tuple instead so only the exact `deserialize` method is wrapped.
- Simplify _get_pdf_enabled() to a plain `import pdffile` probe; the
except-arm stub import had no effect.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
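The substring pitfall fixed in TrapExceptionsMeta is worth seeing side by side. The function names below are hypothetical; `_WRAP_METHODS` is named in the commit, but its contents here are assumed from the description.

```python
_WRAP_METHODS = ("deserialize",)  # assumed contents, per the commit text


def should_wrap_buggy(attr_name: str) -> bool:
    """`in` against a str is a substring test, not an equality test."""
    return attr_name in "deserialize"


def should_wrap(attr_name: str) -> bool:
    """`in` against a tuple compares whole elements."""
    return attr_name in _WRAP_METHODS
```

The one-element tuple needs its trailing comma; without it, `("deserialize")` is just a string and the substring behavior returns.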
* Replace pdffile_stub with single-shim optional integration (#127)
Consolidate the optional comicbox-pdffile integration into one module
(comicbox/_pdf.py) and delete the hand-maintained pdffile_stub.py.
Previously six call sites each duplicated a `try: from pdffile import X /
except: from pdffile_stub import X` block, and the stub class mirrored
the real PDFFile API method-for-method — silent drift risk every time
upstream pdffile shipped.
Now:
- comicbox/_pdf.py is the single source of truth for PDF_ENABLED,
PDFFile, and PAGE_FORMAT_VALUES. When pdffile is absent, PDFFile is
None at runtime; type checkers see the real class via TYPE_CHECKING.
- Every call site that touches PDFFile is gated by `if PDF_ENABLED`.
- The `case PDFFile():` arm in box/archive/archive.py is lifted to an
`if PDF_ENABLED and isinstance(archive, PDFFile):` guard above the
match (the match form would fail when PDFFile is None).
- config/__init__.py reads PAGE_FORMAT_VALUES instead of iterating an
empty stub Enum.
Verified with `pdffile` installed (307/307 tests pass) and in a fresh
venv without it (PDF_ENABLED=False, CBZ archives still work, PDF files
raise UnsupportedArchiveTypeError, CLI shows the "not installed" hint).
Net: -70 lines across 9 files.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
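The single-shim pattern might be sketched as follows. This is a simplified guess at the shape of such a module, not the contents of comicbox/_pdf.py: it omits the `TYPE_CHECKING` branch and `PAGE_FORMAT_VALUES`, and `is_pdf_archive` is a hypothetical helper.

```python
try:
    # Optional dependency; assumes the package exposes PDFFile at top level.
    from pdffile import PDFFile
except ImportError:
    PDFFile = None

PDF_ENABLED = PDFFile is not None


def is_pdf_archive(archive: object) -> bool:
    """Guard the isinstance check so it never runs against None."""
    # isinstance(x, None) raises TypeError, hence the PDF_ENABLED gate first.
    return PDF_ENABLED and isinstance(archive, PDFFile)
```

The same gate explains why the `case PDFFile():` arm had to move out of the `match`: a class pattern needs a real class, and `None` is not one.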
* remove unused ty ignore
* comicbox 3 alpha 5 (#123)
* compact news
* update deps
* update news and version to alpha 4
* update deps
* rename function path in NEWS
* bump alpha version to 3.0.0a5
* version 3.0.0
* massage news
* bump version and news and update deps
* require comicbox-pdffile 0.6.x for image-dominant page detection (#131)
* require comicbox-pdffile 0.6.x for image-dominant page detection
Widens the optional ``[pdf]`` extra to require comicbox-pdffile 0.6.x.
The new minor release adds image-dominant page detection (
``PDFFile.classify_page``, ``PDFFile.read_image_if_dominant``,
``PDFFile.read_full_pixmap_jpeg``) used by browser readers to serve
scanned-comic PDF pages as plain ``<img>`` instead of routing through
pdf.js on the client.
comicbox itself doesn't use the new API — the bump is purely a pin
update so downstream callers (Codex, OPDS readers) can adopt it.
The ``[tool.uv.sources]`` block is transient: it points at the
pdffile PR branch so this CI can resolve dependencies before
0.6.x lands on PyPI. Drop it once 0.6.x publishes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* just use the released pdffile
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* update deps
* regenerate pdf page fixtures for pdffile 0.6.x (#132)
Add bin/regenerate-pdf-test-pages.py — drives Comicbox.get_page_by_index
against tests/files/test_pdf.pdf to refresh tests/files/pdf/{N}.pdf when
pymupdf or pdffile change page-extraction output. Run on the next drift.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* bump pdffile to 0.6.1
* bump version and news to 3.0.2
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
ajslater
added a commit
that referenced
this pull request
May 15, 2026
* update pdf pages. binary difference with new mupdf
* update docker images
* fix make install dependencies
* add jxl to image extensions
* fix ignoring macos resource forks
* resource fork test file
* update deps
* adjust news
* Squashed commit of the following:
type annotate magic metron field functions and make all params kwargs
use eslint outside of editor
update deps, new ruff rules. lint & format
* add venv upgrade script
* ignore PERF203
* update deps and install pdffile
* update deps. appease typechecker. new eslint.config
* Squashed commit of the following:
commit e27050fbd42f0cf8e549871cc06c70f041672306
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 7 21:36:49 2024 -0800
rename deserializeMeta class to TrapExceptionsMeta
fix type issues with field metaclass wrapper
* add eslint-plugin-json-schema-validator
* update deps and lint
* use mdx instead of markdown
* remove unused import
* remove superfluous plugins. remove first level globs
* update deps
* Squashed commit of the following:
fix notes parsing for metron and many variations
move notes parsing into another file.
add comicinfo metron origin test
rename modules to not shadow python builtins
fix binary pdf files for new mupdf
* bump version and news
* fix type errors
* format
* refactor dynamic class creation to appease typechecker
* add libmupdf docs
* Simplify Identifier URL construction for Metron pk ids.
* update deps
* fix story arc parsing. bump version
* update dockerfile with modern node
* Squashed commit of the following:
Comicbox 2.0
* Resolve circular import if not installed with \[pdf\] option.
* Make archive comments that aren't ComicBookInfo JSON log as debug comments
more often.
* update package links
* add more aliases for comicvine sources
* ensure datetimes from archives are timezone aware
* update deps and bump version
* bump news
* drop version back appropriately
* fix alias tree builder
* update deps, typecheck with ty
* alphabetize comicbox fields
* uv_build
* update pyproject, eslint config, deps
* update deps
* update deps
* normalize Trade Paper Back into Trade Paperback
* update deps
* update deps
* Squashed commit of the following:
update to xmltodict 1.0. remove special code for xmltodict #text type conversion bugs
compact code for xml_fields that get cdata
remove cdata mixin from xml lists
* update deps
* pyright ignore
* fix age rating coercion for CIX
* add github issue code example.
* update deps
* update deps
* replace poetry with uv for run script
* update deps
* no support for python 3.14
* explicitly build with 3.13 trixie
* remove ruamel.yaml.clib from test docker
* update deps
* update deps
* new version. fix comicbox.json dump crash
* remove unused typing exceptions. add typing exceptions for ty foolishness
* update deps add ty to makefile
* python 3.14 support
* bump version and news
* update deps
* ignore ty type ignores
* update deps
* update deps
* Squashed commit of the following:
commit 259e561
Author: AJ Slater <aj@slater.net>
Date: Sat Nov 22 19:51:31 2025 -0800
use released pdffile
commit 4136a3b
Author: AJ Slater <aj@slater.net>
Date: Sat Nov 22 19:41:28 2025 -0800
use a proper base RenderModule and clean loads for tabs because it breaks yaml
commit 3426cf0
Author: AJ Slater <aj@slater.net>
Date: Sat Nov 22 17:20:05 2025 -0800
bump deps
commit 9fcaded
Author: AJ Slater <aj@slater.net>
Date: Sat Nov 22 17:19:49 2025 -0800
reduce complexity of dump
commit f96d27a
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 19:12:05 2025 -0800
gate writing pdf metadata on delete all or data exists
commit 7415b82
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 19:08:26 2025 -0800
optimize pdf writing by writing pdf data in the same context and only saving once
commit 2bd0f2c
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 18:57:26 2025 -0800
rename legacy embedded variables to LEGACY_NESTED equivalents
commit 5222159
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 18:45:06 2025 -0800
lint
commit 5d38acb
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 18:40:34 2025 -0800
fix print test
commit 65410c7
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 18:37:18 2025 -0800
fix most tests
commit 19d2dfe
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 17:40:51 2025 -0800
fix pdf xml tests
commit f6bf854
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 16:50:08 2025 -0800
fix tests for pdf_json
commit 590ffb8
Author: AJ Slater <aj@slater.net>
Date: Fri Nov 21 14:44:51 2025 -0800
fix accepting flexible datetimes from pdfs
commit e18925f
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 15:33:37 2025 -0800
fix pdf tests using removed params
commit 55725b5
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 15:33:19 2025 -0800
fix set subtraction
commit 2673e3a
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:57:35 2025 -0800
add bpepple to news
commit 3de741d
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:56:30 2025 -0800
update schemas doc for pdf embeds
commit 484737d
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:52:54 2025 -0800
add bpepple to news
commit 0b6cdaf
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:47:31 2025 -0800
bump version and news
commit bda414c
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:38:42 2025 -0800
pdf write to embed files. pdf metadata keywords write tags.
commit 29fd04b
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:38:12 2025 -0800
ty ignore
commit b795c49
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:34:20 2025 -0800
add ty ignores
commit ce3ef91
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:33:43 2025 -0800
update pdffile stub
commit fd2f4a0
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:31:38 2025 -0800
update deps
commit 267d9d0
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:31:30 2025 -0800
add alpha pdffile to sources
commit 041ce67
Author: AJ Slater <aj@slater.net>
Date: Thu Nov 20 14:30:55 2025 -0800
add pythondevmode to test script
* fix typing
* update deps
* Squashed commit of the following:
commit b31f22e6d178fcc1a5896c0dd7f680c26bc91657
Author: AJ Slater <aj@slater.net>
Date: Mon Dec 1 20:03:13 2025 -0800
typecheck with ty
* update deps
* complexipy & group deps
* reduce complexity
* update py7z library
* remove unused ty ignores
* ty fixes and ignores
* update deps
* update deps
* remove unused ty ignore
* update deps
* remove unused ty ignores
* use OneOf instead of list syntax sugar for confuse
* update deps
* Raw yaml datetimes (#102)
* use OneOf instead of list syntax sugar for confuse
* update deps
* let yaml have raw yaml datetimes instead of strings
* use simplejson decode errors
* bump news and version
* fix test script
* fix lint backend groups
* remove unused groups
* fix test script
* really fix test script
* use group lint in tests for jsonschema
* tweak dep version ranges
* update deps. use dockerfmt. ruff changes inline ifs to ors
* update dockerfile base
* update deps, remove unused ty warning ignore
* update deps add eslint plugins
* add mbake
* update deps
* fix tests for new pymupdf
* Squashed commit of the following:
commit 1fb394e109263188a16c4addeaab87bbdfdf882e
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 11 17:09:25 2026 -0800
generate-schema scripts
commit fc9b4f5c27db827ae1592010b01708865cf3733e
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 11 17:09:08 2026 -0800
format schemas
commit 9ccdf70d8c2318220c443714e509b6746f19a90e
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 11 16:39:04 2026 -0800
fix schema
commit 1a082c52887571cd258ebbc467846461c8e9686f
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 11 13:29:02 2026 -0800
add marshmallow jsonschema
* bump version and news
* add script comment
* update deps
* ty ignores
* lots more type annotations. include py.typed sentinel
* remove unneeded ruff ignores
* prettier xml schema xsds
* convert to devenv
* update devenv and deps
* update devenv
* update devenv
* fix pytests. update pycountry
* fix cli help
* fix date serialization if already a string
* update devenv & deps
* import accepts quoted globs. bump version and news
* VALIDATE FEATURE
Squashed commit of the following:
commit 4f712ddc46859bb82eb6383d41a72502bf49f7be
Merge: 2b0b5db 06af8e3
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 14:01:25 2026 -0800
Merge branch 'develop' into validate
commit 2b0b5db77d073da699cdf26e9481e5efd69ad424
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 13:32:39 2026 -0800
better validate cli help
commit f78dd859c3c8c8adf44399f723de171da9d5467a
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 13:25:48 2026 -0800
xsd printWidth to 120. fixes CoMet xsd.
commit d1563e96bbc944dc0669e4df0d647c44cce8c7dd
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 13:03:47 2026 -0800
format test files with validator
commit 59350c9e3c13e9248368146e403a1cc05c755523
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 13:01:09 2026 -0800
no available validator is a warning
commit f80fc325bc1cfecc9a9286f7538ac02eb6391ad6
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 13:00:40 2026 -0800
use original schema definitions unreformatted
commit 8eb5d884136e215a19754f1d6ae2fdc9c0cd2cd3
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 11:51:35 2026 -0800
fix symlink
commit bffef02777ba01b6c4f54ba36df7f433c45841da
Merge: 3547d24 6478b78
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 11:26:18 2026 -0800
Merge branch 'develop' into validate
commit 3547d24639eed74841fb76b49aa49ab238b820a4
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 11:25:50 2026 -0800
update deps
commit 29dba04deaf029466ca6794060c55b81d5c0a054
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:44:30 2026 -0800
update deps
commit 273da7ab3e87d60eea56167199e466c61867c57c
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:43:55 2026 -0800
only catch and warn on validation errors
commit 5ec0ad1928c709388facb054b3f6915285a4e4a8
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:35:41 2026 -0800
move xmlschema and jsonschema into regular deps
commit 0887cf1e07daec89b59972e9cf8ffc59c143dba2
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:33:33 2026 -0800
fix getting format from input files. change validation exception to warning
commit 4e7be5f44225522398a407c94d76c26fbd22a925
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:32:54 2026 -0800
fix guess_format
commit 2342605b4b08ce641f056a38b5b634bae75bcfec
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:13:41 2026 -0800
fix script for new location of validate_cli
commit b2ab1995e543204d15e83c00e8596681e52b70f7
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 01:12:08 2026 -0800
move schema to schema_definitions
commit deacf119c2d0823af5c6405162d23b7e32f8fb37
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 22:23:05 2026 -0800
better validation logging
commit c9a615f5885b53dd8e8b81c9d735808f4eaa7736
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 22:15:10 2026 -0800
fix validation format assignment. validation info logging
commit 914d35d15f536a6a042cbe66346b2cb4a38d636a
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 22:05:04 2026 -0800
basically working validation with definitions dir
commit 7e860e8f6110dd868dbec2f724a8bff1bd0a980d
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 21:22:04 2026 -0800
ignore bad typecheck warnings
commit 3d5ae84354b772ce8fc08793a2c7db64e95c46ac
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 21:13:02 2026 -0800
fix validate tests
commit 324a0c6fd9935c153fc5a172020a8c02b6f901d0
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 18:26:12 2026 -0800
most tests pass. validate test fails. typecheck fails. schemas need moving into the package
commit 5c3d4cd77020b5318a0f45c8d72d432d50ad158e
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 17:16:27 2026 -0800
update deps
commit 112a71aece1adba12e4d380359da3a167456af8c
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 17:16:19 2026 -0800
pin comicbox-pdffile
* bump NEWS
* PDF2CBZ extract images
Squashed commit of the following:
commit b6296ee49b49556b04adaefb12bed332f4fee857
Merge: 5bf0007 bdd3879
Author: AJ Slater <aj@slater.net>
Date: Wed Feb 18 14:07:16 2026 -0800
Merge branch 'develop' into pdf2cbz
commit 5bf0007
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 14:44:53 2026 -0800
bump news and version
commit 362123c
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 14:39:19 2026 -0800
update pdffile to released version
commit f09571c
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 13:36:29 2026 -0800
switch image_pdf to more powerful pdf_page_format
commit b1d2d1b
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 12:37:12 2026 -0800
fix pdf cover compare test
commit 5aaeae0
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 12:36:39 2026 -0800
move pdf format decision to _archive_readfile()
commit 2107241
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 10:32:34 2026 -0800
update deps
commit 566e426
Merge: cdc2250 38bcfe2
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 01:58:28 2026 -0800
Merge branch 'develop' into pdf2cbz
commit cdc2250
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 01:57:57 2026 -0800
fix cli help
commit 1190fe4
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 01:52:12 2026 -0800
fix cli option collision
commit 63bf418
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 01:49:48 2026 -0800
cli option for image_pdf
commit 1d7d852
Merge: db7061c f5f03b5
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 01:39:49 2026 -0800
Merge branch 'develop' into pdf2cbz
commit db7061c
Author: AJ Slater <aj@slater.net>
Date: Tue Feb 17 01:36:11 2026 -0800
basic support for extract image from pdf
* move docker-compose.yaml to compose.yaml
* fix dockerfile for new devenv
* fix dockerfile for new devenv the kludgey way
* fix news
* update deps
* format dockerfile
* color and clarification for help
* fix colors for help
* update devenv
* fix prettierignore
* update devenv
* fix makefile
* v2.2.1 fix pdf datetimes
* update devenv
* update deps
* update devenv
* add ty ignores to match pyright ignores
* update devenv
* add node_root feature
* update devenv
* update devenv
* update devenv, deps and fix some ty typing
* update deepdiff and bump version
* fix news typo
* update pdfs for new pymupdf
* update deps & devenv
* update deps v2.2.3
* use usr/env for scripts
* gha workflow
* switch to github actions
* Add to_metron_age_rating() public conversion function (#108)
Provide a standalone function to convert any age rating enum or string
to a MetronAgeRatingEnum. Supports Marvel, DC, Generic, ComicInfo, and
Metron enums with fuzzy string matching (case/space-insensitive).
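Case- and space-insensitive matching of this kind can be sketched as follows. This is only an illustration: the `AgeRating` enum, its members, and `to_age_rating` are hypothetical stand-ins, and the real `MetronAgeRatingEnum` values and matching rules in comicbox may differ.

```python
from __future__ import annotations

from enum import Enum


class AgeRating(Enum):
    """Hypothetical stand-in for MetronAgeRatingEnum."""

    EVERYONE = "Everyone"
    TEEN_PLUS = "Teen Plus"
    MATURE = "Mature"


def _normalize(text: str) -> str:
    # Drop all whitespace and fold case so "teen plus" equals "TeenPlus".
    return "".join(text.split()).casefold()


# Built once: normalized label -> enum member.
_LOOKUP = {_normalize(member.value): member for member in AgeRating}


def to_age_rating(value: str | Enum) -> AgeRating | None:
    """Return the matching member, or None when nothing matches."""
    if isinstance(value, AgeRating):
        return value
    raw = value.value if isinstance(value, Enum) else value
    return _LOOKUP.get(_normalize(str(raw)))
```

Returning `None` for unknown input mirrors the `MetronAgeRatingEnum | None` return type quoted in the NEWS text.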
* add claude.md
* bump news and version
* when extracting pages make path absolute
* use python convenience method
* rename variable
* Fix path traversal vulnerability in archive extraction (#109)
Validate that resolved output paths stay within the destination
directory before writing, preventing zip-slip attacks from crafted
archive member names.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
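The containment check described in #109 can be sketched like this. It is a minimal illustration, not the code from the fix; `safe_dest` is a hypothetical helper name.

```python
from pathlib import Path


def safe_dest(dest_dir: Path, member_name: str) -> Path:
    """Resolve member_name under dest_dir, refusing paths that escape it."""
    root = dest_dir.resolve()
    target = (root / member_name).resolve()
    # is_relative_to (Python 3.9+) is False for "../x" or absolute names.
    if not target.is_relative_to(root):
        raise ValueError(f"unsafe archive member path: {member_name}")
    return target
```

Resolving both paths before comparing is what catches names such as `a/../../evil`, which look nested but land outside the destination.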
* bump news
* Optimize for large-scale workloads (600K+ files) (#110)
Reduce per-file overhead for bulk metadata reading:
- Extension-hint archive detection: check file extension first to avoid
unnecessary magic-byte disk reads (saves ~1.2M file opens for CBZ collections)
- Cache marshmallow schema instances by (class, exclude_keys) to eliminate
~4.8M schema constructions at scale
- Cache transform instances per Comicbox instance to avoid redundant creation
- Skip FrozenAttrDict re-wrapping when pre-built config is passed
- Skip redundant logger init when loglevel hasn't changed
- Remove always-on glom_debug=True from transform calls
Add parallelization API (comicbox/process.py):
- process_files() for ProcessPoolExecutor-based batch processing
- aread_metadata() async wrapper for event loop integration
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
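The schema-caching idea in #110 might look roughly like the sketch below. `DemoSchema` and `get_schema` are hypothetical names; the real cache lives inside comicbox and works with marshmallow schema classes.

```python
class DemoSchema:
    """Hypothetical stand-in for a marshmallow schema class."""

    def __init__(self, exclude: frozenset = frozenset()) -> None:
        self.exclude = exclude


# One shared instance per (class, exclude_keys) pair.
_SCHEMA_CACHE: dict = {}


def get_schema(schema_class: type, exclude_keys=()):
    """Return the cached instance, building it on first request."""
    key = (schema_class, frozenset(exclude_keys))
    schema = _SCHEMA_CACHE.get(key)
    if schema is None:
        schema = _SCHEMA_CACHE[key] = schema_class(exclude=key[1])
    return schema
```

Sharing instances this way assumes a schema is not mutated after construction; converting `exclude_keys` to a frozenset makes the key hashable and order-independent.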
* sort ignorefiles
* bump news
* fix datetime ordering bug
* Code quality pass: match statements, pathlib, immutable constants (#111)
* Targeted code quality pass: match statements, pathlib, immutable constants
- Convert isinstance if/elif chains to match statements in archive.py,
archiveinfo.py, and time_fields.py
- Replace os.walk with Path.rglob in run.py, fixing a double-recursion
bug where recurse() re-walked subdirectories already visited by os.walk
- Wrap _HANDLE_MERGE dict in MappingProxyType in mergedeep.py
- Replace accumulator loop with list comprehension in config/computed.py
- Replace loop-append with extend + generator in box/sources.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Sort ignore files
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
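The `os.walk` to `Path.rglob` change in #111 can be illustrated with a small sketch. `iter_files` is a hypothetical helper, not the function in run.py.

```python
from pathlib import Path


def iter_files(root: Path):
    """Yield every regular file under root exactly once, in sorted order."""
    # rglob already recurses, so no second per-directory walk is needed.
    for path in sorted(root.rglob("*")):
        if path.is_file():
            yield path
```

Because a single `rglob` call covers the whole tree, there is no separate recursion step that could revisit subdirectories.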
* use pdf group for tests
* update deps
* iterprocess files
* fix print test
* Righttyper Typing with corrections (#112)
* righttyper
* raw types commit
* Fix righttyper auto-generated type annotations
Correct ~535 basedpyright errors and 10 ruff errors introduced by
righttyper's runtime type capture, which used overly-literal types.
Key changes:
- Replace PosixPath annotations with Path throughout
- Simplify overly-specific dict union types to dict[str, Any]
- Remove broken self: "Module.ClassName" annotations in mixins
- Rename/remove rt_T1 TypeVars (N815/N816)
- Move Callable import to TYPE_CHECKING block (TC003)
- Make boolean params keyword-only in tests/util.py (FBT001)
- Add pyright: ignore on marshmallow method override incompatibilities
- Fix _path override annotations in archive write/dump_files
- Widen function signatures to accept Path | str | None where needed
- Fix circular import in transforms/spec.py (was referencing xml_reprints.MetaSpec)
- Guard None.items() calls in metroninfo identifiers with early returns
- Clean up various unused imports left by annotation removals
Result: 0 errors, 259 tests passing, make fix/lint/typecheck all clean.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* sort ignore files
* massively typed
* remove righttyper. back to python 3.10 req
* update devenv. switch to bun
* remove quoted self typing
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* remove self types
* remove self typing
* typing attempt
* fix typing errors
* fix circular import
* reorg news
* update devenv
* add bun to dockerfile
* only copy bun deps first for dockerfile
* update devenv & deps
* switch back to main marshmallow-jsonschema now that it's back from the dead
* update devenv
* fix process pool runs to deliver exceptions back and not break on passing in the logger
* test the process module
* comments
* enhance news for iterfiles
* decomplexify box init
* decomplexify process iterfiles
* allow callers to configure subprocess loguru via picklable dict (#113)
Loguru's logger object isn't picklable into ProcessPoolExecutor
workers, so callers like codex couldn't get worker log output to
match their parent-process format. Adds a worker_log_config dict
({level, format, sink}) that runs through the executor initializer
and reconfigures loguru in each worker via init_logging. Also adds
enqueue=True to the default sink for thread-safe logging.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* update devenv
* Upgrade confuse to 2.2.0; replace AttrDict with typed Settings (#114)
* upgrade confuse to 2.2.0; replace AttrDict with typed Settings dataclass
confuse 2.2.0 makes AttrDict properly generic, so per-key types resolve
to `object` and consumers across the box mixins fail typecheck. Convert
the validated AttrDict into a frozen `Settings` dataclass once in
get_config() and propagate that typed object everywhere; confuse stays
confined to comicbox/config.
- New comicbox/config/settings.py defines `Settings` and
`ComputedSettings` (frozen, slots).
- get_config() returns Settings; new _build_settings() does the
conversion. post_process_set_for_path() rebuilt around
dataclasses.replace.
- FrozenAttrDict deleted — frozen dataclass enforces immutability.
- process.py passes Settings through pickle directly so workers skip
re-running confuse.
- Drops dead `dest_path is None` checks now that the field is required.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* rename Settings to ComicboxSettings
So that client programs that already define their own `Settings` type
don't collide on import.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* flatten ComputedSettings into ComicboxSettings
The hierarchical split was a confuse-template setup convenience, not a
logical grouping — there's no API benefit to keeping client code
chained through `cfg.computed.X`. Promote the six computed fields onto
ComicboxSettings under a clearly labeled comment block. The confuse
template's nested `computed` MappingTemplate is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix metadata_format hint silently dropping all api metadata (#115)
`_get_source_config_metadata` early-returned an empty list whenever the
caller set `metadata_format`, because `fmt not in self._config.read`
compared a string against a frozenset of `MetadataFormats` enums —
always True. The conversion + correct membership check happens in the
try block on the next lines, so the early return was both wrong and
redundant.
Adds tests/unit/test_sources.py covering the four behavioral cases:
fmt-in-read, no-fmt, fmt-not-in-read, invalid-fmt.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
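The membership bug in #115 is easy to reproduce in isolation. The sketch assumes a plain (non-`str`) Enum; `Fmt` and its values are hypothetical stand-ins for `MetadataFormats`.

```python
from enum import Enum


class Fmt(Enum):
    """Hypothetical stand-in for MetadataFormats; a plain Enum."""

    COMIC_INFO = "cix"
    METRON_INFO = "metroninfo"


read_formats = frozenset({Fmt.COMIC_INFO})

# A raw string never equals a plain Enum member, so this is always True.
buggy_skip = "cix" not in read_formats

# Converting to the enum first makes the membership test meaningful.
correct_skip = Fmt("cix") not in read_formats
```

Here `buggy_skip` is `True` even though the format is enabled, which matches the early return described above.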
* update deps
* bump news and version to 3.0.0
* fix Mapping config args silently dropped under config_default.yaml (#116)
read_config_sources used config.add() for the Mapping branch, which
appends to the BOTTOM of confuse's source priority stack — below the
config_default.yaml loaded by config.read() at the top of the
function. So any caller passing a dict / Mapping override (e.g.
`get_config({"comicbox": {"compute_pages": True}})`) silently got the
default instead. Switch to config.set() so Mapping args land on top,
matching set_args() for the Namespace branch.
Surfaced by a downstream Codex migration that hit dead Mapping
overrides; covered now by tests/unit/test_config_layering.py.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
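The layering problem in #116 can be modeled with a toy priority stack. This is not confuse's implementation; `LayeredConfig` is a hypothetical class that only mimics the documented difference between adding a source at the bottom and setting one on top.

```python
class LayeredConfig:
    """Toy priority stack: index 0 wins. Not the confuse implementation."""

    def __init__(self) -> None:
        self._sources: list = []

    def add(self, source: dict) -> None:
        # Lowest priority: consulted only if no earlier source has the key.
        self._sources.append(source)

    def set(self, source: dict) -> None:
        # Highest priority: shadows everything already loaded.
        self._sources.insert(0, source)

    def get(self, key: str):
        for source in self._sources:
            if key in source:
                return source[key]
        raise KeyError(key)
```

With defaults loaded first, an override passed through `add()` is shadowed by the default, while the same override passed through `set()` wins.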
* update deps
* widen set-like config fields to accept any non-mapping container (#117)
The template arms for `read`, `write`, `export`, `delete_keys`,
`read_ignore`, and `print` previously combined `frozenset` (a
pass-through marker) and `Sequence(str)` (list-of-strings coercion).
That works for the common YAML/CLI list path but rejects callers
passing a `set` / `tuple` / `frozenset` literal — which is logically
fine for fields whose post-compute value is always a frozenset.
Replaces the per-field unions with `OneOf((set, frozenset, tuple, list))`
(`print` also accepts `str` for the historical phase-char form). The
`_build_settings` boundary already calls `frozenset(...)` on these
values, so any of the four containers normalize correctly.
Also adapts `compute_config`'s helpers — Subview iteration only
supports dict/list source values, so user-supplied set/frozenset/tuple
inputs would error before reaching the template. New `_raw_or_empty`
pulls the Python value via `.get()` and explicitly rejects mappings
with a clear error (dict iteration would silently accept dict input
otherwise). `_parse_print` now accepts a phase-char string OR any
iterable of phase chars.
Path-list fields (`paths`, `import_paths`, `metadata_cli`) keep their
existing `Sequence(...)` form with element-type validation — that
trade-off felt worth keeping.
14 new tests in tests/unit/test_config_container_inputs.py cover the
four container types per field and assert mapping rejection.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
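The normalization described in #117 could be sketched as follows. `to_frozenset` is a hypothetical helper, and rejecting a bare string is an assumption for the non-`print` fields; the real logic is split between the confuse template, `_raw_or_empty`, and `_build_settings`.

```python
from collections.abc import Iterable, Mapping


def to_frozenset(value: object) -> frozenset:
    """Normalize a set-like config value; hypothetical helper name."""
    if value is None:
        return frozenset()
    if isinstance(value, Mapping):
        # Iterating a dict would silently yield its keys, so reject it.
        raise TypeError("expected a non-mapping container, got a mapping")
    if isinstance(value, str) or not isinstance(value, Iterable):
        # Assumption: a bare string is not a valid container here.
        raise TypeError(f"expected a container, got {type(value).__name__}")
    return frozenset(value)
```

Any of `set`, `frozenset`, `tuple`, or `list` produces the same frozenset, so callers do not need to care which literal they pass.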
* reuse types tuple
* update deps
* 3.0.0 alpha version 0
* update compose for generic gha build
* ReadResults data structure for process functions
* compact news (#119)
* Add skip_metadata flag to get_cover_page (#120)
Callers that only want a thumbnail (e.g. codex's CoverThread) don't
need the full ComicInfo/CoverImage hint resolution. Parsing the
metadata for every cover dominates the cost of cover extraction
and emits a flood of debug-bucket Union ValidationErrors that look
like real failures in DEBUG logs.
When skip_metadata=True, bypass generate_cover_paths entirely and
read archive index 0 directly. This drops per-call schema
instantiation, Union resolution, and path normalization.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* v3.0.0a2 (#121)
* compact news
* update deps
* Drop DEBUG-bucket logging of intentionally-ignored validation errors (#122)
ClearingErrorStoreSchema previously split each schema's errors into
two buckets: ignored ones logged at DEBUG, real ones at WARNING.
The DEBUG bucket only ever held errors from ``_ignore_errors`` —
``Field may not be null.`` (sparse-field tolerance) and
``Invalid input type.`` (Union variant misses) — both of which are
internal mechanics, not operator-actionable signal. Each Union miss
emitted one ``ValidationError - {'_schema': ['Invalid input type.']}``
line per field per archive, drowning the genuinely useful per-source
DEBUG messages emitted by ``_except_on_load``.
Filter ignored errors at split time, log only WARNINGs. Real schema
failures still surface with full context (path, schema class,
normalized message). Collapses the dual-bucket _split_*_errors
methods into _filter_* + _log_warnings.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
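Filtering at split time, as #122 describes, might look like the sketch below. The flat `field -> messages` dict is a simplifying assumption (marshmallow error dicts can be nested), and `filter_errors` is a hypothetical name.

```python
# The two messages the commit names as intentionally ignored.
IGNORED_MESSAGES = frozenset({"Field may not be null.", "Invalid input type."})


def filter_errors(errors: dict) -> dict:
    """Drop ignored messages; keep only fields that still have real errors."""
    kept = {}
    for field, messages in errors.items():
        real = [msg for msg in messages if msg not in IGNORED_MESSAGES]
        if real:
            kept[field] = real
    return kept
```

Whatever survives the filter would then be logged at WARNING; an empty result means nothing is logged at all.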
* update devenv
* metron: drop URL slugs for types with no public web page (#124)
* compact news
* update deps
* metron: drop broken URL slugs for genre, location, reprint, role, story, tag
Metron has no public web pages for these types — only API endpoints — so
URLs like https://metron.cloud/genre/3 always 404. Stop emitting them.
The numeric Metron ID is still preserved on the identifier.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Re-export to_metron_age_rating from comicbox.enums.maps (#125)
Shortens the import path for the helper from
comicbox.enums.maps.age_rating to comicbox.enums.maps so downstream
callers can reach it without drilling into the submodule.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* update devenv
* Drop dead code surfaced by skylos scan (#126)
- Remove unused module/class constants: _COMMENT_ARCHIVE_TYPES, SUFFIXES,
_LOG_FORMAT, comet.py IDENTIFIER_TAG/IS_VERSION_OF_TAG, comictagger.py
IDENTIFIER_TAG/PAGES_TAG, XmlCountryField (and now-orphaned imports
RarFile, ZipFile, CountryField).
- Fix latent bug in TrapExceptionsMeta: `attr_name in "deserialize"` was a
substring check that wrapped any callable whose name was a substring of
"deserialize" (e.g. "er", "ize", "ali"). Use the existing _WRAP_METHODS
tuple instead so only the exact `deserialize` method is wrapped.
- Simplify _get_pdf_enabled() to a plain `import pdffile` probe; the
except-arm stub import had no effect.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
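The substring pitfall fixed in #126 can be shown in a few lines. The contents of `_WRAP_METHODS` below are an assumption based on the description; the helper names are hypothetical.

```python
# Assumed contents; the commit only says the tuple already existed.
_WRAP_METHODS = ("deserialize",)


def buggy_should_wrap(attr_name: str) -> bool:
    # str containment is a substring test, so short names match by accident.
    return attr_name in "deserialize"


def fixed_should_wrap(attr_name: str) -> bool:
    # Tuple containment compares whole elements, so only exact names match.
    return attr_name in _WRAP_METHODS
```

Note the trailing comma in the one-element tuple: without it, `("deserialize")` is just a string and the substring behavior returns.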
* Replace pdffile_stub with single-shim optional integration (#127)
Consolidate the optional comicbox-pdffile integration into one module
(comicbox/_pdf.py) and delete the hand-maintained pdffile_stub.py.
Previously six call sites each duplicated a `try: from pdffile import X /
except: from pdffile_stub import X` block, and the stub class mirrored
the real PDFFile API method-for-method — silent drift risk every time
upstream pdffile shipped.
Now:
- comicbox/_pdf.py is the single source of truth for PDF_ENABLED,
PDFFile, and PAGE_FORMAT_VALUES. When pdffile is absent, PDFFile is
None at runtime; type checkers see the real class via TYPE_CHECKING.
- Every call site that touches PDFFile is gated by `if PDF_ENABLED`.
- The `case PDFFile():` arm in box/archive/archive.py is lifted to an
`if PDF_ENABLED and isinstance(archive, PDFFile):` guard above the
match (the match form would fail when PDFFile is None).
- config/__init__.py reads PAGE_FORMAT_VALUES instead of iterating an
empty stub Enum.
Verified with `pdffile` installed (307/307 tests pass) and in a fresh
venv without it (PDF_ENABLED=False, CBZ archives still work, PDF files
raise UnsupportedArchiveTypeError, CLI shows the "not installed" hint).
Net: -70 lines across 9 files.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* remove unused ty ignore
* comicbox 3 alpha 5 (#123)
* compact news
* update deps
* update news and version to alpha 4
* update deps
* rename function path in NEWS
* bump alpha version to 3.0.0a5
* version 3.0.0
* massage news
* bump version and news and update deps
* require comicbox-pdffile 0.6.x for image-dominant page detection (#131)
* require comicbox-pdffile 0.6.x for image-dominant page detection
Widens the optional ``[pdf]`` extra to require comicbox-pdffile 0.6.x.
The new minor release adds image-dominant page detection (
``PDFFile.classify_page``, ``PDFFile.read_image_if_dominant``,
``PDFFile.read_full_pixmap_jpeg``) used by browser readers to serve
scanned-comic PDF pages as plain ``<img>`` instead of routing through
pdf.js on the client.
comicbox itself doesn't use the new API — the bump is purely a pin
update so downstream callers (Codex, OPDS readers) can adopt it.
The ``[tool.uv.sources]`` block is transient: it points at the
pdffile PR branch so this CI can resolve dependencies before
0.6.x lands on PyPI. Drop it once 0.6.x publishes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* just use the released pdffile
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* update deps
* regenerate pdf page fixtures for pdffile 0.6.x (#132)
Add bin/regenerate-pdf-test-pages.py — drives Comicbox.get_page_by_index
against tests/files/test_pdf.pdf to refresh tests/files/pdf/{N}.pdf when
pymupdf or pdffile change page-extraction output. Run on the next drift.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* bump pdffile to 0.6.1
* bump version and news to 3.0.2
* update deps
* fix initializing pdf vars with no path
* make transforming metron credits more durable
* bump news
* bump version to v3.0.3
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
AttrDict. Comicbox constructor now accepts this dataclass instead of an
AttrDict
metron.cloud/{genre,location,reprint,role,story,tag}/... URLs for Metron identifiers: those paths 404 because Metron has no public
web pages for those types (only API endpoints). The numeric Metron ID is
still preserved on the identifier.
metadata to the filesystem.
comicbox.process.iter_process_files() and
comicbox.process.aread_metadata() for reading large batches of files at
once.
Comicbox.get_cover_page(skip_metadata=True) skips metadata parsing for
callers that just need the first archive image as a thumbnail. Removes
per-call schema instantiation and Union resolution overhead.
validation errors ("Invalid input type." from Union variant misses,
"Field may not be null." from sparse fields). These were context-free
noise: ~50 lines per archive at DEBUG that read like real failures. Real
schema errors still log at WARNING with full context.
comicbox.enums.maps.to_metron_age_rating(value: str | Enum) ->
MetronAgeRatingEnum | None