PraDigi: Android HTML5 fix + fresh chef from imported Kolibri DB #29
Draft
nucleogenesis wants to merge 1 commit into learningequality:main from
The PraDigi channel's HTML5 apps fail to load on Android because their JS sets `Utils.mobileDeviceFlag=true`, which calls into an Android-only native bridge that isn't always present. This change ships the fix and rebuilds the channel from a trustworthy source.

The previous chef (`sushichef.py` + `transform.py` + `pradigi_crawlers.py` + `structure.py`) couldn't produce a clean rebuild: the crawl JSONs at `chefdata/vader/trees/` are stale (3,443 videos vs the live channel's 4,686), local zip caches contain partial/empty/no-index zips that pass `os.path.exists()` but fail HTML5 validation, and the prior in-tree Android fix in `transform.py` opened files in `'w'` mode and then read from them, corrupting at least one zip with a `b'...'` bytestring signature.

`fresh_chef.py` walks the imported Kolibri channel DB (`.kolibri/content/databases/e832106c639854e181616015a8b87910.sqlite3`) and emits topics, html5 apps, videos, and documents directly from local storage. `content_id` is preserved from the DB so node_ids stay stable and Kolibri user progress carries through.

The Android fix is applied inline in `prepare_html5_zip(checksum)` and only re-zips when the source contains `Utils.mobileDeviceFlag=true`; unchanged zips pass through with their checksum intact, skipping re-upload entirely.

A monkey-patch shims `ricecooker 0.8.0`'s `HTML5ConversionHandler.validate_archive`, which crashes when `<body>.text is None` in BeautifulSoup output.

`sushichef.py` and `transform.py` are no longer used by the new chef but are kept here for review continuity (and for git blame). They are safe to delete in a follow-up. Surface bugs found during recovery (NameError in `build_tree_from_json`, broken android-fix block) are fixed.

Helper scripts:

- `scripts/verify_zipfix.py` spot-checks that zipfix has been applied.
- `scripts/scan_all_zips.py` does a full integrity scan of `chefdata/zipfiles/` for empty/missing/no-index zips.
- `scripts/html5_test_server.py` is a tiny LAN server that lists every HTML5 app and serves the fixed zips for testing on a real Android device.

`zipfix.py` is restored from the prior maintainer's stash; it is a one-shot Android-fix script for the legacy `chefdata/zipfiles/` layout. The new chef applies the fix inline, so this script is legacy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
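The zip-cache failure modes described above (partial, empty, and no-index zips that still pass `os.path.exists()`) can be told apart with the standard-library `zipfile` module. The following is a minimal sketch, not the contents of `scripts/scan_all_zips.py`: the function name `classify_zip` and the returned labels are hypothetical, and it assumes a usable HTML5 zip has an `index.html` at its top level.

```python
import io
import zipfile


def classify_zip(source):
    """Classify a zip given as a path or a binary file-like object.

    Hypothetical helper for illustration. Returns one of:
    'not-a-zip' (zero-byte, truncated, or otherwise unreadable),
    'empty'     (valid archive with no members),
    'no-index'  (members present, but no top-level index.html; the
                 index.html requirement is an assumption of this sketch),
    'ok'.
    """
    try:
        with zipfile.ZipFile(source) as zf:
            names = zf.namelist()
    except zipfile.BadZipFile:
        return "not-a-zip"
    if not names:
        return "empty"
    if "index.html" not in names:
        return "no-index"
    return "ok"


def build_zip(files):
    """Build an in-memory zip from a {name: text} mapping (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, text in files.items():
            zf.writestr(name, text)
    buf.seek(0)
    return buf
```

A file that exists on disk but is zero bytes long, or that holds something like a stringified bytes literal instead of real archive data, falls into the `'not-a-zip'` bucket here, which is why an existence check alone is not enough.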
Summary
The PraDigi channel's HTML5 apps fail to load on Android because their JS sets `Utils.mobileDeviceFlag=true`, which calls into an Android-only native bridge that isn't always present. This PR ships the fix (`=false` in every shipped HTML5 zip) and rebuilds the channel from a trustworthy source.

The old chef couldn't produce a clean rebuild: the crawl JSONs at `chefdata/vader/trees/` are stale (3,443 videos vs the live channel's 4,686), local zip caches contain partial / empty / no-index zips that pass `os.path.exists()` but fail HTML5 validation later, and the prior in-tree Android fix in `transform.py` opened files in `'w'` mode then read from them, corrupting at least one zip. The previous maintainer's `StudioContentNode`-based refactor (stashed as `latestmbcomback`) was abandoned with 5–6 missing HTML5 apps.

`fresh_chef.py` (~370 lines) replaces the crawl-driven pipeline with one that walks the imported Kolibri channel DB at `.kolibri/content/databases/e832106c639854e181616015a8b87910.sqlite3` and uploads from local storage. Topics, html5 apps (with the Android fix applied inline), videos, and documents are emitted directly. `content_id` is preserved from the DB so `node_id`s stay stable and Kolibri user progress survives the rebuild. `prepare_html5_zip(checksum)` only re-zips when the source contains the bad flag, so unchanged zips skip re-upload entirely.

The rebuilt channel is byte-equivalent to the existing public PraDigi channel on every kind except HTML5, which gets the Android fix. End-to-end runtime is ~26 minutes single-threaded against staging.
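The pass-through behaviour described for `prepare_html5_zip(checksum)` can be sketched roughly as follows. This is an illustration only, not the code in `fresh_chef.py`: the name `patch_zip_bytes` is hypothetical, it works on in-memory bytes rather than a checksum lookup, and it assumes the flag appears literally as `Utils.mobileDeviceFlag=true` with no surrounding whitespace.

```python
import io
import zipfile

BAD_FLAG = b"Utils.mobileDeviceFlag=true"
GOOD_FLAG = b"Utils.mobileDeviceFlag=false"


def patch_zip_bytes(data):
    """Return (zip_bytes, changed) for the raw bytes of an HTML5 zip.

    Hypothetical sketch. If no member contains BAD_FLAG, the input bytes
    are returned untouched, so their checksum is unchanged. Otherwise a
    new archive is written with the flag flipped in every member.
    """
    # Read everything first; the source is never opened for writing.
    with zipfile.ZipFile(io.BytesIO(data)) as src:
        members = [(info, src.read(info)) for info in src.infolist()]

    if not any(BAD_FLAG in body for _, body in members):
        return data, False

    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info, body in members:
            # Reusing the ZipInfo keeps each member's name and timestamp.
            dst.writestr(info, body.replace(BAD_FLAG, GOOD_FLAG))
    return out.getvalue(), True
```

Running the function a second time on its own output returns the bytes unchanged, which is the idempotence property the description relies on for upload deduplication. Reading the whole source before writing to a separate buffer also avoids the truncate-then-read mistake attributed to the old `transform.py` fix.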
`sushichef.py`, `transform.py`, `pradigi_crawlers.py`, `structure.py`, `corrections.py`, and `debugutils.py` are no longer used by the new chef. `sushichef.py` and `transform.py` have surface bugs fixed (NameError in `build_tree_from_json`, broken android-fix block) but are kept here for review continuity / git blame; safe to delete in a follow-up.

A monkey-patch shims `ricecooker 0.8.0`'s `HTML5ConversionHandler.validate_archive`, which crashes when `<body>.text is None` (BeautifulSoup returns `None` for bodies with only child elements; upstream assumes always-string).

References
This work was originally done in the deprovisioned `learningequality/sushi-chef-pradigi` repo on the `android-fix` branch. It is being migrated here as part of consolidating per-chef repos into `kolibri-library`.

The `pradigi/sushichef.py` here had a small monorepo-only `Restore original tree after linearization` patch (Migration Script, 2026-03-05) that is overwritten by this migration. Per the handoff, `sushichef.py` is no longer used by the new chef and is safe to delete in a follow-up, but that overwrite is in this PR's diff for review.

Reviewer guidance
How to verify (the loop the user runs):

- `set -a; . .envrc; set +a`
- `TASK_THREADS=1 STUDIO_URL=$HOTFIXES ./.venv/bin/python ./fresh_chef.py -v --token=$STUDIO_PRODUCTION_ADMIN_TOKEN --stage`
- `pradigi/scripts/html5_test_server.py --port 8008` starts a LAN server with a searchable list of all 794 HTML5 apps. Open `http://<lan-ip>:8008/` on an Android device, tap an app, verify it renders.

`TASK_THREADS=1` is required because `ricecooker.utils.zip.create_predictable_zip` is not thread-safe when multiple ContentNodes share an underlying zip path (PraDigi has 794 nodes sharing 323 unique zips; multi-threaded runs hit `EOFError`). Worth filing upstream.

Risky areas:

- `pradigi/fresh_chef.py:prepare_html5_zip`: the `Utils.mobileDeviceFlag=true → =false` transform is the actual hotfix. Idempotent; only re-zips when the flag is present, otherwise passes through unchanged so the upload deduplicates.
- `pradigi/fresh_chef.py`: the `content_id` preservation override after `parent.add_child(child)`. If this drops, every node gets a new `node_id` and Kolibri users lose their progress.
- `validate_archive` monkey-patch shim. Tightly bound to ricecooker 0.8.0 internals and will need to be revisited if ricecooker is upgraded.

Intentionally not fixed: the 1,267-resource gap between the 2019 published sqlite and Studio's current `main_tree`. Those resources were added directly in Studio's editor over 7 years without ever being published; nobody who imports PraDigi today has them. This rebuild matches the published state. If/when those should ship, the chef has to grow a path that reads from `main_tree` (Studio API or a fresh "Publish v2" snapshot).

Production gate: don't run the chef against production Studio without explicit per-run approval. `STUDIO_URL=$HOTFIXES` is the default.

AI usage
I used Claude Code (Opus 4.7) extensively for this work: it helped diagnose the broken android-fix in `transform.py`, restore the abandoned stash, write `fresh_chef.py` from scratch against the Kolibri DB schema, write the helper scripts, and assemble this PR description from the handoff doc. I reviewed the chef line-by-line, ran end-to-end staging chefs, and verified the published-channel diff against the existing PraDigi sqlite before opening this.