PraDigi: Android HTML5 fix + fresh chef from imported Kolibri DB #29

Draft
nucleogenesis wants to merge 1 commit into learningequality:main from nucleogenesis:pradigi/android-html5-fix-fresh-chef

@nucleogenesis
Contributor

Summary

The PraDigi channel's HTML5 apps fail to load on Android because their JS sets Utils.mobileDeviceFlag=true, which calls into an Android-only native bridge that isn't always present. This PR ships the fix (=false in every shipped HTML5 zip) and rebuilds the channel from a trustworthy source.

The old chef couldn't produce a clean rebuild: the crawl JSONs at chefdata/vader/trees/ are stale (3,443 videos vs the live channel's 4,686), local zip caches contain partial / empty / no-index zips that pass os.path.exists() but fail HTML5 validation later, and the prior in-tree Android fix in transform.py opened files in 'w' mode then read from them, corrupting at least one zip. The previous maintainer's StudioContentNode-based refactor (stashed as latestmbcomback) was abandoned with 5–6 missing HTML5 apps.
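The `'w'`-mode bug is worth spelling out: `open(path, 'w')` truncates the file to zero length the moment it is opened, so any later read has nothing left to read (and a plain `'w'` handle is not readable at all). A minimal sketch of the safe ordering, read fully first and only then reopen for writing, is below. The helper name `replace_in_file` is hypothetical and is not taken from `transform.py`.

```python
def replace_in_file(path, old, new):
    """Replace `old` with `new` in a text file without truncating it early.

    Hypothetical helper for illustration only.
    Returns True if the file was rewritten, False if `old` was absent.
    """
    # Step 1: read the whole file while it is still intact.
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

    if old not in text:
        # Nothing to do; the file is never opened for writing.
        return False

    # Step 2: only now reopen in 'w' mode, which truncates and rewrites.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text.replace(old, new))
    return True
```

Writing the decoded text back (rather than `str(some_bytes)`) also avoids the literal `b'...'` wrapper ending up inside the file.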

fresh_chef.py (~370 lines) replaces the crawl-driven pipeline with one that walks the imported Kolibri channel DB at .kolibri/content/databases/e832106c639854e181616015a8b87910.sqlite3 and uploads from local storage. Topics, html5 apps (with the Android fix applied inline), videos, and documents are emitted directly. content_id is preserved from the DB so node_ids stay stable and Kolibri user progress survives the rebuild. prepare_html5_zip(checksum) only re-zips when the source contains the bad flag, so unchanged zips skip re-upload entirely.
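The "only re-zip when the flag is present" behaviour can be sketched roughly as follows. This is an illustration under assumptions, not the code in `fresh_chef.py`: the function name `fix_zip`, its `(src_path, dst_path)` signature, and the choice to scan every archive member are all hypothetical, and it uses the standard `zipfile` module rather than ricecooker's predictable-zip helper.

```python
import zipfile

BAD = b"Utils.mobileDeviceFlag=true"
GOOD = b"Utils.mobileDeviceFlag=false"


def fix_zip(src_path, dst_path):
    """Return the path of a zip that does not contain the bad flag.

    If the source has no occurrence of BAD, the source path is returned
    untouched and nothing is written. Otherwise a patched copy is written
    to dst_path and that path is returned.
    """
    # Read every member up front so the source is closed before any write.
    with zipfile.ZipFile(src_path) as src:
        members = [(info, src.read(info.filename)) for info in src.infolist()]

    if not any(BAD in data for _, data in members):
        return src_path  # unchanged file, unchanged checksum

    with zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info, data in members:
            # Reusing the ZipInfo keeps each member's name and timestamp.
            dst.writestr(info, data.replace(BAD, GOOD))
    return dst_path
```

Because the function returns the original path when nothing matches, running it a second time on an already-fixed zip is a no-op, which is the idempotence property the description relies on.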

The rebuilt channel is byte-equivalent to the existing public PraDigi channel on every kind except HTML5, which gets the Android fix. End-to-end runtime is ~26 minutes single-threaded against staging.

sushichef.py, transform.py, pradigi_crawlers.py, structure.py, corrections.py, and debugutils.py are no longer used by the new chef. sushichef.py and transform.py have surface bugs fixed (NameError in build_tree_from_json, broken android-fix block) but are kept here for review continuity / git blame; safe to delete in a follow-up.

A monkey-patch shims ricecooker 0.8.0's HTML5ConversionHandler.validate_archive, which crashes when <body>.text is None (BeautifulSoup returns None for bodies with only child elements; upstream assumes always-string).

References

This work was originally done in the deprovisioned learningequality/sushi-chef-pradigi repo on the android-fix branch. It is being migrated here as part of consolidating per-chef repos into kolibri-library.

The pradigi/sushichef.py here had a small monorepo-only patch, "Restore original tree after linearization" (Migration Script, 2026-03-05), which this migration overwrites. Per the handoff, sushichef.py is no longer used by the new chef and is safe to delete in a follow-up, but the overwrite appears in this PR's diff for review.

Reviewer guidance

How to verify (the loop the user runs):

  1. set -a; . .envrc; set +a
  2. TASK_THREADS=1 STUDIO_URL=$HOTFIXES ./.venv/bin/python ./fresh_chef.py -v --token=$STUDIO_PRODUCTION_ADMIN_TOKEN --stage
  3. Manually Publish on hotfixes Studio.
  4. Re-import the resulting sqlite into a fresh Kolibri.
  5. Confirm 4,490 resources (3,467 video + 794 html5 + 229 doc).
  6. Tablet smoke test: pradigi/scripts/html5_test_server.py --port 8008 starts a LAN server with a searchable list of all 794 HTML5 apps. Open http://<lan-ip>:8008/ on an Android device, tap an app, verify it renders.
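Step 5 can be checked with a short script instead of by eye. The sketch below assumes the re-imported Kolibri content database exposes a `content_contentnode` table with a `kind` column holding values such as `topic`, `video`, `html5`, and `document`; those names are assumptions about the schema and should be confirmed against the actual sqlite file. The function name is hypothetical.

```python
import sqlite3

# Expected counts from the verification steps above.
EXPECTED = {"video": 3467, "html5": 794, "document": 229}


def count_resources_by_kind(db_path):
    """Count non-topic nodes per kind in a Kolibri content database.

    Assumes a `content_contentnode` table with a `kind` column.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT kind, COUNT(*) FROM content_contentnode "
            "WHERE kind != 'topic' GROUP BY kind"
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)
```

Comparing `count_resources_by_kind(path) == EXPECTED` then gives a single pass/fail answer, and `sum(EXPECTED.values())` is the 4,490 total.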

TASK_THREADS=1 is required because ricecooker.utils.zip.create_predictable_zip is not thread-safe when multiple ContentNodes share an underlying zip path (PraDigi has 794 nodes sharing 323 unique zips; multi-threaded runs hit EOFError). Worth filing upstream.

Risky areas:

  • pradigi/fresh_chef.py:prepare_html5_zip — the Utils.mobileDeviceFlag=true → =false transform is the actual hotfix. Idempotent; only re-zips when the flag is present, otherwise passes through unchanged so the upload deduplicates.
  • pradigi/fresh_chef.py — the content_id preservation override after parent.add_child(child). If this drops, every node gets a new node_id and Kolibri users lose their progress.
  • The validate_archive monkey-patch shim. Tightly bound to ricecooker 0.8.0 internals and will need to be revisited if ricecooker is upgraded.

Intentionally not fixed: the 1,267-resource gap between the 2019 published sqlite and Studio's current main_tree. Those resources were added directly in Studio's editor over 7 years without ever being published; nobody who imports PraDigi today has them. This rebuild matches the published state. If/when those should ship, the chef has to grow a path that reads from main_tree (Studio API or a fresh "Publish v2" snapshot).

Production gate: don't run the chef against production Studio without explicit per-run approval. STUDIO_URL=$HOTFIXES is the default.

AI usage

I used Claude Code (Opus 4.7) extensively for this work — it helped diagnose the broken android-fix in transform.py, restore the abandoned stash, write fresh_chef.py from scratch against the Kolibri DB schema, write the helper scripts, and assemble this PR description from the handoff doc. I reviewed the chef line-by-line, ran end-to-end staging chefs, and verified the published-channel diff against the existing PraDigi sqlite before opening this.

The PraDigi channel's HTML5 apps fail to load on Android because their
JS sets `Utils.mobileDeviceFlag=true`, which calls into an Android-only
native bridge that isn't always present. This change ships the fix and
rebuilds the channel from a trustworthy source.

The previous chef (`sushichef.py` + `transform.py` + `pradigi_crawlers.py`
+ `structure.py`) couldn't produce a clean rebuild: the crawl JSONs at
`chefdata/vader/trees/` are stale (3,443 videos vs the live channel's
4,686), local zip caches contain partial/empty/no-index zips that pass
`os.path.exists()` but fail HTML5 validation, and the prior in-tree
Android fix in `transform.py` opened files in `'w'` mode and then read
from them, corrupting at least one zip with a `b'...'` bytestring signature.

`fresh_chef.py` walks the imported Kolibri channel DB
(`.kolibri/content/databases/e832106c639854e181616015a8b87910.sqlite3`)
and emits topics, html5 apps, videos, and documents directly from local
storage. `content_id` is preserved from the DB so node_ids stay stable
and Kolibri user progress carries through. The Android fix is applied
inline in `prepare_html5_zip(checksum)` and only re-zips when the source
contains `Utils.mobileDeviceFlag=true`; unchanged zips pass through with
their checksum intact, skipping re-upload entirely.

A monkey-patch shims `ricecooker 0.8.0`'s
`HTML5ConversionHandler.validate_archive`, which crashes when
`<body>.text is None` in BeautifulSoup output.

`sushichef.py` and `transform.py` are no longer used by the new chef but
are kept here for review continuity (and for git blame). Safe to delete
in a follow-up. Surface bugs found during recovery (NameError in
`build_tree_from_json`, broken android-fix block) are fixed.

Helper scripts:
- `scripts/verify_zipfix.py` spot-checks that zipfix has been applied.
- `scripts/scan_all_zips.py` does a full integrity scan of
  `chefdata/zipfiles/` for empty/missing/no-index zips.
- `scripts/html5_test_server.py` is a tiny LAN server that lists every
  HTML5 app and serves the fixed zips for testing on a real Android
  device.
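
For context on why `os.path.exists()` alone is not a sufficient cache check, here is a rough sketch of the kind of classification an integrity scan could perform. It is not the contents of `scripts/scan_all_zips.py`; the function names, the status labels, and the assumption that a valid HTML5 zip has `index.html` at the archive root are all illustrative.

```python
import os
import zipfile


def classify_zip(path):
    """Return 'ok', 'missing', 'empty', 'corrupt', or 'no-index' for a zip path."""
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    if not zipfile.is_zipfile(path):
        return "corrupt"  # e.g. a partial download with no central directory
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
        if not names:
            return "empty"  # structurally valid archive with zero members
        if "index.html" not in names:
            return "no-index"
        if zf.testzip() is not None:
            return "corrupt"  # at least one member fails its CRC check
    return "ok"


def scan_dir(root):
    """Map each problematic .zip under `root` to its status."""
    problems = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(".zip"):
                full = os.path.join(dirpath, name)
                status = classify_zip(full)
                if status != "ok":
                    problems[full] = status
    return problems
```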

`zipfix.py` is restored from the prior maintainer's stash; it is a
one-shot Android-fix script for the legacy `chefdata/zipfiles/` layout.
The new chef applies the fix inline so this script is legacy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>