feat(cli): Add post-processing steps to optimize the bundle size #91

Open
ryanking13 wants to merge 2 commits into main from gyeongjae/wheel-optimize
@ryanking13
Contributor

This adds a post-processing step that runs after Python packages are installed and reduces the size of the python_modules directory by applying several optimizers.

By default, the following optimizers are on:

  • remove_docstrings: removes all docstrings
  • remove_comments: removes all comments
  • minify_whitespace: 4 spaces => 1 space
  • remove_pycache: removes all __pycache__ directories

Users can additionally turn on other optimizers if they run into size limit issues.
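
To illustrate what an optimizer like remove_docstrings does, here is a minimal sketch built on the standard-library ast module. It is not the wheel-optimizer implementation; the function name strip_docstrings is hypothetical, and round-tripping through ast.unparse (Python 3.9+) also drops comments and normalizes whitespace as a side effect.

```python
import ast

# Node types whose first statement may be a docstring.
_DOC_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def strip_docstrings(source: str) -> str:
    """Return `source` with module, class, and function docstrings removed.

    Hypothetical helper for illustration only.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, _DOC_OWNERS):
            continue
        body = node.body
        if (
            body
            and isinstance(body[0], ast.Expr)
            and isinstance(body[0].value, ast.Constant)
            and isinstance(body[0].value.value, str)
        ):
            # Keep the body non-empty so the result is still valid Python.
            node.body = body[1:] or [ast.Pass()]
    return ast.unparse(tree)


example = 'def f():\n    """Doc."""\n    return 1\n'
stripped = strip_docstrings(example)
```

Code that reads `__doc__` at runtime would see None after this transformation, which may be one reason a project would want to switch this optimizer off.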

These optimizers can be configured by the pyproject.toml file:

[tool.pywrangler.optimize]
remove_docstrings = false
remove_comments = true
...

@ryanking13
Contributor Author

A very simple worker that uses the pandas library shows the following size difference (uncompressed):

before

Attaching additional modules:
┌──────────────────────┬────────┬──────────────┐
│ Name                 │ Type   │ Size         │
├──────────────────────┼────────┼──────────────┤
│ submodule.py         │ python │ 0.05 KiB     │
├──────────────────────┼────────┼──────────────┤
│ Vendored Modules     │        │ 36097.02 KiB │
├──────────────────────┼────────┼──────────────┤
│ Total (2083 modules) │        │ 36097.07 KiB │
└──────────────────────┴────────┴──────────────┘

after

Attaching additional modules:
┌──────────────────────┬────────┬──────────────┐
│ Name                 │ Type   │ Size         │
├──────────────────────┼────────┼──────────────┤
│ submodule.py         │ python │ 0.05 KiB     │
├──────────────────────┼────────┼──────────────┤
│ Vendored Modules     │        │ 29963.09 KiB │
├──────────────────────┼────────┼──────────────┤
│ Total (2083 modules) │        │ 29963.14 KiB │
└──────────────────────┴────────┴──────────────┘
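
For reference, the relative saving can be worked out from the two "Vendored Modules" rows above. The variable names below are only for illustration.

```python
# Sizes of "Vendored Modules" from the two tables above, in KiB.
before_kib = 36097.02
after_kib = 29963.09

saved_kib = before_kib - after_kib
saved_pct = 100 * saved_kib / before_kib

print(f"saved {saved_kib:.2f} KiB ({saved_pct:.1f}%)")
# prints: saved 6133.93 KiB (17.0%)
```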

@ask-bonk ask-bonk bot left a comment

I'm Bonk, and I've done a quick review of your PR.

This PR adds a post-processing optimization step using wheel-optimizer to reduce python_modules bundle size, with sensible defaults and user-configurable overrides via pyproject.toml.

Issues

  1. Unpinned wheel-optimizer dependency (medium): Every other dependency in pyproject.toml has version constraints (e.g., click>=8.0.0,<9.0.0). Leaving wheel-optimizer unpinned risks breakage if a future release changes the OptimizerConfig/OptimizerPipeline API.

  2. Integration test assumes py_files is non-empty (low): py_files[0] will raise IndexError if no .py files exceed 100 bytes, producing a confusing traceback instead of a clear test failure.

"pyodide-cli",
"pyjson5>=1.6.0",
"pyodide-py",
"wheel-optimizer",

All other dependencies specify version bounds. Consider pinning wheel-optimizer to avoid unexpected breakage on a new major release.

Suggested change
- "wheel-optimizer",
+ "wheel-optimizer>=0.1.0,<1.0.0",

(Adjust the range to match the version you're currently developing against.)

for f in vendor.rglob("*.py")
if f.stat().st_size > min_file_size and f.name != "pyvenv.cfg"
]
content = py_files[0].read_text()

If no .py file exceeds 100 bytes (unlikely with six, but possible if the package layout changes), this line raises an IndexError instead of a clear test failure. Adding a guard assertion gives a better diagnostic.

Suggested change
- content = py_files[0].read_text()
+ assert py_files, "No .py files larger than 100 bytes found in vendor directory"
+ content = py_files[0].read_text()
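
A small standalone sketch of the difference the guard makes, using a temporary directory in place of the real vendor directory. The large_py_files helper and the tiny.py file are hypothetical and not part of the test in this PR.

```python
import tempfile
from pathlib import Path


def large_py_files(vendor: Path, min_file_size: int = 100) -> list:
    """Collect .py files under `vendor` that are larger than `min_file_size` bytes."""
    return sorted(f for f in vendor.rglob("*.py") if f.stat().st_size > min_file_size)


message = ""
with tempfile.TemporaryDirectory() as tmp:
    vendor = Path(tmp)
    # Only a tiny file exists, so the size filter leaves nothing behind.
    (vendor / "tiny.py").write_text("x = 1\n")
    py_files = large_py_files(vendor)
    try:
        assert py_files, "No .py files larger than 100 bytes found in vendor directory"
        content = py_files[0].read_text()
    except AssertionError as exc:
        # With the guard, the failure names the actual problem
        # instead of surfacing as an IndexError on py_files[0].
        message = str(exc)
```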

@ask-bonk

ask-bonk bot commented Apr 14, 2026

Review posted successfully on PR #91 with two inline suggestion comments:

  1. Unpinned wheel-optimizer dependency (packages/cli/pyproject.toml:22) -- suggested adding version bounds consistent with the rest of the project's dependencies.

  2. Integration test IndexError risk (packages/cli/tests/test_optimize.py:229) -- suggested adding a guard assertion before indexing py_files[0] to produce a clear failure message instead of a cryptic traceback.
