
Commit d8b1f3f

Add some basic test coverage (#6)
* Add some basic test coverage
* Doc comments and readme
* Document args. Add test assert.
* Fix failing test
1 parent 9a530b8 commit d8b1f3f

11 files changed

Lines changed: 300 additions & 148 deletions


.vscode/settings.json

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+{
+    "python.testing.pytestArgs": [
+        "."
+    ],
+    "python.testing.unittestEnabled": false,
+    "python.testing.pytestEnabled": true
+}

README.md

Lines changed: 45 additions & 1 deletion
@@ -1,2 +1,46 @@
 # package-python-function
-Python script to package a Python function for deploying to AWS Lambda
+Python command-line (CLI) tool to package a Python function for deploying to AWS Lambda, and possibly other
+cloud platforms.
+
+This tool builds a ZIP file from a virtual environment that has all of the dependencies installed that are to be included in the final deployment asset. If the content is larger than AWS Lambda's maximum unzipped package size of 250 MiB,
+then this tool will employ the ZIP-inside-ZIP (nested-ZIP) workaround. This allows deploying Lambdas with large
+dependency packages, especially those with compiled native-code extensions like Pandas, PyArrow, etc.
+
+This technique was originally pioneered by [serverless-python-requirements](https://github.com/serverless/serverless-python-requirements), which is a NodeJS (JavaScript) plugin for the [Serverless Framework](https://github.com/serverless/serverless). This tool improves on the technique by not requiring any special imports in your entrypoint source file. That is, no changes are needed to your source code to leverage the nested ZIP deployment.
+
+The motivation for this Python tool is to achieve the same results as serverless-python-requirements but with a
+purely Python tool. This can simplify and speed up developer and CI/CD workflows.
+
+One important thing that this tool does not do is build the target virtual environment and install all of the
+dependencies. You must first generate that with a tool like [Poetry](https://github.com/python-poetry/poetry) and the [poetry-plugin-bundle](https://github.com/python-poetry/poetry-plugin-bundle).
+
+## Example command sequence
+```
+poetry bundle venv .build/.venv --without dev
+package-python-function .build/.venv --output-dir .build/lambda
+```
+
+The output will be a .zip file named after the project in your pyproject.toml file.
+
+## Installation
+Use [pipx](https://github.com/pypa/pipx) to install:
+
+```
+pipx install package-python-function
+```
+
+## Usage / Arguments
+`package-python-function venv_dir [--project PROJECT] [--output-dir OUTPUT_DIR] [--output OUTPUT]`
+
+- `venv_dir` [Required]: The path to the virtual environment to package.
+
+- `--project` [Optional]: Path to the pyproject.toml file. Omit to use the pyproject.toml file in the current working directory.
+
+Optionally, specify one of the following (if neither is given, the zip file is written to the current working directory):
+- `--output`: The full output path of the final zip file.
+
+- `--output-dir`: The output directory for the final zip file. The name of the zip file will be based on the project's
+name in the pyproject.toml file.
+
+
+
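
The README's size rule can be illustrated with a small Python sketch. It is not the tool's implementation: the helper names `unzipped_size` and `needs_nested_zip` are hypothetical, and the sketch simply sums the sizes of the regular files under a directory (for example, a venv's site-packages) and compares the total with the 250 MiB limit quoted in the README.

```python
from pathlib import Path

# The 250 MiB unzipped-package limit quoted in the README, in bytes.
LAMBDA_UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def unzipped_size(root: Path) -> int:
    """Hypothetical helper: total size in bytes of every regular file under root."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def needs_nested_zip(root: Path, limit: int = LAMBDA_UNZIPPED_LIMIT_BYTES) -> bool:
    """Hypothetical helper: True when the extracted content would exceed the limit."""
    return unzipped_size(root) > limit
```

Counting file sizes rather than the size of a compressed archive matters here, because the limit applies to the content after Lambda unzips it.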

package_python_function/main.py

Lines changed: 5 additions & 5 deletions
@@ -7,7 +7,7 @@
 def main() -> None:
     args = parse_args()
     project_path = Path(args.project).resolve()
-    venv_path = Path(args.venv).resolve()
+    venv_path = Path(args.venv_dir).resolve()
     output_dir_path = Path(args.output_dir).resolve()
     output_file_path = Path(args.output).resolve() if args.output else None
     packager = Packager(venv_path, project_path, output_dir_path, output_file_path)
@@ -16,8 +16,8 @@ def main() -> None:

 def parse_args() -> argparse.Namespace:
     arg_parser = argparse.ArgumentParser()
-    arg_parser.add_argument("venv", type=str)
-    arg_parser.add_argument("--project", type=str, default='pyproject.toml')
-    arg_parser.add_argument("--output-dir", type=str, default='.')
-    arg_parser.add_argument("--output", type=str, default='')
+    arg_parser.add_argument("venv_dir", type=str, help="The directory path to the virtual environment to package into a zip file")
+    arg_parser.add_argument("--project", type=str, default='pyproject.toml', help="The path to the project's pyproject.toml file. Omit to use pyproject.toml in the current working directory.")
+    arg_parser.add_argument("--output-dir", type=str, default='.', help="The directory path to save the output zip file. Default is the current working directory.")
+    arg_parser.add_argument("--output", type=str, default='', help="The full file path for the output file. Use this instead of --output-dir if you want total control of the output file path.")
     return arg_parser.parse_args()
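
For illustration, here is a sketch of the same argument set with an optional `argv` parameter so it can be exercised without touching `sys.argv`, plus a hypothetical `resolve_output_file` helper. The helper's rule (`--output` wins, otherwise `<project name>.zip` inside `--output-dir`) is an assumption based on the README and the help text; the diff does not show how `Packager` actually names the file.

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    # Same positional argument, options, and defaults as parse_args() in the diff,
    # but argv can be passed explicitly (None falls back to sys.argv[1:]).
    arg_parser = argparse.ArgumentParser()
    arg_parser.add_argument("venv_dir", type=str)
    arg_parser.add_argument("--project", type=str, default="pyproject.toml")
    arg_parser.add_argument("--output-dir", type=str, default=".")
    arg_parser.add_argument("--output", type=str, default="")
    return arg_parser.parse_args(argv)


def resolve_output_file(args: argparse.Namespace, project_name: str) -> Path:
    # Hypothetical helper (assumed naming rule): an explicit --output takes
    # precedence; otherwise the zip is named after the project in --output-dir.
    if args.output:
        return Path(args.output).resolve()
    return Path(args.output_dir).resolve() / f"{project_name}.zip"
```

Because `--output-dir` defaults to `'.'`, omitting both options still yields a valid destination in the current working directory.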

package_python_function/nested_zip_loader.py

Lines changed: 38 additions & 17 deletions
@@ -1,14 +1,27 @@
-# AWS imposes a 10 second limit on the INIT sequence of a Lambda function. If this time limit is reached, the process
-# is terminated and the INIT is performed again as part of the function's billable invocation.
-# Reference: https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtime-environment.html
-#
-# For this reason, we can be left with an incomplete extraction and so care is taken to avoid inadverently using it.
-#
-# From https://docs.python.org/3/reference/import.html
-# "The module will exist in sys.modules before the loader executes the module code. This is crucial because the module
-# code may (directly or indirectly) import itself"
-
-# TODO: Inspired by serverless-python-requirements.
+"""
+Purpose: This module is responsible for extracting the nested zip file that contains your code and dependencies for
+the Lambda function.
+When activated, the content of this file is packaged as your entrypoint's __init__.py module in the
+outer ZIP file.
+
+It works by leveraging the Python import system's ability for a module to dynamically replace its code.
+When Lambda performs the INIT sequence, it will import the module configured as the entrypoint. Python will
+then first import `the_project/__init__.py`, which is where package-python-function has put the code from
+this file. This code will then extract the nested zip file, replace the module's code with the extracted
+code, and trigger a reload of the original module.
+
+From https://docs.python.org/3/reference/import.html
+"The module will exist in sys.modules before the loader executes the module code. This is crucial because the module
+code may (directly or indirectly) import itself"
+
+Inspired by [serverless-python-requirements](https://github.com/serverless/serverless-python-requirements/blob/master/unzip_requirements.py).
+
+Note:
+AWS imposes a 10 second limit on the INIT sequence of a Lambda function. If this time limit is reached, the process
+is terminated and the INIT is performed again as part of the function's billable invocation.
+Reference: https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtime-environment.html
+For this reason, we can be left with an incomplete extraction, so care is taken to avoid inadvertently using it.
+"""

 def load_nested_zip() -> None:
     from pathlib import Path
@@ -27,20 +40,28 @@ def load_nested_zip() -> None:

     staging_package_path = temp_path / ".stage.package-python-function"

-    # TODO BW: Work this out.
     if staging_package_path.exists():
         shutil.rmtree(str(staging_package_path))

     nested_zip_path = Path(__file__).parent / '.dependencies.zip'

     zipfile.ZipFile(str(nested_zip_path), 'r').extractall(str(staging_package_path))
-    os.rename(str(staging_package_path), str(target_package_path)) # Atomic -- TODO BW DOCME

-    # TODO BW: Update this comment
-    # [No longer applicable] We want our path to look like [working_dir, /tmp/package-python-function, ...]
+    # We don't rename the staging directory until everything has been successfully extracted.
+    # os.rename() is atomic on POSIX when both paths are on the same filesystem. That way, if AWS terminates us
+    # during the extraction, we won't try to use the incomplete extraction.
+    os.rename(str(staging_package_path), str(target_package_path))
+
+    # Lambda sets up sys.path like this:
+    # ['/var/task', '/opt/python/lib/python3.13/site-packages', '/opt/python',
+    #  '/var/lang/lib/python3.13/site-packages', '/var/runtime', ...]
+    # The first entry is the directory where Lambda extracted the zip file.
     # Refer to https://docs.aws.amazon.com/lambda/latest/dg/python-package.html#python-package-searchpath
-    # We need to replace the original path that AWS Lambda setup for us.
-    # sys.path.insert(1, str(target_package_path))
+    # We then replace the first entry with the directory where we extracted the nested zip file so that sys.path
+    # becomes:
+    # ['/tmp/package-python-function', '/opt/python/lib/python3.13/site-packages', '/opt/python',
+    #  '/var/lang/lib/python3.13/site-packages', '/var/runtime', ...]
+    # Then we trigger a reload of the current module so that the original module code is loaded.
     sys.path[0] = str(target_package_path)
     importlib.reload(sys.modules[__name__])
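
The two ideas in the loader (publish the extraction with a single rename, then re-import a module from a different `sys.path[0]`) can be tried outside Lambda with the sketch below. `extract_atomically` and `reload_from` are hypothetical names, not part of the package. The sketch assumes `target_dir` does not exist yet, because on POSIX `os.rename` fails when the destination is a non-empty directory, and it calls `importlib.invalidate_caches()` because finders may not notice files created after the interpreter started.

```python
import importlib
import os
import shutil
import sys
import zipfile
from pathlib import Path


def extract_atomically(nested_zip_path: Path, target_dir: Path) -> None:
    """Extract into a sibling staging directory, then publish it with one rename."""
    # Assumption: target_dir does not exist yet (os.rename cannot replace a non-empty dir).
    staging_dir = target_dir.with_name(".stage." + target_dir.name)
    if staging_dir.exists():
        # Leftover from an extraction that was cut short (e.g. by the INIT time limit).
        shutil.rmtree(staging_dir)
    with zipfile.ZipFile(nested_zip_path) as zf:
        zf.extractall(staging_dir)
    # Same parent directory, so same filesystem: on POSIX the rename is atomic and
    # target_dir is either absent or fully populated.
    os.rename(staging_dir, target_dir)


def reload_from(module_name: str, new_dir: Path):
    """Point sys.path[0] at new_dir and re-execute module_name from there."""
    sys.path[0] = str(new_dir)
    importlib.invalidate_caches()
    # reload() looks the module up again via the import path, so a top-level
    # module is now found in new_dir rather than its original location.
    return importlib.reload(sys.modules[module_name])
```

In the real loader the reloaded module is the entrypoint package itself (`sys.modules[__name__]`), so the reload swaps the stub `__init__.py` for the one in the extracted tree.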

package_python_function/packager.py

Lines changed: 1 addition & 0 deletions
@@ -26,6 +26,7 @@ def input_path(self) -> Path:
         return python_paths[0] / 'site-packages'

     def package(self) -> None:
+        # TODO: Improve logging.
         print("Packaging:", self.project.path)
         print("Output:", self.output_file)
         print("Input:", self.input_path)
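
The diff shows only that `input_path` returns `python_paths[0] / 'site-packages'` and that `package()` prints its inputs. The sketch below shows one way a site-packages lookup and a nested-ZIP layout could work; all three functions are hypothetical. It assumes a POSIX venv layout (`<venv>/lib/pythonX.Y/site-packages`) and places the loader at `<entry_package>/__init__.py` with the whole site-packages tree zipped next to it as `<entry_package>/.dependencies.zip`, which matches the path `nested_zip_loader.py` reads (`Path(__file__).parent / '.dependencies.zip'`).

```python
import zipfile
from pathlib import Path


def find_site_packages(venv_dir: Path) -> Path:
    # Assumption: POSIX venv layout <venv>/lib/pythonX.Y/site-packages.
    python_paths = sorted((venv_dir / "lib").glob("python*"))
    if not python_paths:
        raise FileNotFoundError(f"No lib/python* directory under {venv_dir}")
    return python_paths[0] / "site-packages"


def zip_directory(source_dir: Path, zip_path: Path) -> None:
    """Write every file under source_dir into zip_path, relative to source_dir."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(source_dir).as_posix())


def build_nested_zip(site_packages: Path, entry_package: str,
                     loader_source: str, output_file: Path) -> None:
    """Hypothetical outer-ZIP layout: loader as __init__.py, dependencies zipped beside it."""
    # Build the inner zip as a temporary file next to the final output.
    inner_zip = output_file.with_name(output_file.stem + ".dependencies.zip")
    zip_directory(site_packages, inner_zip)
    with zipfile.ZipFile(output_file, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f"{entry_package}/__init__.py", loader_source)
        zf.write(inner_zip, f"{entry_package}/.dependencies.zip")
    inner_zip.unlink()
```

The inner archive carries the real entrypoint package too, since the loader reloads that package from the extracted tree.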

package_python_function/python_project.py

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ def entrypoint_package_name(self) -> str:
         The subdirectory name in the source virtual environment's site-packages that contains the function's entrypoint
         code.
         """
-        # TODO : Parse out the project's package dir(s). Use the first one if there are multiple.
+        # TODO : Parse out the project's package dir(s) if defined. Use the first one if there are multiple.
         return self.name.replace('-', '_')

     def find_value(self, paths: tuple[tuple[str]]) -> str:
