This guide covers the current local-development path for Kindling projects.
The generated workflow is Python-first: scaffold locally, run poe tasks, and
only move to remote workspaces when you want deployment or end-to-end tests.
```bash
# Install Kindling from this repo's latest GitHub release.
CURRENT_RUNTIME_URL=$(curl -fsSL https://github.com/sep/spark-kindling-framework/releases/latest/download/spark_kindling-current-url.txt)
CURRENT_CLI_URL="${CURRENT_RUNTIME_URL//spark_kindling-/spark_kindling_cli-}"
CURRENT_SDK_URL="${CURRENT_RUNTIME_URL//spark_kindling-/spark_kindling_sdk-}"
pip install "spark-kindling[standalone] @ ${CURRENT_RUNTIME_URL}"
pip install "spark-kindling-cli @ ${CURRENT_CLI_URL}"
pip install "spark-kindling-sdk @ ${CURRENT_SDK_URL}"

# Then scaffold and work on your local repo, package, and app.
kindling repo init my-pipeline --output-dir ./my_pipeline
cd my_pipeline
kindling package init my-pipeline
kindling app init my-pipeline --package my-pipeline
cd packages/my_pipeline
poetry install
cp .env.example .env
# Update .env with your environment settings.
source .env
poetry run poe test
poetry run poe build
```

If you generated integration tests and have Azure credentials available:
```bash
poetry run poe test-integration
```

The explicit scaffold flow creates:
- repo-root shared files: `.devcontainer/`, `.github/workflows/ci.yml`, `.gitignore`
- package-local source at `packages/<pkg>/src/<pkg>/...`
- package-local tests at `packages/<pkg>/tests/`
- a package-local `pyproject.toml` with `poethepoet` tasks
- an app-local entrypoint and config at `apps/<app>/app.py` and `apps/<app>/config/`
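Putting those pieces together, a freshly scaffolded repo looks roughly like this (exact contents depend on the options you pass):

```
.
├── .devcontainer/
├── .github/
│   └── workflows/
│       └── ci.yml
├── .gitignore
├── packages/
│   └── <pkg>/
│       ├── pyproject.toml
│       ├── src/<pkg>/...
│       └── tests/
└── apps/
    └── <app>/
        ├── app.py
        └── config/
```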
Run the commands as separate steps so repos, packages, and apps can evolve independently:
```bash
git clone <your-empty-repo-url> data-platform
cd data-platform
kindling repo init data-platform
kindling package init my-pipeline --repo-root .
kindling app init my-pipeline --package my-pipeline --repo-root .
cd apps/my_pipeline
```

If you start from a repo that already has a `.devcontainer/` (for example, one that makes the Kindling CLI available inside the container), `kindling repo init` will warn and leave that devcontainer unchanged. Re-run with `--overwrite-devcontainer` when you intentionally want the generated Kindling devcontainer config.
To add a second package later:
```bash
cd ../..
kindling package init customer-360 --repo-root .
cd packages/customer_360
poetry install
poetry run poe test
```

The generated `pyproject.toml` depends on the published runtime distribution:

```toml
spark-kindling = { version = ">=0.9.2", extras = ["standalone"] }
```

The package import remains `import kindling`.
Assuming you are installing Kindling from this project's GitHub releases:
```bash
CURRENT_RUNTIME_URL=$(curl -fsSL https://github.com/sep/spark-kindling-framework/releases/latest/download/spark_kindling-current-url.txt)
CURRENT_CLI_URL="${CURRENT_RUNTIME_URL//spark_kindling-/spark_kindling_cli-}"
CURRENT_SDK_URL="${CURRENT_RUNTIME_URL//spark_kindling-/spark_kindling_sdk-}"
pip install "spark-kindling[standalone] @ ${CURRENT_RUNTIME_URL}"
pip install "spark-kindling-cli @ ${CURRENT_CLI_URL}"
pip install "spark-kindling-sdk @ ${CURRENT_SDK_URL}"
```

Other supported paths are:
```bash
# 1. Released package for local/CI use from PyPI
pip install 'spark-kindling[standalone]'

# 2. Editable source install for framework iteration
pip install -e /path/to/kindling

# 3. Local wheel built from this repo
poetry run poe build
pip install 'spark-kindling[standalone] @ file:///path/to/dist/spark_kindling-<version>-py3-none-any.whl'
```

Use the `standalone` extra for local work because it brings in the Spark runtime packages that managed platforms already provide.
Generated packages expose these tasks:
```bash
poetry run poe test-unit
poetry run poe test-component
poetry run poe test
poetry run poe build
```

When integration tests are included, the scaffold also adds:

```bash
poetry run poe test-integration
poetry run poe test-all
```

At the repo level, the generated CI workflow loops over `packages/*` and runs each package independently:
```bash
for pkg in packages/*; do
  if [ -f "$pkg/pyproject.toml" ]; then
    (cd "$pkg" && poetry install --no-interaction && poetry run poe test && poetry run poe build)
  fi
done
```

That means local day-to-day work stays package-scoped, while CI validates all scaffolded packages in the repo.
Use `kindling app run` to execute all registered pipes locally with the standalone platform:
```bash
cd apps/my_pipeline
kindling app run .
kindling app run . --platform standalone --env local
```

Use `kindling pipeline run` to execute one registered pipe without deploying to a remote platform:
```bash
kindling pipeline run bronze_to_silver
```

The command auto-discovers `app.py` by walking up from the current directory. You can also be explicit:

```bash
kindling pipeline run bronze_to_silver --app apps/my_pipeline/app.py --env local
```

`--env` selects the config overlay (defaults to the `KINDLING_ENV` env var, then `"local"`). On success you will see:

```
Running pipe: bronze_to_silver
Pipe 'bronze_to_silver' completed successfully.
```
If the pipe ID is not registered, the error message lists all available pipe IDs.
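For orientation, here is a hedged sketch of how a pipe ID like `bronze_to_silver` might be registered. `@DataPipes.pipe` is Kindling's decorator (it also appears in the troubleshooting section below), but the import path and the parameter names shown are assumptions for illustration, not the framework's confirmed API:

```python
# Hedged sketch only: the import path and parameter names are assumptions.
from kindling import DataPipes  # assumed import path

@DataPipes.pipe(
    id="bronze_to_silver",              # the pipe ID that `kindling pipeline run` resolves
    input_entities=["bronze.records"],  # assumed parameter name
    output_entity="silver.records",     # assumed parameter name
)
def bronze_to_silver(bronze_df):
    # Illustrative body: reshape bronze rows into the silver entity.
    return bronze_df.dropDuplicates()
```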
`kindling app validate` checks that your entity and pipe definitions are internally consistent, without starting a SparkSession:
```bash
kindling app validate
```

Example output:
```
[PASS] entities_registered — 3 entity/entities
[PASS] pipes_registered — 2 pipe/pipes
[PASS] pipe.bronze_to_silver.input_entities — OK
[PASS] pipe.bronze_to_silver.output_entity — OK
[PASS] entity.silver.records.merge_columns — OK
Validation passed.
```
Checks performed:
- At least one entity and one pipe are registered
- Every pipe's input entities and output entity exist in the registry
- Every delta entity has `merge_columns` set
`kindling app validate` is safe to run in CI before tests because it never creates a Spark context.
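To make the `merge_columns` check concrete, here is a hedged sketch of a delta entity definition. `@DataEntities.entity` comes from Kindling (see the troubleshooting note below), but the import path and parameter names are assumptions, not the framework's confirmed API:

```python
# Hedged sketch only: import path and parameter names are assumptions.
from kindling import DataEntities  # assumed import path

@DataEntities.entity(
    id="silver.records",          # the entity ID checked as entity.silver.records above
    format="delta",               # assumed marker for a delta entity
    merge_columns=["record_id"],  # `kindling app validate` fails delta entities without this
)
class SilverRecords:
    """Illustrative placeholder for the entity's definition."""
```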
The generated `env.local.yaml` now scaffolds entity tags with `provider_type: memory` by default. This means `kindling app run`, `kindling pipeline run`, and unit/component tests work out of the box, with no Azure credentials or ABFSS paths required.

To switch to real Azure storage, uncomment the ABFSS block in `env.local.yaml` and set the required env vars in your `.env` file.
If you see `KindlingNotInitializedError`, it means a `@DataPipes.pipe` or `@DataEntities.entity` decorator fired before `initialize()` was called. The fix is to ensure `app.py` calls `initialize()` before importing any module that registers pipes or entities, i.e. before `register_all()`; see the sketch below. The error message includes a pointer to the correct order.
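A minimal ordering sketch, assuming the names used above; the exact `kindling` import path and your package's module layout are assumptions here:

```python
# app.py — minimal ordering sketch; the exact import paths are assumptions.
from kindling import initialize  # assumed import path

# 1. Initialize first, before importing anything that registers pipes/entities.
initialize()

# 2. Only now import modules whose @DataPipes.pipe / @DataEntities.entity
#    decorators fire at import time, then register everything.
from my_pipeline import register_all  # hypothetical module name

register_all()
```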
For local integration tests against ABFSS you still need:
- Java 11+ on `PATH`
- The Python environment installed via `poetry install`
- Hadoop Azure JARs in `/tmp/hadoop-jars`
The CLI checks all of this for you:
```bash
kindling env check --local --config config/settings.yaml
```

The CLI now covers the basic local-to-remote app lifecycle:
```bash
# Package an app directory into a .kda archive
kindling app package path/to/app-dir

# Deploy an app directory or .kda package
kindling app deploy --local-folder path/to/app-dir --platform fabric --app-name my-app

# Run all registered pipes locally with standalone Spark
kindling app run path/to/app-dir

# Run an app remotely from a local directory or from a deployed app name
kindling runner ensure --platform synapse
kindling app run path/to/app-dir --platform synapse
kindling app status <run-id> --platform synapse
kindling app logs <run-id> --platform synapse
```

Remote operations use `spark-kindling-sdk`, so install it alongside the CLI when you want deploy/manage capabilities.
For notebook-backed platforms, the storage bootstrap flow still uses:
```bash
kindling workspace deploy --platform synapse --storage-account <account>
```

`workspace deploy` now prefers the combined runtime wheel, `dist/spark_kindling-*.whl`, and falls back to the legacy `dist/kindling_<platform>-*.whl` if needed. It also uploads `runtime/scripts/kindling_bootstrap.py` plus config overlays.