openeo-processes-dask is a collection of Python implementations of OpenEO processes based on the xarray/dask ecosystem. It is intended to be used alongside openeo-pg-parser-networkx, which handles the parsing and execution of OpenEO process graphs. There you'll also find a tutorial on how to register process implementations from an arbitrary source (e.g. this repo) in the registry of available processes.
Conda-forge provides GDAL system libraries and Python bindings as a single coherent package. Once installed, pip does not need to touch GDAL at all.
conda create -n openeo_processes_dask -c conda-forge python=3.12 gdal
conda activate openeo_processes_dask
pip install openeo-processes-dask[implementations]
Micromamba (lightweight alternative):
micromamba create -n openeo_processes_dask -c conda-forge python=3.12 gdal
micromamba activate openeo_processes_dask
pip install openeo-processes-dask[implementations]
If you already have GDAL system libraries, pin the pip gdal wheel to the matching version:
sudo apt-get install gdal-bin libgdal-dev python3-gdal
pip install "gdal==$(gdal-config --version)" openeo-processes-dask[implementations]
This pin avoids the common mismatch between a pip gdal wheel and your system libgdal. When using conda (above) this is unnecessary because conda-forge provides both the library and the Python binding together.
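To see whether such a mismatch exists in an existing environment, you can compare the version reported by gdal-config with the version of the installed Python binding. The following is a minimal sketch and not part of this repository: the helper names `same_release` and `installed_versions` are hypothetical, and it assumes that `gdal-config` is on your PATH and that an exact version match is the check you want.

```python
import subprocess


def same_release(binding_version: str, system_version: str) -> bool:
    """Return True when both strings name the same GDAL release."""
    return binding_version.strip() == system_version.strip()


def installed_versions() -> tuple[str, str]:
    """Return (Python binding version, system libgdal version)."""
    # Imported lazily so this module still loads when GDAL is missing.
    from osgeo import gdal

    system = subprocess.run(
        ["gdal-config", "--version"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return gdal.__version__, system


if __name__ == "__main__":
    try:
        binding, system = installed_versions()
    except (ImportError, OSError, subprocess.CalledProcessError) as exc:
        # No binding, no gdal-config on PATH, or gdal-config failed.
        print(f"Could not determine GDAL versions: {exc}")
    else:
        status = "match" if same_release(binding, system) else "MISMATCH"
        print(f"binding={binding.strip()} system={system.strip()} -> {status}")
```

If the script reports a mismatch, reinstalling the gdal wheel with the pin shown above is the usual remedy.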
Note that by default pip install openeo-processes-dask only installs the JSON process specs.
In order to install the actual implementations, add the implementations extra as shown in the examples above.
A subset of process implementations with heavy or unstable dependencies is hidden behind these extras:
- ML processes: pip install openeo-processes-dask[ml]
The implementations extra depends on GDAL transitively via rasterio, rioxarray, odc-stac, and geopandas.
Always install GDAL first (via conda-forge or system packages) before pip-installing extras.
The ml (xgboost) and deforestation (rqadeforestation) extras do not directly depend on GDAL.
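After installing the implementations extra, a quick way to confirm that the GDAL-backed packages named above are actually available is to ask Python whether it can locate them. This is a rough sketch under stated assumptions: `is_importable` and `missing_modules` are hypothetical helpers, and the import names (notably `odc.stac` for odc-stac) are assumed to match the distributions listed above.

```python
from importlib.util import find_spec

# Import names of the GDAL-backed packages listed above.
# Assumption: odc-stac is imported as "odc.stac".
GDAL_BACKED = ("rasterio", "rioxarray", "odc.stac", "geopandas")


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the module."""
    try:
        # For dotted names, find_spec imports the parent package first.
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package (e.g. "odc") raises ModuleNotFoundError.
        return False


def missing_modules(names=GDAL_BACKED) -> list[str]:
    """Return the subset of names that cannot be located."""
    return [name for name in names if not is_importable(name)]


if __name__ == "__main__":
    missing = missing_modules()
    print("all present" if not missing else f"missing: {', '.join(missing)}")
```

Locating a module does not prove that it links against a working GDAL, so treat a clean result as a first check only.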
This project requires GDAL >=3.9 and is CI-tested against conda-forge GDAL on Python 3.10–3.13.
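The version requirements above can be checked programmatically. The sketch below is illustrative only: the function names are hypothetical, and the bounds are copied from the sentence above rather than read from the project's metadata. Versions are compared as integer tuples, because a plain string comparison would wrongly rank "3.10" below "3.9".

```python
import re
import sys

MIN_GDAL = (3, 9)                # from "GDAL >=3.9"
PY_TESTED = ((3, 10), (3, 13))   # CI-tested Python range, inclusive


def major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '3.9.2'."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def gdal_is_supported(version: str) -> bool:
    """Return True if the GDAL version meets the stated minimum."""
    return major_minor(version) >= MIN_GDAL


def python_is_ci_tested(version_info=sys.version_info) -> bool:
    """Return True if the interpreter falls in the CI-tested range."""
    return PY_TESTED[0] <= tuple(version_info[:2]) <= PY_TESTED[1]


if __name__ == "__main__":
    print("Python in CI-tested range:", python_is_ci_tested())
```

Pass the output of gdal-config --version (or gdal.__version__) to `gdal_is_supported` to check the GDAL side.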
Developing openeo-processes-dask requires Poetry >1.2; see the Poetry docs for installation instructions.
Clone the repository with --recurse-submodules to also fetch the process specs:
git clone --recurse-submodules git@github.com:Open-EO/openeo-processes-dask.git
Development setup (CI pattern: conda-forge GDAL + Poetry):
# 1. Create conda env with GDAL (mirrors `.github/ci-environment.yml`)
conda create -n openeo_processes_dask_dev -c conda-forge python=3.12 gdal
conda activate openeo_processes_dask_dev
# 2. Install Poetry deps into the conda env
poetry config virtualenvs.create false
poetry install --all-extras
# 3. Verify GDAL
gdalinfo --version
python -c "from osgeo import gdal; print('GDAL Python:', gdal.__version__)"
Running poetry config virtualenvs.create false ensures that GDAL from conda-forge is visible to pip-installed geospatial packages (rasterio, rioxarray, etc.). Without this, Poetry's isolated venv may not find the conda-forge GDAL libraries.
To add a new core dependency run:
poetry add some_new_dependency
To add a new development dependency run:
poetry add some_new_dependency --group dev
To run the test suite run:
poetry run python -m pytest
Note that you can also use the virtual environment that's generated by poetry as the kernel for the ipynb notebooks.
This repo makes use of pre-commit hooks to enforce linting & a few sanity checks. In a fresh development setup, install the hooks using poetry run pre-commit install. These will then automatically be checked against your changes before making the commit.
The JSON specs for the individual processes are tracked as a git submodule in openeo_processes_dask/specs/openeo-processes.
The raw JSON for a specific process can be imported using from openeo_processes_dask.specs import reduce_dimension.
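As an illustration of working with an imported spec, the sketch below prints a one-line summary of a process. It rests on assumptions that are not confirmed here: that the imported object is a dict parsed from the JSON file, and that it carries the "id", "summary", and "parameters" keys used by openEO process definitions. `describe_process` is a hypothetical helper, and the fallback dict is an abbreviated stand-in so the sketch runs without the package installed.

```python
def describe_process(spec: dict) -> str:
    """Build a one-line description from a process spec dict."""
    names = [param.get("name", "?") for param in spec.get("parameters", [])]
    return f"{spec.get('id', '?')}({', '.join(names)}): {spec.get('summary', '')}"


if __name__ == "__main__":
    try:
        from openeo_processes_dask.specs import reduce_dimension
    except ImportError:
        # Illustrative stand-in only, not the real spec content.
        reduce_dimension = {
            "id": "reduce_dimension",
            "summary": "(stand-in summary)",
            "parameters": [{"name": "data"}, {"name": "reducer"}],
        }
    print(describe_process(reduce_dimension))
```

Using `.get` with defaults keeps the helper tolerant of specs that omit a key.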
To bump these specs to a later version use:
git -C openeo_processes_dask/specs/openeo-processes checkout <tag>
git add openeo_processes_dask/specs/openeo-processes