Skip to content

Open-EO/openeo-processes-dask

OpenEO Processes Dask

Poetry PyPI - Status PyPI Python Version codecov

openeo-processes-dask is a collection of Python implementations of OpenEO processes based on the xarray/dask ecosystem. It is intended to be used alongside with openeo-pg-parser-networkx, which handles the parsing and execution of OpenEO process graphs. There you'll also find a tutorial on how to register process implementations from an arbitrary source (e.g. this repo) to the registry of available processes.

Installation

Conda (recommended — mirrors CI)

Conda-forge provides GDAL system libraries and Python bindings as a single coherent package. Once installed, pip does not need to touch GDAL at all.

conda create -n openeo_processes_dask -c conda-forge python=3.12 gdal
conda activate openeo_processes_dask
pip install openeo-processes-dask[implementations]

Micromamba (lightweight alternative):

micromamba create -n openeo_processes_dask -c conda-forge python=3.12 gdal
micromamba activate openeo_processes_dask
pip install openeo-processes-dask[implementations]

System packages (Ubuntu/Debian)

If you already have GDAL system libraries, pin the pip gdal wheel to the matching version:

sudo apt-get install gdal-bin libgdal-dev python3-gdal
pip install "gdal==$(gdal-config --version)" openeo-processes-dask[implementations]

This pin avoids the common mismatch between a pip gdal wheel and your system libgdal. When using conda (above) this is unnecessary because conda-forge provides both the library and the Python binding together.

Note that by default pip install openeo-processes-dask only installs the JSON process specs. In order to install the actual implementations, add the implementations extra as shown in the examples above.


Extra build variants

A subset of process implementations with heavy or unstable dependencies are hidden behind these extras:

  • ML processes:

    pip install openeo-processes-dask[ml]

⚠️ Note on GDAL: The implementations extra depends on GDAL transitively via rasterio, rioxarray, odc-stac, and geopandas. Always install GDAL first (via conda-forge or system packages) before pip-installing extras. The ml (xgboost) and deforestation (rqadeforestation) extras do not directly depend on GDAL. This project requires GDAL >=3.9 and is CI-tested against conda-forge GDAL on Python 3.10–3.13.


Development environment

openeo-processes-dask requires poetry >1.2, see their docs for installation instructions.

Clone the repository with --recurse-submodules to also fetch the process specs:

git clone --recurse-submodules git@github.com:Open-EO/openeo-processes-dask.git

Development setup (CI pattern — conda-forge GDAL + Poetry):

# 1. Create conda env with GDAL (mirrors `.github/ci-environment.yml`)
conda create -n openeo_processes_dask_dev -c conda-forge python=3.12 gdal
conda activate openeo_processes_dask_dev

# 2. Install Poetry deps into the conda env
poetry config virtualenvs.create false
poetry install --all-extras

# 3. Verify GDAL
gdalinfo --version
python -c "from osgeo import gdal; print('GDAL Python:', gdal.__version__)"

poetry config virtualenvs.create false ensures GDAL from conda-forge is visible to pip-installed geospatial packages (rasterio, rioxarray, etc.). Without this, Poetry's isolated venv may not find the conda-forge GDAL libraries.


To add a new core dependency run:

poetry add some_new_dependency

To add a new development dependency run:

poetry add some_new_dependency --group dev

To run the test suite run:

poetry run python -m pytest

Note that you can also use the virtual environment that's generated by poetry as the kernel for the ipynb notebooks.

Pre-commit hooks

This repo makes use of pre-commit hooks to enforce linting & a few sanity checks. In a fresh development setup, install the hooks using poetry run pre-commit install. These will then automatically be checked against your changes before making the commit.

Specs

The json specs for the individual processes are tracked as a git submodule in openeo_processes_dask/specs/openeo-processes. The raw json for a specific process can be imported using from openeo_processes_dask.specs import reduce_dimension.

To bump these specs to a later version use: git -C openeo_processes_dask/specs/openeo-processes checkout <tag> git add openeo_processes_dask/specs/openeo-processes

About

Python implementations of many OpenEO processes, dask-friendly by default.

Resources

License

Code of conduct

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages