Open-source data integration platform
OpenHEXA is an open-source data integration platform developed by Bluesquare.
Its goal is to facilitate data integration and analysis workflows, in particular in the context of public health projects.
Please refer to the OpenHEXA wiki for more information about OpenHEXA.
This repository contains the code for what we call the app component, which mostly offers a GraphQL API and an
infrastructure to run data pipelines.
OpenHEXA App is published as a Docker Image on Docker Hub: blsq/openhexa-app.
You can use docker run blsq/openhexa-app help to list the available commands.
The Installation instructions section of our wiki gives an overview of the setup required to run OpenHEXA locally.
To ease the setup of the environment and management of dependencies, we are using containerization, in particular
Docker. As such, we provide a docker-compose.yaml file for local development.
When running the App component using docker compose, the code of this repository is mounted as a volume within the
container, so that any change you make in your local copy of the codebase is directly reflected in the running
container.
The following steps will get you up and running:
cp .env.dist .env # adapt the .env file with the required configuration values
docker network create openhexa
docker compose build
docker compose run app fixtures
docker compose up
This will correctly configure all the environment variables, fill the database with some initial data and start the base
db and app services. The app is then exposed on localhost:8000. Two main paths are available:
- http://localhost:8000/graphql for the GraphQL API
- http://localhost:8000/ready for the readiness endpoint
Anything else will be redirected to the frontend served at http://localhost:3000.
You can then log in with the following credentials: root@openhexa.org/root
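Before logging in or sending requests, it can help to confirm that the app service has finished starting. The following is a minimal sketch that polls the readiness endpoint listed above. The helper names (`is_ready`, `wait_until_ready`) are hypothetical, and the sketch assumes that the endpoint answers with HTTP 200 once the app is ready.

```python
import time
import urllib.error
import urllib.request


def is_ready(url="http://localhost:8000/ready", timeout=2.0):
    """Return True if a GET on the readiness endpoint answers with HTTP 200.

    Assumption: a 200 status means the app is ready to serve requests.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error status: not ready (yet).
        return False


def wait_until_ready(check=is_ready, attempts=30, delay=1.0, sleep=time.sleep):
    """Call `check` up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_ready()` after `docker compose up` returns `True` as soon as the endpoint responds, or `False` once all attempts are used up. The `check` and `sleep` parameters are injectable only to make the loop easy to exercise without a running instance.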
Python requirements are managed with pip-tools, which you will need to install.
When you want to add a requirement, simply update requirements.in and run pip-compile in the root directory. You
can then rebuild the Docker image.
By default, the app will use GCP as the storage backend for workspaces. If you want to use a local storage backend, you
can use the minio profile:
docker compose --profile minio up
You also have to set the WORKSPACE_STORAGE_ENGINE environment variable to s3.
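Since the minio profile only works as intended when WORKSPACE_STORAGE_ENGINE is set to s3, a quick check of the .env file can catch a forgotten setting. The sketch below is illustrative only: `parse_env` and `uses_s3_storage` are hypothetical helpers, and the parser handles just simple KEY=VALUE lines rather than the full .env syntax that Docker Compose accepts.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing " # comment" and any surrounding quotes.
        value = value.split(" #", 1)[0].strip().strip("'\"")
        values[key.strip()] = value
    return values


def uses_s3_storage(env_text):
    """Return True if WORKSPACE_STORAGE_ENGINE is set to s3 in the .env content."""
    return parse_env(env_text).get("WORKSPACE_STORAGE_ENGINE") == "s3"
```

For example, `uses_s3_storage(open(".env").read())` could be run from the repository root before starting the minio profile.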
If you need the pipelines or want to work on them, two optional services are available: pipelines_runner and
pipelines_scheduler. You can run them with the following command instead of docker compose up:
docker compose --profile pipelines up
The Writing OpenHEXA pipelines section of the wiki contains the instructions needed to build and deploy a data pipeline on OpenHEXA.
To deploy and run data pipelines locally, you will need to:
- Create a workspace on your local instance
- Configure the SDK to use your local instance as the backend
openhexa config set_url http://localhost:8000
You can now deploy your pipelines to your local OpenHEXA instance.
Please refer to the SDK documentation for more information.
If you need the optional dataworker service, you can run the following command instead of docker compose up:
docker compose --profile dataworker up
The app Docker image contains an entrypoint. You can use the following to list the available commands:
docker compose run app help
As an example, use the following command to run the migrations:
docker compose run app migrate
Running the tests is as simple as:
docker compose run app test --settings=config.settings.test
Some tests call external resources (such as the public DHIS2 API) and will slow down the suite. You can exclude them when running the test suite for unrelated parts of the codebase:
docker compose run app test --exclude-tag=external --settings=config.settings.test
You can run a specific test as follows:
docker compose run app test hexa.core.tests.CoreTest.test_ready_200 --settings=config.settings.test
There are many other options. To find out more, see the Django testing documentation, as that is the framework we use.
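The test invocations above differ only in an optional test label and optional excluded tags, so they can be assembled by a small helper when scripting test runs. This is a sketch: `build_test_command` is a hypothetical name, and it only composes the arguments shown in this section.

```python
import shlex


def build_test_command(label=None, exclude_tags=(), settings="config.settings.test"):
    """Return the argument list for running the test suite through docker compose."""
    command = ["docker", "compose", "run", "app", "test"]
    if label:
        # A dotted path selects a single module, class, or test method.
        command.append(label)
    for tag in exclude_tags:
        command.append(f"--exclude-tag={tag}")
    command.append(f"--settings={settings}")
    return command


# Render the "skip external tests" variant as a shell command line.
print(shlex.join(build_test_command(exclude_tags=["external"])))
# prints: docker compose run app test --exclude-tag=external --settings=config.settings.test
```

The returned list can be passed directly to `subprocess.run(command, check=True)`, which avoids shell quoting issues.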
You can extract the strings to translate with the following command:
docker compose run app manage makemessages -l fr # Where fr is the language code
You can then translate the strings in the hexa/locale folder.
To compile the translations, run the following command:
docker compose run app manage compilemessages
Our Python code is linted using ruff, which also handles code formatting and import sorting.
We currently target the Python 3.9 syntax.
We use a pre-commit hook to lint the code before committing. Make sure that pre-commit is
installed, and run pre-commit install the first time you check out the code. Linting will again be checked
when submitting a pull request.
You can run the lint tools manually using pre-commit run --all.
This project follows Semantic Versioning. Tagging and release creation are managed by release-please, which creates and maintains a pull request for the next release based on the messages of the new commits.
Triggering a new release is done by merging the pull request created by release-please. The result is:
- the changelog.md is updated with the commit messages
- a GitHub release is created
- a Docker image is built for the new tag and pushed to the Docker registry
This project assumes you are using Conventional Commit messages.
The most important prefixes you should have in mind are:
- fix: which represents bug fixes, and correlates to a SemVer patch.
- feat: which represents a new feature, and correlates to a SemVer minor.
- feat!:, or fix!:, refactor!:, etc., which represent a breaking change (indicated by the !) and will result in a SemVer major.
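The relationship between these prefixes and the resulting version bump can be illustrated with a short sketch. It is not release-please's implementation: `semver_bump` and `bump_version` are hypothetical helpers, only the commit header is inspected (BREAKING CHANGE footers are ignored), and any special handling release-please may apply (for example to pre-1.0 versions) is not modelled.

```python
import re

# Header shape: type, optional "(scope)", optional "!", then ": description".
HEADER = re.compile(r"^(?P<type>[a-zA-Z]+)(?:\([^)]*\))?(?P<breaking>!)?: .+")


def semver_bump(commit_header):
    """Map a Conventional Commit header to 'major', 'minor', 'patch' or None."""
    match = HEADER.match(commit_header)
    if match is None:
        return None  # not a Conventional Commit header
    if match.group("breaking"):
        return "major"  # the "!" marks a breaking change, whatever the type
    return {"feat": "minor", "fix": "patch"}.get(match.group("type").lower())


def bump_version(version, bump):
    """Apply a bump to a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version  # no release-relevant change
```

With these helpers, `semver_bump("feat(api): add workspace query")` returns `"minor"`, and `bump_version("1.4.2", "minor")` returns `"1.5.0"`. Other prefixes such as `docs:` or `chore:` return `None` here because, without a `!`, they do not map to a version bump in this simplified model.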