A geospatial bioeconomy project for biositing analysis in California. This repository provides tools for ETL pipelines to process data from Google Sheets into PostgreSQL databases, geospatial analysis using QGIS, and a REST API for data access.
This project uses a PEP 420 namespace package structure with three main components:
- ca_biositing.datamodels: Hand-written SQLModel database models, materialized views, and database configuration
- ca_biositing.pipeline: ETL pipelines orchestrated with Prefect, deployed via Docker
- ca_biositing.webservice: FastAPI REST API for data access
ca-biositing/
├── src/ca_biositing/    # Namespace package root
│   ├── datamodels/      # Database models (SQLModel) and Alembic migrations
│   ├── pipeline/        # ETL pipelines (Prefect)
│   └── webservice/      # REST API (FastAPI)
├── resources/           # Deployment resources
│   ├── docker/          # Docker Compose configuration
│   └── prefect/         # Prefect deployment files
├── tests/               # Integration tests
├── pixi.toml            # Pixi dependencies and tasks
└── pixi.lock            # Dependency lock file
- Pixi (v0.55.0+): Installation Guide
- Docker: For running the ETL pipeline
- Google Cloud credentials: For Google Sheets access (optional)
# Clone the repository
git clone https://github.com/sustainability-software-lab/ca-biositing.git
cd ca-biositing
# Install dependencies with Pixi
pixi install
# Install pre-commit hooks
pixi run pre-commit-install

Note: Before starting the services for the first time, create the required environment file from the template:
cp resources/docker/.env.example resources/docker/.env

CRITICAL (PostgreSQL 15 Upgrade): If you are upgrading from a version prior to Feb 2026, you must wipe your local volumes to support the PostgreSQL 15 image:
pixi run teardown-services-volumes

Then start and use the services:
# 1. Start all services (PostgreSQL, Prefect server, worker)
# This will also automatically apply any pending database migrations.
pixi run start-services
# 2. Deploy flows to Prefect
pixi run deploy
# 3. Run the ETL pipeline
pixi run run-etl
# Monitor via Prefect UI: http://localhost:4200
# To apply new migrations after the initial setup
pixi run migrate
# Stop services
pixi run teardown-services

See resources/README.md for detailed pipeline documentation.
# Start the web service
pixi run start-webservice
# Access API docs: http://localhost:8000/docs

pixi run qgis

Note: On macOS, you may see a Python faulthandler error - this is expected and can be ignored. See QGIS Issue #52987.
# Run all tests
pixi run test
# Run tests with coverage
pixi run test-cov

# Run pre-commit checks on staged files
pixi run pre-commit
# Run pre-commit on all files (before PR)
pixi run pre-commit-all

View all available tasks:

pixi task list

Key tasks:
- Service Management: start-services, teardown-services, service-status
- ETL Operations: deploy, run-etl
- Development: test, test-cov, pre-commit, pre-commit-all
- Applications: start-webservice, qgis
- Database: access-db, check-db-health
- Schema Management: migrate, migrate-autogenerate, refresh-views
- Validation (pgschema): schema-plan, schema-analytics-plan, schema-dump, schema-analytics-list
This project uses PEP 420 namespace packages to organize code into independently installable components that share a common namespace:
- Each component has its own pyproject.toml and can be installed separately
- Shared models in datamodels are used by both pipeline and webservice
- Clear separation of concerns while maintaining type consistency
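As a sketch of how this works, each component ships a minimal pyproject.toml of roughly this shape (the file below is hypothetical and illustrative, not the project's actual configuration): because `src/ca_biositing/` has no `__init__.py`, PEP 420 lets separately installed components share the `ca_biositing` namespace.

```toml
# Hypothetical pyproject.toml for the datamodels component; the real
# file in the repository may differ.
[project]
name = "ca-biositing-datamodels"
version = "0.1.0"
dependencies = ["sqlmodel", "alembic"]

[tool.hatch.build.targets.wheel]
# No __init__.py at the ca_biositing level: installed components
# merge into one ca_biositing.* namespace at import time.
packages = ["src/ca_biositing/datamodels"]
```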
The ETL pipeline uses:
- Prefect: Workflow orchestration and monitoring
- Docker: Containerized execution environment
- PostgreSQL: Data persistence
- Google Sheets API: Primary data source
Pipeline architecture:
- Extract: Pull data from Google Sheets
- Transform: Clean and normalize data with pandas
- Load: Insert/update records in PostgreSQL via SQLAlchemy
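The three stages above can be sketched as follows. This is a stdlib-only stand-in for the actual Prefect/pandas/SQLAlchemy code, and the table and column names are illustrative, not the project's real schema:

```python
import sqlite3

def extract():
    # Stand-in for pulling rows from the Google Sheets API.
    return [{"sample_id": "S-01 ", "mass_kg": "12.5"},
            {"sample_id": "s-02", "mass_kg": "7.0"}]

def transform(rows):
    # Clean and normalize: trim whitespace, uppercase IDs, cast numbers.
    return [{"sample_id": r["sample_id"].strip().upper(),
             "mass_kg": float(r["mass_kg"])} for r in rows]

def load(rows, conn):
    # Insert/update records; the real pipeline does this via SQLAlchemy.
    conn.execute("CREATE TABLE IF NOT EXISTS biomass"
                 " (sample_id TEXT PRIMARY KEY, mass_kg REAL)")
    conn.executemany(
        "INSERT INTO biomass VALUES (:sample_id, :mass_kg)"
        " ON CONFLICT(sample_id) DO UPDATE SET mass_kg = excluded.mass_kg",
        rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
print(conn.execute(
    "SELECT sample_id, mass_kg FROM biomass ORDER BY sample_id").fetchall())
# → [('S-01', 12.5), ('S-02', 7.0)]
```

In the real pipeline each stage is a Prefect task, so retries and monitoring come from the orchestrator rather than the functions themselves.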
Database models are hand-written SQLModel classes organized into 15 domain
subdirectories under src/ca_biositing/datamodels/models/. All schema changes
are managed through Alembic migrations.
Development workflow:
1. Edit SQLModel classes in models/
2. Auto-generate a migration: pixi run migrate-autogenerate -m "Description"
3. Apply the migration: pixi run migrate
SQLModel-based models provide:
- Type-safe database operations (SQLAlchemy + Pydantic in one class)
- Versioned schema migrations (via Alembic)
- Shared models across ETL and API components
- Built-in Pydantic validation
Seven materialized views are defined in views.py and managed through Alembic
migrations. Refresh them after loading data with pixi run refresh-views.
Database models for:
- Biomass data (field samples, measurements)
- Geographic locations
- Experiments and analysis
- Metadata and samples
- Organizations and contacts
Documentation: datamodels/README.md
Prefect-orchestrated workflows for:
- Data extraction from Google Sheets
- Data transformation and validation
- Database loading and updates
- Lookup table management
Documentation: pipeline/README.md
Guides:
FastAPI REST API providing:
- Read access to database records
- Interactive API documentation (Swagger/OpenAPI)
- Type-safe endpoints using Pydantic
Documentation: webservice/README.md
Docker and Prefect configuration for:
- Service orchestration (Docker Compose)
- Prefect deployments
- Database initialization
Documentation: resources/README.md
# Add conda package to default environment
pixi add <package-name>
# Add PyPI package to default environment
pixi add --pypi <package-name>
# Add to specific feature (e.g., pipeline)
pixi add --feature pipeline --pypi <package-name>

The pipeline dependencies are managed by Pixi's etl environment feature in
pixi.toml. When you add dependencies and rebuild Docker images, they are
automatically included:
# Add dependency to pipeline feature
pixi add --feature pipeline --pypi <package-name>
# Rebuild Docker images
pixi run rebuild-services
# Restart services
pixi run start-services

This project uses Pixi environments for different workflows:
- default: General development, testing, pre-commit hooks
- gis: QGIS and geospatial analysis tools
- etl: ETL pipeline (used in Docker containers)
- webservice: FastAPI web service
- frontend: Node.js/npm for frontend development
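In pixi.toml, environments like these are composed from named features. The fragment below is a hypothetical sketch of that pattern, not the repository's actual file:

```toml
# Hypothetical pixi.toml fragment; see the repository's pixi.toml
# for the real feature and environment definitions.
[feature.etl.pypi-dependencies]
prefect = "*"

[feature.webservice.pypi-dependencies]
fastapi = "*"

[environments]
etl = ["etl"]
webservice = ["webservice"]
```

This is why `pixi add --feature pipeline --pypi <package-name>` lands a dependency only in the environments that include that feature.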
This repository now includes the Cal Bioscape Frontend as a Git submodule
located in the frontend/ directory.
When you first clone this repository, you can initialize and pull only the
frontend submodule with:
pixi run submodule-frontend-init

This project uses MkDocs Material for documentation.
You can preview the documentation locally using Pixi:
pixi install -e docs
pixi run -e docs docs-serve

Then open your browser and go to:
http://127.0.0.1:8000
Most documentation should live in the relevant directories within the docs
folder.
When adding new pages to the documentation, make sure you update the
mkdocs.yml file
so they can be rendered on the website.
If you need to add documentation referencing a file that lives elsewhere in the repository, symlink it into the docs folder as follows (this is an example, run from the package root directory):
# symlink the file to its destination
# Be sure to use relative paths here, otherwise it won't work!
ln -s ../../deployment/README.md docs/deployment/README.md
# stage your new file
git add docs/deployment/README.md

Be sure to preview the documentation to make sure it's accurate before submitting a PR.