techequitycollaborative/landscape-etl

Data Work Landscape: ETL

This repository contains scripts for ETL processes for the Data Work Landscape data set.

Extract

The data set is extracted from Google Sheets into Postgres on the TE droplet.

Transform

The data set is transformed according to a set of rules in order to be prepared for public consumption.

Load

The data set is then ready to be loaded into the data work microsite (maintained in a separate repository) or any other destination.

Running the ETL process

To run the ETL scripts, cd into the landscape-etl repo on your local computer and run the following command:

docker compose up --build

This launches the Docker container, which runs entrypoint.sh. That script runs etl.py and then create_view.sql, so as soon as the data is loaded into the database, a view containing the final data for the microsite is created.
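The ordering matters here: the view should only be created once the load has succeeded. The contents of entrypoint.sh are not shown in this README, so the following is only a minimal Python sketch of that "run in order, stop on first failure" behavior. The helper name `run_steps` and the example commands in the comment are assumptions for illustration, not the repository's actual script.

```python
import subprocess
import sys


def run_steps(steps):
    """Run (name, command) pairs in order and return the names that completed.

    subprocess.run(..., check=True) raises CalledProcessError on a non-zero
    exit code, so a failed load prevents the later steps from running.
    """
    completed = []
    for name, command in steps:
        subprocess.run(command, check=True)
        completed.append(name)
    return completed


# Hypothetical step list mirroring the order described above. The real
# entrypoint.sh may invoke these files differently:
#
#   run_steps([
#       ("load", [sys.executable, "etl.py"]),
#       ("create_view", ["psql", "-f", "create_view.sql"]),
#   ])
```

In a shell script the same effect is usually achieved by exiting on the first failing command, so the view is never rebuilt on top of a partial load.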

Each run also triggers a Slack bot notification that the database was updated.
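This README does not say how the Slack notification is sent. One common approach is a Slack incoming webhook, which accepts a JSON body with a `text` field over HTTP POST. The sketch below assumes that approach; the function names and the `SLACK_WEBHOOK_URL` environment variable are hypothetical and may not match what etl.py actually does.

```python
import json
import os
import urllib.request


def build_slack_request(webhook_url, message):
    """Build a POST request carrying a Slack incoming-webhook payload."""
    body = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify(message):
    """Send the message if a webhook URL is configured; return True on HTTP 200."""
    url = os.environ.get("SLACK_WEBHOOK_URL")  # hypothetical variable name
    if not url:
        return False  # nothing configured, so skip the notification
    request = build_slack_request(url, message)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status == 200
```

Keeping the request construction separate from the network call makes the payload easy to inspect without contacting Slack.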

Editing microsite data

To edit or add COLUMNS for display on the microsite, update create_view.sql. The view it creates is what the microsite pulls from directly.

To edit or add ROWS, edit the Google Sheet and re-run the docker compose command above.
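The split between columns and rows follows from how a view works: the view definition decides which columns are exposed, while the rows come from whatever is in the underlying table. The sketch below illustrates this using Python's built-in sqlite3 as a stand-in for Postgres. The table name `landscape_raw`, the view name `landscape_view`, the column names, and the sample rows are all invented for illustration and do not come from this repository.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the Postgres database
conn.execute("CREATE TABLE landscape_raw (org TEXT, region TEXT, internal_notes TEXT)")
conn.execute("INSERT INTO landscape_raw VALUES ('Org A', 'West', 'draft')")


def create_view(conn, columns):
    """Recreate the view so it exposes only the given (trusted) column names."""
    conn.execute("DROP VIEW IF EXISTS landscape_view")
    column_list = ", ".join(columns)
    conn.execute(f"CREATE VIEW landscape_view AS SELECT {column_list} FROM landscape_raw")


def view_columns(conn):
    """Return the column names currently exposed by the view."""
    cursor = conn.execute("SELECT * FROM landscape_view")
    return [description[0] for description in cursor.description]


# Changing COLUMNS means changing the view definition.
create_view(conn, ["org"])
create_view(conn, ["org", "region"])

# Changing ROWS means changing the source data; the view picks it up as-is.
conn.execute("INSERT INTO landscape_raw VALUES ('Org B', 'East', 'final')")
row_count = conn.execute("SELECT COUNT(*) FROM landscape_view").fetchone()[0]
```

In this repository the equivalent of the insert is editing the spreadsheet and re-running the ETL, and the equivalent of `create_view` is editing create_view.sql.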
