46 changes: 46 additions & 0 deletions .github/workflows/ceqr_schools.yml
@damonmcc (Member, Author), Dec 29, 2025:

so I don't have to run a dev container to run builds that use geosupport. modeled on the in-use ceqr_dep_monthly.yml action. can't test this with workflow_dispatch until it's merged

Contributor:

FYI, I'm totally cool with us merging empty templates to main without review, then you can populate it in your PR and actually test it.

@@ -0,0 +1,46 @@
name: CEQR - Schools
on:
  workflow_dispatch:
    inputs:
      dataset:
        description: "Dataset to build"
        type: choice
        required: true
        options:
          - sca_capacity_projects
          - sca_e_projections_by_boro
          - sca_e_projections_by_sd
          - ceqr_school_buildings

jobs:
  build:
    runs-on: ubuntu-22.04
    defaults:
      run:
        shell: bash
        working-directory: products/ceqr/ceqr_app
    container:
      image: nycplanning/build-geosupport:${{ inputs.image_tag || 'latest' }}
    steps:
      - uses: actions/checkout@v4

      - name: Load Secrets
        uses: 1password/load-secrets-action@v1
        with:
          export-env: true
        env:
          OP_SERVICE_ACCOUNT_TOKEN: ${{ secrets.OP_SERVICE_ACCOUNT_TOKEN }}
          BUILD_ENGINE_SERVER: "op://Data Engineering/EDM_DATA/server_url"
          AWS_S3_ENDPOINT: "op://Data Engineering/DO_keys/AWS_S3_ENDPOINT"
          AWS_SECRET_ACCESS_KEY: "op://Data Engineering/DO_keys/AWS_SECRET_ACCESS_KEY"
          AWS_ACCESS_KEY_ID: "op://Data Engineering/DO_keys/AWS_ACCESS_KEY_ID"

      - name: Setup build environment
        working-directory: ./
        run: ./bash/docker_container_setup.sh

      - name: Run recipe
        run: |
          export RECIPE_ENGINE=$BUILD_ENGINE_SERVER/recipe
          export EDM_DATA=$BUILD_ENGINE_SERVER/defaultdb
          ./ceqr run recipe ${{ inputs.dataset }}
2 changes: 1 addition & 1 deletion dcpy/library/archive.py
@@ -34,7 +34,7 @@ def __call__(
 Parameters
 ----------
 path: path to the configuration file
-output_format: currently supported formats: `'csv'`, `'geojson'`, `'shapefile'`, `'postgres'`
+output_format: see ingest.Ingestor translator methods for currently supported formats
 push: if `True` then push to s3
 clean: if `True`, the temporary files created under `.library` will be removed
 latest: if `True` then tag this current version we are processing to be the `latest`
2 changes: 1 addition & 1 deletion dcpy/library/cli.py
@@ -19,7 +19,7 @@
 @app.command()
 def archive(
     path: str = typer.Option(None, "--path", "-f", help="Path to config yml"),
-    output_formats: list[str] = typer.Option(["pgdump", "parquet", "csv"], "--output-format", "-o", help="csv, geojson, shapefile, pgdump and parquet"),
+    output_formats: list[str] = typer.Option(["pgdump", "parquet", "csv", "shapefile", "postgres"], "--output-format", "-o", help="csv, geojson, shapefile, pgdump and parquet"),
     push: bool = typer.Option(False, "--s3", "-s", help="Push to s3"),
     clean: bool = typer.Option(False, "--clean", "-c", help="Remove temporary files"),
     latest: bool = typer.Option(False, "--latest", "-l", help="Tag with latest"),
3 changes: 2 additions & 1 deletion dcpy/library/ingest.py
@@ -65,7 +65,7 @@ def format_field_names(
 else:
     geom_clause = ""
 query = f"""SELECT\n\t{select}{geom_clause}\nFROM {layer_name}"""
-print(query)
+print(f"Formatting field names in layer '{layer_name}' using SQL query:\n{query}")
 if not sql:
     return query
 else:
@@ -191,6 +191,7 @@ def wrapper(self: Ingestor, *args, **kwargs) -> tuple[list[str], library.Config]
 layerName = dataset.name
 
 # Initiate vector translate
+print("Initiating vector translate ...")
 with Progress(
     SpinnerColumn(spinner_name="earth"),
     TextColumn("[progress.description]{task.description}"),
32 changes: 32 additions & 0 deletions dcpy/library/templates/doe_lcgms.yml
Member, Author:

pretty sure the LCGMS we get from CAPS for CEQR schools is different than what we get from the DOE website and use in the doe_lcgms ingest template. need to use library to archive this to the recipe postgres DB anyway, so it wasn't worth resolving to a single ingest template to build the CEQR schools datasets

Contributor:

Can we give this a different id then? Feels unlikely that it would happen by accident, but we still shouldn't have two different ingest/library templates with the same dataset id.

Contributor:

bumping this comment - other than that, good to go

Member, Author:

planning to rename but I think I'll have to resolve some weirdness with how this is used in this build

pretty sure the build uses past versions of datasets stored in the edm-recipes DB, so maybe I'll just have to re-rename this when it's imported during the build (instead of recreating tables with the new name and changing the build code)?

@@ -0,0 +1,32 @@
dataset:
  name: doe_lcgms
  acl: public-read
  source:
    url:
      path: s3://edm-recipes/inbox/sca/{{ version }}/doe_lcgms.csv
    options:
      - AUTODETECT_TYPE=NO
      - EMPTY_STRING_AS_NULL=YES
    geometry:
      SRS: null
      type: NONE

  destination:
    geometry:
      SRS: null
      type: NONE
    options:
      - OVERWRITE=YES
      - PRECISION=NO
    fields: []
    sql: null

  info:
    description: |
      Provided by DCP Capital Planning team as an excel file
      with a name like "LCGMS_SchoolData".

      This is only needed for the legacy CEQR schools dataset ceqr_school_buildings
      and is different from the doe_lcgms ingest source data used in FacDB.
    url: ""
    dependents: []
5 changes: 4 additions & 1 deletion dcpy/library/templates/sca_bluebook.yml
@@ -24,6 +24,9 @@ dataset:
   info:
     description: |
       ### NYC School Construction Authority - Capacity Projects in Progress
-      Provided by DCP Capital Planning team as an excel file. This is the SCA's “Enrollment, Capacity, Utilization Report,” known as the “Blue Book”.
+      Provided by DCP Capital Planning team as an excel file
+      with a name like "20XX - 20XX Blue Book" and a sheet name like "XX-XX by Org".
+
+      This is the SCA's “Enrollment, Capacity, Utilization Report,” known as the “Blue Book”.
     url: ""
     dependents: []
3 changes: 2 additions & 1 deletion dcpy/library/templates/sca_capacity_projects_current.yml
@@ -24,6 +24,7 @@ dataset:
   info:
     description: |
       ### NYC School Construction Authority - Capacity Projects in Progress
-      Provided by DCP Capital Planning team as an excel file.
+      Provided by DCP Capital Planning team as an excel file
+      with a name like "Section 6 Capacity Projects in Process".
     url: ""
     dependents: []
3 changes: 2 additions & 1 deletion dcpy/library/templates/sca_e_pct.yml
@@ -24,6 +24,7 @@ dataset:
   info:
     description: |
       ### NYC School Construction Authority - Enrollment Percentages by Zone
-      Provided by DCP Capital Planning team as an excel file.
+      Provided by DCP Capital Planning team as an excel file
+      with a name like "20XX ENROLLMENT _ by Zone".
     url: ""
     dependents: []
3 changes: 2 additions & 1 deletion dcpy/library/templates/sca_e_projections.yml
@@ -24,6 +24,7 @@ dataset:
   info:
     description: |
       ### NYC School Construction Authority - Enrollment Projections by Grade
-      Provided by DCP Capital Planning team as an excel file.
+      Provided by DCP Capital Planning team as an excel file
+      with a name like "20XX-20XX Enrollment Projection By Grade".
     url: ""
     dependents: []
5 changes: 3 additions & 2 deletions ingest_templates/doe_lcgms.yml
@@ -18,8 +18,9 @@ attributes:
 
 ingestion:
   source:
-    type: local_file
-    path: ./LCGMS_SchoolData.xls
+    type: s3
+    bucket: edm-recipes
+    key: inbox/doe/{{ version }}/LCGMS_SchoolData.xls
   file_format:
     type: html
     kwargs:
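The new S3 source relies on a `{{ version }}` placeholder in the object key. As a rough illustration of how such a key resolves, here is a minimal sketch; `render_version` is a hypothetical helper using a plain regex substitution, and the templating that dcpy actually applies may differ.

```python
import re


def render_version(template: str, version: str) -> str:
    """Replace every `{{ version }}` placeholder (any inner spacing) with `version`.

    Hypothetical helper for illustration only; not part of dcpy.
    """
    # A callable replacement avoids backslash-escape handling in `version`.
    return re.sub(r"\{\{\s*version\s*\}\}", lambda _match: version, template)


key = render_version("inbox/doe/{{ version }}/LCGMS_SchoolData.xls", "20251120")
print(key)  # inbox/doe/20251120/LCGMS_SchoolData.xls
```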
37 changes: 37 additions & 0 deletions products/ceqr/ceqr_app/README.md
@@ -28,6 +28,43 @@ This then gets passed to the EDM production database using `create.sql`, where f

## Build instructions

> [!IMPORTANT]
> This codebase is currently only used to build the CEQR Schools datasets, which are distributed to the Capital Planning and Support (CAPS) team and used in DCP's Schools Model Excel workbook. This section focuses on building those datasets.

All source data comes from the CAPS team and must be archived using `library` with `postgres` as the output format. For example:

```bash
library archive --name sca_capacity_projects_current --version 20251120 --latest --output-format postgres --postgres-url $RECIPE_ENGINE
```
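
When several sources need archiving, the same command can be assembled programmatically. This is a minimal sketch that only builds the argument list from the flags shown above; `archive_command` is a hypothetical helper, and nothing is executed.

```python
def archive_command(name: str, version: str, postgres_url: str) -> list[str]:
    """Build the argv for the `library archive` call shown above.

    Hypothetical helper; it only mirrors the flags used in the example command.
    """
    return [
        "library", "archive",
        "--name", name,
        "--version", version,
        "--latest",
        "--output-format", "postgres",
        "--postgres-url", postgres_url,
    ]


# Print a command line that can be pasted into a shell,
# leaving $RECIPE_ENGINE for the shell to expand.
print(" ".join(archive_command("sca_e_pct", "20251120", "$RECIPE_ENGINE")))
```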

These are the four CEQR school datasets and their source data. See each source dataset's `library` template for details.

`sca_capacity_projects`

- `sca_capacity_projects_current`

`sca_e_projections_by_boro`

- `sca_e_projections`

`sca_e_projections_by_sd`

- `sca_e_pct`
- `sca_e_projections`

`ceqr_school_buildings`

- `doe_lcgms`
- `sca_bluebook`
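
The mapping above can also be written down as data, which makes it easy to work out which sources must be archived before a given build. A minimal sketch, assuming only the dataset and source names listed above; `CEQR_SCHOOLS_SOURCES` and `sources_to_archive` are hypothetical names, not part of the codebase.

```python
# Source datasets required by each CEQR Schools dataset, as listed above.
CEQR_SCHOOLS_SOURCES: dict[str, list[str]] = {
    "sca_capacity_projects": ["sca_capacity_projects_current"],
    "sca_e_projections_by_boro": ["sca_e_projections"],
    "sca_e_projections_by_sd": ["sca_e_pct", "sca_e_projections"],
    "ceqr_school_buildings": ["doe_lcgms", "sca_bluebook"],
}


def sources_to_archive(datasets: list[str]) -> list[str]:
    """Return the de-duplicated, sorted sources needed to build `datasets`."""
    return sorted({src for name in datasets for src in CEQR_SCHOOLS_SOURCES[name]})


print(sources_to_archive(["sca_e_projections_by_boro", "sca_e_projections_by_sd"]))
```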

Outputs must be distributed to S3 file storage at `edm-publishing/ceqr-app-data-staging/`. Each dataset has its own folder containing all of its versions. Versions are named after the day the build was run, and the `latest` folder holds the most recent version.
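
To make the folder layout concrete, here is a minimal sketch that derives the versioned and `latest` prefixes for one dataset. The ISO date format for the version folder is an assumption, and `output_prefixes` is a hypothetical helper.

```python
from datetime import date

STAGING_PREFIX = "ceqr-app-data-staging"  # folder inside the edm-publishing bucket


def output_prefixes(dataset: str, build_date: date) -> tuple[str, str]:
    """Return the (versioned, latest) key prefixes for a dataset's outputs.

    Assumption: the version folder is the build date in ISO format (YYYY-MM-DD).
    """
    version = build_date.isoformat()
    base = f"{STAGING_PREFIX}/{dataset}"
    return f"{base}/{version}/", f"{base}/latest/"


versioned, latest = output_prefixes("sca_e_pct", date(2025, 11, 20))
print(versioned)  # ceqr-app-data-staging/sca_e_pct/2025-11-20/
print(latest)     # ceqr-app-data-staging/sca_e_pct/latest/
```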

### Diagram of legacy CEQR app data flow

![Diagram of legacy CEQR app data flow](/docs/diagrams/dataflow_ceqr.drawio.png)

## DEPRECATED BUILD NOTES

### To build using github (NYCPlanning Members Only)

Running a recipe using github actions is easy! Simply open an