
BYO-Polygon & downscaled CMIP6 zonal stats#662

Draft
Joshdpaul wants to merge 6 commits into main from byo_polygon


@Joshdpaul
Contributor

This PR implements a BYO-Polygon method and also implements zonal stats for the downscaled CMIP6 data. The two are combined here to demonstrate how BYO-Polygon could work, and to test it without touching any currently working area queries.

BYO-Polygon

How it works

The changes in application.py and the new upload_polygon.py route contain all the important parts; the rest of the changes are mostly helper functions. We define an application-level shared dictionary (app.uploaded_polygons) and a threading lock (app.store_lock) that protects the dictionary from concurrent access issues. Whenever we read from or write to the dictionary, we do it inside a with block that allows only one thread to access the dictionary at a time:

with app.store_lock:
    app.uploaded_polygons["my key"]

This briefly blocks concurrent access, but the lock is released as soon as the with block exits. Note that every route that needs this dictionary must import current_app (from flask import ... current_app) so that the in-memory store is shared between routes.
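The locking pattern can be sketched without Flask. This is a minimal illustration under stated assumptions, not the code in application.py: a SimpleNamespace stands in for the Flask app object (in a route it would be reached through current_app), and the helper names save_polygon and get_polygon are hypothetical.

```python
import threading
from types import SimpleNamespace

# Stand-in for the Flask app object (assumption); a real route would use current_app.
app = SimpleNamespace(uploaded_polygons={}, store_lock=threading.Lock())


def save_polygon(app, polygon_id, record):
    """Write one record while holding the lock (hypothetical helper)."""
    with app.store_lock:
        app.uploaded_polygons[polygon_id] = record


def get_polygon(app, polygon_id):
    """Read one record while holding the lock; returns None if it is missing."""
    with app.store_lock:
        return app.uploaded_polygons.get(polygon_id)


# Several threads write at once; the lock serializes each write.
threads = [
    threading.Thread(
        target=save_polygon,
        args=(app, f"poly_{i}", {"name": f"polygon {i}"}),
    )
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Keeping every read and write inside a small helper like this makes it harder to touch the dictionary without holding the lock.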

The idea is to upload the polygon info into this shared dictionary, where all routes can use it. Eventually, the user will have some kind of form input that lets them browse to their file and upload it. There are a lot of ways this could be structured, but for now I focused on testing a general upload method via curl that gives the user a token that can potentially be used in any area query, just like any other place ID. We will test this using curl commands to mimic the request that will someday come from the form input.

Create a test polygon

The easiest way to do this is by going to https://geojson.io and creating a polygon, then saving it as a shapefile. The file will automatically be zipped, and we will upload that zip file into our Flask app.

Use this tool to create the polygon

image

Then save it as a shapefile

image

Start the app and upload polygon

Start the API as usual, then in a separate terminal use the following command to upload your zipped shapefile. Optionally, you can name the polygon.

curl -F "file=@/Users/joshpaul/Desktop/test_poly/test_poly.zip" -F "name=my custom polygon" http://127.0.0.1:5000/upload_polygon

This should return some JSON, where you get a unique polygon ID representing your polygon and some other info:

{
  "expires_at": "2025-11-04T23:39:43.943659+00:00",
  "name": "my custom polygon",
  "polygon_id": "63_S3ues",
  "uploaded_at": "2025-11-04T22:39:43.943644+00:00"
}
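A response of this shape could be produced along the following lines. This is only a sketch: the helper name new_polygon_record is hypothetical, the use of secrets.token_urlsafe is an assumption about how an 8-character ID might be generated, and the one-hour lifetime is inferred from the two timestamps in the sample response rather than taken from the PR code.

```python
import secrets
from datetime import datetime, timedelta, timezone


def new_polygon_record(name=None, ttl=timedelta(hours=1)):
    """Build the metadata for an uploaded polygon (hypothetical helper)."""
    uploaded_at = datetime.now(timezone.utc)
    return {
        # 6 random bytes encode to 8 URL-safe characters (assumed ID scheme).
        "polygon_id": secrets.token_urlsafe(6),
        "name": name,
        "uploaded_at": uploaded_at.isoformat(),
        # Assumed one-hour lifetime, matching the sample response above.
        "expires_at": (uploaded_at + ttl).isoformat(),
    }


record = new_polygon_record("my custom polygon")
```

A URL-safe token matters here because the ID is placed directly in the request path of the area endpoints.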

Use the unique polygon ID to get zonal stats

You can now use your polygon to get zonal stats, by just substituting it where you would normally place one of our place IDs. For example, this is the polygon shown in the screenshot above:

http://127.0.0.1:5000/cmip6_downscaled/area/63_S3ues?vars=tasmax&models=6ModelAvg

The HUC10 for Chena Slough should give pretty similar results:

http://127.0.0.1:5000/cmip6_downscaled/area/190803060904?vars=tasmax&models=6ModelAvg
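One way an area route could accept either kind of ID is to check the uploaded-polygon store first and fall back to the regular place lookup. The sketch below assumes this ordering; resolve_area, known_places, and the placeholder geometry strings are hypothetical and do not reflect the actual lookup code in the API.

```python
import threading

# Hypothetical stand-ins for the shared store and the existing place ID lookup.
uploaded_polygons = {
    "63_S3ues": {"name": "my custom polygon", "geometry": "<uploaded geometry>"},
}
store_lock = threading.Lock()
known_places = {
    "190803060904": {"name": "Chena Slough", "geometry": "<HUC10 geometry>"},
}


def resolve_area(area_id):
    """Return (source, record) for an uploaded polygon ID or a regular place ID."""
    # Hold the lock only for the dictionary read, not for the fallback lookup.
    with store_lock:
        record = uploaded_polygons.get(area_id)
    if record is not None:
        return "uploaded", record
    if area_id in known_places:
        return "place", known_places[area_id]
    raise KeyError(f"Unknown area ID: {area_id}")
```

With a resolver like this, the zonal stats code only sees a geometry and does not need to know where it came from.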

Expiration of the polygon is not handled yet, but we probably want to find some method of removing unused polygon info from the shared dictionary after a certain amount of time. It would be nice to persist the polygon info for a while, so the user can make multiple data requests from different endpoints without having to upload the polygon multiple times.
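One possible cleanup approach is a small purge function that removes every entry whose expires_at has passed. This is a sketch of an idea, not something implemented in this PR: purge_expired is a hypothetical name, and it assumes each stored record carries the ISO 8601 expires_at string shown in the upload response.

```python
import threading
from datetime import datetime, timedelta, timezone


def purge_expired(store, lock, now=None):
    """Delete entries whose expires_at has passed; return the removed IDs."""
    now = now or datetime.now(timezone.utc)
    with lock:
        # Collect first, then delete, so the dict is not mutated while iterating.
        expired = [
            polygon_id
            for polygon_id, record in store.items()
            if datetime.fromisoformat(record["expires_at"]) <= now
        ]
        for polygon_id in expired:
            del store[polygon_id]
    return expired


# Example with a fixed clock so the outcome does not depend on the current time.
now = datetime(2025, 11, 4, 23, 0, tzinfo=timezone.utc)
store = {
    "old": {"expires_at": (now - timedelta(minutes=5)).isoformat()},
    "fresh": {"expires_at": (now + timedelta(minutes=5)).isoformat()},
}
removed = purge_expired(store, threading.Lock(), now=now)
```

Such a function could be called at the start of each upload request or from a periodic background task; which of these fits best is an open design choice.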

Zonal stats

The zonal stats implementation here is nothing fancy - it uses the vectorized method that is currently used by the ERA5 WRF and the CMIP6 FWI endpoints. There is probably some redundant code here between the point and area queries, but in the interest of testing the BYOP method, I haven't tried to trim this down yet.

Note that CSV output is not yet implemented - we need to find a way to get the user-provided polygon name into the title of the CSV output. Right now, passing this custom name as the place_id parameter of the CSV function causes it to fail.

@Joshdpaul Joshdpaul requested a review from charparr November 4, 2025 23:27
@Joshdpaul
Contributor Author

If multiple instances of our API app are deployed simultaneously, the shared in-memory dictionary will not be available across all instances: the dictionary and the threading.Lock() that protects it both live inside a single process.

In that case we need to consider using Redis or a similar distributed in-memory data store designed for this sort of thing.
