Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
257 changes: 257 additions & 0 deletions docs/usecases/02-vegetation-change.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,257 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--\n",
" Licensed to the Apache Software Foundation (ASF) under one\n",
" or more contributor license agreements. See the NOTICE file\n",
" distributed with this work for additional information\n",
" regarding copyright ownership. The ASF licenses this file\n",
" to you under the Apache License, Version 2.0 (the\n",
" \"License\"); you may not use this file except in compliance\n",
" with the License. You may obtain a copy of the License at\n",
"\n",
" http://www.apache.org/licenses/LICENSE-2.0\n",
"\n",
" Unless required by applicable law or agreed to in writing,\n",
" software distributed under the License is distributed on an\n",
" \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
" KIND, either express or implied. See the License for the\n",
" specific language governing permissions and limitations\n",
" under the License.\n",
"-->\n",
"\n",
"# Did this farmland green up?\n",
"\n",
"A complete raster pipeline. We answer:\n",
"\n",
"> **Between two satellite scenes a season apart, which farm parcels in this AOI greened up the most?**\n",
"\n",
"End-to-end on Sedona's raster surface: load two GeoTIFFs through the new tiling raster data source, compute NDVI per scene with `RS_MapAlgebra`, take their difference for a \"greening\" delta raster, run `RS_ZonalStats` per parcel, rank, clip the delta raster to the top parcel with `RS_Clip`, round-trip through `RS_AsCOG` to prove the result is a valid Cloud-Optimized GeoTIFF, and visualize the four stages as a single matplotlib figure.\n",
"\n",
"The two source scenes are **synthesized** at the top of the notebook (256 \u00d7 256 px, 2 bands each: red + NIR, EPSG:4326) so the notebook is fully reproducible and ships no new bytes. The same code path runs unchanged against real Sentinel-2 chips \u2014 only the raster filenames change.\n",
"\n",
"**Requires Sedona \u2265 1.9.0.** The `raster` data source (auto-tiling GeoTIFF reader, GH-2672) and `RS_AsCOG` (GH-2652) land in 1.9.0; the notebook will fail on older versions of the docker image."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Connect to Spark through SedonaContext"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sedona.spark import SedonaContext\n",
"\n",
"config = SedonaContext.builder().master(\"spark://localhost:7077\").getOrCreate()\n",
"sedona = SedonaContext.create(config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Synthesize two scenes a season apart\n",
"\n",
"We build two 256\u00d7256, 2-band GeoTIFFs in `/tmp` representing red and near-infrared reflectance over the same AOI. The \"before\" scene is mostly bare; the \"after\" scene has a circular field of vegetation in the south-west corner with elevated NIR. Pixel values are scaled to the same 0-10000 reflectance range as Sentinel-2 surface-reflectance products so the NDVI math is identical to the real-world case."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "import os\n\nimport numpy as np\nimport rasterio\nfrom rasterio.transform import from_bounds\n\nWORK = \"/tmp/veg-change\"\nos.makedirs(WORK, exist_ok=True)\n\nAOI = (-91.10, 41.50, -91.00, 41.60) # (xmin, ymin, xmax, ymax) in EPSG:4326\nW = H = 256\ntransform = from_bounds(*AOI, W, H)\nrng = np.random.default_rng(42)\n\nys, xs = np.mgrid[0:H, 0:W]\nfield_mask = ((xs - 64) ** 2 + (ys - 192) ** 2) < 50**2 # circular field\n\n\ndef synth_scene(green_strength):\n red = (1500 + 200 * rng.standard_normal((H, W))).clip(0, 10000)\n nir = (1800 + 200 * rng.standard_normal((H, W))).clip(0, 10000)\n nir = np.where(field_mask, nir + green_strength, nir)\n return red.astype(\"uint16\"), nir.astype(\"uint16\")\n\n\nfor name, strength in [(\"before\", 200), (\"after\", 4000)]:\n red, nir = synth_scene(strength)\n with rasterio.open(\n f\"{WORK}/scene_{name}.tif\",\n \"w\",\n driver=\"GTiff\",\n tiled=True,\n blockxsize=256,\n blockysize=256,\n height=H,\n width=W,\n count=2,\n dtype=\"uint16\",\n crs=\"EPSG:4326\",\n transform=transform,\n ) as dst:\n dst.write(red, 1)\n dst.write(nir, 2)\n dst.set_band_description(1, \"red\")\n dst.set_band_description(2, \"nir\")\n\nfor f in (f\"{WORK}/scene_before.tif\", f\"{WORK}/scene_after.tif\"):\n print(f\"{f}: {os.path.getsize(f)} bytes\")"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## 3. Load both scenes with the `raster` data source\n\n`sedona.read.format(\"raster\")` (new in 1.9) auto-tiles GeoTIFFs and yields one `Raster`-typed row per tile. Each row carries `(x, y)` tile-index columns \u2014 keep them; we'll join on those when computing \u0394NDVI so the same query works for single-tile inputs (as here) and for multi-gigabyte rasters that yield thousands of tiles per scene."
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "scenes = (\n sedona.read.format(\"raster\")\n .load(f\"{WORK}/scene_*.tif\")\n .selectExpr(\"split(name, '\\\\\\\\.')[0] as scene\", \"x\", \"y\", \"rast\")\n)\nscenes.cache()\nscenes.show(truncate=80)"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Compute NDVI per scene with `RS_MapAlgebra`\n",
"\n",
"`RS_MapAlgebra(raster, pixelType, script)` runs a single-raster pixel script. The script syntax (`out[0] = ...; rast[i]` indexing) is documented [here](../api/sql/Raster-map-algebra.md). We coerce the result to `D` (double) so the negative side of NDVI survives."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "scenes.createOrReplaceTempView(\"scenes\")\n\nndvi = sedona.sql(\"\"\"\n SELECT scene, x, y,\n RS_MapAlgebra(\n rast, 'D',\n 'out[0] = (rast[1] - rast[0]) / (rast[1] + rast[0] + 0.000001);'\n ) AS rast\n FROM scenes\n\"\"\")\nndvi.cache()\nndvi.createOrReplaceTempView(\"ndvi\")\nndvi.selectExpr(\"scene\", \"x\", \"y\", \"RS_BandPixelType(rast, 1) as dtype\").show()"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## 5. Compute the greening delta raster\n\nTwo-raster `RS_MapAlgebra(rast0, rast1, pixelType, script, noDataValue)` subtracts the before-NDVI from the after-NDVI in a single pass. We join the two NDVI dataframes on the tile coordinates `(x, y)` so the same query works whether each scene produced a single tile (as here) or thousands."
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "delta = sedona.sql(\"\"\"\n SELECT a.x, a.y,\n RS_MapAlgebra(\n a.rast, b.rast, 'D',\n 'out[0] = rast0[0] - rast1[0];',\n -9999.0\n ) AS rast\n FROM (SELECT x, y, rast FROM ndvi WHERE scene = 'scene_after') a\n JOIN (SELECT x, y, rast FROM ndvi WHERE scene = 'scene_before') b\n ON a.x = b.x AND a.y = b.y\n\"\"\")\ndelta.cache()\ndelta.createOrReplaceTempView(\"delta\")\ndelta.selectExpr(\n \"x\",\n \"y\",\n \"RS_BandPixelType(rast, 1) as dtype\",\n \"RS_Width(rast) as w\",\n \"RS_Height(rast) as h\",\n).show()"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## 6. Synthesize parcel polygons and run `RS_ZonalStats`\n\nBuild a 4\u00d74 grid of square parcels covering the AOI and compute the mean \u0394NDVI per parcel. `RS_ZonalStats(raster, zone, statType)` is the canonical raster\u2192vector aggregation: every pixel inside `zone` contributes to the statistic.\n\nFor tiled inputs, the cross-join below produces one row per `(parcel, tile)`; with our single-tile delta it collapses to one row per parcel. To handle truly tiled inputs robustly you would compute `RS_ZonalStats(rast, geom, 'sum')` and `'count'` per tile and aggregate `SUM(sum) / SUM(count)` per parcel \u2014 same idiom, one extra `GROUP BY`."
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pyspark.sql import Row\n",
"\n",
"xmin, ymin, xmax, ymax = AOI\n",
"step_x = (xmax - xmin) / 4\n",
"step_y = (ymax - ymin) / 4\n",
"parcel_rows = []\n",
"for ix in range(4):\n",
" for iy in range(4):\n",
" x0, x1 = xmin + ix * step_x, xmin + (ix + 1) * step_x\n",
" y0, y1 = ymin + iy * step_y, ymin + (iy + 1) * step_y\n",
" wkt = f\"POLYGON(({x0} {y0}, {x1} {y0}, {x1} {y1}, {x0} {y1}, {x0} {y0}))\"\n",
" parcel_rows.append(Row(parcel_id=f\"P{ix}{iy}\", wkt=wkt))\n",
"\n",
"parcels = sedona.createDataFrame(parcel_rows).selectExpr(\n",
" \"parcel_id\", \"ST_GeomFromText(wkt) as geom\"\n",
")\n",
"parcels.createOrReplaceTempView(\"parcels\")\n",
"\n",
"ranked = sedona.sql(\"\"\"\n",
" SELECT p.parcel_id,\n",
" ROUND(RS_ZonalStats(d.rast, p.geom, 'mean'), 4) AS mean_delta_ndvi\n",
" FROM parcels p, delta d\n",
" ORDER BY mean_delta_ndvi DESC\n",
"\"\"\")\n",
"ranked.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 7. Clip the delta raster to the top-greening parcel\n",
"\n",
"`RS_Clip(raster, band, geom)` crops the raster to the polygon's extent. We pick the parcel with the highest mean \u0394NDVI and zoom in on its delta values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"top_id = ranked.first()[\"parcel_id\"]\n",
"print(f\"top-greening parcel: {top_id}\")\n",
"\n",
"clipped = sedona.sql(f\"\"\"\n",
" SELECT RS_Clip(d.rast, 1, p.geom) AS rast\n",
" FROM delta d, parcels p\n",
" WHERE p.parcel_id = '{top_id}'\n",
"\"\"\")\n",
"clipped.cache()\n",
"clipped.createOrReplaceTempView(\"clipped\")\n",
"clipped.selectExpr(\n",
" \"RS_Width(rast) as w\",\n",
" \"RS_Height(rast) as h\",\n",
").show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 8. Round-trip through `RS_AsCOG`\n",
"\n",
"`RS_AsCOG` returns the raster as a binary Cloud-Optimized GeoTIFF byte array. We persist it to disk, then read it back with the same `raster` data source we used to load the input \u2014 proving the output is a valid GeoTIFF that downstream tools can stream from object storage."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "cog_bytes = clipped.selectExpr(\"RS_AsCOG(rast) as cog\").first()[\"cog\"]\nwith open(f\"{WORK}/delta_topparcel_cog.tif\", \"wb\") as fh:\n fh.write(cog_bytes)\nprint(f\"COG size: {len(cog_bytes):,} bytes\")\n\nroundtrip = sedona.read.format(\"raster\").load(f\"{WORK}/delta_topparcel_cog.tif\")\nroundtrip.selectExpr(\n \"RS_Width(rast) as w\",\n \"RS_Height(rast) as h\",\n \"RS_BandPixelType(rast, 1) as dtype\",\n).show()"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 9. Visualize the pipeline as a 4-panel figure\n",
"\n",
"Read the rasters back via `rasterio` and lay them out side by side: NDVI before, NDVI after, \u0394NDVI across the full AOI with parcel boundaries overlaid, and the top-parcel close-up."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "import matplotlib.patches as mpatches\nimport matplotlib.pyplot as plt\n\n\ndef ndvi_arr(path):\n with rasterio.open(path) as ds:\n red = ds.read(1).astype(\"float32\")\n nir = ds.read(2).astype(\"float32\")\n return (nir - red) / (nir + red + 1e-6)\n\n\nndvi_before = ndvi_arr(f\"{WORK}/scene_before.tif\")\nndvi_after = ndvi_arr(f\"{WORK}/scene_after.tif\")\ndelta_arr = ndvi_after - ndvi_before\nwith rasterio.open(f\"{WORK}/delta_topparcel_cog.tif\") as ds:\n top_arr = ds.read(1)\n top_extent = (ds.bounds.left, ds.bounds.right, ds.bounds.bottom, ds.bounds.top)\n\nfig, axes = plt.subplots(1, 4, figsize=(16, 4))\nextent = (AOI[0], AOI[2], AOI[1], AOI[3])\naxes[0].imshow(ndvi_before, vmin=-0.2, vmax=0.8, cmap=\"RdYlGn\", extent=extent)\naxes[0].set_title(\"NDVI before\")\naxes[1].imshow(ndvi_after, vmin=-0.2, vmax=0.8, cmap=\"RdYlGn\", extent=extent)\naxes[1].set_title(\"NDVI after\")\naxes[2].imshow(delta_arr, vmin=-0.5, vmax=0.5, cmap=\"PiYG\", extent=extent)\naxes[2].set_title(\"\u0394NDVI (after \u2212 before) with parcel grid\")\nxmin, ymin, xmax, ymax = AOI\nstep_x = (xmax - xmin) / 4\nstep_y = (ymax - ymin) / 4\nfor ix in range(4):\n for iy in range(4):\n x0 = xmin + ix * step_x\n y0 = ymin + iy * step_y\n axes[2].add_patch(\n mpatches.Rectangle(\n (x0, y0), step_x, step_y, fill=False, edgecolor=\"black\", linewidth=0.6\n )\n )\n axes[2].text(\n x0 + step_x / 2,\n y0 + step_y / 2,\n f\"P{ix}{iy}\",\n ha=\"center\",\n va=\"center\",\n fontsize=7,\n color=\"black\",\n )\naxes[3].imshow(top_arr, vmin=-0.5, vmax=0.5, cmap=\"PiYG\", extent=top_extent)\naxes[3].set_title(f\"Top parcel ({top_id}) \u0394NDVI\")\nfor ax in axes:\n ax.set_xticks([])\n ax.set_yticks([])\nfig.tight_layout()\nfig"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What just happened?\n",
"\n",
"We turned two synthetic Sentinel-2-style scenes into a ranked parcel-level greening report in nine steps:\n",
"\n",
"1. Synthesized two 256\u00d7256 red+NIR GeoTIFFs in `/tmp`.\n",
"2. Loaded both with `sedona.read.format(\"raster\")` \u2014 the new auto-tiling reader handles GeoTIFFs of any size with the same call shape.\n",
"3. Computed per-pixel NDVI per scene with single-raster `RS_MapAlgebra`.\n",
"4. Subtracted the two NDVI rasters with two-raster `RS_MapAlgebra` to get a \u0394NDVI \"greening\" layer.\n",
"5. Synthesized a 4\u00d74 parcel grid and ranked parcels by mean \u0394NDVI via `RS_ZonalStats`.\n",
"6. Clipped the delta raster to the top-greening parcel with `RS_Clip`.\n",
"7. Encoded the result as a Cloud-Optimized GeoTIFF with `RS_AsCOG` and read it back with the `raster` data source \u2014 proving the output is valid for cloud-hosted streaming.\n",
"8. Plotted the four pipeline stages side by side.\n",
"\n",
"Swap `/tmp/scene_*.tif` for a glob over real Sentinel-2 chips and the same SQL runs on a cluster against terabytes of imagery."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Loading