[FEATURE] Add changelog query/summary #89

Merged
Dana Bauer (danabauer) merged 1 commit into main from feature/changelog on Mar 20, 2026

@atiannicelli
Collaborator

PR 3 of 4: Changelog Querying

This PR adds changelog querying functionality using pyarrow instead of DuckDB.

New files:

  • overturemaps/changelog.py - Changelog functions using pyarrow:
    • query_changelog_ids() - Returns (added_ids, modified_ids, removed_ids) sets for a bbox
    • summarize_changelog() - Returns counts by theme/type/change_type (no bbox filter)
    • _get_changelog_files_from_stac() - STAC acceleration prep (future-ready)
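As a rough illustration of the result shape that summarize_changelog() is described as returning, the sketch below counts rows by (theme, type, change_type). It uses plain Python and `collections.Counter` as a stand-in for the pyarrow-based code in the PR; the function name `summarize_rows` and the row layout (dicts with `theme`, `type`, and `change_type` keys) are assumptions made for this example.

```python
from collections import Counter


def summarize_rows(rows, theme=None, feature_type=None):
    """Count changelog rows by (theme, type, change_type).

    Hypothetical helper: `rows` is assumed to be an iterable of dicts with
    "theme", "type", and "change_type" keys. The optional arguments mirror
    the --theme and --type filters of the CLI.
    """
    counts = Counter()
    for row in rows:
        if theme is not None and row["theme"] != theme:
            continue
        if feature_type is not None and row["type"] != feature_type:
            continue
        counts[(row["theme"], row["type"], row["change_type"])] += 1
    return dict(counts)


# Made-up sample rows, for illustration only.
sample_rows = [
    {"theme": "buildings", "type": "building", "change_type": "added"},
    {"theme": "buildings", "type": "building", "change_type": "added"},
    {"theme": "buildings", "type": "building_part", "change_type": "removed"},
    {"theme": "places", "type": "place", "change_type": "modified"},
]

buildings_summary = summarize_rows(sample_rows, theme="buildings")
```

Here `buildings_summary` holds one count per (theme, type, change_type) combination in the buildings theme, and the places row is filtered out.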

Modified files:

  • overturemaps/models.py - Added:

    • ChangeType enum (added, modified, removed)
    • ChangeRecord dataclass
  • overturemaps/cli.py - Added:

    • changelog query --bbox=... [--theme=...] [--type=...] - Query changes in a bbox
    • changelog summary [--theme=...] [--type=...] - Aggregate statistics
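The models.py additions listed above might look roughly like the following. Only the enum values (added, modified, removed) come from the description; the `ChangeRecord` fields are hypothetical placeholders, and the `StrEnum` fallback for Python versions before 3.11 is one common way to write such a compatibility shim, not necessarily the one used in the PR.

```python
import sys
from dataclasses import dataclass
from enum import Enum

if sys.version_info >= (3, 11):
    from enum import StrEnum
else:
    # Fallback for older Pythons: members are also str instances.
    class StrEnum(str, Enum):
        def __str__(self):
            return str(self.value)


class ChangeType(StrEnum):
    ADDED = "added"
    MODIFIED = "modified"
    REMOVED = "removed"


@dataclass(frozen=True)
class ChangeRecord:
    # Hypothetical fields; the PR description does not list them.
    id: str
    change_type: ChangeType
    theme: str
    feature_type: str


record = ChangeRecord(
    id="example-id",
    change_type=ChangeType("added"),  # look up a member by its string value
    theme="buildings",
    feature_type="building",
)
```

Because the members are strings, a value read from a `change_type` column can be compared directly against `ChangeType.ADDED` without conversion.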

CLI Examples:

# Query changes in a bbox
overturemaps changelog query --bbox=-97.8,30.2,-97.6,30.4 --theme=buildings

# Summary for all types in the buildings theme
overturemaps changelog summary --theme=buildings

# Summary for specific type
overturemaps changelog summary --type=building
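To make the argument shapes in these commands concrete, here is a small sketch that parses the same invocations. It uses the standard library's `argparse` purely as a stand-in and is not the PR's CLI code; `parse_bbox` and `build_parser` are hypothetical names, and the west,south,east,north ordering of the bbox values is assumed.

```python
import argparse


def parse_bbox(text):
    """Parse 'west,south,east,north' into a tuple of four floats (assumed order)."""
    parts = text.split(",")
    if len(parts) != 4:
        raise argparse.ArgumentTypeError("bbox must have four comma-separated values")
    try:
        return tuple(float(p) for p in parts)
    except ValueError:
        raise argparse.ArgumentTypeError("bbox values must be numbers")


def build_parser():
    parser = argparse.ArgumentParser(prog="overturemaps")
    commands = parser.add_subparsers(dest="command", required=True)

    changelog = commands.add_parser("changelog")
    actions = changelog.add_subparsers(dest="action", required=True)

    # changelog query --bbox=... [--theme=...] [--type=...]
    query = actions.add_parser("query")
    query.add_argument("--bbox", type=parse_bbox, required=True)
    query.add_argument("--theme")
    query.add_argument("--type")

    # changelog summary [--theme=...] [--type=...]
    summary = actions.add_parser("summary")
    summary.add_argument("--theme")
    summary.add_argument("--type")
    return parser


args = build_parser().parse_args(
    ["changelog", "query", "--bbox=-97.8,30.2,-97.6,30.4", "--theme=buildings"]
)
```

Writing the option as `--bbox=...` with an equals sign keeps the leading negative longitude from being read as another flag.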

Implementation:

  • ✅ Uses pyarrow.dataset to read Parquet from S3 (anonymous access)
  • ✅ Uses pyarrow.compute for filtering (spatial bbox + change_type)
  • ✅ Push-down predicates for efficiency
  • ✅ STAC acceleration prep (will auto-enable when STAC adds changelog)
  • ✅ NO DuckDB dependency
  • ✅ NO geopandas dependency

Tests:

  • tests/test_changelog.py - Integration tests for changelog functions

Dependencies:

Related PRs:

Key differences from PR #85:

  • Uses pyarrow instead of DuckDB for all data access
  • Removed DuckDB SQL queries, replaced with pyarrow dataset API
  • Simpler, more direct approach following jwass's feedback

Contributor

Copilot AI left a comment

Pull request overview

Adds changelog querying and aggregation capabilities backed by pyarrow.dataset (reading Parquet directly from S3) and exposes the functionality via new overturemaps changelog CLI commands.

Changes:

  • Introduces overturemaps/changelog.py with query_changelog_ids() (bbox-based) and summarize_changelog() (global aggregation).
  • Extends models with ChangeType and ChangeRecord.
  • Adds changelog query / changelog summary CLI commands and integration tests for changelog behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| overturemaps/changelog.py | New pyarrow-based changelog querying + summarization utilities (plus STAC prep). |
| overturemaps/cli.py | Adds changelog CLI group and commands; wires in changelog module functions. |
| overturemaps/models.py | Adds changelog-related enum/dataclass and StrEnum compatibility shim usage. |
| overturemaps/releases.py | Refactors to use STAC-derived release listing and adds get_latest_release(). |
| tests/test_changelog.py | Adds integration tests covering changelog query + summary behavior. |

Comment thread overturemaps/cli.py Outdated
Comment thread overturemaps/cli.py
Comment thread overturemaps/cli.py Outdated
Comment thread overturemaps/changelog.py
Comment thread overturemaps/changelog.py
Comment thread overturemaps/changelog.py
Comment thread overturemaps/changelog.py Outdated
Comment thread overturemaps/models.py
Comment thread overturemaps/releases.py
Comment thread overturemaps/changelog.py
@atiannicelli Alex Iannicelli (atiannicelli) marked this pull request as ready for review March 11, 2026 14:12
@atiannicelli Alex Iannicelli (atiannicelli) requested a review from a team as a code owner March 11, 2026 14:12

@connieksun Connie Sun (connieksun) left a comment

Made a couple of small comments. This is great functionality!

Comment thread overturemaps/changelog.py Outdated
Comment thread overturemaps/changelog.py Outdated
Comment thread overturemaps/models.py Outdated
Signed-off-by: Alex Iannicelli <atiannicelli@gmail.com>
@atiannicelli
Collaborator Author

Ready to merge when I get a review.

@danabauer Dana Bauer (danabauer) merged commit 3ca99bc into main Mar 20, 2026
9 checks passed
@danabauer Dana Bauer (danabauer) deleted the feature/changelog branch March 20, 2026 16:00

Development

Successfully merging this pull request may close these issues.

I wish that I could query the changelog using overturemaps-py

4 participants