Ar migrate storage builds drafts by alexrichey · Pull Request #2175 · NYCPlanning/data-engineering

alexrichey · 2026-01-14T19:07:57Z

Re-implements the Connectors for builds, drafts, published and gis, but doesn't swap them out just yet. You can read commit-by-commit for localized/grouped changes (e.g. just changes to builds functionality) as well as explanations for the smaller changes in the commit messages. (There's some 🌶️ content in the commit bodies)

Anyhow, the logic in published.py is basically distributed into files in a few different locations:

connectors/edm/{conn_type}.py: The pure connector interfaces for the types above
dcpy/lifecycle/builds/artifacts/{conn_type}.py: the business logic, including which connector to use, which files to upload, etc.
dcpy/lifecycle/builds/_cli.py: all the CLI commands

There will be some continuing commits where we implement this:

Swapping out the commands (PR here)
TODO: delete publishing.

@fvankrieken @damonmcc Here's where I could use some input:
Consider a command like publishing a draft. This command needs to

pull the draft (after determining some details about the draft itself)
push the draft to published

Ideally this code would live in a module a little bit "above" artifacts/drafts and artifacts/published, almost similar to how lifecycle/scripts lives "above" the various lifecycle stages.

So I'd love it if you could take a look at the files under dcpy/lifecycle/builds/artifacts/*, and consider these "glue" functions like promote and publish. In the CLI they fit really nicely under namespaced commands like
dcpy lc builds artifacts drafts publish -p nypl_libraries -v 24v1 -dn 2 -ip
which make sense, because we publishing is an action that we take on a draft. However, where this glue code should live... I could use some input.

codecov · 2026-01-18T02:55:56Z

Codecov Report

❌ Patch coverage is 83.88215% with 93 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.21%. Comparing base (6e5d0a3) to head (70ed00f).
⚠️ Report is 45 commits behind head on main.

Files with missing lines	Patch %	Lines
dcpy/lifecycle/builds/artifacts/published.py	82.07%	13 Missing and 6 partials ⚠️
dcpy/lifecycle/builds/artifacts/_cli.py	0.00%	17 Missing ⚠️
dcpy/lifecycle/builds/artifacts/drafts.py	75.40%	9 Missing and 6 partials ⚠️
dcpy/connectors/edm/gis.py	82.66%	7 Missing and 6 partials ⚠️
dcpy/connectors/edm/builds.py	85.52%	7 Missing and 4 partials ⚠️
dcpy/lifecycle/builds/artifacts/builds.py	89.55%	3 Missing and 4 partials ⚠️
dcpy/connectors/edm/drafts.py	94.31%	4 Missing and 1 partial ⚠️
dcpy/connectors/edm/published.py	95.83%	2 Missing and 1 partial ⚠️
dcpy/lifecycle/builds/_cli.py	0.00%	3 Missing ⚠️

Additional details and impacted files

Files with missing lines	Coverage Δ
dcpy/configuration.py	`100.00% <100.00%> (ø)`
dcpy/connectors/hybrid_pathed_storage.py	`80.12% <ø> (-3.93%)`	⬇️
dcpy/lifecycle/builds/connector.py	`100.00% <100.00%> (ø)`
dcpy/lifecycle/config.py	`67.74% <ø> (ø)`
dcpy/lifecycle/connector_registry.py	`100.00% <100.00%> (ø)`
dcpy/connectors/edm/published.py	`95.83% <95.83%> (ø)`
dcpy/lifecycle/builds/_cli.py	`0.00% <0.00%> (ø)`
dcpy/connectors/edm/drafts.py	`94.31% <94.31%> (ø)`
dcpy/lifecycle/builds/artifacts/builds.py	`89.55% <89.55%> (ø)`
dcpy/connectors/edm/builds.py	`85.52% <85.52%> (ø)`
... and 4 more

... and 18 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

This was taking a long time. I think there's absolutely utility in running a scraper against BYTES every week or so, but this is just too much

Having to re-populate my .env after an *unfortunate* incident (see two commits ahead) prompted this. It's a little out of date.

This was... maybe the worst two lines of code I have every written. If you used this connector to attempt to download a file to, say, your data-engineering/ directory, it would wipe the directory

fvankrieken · 2026-02-03T15:00:31Z

dcpy/connectors/hybrid_pathed_storage.py

-            if destination_path.exists():
-                shutil.rmtree(destination_path)


Omg. I'm sure I'm guilty of this too somewhere

dcpy/connectors/edm/builds.py

fvankrieken · 2026-02-03T15:10:39Z

dcpy/connectors/edm/builds.py

+    def data_local_sub_path(self, key: str, *, version: str, **kwargs) -> Path:
+        return Path("edm") / "builds" / "datasets" / key / version


Bit of a nit, but this would presumably be so someone else (i.e. the lifecycle code pulling this) knows where to put it right? Can we not do "datasets" then? builds/datasets makes me think of what's going into a build. Even if it's "data_products".

Or would it just make sense to have this match what's in DO? as in, key / builds / version?

I need to review how this even works, honestly... The local_sub_path stuff seems really misplaced. Would honestly like to refactor it out if possible. For now, I'll just leave it be

fvankrieken · 2026-02-03T15:28:24Z

dcpy/test_integration/lifecycle/builds/test_lifecycle_e2e.py

+
+
+@pytest.mark.incremental
+class TestLifecycleE2E:


fvankrieken

I do worry a little about how much boiler plate there currently is - the three connectors have a LOT of shared material, and I wonder it it'd be simpler to keep them a little closer to the storage - really just be configurable by foler (builds) and a string key (25v1/2-fixed-lots) and leave the rest of the logic - dataclasses, etc - to lifecycle code.

That said, this is a big step in getting us to any desired state. So more just food for thought, and curious how you've thought about that as this PR evolved.

Re the cli, I think I still have my opinion from before - I like to think of the ingest modules and submodules as stages, and "review" or something like that is more of the stage we're in when we're moving around these artifacts than "artifacts". That being said, especially with what I said above about how maybe more of the "draftkey" etc logic should live in lifecycle and not connectors, it doesn't hurt to have these dedicated files more coupled explicitly to the artifacts themselves.

But maybe there's room for both? Leave the code where it is, but have a cli that's more verb/stage focused?

alexrichey force-pushed the ar-migrate-storage-builds-drafts-2 branch 18 times, most recently from 3ccfd61 to 232ee0d Compare January 18, 2026 02:48

alexrichey force-pushed the ar-migrate-storage-builds-drafts-2 branch 7 times, most recently from e6d700f to 5d2328a Compare January 20, 2026 22:35

alexrichey assigned damonmcc and fvankrieken Jan 21, 2026

alexrichey force-pushed the ar-migrate-storage-builds-drafts-2 branch from 34f6ef9 to 6180d94 Compare January 21, 2026 21:12

Comment out bytes test for now

388aa4c

This was taking a long time. I think there's absolutely utility in running a scraper against BYTES every week or so, but this is just too much

alexrichey added 11 commits January 21, 2026 16:17

Update sample .env

40d5791

Having to re-populate my .env after an *unfortunate* incident (see two commits ahead) prompted this. It's a little out of date.

Add pub root folder to config

2c669ec

DANGER ZONE! remove rmtree from hps

b1ae54f

This was... maybe the worst two lines of code I have every written. If you used this connector to attempt to download a file to, say, your data-engineering/ directory, it would wipe the directory

Configure the connectors in lifecycle builds

ebf3161

Add builds conn

3fe4439

Add drafts conn

3815ca2

Add published conn

296a88f

Add GIS conn

59cdffa

Add new connectors to the registry

4e8b20e

Add a CLI in lifecycle

74bd2a0

Add e2e test

8258bb3

alexrichey force-pushed the ar-migrate-storage-builds-drafts-2 branch from 6180d94 to 8258bb3 Compare January 21, 2026 21:18

alexrichey marked this pull request as ready for review January 21, 2026 21:18

fvankrieken reviewed Feb 3, 2026

View reviewed changes

dcpy/connectors/edm/builds.py Outdated Show resolved Hide resolved

fvankrieken reviewed Feb 3, 2026

View reviewed changes

dcpy/connectors/edm/builds.py Show resolved Hide resolved

fvankrieken reviewed Feb 3, 2026

View reviewed changes

dcpy/test_integration/lifecycle/builds/test_lifecycle_e2e.py

@pytest.mark.incremental

class TestLifecycleE2E:

Copy link

Contributor

fvankrieken Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙏

fvankrieken approved these changes Feb 3, 2026

View reviewed changes

post-review: remove slop-comments

70ed00f

alexrichey merged commit 3314ca3 into main Feb 6, 2026
25 checks passed

alexrichey deleted the ar-migrate-storage-builds-drafts-2 branch February 6, 2026 21:05

damonmcc mentioned this pull request Feb 9, 2026

Azure Storage Migration: Refactor remaining S3 (Publishing, etc.) connectors to HybridPathedStorage #1987

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ar migrate storage builds drafts#2175

Ar migrate storage builds drafts#2175
alexrichey merged 13 commits intomainfrom
ar-migrate-storage-builds-drafts-2

alexrichey commented Jan 14, 2026 •

edited

Loading

Uh oh!

codecov bot commented Jan 18, 2026 •

edited

Loading

Uh oh!

fvankrieken Feb 3, 2026

Uh oh!

Uh oh!

Uh oh!

fvankrieken Feb 3, 2026

Uh oh!

alexrichey Feb 6, 2026

Uh oh!

fvankrieken Feb 3, 2026

Uh oh!

fvankrieken left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		if destination_path.exists():
		shutil.rmtree(destination_path)

		def data_local_sub_path(self, key: str, , version: str, *kwargs) -> Path:
		return Path("edm") / "builds" / "datasets" / key / version

Conversation

alexrichey commented Jan 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Jan 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

fvankrieken Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

fvankrieken Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

alexrichey Feb 6, 2026

Choose a reason for hiding this comment

Uh oh!

fvankrieken Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

fvankrieken left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

alexrichey commented Jan 14, 2026 •

edited

Loading

codecov bot commented Jan 18, 2026 •

edited

Loading