Conversation
3ccfd61 to
232ee0d
Compare
e6d700f to
5d2328a
Compare
34f6ef9 to
6180d94
Compare
This was taking a long time. I think there's absolutely utility in running a scraper against BYTES every week or so, but this is just too much
Having to re-populate my .env after an *unfortunate* incident (see two commits ahead) prompted this. It's a little out of date.
This was... maybe the worst two lines of code I have every written. If you used this connector to attempt to download a file to, say, your data-engineering/ directory, it would wipe the directory
6180d94 to
8258bb3
Compare
| if destination_path.exists(): | ||
| shutil.rmtree(destination_path) |
There was a problem hiding this comment.
Omg. I'm sure I'm guilty of this too somewhere
| def data_local_sub_path(self, key: str, *, version: str, **kwargs) -> Path: | ||
| return Path("edm") / "builds" / "datasets" / key / version |
There was a problem hiding this comment.
Bit of a nit, but this would presumably be so someone else (i.e. the lifecycle code pulling this) knows where to put it right? Can we not do "datasets" then? builds/datasets makes me think of what's going into a build. Even if it's "data_products".
Or would it just make sense to have this match what's in DO? as in, key / builds / version?
There was a problem hiding this comment.
I need to review how this even works, honestly... The local_sub_path stuff seems really misplaced. Would honestly like to refactor it out if possible. For now, I'll just leave it be
|
|
||
|
|
||
| @pytest.mark.incremental | ||
| class TestLifecycleE2E: |
fvankrieken
left a comment
There was a problem hiding this comment.
I do worry a little about how much boiler plate there currently is - the three connectors have a LOT of shared material, and I wonder it it'd be simpler to keep them a little closer to the storage - really just be configurable by foler (builds) and a string key (25v1/2-fixed-lots) and leave the rest of the logic - dataclasses, etc - to lifecycle code.
That said, this is a big step in getting us to any desired state. So more just food for thought, and curious how you've thought about that as this PR evolved.
Re the cli, I think I still have my opinion from before - I like to think of the ingest modules and submodules as stages, and "review" or something like that is more of the stage we're in when we're moving around these artifacts than "artifacts". That being said, especially with what I said above about how maybe more of the "draftkey" etc logic should live in lifecycle and not connectors, it doesn't hurt to have these dedicated files more coupled explicitly to the artifacts themselves.
But maybe there's room for both? Leave the code where it is, but have a cli that's more verb/stage focused?
Re-implements the Connectors for builds, drafts, published and gis, but doesn't swap them out just yet. You can read commit-by-commit for localized/grouped changes (e.g. just changes to builds functionality) as well as explanations for the smaller changes in the commit messages. (There's some 🌶️ content in the commit bodies)
Anyhow, the logic in
published.pyis basically distributed into files in a few different locations:There will be some continuing commits where we implement this:
@fvankrieken @damonmcc Here's where I could use some input:
Consider a command like publishing a draft. This command needs to
Ideally this code would live in a module a little bit "above" artifacts/drafts and artifacts/published, almost similar to how lifecycle/scripts lives "above" the various lifecycle stages.
So I'd love it if you could take a look at the files under dcpy/lifecycle/builds/artifacts/*, and consider these "glue" functions like promote and publish. In the CLI they fit really nicely under namespaced commands like
dcpy lc builds artifacts drafts publish -p nypl_libraries -v 24v1 -dn 2 -ipwhich make sense, because we
publishingis an action that we take on a draft. However, where this glue code should live... I could use some input.