Conversation
Why these changes are being introduced: * File names need to be redacted for all data points How this addresses that need: * Replace file name and file path with extension and bagname where they appear * Update uuid parsing from S3 key * Add loading message indicating whether S3 inventory data was found for a given date * Update dependencies Side effects of this change: * None Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/IN-1734
ghukill
left a comment
There was a problem hiding this comment.
Conceptually, looks good to me!
I did leave some comments about a possible inversion and use of mo.stop(). And, some ponderings about stopping other cells from firing.
| # Verify S3 Inventory data is available for the selected date | ||
| logger.info(f"Collecting parquet file URIs for date: {selected_date}") | ||
|
|
||
| if selected_date not in symlink_dict: | ||
| date_message = ( | ||
| f"No S3 Inventory data found for {selected_date}, select a different date" | ||
| ) | ||
| else: | ||
| date_message = f"S3 Inventory data found for {selected_date}, loading data tables and visualizations..." | ||
| mo.md(date_message) | ||
| return |
There was a problem hiding this comment.
This was interesting! I had to consule a notebook I had worked on awhile ago, alma tag export, to re-familiarize with mo.stop() which I think could be a good fit here.
To me, the happy path is that data exists for that date and the rest of the notebook loads. When that happens, I'm not sure you need any actual notebook output.
It's only when there is not data for the date, that output feels helpful.
An alternate pattern could be to stop any more work by this checker/validation/integrity cell if data is found, else the default behavior is to notify:
# Verify S3 Inventory data is available for the selected date
logger.info(f"Collecting parquet file URIs for date: {selected_date}")
if selected_date in symlink_dict:
mo.stop()
mo.md(f"No S3 Inventory data found for {selected_date}, select a different date")At least personally, this feels weird at first... and then it starts to feel really normal for cells that aren't meant to do anything unless something is true. In this case, the "something is true" = no data found for the date, so we want to notify the user.
There was a problem hiding this comment.
I'm unsure at the moment, without a deeper dive into the notebook, but this might set you up for stopping all downstream cells entirely from firing.
Like, you could set some kind of flag where other cells would also not fire if that flag was set.
In the run view, it's fine, there is nothing visible. But in the edit view, when data isn't found for the date, you get cascading errors because the next cell fails.
If interested, happy to brainstorm how to avoid that. The alma notebook has a possible pattern where you run mo.stop() in a cell if something isn't right. You could theoretically add this to all downstream cells... but that feels a little cumbersome.
In the alma notebook it was only 2-3 cells where a check is performed, looking for something in the cell above. But here, I'm wondering if there is a more idiomatic marimo approach like, "stop everything if X is true."
* Update notebook's flow control with stop function * Add spinners for long running processes
|
@ghukill Pushed a new commit based on our conversation, thanks again for the huddle! |
Coverage Report for CI Build 24207062828Coverage decreased (-0.1%) to 16.312%Details
Uncovered Changes
Coverage Regressions3 previously-covered lines in 1 file lost coverage.
Coverage Stats
💛 - Coveralls |
ghukill
left a comment
There was a problem hiding this comment.
Looks good to me! I learned a lot about marimo during this review.
100% optional, approval stands, I did notice that when the notebook first loads, it relies on the built-in hourglass in the upper-left to indicate loading before the spinners kick on.
If you wanted to, I'm betting there is some early data pull that could have a spinner as well? It must happen before the date picker is loaded.
Totally optional. Looking real good as-is.
| mo.stop( | ||
| selected_date not in symlink_dict, | ||
| mo.md(f"No S3 Inventory data found for {selected_date}, select a different date"), | ||
| ) |
There was a problem hiding this comment.
👍 - after we did that marimo research about how and when to use mo.stop(...), this feels like the right approach.
And maybe most importantly, how descendent cells are set; where the selected_date here makes all other cells descendent, and thus also stop with this mo.stop().
@ghukill I'll hold off on that change for now but keep it in mind for future PRs! |
Purpose and background context
The file redaction changes are the primary purpose of this PR but I would like you to weigh in on whether the added date message is this the most effective way to manage a lack of data without crashing the notebook.
How can a reviewer manually see the effects of these changes?
.envfileProdcredentials# Verify S3 Inventory data is available for the selected date.Proddata currently starts at 2025-10-09, select a date before that to see the error message and then select a date after to see the populated tables load.Includes new or updated dependencies?
YES
Changes expectations for external applications?
NO
What are the relevant tickets?