Lower memory usage for pin geometry calc #67
Draft
wrridgeway wants to merge 9 commits into 2024-data-update from
wrridgeway
commented
Feb 26, 2026
fill(town_code, .direction = "updown", .by = pin10) %>%
# For remaining missing town codes, replace with 99 to make looping through
# them below easier.
mutate(town_code = replace_na(town_code, 99))
It's annoying to loop through NA values. I tried using both group_map() and split() to avoid this, but they seemed to take far longer than just walking through the possible values and using filter().
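As a rough illustration of the point above (not the code from this PR), here is a minimal sketch of why NA values are awkward to loop over with filter(), and how a sentinel value avoids the problem. The `toy` data frame is made up, and the sketch uses group_by() rather than the `.by` argument shown in the diff so that it does not depend on a recent tidyr version.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical toy data; the real pin data is far larger.
toy <- tibble(
  pin10 = c("a", "a", "b", "c"),
  town_code = c(10, NA, 20, NA)
)

# filter() drops rows where the condition evaluates to NA, so walking through
# unique(town_code) never returns the rows with a missing town code.
naive <- map(
  unique(toy$town_code),
  function(tc) filter(toy, town_code == tc)
) %>%
  bind_rows()

# Fill within pin10 first, then replace any remaining NAs with a sentinel
# (99, as in the diff above) so every row is reachable by a plain filter().
filled <- toy %>%
  group_by(pin10) %>%
  fill(town_code, .direction = "updown") %>%
  ungroup() %>%
  mutate(town_code = replace_na(town_code, 99))

chunked <- map(
  unique(filled$town_code),
  function(tc) filter(filled, town_code == tc)
) %>%
  bind_rows()
```

In this toy case `naive` silently loses the rows with a missing town code, while `chunked` keeps every row of the input.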
wrridgeway
commented
Feb 26, 2026
data-raw/pin/pin.R
Outdated
}, .progress = TRUE)

# Remove the full dataset to free up memory for processing
rm(pin_geometry_df_full)
This isn't necessary, but collecting the data and filling it takes a lot of time and memory, and this lets us avoid having to redo it if anything goes wrong.
wrridgeway
commented
Feb 26, 2026
) %>%
arrange(pin10, start_year)
}, .progress = TRUE) %>%
bind_rows()
We're just moving this work inside of a map function to let us load less data into memory. I've chunked everything by town_code since we only need to look back in time within pin10, not across pin10.
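To make the chunking argument concrete, here is a small sketch under stated assumptions: the `toy` data, the `value` column, and the `look_back()` helper are all hypothetical stand-ins for the real per-pin work. It only shows that, when every pin10 sits inside a single town_code, processing one town_code at a time gives the same result as processing everything at once.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data: each pin10 belongs to exactly one town_code.
toy <- tibble(
  town_code = c(10, 10, 10, 20, 20),
  pin10 = c("a", "a", "b", "c", "c"),
  start_year = c(2001, 2000, 2000, 2005, 2004),
  value = c(2, 1, 3, 5, 4)
)

# Stand-in for the real work: look back one record within each pin10.
look_back <- function(df) {
  df %>%
    arrange(pin10, start_year) %>%
    group_by(pin10) %>%
    mutate(prev_value = lag(value)) %>%
    ungroup()
}

# Process the whole table in one go.
all_at_once <- look_back(toy)

# Process one town_code at a time, then stack the pieces back together.
by_town <- map(
  sort(unique(toy$town_code)),
  function(tc) look_back(filter(toy, town_code == tc))
) %>%
  bind_rows() %>%
  arrange(pin10, start_year)
```

In this sketch the full toy table is already in memory, so it does not itself save anything; the memory benefit would come from loading or collecting each chunk inside the mapped function, which is not reproduced here.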
We go from 1,469,467 to 1,470,359 rows (a difference of 892). Here is what the new data looks like compared to the old data, minus the geometry column: