Add 2024 AVs, exemptions, equalizer, and CPI #63
jeancochrane wants to merge 32 commits into 2024-data-update from
This reverts commit 128edb9.
…qualizer-av-and-exemptions
  # Remove footer lines that do not contain any data
  filter(
    !str_detect(
      vals,
      regex("printed by the authority|ptax-115", ignore_case = TRUE)
    )
  ) %>%
This footer appears to be new as of 2025. See here to examine it: https://tax.illinois.gov/content/dam/soi/en/web/tax/localgovernments/property/documents/cpihistory.pdf
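As a minimal sketch of what the footer filter does, the snippet below applies the same case-insensitive pattern to a few made-up lines. The contents of `vals` here are hypothetical stand-ins for text extracted from the PDF, and base R's `grepl()` is swapped in for `stringr::str_detect()` so the example runs without extra packages.

```r
# Hypothetical lines standing in for text extracted from the CPI history PDF
vals <- c(
  "example data row 1",
  "Printed by the authority of the State of Illinois",
  "example data row 2",
  "PTAX-115 (hypothetical footer text)"
)

# Same pattern as the PR, matched case-insensitively
is_footer <- grepl(
  "printed by the authority|ptax-115",
  vals,
  ignore.case = TRUE
)

# Keep only the lines that are not footer text
data_rows <- vals[!is_footer]
print(data_rows)
```

Because the match is case-insensitive, the filter should keep working if the footer's capitalization changes in a later edition of the file.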
# Start and end years of data to query, inclusive.
# Set these to the same value if you want to update only one year of data
start_year <- 2006
end_year <- 2024
Adding these in so that we have an easy way of skipping prior years of data whenever we do an update. This is particularly useful right now because I don't have access to the AS400 mirror, which is required in order to reproduce pre-2024 data. At some point we should get that set up, but I don't want it to block us right now.
# 2023. These values come from the legacy CCAO database, which mirrors the
# county mainframe.
# Only query this data if we are pulling data for years up to 2023
if (start_year <= 2023) {
There are a few conditional branches in this file that split depending on whether we're ingesting data from 2023 and earlier or from 2024 onward. I considered creating a new file dedicated exclusively to 2024-and-later data manipulation, since it feels to me like this file will get very messy very fast if they substantially change the data model again in the future (causing us to need to introduce further conditional branches based on year). For now, however, modifying this file feels like the simpler path, and I expect it's also easier to review that way.
Got it, so if I'm understanding correctly, to run this for just the 2024 update, we'd set both start_year and end_year to 2024?
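To make the year gating concrete, here is a small sketch of which data sources a given year window would touch. The helper `sources_for_years()` is hypothetical and not part of the PR. The `start_year <= 2023` test comes from the diff above, while the `end_year >= 2024` test for the flat file is an assumption, since that branch is not shown in this conversation.

```r
# Hypothetical helper: which data sources does a year window touch?
sources_for_years <- function(start_year, end_year) {
  stopifnot(start_year <= end_year)
  list(
    # Condition taken from the PR: legacy data covers years up to 2023
    legacy_db = start_year <= 2023,
    # Assumed condition: the flat file covers 2024 and later
    flat_file = end_year >= 2024
  )
}

# Full refresh: both sources are needed
str(sources_for_years(2006, 2024))

# 2024-only update: the legacy query is skipped entirely
str(sources_for_years(2024, 2024))
```

Under these assumptions, setting both variables to 2024 skips the legacy branch, which is the case that matters when the AS400 mirror is unavailable.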
# This exemption is new in 2024 and does not exist in the legacy data
exe_vet_dis_100 = 0L
This is the second of two changes I made to the pre-2024 query.
ON C.PIN = BILLS.PIN
  AND C.TAX_YEAR = BILLS.TAX_YEAR
WHERE C.TAX_YEAR >= {start_year}
  AND C.TAX_YEAR <= 2023
This is one of two changes I made to the pre-2024 query.
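The year filter in the legacy query can be sketched as a small string-building helper. This is only an illustration: `legacy_year_clause()` and its `legacy_end_year` argument are hypothetical names, and the PR itself interpolates `{start_year}` directly into the SQL with the upper bound hard-coded to 2023. The sketch adds a validation step so that a non-integer or out-of-range year fails before any query is sent.

```r
# Hypothetical helper; the PR interpolates {start_year} directly into its SQL
legacy_year_clause <- function(start_year, legacy_end_year = 2023L) {
  stopifnot(
    length(start_year) == 1,
    is.numeric(start_year),
    # Reject fractional years such as 2006.5
    start_year == as.integer(start_year),
    # The legacy source has nothing after its final year
    start_year <= legacy_end_year
  )
  sprintf(
    "WHERE C.TAX_YEAR >= %d AND C.TAX_YEAR <= %d",
    as.integer(start_year),
    as.integer(legacy_end_year)
  )
}

print(legacy_year_clause(2006))
```

Calling the helper with a start year after 2023 raises an error, which mirrors the `if (start_year <= 2023)` guard that wraps the query in the PR.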
mutate(
  exe_vet_dis_lt50 = ifelse(
    # If the vetdis total from the tax bill export matches the sum of all
    # individual vetdis exemptions from Athena (ias), then we can be confident
    # filling the individual vetdis exemptions directly from Athena
    exe_vet_dis == exe_vet_dis_tot_athena,
    exe_vet_dis_lt50_athena,
    ifelse(
      # If the total from the tax bill export does _not_ match the sum from
      # Athena, then one of two cases is true, according to our investigation:
      #
      # 1. The tax bill export has a vetdis total that is >0 but different
      #    from the Athena sum. If Athena has a value >0 for this particular
      #    vetdis exemption, then we use the total from the tax bill export
      #    for this exemption, because we assume the Athena data just has the
      #    wrong amount (in theory, it is not possible for multiple vetdis
      #    exemption types to be >0 for the same PIN). If instead there are
      #    no vetdis exemptions with a value >0, then we fill the value into
      #    the >=70% vetdis exemption type, which is the most common vetdis
      #    exemption type in the Athena data
      #
      # 2. The tax bill export has a vetdis total of 0, but Athena has
      #    a sum >0 for vetdis exemptions. In this case, we assume the tax
      #    bill export is correct, and we fill 0 for all individual
      #    exemption types.
      exe_vet_dis > 0 & exe_vet_dis_lt50_athena > 0,
      exe_vet_dis,
      0L
    )
  ),
  exe_vet_dis_50_69 = ifelse(
    exe_vet_dis == exe_vet_dis_tot_athena,
    exe_vet_dis_50_69_athena,
    ifelse(
      exe_vet_dis > 0 & exe_vet_dis_50_69_athena > 0,
      exe_vet_dis,
      0L
    )
  ),
  exe_vet_dis_ge70 = ifelse(
    exe_vet_dis == exe_vet_dis_tot_athena,
    exe_vet_dis_ge70_athena,
    case_when(
      exe_vet_dis > 0 & exe_vet_dis_ge70_athena > 0 ~ exe_vet_dis,
      # This is the most common type of vetdis exemption, so fill it with
      # the total from the tax bill export if no vetdis exemption types
      # have a value >0 in the Athena data
      exe_vet_dis > 0 & exe_vet_dis_tot_athena == 0 ~ exe_vet_dis,
      TRUE ~ 0L
    )
  ),
  exe_vet_dis_100 = ifelse(
    exe_vet_dis == exe_vet_dis_tot_athena,
    exe_vet_dis_100_athena,
    ifelse(
      exe_vet_dis > 0 & exe_vet_dis_100_athena > 0,
      exe_vet_dis,
      0L
    )
  )
) %>%
The logic here is pretty complicated, but I think it's important that we get it right, since we're making some interpretive decisions here rather than pulling information directly from the tax bill export. Let me know if it would help to walk through it together.
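One way to walk through the vetdis branching is to factor the repeated pattern into a single helper and run it on a few made-up rows. This is a sketch only: `resolve_vetdis_tier()` and its `is_fallback` argument are hypothetical and not part of the PR, the amounts are invented for illustration, and base R `ifelse()` stands in for the `case_when()` used for the >=70% tier.

```r
# Hypothetical helper (not part of the PR) that resolves one vetdis tier.
#   bill_total:   vetdis total from the tax bill export (exe_vet_dis)
#   athena_tier:  Athena amount for this tier (e.g. exe_vet_dis_lt50_athena)
#   athena_total: sum of all Athena vetdis tiers (exe_vet_dis_tot_athena)
#   is_fallback:  TRUE only for the tier that receives the bill total when
#                 Athena has no vetdis amounts at all (ge70 in the PR)
resolve_vetdis_tier <- function(bill_total, athena_tier, athena_total,
                                is_fallback = FALSE) {
  ifelse(
    # Totals agree: trust the Athena amount for this tier
    bill_total == athena_total,
    athena_tier,
    ifelse(
      # Totals disagree: the bill total goes to the tier Athena flags, or
      # to the fallback tier when Athena flags no tier at all
      bill_total > 0 & (athena_tier > 0 | (is_fallback & athena_total == 0)),
      bill_total,
      0L
    )
  )
}

# Made-up amounts for four hypothetical PINs:
#   1. totals match  2. Athena amount differs
#   3. Athena empty  4. bill total is 0 but Athena is not
bill   <- c(5000L, 5000L, 5000L, 0L)
lt50   <- c(0L, 2500L, 0L, 0L)
t50_69 <- c(0L, 0L, 0L, 2500L)
ge70   <- c(5000L, 0L, 0L, 0L)
t100   <- c(0L, 0L, 0L, 0L)
athena_total <- lt50 + t50_69 + ge70 + t100

resolved <- data.frame(
  lt50   = resolve_vetdis_tier(bill, lt50, athena_total),
  t50_69 = resolve_vetdis_tier(bill, t50_69, athena_total),
  ge70   = resolve_vetdis_tier(bill, ge70, athena_total, is_fallback = TRUE),
  t100   = resolve_vetdis_tier(bill, t100, athena_total)
)
print(resolved)

# For these inputs, each resolved row sums to the tax bill export total
stopifnot(all(rowSums(resolved) == bill))
```

The final check is a useful invariant for QC: as long as Athena flags at most one tier per PIN, the resolved tiers should always add up to the total from the tax bill export.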
I'm pretty sure I've got it! Did you have a chance to test the mismatched PINs to confirm that the provided exemption amounts, EAV, and tax rate produce the provided tax bill total?
It makes sense to me to prioritize the accuracy of the tax bill export, especially if using the Athena exemption amounts led to a calculated tax bill amount that does not match tax_bill_total.
And then the logic of keeping the exemption amount in the same exemption tier seems the most straightforward. I wondered if we could create a rule based on the exemption amount, but those seem to be pretty inconsistent.
kyrasturgill left a comment
This looks great! Thanks for thinking through the veterans' disability logic. I had a couple of questions and thoughts but don't have any concrete changes.
This PR tweaks a few data-raw scripts to add 2024 data to the pin, cpi, and eq_factor tables. I have already used this code to load the corresponding files into the testing bucket on S3.
The most complicated of these changes relates to the pin table, whose data source needs to change in 2024 following the Clerk's migration from the AS400 to iasWorld as their source-of-truth database. Rather than pull AV and exemption data from a SQL server mirror of the AS400, as we used to do, we now pull these data from a flat file stored in S3. In future years, we may pull this data from iasWorld directly, so I did a little bit of QC work to check the flat file against iasWorld; they mostly match up, though there remain a few thousand rows with discrepancies that I couldn't track down. (See EI issue 395, which will investigate these discrepancies in more detail.)
Note that this PR doesn't yet include changes to import PIN geometry, because I haven't been able to get that to run successfully yet.
Connects #59.