Add 2024 AVs, exemptions, equalizer, and CPI #63

Open
jeancochrane wants to merge 32 commits into 2024-data-update from jeancochrane/2024-cpi-equalizer-av-and-exemptions
Conversation

Member

@jeancochrane jeancochrane commented Jan 2, 2026

This PR tweaks a few data-raw scripts to add 2024 data to the pin, cpi, and eq_factor tables. I have already used this code to load the corresponding files into the testing bucket on S3.

The most complicated of these changes relates to the pin table, whose data source has to change starting in 2024 following the Clerk's migration from the AS400 to iasWorld as their source-of-truth database. Rather than pull AV and exemption data from a SQL server mirror of the AS400, as we used to do, we now pull these data from a flat file stored in S3. In future years, we may pull these data from iasWorld directly, so I did some QC work to check the flat file against iasWorld; they mostly match, though a few thousand rows still have discrepancies that I couldn't track down. (See EI issue 395, which will investigate these discrepancies in more detail.)
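
For reviewers who want a feel for the QC comparison described above, here is a minimal sketch in base R. It is not the code used for the actual check: the data frames `flat_file` and `iasworld`, the `av` column, and all values are invented for illustration. It joins the two sources on PIN and year and keeps rows that are missing from one side or whose values differ.

```r
# Invented toy data standing in for the S3 flat file and an iasWorld extract
flat_file <- data.frame(
  pin = c("01", "02", "03"),
  year = c(2024L, 2024L, 2024L),
  av = c(100L, 200L, 300L),
  stringsAsFactors = FALSE
)
iasworld <- data.frame(
  pin = c("01", "02", "03", "04"),
  year = c(2024L, 2024L, 2024L, 2024L),
  av = c(100L, 250L, 300L, 400L),
  stringsAsFactors = FALSE
)

# Full outer join so that rows missing from either source are retained
compared <- merge(
  flat_file, iasworld,
  by = c("pin", "year"),
  all = TRUE,
  suffixes = c("_flat", "_ias")
)

# A row is a discrepancy if it is absent from one source or the AVs differ
is_discrepancy <- is.na(compared$av_flat) |
  is.na(compared$av_ias) |
  compared$av_flat != compared$av_ias

discrepancies <- compared[is_discrepancy, ]
print(discrepancies)
```

In this toy data, PIN 02 differs between the sources and PIN 04 exists only in the iasWorld stand-in, so both are flagged.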

Note that this PR doesn't yet include changes to import PIN geometry, because I haven't been able to get that import to run successfully.

Connects #59.

@jeancochrane jeancochrane changed the base branch from master to 2024-data-update January 2, 2026 20:51
@jeancochrane jeancochrane changed the base branch from 2024-data-update to jeancochrane/fix-pre-commit January 12, 2026 16:35
Base automatically changed from jeancochrane/fix-pre-commit to 2024-data-update January 14, 2026 16:17
Comment on lines +30 to +36
# Remove footer lines that do not contain any data
filter(
  !str_detect(
    vals,
    regex("printed by the authority|ptax-115", ignore_case = TRUE)
  )
) %>%
Member Author

This footer appears to be new as of 2025. See here to examine it: https://tax.illinois.gov/content/dam/soi/en/web/tax/localgovernments/property/documents/cpihistory.pdf
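
To illustrate what the filter does, here is a small sketch using base R's `grepl()` in place of `stringr::str_detect()` so that it runs without extra packages. The contents of `vals` are placeholders, not real lines from the PDF; only the two footer phrases in the pattern come from the diff.

```r
# Placeholder lines standing in for text extracted from the PDF
vals <- c(
  "2023 100.0 1.0",
  "2024 101.0 1.0",
  "Printed by the authority of the State (placeholder footer text)",
  "PTAX-115 (placeholder form number line)"
)

# Case-insensitive match on either footer phrase, same pattern as the diff
is_footer <- grepl(
  "printed by the authority|ptax-115",
  vals,
  ignore.case = TRUE
)

# Keep only the lines that are not footer text
data_lines <- vals[!is_footer]
print(data_lines)
```

Because the match is case-insensitive, it catches the footer regardless of how the PDF capitalizes it.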

Comment on lines +25 to +28
# Start and end years of data to query, inclusive.
# Set these to the same value if you want to update only one year of data
start_year <- 2006
end_year <- 2024
Member Author

Adding these in so that we have an easy way of skipping prior years of data whenever we do an update. This is particularly useful right now because I don't have access to the AS400 mirror, which is required in order to reproduce pre-2024 data. At some point we should get that set up, but I don't want it to block us right now.

# 2023. These values come from the legacy CCAO database, which mirrors the
# county mainframe.
# Only query this data if we are pulling data for years up to 2023
if (start_year <= 2023) {
Member Author

@jeancochrane jeancochrane Jan 20, 2026

There are a few conditional branches in this file that split depending on whether we're ingesting data through 2023 or from 2024 onward. I considered creating a new file dedicated exclusively to manipulating data from 2024 onward, since this file could get messy quickly if the data model changes substantially again in the future (which would force us to introduce further conditional branches based on year). For now, however, modifying this file feels like the simpler path, and I expect it's also easier to review that way.

Member

Got it, so if I'm understanding correctly, to run this just for the 2024 update, we'd set both start_year and end_year to 2024?
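
A rough sketch of how the year bounds gate the branches, based only on the `if (start_year <= 2023)` condition visible in the diff. The helper `branches_to_run()` is hypothetical, and the `end_year >= 2024` condition for the flat-file branch is an assumption about the counterpart check, not something shown in this hunk.

```r
# Hypothetical helper: reports which ingest branches the bounds would trigger
branches_to_run <- function(start_year, end_year) {
  stopifnot(start_year <= end_year)
  c(
    # Same condition as the diff: legacy query only runs for years up to 2023
    legacy = start_year <= 2023,
    # Assumed counterpart condition for the 2024+ flat-file branch
    flat_file = end_year >= 2024
  )
}

# Updating only 2024: the legacy (AS400 mirror) branch is skipped
only_2024 <- branches_to_run(start_year = 2024, end_year = 2024)
print(only_2024)

# Full rebuild: both branches run
full_range <- branches_to_run(start_year = 2006, end_year = 2024)
print(full_range)
```

Under these assumptions, setting both bounds to 2024 avoids the legacy query entirely, which is what makes the update possible without access to the AS400 mirror.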

Comment on lines +98 to +99
# This exemption is new in 2024 and does not exist in the legacy data
exe_vet_dis_100 = 0L
Member Author

@jeancochrane jeancochrane Jan 20, 2026

This is the second change I made to the pre-2024 query.

ON C.PIN = BILLS.PIN
  AND C.TAX_YEAR = BILLS.TAX_YEAR
WHERE C.TAX_YEAR >= {start_year}
  AND C.TAX_YEAR <= 2023
Member Author

This is one of two changes I made to the pre-2024 query.

Comment on lines +296 to +356
mutate(
  exe_vet_dis_lt50 = ifelse(
    # If the vetdis total from the tax bill export matches the sum of all
    # individual vetdis exemptions from Athena (ias), then we can be confident
    # filling the individual vetdis exemptions directly from Athena
    exe_vet_dis == exe_vet_dis_tot_athena,
    exe_vet_dis_lt50_athena,
    ifelse(
      # If the total from the tax bill export does _not_ match the sum from
      # Athena, then one of two cases is true, according to our investigation:
      #
      # 1. The tax bill export has a vetdis total that is >0 but different
      #    from the Athena sum. If Athena has a value >0 for this particular
      #    vetdis exemption, then we use the total from the tax bill export
      #    for this exemption, because we assume the Athena data just has the
      #    wrong amount (in theory, it is not possible for multiple vetdis
      #    exemption types to be >0 for the same PIN). If instead there are
      #    no vetdis exemptions with a value >0, then we fill the value into
      #    the >70% vetdis exemption type, which is the most common vetdis
      #    exemption type in the Athena data
      #
      # 2. The tax bill export has a vetdis total of 0, but Athena has
      #    a sum >0 for vetdis exemptions. In this case, we assume the tax
      #    bill export is correct, and we fill 0 for all individual
      #    exemption types.
      exe_vet_dis > 0 & exe_vet_dis_lt50_athena > 0,
      exe_vet_dis,
      0L
    )
  ),
  exe_vet_dis_50_69 = ifelse(
    exe_vet_dis == exe_vet_dis_tot_athena,
    exe_vet_dis_50_69_athena,
    ifelse(
      exe_vet_dis > 0 & exe_vet_dis_50_69_athena > 0,
      exe_vet_dis,
      0L
    )
  ),
  exe_vet_dis_ge70 = ifelse(
    exe_vet_dis == exe_vet_dis_tot_athena,
    exe_vet_dis_ge70_athena,
    case_when(
      exe_vet_dis > 0 & exe_vet_dis_ge70_athena > 0 ~ exe_vet_dis,
      # This is the most common type of vetdis exemption, so fill it with
      # the total from the tax bill export if no vetdis exemption types
      # have a value >0 in the Athena data
      exe_vet_dis > 0 & exe_vet_dis_tot_athena == 0 ~ exe_vet_dis,
      TRUE ~ 0L
    )
  ),
  exe_vet_dis_100 = ifelse(
    exe_vet_dis == exe_vet_dis_tot_athena,
    exe_vet_dis_100_athena,
    ifelse(
      exe_vet_dis > 0 & exe_vet_dis_100_athena > 0,
      exe_vet_dis,
      0L
    )
  )
) %>%
Member Author

The logic here is pretty complicated, but I think it's important that we get it right, since we're making some interpretive decisions here rather than pulling information directly from the tax bill export. Let me know if it would help to walk through it together.
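
As an aid to review, here is a toy walk-through of the rules in the hunk above, written in base R so it runs on its own. The helper `fill_tier()` and the four example PINs are hypothetical; the column names follow the diff. The helper folds the `case_when()` fallback for the `ge70` tier into an `is_default_tier` flag, which should be equivalent to the diff's logic but is a restatement rather than the PR's code.

```r
# Hypothetical helper: resolve one vetdis tier from bill and Athena values
fill_tier <- function(bill_total, athena_tier, athena_total,
                      is_default_tier = FALSE) {
  ifelse(
    # Totals agree: trust the Athena value for this tier
    bill_total == athena_total,
    athena_tier,
    ifelse(
      # Totals disagree: use the bill total if Athena flags this tier, or if
      # this is the default tier and Athena has no vetdis exemptions at all
      bill_total > 0 &
        (athena_tier > 0 | (is_default_tier & athena_total == 0)),
      bill_total,
      0L
    )
  )
}

# Invented PINs: A matches, B differs in amount, C is missing from Athena,
# D has a bill total of 0 but a nonzero Athena value
toy <- data.frame(
  pin = c("A", "B", "C", "D"),
  exe_vet_dis = c(5000L, 2500L, 4000L, 0L),
  exe_vet_dis_lt50_athena = c(0L, 0L, 0L, 0L),
  exe_vet_dis_50_69_athena = c(5000L, 5000L, 0L, 0L),
  exe_vet_dis_ge70_athena = c(0L, 0L, 0L, 3000L),
  exe_vet_dis_100_athena = c(0L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)
toy$exe_vet_dis_tot_athena <- toy$exe_vet_dis_lt50_athena +
  toy$exe_vet_dis_50_69_athena +
  toy$exe_vet_dis_ge70_athena +
  toy$exe_vet_dis_100_athena

bill <- toy$exe_vet_dis
tot <- toy$exe_vet_dis_tot_athena
resolved <- data.frame(
  pin = toy$pin,
  exe_vet_dis_lt50 = fill_tier(bill, toy$exe_vet_dis_lt50_athena, tot),
  exe_vet_dis_50_69 = fill_tier(bill, toy$exe_vet_dis_50_69_athena, tot),
  exe_vet_dis_ge70 = fill_tier(bill, toy$exe_vet_dis_ge70_athena, tot,
                               is_default_tier = TRUE),
  exe_vet_dis_100 = fill_tier(bill, toy$exe_vet_dis_100_athena, tot),
  stringsAsFactors = FALSE
)
print(resolved)
```

In this toy data, the resolved tiers for each PIN sum to the tax bill export's total. One thing the sketch makes visible: if Athena ever reported more than one tier >0 for a PIN with a mismatched total, the rule would write the bill total into each of those tiers, so it relies on the assumption stated in the code comment that this should not happen.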

Member

I'm pretty sure I've got it! Did you have a chance to test the mismatched PINs to confirm that the provided exemption amounts, EAV, and tax rate produce the tax bill total provided?
It makes sense to me to prioritize the accuracy of the tax bill export, especially if using the Athena exemption amounts leads to a calculated tax bill amount that does not match tax_bill_total.
And then keeping the exemption amount in the same exemption tier seems like the most straightforward approach. I wondered if we could create a rule based on the exemption amount, but those amounts seem to be pretty inconsistent.
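
The check described above could look roughly like the following sketch. It assumes the bill is the EAV net of exemptions (floored at zero) multiplied by the tax rate expressed as a proportion, and it ignores any other adjustments or rounding rules that real bills may involve. The function `check_bill()`, its argument names, the tolerance, and the numbers are all made up for illustration; only `tax_bill_total` is a name from this thread.

```r
# Hypothetical check: does (EAV - exemptions) * rate reproduce the bill total?
check_bill <- function(eav, exemptions_total, tax_rate, tax_bill_total,
                       tolerance = 0.01) {
  # Exemptions cannot push the taxable value below zero
  taxable_eav <- pmax(eav - exemptions_total, 0)
  expected_bill <- taxable_eav * tax_rate
  # Compare with a small tolerance to absorb floating-point error
  abs(expected_bill - tax_bill_total) <= tolerance
}

# Two invented PINs with the same EAV, rate, and bill total, but different
# candidate exemption amounts
matches <- check_bill(
  eav = c(30000, 30000),
  exemptions_total = c(5000, 2500),
  tax_rate = c(0.07, 0.07),
  tax_bill_total = c(1750, 1750)
)
print(matches)
```

Here only the first candidate exemption amount reproduces the bill total, which is the kind of signal that would favor one source over the other for a mismatched PIN.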

@jeancochrane jeancochrane marked this pull request as ready for review February 18, 2026 19:20
Member

@kyrasturgill kyrasturgill left a comment

This looks great! Thanks for thinking through the veterans' disability logic. I had a couple of questions and thoughts, but I don't have any concrete changes to request.
