[PULL REQUEST] Updating Mortality Calculations #104
Conversation
Pull request overview
This pull request updates the mortality calculation methodology for the Cohort Component Model (CCM) by implementing procedures from the Mortality Prediction Project (MPP). The changes primarily involve removing outdated CDC mortality data files from 2010-2020 and replacing the calculation approach with new CDC-related functions and splining techniques from the MPP.
Changes:
- Removal of legacy CDC mortality data files for various demographic groups across multiple years (2010-2020)
- Implementation of new mortality calculation procedures following the Mortality Prediction Project methodology
- Integration of CDC-related functions and splining techniques into existing CCM functions
GregorSchroeder
left a comment
This is initial feedback. I think once these issues are resolved we can move into simplifying the whole process as it seems overly complex at the moment. With all these files and transformations, I think it would behoove us to move towards loading in mortality for the year of interest only as opposed to doing a mass loading and then subsequent filtering in get_death_rates.
data folder
- We only need data back to 2010; remove all the pre-2010 datasets.
- Calling a folder "2018_2023" when it only contains 2022+ is confusing. Also, what is going to happen to that folder name when it includes 2024? I would suggest either not differentiating by product in separate folders, because they don't overlap in terms of uploaded data, or using more generic folder names.
- Need to add a `.gitattributes` file so we aren't doing full track changes for all of these mortality rate files. See https://github.com/SANDAG/CA-DOF/blob/main/.gitattributes.
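For reference, a minimal `.gitattributes` along these lines would stop git from computing line-by-line diffs on the bulky rate extracts. The exact patterns are an assumption and should be narrowed to the actual data folders (the linked CA-DOF file is the authoritative example):

```
# Treat bulky mortality rate extracts as opaque so git doesn't diff them line-by-line
*.csv -diff
*.txt -diff
```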
parsing file names
- Why is there a mix of `.csv` and `.txt` files that necessitates the use of `get_file_separator`?
- Do we need to have all these abbreviations and mappings in `parse_filename`, or could we just use full names and mappings? Why separate the labels and maps in `valid_config`; couldn't it just use `map` and check for valid labels?
- `[str(y) for y in range(1999, 2024)]` introduces a year restriction that should not be in place here.
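As an illustration of parsing with full-word labels instead of separate abbreviation maps and a hardcoded year list, a sketch might look like the following. The naming pattern (product, geography, year) and the function body are assumptions about the convention, not the repo's actual scheme:

```python
import re

# Hypothetical sketch: full-word labels in one regex replace the separate
# abbreviation/label maps, and the year is validated as any 4-digit value
# rather than a hardcoded range(1999, 2024) whitelist.
FILENAME_PATTERN = re.compile(
    r"^(?P<product>cdc_wonder|mpp)_"
    r"(?P<geography>county|region|state)_"
    r"(?P<year>\d{4})\.(csv|txt)$"
)

def parse_filename(filename: str) -> dict:
    """Return implied metadata from a file name, or raise if it is malformed."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unrecognized mortality file name: {filename}")
    metadata = match.groupdict()
    metadata["year"] = int(metadata["year"])  # no year whitelist baked in
    return metadata
```

This keeps one declarative pattern as the single source of truth for both validation and metadata extraction.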
parsing and transforming files
- Replace usages of `.query` with `.loc`.
- Is `parse_not_stated` actually using the input dataframe it takes in? It looks like it just overwrites it immediately.
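The two points above can be sketched as follows. Column names (`"year"`, `"age_group"`, `"deaths"`) are illustrative assumptions, not the repo's actual schema:

```python
import pandas as pd

def filter_year(df: pd.DataFrame, year: int) -> pd.DataFrame:
    # .query("year == @year") re-parses a string expression at runtime;
    # .loc with a boolean mask is equivalent and easier to lint and debug.
    return df.loc[df["year"] == year]

def parse_not_stated(df: pd.DataFrame) -> pd.DataFrame:
    # Correct pattern: derive the result from the input argument.
    # A line like `df = pd.read_csv(...)` at the top of this function would
    # silently discard whatever the caller passed in.
    return df.loc[df["age_group"] == "Not Stated"]
```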
CDC underwent an update in which they no longer allow |
GregorSchroeder
left a comment
Why was the use of the logger removed and substituted with print statements? We want to write warnings, messages, etc. to the log file, not to print statements.
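For reference, routing messages through the module logger instead of `print()` is a one-line change at each call site; handler and log-file configuration are assumed to happen once at application start-up. The function name here is hypothetical:

```python
import logging

# Module-level logger; handlers/formatters are configured elsewhere, once.
logger = logging.getLogger(__name__)

def warn_missing_rows(year: int, missing: int) -> None:
    # print(f"missing {missing} rows") would bypass the log file entirely;
    # logger.warning routes through whatever handlers the app configured.
    logger.warning("Death rates for %s are missing %d rows", year, missing)
```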
I think there is a lot of complication right now in the mortality code and most of it might be due to the file/folder structure. I think moving back to a simpler folder structure of just deaths with years as subfolders will 1) provide more organization and clarity as to what data sources are used to calculate mortality rates for a given year and 2) simplify the code, allowing us to pass year as a parameter and fully isolate our calculations within a single year.

Your filenames are doing all the heavy lifting in terms of identifying product and geographic grain, so we don't need subfolders to provide further identification. I was thinking of pseudocode as follows under this new folder structure.
- Create a function `load_cdc_wonder` that takes in a single file path
  - Check the file name is consistent with the pattern and return implied metadata: `validate_file_name`
  - Load the file and check against the implied metadata
  - Load the Stated/Not Stated and apply the inflation factor to deaths: `inflate_deaths`
- Wrapper function `parse_cdc_wonder` that takes a given year and calls the above function for each geography level
  - Apply substitution methodology: `rate_substitution`
  - Apply smoothing: `smooth_rates`
  - If a given year's product is the old CDC WONDER (can tell from file names or even just a hardcoded year value), then take the "All" races data and duplicate it for "NHPI" and "Two or More"
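The pseudocode above could be skeletoned roughly as follows. The function names come from the comment itself; the bodies, the column names, the `deaths/<year>/` layout, and the underscore-delimited file naming are all assumptions for illustration:

```python
from pathlib import Path
import pandas as pd

def validate_file_name(path: Path) -> dict:
    """Check the file name against the expected pattern; return implied metadata."""
    product, geography, year = path.stem.split("_")  # assumed naming scheme
    return {"product": product, "geography": geography, "year": int(year)}

def inflate_deaths(df: pd.DataFrame) -> pd.DataFrame:
    """Fold Not Stated counts back into stated ages via an inflation factor."""
    stated = df[df["age"] != "Not Stated"].copy()
    factor = df["deaths"].sum() / stated["deaths"].sum()
    stated["deaths"] = stated["deaths"] * factor
    return stated

def load_cdc_wonder(path: Path) -> pd.DataFrame:
    """Single-file loader: validate name, load, check metadata, inflate."""
    metadata = validate_file_name(path)
    df = pd.read_csv(path)
    # ...check loaded columns/grain against `metadata` here...
    return inflate_deaths(df)

def parse_cdc_wonder(year: int, root: Path = Path("deaths")) -> dict:
    """Wrapper: load every geography level for one year under deaths/<year>/."""
    frames = {}
    for path in sorted((root / str(year)).glob("*.csv")):
        frames[validate_file_name(path)["geography"]] = load_cdc_wonder(path)
    # rate_substitution / smooth_rates / old-WONDER race duplication go here
    return frames
```

Everything downstream then takes `year` as a parameter, so the calculations stay fully isolated within a single year.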
GregorSchroeder
left a comment
Let's concentrate on naming, parsing, and validating the files, then making sure we are using the metadata and a single year to drive everything.
Describe this pull request. What changes are being made?
This pull request alters how mortality is calculated for the CCM. It mainly follows the procedures from the Mortality Prediction Project (MPP), described in Issue 95. It then implements the previous CDC-related functions and splining techniques from the MPP, in addition to amending existing CCM functions.
A successful demo can be seen in `run_id = 4` in `[CohortComponentModel].[outputs].[rates]`.
What issues does this pull request address?
closes #95
Additional context
Refer to the Mortality Prediction Project and Issue 95 for further detail.