cat_last_reg_end_minus_death <- function(days) {
  case_when(
    is.na(days) ~ "missing_registration_end",
    days <= -31 ~ "<-31",
Technically this should be days <= -31 ~ "<=-31". Just a labelling thing so not a big deal as long as you interpret it correctly.
Thank you, I changed it to -31+
clinical_events
    .where(clinical_events.snomedct_code.is_in(tpp_coded_death_codes))
    .sort_by(clinical_events.date)
    .last_for_patient()
I am curious about the decision to extract the last recorded event, as opposed to the first. I would have assumed the first coded event was most relevant.
Thank you, changed to .first_for_patient()
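For illustration, the difference between the two choices can be sketched in plain Python (not ehrQL). The event records and the `first_event` / `last_event` helpers are hypothetical stand-ins for sorting by date and then taking `.first_for_patient()` or `.last_for_patient()`; they are not part of this PR.

```python
from datetime import date

# Hypothetical coded-death events for one patient (made-up values).
events = [
    {"date": date(2021, 3, 9), "code": "A"},
    {"date": date(2021, 3, 2), "code": "B"},
    {"date": date(2021, 4, 1), "code": "C"},
]


def first_event(rows):
    """Earliest event by date (sort by date, keep the first row)."""
    return min(rows, key=lambda r: r["date"])


def last_event(rows):
    """Latest event by date (sort by date, keep the last row)."""
    return max(rows, key=lambda r: r["date"])


print(first_event(events)["date"])  # earliest coded date
print(last_event(events)["date"])   # latest coded date
```

If later entries are corrections or administrative follow-ups, the earliest coded date is plausibly the one closest to the actual death, which is the reasoning behind preferring the first event.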
has_possible_age = (
    ((age_at_death > 0) & (age_at_death < 110))
    | (patients.date_of_birth.year == ref_death_date.year)
)
Would this work?
has_possible_age = (
    ((age_at_death >= 0) & (age_at_death < 110))
    & (patients.date_of_birth <= ref_death_date)
)
Because age is calculated in whole years, including babies born in the year of their death (as you have) will also include people whose date of death is before their date of birth, which is likely an error.
Thank you, changed to >= 0
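To make the concern concrete, here is a minimal plain-Python sketch (not ehrQL). The `age_in_whole_years` helper is an assumption about how whole-year age might be computed and may not match ehrQL's behaviour; the two `plausible_*` functions are hypothetical names for the year-match rule and for a rule that checks date order explicitly.

```python
from datetime import date


def age_in_whole_years(dob: date, ref: date) -> int:
    """Completed years between dob and ref (negative if ref precedes dob)."""
    return ref.year - dob.year - ((ref.month, ref.day) < (dob.month, dob.day))


def plausible_by_year_match(dob: date, dod: date) -> bool:
    """Rule under review: age in (0, 110), or born and died in the same year."""
    age = age_in_whole_years(dob, dod)
    return (0 < age < 110) or (dob.year == dod.year)


def plausible_by_date_order(dob: date, dod: date) -> bool:
    """Alternative: age in [0, 110) and death not before birth."""
    age = age_in_whole_years(dob, dod)
    return (0 <= age < 110) and (dob <= dod)


dob = date(2020, 6, 1)
died_before_birth = date(2020, 3, 1)  # same calendar year, but before birth
died_as_infant = date(2020, 9, 1)     # same calendar year, after birth

print(plausible_by_year_match(dob, died_before_birth))  # admitted by year match
print(plausible_by_date_order(dob, died_before_birth))  # rejected by date order
print(plausible_by_date_order(dob, died_as_infant))     # genuine infant death kept
```

The year-match rule accepts the death recorded before birth, while the explicit date comparison rejects it and still keeps genuine infant deaths.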
)
has_any_death
& has_possible_age
& has_disclosive_sex
I think you mean has_non_disclosive_sex
Thank you! I fixed this
) |>
  group_by(death_date_ref_year) |>
  mutate(
    total_year = sum(total, na.rm = TRUE),
It would be better to sum first, then apply rounding. Summing rounded values introduces additional uncertainty; it is probably not a big deal in the grand scheme of things, but it is better practice.
Modified to compute yearly totals first and apply rounding after summing, while keeping subgroup counts rounded before calculating percentages:
  group_by(death_date_ref_year, death_source) |>
  summarise(
    total = n(),
    .groups = "drop"
  ) |>
  group_by(death_date_ref_year) |>
  mutate(
    total_year = rounding(sum(total, na.rm = TRUE)),
    total = rounding(total),
    perc = total / total_year * 100
  )
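A small numeric sketch of why the order matters, in plain Python. The `round_to_base` helper and the base of 5 are assumptions for illustration only; the actual `rounding()` function in `0_utility_functions.R` may use a different rule.

```python
def round_to_base(n: int, base: int = 5) -> int:
    """Round a non-negative integer count to the nearest multiple of base."""
    return (n + base // 2) // base * base


counts = [7, 7, 7]  # hypothetical subgroup counts within one year (true total 21)

# Rounding each subgroup first, then summing, compounds the rounding error.
sum_of_rounded = sum(round_to_base(c) for c in counts)

# Summing first, then rounding once, stays within one rounding step of the truth.
rounded_sum = round_to_base(sum(counts))

print(sum_of_rounded)  # 15
print(rounded_sum)     # 20
```

With these made-up counts, rounding before summing gives 15 against a true total of 21, whereas rounding the sum gives 20.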
Hi Andrea,
I introduced changes to improve coherence and readability across the analysis.
0_utility_functions.R
Defines reusable helper functions used across the analysis scripts.

1_derive_key_variables.R
Builds the core processed dataset used in all downstream analyses. It derives key variables including death source (ONS/TPP/Both), inclusion flags, reference death date, registration status and timing, TPP-coded death indicators, date differences, and demographic variables.
Outputs: death_registration_processed.csv.gz (main dataset), death_registration_processed_skim.txt, table_source_raw.csv

2_implausible_death_dates.R
Assesses implausible death dates separately for ONS and TPP (e.g. before birth, outside study period). This acts as a data quality check.
Output: death_implausible_source.csv

3_registration_at_death.R
Describes registration status at the time of death and timing of registration relative to death. Restricted to valid death dates.
Outputs: registration_status_source.csv, reg_start_timing_source.csv, reg_end_timing_source.csv

4_death_source_comparison.R
Compares where deaths are recorded (ONS only, TPP only, both), overall and by subgroup. Includes a sensitivity analysis incorporating TPP-coded deaths.
Outputs: table_death_source.csv, table_death_source_25_26.csv, tpp_death_code_or_date.csv, table_death_source_overall_any_tpp.csv

5_sources_date_agreement.R
Assesses agreement between ONS and TPP death dates among individuals with records in both sources, grouping differences into meaningful categories.
Output: table_ons_tpp_dates_diff.csv

6_variation_source_by_practice.R
Quantifies variation across GP practices in the proportion of deaths recorded only in ONS, summarised using practice-level percentiles by year.
Output: table_practice_percentiles.csv

Post-release script
Scripts in the /post_release folder generate additional outputs derived from the main analysis tables after disclosure control, for tables / visualisation.