Regsitration #28

Merged
marrpesce merged 3 commits into main from regsitration
May 1, 2026

Conversation

@marrpesce
Contributor

Hi Andrea,
I introduced changes to improve coherence and readability across the analysis.

0_utility_functions.R
Defines reusable helper functions used across the analysis scripts.

1_derive_key_variables.R
Builds the core processed dataset used in all downstream analyses. It derives key variables including death source (ONS/TPP/Both), inclusion flags, reference death date, registration status and timing, TPP-coded death indicators, date differences, and demographic variables.
Outputs: death_registration_processed.csv.gz (main dataset), death_registration_processed_skim.txt, table_source_raw.csv

2_implausible_death_dates.R
Assesses implausible death dates separately for ONS and TPP (e.g. before birth, outside study period). This acts as a data quality check.
Output: death_implausible_source.csv

3_registration_at_death.R
Describes registration status at the time of death and timing of registration relative to death. Restricted to valid death dates.
Outputs: registration_status_source.csv, reg_start_timing_source.csv, reg_end_timing_source.csv

4_death_source_comparison.R
Compares where deaths are recorded (ONS only, TPP only, both), overall and by subgroup. Includes a sensitivity analysis incorporating TPP-coded deaths.
Outputs: table_death_source.csv, table_death_source_25_26.csv, tpp_death_code_or_date.csv, table_death_source_overall_any_tpp.csv

5_sources_date_agreement.R
Assesses agreement between ONS and TPP death dates among individuals with records in both sources, grouping differences into meaningful categories.
Output: table_ons_tpp_dates_diff.csv
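
As a rough illustration of the grouping described for 5_sources_date_agreement.R, the sketch below bins the difference in days between the TPP and ONS death dates. The cut-offs, the category labels, and the function name `categorise_date_diff` are assumptions chosen for illustration; the categories actually used in the script are not shown in this conversation.

```python
from datetime import date


def categorise_date_diff(ons_date: date, tpp_date: date) -> str:
    """Bin the TPP minus ONS death date difference (hypothetical cut-offs)."""
    diff = (tpp_date - ons_date).days
    if diff == 0:
        return "same_day"
    if abs(diff) <= 7:
        return "within_7_days"
    if abs(diff) <= 30:
        return "within_30_days"
    return "more_than_30_days"
```

Binning on the absolute difference hides which source is earlier; a signed version would need separate categories for each direction.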

6_variation_source_by_practice.R
Quantifies variation across GP practices in the proportion of deaths recorded only in ONS, summarised using practice-level percentiles by year.
Output: table_practice_percentiles.csv
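
A minimal sketch of the practice-level summary described above, assuming one (practice, death source) record per death within a single year. The function name `practice_quartiles`, the `"ONS only"` label, and the choice of quartiles are illustrative assumptions, and no disclosure control is applied here.

```python
from collections import defaultdict
from statistics import quantiles


def practice_quartiles(records):
    """records: iterable of (practice_id, death_source) tuples for one year."""
    totals = defaultdict(int)
    ons_only = defaultdict(int)
    for practice_id, source in records:
        totals[practice_id] += 1
        if source == "ONS only":
            ons_only[practice_id] += 1
    # Percentage of each practice's deaths that appear only in ONS
    props = [ons_only[p] / totals[p] * 100 for p in totals]
    # With n=4 this returns the 25th, 50th and 75th percentiles across practices
    return quantiles(props, n=4, method="inclusive")
```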

Post-release scripts
Scripts in the /post_release folder generate additional outputs, derived from the main analysis tables after disclosure control, for use in tables and visualisations.

@marrpesce marrpesce requested a review from alschaffer April 17, 2026 14:43
Comment thread analysis/0_utility_functions.R Outdated
cat_last_reg_end_minus_death <- function(days) {
case_when(
is.na(days) ~ "missing_registration_end",
days <= -31 ~ "<-31",
Contributor
@alschaffer, Apr 20, 2026

Technically this should be days <= -31 ~ "<=-31". Just a labelling thing so not a big deal as long as you interpret it correctly.

Contributor Author

Thank you, I changed it to -31+
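
The labelling point in this thread can be sketched as follows: each label should state the same boundary as the condition that produces it. This is a plain-Python analogue of the first two branches of `cat_last_reg_end_minus_death`. The function name `cat_days` and the final catch-all branch are placeholders, since the remaining branches are not shown in the excerpt.

```python
from typing import Optional


def cat_days(days: Optional[int]) -> str:
    """Categorise days between last registration end and death."""
    if days is None:
        return "missing_registration_end"
    if days <= -31:
        # The condition is inclusive, so the label says "<=" rather than "<"
        return "<=-31"
    # Placeholder for the remaining branches, which the excerpt does not show
    return ">-31"
```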

clinical_events
.where(clinical_events.snomedct_code.is_in(tpp_coded_death_codes))
.sort_by(clinical_events.date)
.last_for_patient()
Contributor

I am curious about the decision to extract the last recorded event rather than the first. I would have assumed the first coded event was the most relevant.

Contributor Author

Thank you, changed to .first_for_patient()
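
To make the first-versus-last distinction concrete, here is a plain-Python sketch (not ehrQL) of selecting the earliest coded event per patient. The function name `first_event_per_patient` and the input shape are assumptions for illustration.

```python
from datetime import date


def first_event_per_patient(events):
    """events: iterable of (patient_id, event_date); returns earliest date per patient."""
    first = {}
    for patient_id, event_date in events:
        if patient_id not in first or event_date < first[patient_id]:
            first[patient_id] = event_date
    return first
```

If a death code is re-entered later (for example during record tidying), the last event would drift away from the actual date of death, whereas the earliest event stays fixed.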

has_possible_age = (
((age_at_death > 0) & (age_at_death < 110))
| (patients.date_of_birth.year == ref_death_date.year)
)
Contributor

Would this work:

has_possible_age = (
((age_at_death >= 0) & (age_at_death < 110))
& (patients.date_of_birth <= ref_death_date)
)

?

Because age is calculated in whole years, including babies born in the year of their death (as you have) will also include people whose date of death is before their date of birth, which is likely an error.

Contributor Author

Thank you, changed to >= 0
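
The concern raised in this thread can be reproduced with a small plain-Python sketch (not ehrQL). The helper `whole_years` is an assumed stand-in for however age at death is derived, and `original_rule` mirrors the first version of the check shown above.

```python
from datetime import date


def whole_years(dob: date, dod: date) -> int:
    """Completed years from dob to dod (negative if dod precedes dob)."""
    years = dod.year - dob.year
    if (dod.month, dod.day) < (dob.month, dob.day):
        years -= 1
    return years


def has_possible_age(dob: date, dod: date) -> bool:
    """Age in [0, 110) and death on or after birth."""
    age = whole_years(dob, dod)
    return 0 <= age < 110 and dob <= dod


def original_rule(dob: date, dod: date) -> bool:
    """First version: age in (0, 110), or born and died in the same calendar year."""
    age = whole_years(dob, dod)
    return (0 < age < 110) or (dob.year == dod.year)
```

With this definition of `whole_years` the explicit `dob <= dod` comparison is redundant, but it guards against age calculations that truncate towards zero and so report 0 for a death shortly before birth.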

)
has_any_death
& has_possible_age
& has_disclosive_sex
Contributor

I think you mean has_non_disclosive_sex

Contributor Author

Thank you! I fixed this

Comment thread analysis/1_derive_key_variables.R Outdated
)|>
group_by(death_date_ref_year) |>
mutate(
total_year = sum(total, na.rm = TRUE),
Contributor

It would be better to sum first, then apply rounding. Summing rounded values introduces additional uncertainty. It is probably not a big deal in the grand scheme of things, but it is better practice.

Contributor Author

I modified the code to compute yearly totals first and round after summing, while still rounding subgroup counts before calculating percentages:

  group_by(death_date_ref_year, death_source) |>
  summarise(
    total = n(),
    .groups = "drop"
  ) |>
  group_by(death_date_ref_year) |>
  mutate(
    total_year = rounding(sum(total, na.rm = TRUE)),
    total = rounding(total),
    perc = total / total_year * 100
  )

@marrpesce marrpesce requested a review from kyomuhai April 24, 2026 11:54
@marrpesce marrpesce merged commit 013342b into main May 1, 2026
1 check passed
@marrpesce marrpesce deleted the regsitration branch May 1, 2026 12:09