
Add recordLinkage function and tests #365

Merged: matthias-da merged 2 commits into sdcTools:master from MuellerRoman:feature/recordlinkage on Mar 25, 2026
Conversation

@MuellerRoman (Contributor)

Description

This pull request adds a new recordLinkage() function implementing Global Distance-Based Record Linkage (GDBRL) following Herranz et al. (2015).

What is included

  • New recordLinkage() function for one-to-one global record linkage via the Hungarian algorithm
  • Support for "gower", "euclidean", and "manhattan" distances
  • Support for na_action = "ignore" and na_action = "fail"
  • Optional return of the pairwise cross-dataset distance matrix
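For readers unfamiliar with global linkage, here is a minimal, language-independent sketch of the idea behind the list above. It is written in Python, not R, and it does not call or reproduce recordLinkage(). All function names and the toy data are hypothetical. For brevity it finds the optimal one-to-one assignment by brute force over permutations, which gives the same result the Hungarian algorithm would give on this small input but does not scale. It uses the Manhattan distance only.

```python
from itertools import permutations

def manhattan(a, b):
    """Manhattan distance between two numeric records of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))

def distance_matrix(original, masked):
    """Cross-dataset distances: rows are original records, columns are masked records."""
    return [[manhattan(o, m) for m in masked] for o in original]

def nearest_links(dist):
    """Local linkage: each masked record j is linked to its closest original record.
    Several masked records may end up linked to the same original."""
    n_orig = len(dist)
    n_masked = len(dist[0])
    return [min(range(n_orig), key=lambda i: dist[i][j]) for j in range(n_masked)]

def global_links(dist):
    """Global linkage: the one-to-one assignment with minimal total distance.
    Brute force stands in for the Hungarian algorithm here (assumes a square matrix)."""
    n = len(dist)
    best = min(permutations(range(n)),
               key=lambda p: sum(dist[i][p[i]] for i in range(n)))
    return best, sum(dist[i][best[i]] for i in range(n))

# Hypothetical toy data: three original records and their masked counterparts.
original = [(0.0, 0.0), (10.0, 0.0), (5.0, 5.0)]
masked = [(1.0, 0.0), (4.0, 0.0), (5.0, 4.0)]

dist = distance_matrix(original, masked)
nearest = nearest_links(dist)           # original index chosen by each masked record
assignment, total = global_links(dist)  # assignment[i] = masked index linked to original i

print(dist)
print(nearest)
print(assignment, total)
```

In this toy case the local rule links two masked records to original record 0 and leaves original record 1 unlinked, while the global assignment links every record exactly once.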

Unit tests covering:

  • published example cases
  • distance matrix calculations
  • weighted and unweighted linkage
  • Hungarian assignment behavior
  • input validation
  • missing-value handling
  • ordered/unordered factor harmonization

Notes

  • At present, the function supports data.frame inputs only; sdcObj objects are not yet supported.
  • Files added:
    - R/recordLinkage.R
    - tests/testthat/test_recordLinkage.R
    - generated Rd documentation

Reference

Herranz, J., Nin, J., Rodríguez, P., and Tassa, T. (2015). Revisiting distance-based record linkage for privacy-preserving release of statistical datasets. Data & Knowledge Engineering, 100, 78–93. https://www.sciencedirect.com/science/article/pii/S0169023X15000543

@matthias-da matthias-da merged commit ee3a303 into sdcTools:master Mar 25, 2026
2 checks passed
@MuellerRoman MuellerRoman deleted the feature/recordlinkage branch March 25, 2026 14:27
