Releases: jpweytjens/EntityMatchingModel
v2.1.12+prometis
Performance: vectorized feature extraction.
Changes
- compute_vocabulary_features: ~1.4× faster. Replaced six per-row Python .apply loops with C-level set operations (s & vocab) and derived the rare bucket arithmetically, since the buckets are disjoint by construction.
- calc_lef_features: ~18× on the prometis pre-extracted-LEF path, ~36× when LEFs are extracted from names. Replaced per-row .apply and axis=1 lambdas with per-unique-input caching for extract_lef, get_business_type, and matching_legal_terms, plus a pure vectorized make_combi.
The exported scalar helpers (extract_lef, get_business_type, make_combi, matching_legal_terms) are unchanged — vectorization happens only inside calc_lef_features and compute_vocabulary_features.
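The per-unique-input caching idea can be illustrated with a minimal sketch. This is not the EMM implementation: `scalar_helper` is a hypothetical stand-in for an expensive scalar function such as extract_lef, and the sample names are invented. The point is only that the scalar function runs once per distinct input and the results are broadcast back to all rows.

```python
import pandas as pd

call_count = {"n": 0}  # counts how often the scalar helper actually runs


def scalar_helper(name: str) -> str:
    # Hypothetical stand-in for an expensive scalar helper (e.g. extract_lef):
    # here it just returns the lowercased last token of the name.
    call_count["n"] += 1
    return name.rsplit(" ", 1)[-1].lower()


def apply_per_unique(names: pd.Series) -> pd.Series:
    # Evaluate the helper once per distinct value, then broadcast the
    # cached results to every row with Series.map.
    cache = {value: scalar_helper(value) for value in names.unique()}
    return names.map(cache)


names = pd.Series(["Acme BV", "Acme BV", "Globex NV", "Acme BV"])
result = apply_per_unique(names)
calls_with_cache = call_count["n"]  # one call per distinct name, not per row
```

The scalar helper itself is untouched, which matches the note above that the exported helpers keep their behavior. The gain depends on how many rows share the same input, so heavily repeated names benefit most.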
Verification
- Element-wise parity verified at 1k / 10k / 100k synthetic candidate rows.
- 115 unit + pandas integration tests pass.
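An element-wise parity check of the kind described above could look like the following sketch. The feature (a token count), the token list, and the function names are assumptions made for illustration and are not taken from the EMM test suite; only the row counts mirror the ones listed.

```python
import numpy as np
import pandas as pd


def per_row_feature(names: pd.Series) -> pd.Series:
    # Baseline: one Python call per row.
    return names.apply(lambda s: len(s.split(" ")))


def vectorized_feature(names: pd.Series) -> pd.Series:
    # Vectorized equivalent: count separators with the .str accessor.
    return names.str.count(" ") + 1


def synthetic_names(n_rows: int, seed: int = 0) -> pd.Series:
    # Invented tokens; each synthetic name has three space-separated tokens.
    rng = np.random.default_rng(seed)
    tokens = np.array(["acme", "globex", "holding", "bv", "nv"])
    picks = rng.choice(tokens, size=(n_rows, 3))
    return pd.Series([" ".join(row) for row in picks])


checked_sizes = []
for n_rows in (1_000, 10_000, 100_000):
    names = synthetic_names(n_rows)
    # Raises AssertionError on the first differing element;
    # dtype differences are deliberately ignored in this sketch.
    pd.testing.assert_series_equal(
        per_row_feature(names), vectorized_feature(names), check_dtype=False
    )
    checked_sizes.append(n_rows)
```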
Install
pip install "git+https://github.com/jpweytjens/EntityMatchingModel.git@v2.1.12+prometis"
v2.1.11+prometis
EMM 2.1.11 with custom Prometis features:
- GLEIF/custom LEF support — pre-extracted LEF columns, custom cleanco terms, custom legal abbreviations
- Vectorized pandas aggregation — replaces per-account groupby().apply() with bulk vectorized ops
- Multi-name GT aggregation — new freq_weighted_entity method for multiple GT name variations per entity
- Dependency updates — scikit-learn >=1.4,<1.9; pandas <3.0; Python >=3.9