Releases: jpweytjens/EntityMatchingModel

v2.1.12+prometis

29 Apr 15:41

Performance: vectorized feature extraction.

Changes

  • compute_vocabulary_features is ~1.4× faster: six per-row Python .apply loops are replaced with C-level set operations (s & vocab), and the rare bucket is derived arithmetically, since the buckets are disjoint by construction.
  • calc_lef_features is ~18× faster on the prometis pre-extracted-LEF path and ~36× faster when LEFs are extracted from names: per-row .apply and axis=1 lambdas are replaced with per-unique-input caching for extract_lef, get_business_type, and matching_legal_terms, plus a pure vectorized make_combi.
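
The set-operation idea behind the compute_vocabulary_features change could be sketched roughly as follows. This is a minimal illustration, not the library's code: the bucket names, the `count_buckets` function, and the token data are all hypothetical. It assumes every token belongs to exactly one bucket, so the rare count is the total minus the other bucket counts.

```python
# Hypothetical vocabulary buckets (not taken from EMM). They are assumed to be
# disjoint, with "rare" meaning every token that is in neither of them.
VERY_COMMON = frozenset({"the", "of", "and"})
COMMON = frozenset({"bank", "group", "holding"})


def count_buckets(tokens: set) -> dict:
    """Count tokens per bucket with set intersections instead of a per-token loop."""
    n_very_common = len(tokens & VERY_COMMON)
    n_common = len(tokens & COMMON)
    # Because the buckets are disjoint, the rare count follows by subtraction;
    # no intersection against a rare vocabulary is needed.
    n_rare = len(tokens) - n_very_common - n_common
    return {"very_common": n_very_common, "common": n_common, "rare": n_rare}


features = count_buckets({"bank", "of", "zanzibar", "holding"})
```

The subtraction is only valid under the disjointness assumption; with overlapping buckets a token would be counted twice and the rare count would come out too low.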

The exported scalar helpers (extract_lef, get_business_type, make_combi, matching_legal_terms) are unchanged — vectorization happens only inside calc_lef_features and compute_vocabulary_features.
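
Per-unique-input caching around an unchanged scalar helper might look like the sketch below. The helper `slow_scalar` and the wrapper `apply_per_unique` are hypothetical stand-ins, not functions from EMM, and the sketch assumes the input column contains no missing values.

```python
import pandas as pd


def slow_scalar(name: str) -> str:
    """Hypothetical stand-in for an expensive scalar helper (returns the last token)."""
    return name.strip().lower().split()[-1]


def apply_per_unique(values: pd.Series, func) -> pd.Series:
    """Call func once per distinct value, then broadcast the results to every row."""
    cache = {value: func(value) for value in values.unique()}
    # Series.map with a dict is a lookup per row; func itself is not called again.
    return values.map(cache)


names = pd.Series(["Acme Ltd", "Globex BV", "Acme Ltd", "Acme Ltd"])
result = apply_per_unique(names, slow_scalar)
```

The scalar helper keeps its signature and behavior, and the speedup grows with the ratio of total rows to distinct inputs. On a column where every value is unique, this approach does the same number of helper calls as a plain .apply.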

Verification

  • Element-wise parity verified at 1k / 10k / 100k synthetic candidate rows.
  • 115 unit + pandas integration tests pass.
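
An element-wise parity check of this kind could be structured as in the sketch below. The two functions being compared and the synthetic data generator are hypothetical examples, not the project's test suite; only the row counts mirror the sizes listed above.

```python
import random

import pandas as pd
from pandas.testing import assert_series_equal


def reference_token_count(names: pd.Series) -> pd.Series:
    """Hypothetical per-row baseline implementation."""
    return names.apply(lambda name: len(name.split(" ")))


def fast_token_count(names: pd.Series) -> pd.Series:
    """Hypothetical vectorized replacement for the baseline above."""
    return names.str.count(" ") + 1


def synthetic_names(n_rows: int, seed: int = 0) -> pd.Series:
    """Build n_rows single-space-separated names from a tiny word list."""
    rng = random.Random(seed)
    words = ["acme", "globex", "holding", "bank", "ltd", "bv"]
    rows = [" ".join(rng.choices(words, k=rng.randint(1, 4))) for _ in range(n_rows)]
    return pd.Series(rows)


checked_sizes = []
for n_rows in (1_000, 10_000, 100_000):
    names = synthetic_names(n_rows)
    # Raises AssertionError if any element (or the index) differs.
    assert_series_equal(
        reference_token_count(names), fast_token_count(names), check_dtype=False
    )
    checked_sizes.append(n_rows)
```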

Install

pip install "git+https://github.com/jpweytjens/EntityMatchingModel.git@v2.1.12+prometis"

v2.1.11+prometis

01 Apr 15:10

EMM 2.1.11 with custom Prometis features:

  • GLEIF/custom LEF support — pre-extracted LEF columns, custom cleanco terms, custom legal abbreviations
  • Vectorized pandas aggregation — replaces per-account groupby().apply() with bulk vectorized ops
  • Multi-name GT aggregation — new freq_weighted_entity method for multiple GT name variations per entity
  • Dependency updates — scikit-learn >=1.4,<1.9; pandas <3.0; Python >=3.9
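
Replacing a per-group groupby().apply() with bulk vectorized operations generally follows the pattern sketched below. The weighted-mean aggregation and the column names are hypothetical; these notes do not show the aggregation EMM actually performs.

```python
import pandas as pd

# Hypothetical candidate scores per account.
df = pd.DataFrame(
    {
        "account": ["a", "a", "b", "b", "b"],
        "score": [0.9, 0.4, 0.2, 0.7, 0.5],
        "weight": [1.0, 3.0, 2.0, 1.0, 1.0],
    }
)

# Per-group callback: one Python function call per account.
slow = df.groupby("account")[["score", "weight"]].apply(
    lambda g: (g["score"] * g["weight"]).sum() / g["weight"].sum()
)

# Vectorized form: one column-wise multiply, one grouped sum, one divide.
weighted = df.assign(weighted_score=df["score"] * df["weight"])
sums = weighted.groupby("account")[["weighted_score", "weight"]].sum()
fast = sums["weighted_score"] / sums["weight"]
```

Both forms return one value per account indexed by account, so the vectorized version can be swapped in without changing downstream code, as long as the aggregation can be expressed through built-in grouped reductions.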