Releases · jpweytjens/EntityMatchingModel

Performance: vectorized feature extraction.

Changes

compute_vocabulary_features ~1.4× faster — replaced six per-row Python .apply loops with C-level set operations (s & vocab) and derived the rare bucket arithmetically since the buckets are disjoint by construction.
calc_lef_features ~18× faster on the prometis pre-extracted-LEF path, ~36× faster when LEFs are extracted from names: replaced per-row .apply and axis=1 lambdas with per-unique-input caching for extract_lef, get_business_type, and matching_legal_terms, plus a fully vectorized make_combi.
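
The set-intersection idea behind the compute_vocabulary_features change can be sketched in plain Python. This is only an illustration: the vocabularies, bucket names, and the `bucket_counts` function are hypothetical and not taken from the package. It shows why the rare count needs no loop of its own when the buckets are disjoint.

```python
# Hypothetical vocabularies; the real bucket names and contents
# are not given in the release notes.
VERY_COMMON = frozenset({"ltd", "inc", "group"})
COMMON = frozenset({"holding", "bank", "services"})

# The arithmetic below is only valid if the buckets do not overlap.
assert not (VERY_COMMON & COMMON)


def bucket_counts(tokens):
    """Count distinct tokens per bucket using set intersection."""
    s = set(tokens)
    n_very_common = len(s & VERY_COMMON)  # intersection runs in C
    n_common = len(s & COMMON)
    # Disjoint buckets: every token outside the known vocabularies is
    # rare, so the rare count is derived by subtraction.
    n_rare = len(s) - n_very_common - n_common
    return {"very_common": n_very_common, "common": n_common, "rare": n_rare}


counts = bucket_counts(["acme", "bank", "group", "ltd"])
```

Here `counts` holds two very common tokens, one common token, and one rare token, and the three counts always sum to the number of distinct tokens.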

The exported scalar helpers (extract_lef, get_business_type, make_combi, matching_legal_terms) are unchanged — vectorization happens only inside calc_lef_features and compute_vocabulary_features.
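
As a rough sketch of per-unique-input caching, the pattern below calls an unchanged scalar helper once per distinct input and then looks results up for every row. The names `toy_legal_form` and `map_per_unique` are hypothetical stand-ins, not the package's functions, and plain lists stand in for DataFrame columns. With pandas, the same idea is commonly written with `Series.unique` and `Series.map`.

```python
call_count = 0  # tracks how often the scalar helper actually runs


def toy_legal_form(name):
    """Hypothetical scalar helper: return the last token if it is a known legal form."""
    global call_count
    call_count += 1
    tokens = name.lower().split()
    last = tokens[-1] if tokens else ""
    return last if last in {"ltd", "bv", "gmbh"} else ""


def map_per_unique(values, func):
    """Apply func once per distinct value, then map results back to every row."""
    cache = {v: func(v) for v in set(values)}
    return [cache[v] for v in values]


names = ["Acme Ltd", "Foo BV", "Acme Ltd", "Acme Ltd", "Foo BV", "Bar"]
result = map_per_unique(names, toy_legal_form)
```

The helper itself is untouched, which matches the note above that the scalar functions keep their behavior. The saving grows with the share of repeated inputs: six rows here trigger only three helper calls.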

Verification

Element-wise parity verified at 1k / 10k / 100k synthetic candidate rows.
115 unit + pandas integration tests pass.
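
An element-wise parity check of this kind can be sketched as follows. This is an assumed shape for such a test, not the project's actual test code: `slow_overlap`, `fast_overlap`, and `check_parity` are hypothetical, and the synthetic rows are generated with a fixed seed so the run is repeatable.

```python
import random


def slow_overlap(a, b):
    """Reference implementation: explicit per-token Python loop."""
    count = 0
    for tok in set(a):
        if tok in b:
            count += 1
    return count


def fast_overlap(a, b):
    """Optimized implementation: a single set intersection."""
    return len(set(a) & set(b))


def check_parity(n_rows, seed=0):
    """Return the indices of rows where the two implementations disagree."""
    rng = random.Random(seed)
    vocab = [f"tok{i}" for i in range(50)]
    rows = [(rng.sample(vocab, 5), rng.sample(vocab, 5)) for _ in range(n_rows)]
    return [
        i for i, (a, b) in enumerate(rows)
        if slow_overlap(a, b) != fast_overlap(a, b)
    ]


# Run the comparison at increasing synthetic sizes.
mismatches = {n: check_parity(n) for n in (1_000, 10_000)}
```

Comparing row by row, rather than comparing aggregates, means a single diverging row is reported by index instead of being averaged away.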

Install

pip install "git+https://github.com/jpweytjens/EntityMatchingModel.git@v2.1.12+prometis"

Releases: jpweytjens/EntityMatchingModel

v2.1.12+prometis

v2.1.11+prometis
