Transform JSON output from the Senzing SDK for use with graph technologies, semantics, and downstream LLM integration.
pip install sz_semantics
Mask the PII values within Senzing JSON output with tokens that can be substituted back later. For example, mask PII values before calling a remote service (such as an LLM-based chat), then unmask the returned text after the round trip to maintain data privacy.
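import json
from sz_semantics import Mask
# sample Senzing JSON fragment containing a PII value
data: dict = { "ENTITY_NAME": "Robert Smith" }
sz_mask: Mask = Mask()
# replace PII values with tokens before the data leaves your environment
masked_data: dict = sz_mask.mask_data(data)
masked_text: str = json.dumps(masked_data)
print(masked_text)
# substitute the original values back into the returned text
unmasked: str = sz_mask.unmask_text(masked_text)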
print(unmasked)
For an example, run the demo1.py script with a data file that captures Senzing JSON output:
python3 demo1.py data/get.json
The two lists Mask.KNOWN_KEYS and Mask.MASKED_KEYS enumerate, respectively:
- keys for known elements which do not require masking
- keys for PII elements which require masking
Any other keys encountered will be masked by default and reported as warnings in the logging. Adjust these lists as needed for a given use case.
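To make the masking policy concrete, here is a minimal pure-Python sketch of the behavior described above: values under known keys pass through unchanged, values under PII keys and under any unrecognized key are replaced with tokens (unrecognized keys also log a warning), and the tokens can later be substituted back into text. This is not the sz_semantics implementation; the `MaskSketch` class, the contents of the two key sets, and the `PII_<n>` token format are all hypothetical.

```python
import json
import logging
import re

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("mask_sketch")

# Hypothetical stand-ins for Mask.KNOWN_KEYS and Mask.MASKED_KEYS
KNOWN_KEYS = {"ENTITY_ID", "DATA_SOURCE", "MATCH_KEY"}
MASKED_KEYS = {"ENTITY_NAME", "ADDR_FULL", "PHONE_NUMBER"}


class MaskSketch:
    """Hypothetical reversible masker; not the sz_semantics Mask class."""

    def __init__(self) -> None:
        self.token_to_value: dict = {}
        self.value_to_token: dict = {}

    def _token_for(self, value: str) -> str:
        # reuse the same token for a repeated value so masking stays consistent
        if value not in self.value_to_token:
            token = f"PII_{len(self.value_to_token) + 1}"
            self.value_to_token[value] = token
            self.token_to_value[token] = value
        return self.value_to_token[value]

    def _mask_value(self, key, value):
        if isinstance(value, dict):
            return {k: self._mask_value(k, v) for k, v in value.items()}
        if isinstance(value, list):
            # list items inherit the key of the list that contains them
            return [self._mask_value(key, item) for item in value]
        if value is None or key in KNOWN_KEYS:
            return value
        if key not in MASKED_KEYS:
            # unknown keys are masked by default, with a warning
            logger.warning("unknown key %s: masking by default", key)
        return self._token_for(str(value))

    def mask_data(self, data):
        return self._mask_value(None, data)

    def unmask_text(self, text: str) -> str:
        # swap each known token back to its original value
        return re.sub(
            r"PII_\d+",
            lambda m: self.token_to_value.get(m.group(0), m.group(0)),
            text,
        )


if __name__ == "__main__":
    sketch = MaskSketch()
    masked = sketch.mask_data(
        {"ENTITY_ID": 1, "ENTITY_NAME": "Robert Smith", "NOTE": "call back"}
    )
    masked_text = json.dumps(masked)
    print(masked_text)  # {"ENTITY_ID": 1, "ENTITY_NAME": "PII_1", "NOTE": "PII_2"}
    print(sketch.unmask_text(masked_text))
```

Keeping the token map on the instance means the same object must be used to unmask; the key sets are the only policy knobs, which mirrors how the text suggests adjusting the two lists per use case.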
For work with large numbers of entities, subclass KeyValueStore to provide a distributed key/value store (rather than the default Python built-in dict) for scale-out.
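The methods that KeyValueStore expects a subclass to override are not listed here, so the sketch below does not subclass it. It is a standalone, hypothetical store that keeps the same kind of token-to-value mapping in SQLite (from the Python standard library) instead of an in-memory dict. SQLite is single-node rather than distributed, so treat this only as an illustration of swapping the backing store; the `SQLiteKeyValueStore` name and its `set`/`get` methods are assumptions to be matched against the actual KeyValueStore interface.

```python
import sqlite3
from typing import Optional


class SQLiteKeyValueStore:
    """Hypothetical key/value store backed by SQLite rather than a Python dict.

    The method names (set, get, __contains__, __len__) are assumptions; match
    them to the abstract methods of sz_semantics' KeyValueStore before subclassing.
    """

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def set(self, key: str, value: str) -> None:
        # the connection context manager commits the upsert
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
            )

    def get(self, key: str, default: Optional[str] = None) -> Optional[str]:
        row = self.conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
        return row[0] if row is not None else default

    def __contains__(self, key: str) -> bool:
        return self.get(key) is not None

    def __len__(self) -> int:
        return self.conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

    def close(self) -> None:
        self.conn.close()
```

For example, `store.set("PII_1", "Robert Smith")` followed by `store.get("PII_1")` returns `"Robert Smith"`; passing a file path instead of `":memory:"` keeps the mapping across runs.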
Starting with a small SKOS-based taxonomy in the domain.ttl file, parse the Senzing entity resolution (ER) results to generate an RDFLib semantic graph, then transform this into a NetworkX property graph, which represents a semantic layer.
In other words, generate the "backbone" for constructing an Entity Resolved Knowledge Graph.
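The RDF-to-property-graph step can be pictured with a small, dependency-free sketch. It starts from a handful of SKOS-style triples and folds them into a property graph: literal objects become node properties and IRI objects become labeled edges, emitted in a shape similar to NetworkX's node-link JSON. The `ex:` identifiers, the triples, and the folding rule are assumptions for illustration only; the Thesaurus class works with RDFLib and NetworkX and may map things differently.

```python
import json

# Hypothetical SKOS-style triples: (subject, predicate, object, object_is_literal)
TRIPLES = [
    ("ex:entity_1", "rdf:type", "ex:Organization", False),
    ("ex:entity_1", "skos:prefLabel", "Acme Corp", True),
    ("ex:entity_1", "skos:altLabel", "ACME", True),
    ("ex:entity_2", "rdf:type", "ex:Person", False),
    ("ex:entity_2", "skos:prefLabel", "Robert Smith", True),
    ("ex:entity_2", "skos:related", "ex:entity_1", False),
]


def to_property_graph(triples):
    """Fold triples into a property graph: literals become node properties,
    IRI objects become labeled edges between nodes."""
    nodes = {}
    links = []
    for subj, pred, obj, is_literal in triples:
        props = nodes.setdefault(subj, {})
        if is_literal:
            # repeated predicates (e.g. several altLabels) accumulate in a list
            props.setdefault(pred, []).append(obj)
        else:
            nodes.setdefault(obj, {})
            links.append({"source": subj, "target": obj, "label": pred})
    # output shaped like a node-link document: a node list plus an edge list
    return {
        "nodes": [{"id": node_id, **props} for node_id, props in nodes.items()],
        "links": links,
    }


if __name__ == "__main__":
    print(json.dumps(to_property_graph(TRIPLES), indent=2))
```

Folding literals into node properties is what distinguishes the property-graph view from the RDF view, where every label would be its own triple.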
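import pathlib
from sz_semantics import Thesaurus
thes: Thesaurus = Thesaurus()
# parse the Senzing ER export (export_path) along with the listed data files,
# writing the generated RDF thesaurus to er_path
thes.parse_er_export(
    [
        "data/truth/customers.json",
        "data/truth/reference.json",
        "data/truth/watchlist.json",
    ],
    export_path = pathlib.Path("data/truth/export.json"),
    er_path = pathlib.Path("thesaurus.ttl"),
)
# load the generated RDF thesaurus
thes.load_er_thesaurus(
    er_path = pathlib.Path("thesaurus.ttl"),
)
# save the semantic layer as a property graph in node-link JSON format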
thes.save_sem_layer(pathlib.Path("sem.json"))
For an example, run the demo2.py script to process the JSON file data/export.json, which captures Senzing ER exported results:
python3 demo2.py
Check the generated RDF in thesaurus.ttl and the resulting property graph in the sem.json node-link format file.
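One way to sanity-check sem.json after running demo2.py is to load it with the standard json module and tally its nodes and edges. This sketch assumes only the general node-link layout (a `nodes` list plus an edge list); since NetworkX's node-link helpers can emit the edge list under either a `links` or an `edges` key depending on version and arguments, it accepts both. The node attribute names inside sem.json are not documented here, so the sketch only counts them; `summarize_node_link` is a hypothetical helper.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_node_link(doc: dict) -> dict:
    """Count nodes, edges, and node attribute keys in a node-link document."""
    nodes = doc.get("nodes", [])
    # the edge list may be stored under "links" or "edges"
    edges = doc.get("links", doc.get("edges", []))
    attr_keys = Counter(key for node in nodes for key in node if key != "id")
    return {
        "directed": doc.get("directed"),
        "num_nodes": len(nodes),
        "num_edges": len(edges),
        "node_attribute_keys": dict(attr_keys),
    }


def summarize_file(path: Path) -> dict:
    with path.open(encoding="utf-8") as fp:
        return summarize_node_link(json.load(fp))


if __name__ == "__main__":
    sem_path = Path("sem.json")
    if sem_path.exists():
        print(json.dumps(summarize_file(sem_path), indent=2))
    else:
        print("sem.json not found; run demo2.py first")
```

If NetworkX is installed, `networkx.node_link_graph` can load the same file into a graph object for further analysis.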
Note: this portion is a work in progress and is currently being refactored toward a more scalable approach for streaming updates.
License and Copyright
Source code for sz_semantics, along with any logo, documentation, and examples, has an MIT license, which is succinct and simplifies use in commercial applications.
All materials herein are Copyright © 2025 Senzing, Inc.
Kudos to @brianmacy, @jbutcher21, @docktermj, @cj2001, and the kind folks at GraphGeeks for their support.
