
sz_semantics

Transform JSON output from the Senzing SDK for use with graph technologies, semantics, and downstream LLM integration.

Install

pip install sz_semantics

Usage: Masking PII

Mask the PII values within Senzing JSON output with tokens which can be substituted back later. For example, mask PII values before calling a remote service (such as an LLM-based chat) then unmask returned text after the roundtrip, to maintain data privacy.

import json
from sz_semantics import Mask

data: dict = { "ENTITY_NAME": "Robert Smith" }

sz_mask: Mask = Mask()
masked_data: dict = sz_mask.mask_data(data)

masked_text: str = json.dumps(masked_data)
print(masked_text)

unmasked: str = sz_mask.unmask_text(masked_text)
print(unmasked)

For an example, run the demo1.py script with a data file which captures Senzing JSON output:

python3 demo1.py data/get.json

The two lists Mask.KNOWN_KEYS and Mask.MASKED_KEYS enumerate, respectively:

  • keys for known elements which do not require masking
  • keys for PII elements which require masking

Any other keys encountered are masked by default and reported as warnings in the log. Adjust these lists as needed for a given use case.
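The key-handling rules above can be illustrated with a small standalone sketch. This is not the sz_semantics implementation: the class SketchMask, the token format, and the contents of the two lists are assumptions made up for illustration. It only shows the described behavior, where known keys pass through, listed PII keys are masked, and any other key is masked by default with a warning.

```python
import json
import logging

logger = logging.getLogger("mask_sketch")

# Hypothetical key lists for this sketch only; the real ones are
# Mask.KNOWN_KEYS and Mask.MASKED_KEYS in sz_semantics.
KNOWN_KEYS = ["ENTITY_ID", "RECORD_ID"]
MASKED_KEYS = ["ENTITY_NAME"]


class SketchMask:
    """Toy masker: swaps values for tokens and remembers the mapping."""

    def __init__(self) -> None:
        self.tokens: dict[str, str] = {}  # token -> original value

    def _tokenize(self, key: str, value: str) -> str:
        # Reuse the token if this value was already masked.
        for token, original in self.tokens.items():
            if original == value:
                return token
        token = f"{key}_{len(self.tokens) + 1}"
        self.tokens[token] = value
        return token

    def mask_data(self, data: dict) -> dict:
        masked: dict = {}
        for key, value in data.items():
            if isinstance(value, dict):
                masked[key] = self.mask_data(value)  # recurse into nested objects
            elif key in KNOWN_KEYS:
                masked[key] = value  # known, non-PII: pass through
            else:
                if key not in MASKED_KEYS:
                    logger.warning("unknown key %r masked by default", key)
                masked[key] = self._tokenize(key, str(value))
        return masked

    def unmask_text(self, text: str) -> str:
        # Replace longer tokens first so "X_10" is not clobbered by "X_1".
        for token in sorted(self.tokens, key=len, reverse=True):
            text = text.replace(token, self.tokens[token])
        return text


data = {"ENTITY_ID": 1, "ENTITY_NAME": "Robert Smith", "PHONE": "555-1212"}
sketch = SketchMask()
masked_text = json.dumps(sketch.mask_data(data))
restored = sketch.unmask_text(masked_text)
print(masked_text)
print(restored)
```

Here PHONE appears in neither list, so it is masked and a warning is logged, which is the signal to add the key to one of the two lists.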

When working with large numbers of entities, subclass KeyValueStore to provide a distributed key/value store (instead of the default Python built-in dict) for scale-out.
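As a rough idea of what a non-dict backing store could look like, here is a dict-like class that keeps token/value pairs in SQLite. The methods that KeyValueStore actually expects are not documented in this README, so the class below does not subclass it: SqliteTokenStore and its table layout are hypothetical, and SQLite is used only because it ships with Python. A real scale-out deployment would presumably wrap a networked store instead.

```python
import sqlite3
from collections.abc import MutableMapping
from typing import Iterator


class SqliteTokenStore(MutableMapping):
    """Hypothetical dict-like store persisting token/value pairs in SQLite."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
        )

    def __getitem__(self, key: str) -> str:
        row = self.conn.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __setitem__(self, key: str, value: str) -> None:
        # Upsert so that re-assigning a token overwrites the old value.
        self.conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
        )
        self.conn.commit()

    def __delitem__(self, key: str) -> None:
        cur = self.conn.execute("DELETE FROM kv WHERE k = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)
        self.conn.commit()

    def __iter__(self) -> Iterator[str]:
        rows = self.conn.execute("SELECT k FROM kv").fetchall()
        return iter([row[0] for row in rows])

    def __len__(self) -> int:
        return self.conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]


store = SqliteTokenStore()
store["ENTITY_NAME_1"] = "Robert Smith"
print(store["ENTITY_NAME_1"], len(store))
```

Implementing the MutableMapping protocol means the store can stand in wherever a plain dict of tokens is read or written, which is the design property a KeyValueStore subclass would likely need as well.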

Usage: Semantic Representation

Starting with a small SKOS-based taxonomy in the domain.ttl file, parse the Senzing entity resolution (ER) results to generate an RDFLib semantic graph, then transform this into a NetworkX property graph, which represents a semantic layer. In other words, generate the "backbone" for constructing an Entity Resolved Knowledge Graph.

import pathlib
from sz_semantics import Thesaurus

thes: Thesaurus = Thesaurus()

thes.parse_er_export(
    [
        "data/truth/customers.json",
        "data/truth/reference.json",
        "data/truth/watchlist.json",
    ],
    export_path = pathlib.Path("data/truth/export.json"),
    er_path = pathlib.Path("thesaurus.ttl"),
)

thes.load_er_thesaurus(
    er_path = pathlib.Path("thesaurus.ttl"),
)

thes.save_sem_layer(pathlib.Path("sem.json"))

For an example, run the demo2.py script to process the JSON file data/export.json which captures Senzing ER exported results:

python3 demo2.py

Check the generated RDF in thesaurus.ttl and the resulting property graph in the sem.json node-link format file.
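One quick way to check a node-link file is to count its nodes and edges with the standard library. The sketch below runs on a small in-memory sample rather than on real output: the node ids and the "kind" attribute are made up, and the actual attributes in sem.json may differ. The edge list is read from either "links" or "edges", since the key name depends on the NetworkX version and settings used to write the file.

```python
import json
from collections import Counter

# Made-up sample in node-link layout; real sem.json content will differ.
sample = {
    "directed": True,
    "multigraph": False,
    "graph": {},
    "nodes": [
        {"id": "entity_1", "kind": "entity"},
        {"id": "record_a", "kind": "record"},
        {"id": "record_b", "kind": "record"},
    ],
    "links": [
        {"source": "entity_1", "target": "record_a"},
        {"source": "entity_1", "target": "record_b"},
    ],
}


def summarize(doc: dict) -> dict:
    """Count nodes and edges, plus out-degree per source node."""
    edges = doc.get("links", doc.get("edges", []))
    out_degree = Counter(edge["source"] for edge in edges)
    return {
        "n_nodes": len(doc.get("nodes", [])),
        "n_edges": len(edges),
        "out_degree": dict(out_degree),
    }


# Round-trip through JSON text to mimic reading the document from a file.
summary = summarize(json.loads(json.dumps(sample)))
print(summary)
```

To run the same check on the generated file, load it with json.load on an open handle to sem.json and pass the result to summarize.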

Note: this portion is a work in progress and is currently being refactored toward a more scalable approach for streaming updates.



License and Copyright

Source code for sz_semantics plus any logo, documentation, and examples are released under the MIT license, which is succinct and simplifies use in commercial applications.

All materials herein are Copyright © 2025 Senzing, Inc.

Kudos to @brianmacy, @jbutcher21, @docktermj, @cj2001, and the kind folks at GraphGeeks for their support.
