  * Name: sz_semantics
  * Version: 1.2.3
  * Home page: https://github.com/senzing-garage/sz-semantics
  * Summary: Transform JSON output from Senzing SDK for use with graph technologies, semantics, and downstream LLM integration
  * Upload time: 2025-11-01 20:31:30
  * Author: Paco Nathan
  * Requires Python: >=3.11
  * License: MIT
  * Keywords: context-engineering, data-privacy, entity-resolution, entity-resolved-knowledge-graph, grpc, ontology, rdf, semantic-layer, semantics, skos, taxonomy, thesaurus

# sz_semantics

Transform JSON output from the [Senzing SDK](https://senzing.com/docs/python/)
for use with graph technologies, semantics, and downstream LLM integration.


## Install

The demos in this repository use [`poetry`](https://python-poetry.org/docs/)
to manage their dependencies:

```bash
poetry update
```

Otherwise, to use the library:

```bash
pip install sz_semantics
```

For the [gRPC server](https://github.com/senzing-garage/serve-grpc),
if you don't already have Senzing and its gRPC server installed,
pull the latest Docker container:

```bash
docker pull senzing/serve-grpc:latest
```


## Usage: Masking PII

Mask the PII values within Senzing JSON output with tokens which can
be substituted back later. For example, _mask_ PII values before
calling a remote service (such as an LLM-based chat) then _unmask_
returned text after the roundtrip, to maintain _data privacy_.

```python
import json
from sz_semantics import Mask

data: dict = { "ENTITY_NAME": "Robert Smith" }

sz_mask: Mask = Mask()
masked_data: dict = sz_mask.mask_data(data)

masked_text: str = json.dumps(masked_data)
print(masked_text)

unmasked: str = sz_mask.unmask_text(masked_text)
print(unmasked)
```

For an example, run the `demo1.py` script with a data file which
captures Senzing JSON output:

```bash
poetry run python3 demo1.py data/get.json
```

The two lists `Mask.KNOWN_KEYS` and `Mask.MASKED_KEYS` enumerate
respectively the:

  * keys for known elements which do not require masking
  * keys for PII elements which require masking

Any other keys encountered will be masked by default and reported as
warnings in the logging. Adjust these lists as needed for a given use
case.
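
The key-handling rules above can be illustrated with a small standalone sketch. This is not the `Mask` implementation: the class `ToyMask`, the two example key sets, and the token naming scheme are all hypothetical, chosen only to show how known keys pass through, listed PII keys get masked, and unlisted keys get masked by default with a logged warning.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical stand-ins for `Mask.KNOWN_KEYS` and `Mask.MASKED_KEYS`.
TOY_KNOWN_KEYS = {"ENTITY_ID", "DATA_SOURCE"}
TOY_MASKED_KEYS = {"ENTITY_NAME", "PHONE_NUMBER"}


class ToyMask:
    """Illustrative only: swaps PII values for reversible tokens."""

    def __init__(self) -> None:
        self.tokens: dict[str, str] = {}  # token -> original value
        self._by_pair: dict[tuple[str, str], str] = {}  # (key, value) -> token

    def _token_for(self, key: str, value: str) -> str:
        # Reuse the same token when a key/value pair repeats.
        pair = (key, value)
        if pair not in self._by_pair:
            token = f"{key}_{len(self._by_pair) + 1}"
            self._by_pair[pair] = token
            self.tokens[token] = value
        return self._by_pair[pair]

    def mask_data(self, data: dict) -> dict:
        masked: dict = {}
        for key, value in data.items():
            if isinstance(value, dict):
                # Recurse into nested objects (lists are ignored in this sketch).
                masked[key] = self.mask_data(value)
            elif key in TOY_KNOWN_KEYS:
                masked[key] = value
            else:
                if key not in TOY_MASKED_KEYS:
                    logger.warning("unlisted key masked by default: %s", key)
                masked[key] = self._token_for(key, str(value))
        return masked

    def unmask_text(self, text: str) -> str:
        # Replace longer tokens first so that X_10 is not clobbered by X_1.
        for token in sorted(self.tokens, key=len, reverse=True):
            text = text.replace(token, self.tokens[token])
        return text


toy = ToyMask()
print(toy.mask_data({"ENTITY_ID": 1, "ENTITY_NAME": "Robert Smith", "NICKNAME": "Bob"}))
```

In this sketch, `NICKNAME` appears in neither set, so it is masked and a warning is logged. That mirrors the default described above, under which an unrecognized field is treated as sensitive.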

For work with large numbers of entities, subclass `KeyValueStore` to
provide a distributed key/value store (other than the Python built-in
`dict` default) to use for scale-out.
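
As a rough sketch of what a pluggable token store might look like, the example below defines a hypothetical `ToyKeyValueStore` interface with `get` and `set` methods, a `dict`-backed default, and a SQLite-backed alternative. The method names are assumptions, not the library's actual `KeyValueStore` API, so check the `sz_semantics` source before subclassing. SQLite stands in here for an external backend; a real scale-out deployment would presumably use a networked store.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional


class ToyKeyValueStore(ABC):
    """Hypothetical interface; the library's `KeyValueStore` may differ."""

    @abstractmethod
    def get(self, key: str) -> Optional[str]:
        """Return the stored value, or None when the key is absent."""

    @abstractmethod
    def set(self, key: str, value: str) -> None:
        """Store or overwrite the value for a key."""


class DictStore(ToyKeyValueStore):
    """In-memory default backed by a built-in dict."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


class SqliteStore(ToyKeyValueStore):
    """External backend stand-in: same interface, values kept in SQLite."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def get(self, key: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def set(self, key: str, value: str) -> None:
        # The connection context manager commits on success.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
            )


store: ToyKeyValueStore = SqliteStore()
store.set("ENTITY_NAME_1", "Robert Smith")
print(store.get("ENTITY_NAME_1"))
```

Because both classes share one interface, code that maps tokens to values can switch backends without other changes.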


## Usage: Semantic Representation

Starting with a small [SKOS-based taxonomy](https://www.w3.org/2004/02/skos/)
in the `domain.ttl` file, parse the Senzing
[_entity resolution_](https://senzing.com/what-is-entity-resolution/)
(ER) results to generate an 
[`RDFlib`](https://rdflib.readthedocs.io/) _semantic graph_.

In other words, generate the "backbone" for constructing an
[_Entity Resolved Knowledge Graph_](https://senzing.com/entity-resolved-knowledge-graphs/),
as a core component of a
[_semantic layer_](https://enterprise-knowledge.com/what-is-a-semantic-layer-components-and-enterprise-applications/).

The example code below serializes the _thesaurus_ generated from
Senzing ER results as `"thesaurus.ttl"` combined with the Senzing
_taxonomy_ definitions, which can be used for constructing knowledge
graphs:

```python
import pathlib
from sz_semantics import Thesaurus

thesaurus: Thesaurus = Thesaurus()
thesaurus.load_source(Thesaurus.DOMAIN_TTL)

export_path: pathlib.Path = pathlib.Path("data/truth/export.json")

with open(export_path, "r", encoding = "utf-8") as fp_json:
    for line in fp_json:
        for rdf_frag in thesaurus.parse_iter(line, language = "en"):
            thesaurus.load_source_text(
                Thesaurus.RDF_PREAMBLE + rdf_frag,
                format = "turtle",
            )

thesaurus_path: pathlib.Path = pathlib.Path("thesaurus.ttl")
thesaurus.save_source(thesaurus_path, format = "turtle")
```

For an example, run the `demo2.py` script to process the JSON file
`data/export.json` which captures Senzing ER exported results:

```bash
poetry run python3 demo2.py
```

Then check the RDF definitions in the generated `thesaurus.ttl` file.
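
To give a feel for what an RDF definition of a resolved entity can look like, here is a minimal sketch that turns one line of JSON into a SKOS concept in Turtle. It is not the output of `Thesaurus.parse_iter`. The `ex:` namespace, the `entity_to_turtle` function, and the assumption that each line holds a `RESOLVED_ENTITY` object with `ENTITY_ID` and `ENTITY_NAME` fields are illustrative only.

```python
import json

# Hypothetical prefixes; the library defines its own preamble and namespace.
TOY_PREAMBLE = """@prefix ex: <http://example.org/entity/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
"""


def entity_to_turtle(line: str, language: str = "en") -> str:
    """Render one JSON line as a skos:Concept with a preferred label."""
    entity = json.loads(line)["RESOLVED_ENTITY"]
    # json.dumps yields a double-quoted string whose escapes are valid in Turtle.
    label = json.dumps(entity["ENTITY_NAME"], ensure_ascii=False)
    return (
        f"ex:entity_{entity['ENTITY_ID']} a skos:Concept ;\n"
        f"    skos:prefLabel {label}@{language} .\n"
    )


sample = json.dumps(
    {"RESOLVED_ENTITY": {"ENTITY_ID": 7, "ENTITY_NAME": "Robert Smith"}}
)
print(TOY_PREAMBLE + entity_to_turtle(sample))
```

The generated `thesaurus.ttl` will contain richer definitions than this, but the basic pattern of one subject with typed, language-tagged properties is a useful thing to look for when reading the file.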


## Usage: gRPC Client/Server

The `demo3.py` script shows how `SzClient` simplifies access to the
Senzing SDK through a gRPC server, then runs _entity resolution_ on the
"truthset" collection of sample datasets. First launch this container
and leave it running in the background:

```bash
docker run -it --publish 8261:8261 --rm senzing/serve-grpc
```

Then run:

```bash
poetry run python3 demo3.py
```

Restart the container each time before re-running the `demo3.py`
script.
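
Since the demo depends on the container already accepting connections on port 8261, a small readiness check can help when scripting the restart-and-rerun cycle. The sketch below uses only the Python standard library. The `wait_for_port` helper is hypothetical and not part of `sz_semantics`, and it confirms only that the TCP port accepts connections, not that the Senzing engine has finished initializing.

```python
import socket
import time


def wait_for_port(
    host: str = "localhost",
    port: int = 8261,
    timeout: float = 30.0,
) -> bool:
    """Poll until a TCP connection succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Not listening yet (or unreachable); pause briefly and retry.
            time.sleep(0.5)
    return False
```

For example, a wrapper script could call `wait_for_port()` after starting the container and run `demo3.py` only when it returns `True`.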

---

![](./assets/mask.png)

---

<details>
  <summary>License and Copyright</summary>

Source code for `sz_semantics` plus any logo, documentation, and
examples have an [MIT license](https://spdx.org/licenses/MIT.html)
which is succinct and simplifies use in commercial applications.

All materials herein are Copyright © 2025 Senzing, Inc.
</details>

Kudos to
[@brianmacy](https://github.com/brianmacy),
[@jbutcher21](https://github.com/jbutcher21),
[@docktermj](https://github.com/docktermj),
[@cj2001](https://github.com/cj2001),
[@503jmt](https://github.com/503jmt),
and the kind folks at [GraphGeeks](https://graphgeeks.org/) for their support.


## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=senzing-garage/sz-semantics&type=Date)](https://star-history.com/#senzing-garage/sz-semantics&Date)

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/senzing-garage/sz-semantics",
    "name": "sz_semantics",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": null,
    "keywords": "context-engineering, data-privacy, entity-resolution, entity-resolved-knowledge-graph, grpc, ontology, rdf, semantic-layer, semantics, skos, taxonomy, thesaurus",
    "author": "Paco Nathan",
    "author_email": "paco@senzing.com",
    "download_url": "https://files.pythonhosted.org/packages/ff/74/8f0d2495fca4d6170b50833ae2d81133081ae7911f30d840eabcbf818fc6/sz_semantics-1.2.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Transform JSON output from Senzing SDK for use with graph technologies, semantics, and downstream LLM integration",
    "version": "1.2.3",
    "project_urls": {
        "Homepage": "https://github.com/senzing-garage/sz-semantics",
        "package": "https://pypi.org/project/sz_semantics/",
        "semantics": "https://github.com/senzing-garage/sz-semantics/wiki/ns"
    },
    "split_keywords": [
        "context-engineering",
        " data-privacy",
        " entity-resolution",
        " entity-resolved-knowledge-graph",
        " grpc",
        " ontology",
        " rdf",
        " semantic-layer",
        " semantics",
        " skos",
        " taxonomy",
        " thesaurus"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8c06bb26c61f67bd32e607f2e1fd986b492b9b28fff45a8fe9da015b51d647e1",
                "md5": "efdf3688353d32a230b855ed55035c3e",
                "sha256": "68c8aedc278bc77d44b359838dd3edc98aa0544690ce050734c3f493123885d3"
            },
            "downloads": -1,
            "filename": "sz_semantics-1.2.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "efdf3688353d32a230b855ed55035c3e",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 13468,
            "upload_time": "2025-11-01T20:31:29",
            "upload_time_iso_8601": "2025-11-01T20:31:29.344415Z",
            "url": "https://files.pythonhosted.org/packages/8c/06/bb26c61f67bd32e607f2e1fd986b492b9b28fff45a8fe9da015b51d647e1/sz_semantics-1.2.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ff748f0d2495fca4d6170b50833ae2d81133081ae7911f30d840eabcbf818fc6",
                "md5": "ea61579f7205a34da0ece157b4cce351",
                "sha256": "7e96505836833ce60e61f00a2422d9fb1286a460c50cffe489edbb572e30438b"
            },
            "downloads": -1,
            "filename": "sz_semantics-1.2.3.tar.gz",
            "has_sig": false,
            "md5_digest": "ea61579f7205a34da0ece157b4cce351",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 12467,
            "upload_time": "2025-11-01T20:31:30",
            "upload_time_iso_8601": "2025-11-01T20:31:30.585562Z",
            "url": "https://files.pythonhosted.org/packages/ff/74/8f0d2495fca4d6170b50833ae2d81133081ae7911f30d840eabcbf818fc6/sz_semantics-1.2.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-11-01 20:31:30",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "senzing-garage",
    "github_project": "sz-semantics",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "sz_semantics"
}
        