pprl-model


| Field | Value |
| --- | --- |
| Name | pprl-model |
| Version | 0.1.5 |
| Home page | https://github.com/ul-mds/pprl |
| Summary | Data models for use with a HTTP-based service for privacy-preserving record linkage using Bloom filters. |
| Upload time | 2024-09-17 13:19:09 |
| Maintainer | None |
| Docs URL | None |
| Author | Maximilian Jugl |
| Requires Python | <4.0,>=3.10 |
| License | MIT |
| Keywords | record linkage, privacy, bloom filter |
| Requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| Coveralls test coverage | No coveralls. |

This package contains the model classes that the PPRL service uses for validation.
They were designed for an HTTP-based service that performs record linkage based on Bloom filters.
The models cover the service's data transformation, masking and bit vector matching routines.
[Pydantic](https://docs.pydantic.dev/latest/) is used for validation, serialization and deserialization.
This package is rarely used directly.
Instead, other packages depend on it to power their functionality.

# Data models

This package exposes models for entity pre-processing, masking and bit vector matching.
The following examples are taken from the test suites of the
[PPRL service package](https://github.com/ul-mds/pprl/tree/main/packages/pprl_service) and show the
validation steps that the models perform on top of the ones native to Pydantic.

## Entity transformation

```python
from pprl_model import EntityTransformRequest, TransformConfig, EmptyValueHandling, AttributeValueEntity, \
    AttributeTransformerConfig, NumberTransformer, GlobalTransformerConfig, NormalizationTransformer, \
    CharacterFilterTransformer

# This is a valid config.
_ = EntityTransformRequest(
    config=TransformConfig(empty_value=EmptyValueHandling.ignore),
    entities=[
        AttributeValueEntity(
            id="001",
            attributes={
                "bar1": "  12.345  ",
                "bar2": "  12.345  "
            }
        )
    ],
    attribute_transformers=[
        AttributeTransformerConfig(
            attribute_name="bar1",
            transformers=[
                NumberTransformer(decimal_places=2)
            ]
        )
    ],
    global_transformers=GlobalTransformerConfig(
        before=[
            NormalizationTransformer()
        ],
        after=[
            CharacterFilterTransformer(characters=".")
        ]
    )
)

from uuid import uuid4

# Validation will fail since no transformers have been defined.
_ = EntityTransformRequest(
    config=TransformConfig(empty_value=EmptyValueHandling.ignore),
    entities=[
        AttributeValueEntity(
            id=str(uuid4()),
            attributes={
                "foo": "bar"
            }
        )
    ],
    attribute_transformers=[]
)
# => ValidationError: attribute and global transformers are empty: must contain at least one
```
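
To make the ordering of the configuration above more concrete, the following is a minimal sketch of how such a pipeline could be applied: global `before` transformers first, then attribute-specific transformers, then global `after` transformers. It uses only the standard library and does not call `pprl_model`. The helper functions `normalize`, `format_number` and `filter_characters` are hypothetical stand-ins, and their behavior (whitespace trimming, half-up rounding) is an assumption rather than the service's documented implementation.

```python
from decimal import Decimal, ROUND_HALF_UP


def normalize(value: str) -> str:
    # Hypothetical stand-in for NormalizationTransformer: trim and collapse whitespace.
    return " ".join(value.split())


def format_number(value: str, decimal_places: int) -> str:
    # Hypothetical stand-in for NumberTransformer. Half-up rounding is an assumption.
    quantum = Decimal(1).scaleb(-decimal_places)  # e.g. 0.01 for two decimal places
    return str(Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP))


def filter_characters(value: str, characters: str) -> str:
    # Hypothetical stand-in for CharacterFilterTransformer: drop the listed characters.
    return "".join(c for c in value if c not in characters)


def transform(attributes: dict[str, str]) -> dict[str, str]:
    result = {}
    for name, value in attributes.items():
        value = normalize(value)               # global "before" step
        if name == "bar1":                     # attribute-specific step
            value = format_number(value, 2)
        value = filter_characters(value, ".")  # global "after" step
        result[name] = value
    return result


transformed = transform({"bar1": "  12.345  ", "bar2": "  12.345  "})
print(transformed)  # {'bar1': '1235', 'bar2': '12345'}
```

Under these assumptions, only `bar1` is rounded because it is the only attribute with its own transformer, while both attributes pass through the global steps.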

## Entity masking

```python
from pprl_model import EntityMaskRequest, MaskConfig, HashConfig, HashFunction, HashAlgorithm, \
    DoubleHash, CLKFilter, AttributeValueEntity, StaticAttributeConfig, AttributeSalt, CLKRBFFilter

# This is a valid config.
_ = EntityMaskRequest(
    config=MaskConfig(
        token_size=2,
        hash=HashConfig(
            function=HashFunction(algorithms=[HashAlgorithm.sha1]),
            strategy=DoubleHash()
        ),
        filter=CLKFilter(filter_size=1024, hash_values=5),
        padding="_"
    ),
    entities=[
        AttributeValueEntity(
            id="001",
            attributes={
                "first_name": "John",
                "last_name": "Doe",
                "date_of_birth": "1987-06-05",
                "gender": "m"
            }
        )
    ]
)

# This is an invalid config since salting an attribute can only be done through a fixed value
# or another attribute on an entity, not both at the same time.
_ = EntityMaskRequest(
    config=MaskConfig(
        token_size=2,
        hash=HashConfig(
            function=HashFunction(algorithms=[HashAlgorithm.sha1]),
            strategy=DoubleHash()
        ),
        filter=CLKFilter(filter_size=1024, hash_values=5),
        padding="_"
    ),
    entities=[
        AttributeValueEntity(
            id="001",
            attributes={
                "first_name": "foobar",
                "salt": "0123456789"
            }
        )
    ],
    attributes=[
        StaticAttributeConfig(
            attribute_name="first_name",
            salt=AttributeSalt(
                value="my_salt",
                attribute="salt"
            )
        )
    ]
)
# => ValidationError: value and attribute cannot be set at the same time

# Validation also fails if neither a static value nor an attribute is set for salting.
_ = EntityMaskRequest(
    config=MaskConfig(
        token_size=2,
        hash=HashConfig(
            function=HashFunction(algorithms=[HashAlgorithm.sha1]),
            strategy=DoubleHash()
        ),
        filter=CLKFilter(filter_size=1024, hash_values=5),
        padding="_"
    ),
    entities=[
        AttributeValueEntity(
            id="001",
            attributes={
                "first_name": "foobar",
                "salt": "0123456789"
            }
        )
    ],
    attributes=[
        StaticAttributeConfig(
            attribute_name="first_name",
            salt=AttributeSalt()
        )
    ]
)
# => ValidationError: neither value nor attribute is set

# When using a weighted filter (RBF, CLKRBF), validation fails if any provided attribute configuration
# is static rather than weighted. The reverse also holds: validation fails if CLK is specified as the
# filter and weighted attribute configurations are provided.
_ = EntityMaskRequest(
    config=MaskConfig(
        token_size=2,
        hash=HashConfig(
            function=HashFunction(algorithms=[HashAlgorithm.sha1]),
            strategy=DoubleHash()
        ),
        filter=CLKRBFFilter(hash_values=5),
        padding="_"
    ),
    entities=[
        AttributeValueEntity(
            id="001",
            attributes={
                "first_name": "foobar",
                "salt": "0123456789"
            }
        )
    ],
    attributes=[
        StaticAttributeConfig(
            attribute_name="first_name",
            salt=AttributeSalt(value="my_salt")
        )
    ]
)
# => ValidationError: `clkrbf` filters require weighted attribute configurations, but static ones were found

# Weighted filters (RBF, CLKRBF) always require weighted attribute configurations. If none
# are provided, validation fails.
_ = EntityMaskRequest(
    config=MaskConfig(
        token_size=2,
        hash=HashConfig(
            function=HashFunction(algorithms=[HashAlgorithm.sha1]),
            strategy=DoubleHash()
        ),
        filter=CLKRBFFilter(hash_values=5),
        padding="_"
    ),
    entities=[
        AttributeValueEntity(
            id="001",
            attributes={
                "first_name": "foobar",
                "salt": "0123456789"
            }
        )
    ]
)
# => ValidationError: `clkrbf` filters require weighted attribute configurations, but none were found

# If a configuration is provided for an attribute that doesn't exist on some entities, validation fails.
_ = EntityMaskRequest(
    config=MaskConfig(
        token_size=2,
        hash=HashConfig(
            function=HashFunction(algorithms=[HashAlgorithm.sha1]),
            strategy=DoubleHash()
        ),
        filter=CLKFilter(filter_size=1024, hash_values=5),
        padding="_"
    ),
    entities=[
        AttributeValueEntity(
            id="001",
            attributes={
                "first_name": "foobar"
            }
        )
    ],
    attributes=[
        StaticAttributeConfig(
            attribute_name="last_name",
            salt=AttributeSalt(value="my_salt")
        )
    ]
)
# => ValidationError: some configured attributes are not present on entities: `last_name` on entities with ID `001`
```
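
For readers unfamiliar with the parameters in `MaskConfig`, the following is a rough sketch of how a CLK-style Bloom filter could be built from `token_size`, `padding`, `filter_size` and `hash_values` using double hashing. It relies only on the standard library and is not the service's implementation. In particular, deriving both base hashes from the two halves of a single SHA-1 digest is an assumption made for illustration, and the function names `tokenize`, `double_hash_positions` and `clk` are hypothetical. Salting is left out for brevity.

```python
import hashlib


def tokenize(value: str, token_size: int = 2, padding: str = "_") -> set[str]:
    # Pad both ends so that the first and last characters also form full tokens.
    padded = padding * (token_size - 1) + value + padding * (token_size - 1)
    return {padded[i:i + token_size] for i in range(len(padded) - token_size + 1)}


def double_hash_positions(token: str, filter_size: int, hash_values: int) -> list[int]:
    # Assumption: both base hashes come from the two halves of one SHA-1 digest.
    digest = hashlib.sha1(token.encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:10], "big")
    h2 = int.from_bytes(digest[10:], "big")
    # Double hashing: the i-th position is (h1 + i * h2) mod filter_size.
    return [(h1 + i * h2) % filter_size for i in range(hash_values)]


def clk(attributes: dict[str, str], filter_size: int = 1024, hash_values: int = 5) -> list[int]:
    # All attributes are hashed into one shared bit vector.
    bits = [0] * filter_size
    for value in attributes.values():
        for token in tokenize(value):
            for position in double_hash_positions(token, filter_size, hash_values):
                bits[position] = 1
    return bits


bits = clk({"first_name": "John", "last_name": "Doe"})
print(sum(bits), "of", len(bits), "bits set")
```

Each token sets at most `hash_values` bits, so the number of set bits is bounded by the number of tokens times `hash_values`.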

## Bit vector matching

```python
from pprl_model import VectorMatchRequest, MatchConfig, SimilarityMeasure, BitVectorEntity

_ = VectorMatchRequest(
    config=MatchConfig(
        measure=SimilarityMeasure.jaccard,
        threshold=0.8
    ),
    domain=[
        BitVectorEntity(
            id="D001",
            value="kY7yXn+rmp8L0nyGw5NlMw=="
        )
    ],
    range=[
        BitVectorEntity(
            id="R001",
            value="qig0C1i8YttqhPwo4VqLlg=="
        )
    ]
)
```
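
The request above compares every bit vector in `domain` against every bit vector in `range` using the configured similarity measure and threshold. As a minimal sketch of what a Jaccard comparison between two such values could look like, the following assumes that each `value` is a Base64-encoded bit vector. The functions `decode_bits` and `jaccard` are hypothetical helpers built on the standard library and are not part of `pprl_model`.

```python
import base64


def decode_bits(value: str) -> int:
    # Assumption: the value is a Base64-encoded bit vector; interpret it as one big integer.
    return int.from_bytes(base64.b64decode(value), "big")


def jaccard(a: str, b: str) -> float:
    x, y = decode_bits(a), decode_bits(b)
    union = bin(x | y).count("1")
    if union == 0:
        # Convention chosen for this sketch: two empty vectors have similarity 0.
        return 0.0
    # Jaccard similarity: bits set in both vectors divided by bits set in either vector.
    return bin(x & y).count("1") / union


similarity = jaccard("kY7yXn+rmp8L0nyGw5NlMw==", "qig0C1i8YttqhPwo4VqLlg==")
print(f"similarity={similarity:.3f}, match={similarity >= 0.8}")
```

A pair would count as a match only if its similarity reaches the configured `threshold`, which is 0.8 in the request above.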

# License

MIT.

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/ul-mds/pprl",
    "name": "pprl-model",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.10",
    "maintainer_email": null,
    "keywords": "record linkage, privacy, bloom filter",
    "author": "Maximilian Jugl",
    "author_email": "Maximilian.Jugl@medizin.uni-leipzig.de",
    "download_url": "https://files.pythonhosted.org/packages/c4/55/dc05b98dc34c948e86a02b026da6faaededc7b2df4cd7b921b320eec1565/pprl_model-0.1.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Data models for use with a HTTP-based service for privacy-preserving record linkage using Bloom filters.",
    "version": "0.1.5",
    "project_urls": {
        "Homepage": "https://github.com/ul-mds/pprl",
        "Repository": "https://github.com/ul-mds/pprl"
    },
    "split_keywords": [
        "record linkage",
        " privacy",
        " bloom filter"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c5e9eb38328c9988a2dcd24d507047375a63607765bd1cd4347e3084542cf89a",
                "md5": "cb3e7b684f038b216caefc2211c58eb9",
                "sha256": "2292c4587904a28b0786074ea5ea4fb5c88a7e82c676a57df034dc0902e3d383"
            },
            "downloads": -1,
            "filename": "pprl_model-0.1.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "cb3e7b684f038b216caefc2211c58eb9",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.10",
            "size": 8872,
            "upload_time": "2024-09-17T13:19:07",
            "upload_time_iso_8601": "2024-09-17T13:19:07.326488Z",
            "url": "https://files.pythonhosted.org/packages/c5/e9/eb38328c9988a2dcd24d507047375a63607765bd1cd4347e3084542cf89a/pprl_model-0.1.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c455dc05b98dc34c948e86a02b026da6faaededc7b2df4cd7b921b320eec1565",
                "md5": "ab560de8d94d2730864c27ed78a47090",
                "sha256": "84132d2a8b387f48b7122ce450781c9510e5644e0ff5940ef02ff87c67d641bf"
            },
            "downloads": -1,
            "filename": "pprl_model-0.1.5.tar.gz",
            "has_sig": false,
            "md5_digest": "ab560de8d94d2730864c27ed78a47090",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.10",
            "size": 8416,
            "upload_time": "2024-09-17T13:19:09",
            "upload_time_iso_8601": "2024-09-17T13:19:09.066584Z",
            "url": "https://files.pythonhosted.org/packages/c4/55/dc05b98dc34c948e86a02b026da6faaededc7b2df4cd7b921b320eec1565/pprl_model-0.1.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-17 13:19:09",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ul-mds",
    "github_project": "pprl",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "pprl-model"
}
        