entityshape


Name: entityshape
Version: 0.1.0
Home page: https://github.com/dpriskorn/entityshape
Summary: Python library to validate Wikidata items.
Upload time: 2023-06-25 15:26:51
Author: Mark Tully aka Teester
Requires Python: >=3.8,<=3.12
License: GPLv3+
Keywords: wikidata, entityschema, entity validation
# [Entityshape](https://www.wikidata.org/wiki/Q119899931)
A Python library to compare a Wikidata item with an EntitySchema.

Based on https://github.com/Teester/entityshape by Mark Tully 
and https://github.com/dpriskorn/PyEntityshape by Dennis Priskorn

# Features
* compare a given Wikidata item with an EntitySchema and dig into missing properties, too many statements, etc.
* determine whether an item is valid according to a certain schema or not

# Limitations
The `Shape` and `CompareShape` classes currently only support:
* cardinality (too many or not enough values)
* whether the property is allowed or not
* whether the value of a statement on a given property is correct/incorrect

It is still a bit unclear if and how the qualifier validation works.

# Installation
Get it from PyPI:

`$ pip install entityshape`

# Usage

## Jupyter Notebooks
Example notebooks with code for validation of multiple items: 
[hiking paths](https://public-paws.wmcloud.org/User:So9q/Validating%20a%20group%20of%20items-all-hiking-paths-in-sweden.ipynb) 
[campsites](https://public-paws.wmcloud.org/User:So9q/Validating%20a%20group%20of%20items-all-campsites-in-sweden.ipynb) 
[shelters](https://public-paws.wmcloud.org/User:So9q/Validating%20a%20group%20of%20items-all-shelters-in-sweden.ipynb)

## CLI
Example:
```
# Assumed import path; adjust it if the package layout differs.
from entityshape import EntityShape

e = EntityShape(eid="E1", lang="en", qid="Q1")
result = e.get_result()
# Get a human-readable result
print(result)
# Valid: False
# properties_without_enough_correct_statements: {'P31'}

# Access the data
print(result.properties_without_enough_correct_statements)
# {'P31'}
```

## Validation
The is_valid method on the Result object mimics all red warnings displayed by https://www.wikidata.org/wiki/User:Teester/EntityShape.js 

It currently checks five conditions, all of which must be false for the item to be valid:
1. properties with too many statements found
2. incorrect statements found
3. some required properties are missing
4. properties without enough correct statements found
5. statements with properties that are not allowed found
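
The rule above can be sketched roughly as follows. This is a minimal illustration, not the library's actual `Result` class: only `properties_without_enough_correct_statements` appears in the usage example, while the class name `ResultSketch` and the other four field names are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ResultSketch:
    """Hypothetical stand-in for the library's Result object."""

    # Only the fourth field name is taken from the README; the rest are assumed.
    properties_with_too_many_statements: Set[str] = field(default_factory=set)
    incorrect_statements: Set[str] = field(default_factory=set)
    missing_required_properties: Set[str] = field(default_factory=set)
    properties_without_enough_correct_statements: Set[str] = field(default_factory=set)
    properties_that_are_not_allowed: Set[str] = field(default_factory=set)

    def is_valid(self) -> bool:
        # The item is valid only when every one of the five problem sets is empty.
        problem_sets = (
            self.properties_with_too_many_statements,
            self.incorrect_statements,
            self.missing_required_properties,
            self.properties_without_enough_correct_statements,
            self.properties_that_are_not_allowed,
        )
        return not any(problem_sets)


print(ResultSketch().is_valid())  # True
print(ResultSketch(properties_without_enough_correct_statements={"P31"}).is_valid())  # False
```

A single non-empty set is enough to make the item invalid, which matches the "all have to be false" wording.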

## Known working schemas
This library currently only supports a subset of all features in the ShEx specification.

The following Entity Schemas are known to work:
* [hiking path](https://www.wikidata.org/w/index.php?title=EntitySchema:E375&oldid=1833851062)
* [shelter](https://www.wikidata.org/w/index.php?title=EntitySchema:E398&oldid=1923235264)

# Background
This library is the glue between libraries like [Wikibase 
Integrator](https://github.com/LeMyst/WikibaseIntegrator/) and entityschemas. 

It makes it easy to batch check a whole subset of Wikidata 
items against a schema. Nice!
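
A batch check could look roughly like the sketch below. The helper names `validate_many` and `fake_check` are hypothetical and not part of the library. In real use, the `check` callable would presumably wrap a call such as `EntityShape(eid=..., lang="en", qid=qid).get_result()` from the usage example and return whether the result is valid; a placeholder is used here so the sketch runs without network access.

```python
from typing import Callable, Dict, Iterable


def validate_many(qids: Iterable[str], check: Callable[[str], bool]) -> Dict[str, bool]:
    """Run `check` on every QID and collect the verdicts (hypothetical helper)."""
    return {qid: check(qid) for qid in qids}


def fake_check(qid: str) -> bool:
    # Placeholder so the sketch runs offline; it pretends that only Q1 fails.
    return qid != "Q1"


verdicts = validate_many(["Q1", "Q2", "Q3"], fake_check)
invalid = [qid for qid, ok in verdicts.items() if not ok]
print(invalid)  # ['Q1']
```

Passing the check in as a callable keeps the loop independent of how a single item is validated, so the same loop works for any schema.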

# TODO
The CompareShape and Shape classes should be rewritten using OOP 
and enums to avoid passing strings around because that is not 
nice to debug or maintain.

What do we want to know from the CompareShape class?

On the property level:
* whether the property is mandatory and present/missing

On the statement level:
* whether the cardinality of values is allowed (min/max)
* whether the value(s) are correct/incorrect

Cases:
* mandatory property is missing
* optional property is missing (this is not invalidating)
* a property has an incorrect value
* a property has a correct value
* a property has too many values
* a property has not enough values
* ?
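
One possible shape for such a rewrite is sketched below, with an enum per case instead of strings. Everything here (`Necessity`, `PropertyStatus`, `classify` and its parameters) is hypothetical and only illustrates the cases listed above; the order in which the checks run is a design choice of this sketch, not something the current classes define.

```python
from enum import Enum, auto
from typing import Optional


class Necessity(Enum):
    MANDATORY = auto()
    OPTIONAL = auto()


class PropertyStatus(Enum):
    MISSING_MANDATORY = auto()
    MISSING_OPTIONAL = auto()  # not invalidating
    NOT_ENOUGH_VALUES = auto()
    TOO_MANY_VALUES = auto()
    INCORRECT_VALUE = auto()
    CORRECT = auto()


def classify(
    necessity: Necessity,
    value_count: int,
    correct_count: int,
    min_count: int,
    max_count: Optional[int],
) -> PropertyStatus:
    """Map the counts for one property onto exactly one of the cases above."""
    if value_count == 0:
        if necessity is Necessity.MANDATORY:
            return PropertyStatus.MISSING_MANDATORY
        return PropertyStatus.MISSING_OPTIONAL
    if max_count is not None and value_count > max_count:
        return PropertyStatus.TOO_MANY_VALUES
    if correct_count < value_count:
        # At least one value does not match what the schema expects.
        return PropertyStatus.INCORRECT_VALUE
    if value_count < min_count:
        return PropertyStatus.NOT_ENOUGH_VALUES
    return PropertyStatus.CORRECT


print(classify(Necessity.MANDATORY, 0, 0, 1, 1))  # PropertyStatus.MISSING_MANDATORY
```

With enum members, a typo in a case name fails at import time instead of silently producing a string that never matches.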

# ShEx Tip
When working on your Entity Schemas, the constraints described here are useful to remember:
https://shex.io/shex-primer/#tripleConstraints

# Thanks
Big thanks to [Myst](https://github.com/LeMyst) and 
[Christian Clauss](https://github.com/cclauss) for 
advice and help with Ruff to make this better. 

# License
GPLv3+

# What I learned
* Forking other people's undocumented spaghetti code is not much fun.
* I want to find a more reliable validator that supports somevalue and novalue.
* Pydantic is wonderful; yet again it makes working with OOP easy peasy :)
* Ruff is crazy fast and very nice!
            
