fingerprints


* Name: fingerprints
* Version: 1.0.3
* Home page: http://github.com/alephdata/fingerprints
* Summary: A library to generate entity fingerprints.
* Upload time: 2021-02-24 10:57:58
* Author: Friedrich Lindenberg
* License: MIT
* Keywords: names people companies normalisation iso20275
* Requirements: No requirements were recorded.

# fingerprints

![package](https://github.com/alephdata/fingerprints/workflows/package/badge.svg)

This library helps with the generation of fingerprints for entity data. A fingerprint
in this context is understood as a simplified entity identifier, derived from its
name or address and used for cross-referencing entities across different datasets.

## Usage

```python
import fingerprints

fp = fingerprints.generate('Mr. Sherlock Holmes')
assert fp == 'holmes sherlock'

fp = fingerprints.generate('Siemens Aktiengesellschaft')
assert fp == 'ag siemens'

fp = fingerprints.generate('New York, New York')
assert fp == 'new york'
```
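
The cross-referencing use case can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: `toy_fingerprint` is a hypothetical, standard-library-only stand-in for `fingerprints.generate` (it only folds case, punctuation, word order and repeated words), and `cross_reference` and the dataset names are likewise made up for the example.

```python
import re
import unicodedata
from collections import defaultdict


def toy_fingerprint(name):
    """Hypothetical stand-in for fingerprints.generate (NOT the library's algorithm)."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase, then turn every run of non-alphanumeric characters into a space.
    text = re.sub(r"[\W_]+", " ", text.lower())
    # Unique tokens in sorted order, so word order and repeats do not matter.
    tokens = sorted(set(text.split()))
    return " ".join(tokens) or None


def cross_reference(datasets, key_func=toy_fingerprint):
    """Group (dataset, name) pairs whose names share the same fingerprint key."""
    groups = defaultdict(list)
    for dataset, names in datasets.items():
        for name in names:
            key = key_func(name)
            if key:  # skip names that reduce to nothing
                groups[key].append((dataset, name))
    # Keep only keys that occur in more than one dataset.
    return {
        key: refs
        for key, refs in groups.items()
        if len({dataset for dataset, _ in refs}) > 1
    }


datasets = {
    "registry": ["Holmes, Sherlock", "Watson, John"],
    "press": ["Sherlock HOLMES", "Irene Adler"],
}
matches = cross_reference(datasets)
print(matches)
```

With the package installed, passing `key_func=fingerprints.generate` should give the same grouping behaviour with the library's own normalisation instead of the toy one.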

## Company type names

A significant part of what `fingerprints` does is to recognize company legal form
names. For example, `fingerprints` can simplify
`Общество с ограниченной ответственностью` to `ООО`, or `Aktiengesellschaft` to `AG`.
The required database is based on two different sources:

* A [Google Spreadsheet](https://docs.google.com/spreadsheets/d/1Cw2xQ3hcZOAgnnzejlY5Sv3OeMxKePTqcRhXQU8rCAw/edit?ts=5e7754cf#gid=0) created by OCCRP.
* The ISO 20275: [Entity Legal Forms Code List](https://www.gleif.org/en/about-lei/code-lists/iso-20275-entity-legal-forms-code-list)

Wikipedia also maintains an index of [types of business entity](https://en.wikipedia.org/wiki/Types_of_business_entity).
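
As a rough sketch of the idea, legal form simplification can be pictured as a lookup from long forms to short forms. The mapping below contains only the two examples mentioned above, and `LEGAL_FORMS` and `simplify_legal_form` are hypothetical names chosen for illustration; the library's actual database and matching logic are not reproduced here.

```python
import re

# Illustrative mapping only: the two examples from the text above.
# The real database is far larger and is not reproduced here.
LEGAL_FORMS = {
    "Общество с ограниченной ответственностью": "ООО",
    "Aktiengesellschaft": "AG",
}

# Try longer names first so a long form is never shadowed by a shorter one.
_PATTERN = re.compile(
    r"(?<!\w)("
    + "|".join(
        re.escape(form) for form in sorted(LEGAL_FORMS, key=len, reverse=True)
    )
    + r")(?!\w)",
    re.IGNORECASE,
)

# Case-insensitive lookup table from long form to short form.
_LOOKUP = {form.casefold(): short for form, short in LEGAL_FORMS.items()}


def simplify_legal_form(name):
    """Replace each known long legal form in `name` with its short form."""
    return _PATTERN.sub(lambda m: _LOOKUP[m.group(1).casefold()], name)


print(simplify_legal_form("Siemens Aktiengesellschaft"))
```

Names that contain no known legal form pass through unchanged, which keeps the function safe to apply to person names as well.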

## See also

* [Clustering in Depth](https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth), part of the OpenRefine documentation discussing how to create collisions in data clustering.
* [probablepeople](https://github.com/datamade/probablepeople), a parser for Western names made by the brilliant folks at datamade.us.




            

Raw data

            {
    "_id": null,
    "home_page": "http://github.com/alephdata/fingerprints",
    "name": "fingerprints",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "names people companies normalisation iso20275",
    "author": "Friedrich Lindenberg",
    "author_email": "friedrich@pudo.org",
    "download_url": "https://files.pythonhosted.org/packages/86/02/64e9cf0f71aca6cd133528c9315732ea8ac8a0011552d91360446d1da411/fingerprints-1.0.3.tar.gz",
    "platform": "",
    "description": "# fingerprints\n\n![package](https://github.com/alephdata/fingerprints/workflows/package/badge.svg)\n\nThis library helps with the generation of fingerprints for entity data. A fingerprint\nin this context is understood as a simplified entity identifier, derived from it's\nname or address and used for cross-referencing of entity across different datasets.\n\n## Usage\n\n```python\nimport fingerprints\n\nfp = fingerprints.generate('Mr. Sherlock Holmes')\nassert fp == 'holmes sherlock'\n\nfp = fingerprints.generate('Siemens Aktiengesellschaft')\nassert fp == 'ag siemens'\n\nfp = fingerprints.generate('New York, New York')\nassert fp == 'new york'\n```\n\n## Company type names\n\nA significant part of what `fingerprints` does it to recognize company legal form\nnames. For example, `fingerprints` will be able to simplify `\u041e\u0431\u0449\u0435\u0441\u0442\u0432\u043e \u0441 \u043e\u0433\u0440\u0430\u043d\u0438\u0447\u0435\u043d\u043d\u043e\u0439 \u043e\u0442\u0432\u0435\u0442\u0441\u0442\u0432\u0435\u043d\u043d\u043e\u0441\u0442\u044c\u044e` to `\u041e\u041e\u041e`, or `Aktiengesellschaft` to `AG`. The required database\nis based on two different sources:\n\n* A [Google Spreadsheet](https://docs.google.com/spreadsheets/d/1Cw2xQ3hcZOAgnnzejlY5Sv3OeMxKePTqcRhXQU8rCAw/edit?ts=5e7754cf#gid=0) created by OCCRP.\n* The ISO 20275: [Entity Legal Forms Code List](https://www.gleif.org/en/about-lei/code-lists/iso-20275-entity-legal-forms-code-list)\n\nWikipedia also maintains an index of [types of business entity](https://en.wikipedia.org/wiki/Types_of_business_entity).\n\n## See also\n\n* [Clustering in Depth](https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth), part of the OpenRefine documentation discussing how to create collisions in data clustering.\n* [probablepeople](https://github.com/datamade/probablepeople), parser for western names made by the brilliant folks at datamade.us.\n\n\n\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A library to generate entity fingerprints.",
    "version": "1.0.3",
    "split_keywords": [
        "names",
        "people",
        "companies",
        "normalisation",
        "iso20275"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "6f884098a8ac2d54a623b57db431115c",
                "sha256": "9d485aec44fbeeeda1e712f661cc6d96aa40e282d48c411e8d3175ea14742c6a"
            },
            "downloads": -1,
            "filename": "fingerprints-1.0.3-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6f884098a8ac2d54a623b57db431115c",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": null,
            "size": 13231,
            "upload_time": "2021-02-24T10:57:56",
            "upload_time_iso_8601": "2021-02-24T10:57:56.390786Z",
            "url": "https://files.pythonhosted.org/packages/ad/17/309d6bff8ad23902be7a75c8dc7137c608456f09bd999da7f58f2c626be7/fingerprints-1.0.3-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "0a4c56542198a851f10b831a41bd2113",
                "sha256": "cafd5f92b5b91e4ce34af2b954da9c05b448a4778947785abb19a14f363352d0"
            },
            "downloads": -1,
            "filename": "fingerprints-1.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "0a4c56542198a851f10b831a41bd2113",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 13232,
            "upload_time": "2021-02-24T10:57:58",
            "upload_time_iso_8601": "2021-02-24T10:57:58.099392Z",
            "url": "https://files.pythonhosted.org/packages/86/02/64e9cf0f71aca6cd133528c9315732ea8ac8a0011552d91360446d1da411/fingerprints-1.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2021-02-24 10:57:58",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": null,
    "github_project": "alephdata",
    "error": "Could not fetch GitHub repository",
    "lcname": "fingerprints"
}
        