# fingerprints

This library helps with the generation of fingerprints for entity data. A fingerprint
in this context is understood as a simplified entity identifier, derived from its
name or address and used for cross-referencing entities across different datasets.
## Usage
```python
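import fingerprints

# Judging by the expected value, the title "Mr." is dropped and tokens are reordered.
fp = fingerprints.generate('Mr. Sherlock Holmes')
assert fp == 'holmes sherlock'

# The legal form "Aktiengesellschaft" is shortened to its abbreviation.
fp = fingerprints.generate('Siemens Aktiengesellschaft')
assert fp == 'ag siemens'

# Punctuation is removed and the repeated tokens collapse.
fp = fingerprints.generate('New York, New York')
assert fp == 'new york'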
```
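
To illustrate how a fingerprint can serve as a join key when cross-referencing two datasets, here is a minimal sketch. It does not reproduce the library's implementation: `toy_fingerprint` and `cross_reference` are hypothetical helpers that only lowercase, strip punctuation, and sort unique tokens, so the example runs without installing anything. In practice one would presumably call `fingerprints.generate` in place of the stand-in.

```python
import re
from collections import defaultdict


def toy_fingerprint(name):
    """Hypothetical stand-in for fingerprints.generate (NOT the real algorithm):
    lowercase, drop punctuation, then sort the unique tokens."""
    tokens = re.findall(r"\w+", name.lower())
    return " ".join(sorted(set(tokens))) or None


def cross_reference(left, right):
    """Pair up names from two datasets that share a fingerprint."""
    # Index the right-hand dataset by fingerprint.
    index = defaultdict(list)
    for name in right:
        fp = toy_fingerprint(name)
        if fp is not None:
            index[fp].append(name)
    # Look up each left-hand name under the same key.
    matches = []
    for name in left:
        fp = toy_fingerprint(name)
        for candidate in index.get(fp, []):
            matches.append((name, candidate))
    return matches


# Made-up sample data for illustration only.
registry = ["Holmes, Sherlock", "New York, New York"]
other_source = ["sherlock HOLMES", "York New", "John Watson"]
matches = cross_reference(registry, other_source)
```

Because both datasets are reduced to the same key, spelling variants such as `Holmes, Sherlock` and `sherlock HOLMES` end up paired, while `John Watson` has no counterpart and is left out.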
## Company type names
A significant part of what `fingerprints` does is to recognize company legal form
names. For example, `fingerprints` can simplify `Общество с ограниченной ответственностью` to `ООО`, or `Aktiengesellschaft` to `AG`. The required database
is based on two different sources:

* A [Google Spreadsheet](https://docs.google.com/spreadsheets/d/1Cw2xQ3hcZOAgnnzejlY5Sv3OeMxKePTqcRhXQU8rCAw/edit?ts=5e7754cf#gid=0) created by OCCRP.
* The ISO 20275 [Entity Legal Forms Code List](https://www.gleif.org/en/about-lei/code-lists/iso-20275-entity-legal-forms-code-list).

Wikipedia also maintains an index of [types of business entity](https://en.wikipedia.org/wiki/Types_of_business_entity).

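
The replacement of legal form names by their abbreviations can be sketched as a table lookup. This is an assumption about the general approach, not the library's actual code: `LEGAL_FORMS` is a hypothetical two-entry table holding only the examples mentioned above, and `replace_legal_forms` is an illustrative helper name.

```python
import re

# Hypothetical two-entry table; the real database is far larger.
LEGAL_FORMS = {
    "Общество с ограниченной ответственностью": "ООО",
    "Aktiengesellschaft": "AG",
}

# Case-insensitive lookup from the matched text to its abbreviation.
_LOOKUP = {long.casefold(): short for long, short in LEGAL_FORMS.items()}

# Try longer names first so a short form never shadows a longer one.
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(
        re.escape(long) for long in sorted(LEGAL_FORMS, key=len, reverse=True)
    )
    + r")\b",
    re.IGNORECASE,
)


def replace_legal_forms(name):
    """Swap each known legal form name for its abbreviation."""
    return _PATTERN.sub(lambda m: _LOOKUP[m.group(0).casefold()], name)
```

With this table, `replace_legal_forms("Siemens Aktiengesellschaft")` yields `Siemens AG`. Lowercasing and token ordering, as seen in the `ag siemens` fingerprint, would be separate steps.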
## See also
* [Clustering in Depth](https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth), part of the OpenRefine documentation discussing how to create collisions in data clustering.
* [probablepeople](https://github.com/datamade/probablepeople), a parser for Western names made by the brilliant folks at datamade.us.

## Raw data
{
"_id": null,
"home_page": "http://github.com/alephdata/fingerprints",
"name": "fingerprints",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "names people companies normalisation iso20275",
"author": "Friedrich Lindenberg",
"author_email": "friedrich@pudo.org",
"download_url": "https://files.pythonhosted.org/packages/86/02/64e9cf0f71aca6cd133528c9315732ea8ac8a0011552d91360446d1da411/fingerprints-1.0.3.tar.gz",
"platform": "",
"bugtrack_url": null,
"license": "MIT",
"summary": "A library to generate entity fingerprints.",
"version": "1.0.3",
"split_keywords": [
"names",
"people",
"companies",
"normalisation",
"iso20275"
],
"urls": [
{
"comment_text": "",
"digests": {
"md5": "6f884098a8ac2d54a623b57db431115c",
"sha256": "9d485aec44fbeeeda1e712f661cc6d96aa40e282d48c411e8d3175ea14742c6a"
},
"downloads": -1,
"filename": "fingerprints-1.0.3-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "6f884098a8ac2d54a623b57db431115c",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 13231,
"upload_time": "2021-02-24T10:57:56",
"upload_time_iso_8601": "2021-02-24T10:57:56.390786Z",
"url": "https://files.pythonhosted.org/packages/ad/17/309d6bff8ad23902be7a75c8dc7137c608456f09bd999da7f58f2c626be7/fingerprints-1.0.3-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"md5": "0a4c56542198a851f10b831a41bd2113",
"sha256": "cafd5f92b5b91e4ce34af2b954da9c05b448a4778947785abb19a14f363352d0"
},
"downloads": -1,
"filename": "fingerprints-1.0.3.tar.gz",
"has_sig": false,
"md5_digest": "0a4c56542198a851f10b831a41bd2113",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 13232,
"upload_time": "2021-02-24T10:57:58",
"upload_time_iso_8601": "2021-02-24T10:57:58.099392Z",
"url": "https://files.pythonhosted.org/packages/86/02/64e9cf0f71aca6cd133528c9315732ea8ac8a0011552d91360446d1da411/fingerprints-1.0.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2021-02-24 10:57:58",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": null,
"github_project": "alephdata",
"error": "Could not fetch GitHub repository",
"lcname": "fingerprints"
}