# fingerprints
![package](https://github.com/alephdata/fingerprints/workflows/package/badge.svg)

This library helps generate fingerprints for entity data. A fingerprint
in this context is a simplified entity identifier, derived from an entity's
name or address and used to cross-reference entities across different datasets.
## Usage
```python
import fingerprints
fp = fingerprints.generate('Mr. Sherlock Holmes')
assert fp == 'holmes sherlock'
fp = fingerprints.generate('Siemens Aktiengesellschaft')
assert fp == 'ag siemens'
fp = fingerprints.generate('New York, New York')
assert fp == 'new york'
```
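To make the idea concrete, here is a minimal pure-Python sketch of a key-collision fingerprint, in the style described in OpenRefine's clustering documentation. It illustrates the general technique and is not this library's implementation. The function name `naive_fingerprint` is hypothetical, and unlike `fingerprints.generate` it does not strip titles such as `Mr.` or abbreviate company legal forms.

```python
import re
import unicodedata


def naive_fingerprint(text: str) -> str:
    """Hypothetical key-collision fingerprint; an illustrative sketch only."""
    # Decompose accented characters and drop the combining marks ("é" -> "e").
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Case-fold, then turn punctuation into whitespace.
    text = re.sub(r"[^\w\s]", " ", text.casefold())
    # Split into tokens, drop duplicates and sort them so word order is ignored.
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


print(naive_fingerprint("New York, New York"))  # new york
print(naive_fingerprint("Holmes, Sherlock"))     # holmes sherlock
```

Sorting and de-duplicating the tokens is what lets `Holmes, Sherlock` and `Sherlock Holmes` collide on the same key.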
## Company type names
A significant part of what `fingerprints` does is recognize company legal form
names. For example, `fingerprints` simplifies `Общество с ограниченной ответственностью` (Russian for "limited liability company") to `ООО`, and `Aktiengesellschaft` to `AG`. The underlying database
draws on two sources:

* A [Google Spreadsheet](https://docs.google.com/spreadsheets/d/1Cw2xQ3hcZOAgnnzejlY5Sv3OeMxKePTqcRhXQU8rCAw/edit?ts=5e7754cf#gid=0) created by OCCRP.
* The ISO 20275 [Entity Legal Forms Code List](https://www.gleif.org/en/about-lei/code-lists/iso-20275-entity-legal-forms-code-list), published by GLEIF.

Wikipedia also maintains an index of [types of business entity](https://en.wikipedia.org/wiki/Types_of_business_entity).
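The legal-form step can be sketched as a longest-match substitution over a lookup table. This is an assumption-laden illustration, not the library's code: `LEGAL_FORMS` is a hypothetical two-entry table standing in for the database built from the sources above, and `abbreviate_legal_forms` is a hypothetical helper name.

```python
import re

# Hypothetical two-entry table for illustration; the real database is derived
# from the OCCRP spreadsheet and the ISO 20275 code list.
LEGAL_FORMS = {
    "aktiengesellschaft": "ag",
    "общество с ограниченной ответственностью": "ооо",
}

# Try longer names first so multi-word forms win over shorter overlaps;
# \b keeps us from matching inside a longer word.
_ALTERNATIVES = sorted(LEGAL_FORMS, key=len, reverse=True)
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, _ALTERNATIVES)) + r")\b")


def abbreviate_legal_forms(name: str) -> str:
    """Replace known legal form names with their abbreviation (sketch)."""
    lowered = name.casefold()
    # The matched text equals a table key because input and keys are lowercase.
    return _PATTERN.sub(lambda m: LEGAL_FORMS[m.group(1)], lowered)


print(abbreviate_legal_forms("Siemens Aktiengesellschaft"))
print(abbreviate_legal_forms("Общество с ограниченной ответственностью Ромашка"))
```

Sorting the tokens of the result afterwards turns `siemens ag` into `ag siemens`, the form shown in the usage example.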
## See also
* [Clustering in Depth](https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth), part of the OpenRefine documentation, which discusses how key collisions can be used to cluster data.
* [probablepeople](https://github.com/datamade/probablepeople), a parser for Western names made by the brilliant folks at datamade.us.
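
Key-collision clustering, as discussed on the OpenRefine page, boils down to grouping records whose fingerprints are identical. The sketch below is a hypothetical helper, `cluster_by_key`, that takes the key function as a parameter; `simple_key` is a simplified stand-in, and passing `key=fingerprints.generate` instead should swap in this library's normalization.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def simple_key(text: str) -> str:
    """Stand-in fingerprint: case-fold, strip punctuation, sort unique tokens."""
    tokens = re.sub(r"[^\w\s]", " ", text.casefold()).split()
    return " ".join(sorted(set(tokens)))


def cluster_by_key(
    names: Iterable[str], key: Callable[[str], str] = simple_key
) -> List[List[str]]:
    """Group names whose keys collide; return only groups with 2+ members."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        buckets[key(name)].append(name)
    return [group for group in buckets.values() if len(group) > 1]


names = ["New York, New York", "new york", "Holmes, Sherlock", "Sherlock Holmes", "Moriarty"]
for group in cluster_by_key(names):
    print(group)
```

Singletons such as `Moriarty` are dropped, since a cluster only becomes interesting when at least two spellings share a key.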

## Package metadata

Version 1.2.3 was published on PyPI on 2023-10-06 under the MIT license. The author is Friedrich Lindenberg (friedrich@pudo.org), and the project homepage is http://github.com/alephdata/fingerprints.