fingerprints

* Name: fingerprints
* Version: 1.2.3
* Home page: http://github.com/alephdata/fingerprints
* Summary: A library to generate entity fingerprints.
* Upload time: 2023-10-06 06:07:35
* Author: Friedrich Lindenberg
* License: MIT
* Keywords: names, people, companies, normalisation, iso20275

# fingerprints

![package](https://github.com/alephdata/fingerprints/workflows/package/badge.svg)

This library helps with the generation of fingerprints for entity data. A fingerprint
in this context is understood as a simplified entity identifier, derived from its
name or address and used for cross-referencing entities across different datasets.

## Usage

```python
import fingerprints

fp = fingerprints.generate('Mr. Sherlock Holmes')
assert fp == 'holmes sherlock'

fp = fingerprints.generate('Siemens Aktiengesellschaft')
assert fp == 'ag siemens'

fp = fingerprints.generate('New York, New York')
assert fp == 'new york'
```
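
The examples above suggest that a fingerprint is insensitive to case, punctuation, token order, and repeated tokens. The following is a minimal sketch of that idea, loosely modelled on the fingerprint keying method from OpenRefine. It is not the implementation used by `fingerprints`; the function name `sketch_fingerprint` and the `TITLES` set are hypothetical and exist only for illustration.

```python
import re
import unicodedata

# Hypothetical sample of personal titles to drop; not the library's actual list.
TITLES = {"mr", "mrs", "ms", "dr"}


def sketch_fingerprint(name: str) -> str:
    """Illustrative fingerprint: lowercase, strip accents and punctuation,
    drop titles, then join the unique tokens in sorted order."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase and replace punctuation with spaces.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    # Keep each token once, remove titles, and sort for a stable order.
    tokens = sorted(set(text.split()) - TITLES)
    return " ".join(tokens)


print(sketch_fingerprint("Mr. Sherlock Holmes"))  # -> holmes sherlock
print(sketch_fingerprint("New York, New York"))   # -> new york
```

Unlike the library, this sketch does not abbreviate company legal forms, so `Siemens Aktiengesellschaft` would come out as `aktiengesellschaft siemens` rather than `ag siemens`.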

## Company type names

A significant part of what `fingerprints` does is to recognize company legal form
names. For example, `fingerprints` can simplify `Общество с ограниченной ответственностью` to `ООО`, or `Aktiengesellschaft` to `AG`. The required database
is based on two different sources:

* A [Google Spreadsheet](https://docs.google.com/spreadsheets/d/1Cw2xQ3hcZOAgnnzejlY5Sv3OeMxKePTqcRhXQU8rCAw/edit?ts=5e7754cf#gid=0) created by OCCRP.
* The ISO 20275 [Entity Legal Forms Code List](https://www.gleif.org/en/about-lei/code-lists/iso-20275-entity-legal-forms-code-list).

Wikipedia also maintains an index of [types of business entity](https://en.wikipedia.org/wiki/Types_of_business_entity).
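
To make the legal form simplification described in this section concrete, here is a small sketch of a lookup-based replacement. It assumes a tiny hand-written mapping (`LEGAL_FORMS`) and a hypothetical helper (`simplify_legal_forms`); the real library ships a much larger database and may work differently. The sketch lowercases its output, so `ООО` appears as `ооо`.

```python
import re

# Hypothetical three-entry sample; the real database is far larger.
LEGAL_FORMS = {
    "aktiengesellschaft": "ag",
    "общество с ограниченной ответственностью": "ооо",
    "limited liability company": "llc",
}

# Try longer names first so multi-word forms win over shorter overlaps.
_ALTERNATIVES = "|".join(
    re.escape(form) for form in sorted(LEGAL_FORMS, key=len, reverse=True)
)
_PATTERN = re.compile(r"\b(" + _ALTERNATIVES + r")\b")


def simplify_legal_forms(name: str) -> str:
    """Replace known long legal form names with their short form."""
    # Lowercase and collapse whitespace so the lookup keys match.
    text = " ".join(name.lower().split())
    return _PATTERN.sub(lambda match: LEGAL_FORMS[match.group(1)], text)


print(simplify_legal_forms("Siemens Aktiengesellschaft"))  # -> siemens ag
```

Sorting the alternatives by length is a design choice that keeps a long form from being shadowed by a shorter entry that happens to be a prefix of it.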

## See also

* [Clustering in Depth](https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth), part of the OpenRefine documentation discussing how to create collisions in data clustering.
* [probablepeople](https://github.com/datamade/probablepeople), a parser for Western names made by the brilliant folks at datamade.us.


            

Raw data

{
    "_id": null,
    "home_page": "http://github.com/alephdata/fingerprints",
    "name": "fingerprints",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "names people companies normalisation iso20275",
    "author": "Friedrich Lindenberg",
    "author_email": "friedrich@pudo.org",
    "download_url": "https://files.pythonhosted.org/packages/cb/17/292aab0190d8c80647ad0961c3fb9830016541b3d54fa4a67b5327f4e922/fingerprints-1.2.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A library to generate entity fingerprints.",
    "version": "1.2.3",
    "project_urls": {
        "Homepage": "http://github.com/alephdata/fingerprints"
    },
    "split_keywords": [
        "names",
        "people",
        "companies",
        "normalisation",
        "iso20275"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7d2b24a2675458df250e144174b0d18d70ee031eed5c108256200a68aaf087f9",
                "md5": "1cbf3b18cc050d65ad47eb17cac0a0e0",
                "sha256": "b8f83ad13dcdadce94903383db3b9b062b85a3a86f54f9e26d8faa97313f20bf"
            },
            "downloads": -1,
            "filename": "fingerprints-1.2.3-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1cbf3b18cc050d65ad47eb17cac0a0e0",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": null,
            "size": 17125,
            "upload_time": "2023-10-06T06:07:34",
            "upload_time_iso_8601": "2023-10-06T06:07:34.226507Z",
            "url": "https://files.pythonhosted.org/packages/7d/2b/24a2675458df250e144174b0d18d70ee031eed5c108256200a68aaf087f9/fingerprints-1.2.3-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "cb17292aab0190d8c80647ad0961c3fb9830016541b3d54fa4a67b5327f4e922",
                "md5": "fe48d853531d972371d9e1bb1879182d",
                "sha256": "1719f808ec8dd6c7b32c79129be3cc77dc2d2258008cd0236654862a86a78b97"
            },
            "downloads": -1,
            "filename": "fingerprints-1.2.3.tar.gz",
            "has_sig": false,
            "md5_digest": "fe48d853531d972371d9e1bb1879182d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 16315,
            "upload_time": "2023-10-06T06:07:35",
            "upload_time_iso_8601": "2023-10-06T06:07:35.950687Z",
            "url": "https://files.pythonhosted.org/packages/cb/17/292aab0190d8c80647ad0961c3fb9830016541b3d54fa4a67b5327f4e922/fingerprints-1.2.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-10-06 06:07:35",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "alephdata",
    "github_project": "fingerprints",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "fingerprints"
}
        