parsernaam

Name	parsernaam
Version	0.0.4
home_page	https://github.com/appeler/parsernaam
Summary	Name parser
upload_time	2023-10-11 20:10:48
author	Rajashekar Chintalapati, Gaurav Sood
license	MIT
keywords	parse names
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

Parsernaam: Predict First and Last Name
-----------------------------------------

.. image:: https://github.com/appeler/parsernaam/actions/workflows/python-package.yml/badge.svg
    :target: https://github.com/appeler/parsernaam/actions?query=workflow%3A%22Python+package%22
.. image:: https://img.shields.io/pypi/v/parsernaam.svg
    :target: https://pypi.python.org/pypi/parsernaam
.. image:: https://static.pepy.tech/badge/parsernaam
    :target: https://pepy.tech/project/parsernaam

Most common name parsers rely on crude pattern matching and word order (e.g., treating the last word as the last name). This approach is limited and fragile, especially for Indian names. We take a machine-learning approach instead: using large voter registration datasets from India and the US, we build name parsers that predict whether a string is a first or a last name.
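
For contrast, here is a minimal sketch of the positional rule described above (the last word is the last name). ``naive_parse`` is a hypothetical helper written only for illustration; it is not part of parsernaam.

.. code-block:: python

    def naive_parse(name: str) -> dict:
        """Hypothetical positional parser: last word = last name, the rest = first name."""
        words = name.split()
        if not words:
            return {"first": "", "last": ""}
        if len(words) == 1:
            # A single word is always treated as a first name, whatever it is.
            return {"first": words[0], "last": ""}
        return {"first": " ".join(words[:-1]), "last": words[-1]}


    for name in ["John Smith", "Kim Yeon", "Petersen"]:
        print(name, "->", naive_parse(name))

The rule handles "John Smith", but it reads "Kim Yeon" as first name "Kim" and treats the single word "Petersen" as a first name. In the example output under General API, parsernaam labels these ``last_first`` and ``last``.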

For Indian electoral rolls, we assume the last name is the word in a name that is shared by multiple members of the same family. (We defer support for compound last names, which are extremely rare in India, to a later iteration.)
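
A minimal sketch of that labeling rule, assuming a household is given as a list of member names split on whitespace. ``shared_last_name`` is a hypothetical helper, not a parsernaam function, and the labeling actually used to build the training data may differ.

.. code-block:: python

    from collections import Counter
    from typing import List, Optional


    def shared_last_name(household: List[str]) -> Optional[str]:
        """Hypothetical helper: return the word shared by the most household members.

        Returns None if no word appears in at least two names, or if two words
        are equally shared (ambiguous).
        """
        counts: Counter = Counter()
        for name in household:
            # Count each distinct word once per person, so a word repeated
            # inside a single name is not mistaken for a shared one.
            counts.update(set(name.split()))
        top = counts.most_common(2)
        if not top or top[0][1] < 2:
            return None  # no word is shared by two or more members
        if len(top) > 1 and top[1][1] == top[0][1]:
            return None  # tie between equally shared words: leave unlabeled
        return top[0][0]


    print(shared_last_name(["Ramesh Patel", "Sunita Patel", "Amit Patel"]))  # Patel

Returning ``None`` on ties is a conservative choice for this sketch: a household where two words are equally shared (for example, a common middle name alongside the surname) would need an additional rule.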

Gradio App
----------
An interactive Gradio demo is hosted on Hugging Face Spaces: `parsernaam on HF <https://huggingface.co/spaces/sixtyfold/parsernaam>`_.

Installation
------------
.. code-block:: bash

    pip install parsernaam

General API
-----------

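``ParseNames.parse`` takes one positional argument, ``df``: a pandas DataFrame with the names to parse in a column called ``name``. It returns the DataFrame with an added ``parsed_name`` column.

.. code-block:: python

    # Import the libraries
    import pandas as pd
    from parsernaam.parsernaam import ParseNames

    # Example: the names to parse go in a column called 'name'
    df = pd.DataFrame({'name': ['Jan', 'Nicholas Turner', 'Petersen', 'Nichols Richard', 'Piet',
                                'John Smith', 'Janssen', 'Kim Yeon']})
    df = ParseNames.parse(df)

    # DataFrame.to_markdown() requires the optional 'tabulate' package
    print(df.to_markdown())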

This prints::

    |    | name            | parsed_name                                                                   |
    |---:|:----------------|:------------------------------------------------------------------------------|
    |  0 | Jan             | {'name': 'Jan', 'type': 'first', 'prob': 0.6769440174102783}                  |
    |  1 | Nicholas Turner | {'name': 'Nicholas Turner', 'type': 'first_last', 'prob': 0.9990382194519043} |
    |  2 | Petersen        | {'name': 'Petersen', 'type': 'last', 'prob': 0.5342262387275696}              |
    |  3 | Nichols Richard | {'name': 'Nichols Richard', 'type': 'last_first', 'prob': 0.9998832941055298} |
    |  4 | Piet            | {'name': 'Piet', 'type': 'first', 'prob': 0.5381495952606201}                 |
    |  5 | John Smith      | {'name': 'John Smith', 'type': 'first_last', 'prob': 0.9975730776786804}      |
    |  6 | Janssen         | {'name': 'Janssen', 'type': 'first', 'prob': 0.5929554104804993}              |
    |  7 | Kim Yeon        | {'name': 'Kim Yeon', 'type': 'last_first', 'prob': 0.9987115859985352}        |
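
Each ``parsed_name`` value in the output above is a dictionary with ``name``, ``type``, and ``prob`` keys, where ``type`` takes values such as ``first``, ``last``, ``first_last``, and ``last_first``. The sketch below turns such a dictionary into separate first- and last-name fields and flags low-confidence predictions. ``split_parsed`` and its 0.6 review threshold are hypothetical choices, not part of parsernaam, and it assumes two-word names split on whitespace. It uses dictionaries copied from the output above, so it runs without the model.

.. code-block:: python

    from typing import Optional


    def split_parsed(parsed: dict, min_prob: float = 0.6) -> dict:
        """Hypothetical helper: map one parsed_name dict to first/last fields.

        Assumes two-word names for the 'first_last' and 'last_first' types.
        """
        words = parsed["name"].split()
        first: Optional[str] = None
        last: Optional[str] = None
        kind = parsed["type"]
        if kind == "first":
            first = parsed["name"]
        elif kind == "last":
            last = parsed["name"]
        elif kind == "first_last":
            first, last = words[0], " ".join(words[1:])
        elif kind == "last_first":
            last, first = words[0], " ".join(words[1:])
        # Flag predictions below the (hypothetical) threshold for manual review.
        return {"first": first, "last": last, "review": parsed["prob"] < min_prob}


    rows = [
        {'name': 'Nicholas Turner', 'type': 'first_last', 'prob': 0.9990382194519043},
        {'name': 'Kim Yeon', 'type': 'last_first', 'prob': 0.9987115859985352},
        {'name': 'Petersen', 'type': 'last', 'prob': 0.5342262387275696},
    ]
    for row in rows:
        print(split_parsed(row))

On a DataFrame returned by ``ParseNames.parse``, the same helper could be applied row-wise, e.g. ``pd.DataFrame(df['parsed_name'].map(split_parsed).tolist(), index=df.index)``, assuming ``parsed_name`` holds plain dictionaries as the printed output suggests.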


Data
----

The model is trained on names from the Florida Voter Registration Data from early 2022.
The data are available on the `Harvard Dataverse <http://dx.doi.org/10.7910/DVN/UBIG3F>`__.


Authors
-------

Rajashekar Chintalapati and Gaurav Sood

Contributing
------------

Contributions are welcome. Please open an issue if you find a bug or have a feature request.

License
-------

The package is released under the `MIT License <https://opensource.org/licenses/MIT>`_.
