# Parsernaam: ML-Assisted Name Parser
Most name parsers rely on crude pattern matching and word order, e.g.,
treating the last word as the last name. This approach is limited and
fragile, especially for Indian names. We take a machine-learning
approach to the problem. Using large voter registration datasets from
India and the US, we build machine-learning-based name parsers that
predict whether a string is a first or last name.
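To make the limitation concrete, here is a minimal sketch of the positional rule described above. It is an illustrative baseline only, not part of parsernaam, and `naive_parse` is a hypothetical helper name.

```python
from typing import Dict


def naive_parse(full_name: str) -> Dict[str, str]:
    """Positional rule: the last word is the last name, the rest is the first name."""
    words = full_name.split()
    if not words:
        return {"first": "", "last": ""}
    if len(words) == 1:
        # A single token is ambiguous; the rule cannot tell first from last.
        return {"first": words[0], "last": ""}
    return {"first": " ".join(words[:-1]), "last": words[-1]}


for name in ["John Smith", "Kim Yeon", "Nichols Richard"]:
    print(name, "->", naive_parse(name))
```

The rule always puts the final word in the `last` slot, so it cannot represent names written in last-first order, and it has nothing to say about single-word names.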
For Indian electoral rolls, we assume the last name is the word in the
name that is shared by multiple family members. (We table the expansion
to include compound last names, which are extremely rare in India, till
the next iteration.)
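The shared-word assumption can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' labeling code: `shared_last_name` is a hypothetical helper, the household names are made up, and matching is done on lowercase whitespace-separated words.

```python
from collections import Counter
from typing import List, Optional


def shared_last_name(family_names: List[str]) -> Optional[str]:
    """Return the word that appears in the most family members' names.

    Returns None unless at least two members share the word.
    """
    counts = Counter()
    for name in family_names:
        # Count each word once per person, ignoring case.
        counts.update({word.lower() for word in name.split()})
    if not counts:
        return None
    word, n_members = counts.most_common(1)[0]
    return word if n_members >= 2 else None


# Made-up household for illustration; note the shared word is not always last.
household = ["Ramesh Kumar Sharma", "Sunita Sharma", "Sharma Anil"]
print(shared_last_name(household))
```

Because the shared word is found by counting rather than by position, the label does not depend on where the last name sits in the string.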
# Gradio App
[parsernaam on HF](https://huggingface.co/spaces/sixtyfold/parsernaam)
# Installation
```bash
pip install parsernaam
```
# Usage
## Python API
```python
import pandas as pd
from parsernaam.parse import ParseNames
# Create DataFrame with names to parse
df = pd.DataFrame({'name': ['Jan', 'Nicholas Turner', 'Petersen', 'Nichols Richard', 'Piet',
                            'John Smith', 'Janssen', 'Kim Yeon']})
# Parse names using ML models
results = ParseNames.parse(df)
print(results.to_markdown())
```
**Output:**
```
| | name | parsed_name |
|---:|:----------------|:------------------------------------------------------------------------------|
| 0 | Jan | {'name': 'Jan', 'type': 'first', 'prob': 0.677} |
| 1 | Nicholas Turner | {'name': 'Nicholas Turner', 'type': 'first_last', 'prob': 0.999} |
| 2 | Petersen | {'name': 'Petersen', 'type': 'last', 'prob': 0.534} |
| 3 | Nichols Richard | {'name': 'Nichols Richard', 'type': 'last_first', 'prob': 0.999} |
| 4 | Piet | {'name': 'Piet', 'type': 'first', 'prob': 0.538} |
| 5 | John Smith | {'name': 'John Smith', 'type': 'first_last', 'prob': 0.997} |
| 6 | Janssen | {'name': 'Janssen', 'type': 'first', 'prob': 0.593} |
| 7 | Kim Yeon | {'name': 'Kim Yeon', 'type': 'last_first', 'prob': 0.999} |
```
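The `parsed_name` column is awkward to filter on directly, so it can help to expand it into separate columns. The sketch below assumes the column holds Python dicts, as the printed output suggests. It uses a hand-built stand-in DataFrame (three rows copied from the output above) so it runs without the package, and the 0.9 cutoff is an arbitrary illustration.

```python
import pandas as pd

# Stand-in for the DataFrame returned by ParseNames.parse (assumed shape).
results = pd.DataFrame({
    "name": ["Nicholas Turner", "Petersen", "Kim Yeon"],
    "parsed_name": [
        {"name": "Nicholas Turner", "type": "first_last", "prob": 0.999},
        {"name": "Petersen", "type": "last", "prob": 0.534},
        {"name": "Kim Yeon", "type": "last_first", "prob": 0.999},
    ],
})

# Expand each dict into its own columns.
parsed = pd.json_normalize(results["parsed_name"].tolist())
flat = results[["name"]].join(parsed[["type", "prob"]])

# Flag low-confidence predictions; the cutoff is an arbitrary choice.
flat["confident"] = flat["prob"] >= 0.9
print(flat)
```

Single-word names such as `Petersen` tend to get lower probabilities in the output above, so a cutoff like this is one way to route them for manual review.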
## Command Line Interface
```bash
parse_names input.csv -o output.csv -n name_column
```
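An input file for the command above can be prepared with pandas. This sketch assumes that `-n` takes the header of the column holding the names and that `-o` is the output path, which is how the example reads. The file and column names simply mirror the command.

```python
import pandas as pd

# Column header matches the value passed to -n in the command above (assumed meaning).
names = pd.DataFrame({"name_column": ["John Smith", "Kim Yeon", "Petersen"]})

# Write without the index so the CSV contains only the name column.
names.to_csv("input.csv", index=False)
```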
## Features
- **Machine Learning Based**: Uses LSTM neural networks trained on voter registration data
- **Multi-language Support**: Handles Indian, Western, and other international name patterns
- **Confidence Scores**: Every prediction carries a probability (`prob`), so low-confidence parses can be filtered or reviewed
- **Performance Optimized**: Model caching and batch processing support
- **Robust Error Handling**: Handles edge cases like empty names, special characters, etc.
# Data
The model is trained on names from the Florida Voter Registration Data
from early 2022. The data are available on the [Harvard
Dataverse](http://dx.doi.org/10.7910/DVN/UBIG3F).
# Authors
Rajashekar Chintalapati and Gaurav Sood
# Contributing
Contributions are welcome. Please open an issue if you find a bug or
have a feature request.
## 🔗 Adjacent Repositories
- [appeler/naamkaran](https://github.com/appeler/naamkaran) — generative model for names
- [appeler/ethnicolr2](https://github.com/appeler/ethnicolr2) — Ethnicolr implementation with new models in pytorch
- [appeler/namesexdata](https://github.com/appeler/namesexdata) — Data on international first names and sex of people with that name
- [appeler/pranaam](https://github.com/appeler/pranaam) — pranaam: predict religion based on name
- [appeler/graphic_names](https://github.com/appeler/graphic_names) — Infer the gender of a person with a particular first name using Google image search and Clarifai
# License
The package is released under the [MIT
License](https://opensource.org/licenses/MIT).