pynysiis
=========

.. image:: https://badge.fury.io/py/pynysiis.svg
   :target: https://badge.fury.io/py/pynysiis
   :alt: NYSIIS Python Package Version

The ``pynysiis`` package provides a Python implementation of the `New York State Identification and Intelligence System`_ (NYSIIS) phonetic encoding algorithm. NYSIIS maps a name to a short code based on how it sounds, so spelling variants that sound alike can receive the same code. This makes it useful for name matching and searching, where an exact string comparison would miss such variants.

.. _New York State Identification and Intelligence System: https://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System
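
To make the idea concrete, here is a minimal, self-contained sketch of the classic NYSIIS rules as they are commonly summarised (for example in the article linked above). It is not the ``pynysiis`` source code, and the package may apply different or additional rules, so its codes can differ from this sketch. ``nysiis_sketch`` and its ``max_length`` parameter are illustrative names. Two assumptions are baked in: non-letters (spaces, hyphens, apostrophes) are discarded before encoding, and the end of a name counts as a non-vowel in the ``H`` rule.

.. code-block:: python

    import re
    from typing import List, Optional

    # A frozenset, so that the empty string is never treated as a vowel.
    VOWELS = frozenset("AEIOU")

    PREFIXES = (("MAC", "MCC"), ("KN", "NN"), ("K", "C"),
                ("PH", "FF"), ("PF", "FF"), ("SCH", "SSS"))
    SUFFIXES = (("EE", "Y"), ("IE", "Y"), ("DT", "D"), ("RT", "D"),
                ("RD", "D"), ("NT", "D"), ("ND", "D"))


    def nysiis_sketch(name: str, max_length: Optional[int] = 6) -> str:
        """Classic NYSIIS key for ``name`` (illustration, not pynysiis code)."""
        # Upper-case and keep only A-Z; spaces and punctuation are dropped.
        word = re.sub(r"[^A-Z]", "", name.upper())
        if not word:
            return ""

        # Step 1: translate a leading prefix (first match wins).
        for old, new in PREFIXES:
            if word.startswith(old):
                word = new + word[len(old):]
                break

        # Step 2: translate a trailing suffix (first match wins).
        for old, new in SUFFIXES:
            if word.endswith(old):
                word = word[:-len(old)] + new
                break

        chars: List[str] = list(word)
        key = [chars[0]]  # Step 3: the first letter is kept unchanged.

        # Step 4: translate the remaining letters left to right. `chars` is
        # edited in place, so later rules see earlier results (vowels -> "A").
        for i in range(1, len(chars)):
            c = chars[i]
            nxt = chars[i + 1] if i + 1 < len(chars) else ""
            if c == "E" and nxt == "V":
                chars[i:i + 2] = ["A", "F"]
            elif c in VOWELS:
                chars[i] = "A"
            elif c == "Q":
                chars[i] = "G"
            elif c == "Z":
                chars[i] = "S"
            elif c == "M":
                chars[i] = "N"
            elif c == "K":
                chars[i] = "N" if nxt == "N" else "C"
            elif c == "S" and chars[i + 1:i + 3] == ["C", "H"]:
                chars[i:i + 3] = ["S", "S", "S"]
            elif c == "P" and nxt == "H":
                chars[i:i + 2] = ["F", "F"]
            elif c == "H" and (chars[i - 1] not in VOWELS or nxt not in VOWELS):
                chars[i] = chars[i - 1]  # end of name counts as a non-vowel
            elif c == "W" and chars[i - 1] in VOWELS:
                chars[i] = "A"
            # Append the (possibly translated) letter unless it repeats the last one.
            if chars[i] != key[-1]:
                key.append(chars[i])

        code = "".join(key)
        # Steps 5-7: drop a trailing S, turn a trailing AY into Y, drop a trailing A.
        if len(code) > 1 and code.endswith("S"):
            code = code[:-1]
        if len(code) > 2 and code.endswith("AY"):
            code = code[:-2] + "Y"
        if len(code) > 1 and code.endswith("A"):
            code = code[:-1]
        # Classic NYSIIS truncates to six characters; None keeps the full key.
        return code if max_length is None else code[:max_length]

For ``Watkins`` the sketch yields ``WATCAN``, and for ``Robert Johnson`` it yields ``RABART``. Passing ``max_length=None`` keeps the full key ``RABARTJANSAN``, which shows how truncation can discard the second part of a full name.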

Requirements
-------------

Python 3.8 or later.

Setup
------

Install the package from PyPI with pip:

.. code-block:: bash

    $ pip install pynysiis

``easy_install`` is deprecated in favour of pip, but it may still work in older environments:

.. code-block:: bash

    $ easy_install pynysiis

Basic Usage
-----------

.. code-block:: python

    from nysiis import NYSIIS

    encoder = NYSIIS()
    name = "Watkins"
    encoded_name = encoder.encode(name)
    print(encoded_name)  # Output: WATCAN

Name Comparison
---------------

.. code-block:: python

    from nysiis import NYSIIS

    encoder = NYSIIS()

    # Compare similar names
    name1 = "John Smith"
    name2 = "John Smyth"

    encoded_name1 = encoder.encode(name1)
    encoded_name2 = encoder.encode(name2)

    if encoded_name1 == encoded_name2:
        print("Names match phonetically")
    else:
        print("Names are phonetically different")

    # Output: Names match phonetically

Multi-Language Support
----------------------

The encoder accepts names from many languages written in the Latin alphabet. Keep in mind that the NYSIIS rules were designed around English pronunciation, so codes for names from other languages are approximations:

.. code-block:: python

    from nysiis import NYSIIS

    encoder = NYSIIS()

    # Sample names from different languages
    names = [
        # English names
        "Watkins",
        "Robert Johnson",

        # Yoruba name
        "Olanrewaju Akinyele",

        # Igbo name
        "Obinwanne Obiora",

        # Hausa name
        "Abdussalamu Abubakar",

        # Hindi name
        "Virat Kohli",

        # Urdu name
        "Usman Shah",
    ]

    # Process each name
    for name in names:
        encoded_name = encoder.encode(name)
        print(f"{name:<20} -> {encoded_name}")

    # Output:
    # Watkins              -> WATCAN
    # Robert Johnson       -> RABART
    # Olanrewaju Akinyele  -> OLANRA
    # Obinwanne Obiora     -> OBAWAN
    # Abdussalamu Abubakar -> ABDASA
    # Virat Kohli          -> VARATC
    # Usman Shah           -> USNANS
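
Yoruba and Igbo names are often written with tone marks and underdots (for example *Ọlánréwájú*). How ``pynysiis`` treats such characters is not stated here, and the classic NYSIIS rules are defined over the letters A to Z, so one cautious option is to fold diacritics away before encoding. This sketch uses only the standard-library ``unicodedata`` module; ``strip_accents`` is a hypothetical helper, not part of ``pynysiis``.

.. code-block:: python

    import unicodedata


    def strip_accents(name: str) -> str:
        """Remove combining marks, so that e.g. 'é' becomes 'e'."""
        # NFKD splits an accented letter into its base letter plus
        # one or more combining marks (e.g. 'ọ' -> 'o' + U+0323).
        decomposed = unicodedata.normalize("NFKD", name)
        # Keep every character that is not a combining mark.
        return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

You would then call ``encoder.encode(strip_accents(name))``. Letters that have no decomposed form, such as ``ø``, ``ß`` or ``ł``, pass through unchanged and may need an explicit mapping.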

Common Use Cases
----------------

Database Search Optimisation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from nysiis import NYSIIS

    def find_similar_names(search_name, database_names):
        """Return the names whose NYSIIS code equals that of search_name."""
        encoder = NYSIIS()
        search_code = encoder.encode(search_name)

        # Note: every stored name is re-encoded on each call.
        matches = [
            name for name in database_names
            if encoder.encode(name) == search_code
        ]
        return matches
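
``find_similar_names`` re-encodes every stored name on each search, which costs one ``encode`` call per record per query. If the same list is searched repeatedly, a possible alternative is to encode it once and group names by code, so each lookup becomes a single dictionary access. This is a sketch under that assumption: ``build_phonetic_index`` and ``lookup`` are hypothetical names, and ``encode`` is any callable that maps a name to a code (for instance ``NYSIIS().encode``).

.. code-block:: python

    from collections import defaultdict
    from typing import Callable, Dict, Iterable, List


    def build_phonetic_index(
        names: Iterable[str], encode: Callable[[str], str]
    ) -> Dict[str, List[str]]:
        """Encode each name once and group the names by their code."""
        index: Dict[str, List[str]] = defaultdict(list)
        for name in names:
            index[encode(name)].append(name)
        return dict(index)


    def lookup(
        index: Dict[str, List[str]], search_name: str, encode: Callable[[str], str]
    ) -> List[str]:
        """Return the names sharing search_name's code, or [] if there are none."""
        return index.get(encode(search_name), [])

In a relational database, one common way to apply the same idea is to store the code in its own indexed column, computed when each row is written.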

Name Deduplication
~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from nysiis import NYSIIS

    def find_duplicates(names):
        """Group names by NYSIIS code, keeping only codes shared by 2+ names."""
        encoder = NYSIIS()
        encoded_names = {}

        for name in names:
            code = encoder.encode(name)
            encoded_names.setdefault(code, []).append(name)

        return {
            code: group
            for code, group in encoded_names.items()
            if len(group) > 1
        }

Fuzzy Name Matching
~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from nysiis import NYSIIS

    def match_names(name1, name2, encoder=None):
        """Return True if both names produce the same NYSIIS code."""
        if encoder is None:
            encoder = NYSIIS()

        return encoder.encode(name1) == encoder.encode(name2)
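
The sample output in Multi-Language Support (``Robert Johnson -> RABART``) suggests that the code for a full name can be determined by its first few letters, so two people who share a first name but not a surname might compare as equal. If that matters for your data, one option is to compare each part of the name separately. In this sketch ``match_full_names`` is a hypothetical helper and ``encode`` is any name-to-code callable, such as ``NYSIIS().encode``.

.. code-block:: python

    from typing import Callable


    def match_full_names(name1: str, name2: str, encode: Callable[[str], str]) -> bool:
        """True if both names have the same number of parts and every part matches."""
        parts1, parts2 = name1.split(), name2.split()
        if not parts1 or len(parts1) != len(parts2):
            return False
        # Encode each part on its own, so a surname difference cannot be
        # hidden by truncating a single code built from the whole string.
        return all(encode(a) == encode(b) for a, b in zip(parts1, parts2))

Requiring the same number of parts is a deliberate simplification; names with optional middle names would need a looser rule.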

Best Practices
--------------

Reuse the Encoder Instance
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from nysiis import NYSIIS

    large_name_list = ["Watkins", "Smith", "Smyth"]  # placeholder data

    # Good: create the encoder once and reuse it
    encoder = NYSIIS()
    for name in large_name_list:
        encoded = encoder.encode(name)

    # Less efficient: a new instance is created on every iteration
    for name in large_name_list:
        encoded = NYSIIS().encode(name)
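
When the same names recur often (common surnames, for example), caching the codes avoids encoding identical strings again. This sketch wraps any encode callable with the standard-library ``functools.lru_cache``; ``make_cached_encoder`` is a hypothetical helper, and it assumes ``encode`` always returns the same code for the same input.

.. code-block:: python

    from functools import lru_cache
    from typing import Callable


    def make_cached_encoder(
        encode: Callable[[str], str], maxsize: int = 100_000
    ) -> Callable[[str], str]:
        """Wrap encode so that each distinct name is encoded only once."""

        @lru_cache(maxsize=maxsize)
        def cached(name: str) -> str:
            # Only reached on a cache miss.
            return encode(name)

        return cached

Usage would look like ``encode = make_cached_encoder(NYSIIS().encode)``; the returned function also exposes ``cache_info()`` for checking the hit rate.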
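
Handle Empty Inputs
~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from nysiis import NYSIIS

    encoder = NYSIIS()  # shared instance, as recommended above

    def process_name(name):
        """Return the NYSIIS code, or None for empty or whitespace-only input."""
        if not name or not name.strip():
            return None
        return encoder.encode(name)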
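
``process_name`` rejects empty and whitespace-only strings, but input such as ``"1234"`` or ``" - "`` still reaches the encoder. A slightly stricter pre-processing step is sketched here; ``prepare_name`` is a hypothetical helper, and its choices (hyphens become spaces, other punctuation and digits are dropped) are assumptions you may want to change.

.. code-block:: python

    import re
    from typing import Optional


    def prepare_name(name: Optional[str]) -> Optional[str]:
        """Return a cleaned name, or None if no letters remain."""
        if name is None:
            return None
        cleaned = name.replace("-", " ")                 # split hyphenated names
        cleaned = re.sub(r"[^\w\s]|[\d_]", "", cleaned)  # drop punctuation and digits
        cleaned = " ".join(cleaned.split())              # collapse runs of whitespace
        return cleaned or None

A caller could then compute ``cleaned = prepare_name(name)`` and call ``encoder.encode(cleaned)`` only when ``cleaned`` is not ``None``.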

Case Sensitivity
~~~~~~~~~~~~~~~~

.. code-block:: python

    from nysiis import NYSIIS

    # The encoder handles case automatically
    encoder = NYSIIS()
    print(encoder.encode("smith"))  # Same as "SMITH"
    print(encoder.encode("SMITH"))  # Same result

Reference
----------

.. code-block:: bibtex

    @inproceedings{Rajkovic2007,
      author    = {Petar Rajkovic and Dragan Jankovic},
      title     = {Adaptation and Application of Daitch-Mokotoff Soundex Algorithm on Serbian Names},
      booktitle = {XVII Conference on Applied Mathematics},
      editor    = {D. Herceg and H. Zarin},
      pages     = {193--204},
      year      = {2007},
      publisher = {Department of Mathematics and Informatics, Novi Sad},
      url       = {https://jmp.sh/hukNujCG}
    }

Additional References
----------------------

+ `Commission Implementing Regulation (EU) 2016/480`_
+ `Commission Implementing Regulation (EU) 2023/2381`_

.. _Commission Implementing Regulation (EU) 2016/480: https://www.legislation.gov.uk/eur/2016/480/contents
.. _Commission Implementing Regulation (EU) 2023/2381: https://eur-lex.europa.eu/eli/reg_impl/2023/2381/oj

License
--------

This project is licensed under the `MIT License`_.

.. _MIT License: https://gist.github.com/0xnu/d11da49c85eeb7272517a9010bbdf1ab

Copyright
----------

Copyright |copy| 2024 `Finbarrs Oketunji`_. All Rights Reserved.

.. |copy| unicode:: 0xA9 .. copyright sign
.. _Finbarrs Oketunji: https://finbarrs.eu/