pynysiis
========

:Version: 1.0.7
:Date: 2024-11-19
:Author: Finbarrs Oketunji
:License: MIT
:Source: https://github.com/0xnu/nysiis

.. image:: https://badge.fury.io/py/pynysiis.svg
    :target: https://badge.fury.io/py/pynysiis
    :alt: NYSIIS Python Package Version


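The ``pynysiis`` package provides a Python implementation of the `New York State Identification and Intelligence System`_ (NYSIIS) phonetic encoding algorithm. NYSIIS reduces a name to a short code based on how it sounds, so spelling variants that are pronounced alike tend to receive the same code. This makes it useful for name matching and search, where an exact string comparison would miss such variants.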

.. _New York State Identification and Intelligence System: https://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System
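
To make the description above concrete, here is a minimal pure-Python sketch of the classic NYSIIS rules as commonly published: prefix and suffix rewrites, vowel folding, consonant substitutions, then trimming a trailing S or A and truncating to six characters. It is an illustrative assumption, not the ``pynysiis`` source. ``nysiis_sketch`` is a hypothetical name, it encodes a single word, and the package's handling of multi-word names, non-letters, and edge cases may differ.

.. code-block:: python

    VOWELS = set("AEIOU")

    # Classic NYSIIS prefix and suffix rewrites (first match wins)
    PREFIXES = (("MAC", "MCC"), ("KN", "NN"), ("K", "C"),
                ("PH", "FF"), ("PF", "FF"), ("SCH", "SSS"))
    SUFFIXES = (("EE", "Y"), ("IE", "Y"), ("DT", "D"), ("RT", "D"),
                ("RD", "D"), ("NT", "D"), ("ND", "D"))


    def nysiis_sketch(name: str, max_len: int = 6) -> str:
        """Encode one word with a simplified version of classic NYSIIS."""
        # Keep ASCII letters only; upper-casing makes the result case-insensitive
        word = "".join(ch for ch in name.upper() if "A" <= ch <= "Z")
        if not word:
            return ""
        for old, new in PREFIXES:
            if word.startswith(old):
                word = new + word[len(old):]
                break
        for old, new in SUFFIXES:
            if word.endswith(old):
                word = word[: -len(old)] + new
                break

        chars = list(word)
        key = [chars[0]]  # the first letter is kept as-is
        for i in range(1, len(chars)):
            c, prev = chars[i], chars[i - 1]
            nxt = chars[i + 1] if i + 1 < len(chars) else ""
            if c == "E" and nxt == "V":
                chars[i], chars[i + 1] = "A", "F"  # EV -> AF
            elif c in VOWELS:
                chars[i] = "A"
            elif c in "QZM":
                chars[i] = {"Q": "G", "Z": "S", "M": "N"}[c]
            elif c == "K":
                chars[i] = "N" if nxt == "N" else "C"  # KN -> N, else K -> C
            elif c == "S" and chars[i + 1:i + 3] == ["C", "H"]:
                chars[i:i + 3] = ["S", "S", "S"]  # SCH -> SSS
            elif c == "P" and nxt == "H":
                chars[i:i + 2] = ["F", "F"]  # PH -> FF
            elif c == "H" and (prev not in VOWELS or (nxt and nxt not in VOWELS)):
                chars[i] = prev  # H next to a non-vowel takes the previous letter
            elif c == "W" and prev in VOWELS:
                chars[i] = prev  # W after a vowel takes the previous letter
            if chars[i] != key[-1]:  # skip repeated letters
                key.append(chars[i])

        code = "".join(key)
        if len(code) > 1 and code.endswith("S"):
            code = code[:-1]
        if code.endswith("AY"):
            code = code[:-2] + "Y"
        if len(code) > 1 and code.endswith("A"):
            code = code[:-1]
        return code[:max_len] if max_len else code

For "Watkins" this sketch yields ``WATCAN``, the same code shown in the Basic Usage example below; for "Robert" on its own it yields ``RABAD`` because of the ``RT`` suffix rule.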


Requirements
-------------

Python 3.8 and later.


Setup
------

Install the package from PyPI with pip:

.. code-block:: bash

    $ pip install pynysiis

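``easy_install`` is deprecated in setuptools; if your environment still provides it, you can alternatively run:

.. code-block:: bash

    $ easy_install pynysiis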

Basic Usage
-----------

.. code-block:: python

    from nysiis import NYSIIS

    encoder = NYSIIS()
    name = "Watkins"
    encoded_name = encoder.encode(name)
    print(encoded_name)  # Output: WATCAN

Name Comparison
---------------

.. code-block:: python

    from nysiis import NYSIIS

    encoder = NYSIIS()

    # Compare similar names
    name1 = "John Smith"
    name2 = "John Smyth"

    encoded_name1 = encoder.encode(name1)
    encoded_name2 = encoder.encode(name2)

    if encoded_name1 == encoded_name2:
        print("Names match phonetically")
    else:
        print("Names are phonetically different")

    # Output: Names match phonetically
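
Encoding ``"John Smith"`` as one string produces a single code for the whole name. If you want the given name and surname compared independently, one option is to encode each whitespace-separated part on its own and compare the resulting lists. A minimal sketch; ``name_part_codes`` and ``full_names_match`` are hypothetical helpers, and ``encode`` is any callable that maps a string to a code, such as ``NYSIIS().encode``.

.. code-block:: python

    from typing import Callable, List


    def name_part_codes(full_name: str, encode: Callable[[str], str]) -> List[str]:
        """Encode each whitespace-separated part of a name separately."""
        return [encode(part) for part in full_name.split()]


    def full_names_match(a: str, b: str, encode: Callable[[str], str]) -> bool:
        """True if both names have the same number of parts and every part's code agrees."""
        return name_part_codes(a, encode) == name_part_codes(b, encode)

For example, ``full_names_match("John Smith", "Jon Smyth", NYSIIS().encode)`` compares the codes part by part. Requiring equal part counts is a deliberate simplification; names with middle names or reordered parts need a looser rule.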

Multi-Language Support
----------------------

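The encoder can be applied to romanised names from many languages. Because the NYSIIS rules are built around English spelling conventions, the codes reflect how the letters would be read in English rather than each name's native pronunciation: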

.. code-block:: python

    from nysiis import NYSIIS

    encoder = NYSIIS()

    # Sample names from different languages
    names = [
        # English names
        "Watkins",
        "Robert Johnson",
        
        # Yoruba name
        "Olanrewaju Akinyele",
        
        # Igbo name
        "Obinwanne Obiora",
        
        # Hausa name
        "Abdussalamu Abubakar",
        
        # Hindi name
        "Virat Kohli",
        
        # Urdu name
        "Usman Shah"
    ]

    # Process each name
    for name in names:
        encoded_name = encoder.encode(name)
        print(f"{name:<20} -> {encoded_name}")

    # Output:
    # Watkins              -> WATCAN
    # Robert Johnson       -> RABART
    # Olanrewaju Akinyele  -> OLANRA
    # Obinwanne Obiora     -> OBAWAN
    # Abdussalamu Abubakar -> ABDASA
    # Virat Kohli          -> VARATC
    # Usman Shah           -> USNANS
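
Names from these languages are often written with diacritics, for example Yoruba tone marks as in "Ọláńrewájú". This README does not say how the encoder treats accented letters, so one defensive option is to fold them to their base letters before encoding. A minimal sketch using the standard-library ``unicodedata`` module; ``fold_to_ascii`` is a hypothetical helper.

.. code-block:: python

    import unicodedata


    def fold_to_ascii(name: str) -> str:
        """Strip diacritics: decompose with NFKD, then drop combining marks."""
        decomposed = unicodedata.normalize("NFKD", name)
        return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

With this helper, ``fold_to_ascii("Ọláńrewájú")`` returns ``"Olanrewaju"``, which can then be passed to ``encoder.encode``. Letters that have no Unicode decomposition, such as ``ø`` or ``ß``, are left unchanged.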

Common Use Cases
----------------

Database Search Optimisation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

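    from nysiis import NYSIIS

    encoder = NYSIIS()  # create once and reuse across searches


    def find_similar_names(search_name, database_names):
        """Return the names whose NYSIIS code matches that of search_name."""
        search_code = encoder.encode(search_name)
        return [
            name for name in database_names
            if encoder.encode(name) == search_code
        ]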
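
``find_similar_names`` re-encodes every stored name on each search. When the list is large or searched often, a common alternative is to compute each code once, store it next to the name in an indexed column, and encode only the search term at query time. A minimal sketch using the standard-library ``sqlite3`` module; the ``people`` table, its columns, and both function names are hypothetical, and ``encode`` is any string-to-code callable such as ``NYSIIS().encode``.

.. code-block:: python

    import sqlite3
    from typing import Callable, Iterable, List


    def build_name_table(conn: sqlite3.Connection, names: Iterable[str],
                         encode: Callable[[str], str]) -> None:
        """Store each name with its precomputed phonetic code."""
        conn.execute("CREATE TABLE IF NOT EXISTS people (name TEXT NOT NULL, code TEXT NOT NULL)")
        conn.execute("CREATE INDEX IF NOT EXISTS idx_people_code ON people (code)")
        conn.executemany(
            "INSERT INTO people (name, code) VALUES (?, ?)",
            ((name, encode(name)) for name in names),
        )
        conn.commit()


    def find_similar_names_sql(conn: sqlite3.Connection, search_name: str,
                               encode: Callable[[str], str]) -> List[str]:
        """Encode only the search term and let the index find matching rows."""
        rows = conn.execute(
            "SELECT name FROM people WHERE code = ? ORDER BY rowid",
            (encode(search_name),),
        )
        return [row[0] for row in rows]

Typical use: open a connection with ``sqlite3.connect("names.db")``, call ``build_name_table(conn, names, encoder.encode)`` once, then ``find_similar_names_sql(conn, "Smyth", encoder.encode)`` per query. If you upgrade the package, recompute the stored codes, since a new release could change its output.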

Name Deduplication
~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from nysiis import NYSIIS


    def find_duplicates(names):
        """Group names by NYSIIS code; keep only codes shared by two or more names."""
        encoder = NYSIIS()
        groups = {}

        for name in names:
            code = encoder.encode(name)
            groups.setdefault(code, []).append(name)

        return {
            code: group
            for code, group in groups.items()
            if len(group) > 1
        }
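
``find_duplicates`` reports every group of names that share a code. If you instead want a cleaned list that keeps only the first name seen for each code, a minimal sketch follows; ``dedupe_keep_first`` is a hypothetical helper and ``encode`` is any string-to-code callable such as ``NYSIIS().encode``. Phonetic equality is a heuristic, so review what gets dropped before deleting real records.

.. code-block:: python

    from typing import Callable, Iterable, List


    def dedupe_keep_first(names: Iterable[str], encode: Callable[[str], str]) -> List[str]:
        """Return names in input order, skipping any whose code was already seen."""
        seen = set()
        kept = []
        for name in names:
            code = encode(name)
            if code not in seen:
                seen.add(code)
                kept.append(name)
        return kept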

Fuzzy Name Matching
~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from nysiis import NYSIIS


    def match_names(name1, name2, encoder=None):
        """Return True if both names produce the same NYSIIS code."""
        if encoder is None:
            encoder = NYSIIS()
        return encoder.encode(name1) == encoder.encode(name2)
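
``match_names`` is an exact test: two names either share a code or they do not. When a graded score is more useful, for example to rank candidates, one option is to compare the two codes with the standard library's ``difflib.SequenceMatcher``. This is an assumption layered on top of NYSIIS rather than a feature of ``pynysiis``; the helper names are hypothetical, and the 0.8 threshold is an arbitrary starting point to tune on your own data.

.. code-block:: python

    from difflib import SequenceMatcher
    from typing import Callable


    def code_similarity(name1: str, name2: str, encode: Callable[[str], str]) -> float:
        """Similarity in [0, 1] between the phonetic codes of two names."""
        return SequenceMatcher(None, encode(name1), encode(name2)).ratio()


    def names_roughly_match(name1: str, name2: str, encode: Callable[[str], str],
                            threshold: float = 0.8) -> bool:
        """True if the two codes are at least `threshold` similar."""
        return code_similarity(name1, name2, encode) >= threshold

``SequenceMatcher.ratio()`` returns ``2 * M / T``, where ``M`` is the number of matching characters and ``T`` the combined length of both codes, so identical codes score 1.0.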

Best Practices
--------------

Reuse the Encoder Instance
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

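    from nysiis import NYSIIS

    large_name_list = ["Watkins", "Robert Johnson", "Usman Shah"]  # placeholder data

    # Good: create the encoder once and reuse it
    encoder = NYSIIS()
    codes = [encoder.encode(name) for name in large_name_list]

    # Less efficient: constructs a new encoder for every name
    codes = [NYSIIS().encode(name) for name in large_name_list]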
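
When the same names recur many times, as they often do in real records, you can go one step further and memoise the results. A minimal sketch using ``functools.lru_cache``, assuming ``encode`` is deterministic (the same input always yields the same code); ``make_cached_encoder`` is a hypothetical helper.

.. code-block:: python

    from functools import lru_cache
    from typing import Callable


    def make_cached_encoder(encode: Callable[[str], str],
                            maxsize: int = 100_000) -> Callable[[str], str]:
        """Wrap encode so repeated names are served from an LRU cache."""
        @lru_cache(maxsize=maxsize)
        def cached(name: str) -> str:
            return encode(name)
        return cached

For example, ``cached_encode = make_cached_encoder(NYSIIS().encode)``; ``cached_encode.cache_info()`` then reports hits and misses, which shows whether caching pays off for your data.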

Handle Empty Inputs
~~~~~~~~~~~~~~~~~~~

.. code-block:: python

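    from nysiis import NYSIIS

    encoder = NYSIIS()  # shared instance, per the advice above


    def process_name(name):
        """Return the NYSIIS code, or None for empty or whitespace-only input."""
        if not name or not name.strip():
            return None
        return encoder.encode(name)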
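
The check above catches ``None``, empty, and whitespace-only strings. Data pulled from spreadsheets or databases can also contain non-string values or strings with no letters at all. A slightly stricter guard, as a minimal sketch: ``safe_encode`` is a hypothetical helper, and ``encode`` is any string-to-code callable such as ``NYSIIS().encode``. It also collapses repeated whitespace so that ``" Mary  Ann "`` and ``"Mary Ann"`` are encoded identically.

.. code-block:: python

    from typing import Callable, Optional


    def safe_encode(name: object, encode: Callable[[str], str]) -> Optional[str]:
        """Return a code, or None when the input contains no letters to encode."""
        if not isinstance(name, str):
            return None  # e.g. None, or numbers read from a spreadsheet
        cleaned = " ".join(name.split())  # trim and collapse internal whitespace
        if not any(ch.isalpha() for ch in cleaned):
            return None  # e.g. "", "   ", "1234", "--"
        return encode(cleaned)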

Case Sensitivity
~~~~~~~~~~~~~~~~

.. code-block:: python

    from nysiis import NYSIIS

    # The encoder is case-insensitive: both calls print the same code
    encoder = NYSIIS()
    print(encoder.encode("smith"))
    print(encoder.encode("SMITH"))
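
To confirm this for the names in your own data, for example in a unit test, you can check that lower-case, upper-case, and title-case variants all produce the same code. A minimal sketch; ``case_mismatches`` is a hypothetical helper and ``encode`` is any string-to-code callable such as ``NYSIIS().encode``.

.. code-block:: python

    from typing import Callable, Iterable, List


    def case_mismatches(names: Iterable[str], encode: Callable[[str], str]) -> List[str]:
        """Return names whose code changes between lower, upper, and title case."""
        mismatches = []
        for name in names:
            codes = {encode(name.lower()), encode(name.upper()), encode(name.title())}
            if len(codes) > 1:
                mismatches.append(name)
        return mismatches

An empty list from ``case_mismatches(my_names, NYSIIS().encode)``, where ``my_names`` stands for your own list, means no case-dependent codes were found for those inputs.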

Reference
----------

.. code-block:: bibtex

    @inproceedings{Rajkovic2007,
      author    = {Petar Rajkovic and Dragan Jankovic},
      title     = {Adaptation and Application of Daitch-Mokotoff Soundex Algorithm on Serbian Names},
      booktitle = {XVII Conference on Applied Mathematics},
      editor    = {D. Herceg and H. Zarin},
      pages     = {193--204},
      year      = {2007},
      publisher = {Department of Mathematics and Informatics, Novi Sad},
      url       = {https://jmp.sh/hukNujCG}
    }


Additional References
----------------------

+ `Commission Implementing Regulation (EU) 2016/480`_
+ `Commission Implementing Regulation (EU) 2023/2381`_

.. _Commission Implementing Regulation (EU) 2016/480: https://www.legislation.gov.uk/eur/2016/480/contents
.. _Commission Implementing Regulation (EU) 2023/2381: https://eur-lex.europa.eu/eli/reg_impl/2023/2381/oj


License
--------

This project is licensed under the `MIT License`_.

.. _MIT License: https://gist.github.com/0xnu/d11da49c85eeb7272517a9010bbdf1ab


Copyright
----------

Copyright |copy| 2024 `Finbarrs Oketunji`_. All Rights Reserved.

.. |copy| unicode:: 0xA9 .. copyright sign
.. _Finbarrs Oketunji: https://finbarrs.eu/

            
