~~~~~~~~~
Metaphone
~~~~~~~~~

:Version: 0.6
:Author: Andrew Collins
:License: BSD
:Home page: https://github.com/oubiwann/metaphone
:Summary: A Python implementation of the metaphone and double metaphone algorithms.
:Uploaded: 2016-08-24 14:37:29

.. contents::
   :depth: 2
   :backlinks: top
   :local:

About
=====

*A Python implementation of the Metaphone and Double Metaphone algorithms*

Metaphone
---------
As described on the `Wikipedia page`_, the original Metaphone algorithm was
published in 1990 as an improvement over the `Soundex`_ algorithm. Like
Soundex, it was limited to English-only use. The Metaphone algorithm does not
produce an exact phonetic representation of an input word or name; rather, its
output is intentionally approximate. The approximation is necessary to account
for the ways in which speakers vary their pronunciation and misspell or
otherwise vary the words and names they are trying to spell.

Double Metaphone
----------------
The Double Metaphone phonetic encoding algorithm is the second generation of
the Metaphone algorithm. Its implementation was described in the June 2000
issue of C/C++ Users Journal. It makes a number of fundamental design
improvements over the original Metaphone algorithm.

It is called "Double" because it can return both a primary and a secondary code
for a string; this accounts for some ambiguous cases as well as for multiple
variants of surnames with common ancestry. For example, encoding the name
"Smith" yields a primary code of SM0 and a secondary code of XMT, while the
name "Schmidt" yields a primary code of XMT and a secondary code of SMT--both
have XMT in common.
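
The shared-code comparison described above can be sketched as follows. This is
only an illustration: ``share_code`` is a hypothetical helper, not part of this
library, and the tuples are the codes quoted above for "Smith" and "Schmidt",
written out by hand rather than computed here::

  def share_code(first, second):
      """Return True if two (primary, secondary) pairs have a code in common."""
      # Assumption: an empty string means "no code" and must never match.
      first_codes = {code for code in first if code}
      second_codes = {code for code in second if code}
      return bool(first_codes & second_codes)

  # Codes as quoted in the paragraph above (not computed by this sketch).
  smith = ("SM0", "XMT")
  schmidt = ("XMT", "SMT")

  print(share_code(smith, schmidt))         # True
  print(share_code(smith, ("ARKTKT", "")))  # False

Comparing sets of non-empty codes keeps the check symmetric: it does not matter
whether the common code is the primary of one name and the secondary of the
other.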

Double Metaphone tries to account for myriad irregularities in English of
Slavic, Germanic, Celtic, Greek, French, Italian, Spanish, Chinese, and other
origin. Thus it uses a much more complex ruleset for coding than its
predecessor; for example, it tests for approximately 100 different contexts of
the use of the letter C alone.

History
-------
This is a copy of the Python Double Metaphone algorithm, taken from `Andrew
Collins' work`_, a Python implementation of an algorithm in C originally
created by Lawrence Philips. Since then, improvements have been made by several
contributors, viewable in the git history.

A ``resources`` directory is included with this project which contains the
following:

* the original C++ file by Lawrence Philips

* Kevin Atkinson's improvements to it

* a C implementation (for use in a Perl extension) by Maurice Aubrey

The contributors to the Python version, originally started by Andrew Collins,
include:

* Andrew Collins

* Chris Leong

* Matthew Somerville

* Richard Barran

* Maximillian Dornseif

* Sebastien Metrot

* Duncan McGreggor

* Ollie Bennett

* Ian Beaver

* Alastair Houghton

Usage
=====

Running the Unit Tests
----------------------
``metaphone`` uses the ``unittest`` package from the standard library, and as
such, its tests are runnable by most test runners. If you have `nose`_ installed,
you can do the following::

  $ git clone https://github.com/oubiwann/metaphone.git
  $ cd metaphone
  $ nosetests -v .

If you have Twisted installed, you can do::

  $ trial ./metaphone

Example Code
------------

The unit tests are full of examples, so be sure to check those out. But here's
a taste::

  $ python
  >>> from metaphone import doublemetaphone
  >>> doublemetaphone("architect")
  (u'ARKTKT', u'')
  >>> doublemetaphone("bajador")
  (u'PJTR', u'PHTR')
  >>> doublemetaphone("Τι είναι το Unicode;")
  (u'NKT', u'')

In the Wild
===========

The following developers/projects make use of this library:

* `Andrew Collins`_ used his original code in various music projects and for
  dealing with misspelled text in data provided by various web services. This
  was then integrated with Plone/Zope projects.

* `Matthew Somerville`_ uses it on Theatricalia to match people's names, and
  it appears to work `quite well`_. The database stores the double metaphones
  for first and last names; a search then computes the double metaphones of
  what has been entered and looks up anything that matches.

* `Duncan McGreggor`_ uses it on the `φarsk project`_ to provide greater full
  text search capabilities for Indo-European language word lists and
  dictionaries.
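
The store-and-look-up approach described for Theatricalia might be sketched as
below. This is an illustration under stated assumptions, not Theatricalia's
actual code: ``build_index`` and ``search`` are hypothetical helpers, and
``encode`` is any callable returning a ``(primary, secondary)`` tuple, such as
``doublemetaphone``. A hand-written stand-in encoder is used here so that the
sketch runs without the library installed::

  from collections import defaultdict

  def build_index(names, encode):
      """Map every non-empty code to the set of names that produce it."""
      index = defaultdict(set)
      for name in names:
          for code in encode(name):
              if code:  # skip empty codes
                  index[code].add(name)
      return index

  def search(index, query, encode):
      """Return all indexed names sharing at least one code with the query."""
      matches = set()
      for code in encode(query):
          if code:
              matches |= index.get(code, set())
      return matches

  # Stand-in encoder (an assumption for this sketch): it returns the codes
  # quoted earlier for two names and a placeholder for anything else.
  QUOTED_CODES = {"Smith": ("SM0", "XMT"), "Schmidt": ("XMT", "SMT")}

  def fake_encode(name):
      return QUOTED_CODES.get(name, (name.upper(), ""))

  index = build_index(["Smith", "Architect"], fake_encode)
  print(search(index, "Schmidt", fake_encode))  # {'Smith'}

Indexing under both codes means a query matches whenever either of its codes
was stored for a name, at the cost of up to two index entries per name.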

.. Links
.. _Wikipedia page: http://en.wikipedia.org/wiki/Metaphone#Double_Metaphone
.. _Soundex: http://en.wikipedia.org/wiki/Soundex
.. _Andrew Collins' work: http://www.atomodo.com/code/double-metaphone/metaphone.py/view
.. _Andrew Collins: http://www.atomodo.com/
.. _Matthew Somerville: https://github.com/dracos/
.. _Duncan McGreggor: https://github.com/oubiwann/
.. _quite well: http://theatricalia.com/search?q=chuck+iwugee
.. _φarsk project: https://github.com/oubiwann/tharsk
.. _nose: https://nose.readthedocs.org/
            
