pybktree


Namepybktree JSON
Version 1.1 PyPI version JSON
download
home_pagehttps://github.com/Jetsetter/pybktree
SummaryBK-tree data structure to allow fast querying of "close" matches
upload_time2017-08-22 13:38:02
maintainer
docs_urlNone
authorBen Hoyt
requires_python
licenseMIT License
keywords
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            pybktree
========

pybktree is a generic, pure Python implementation of a `BK-tree`_ data
structure, which allows fast querying of "close" matches (for example, matches
with small hamming distance or Levenshtein distance). This module is based on
the algorithm by Nick Johnson in his `blog article on BK-trees`_.

The library is `on the Python Package Index (PyPI)`_ and works on both Python
3 and Python 2.7. To install it, fire up a command prompt, activate your
virtual environment if you're using one, and type:

::

    pip install pybktree

Example usage:

.. code:: python

    >>> tree = pybktree.BKTree(pybktree.hamming_distance, [0, 4, 5, 14])
    >>> tree.add(15)              # add element 15
    >>> sorted(tree)              # BKTree instances are iterable
    [0, 4, 5, 14, 15]
    >>> sorted(tree.find(13, 1))  # find elements at most 1 bit away from element 13
    [(1, 5), (1, 15)]

For large trees and fairly small N when calling ``find()``, using a BKTree is
much faster than doing a linear search. This is especially good when you're
de-duping a few hundred thousand photos -- with a linear search that would
become a very slow, O(N²) operation. With a BKTree, it's more like O(N log N).

Read the code in `pybktree.py`_ for more details – it's pretty small!

Other BK-tree modules I found on GitHub while writing this one:

* `ahupp/bktree`_: this one is pretty good, but it's not on PyPI, and it's
  recursive
* `ryanfox/bktree`_: this one is hard to customize, ``search()`` doesn't
  return distances, it's slower, and was buggy (though I think he fixed it
  recently)

pybktree was written by `Ben Hoyt`_ for `Jetsetter`_ and is licensed with a
permissive MIT license (see `LICENSE.txt`_).


.. _BK-tree: https://en.wikipedia.org/wiki/BK-tree
.. _blog article on BK-trees: http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees
.. _on the Python Package Index (PyPI): https://pypi.python.org/pypi/pybktree
.. _pybktree.py: https://github.com/Jetsetter/pybktree/blob/master/pybktree.py
.. _ahupp/bktree: https://github.com/ahupp/bktree
.. _ryanfox/bktree: https://github.com/ryanfox/bktree
.. _Ben Hoyt: http://benhoyt.com/
.. _Jetsetter: http://www.jetsetter.com/
.. _LICENSE.txt: https://github.com/Jetsetter/pybktree/blob/master/LICENSE.txt

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Jetsetter/pybktree",
    "name": "pybktree",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "Ben Hoyt",
    "author_email": "benhoyt@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/6b/38/a5fba29cf727ca8249d3840016e1f7919896111aa28d2cb96f013eaa480d/pybktree-1.1.tar.gz",
    "platform": "",
    "description": "pybktree\n========\n\npybktree is a generic, pure Python implementation of a `BK-tree`_ data\nstructure, which allows fast querying of \"close\" matches (for example, matches\nwith small hamming distance or Levenshtein distance). This module is based on\nthe algorithm by Nick Johnson in his `blog article on BK-trees`_.\n\nThe library is `on the Python Package Index (PyPI)`_ and works on both Python\n3 and Python 2.7. To install it, fire up a command prompt, activate your\nvirtual environment if you're using one, and type:\n\n::\n\n    pip install pybktree\n\nExample usage:\n\n.. code:: python\n\n    >>> tree = pybktree.BKTree(pybktree.hamming_distance, [0, 4, 5, 14])\n    >>> tree.add(15)              # add element 15\n    >>> sorted(tree)              # BKTree instances are iterable\n    [0, 4, 5, 14, 15]\n    >>> sorted(tree.find(13, 1))  # find elements at most 1 bit away from element 13\n    [(1, 5), (1, 15)]\n\nFor large trees and fairly small N when calling ``find()``, using a BKTree is\nmuch faster than doing a linear search. This is especially good when you're\nde-duping a few hundred thousand photos -- with a linear search that would\nbecome a very slow, O(N\u00b2) operation. With a BKTree, it's more like O(N log N).\n\nRead the code in `pybktree.py`_ for more details \u2013 it's pretty small!\n\nOther BK-tree modules I found on GitHub while writing this one:\n\n* `ahupp/bktree`_: this one is pretty good, but it's not on PyPI, and it's\n  recursive\n* `ryanfox/bktree`_: this one is hard to customize, ``search()`` doesn't\n  return distances, it's slower, and was buggy (though I think he fixed it\n  recently)\n\npybktree was written by `Ben Hoyt`_ for `Jetsetter`_ and is licensed with a\npermissive MIT license (see `LICENSE.txt`_).\n\n\n.. _BK-tree: https://en.wikipedia.org/wiki/BK-tree\n.. _blog article on BK-trees: http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees\n.. _on the Python Package Index (PyPI): https://pypi.python.org/pypi/pybktree\n.. _pybktree.py: https://github.com/Jetsetter/pybktree/blob/master/pybktree.py\n.. _ahupp/bktree: https://github.com/ahupp/bktree\n.. _ryanfox/bktree: https://github.com/ryanfox/bktree\n.. _Ben Hoyt: http://benhoyt.com/\n.. _Jetsetter: http://www.jetsetter.com/\n.. _LICENSE.txt: https://github.com/Jetsetter/pybktree/blob/master/LICENSE.txt\n",
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "BK-tree data structure to allow fast querying of \"close\" matches",
    "version": "1.1",
    "project_urls": {
        "Homepage": "https://github.com/Jetsetter/pybktree"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6b38a5fba29cf727ca8249d3840016e1f7919896111aa28d2cb96f013eaa480d",
                "md5": "9c809214fc979391fdd75dae1942cfb2",
                "sha256": "eec0037cdd3d7553e6d72435a4379bede64be17c6712f149e485169638154d2b"
            },
            "downloads": -1,
            "filename": "pybktree-1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "9c809214fc979391fdd75dae1942cfb2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 4544,
            "upload_time": "2017-08-22T13:38:02",
            "upload_time_iso_8601": "2017-08-22T13:38:02.873552Z",
            "url": "https://files.pythonhosted.org/packages/6b/38/a5fba29cf727ca8249d3840016e1f7919896111aa28d2cb96f013eaa480d/pybktree-1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2017-08-22 13:38:02",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Jetsetter",
    "github_project": "pybktree",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "pybktree"
}
        
Elapsed time: 0.21359s