DAWG-Python
===========

:Version: 0.7.2
:Author: Mikhail Korobov
:Home page: https://github.com/kmike/DAWG-Python/
:Summary: Pure-python reader for DAWGs (DAFSAs) created by dawgdic C++ library or DAWG Python extension.
:Upload time: 2015-04-18 16:59:55
:Requirements: No requirements were recorded.

.. image:: https://travis-ci.org/kmike/DAWG-Python.png?branch=master
    :target: https://travis-ci.org/kmike/DAWG-Python
.. image:: https://coveralls.io/repos/kmike/DAWG-Python/badge.png?branch=master
    :target: https://coveralls.io/r/kmike/DAWG-Python


This pure-Python package provides read-only access to files
created by the `dawgdic`_ C++ library and the `DAWG`_ Python package.

.. _dawgdic: https://code.google.com/p/dawgdic/
.. _DAWG: https://github.com/kmike/DAWG

This package is not capable of creating DAWGs. It works with DAWGs built by
the `dawgdic`_ C++ library or the `DAWG`_ Python extension module. The main
purpose of DAWG-Python is to provide access to DAWGs without requiring
compiled extensions. It is also quite fast under PyPy (see benchmarks).
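
Because DAWG-Python exists to avoid compiled extensions, an application may
want to prefer the compiled `DAWG`_ package and fall back to DAWG-Python only
when it is missing. A minimal sketch of that pattern follows; the helper name
``import_dawg_backend`` is hypothetical and not part of either package::

    import importlib

    def import_dawg_backend():
        """Return the first importable DAWG module, or None if neither is installed."""
        # 'dawg' is the compiled DAWG package; 'dawg_python' is this package.
        for name in ('dawg', 'dawg_python'):
            try:
                return importlib.import_module(name)
            except ImportError:
                continue
        return None

    backend = import_dawg_backend()
    if backend is None:
        print('no DAWG backend installed')
    else:
        print('using backend: ' + backend.__name__)

Only loading and reading are portable between the two backends, since
DAWG-Python cannot build or save DAWGs.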

Installation
============

::

    $ pip install DAWG-Python

Usage
=====

DAWG-Python aims to be API- and binary-compatible
with `DAWG`_ wherever possible.

First, create a DAWG using the `DAWG`_ module::

    import dawg
    data = [u'foo', u'bar', u'foobar']  # example keys; any iterable of unicode strings
    d = dawg.DAWG(data)
    d.save('words.dawg')

This DAWG can then be loaded without requiring C extensions::

    import dawg_python
    d = dawg_python.DAWG().load('words.dawg')

Please consult the `DAWG`_ docs for detailed usage. Some features
(like constructor parameters or the ``save`` method) are intentionally
unsupported.
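
As a rough sketch of what a loaded DAWG can be asked, the snippet below checks
membership and lists the keys that are prefixes of a word. It assumes
``words.dawg`` was built beforehand as shown above. The helper names
``load_dawg`` and ``describe`` are hypothetical, and the file check is there
only so that the sketch runs without that file::

    import os

    def load_dawg(path):
        """Load a plain DAWG (a set of keys) from ``path``, or return None if missing."""
        if not os.path.exists(path):
            return None
        # Imported lazily; requires ``pip install DAWG-Python``.
        import dawg_python
        return dawg_python.DAWG().load(path)

    def describe(d, word):
        """Return (is ``word`` a key?, list of keys that are prefixes of ``word``)."""
        return word in d, d.prefixes(word)

    d = load_dawg('words.dawg')  # assumed file name, taken from the example above
    if d is not None:
        print(describe(d, 'foobar'))

``describe`` only relies on ``in`` and ``prefixes``, so the same function
should also accept an object loaded by the compiled `DAWG`_ package.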

Benchmarks
==========

Benchmark results (100k unicode words, integer values (lengths of the words),
PyPy 1.9, MacBook Air, i5 1.8 GHz)::

    dict __getitem__ (hits):        11.090M ops/sec
    DAWG __getitem__ (hits):        not supported
    BytesDAWG __getitem__ (hits):   0.493M ops/sec
    RecordDAWG __getitem__ (hits):  0.376M ops/sec

    dict get() (hits):              10.127M ops/sec
    DAWG get() (hits):              not supported
    BytesDAWG get() (hits):         0.481M ops/sec
    RecordDAWG get() (hits):        0.402M ops/sec
    dict get() (misses):            14.885M ops/sec
    DAWG get() (misses):            not supported
    BytesDAWG get() (misses):       1.259M ops/sec
    RecordDAWG get() (misses):      1.337M ops/sec

    dict __contains__ (hits):           11.100M ops/sec
    DAWG __contains__ (hits):           1.317M ops/sec
    BytesDAWG __contains__ (hits):      1.107M ops/sec
    RecordDAWG __contains__ (hits):     1.095M ops/sec

    dict __contains__ (misses):         10.567M ops/sec
    DAWG __contains__ (misses):         1.902M ops/sec
    BytesDAWG __contains__ (misses):    1.873M ops/sec
    RecordDAWG __contains__ (misses):   1.862M ops/sec

    dict items():           44.401 ops/sec
    DAWG items():           not supported
    BytesDAWG items():      3.226 ops/sec
    RecordDAWG items():     2.987 ops/sec
    dict keys():            426.250 ops/sec
    DAWG keys():            not supported
    BytesDAWG keys():       6.050 ops/sec
    RecordDAWG keys():      6.363 ops/sec

    DAWG.prefixes (hits):    0.756M ops/sec
    DAWG.prefixes (mixed):   1.965M ops/sec
    DAWG.prefixes (misses):  1.773M ops/sec

    RecordDAWG.keys(prefix="xxx"), avg_len(res)==415:       1.429K ops/sec
    RecordDAWG.keys(prefix="xxxxx"), avg_len(res)==17:      36.994K ops/sec
    RecordDAWG.keys(prefix="xxxxxxxx"), avg_len(res)==3:    121.897K ops/sec
    RecordDAWG.keys(prefix="xxxxx..xx"), avg_len(res)==1.4: 265.015K ops/sec
    RecordDAWG.keys(prefix="xxx"), NON_EXISTING:            2450.898K ops/sec

Under CPython, expect it to be about 50x slower.
Memory consumption of DAWG-Python should be the same as that of `DAWG`_.
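
To get a feel for what "about 50x slower" means, the PyPy figures above can be
divided by 50. The values this produces are back-of-the-envelope estimates
derived from that stated factor, not measurements; ``estimate_cpython`` is a
hypothetical helper::

    # PyPy results copied from the table above, in operations per second.
    pypy_ops_per_sec = {
        'DAWG __contains__ (hits)': 1.317e6,
        'BytesDAWG __getitem__ (hits)': 0.493e6,
        'RecordDAWG get() (misses)': 1.337e6,
    }

    SLOWDOWN = 50  # "about 50x slower" under CPython, as stated above

    def estimate_cpython(ops_per_sec, slowdown=SLOWDOWN):
        """Scale a PyPy throughput figure down by the assumed slowdown factor."""
        return ops_per_sec / slowdown

    for name, ops in sorted(pypy_ops_per_sec.items()):
        print('%s: ~%.1fK ops/sec' % (name, estimate_cpython(ops) / 1000.0))

Under this assumption, 1.317M ops/sec for ``DAWG __contains__`` hits becomes
roughly 26K ops/sec on CPython.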


Current limitations
===================

* This package is not capable of creating DAWGs;
* all the limitations of `DAWG`_ apply.

Contributions are welcome!


Contributing
============

Development happens on GitHub: https://github.com/kmike/DAWG-Python

Issue tracker: https://github.com/kmike/DAWG-Python/issues

Feel free to submit ideas, bugs or pull requests.

Running tests and benchmarks
----------------------------

Make sure `tox`_ is installed and run

::

    $ tox

from the source checkout. Tests should pass under Python 2.6, 2.7, 3.2, 3.3,
3.4 and PyPy >= 1.9.

To run the benchmarks, type

::

    $ tox -c bench.ini -e pypy

This runs benchmarks under PyPy (they are about 50x slower under CPython).

.. _tox: http://tox.testrun.org

Authors & Contributors
----------------------

* Mikhail Korobov <kmike84@gmail.com>

The algorithms are from the `dawgdic`_ C++ library by Susumu Yata & contributors.

License
=======

This package is licensed under the MIT License.



Changes
=======

0.7.2 (2015-04-18)
------------------

- minor speedup;
- bitbucket mirror is no longer maintained.

0.7.1 (2014-06-05)
------------------

- Switch to setuptools;
- upload wheel to PyPI;
- check Python 3.4 compatibility.

0.7 (2013-10-13)
----------------

IntDAWG and IntCompletionDAWG are implemented.

0.6 (2013-03-23)
----------------

Use less shared state internally. This should fix thread-safety bugs and
make iterkeys/iteritems reentrant.

0.5.1 (2013-03-01)
------------------

Internal tweaks: memory usage is reduced; some operations are a bit faster,
others are a bit slower.

0.5 (2012-10-08)
----------------

Storage scheme is updated to match DAWG==0.5. This enables
the alphabetical ordering of ``BytesDAWG`` and ``RecordDAWG`` items.

In order to read ``BytesDAWG`` or ``RecordDAWG`` created with
versions of DAWG < 0.5, use the ``payload_separator`` constructor argument::

    >>> from dawg_python import BytesDAWG
    >>> BytesDAWG(payload_separator=b'\xff').load('old.dawg')


0.3.1 (2012-10-01)
------------------

Bug with empty DAWGs is fixed.

0.3 (2012-09-26)
----------------

- ``iterkeys`` and ``iteritems`` methods.

0.2 (2012-09-24)
----------------

``prefixes`` support.

0.1 (2012-09-20)
----------------

Initial release.
            

        