DAWG-Python
===========
.. image:: https://travis-ci.org/kmike/DAWG-Python.png?branch=master
    :target: https://travis-ci.org/kmike/DAWG-Python
.. image:: https://coveralls.io/repos/kmike/DAWG-Python/badge.png?branch=master
    :target: https://coveralls.io/r/kmike/DAWG-Python

This pure-Python package provides read-only access to files
created by the `dawgdic`_ C++ library and the `DAWG`_ Python package.

.. _dawgdic: https://code.google.com/p/dawgdic/
.. _DAWG: https://github.com/kmike/DAWG

This package cannot create DAWGs. It works with DAWGs built by the
`dawgdic`_ C++ library or the `DAWG`_ Python extension module. The main purpose
of DAWG-Python is to provide access to DAWGs without requiring compiled
extensions. It is also quite fast under PyPy (see benchmarks).
Installation
============
::

    $ pip install DAWG-Python

Usage
=====
DAWG-Python aims to be API- and binary-compatible
with `DAWG`_ wherever possible.
First, create a DAWG using the `DAWG`_ module::

    import dawg
    data = ['bar', 'foo', 'foobar']  # example keys; any iterable of keys works
    d = dawg.DAWG(data)
    d.save('words.dawg')

This DAWG can then be loaded without requiring C extensions::

    import dawg_python
    d = dawg_python.DAWG().load('words.dawg')

Please consult the `DAWG`_ docs for detailed usage. Some features
(such as constructor parameters and the ``save`` method) are intentionally
unsupported.
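
One way to take advantage of this compatibility is to prefer the C extension
when it is installed and fall back to the pure-Python reader otherwise. The
following is a minimal sketch, not part of either package:
``first_importable`` is a hypothetical helper name, and the only assumption is
that the two modules are importable as ``dawg`` and ``dawg_python``, as in the
snippets above::

    import importlib


    def first_importable(names):
        """Return the first module in ``names`` that can be imported, else None."""
        for name in names:
            try:
                return importlib.import_module(name)
            except ImportError:
                continue
        return None


    # Prefer the C extension; otherwise use the pure-Python reader.
    dawg_module = first_importable(("dawg", "dawg_python"))

    if dawg_module is not None:
        print("using %s" % dawg_module.__name__)
    else:
        print("neither dawg nor dawg_python is installed")

Because ``save`` and constructor parameters are unsupported in DAWG-Python,
code written against the selected module should stick to loading and querying.
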
Benchmarks
==========
Benchmark results (100k unicode words, integer values (lengths of the words),
PyPy 1.9, MacBook Air i5 1.8 GHz)::

    dict __getitem__ (hits):            11.090M ops/sec
    DAWG __getitem__ (hits):            not supported
    BytesDAWG __getitem__ (hits):       0.493M ops/sec
    RecordDAWG __getitem__ (hits):      0.376M ops/sec

    dict get() (hits):                  10.127M ops/sec
    DAWG get() (hits):                  not supported
    BytesDAWG get() (hits):             0.481M ops/sec
    RecordDAWG get() (hits):            0.402M ops/sec
    dict get() (misses):                14.885M ops/sec
    DAWG get() (misses):                not supported
    BytesDAWG get() (misses):           1.259M ops/sec
    RecordDAWG get() (misses):          1.337M ops/sec

    dict __contains__ (hits):           11.100M ops/sec
    DAWG __contains__ (hits):           1.317M ops/sec
    BytesDAWG __contains__ (hits):      1.107M ops/sec
    RecordDAWG __contains__ (hits):     1.095M ops/sec

    dict __contains__ (misses):         10.567M ops/sec
    DAWG __contains__ (misses):         1.902M ops/sec
    BytesDAWG __contains__ (misses):    1.873M ops/sec
    RecordDAWG __contains__ (misses):   1.862M ops/sec

    dict items():                       44.401 ops/sec
    DAWG items():                       not supported
    BytesDAWG items():                  3.226 ops/sec
    RecordDAWG items():                 2.987 ops/sec
    dict keys():                        426.250 ops/sec
    DAWG keys():                        not supported
    BytesDAWG keys():                   6.050 ops/sec
    RecordDAWG keys():                  6.363 ops/sec

    DAWG.prefixes (hits):               0.756M ops/sec
    DAWG.prefixes (mixed):              1.965M ops/sec
    DAWG.prefixes (misses):             1.773M ops/sec

    RecordDAWG.keys(prefix="xxx"), avg_len(res)==415:       1.429K ops/sec
    RecordDAWG.keys(prefix="xxxxx"), avg_len(res)==17:      36.994K ops/sec
    RecordDAWG.keys(prefix="xxxxxxxx"), avg_len(res)==3:    121.897K ops/sec
    RecordDAWG.keys(prefix="xxxxx..xx"), avg_len(res)==1.4: 265.015K ops/sec
    RecordDAWG.keys(prefix="xxx"), NON_EXISTING:            2450.898K ops/sec

Under CPython, expect these operations to be about 50x slower.
Memory consumption of DAWG-Python should be the same as that of `DAWG`_.
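
For readers who want to produce figures of this kind on their own machine,
here is a rough sketch of how an ops/sec number could be measured with the
standard ``timeit`` module. It is not the project's benchmark script (that one
is run through ``tox -c bench.ini``); ``ops_per_sec`` and ``format_rate`` are
hypothetical helpers, and a plain ``dict`` stands in for a loaded DAWG::

    import timeit


    def ops_per_sec(func, number=100000, repeat=3):
        """Return the best observed rate of func() calls per second."""
        best = min(timeit.repeat(func, number=number, repeat=repeat))
        return number / best


    def format_rate(rate):
        """Format a rate with an M or K suffix, roughly like the table above."""
        if rate >= 1e6:
            return "%.3fM ops/sec" % (rate / 1e6)
        if rate >= 1e3:
            return "%.3fK ops/sec" % (rate / 1e3)
        return "%.3f ops/sec" % rate


    # Stand-in data: words mapped to their lengths, as in the benchmark setup.
    words = ["word%d" % i for i in range(1000)]
    data = dict((w, len(w)) for w in words)

    rate = ops_per_sec(lambda: "word500" in data)
    print("dict __contains__ (hits): %s" % format_rate(rate))

Taking the minimum over several repeats reduces noise from other processes;
absolute numbers will still differ between machines and interpreters.
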
.. _marisa-trie: https://github.com/kmike/marisa-trie
Current limitations
===================
* This package is not capable of creating DAWGs;
* all the limitations of `DAWG`_ apply.
Contributions are welcome!
Contributing
============
Development happens at GitHub: https://github.com/kmike/DAWG-Python
Issue tracker: https://github.com/kmike/DAWG-Python/issues
Feel free to submit ideas, bugs or pull requests.
Running tests and benchmarks
----------------------------
Make sure `tox`_ is installed and run

::

    $ tox

from the source checkout. Tests should pass under Python 2.6, 2.7, 3.2, 3.3,
3.4 and PyPy >= 1.9.

To run benchmarks, type

::

    $ tox -c bench.ini -e pypy

This runs benchmarks under PyPy (they are about 50x slower under CPython).

.. _tox: http://tox.testrun.org
Authors & Contributors
----------------------
* Mikhail Korobov <kmike84@gmail.com>
The algorithms are from the `dawgdic`_ C++ library by Susumu Yata & contributors.
License
=======
This package is licensed under the MIT License.
Changes
=======
0.7.2 (2015-04-18)
------------------
- minor speedup;
- bitbucket mirror is no longer maintained.
0.7.1 (2014-06-05)
------------------
- Switch to setuptools;
- upload wheel to PyPI;
- check Python 3.4 compatibility.
0.7 (2013-10-13)
----------------
IntDAWG and IntCompletionDAWG are implemented.
0.6 (2013-03-23)
----------------
Use less shared state internally. This should fix thread-safety bugs and
make iterkeys/iteritems reentrant.
0.5.1 (2013-03-01)
------------------
Internal tweaks: memory usage is reduced; something is a bit faster,
something is a bit slower.
0.5 (2012-10-08)
----------------
Storage scheme is updated to match DAWG==0.5. This enables
the alphabetical ordering of ``BytesDAWG`` and ``RecordDAWG`` items.
To read a ``BytesDAWG`` or ``RecordDAWG`` created with
versions of DAWG < 0.5, use the ``payload_separator`` constructor argument::

    >>> BytesDAWG(payload_separator=b'\xff').load('old.dawg')

0.3.1 (2012-10-01)
------------------
Bug with empty DAWGs is fixed.
0.3 (2012-09-26)
----------------
- ``iterkeys`` and ``iteritems`` methods.
0.2 (2012-09-24)
----------------
``prefixes`` support.
0.1 (2012-09-20)
----------------
Initial release.