DAWG2-Python


| Field | Value |
| --- | --- |
| Name | DAWG2-Python |
| Version | 0.8.0 |
| Home page | https://github.com/pymorphy2-form/DAWG-Python/ |
| Summary | Pure-python reader for DAWGs (DAFSAs) created by dawgdic C++ library or DAWG Python extension. |
| Upload time | 2023-09-27 17:40:11 |
| Author | Mikhail Korobov |
| Requires Python | >=3.8,<4.0 |
| License | MIT |

# DAWG2-Python

[![Python tests](https://github.com/pymorphy2-fork/DAWG-Python/actions/workflows/python-tests.yml/badge.svg)](https://github.com/pymorphy2-fork/DAWG-Python/actions/workflows/python-tests.yml)
[![Coverage Status](https://coveralls.io/repos/github/pymorphy2-fork/DAWG-Python/badge.svg?branch=master)](https://coveralls.io/github/pymorphy2-fork/DAWG-Python?branch=master)

This pure-Python package provides read-only access to files created by the
[dawgdic][1] C++ library and the
[DAWG][2] Python package.

This package cannot create DAWGs. It works with DAWGs built
by the [dawgdic][1] C++ library or the
[DAWG][2] Python extension module. The main
purpose of DAWG2-Python is to provide access to DAWGs without
requiring compiled extensions. It is also quite fast under PyPy (see the
benchmarks).

# Installation

```commandline
pip install DAWG2-Python
```
# Usage

DAWG2-Python aims to be API- and binary-compatible with
[DAWG][2] wherever possible.

First, create a DAWG using the
[DAWG][2] module:

```python
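import dawg

# Any iterable of unicode keys works; these words are placeholders.
data = ['foo', 'bar', 'foobar']

d = dawg.DAWG(data)
d.save('words.dawg')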
```
The saved DAWG can then be loaded without requiring C extensions:

```python
import dawg_python

d = dawg_python.DAWG().load('words.dawg')
```
Please consult the [DAWG][2] docs for detailed
usage. Some features (such as constructor parameters and the `save` method) are
intentionally unsupported.
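
As a rough illustration of what can be done with a loaded object, the sketch below uses membership tests and `prefixes`, both of which appear in the benchmarks for `DAWG`. To stay runnable without a `words.dawg` file, it uses `StubDAWG`, a hypothetical pure-Python stand-in that is not part of this package and only mimics those two operations. The helper name `describe` is also an assumption made for this example.

```python
class StubDAWG:
    """Hypothetical stand-in for a loaded DAWG; NOT part of dawg_python."""

    def __init__(self, keys):
        self._keys = set(keys)

    def __contains__(self, key):
        return key in self._keys

    def prefixes(self, key):
        # All stored keys that are prefixes of `key`, shortest first.
        return [key[:i] for i in range(1, len(key) + 1) if key[:i] in self._keys]


def describe(d, word):
    """Summarise what a DAWG-like object knows about `word`."""
    return {"present": word in d, "prefixes": d.prefixes(word)}


# With the real package this would be:
#     d = dawg_python.DAWG().load('words.dawg')
d = StubDAWG(["foo", "bar", "foobar"])
result = describe(d, "foobarz")
print(result)
```

Because `describe` relies only on `in` and `prefixes`, it should work unchanged with a real loaded DAWG in place of the stub.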

# Benchmarks

Benchmark results (100k unicode words, integer values (lengths of the
words), PyPy 1.9, MacBook Air, i5 1.8 GHz):

    dict __getitem__ (hits):        11.090M ops/sec
    DAWG __getitem__ (hits):        not supported
    BytesDAWG __getitem__ (hits):   0.493M ops/sec
    RecordDAWG __getitem__ (hits):  0.376M ops/sec

    dict get() (hits):              10.127M ops/sec
    DAWG get() (hits):              not supported
    BytesDAWG get() (hits):         0.481M ops/sec
    RecordDAWG get() (hits):        0.402M ops/sec
    dict get() (misses):            14.885M ops/sec
    DAWG get() (misses):            not supported
    BytesDAWG get() (misses):       1.259M ops/sec
    RecordDAWG get() (misses):      1.337M ops/sec

    dict __contains__ (hits):           11.100M ops/sec
    DAWG __contains__ (hits):           1.317M ops/sec
    BytesDAWG __contains__ (hits):      1.107M ops/sec
    RecordDAWG __contains__ (hits):     1.095M ops/sec

    dict __contains__ (misses):         10.567M ops/sec
    DAWG __contains__ (misses):         1.902M ops/sec
    BytesDAWG __contains__ (misses):    1.873M ops/sec
    RecordDAWG __contains__ (misses):   1.862M ops/sec

    dict items():           44.401 ops/sec
    DAWG items():           not supported
    BytesDAWG items():      3.226 ops/sec
    RecordDAWG items():     2.987 ops/sec
    dict keys():            426.250 ops/sec
    DAWG keys():            not supported
    BytesDAWG keys():       6.050 ops/sec
    RecordDAWG keys():      6.363 ops/sec

    DAWG.prefixes (hits):    0.756M ops/sec
    DAWG.prefixes (mixed):   1.965M ops/sec
    DAWG.prefixes (misses):  1.773M ops/sec

    RecordDAWG.keys(prefix="xxx"), avg_len(res)==415:       1.429K ops/sec
    RecordDAWG.keys(prefix="xxxxx"), avg_len(res)==17:      36.994K ops/sec
    RecordDAWG.keys(prefix="xxxxxxxx"), avg_len(res)==3:    121.897K ops/sec
    RecordDAWG.keys(prefix="xxxxx..xx"), avg_len(res)==1.4: 265.015K ops/sec
    RecordDAWG.keys(prefix="xxx"), NON_EXISTING:            2450.898K ops/sec

Under CPython, expect it to be about 50x slower. The memory consumption of
DAWG2-Python should be the same as that of
[DAWG][2].
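
For readers who want to reproduce an ops/sec figure of this kind on their own machine, here is a minimal sketch built on the standard `timeit` module. It is not the project's `bench.speed` module. The helper name `ops_per_sec`, the generated word list and the plain `dict` under test are all assumptions made for illustration.

```python
import timeit


def ops_per_sec(func, number=100_000, repeat=3):
    """Return the best observed rate of func() calls per second."""
    # timeit.repeat returns the total time of `number` calls, once per repeat.
    best = min(timeit.repeat(func, number=number, repeat=repeat))
    return number / best


# Placeholder data: words mapped to their lengths, as in the benchmark setup.
words = ["word%d" % i for i in range(1000)]
table = {w: len(w) for w in words}

hit = words[len(words) // 2]
rate = ops_per_sec(lambda: hit in table)
print("dict __contains__ (hits): %.3fM ops/sec" % (rate / 1e6))
```

Taking the minimum over several repeats reduces noise from other processes. To time a loaded DAWG instead, swap it in for `table`.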

# Current limitations

- This package is not capable of creating DAWGs;
- all the limitations of [DAWG][2] apply.

Contributions are welcome!

# Contributing

- Development happens at GitHub: <https://github.com/pymorphy2-fork/DAWG-Python>
- Issue tracker: <https://github.com/pymorphy2-fork/DAWG-Python/issues>

Feel free to submit ideas, bugs or pull requests.

## Running tests and benchmarks

Make sure [pytest][3] is installed and run

```commandline
$ pytest .
```
from the source checkout. Tests should pass under Python 3.8, 3.9, 3.10, 3.11 and PyPy3 \>= 7.3.

In order to run benchmarks, type

```commandline
$ pypy3 -m bench.speed
```
This runs benchmarks under PyPy (they are about 50x slower under
CPython).

## Authors & Contributors

- Mikhail Korobov \<kmike84@gmail.com\>
- [@bt2901](https://github.com/bt2901)
- [@insolor](https://github.com/insolor)

The algorithms are from the [dawgdic][1]
C++ library by Susumu Yata & contributors.

# License

This package is licensed under the MIT License.

[1]: https://code.google.com/p/dawgdic/
[2]: https://github.com/pymorphy2-fork/DAWG
[3]: https://docs.pytest.org/en/7.4.x/getting-started.html

# Changes

## 0.8.0 (2023-09-27)

- Allow more flexible char substitutes by [@bt2901](https://github.com/bt2901)
- Minimal Python version changed to 3.8 by [@insolor](https://github.com/insolor)
- Build switched from setup.py to Poetry by [@insolor](https://github.com/insolor)

## 0.7.2 (2015-04-18)

- minor speedup;
- bitbucket mirror is no longer maintained.

## 0.7.1 (2014-06-05)

- Switch to setuptools;
- upload wheel to pypi;
- check Python 3.4 compatibility.

## 0.7 (2013-10-13)

IntDAWG and IntCompletionDAWG are implemented.

## 0.6 (2013-03-23)

Use less shared state internally. This should fix thread-safety bugs and
make iterkeys/iteritems reentrant.

## 0.5.1 (2013-03-01)

Internal tweaks: memory usage is reduced; some operations are a bit faster,
others a bit slower.

## 0.5 (2012-10-08)

Storage scheme is updated to match DAWG==0.5. This enables the
alphabetical ordering of `BytesDAWG` and `RecordDAWG` items.

To read a `BytesDAWG` or `RecordDAWG` created with versions of
DAWG \< 0.5, use the `payload_separator` constructor argument:

    >>> BytesDAWG(payload_separator=b'\xff').load('old.dawg')

## 0.3.1 (2012-10-01)

Bug with empty DAWGs is fixed.

## 0.3 (2012-09-26)

- `iterkeys` and `iteritems` methods.

## 0.2 (2012-09-24)

`prefixes` support.

## 0.1 (2012-09-20)

Initial release.

            

        