cyhunspell-py310

* Name: cyhunspell-py310
* Version: 2.0.3
* Home page: https://github.com/MSeal/cython_hunspell
* Summary: A wrapper on hunspell for use in Python
* Upload time: 2023-06-27 12:04:18
* Author: Matthew Seal
* License: MIT + MPL 1.1/GPL 2.0/LGPL 2.1
* Keywords: hunspell, spelling, correction
* Requirements: none recorded
* CI: Travis-CI; no Coveralls test coverage

[![Build Status](https://travis-ci.org/MSeal/cython_hunspell.svg?branch=master)](https://travis-ci.org/MSeal/cython_hunspell)
[![PyPI version shields.io](https://img.shields.io/pypi/v/CyHunspell.svg)](https://pypi.python.org/pypi/CyHunspell/)
[![PyPI pyversions](https://img.shields.io/pypi/pyversions/CyHunspell.svg)](https://pypi.python.org/pypi/CyHunspell/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

# CyHunspell
Cython wrapper on Hunspell Dictionary

## Description
This repository provides a wrapper on Hunspell for native use in Python. The
module uses Cython to link the C++ and Python code, and adds a few extra
features. Python overhead is minimal because the heavy lifting happens on the
C++ side of the module interface.

The library caches corrections. By default the cache is kept in memory; for
persistent caching, pass a `disk_cache_dir` argument to the `Hunspell`
constructor.

## Installing

For the simplest install, run:

    pip install cyhunspell

This installs the hunspell 1.7.0 C++ bindings for your platform on your behalf.

## Dependencies

cacheman -- for (optionally asynchronous) persistent caching

## Non-Python Dependencies

### hunspell

The library installs [hunspell](http://hunspell.github.io/) version 1.7.0. As new versions of hunspell become
available, this library will provide matching releases.

## Features

Spell checking & spell suggestions
* See http://hunspell.github.io/

## How to use

Below are some simple examples of how to use the library.

### Creating a Hunspell object

```python
from hunspell import Hunspell
h = Hunspell()
```

You now have a usable hunspell object that can make basic queries for you.

```python
h.spell('test') # True
```

### Spelling

It's a simple task to ask if a particular word is in the dictionary.

```python
h.spell('correct') # True
h.spell('incorect') # False
```

This only ever returns True or False; it won't suggest why a word might be
wrong. The result also depends on which dictionary is loaded.
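As a sketch of how `spell` fits into a larger check, the helper below tokenizes a sentence and returns the words the dictionary rejects. It accepts any object with a `spell(word) -> bool` method, such as a `Hunspell` instance. The `StubDictionary` class and its word set are hypothetical stand-ins so the example runs without hunspell installed, and the simple regex tokenizer is an assumption that will not suit every language.

```python
import re
from typing import List, Protocol


class SpellChecker(Protocol):
    """Anything with a spell(word) -> bool method, e.g. a Hunspell instance."""

    def spell(self, word: str) -> bool: ...


class StubDictionary:
    """Hypothetical stand-in for Hunspell that knows a fixed set of words."""

    def __init__(self, words):
        self._words = {w.lower() for w in words}

    def spell(self, word: str) -> bool:
        return word.lower() in self._words


def misspelled_words(checker: SpellChecker, text: str) -> List[str]:
    # Split on runs of letters and apostrophes; punctuation and digits are ignored.
    tokens = re.findall(r"[A-Za-z']+", text)
    return [t for t in tokens if not checker.spell(t)]


checker = StubDictionary(["this", "is", "a", "correct", "sentence"])
print(misspelled_words(checker, "This is a incorect sentence."))  # ['incorect']
```

With the real library you would pass the `Hunspell` object itself, e.g. `misspelled_words(h, text)`.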

### Suggestions

If you want a suggestion from Hunspell, it can return corrected candidates for a
given string input.

```python
h.suggest('incorect') # ('incorrect', 'correction', 'corrector', 'correct', 'injector')
```

The suggestions are returned in ranked order: the lower the index, the closer the
suggestion is to the input string.
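Because suggestions come back ranked, a simple autocorrect can keep correctly spelled words and replace the rest with the first suggestion. This is a sketch, not part of the library: it works with any object exposing `spell` and `suggest` (such as a `Hunspell` instance), and the `StubSpeller` class with its canned suggestions is hypothetical. Taking the top suggestion blindly is a simplification; interactive tools usually let the user choose.

```python
from typing import Dict, Iterable, Tuple


class StubSpeller:
    """Hypothetical stand-in for Hunspell with canned suggestion tuples."""

    def __init__(self, known: Iterable[str], suggestions: Dict[str, Tuple[str, ...]]):
        self._known = set(known)
        self._suggestions = suggestions

    def spell(self, word: str) -> bool:
        return word in self._known

    def suggest(self, word: str) -> Tuple[str, ...]:
        return self._suggestions.get(word, ())


def autocorrect(speller, word: str) -> str:
    """Return word unchanged if it is spelled correctly, else its top suggestion."""
    if speller.spell(word):
        return word
    suggestions = speller.suggest(word)
    # Index 0 is the closest candidate; with no suggestions, keep the original word.
    return suggestions[0] if suggestions else word


speller = StubSpeller(
    known={"correct", "incorrect"},
    suggestions={"incorect": ("incorrect", "correction", "corrector", "correct", "injector")},
)
print(autocorrect(speller, "incorect"))  # incorrect
print(autocorrect(speller, "correct"))   # correct
print(autocorrect(speller, "qzxv"))      # qzxv
```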

#### Suffix Match

Suffix suggestions return dictionary words formed by adding suffixes to the input word.

```python
h.suffix_suggest('do') # ('doing', 'doth', 'doer', 'doings', 'doers', 'doest')
```

### Stemming

The module can also stem words, providing the stems for pluralization and other
inflections.

```python
h.stem('testers') # ('tester', 'test')
h.stem('saves') # ('save',)
```
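One common use of `stem` is normalizing words before indexing or counting. The sketch below maps each word to a single stem, choosing the shortest one returned (for `'testers'` that would be `'test'`) and falling back to the word itself when no stem is found. The shortest-stem rule is an assumption made for illustration, and `fake_stem` is a hypothetical stand-in for `h.stem` built from the outputs shown above.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

StemFunc = Callable[[str], Tuple[str, ...]]


def normalize(word: str, stem: StemFunc) -> str:
    # stem() returns a tuple of candidate stems, possibly empty.
    stems = stem(word)
    # Assumption: the shortest stem is the most basic form; min() keeps the first on ties.
    return min(stems, key=len) if stems else word


def stem_counts(words: Iterable[str], stem: StemFunc) -> Counter:
    """Count words after collapsing each one to a single stem."""
    return Counter(normalize(w, stem) for w in words)


stem_lookup = {"testers": ("tester", "test"), "saves": ("save",), "test": ("test",)}


def fake_stem(word: str) -> Tuple[str, ...]:
    # Hypothetical stand-in for h.stem.
    return stem_lookup.get(word, ())


print(stem_counts(["testers", "test", "saves", "xyzzy"], fake_stem))
# Counter({'test': 2, 'save': 1, 'xyzzy': 1})
```

With the real library, pass `h.stem` as the `stem` argument.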

#### Analyze

Like stemming, but returns a morphological analysis of the input instead.

```python
h.analyze('permanently') # (' st:permanent fl:Y',)
```
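The strings returned by `analyze` are space-separated `field:value` pairs in Hunspell's morphological notation; in the example above, `st:` marks the stem and `fl:` an affix flag. A minimal parser, assuming that format, turns one analysis string into a dictionary. Values are collected into lists because a field can appear more than once in a single analysis.

```python
from collections import defaultdict
from typing import Dict, List


def parse_analysis(analysis: str) -> Dict[str, List[str]]:
    """Split a Hunspell analysis string such as ' st:permanent fl:Y' into fields."""
    fields: Dict[str, List[str]] = defaultdict(list)
    for token in analysis.split():
        # Each token looks like "key:value"; anything without a colon is skipped.
        key, sep, value = token.partition(":")
        if sep:
            fields[key].append(value)
    return dict(fields)


print(parse_analysis(" st:permanent fl:Y"))  # {'st': ['permanent'], 'fl': ['Y']}
```

Applied to the library's output, this would look like `[parse_analysis(a) for a in h.analyze('permanently')]`.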

#### Generate

Generate methods are *NOT* provided at this time because the 1.7.0 build does not
produce any results for any input, including the documented one. If this is fixed, or
someone identifies the problem in the call pattern, they will be added to the library
in the future.

### Bulk Requests

You can also request bulk actions against Hunspell. This triggers a threaded
request (running without the GIL) to perform the requested action. Currently
`suggest`, `suffix_suggest`, `stem`, and `analyze` can be requested in bulk.

```python
h.bulk_suggest(['correct', 'incorect'])
# {'incorect': ('incorrect', 'correction', 'corrector', 'correct', 'injector'), 'correct': ('correct',)}
h.bulk_suffix_suggest(['cat', 'do'])
# {'do': ('doing', 'doth', 'doer', 'doings', 'doers', 'doest'), 'cat': ('cater', 'cats', "cat's", 'caters')}
h.bulk_stem(['stems', 'currencies'])
# {'currencies': ('currency',), 'stems': ('stem',)}
h.bulk_analyze(['dog', 'permanently'])
# {'permanently': (' st:permanent fl:Y',), 'dog': (' st:dog',)}
```

By default it spawns one thread per CPU to perform the operation. You can
override the concurrency as well.

```python
h.set_concurrency(4) # Four threads will now be used for bulk requests
```
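To illustrate the shape of a bulk request, here is a pure-Python analogue built on `concurrent.futures.ThreadPoolExecutor`: it applies a per-word function across a list and returns a dictionary keyed by input word, defaulting to one worker per CPU. This is not how the library works internally; its bulk methods run in C++ outside the GIL, whereas Python threads running Python code do not execute in parallel. `bulk_apply` and `fake_stem` are hypothetical names.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


def bulk_apply(
    func: Callable[[str], T],
    words: Iterable[str],
    concurrency: Optional[int] = None,
) -> Dict[str, T]:
    """Apply func to every distinct word in parallel; return {word: result}."""
    # Mirror the documented default of one worker per CPU.
    workers = concurrency or os.cpu_count() or 1
    unique = list(dict.fromkeys(words))  # drop duplicates, keep first-seen order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(unique, pool.map(func, unique)))


def fake_stem(word: str) -> Tuple[str, ...]:
    # Hypothetical stand-in for h.stem.
    return {"stems": ("stem",), "currencies": ("currency",)}.get(word, ())


print(bulk_apply(fake_stem, ["stems", "currencies"], concurrency=4))
# {'stems': ('stem',), 'currencies': ('currency',)}
```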

### Dictionaries

You can also specify the language or dictionary you wish to use.

```python
h = Hunspell('en_CA') # Canadian English
```

By default the following dictionaries are available:
* en_AU
* en_CA
* en_GB
* en_NZ
* en_US
* en_ZA

You can also download your own dictionaries and point Hunspell at them.

```python
h = Hunspell('en_GB-large', hunspell_data_dir='/custom/dicts/dir')
```
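A Hunspell dictionary consists of two files sharing a base name: `<name>.aff` (affix rules) and `<name>.dic` (word list). When pointing `hunspell_data_dir` at a custom directory, a small helper like the hypothetical `available_dictionaries` below can list which names have both files present. It only inspects file names and does not validate file contents.

```python
import os
import tempfile
from typing import List


def available_dictionaries(data_dir: str) -> List[str]:
    """Return base names in data_dir that have both a .aff and a .dic file."""
    names = os.listdir(data_dir)
    aff = {n[: -len(".aff")] for n in names if n.endswith(".aff")}
    dic = {n[: -len(".dic")] for n in names if n.endswith(".dic")}
    return sorted(aff & dic)


# Demo with a throwaway directory: only en_GB-large has both files.
with tempfile.TemporaryDirectory() as demo_dir:
    for fname in ("en_GB-large.aff", "en_GB-large.dic", "notes.dic"):
        open(os.path.join(demo_dir, fname), "w").close()
    found = available_dictionaries(demo_dir)

print(found)  # ['en_GB-large']
```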

#### Adding Dictionaries

You can also add new dictionaries at runtime by calling the `add_dic` method.

```python
import os
h.add_dic(os.path.join(PATH_TO, 'special.dic'))  # PATH_TO: directory holding your .dic file
```
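The `.dic` file passed to `add_dic` uses Hunspell's dictionary format: the first line is an approximate word count, followed by one word per line, optionally with `/FLAGS` referring to affix rules. The sketch below writes a minimal, flag-free file that could then be handed to `add_dic`. The `write_dic` helper, the temporary location, and the word list are hypothetical, and UTF-8 output assumes the loaded `.aff` file declares `SET UTF-8`.

```python
import os
import tempfile
from typing import Iterable


def write_dic(path: str, words: Iterable[str]) -> None:
    """Write a minimal Hunspell .dic file: a word count, then one word per line."""
    unique_words = sorted(set(words))
    # Assumes the base .aff file declares SET UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"{len(unique_words)}\n")
        for word in unique_words:
            f.write(word + "\n")


PATH_TO = tempfile.mkdtemp()  # hypothetical location for the extra dictionary
dic_path = os.path.join(PATH_TO, "special.dic")
write_dic(dic_path, ["cyhunspell", "cacheman"])

with open(dic_path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(lines)  # ['2', 'cacheman', 'cyhunspell']

# With the real library: h.add_dic(dic_path)
```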

#### Adding words

You can add individual words to a dictionary at runtime.

```python
h.add('sillly')
```

Furthermore, you can attach an affix to the word by providing a
second argument:

```python
h.add('silllies', "is:plural")
```

#### Removing words

Much like adding, you can remove words.

```python
h.remove('sillly')
```
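Words added with `add` exist only in the running `Hunspell` object and are not written back to the dictionary files, so a common pattern is to keep a personal word list in a text file and load it at startup. The helper below is a hypothetical sketch: it reads one word per line, skips blank lines and `#` comments, and calls `add` on whatever dictionary object it receives. The `RecordingDictionary` stand-in only records calls so the example runs without hunspell.

```python
import os
import tempfile
from typing import List


def load_user_words(dictionary, path: str) -> int:
    """Add each word listed in path to dictionary; return how many were added."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            word = line.strip()
            if not word or word.startswith("#"):
                continue  # skip blank lines and comments
            dictionary.add(word)
            count += 1
    return count


class RecordingDictionary:
    """Hypothetical stand-in for Hunspell that records added words."""

    def __init__(self) -> None:
        self.added: List[str] = []

    def add(self, word: str) -> None:
        self.added.append(word)


with tempfile.TemporaryDirectory() as d:
    words_file = os.path.join(d, "user_words.txt")
    with open(words_file, "w", encoding="utf-8") as f:
        f.write("# project terms\nsillly\n\ncyhunspell\n")
    rec = RecordingDictionary()
    added_count = load_user_words(rec, words_file)

print(added_count, rec.added)  # 2 ['sillly', 'cyhunspell']
```

With the real library, pass the `Hunspell` object: `load_user_words(h, 'user_words.txt')`.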

### Asynchronous Caching

If you want Hunspell to cache suggestions and stems on disk, pass it a directory
to hold the caches.

```python
h = Hunspell(disk_cache_dir='/tmp/hunspell/cache/dir')
```

This saves all suggestion and stem requests periodically, in the background.
After a certain number of new requests within a given time window, the cache forks
and saves its contents while the rest of the program continues. You'll never have
to save your caches to disk explicitly, but you can if you choose.

```python
h.save_cache()
```

Without a `disk_cache_dir`, the Hunspell object caches these requests in memory
only and does not persist them.
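The library handles this caching for you through cacheman. Purely to illustrate the idea of memoizing suggestions and flushing them to disk once enough new entries accumulate, here is a simplified, synchronous sketch. `CachedSuggester`, its `save_every` threshold, and the JSON file format are assumptions; they do not reflect how cacheman stores data or schedules its background saves.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List, Tuple


class CachedSuggester:
    """Memoize a suggest function and persist results to a JSON file."""

    def __init__(
        self,
        suggest: Callable[[str], Tuple[str, ...]],
        path: str,
        save_every: int = 100,
    ):
        self._suggest = suggest
        self._path = path
        self._save_every = save_every
        self._unsaved = 0
        self._cache: Dict[str, List[str]] = {}
        if os.path.exists(path):  # reload a previously saved cache
            with open(path, encoding="utf-8") as f:
                self._cache = json.load(f)

    def suggest(self, word: str) -> Tuple[str, ...]:
        if word not in self._cache:
            self._cache[word] = list(self._suggest(word))
            self._unsaved += 1
            # Flush once enough new entries have accumulated.
            if self._unsaved >= self._save_every:
                self.save_cache()
        return tuple(self._cache[word])

    def save_cache(self) -> None:
        with open(self._path, "w", encoding="utf-8") as f:
            json.dump(self._cache, f)
        self._unsaved = 0


calls: List[str] = []


def slow_suggest(word: str) -> Tuple[str, ...]:
    calls.append(word)  # record real lookups so cache hits are visible
    return ("incorrect", "correction") if word == "incorect" else ()


cache_file = os.path.join(tempfile.mkdtemp(), "suggest_cache.json")
cached = CachedSuggester(slow_suggest, cache_file, save_every=2)
cached.suggest("incorect")
cached.suggest("incorect")  # served from memory
cached.suggest("qzxv")      # second new entry triggers a save
print(calls, os.path.exists(cache_file))  # ['incorect', 'qzxv'] True
```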

## Language Preferences

* Google Style Guide
* Object Oriented (with a few exceptions)

## Known Workarounds

- On Windows, very long file paths, or paths stored in an encoding different from the system's, require special handling for Hunspell to load dictionary files. To work around this, either pass `system_encoding='UTF-8'` to the `Hunspell` constructor or set the environment variable `HUNSPELL_PATH_ENCODING=UTF-8`. Then supply your dictionary directory as a UTF-8 path, either through the `hunspell_data_dir` constructor argument or the `HUNSPELL_DATA` environment variable. This is a restriction of Hunspell on Windows.
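For the environment-variable route, a minimal sketch follows. Setting the variables before the `hunspell` module is imported and the object is constructed is the safest order. The dictionary directory is a hypothetical placeholder, and the import and constructor call are left as comments so the snippet runs where the library is not installed.

```python
import os

# Encode dictionary paths as UTF-8 (the alternative to passing
# system_encoding='UTF-8' to the constructor).
os.environ["HUNSPELL_PATH_ENCODING"] = "UTF-8"
# Hypothetical dictionary directory; HUNSPELL_DATA is the alternative to hunspell_data_dir.
os.environ["HUNSPELL_DATA"] = r"C:\Users\me\dictionaries"

# from hunspell import Hunspell
# h = Hunspell('en_US')
```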

## Author
Author(s): Tim Rodriguez and Matthew Seal

## License
MIT

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/MSeal/cython_hunspell",
    "name": "cyhunspell-py310",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "hunspell,spelling,correction",
    "author": "Matthew Seal",
    "author_email": "mseal007@gmail.com",
    "download_url": "https://github.com/MSeal/cython_hunspell/tarball/v2.0.3",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT + MPL 1.1/GPL 2.0/LGPL 2.1",
    "summary": "A wrapper on hunspell for use in Python",
    "version": "2.0.3",
    "project_urls": {
        "Download": "https://github.com/MSeal/cython_hunspell/tarball/v2.0.3",
        "Homepage": "https://github.com/MSeal/cython_hunspell"
    },
    "split_keywords": [
        "hunspell",
        "spelling",
        "correction"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6ed01f1f77cdc125b50cb26fd76bfcc3469857c7d2edb28ed16c9f26df9876d6",
                "md5": "0ce9c26c8b5b35e57482ac1e49f9fb83",
                "sha256": "7f8f3929617caecaa4286ffb9103707566ba7c5ba4509ab4d783ae5ce8d7435f"
            },
            "downloads": -1,
            "filename": "cyhunspell_py310-2.0.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "has_sig": false,
            "md5_digest": "0ce9c26c8b5b35e57482ac1e49f9fb83",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": null,
            "size": 3682791,
            "upload_time": "2023-06-27T12:04:18",
            "upload_time_iso_8601": "2023-06-27T12:04:18.360426Z",
            "url": "https://files.pythonhosted.org/packages/6e/d0/1f1f77cdc125b50cb26fd76bfcc3469857c7d2edb28ed16c9f26df9876d6/cyhunspell_py310-2.0.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-06-27 12:04:18",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "MSeal",
    "github_project": "cython_hunspell",
    "travis_ci": true,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "cyhunspell-py310"
}
        