spacy-wordnet


Name: spacy-wordnet
Version: 0.1.0
Home page: https://github.com/recognai/spacy-wordnet
Author: Francisco Aranda
License: MIT
Summary: Add a short description here!
Uploaded: 2022-09-19 10:39:16
Requirements: nltk (>=3.3,<3.6), spacy (>=2.0,<4.0)
# spaCy WordNet

spaCy WordNet is a simple custom component for using [WordNet](https://wordnet.princeton.edu/), [MultiWordNet](http://multiwordnet.fbk.eu/english/home.php) and [WordNet Domains](http://wndomains.fbk.eu/) with [spaCy](http://spacy.io).

The component combines the [NLTK wordnet interface](http://www.nltk.org/howto/wordnet.html) with WordNet domains to allow users to:

* Get all synsets for a processed token. For example, getting all the synsets (word senses) of the word ``bank``.
* Get and filter synsets by domain. For example, getting synonyms of the verb ``withdraw`` in the financial domain.

 
## Getting started
The spaCy WordNet component can be easily integrated into spaCy pipelines. You just need the following:
### Prerequisites

* Python 3.X
* spaCy

You also need to install the following NLTK wordnet data:

```bash
python -m nltk.downloader wordnet
python -m nltk.downloader omw
```
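The same corpora can also be fetched programmatically with NLTK's standard `nltk.download` helper, which can be handy in setup scripts or CI (a minimal sketch; it assumes `nltk` is already installed):

```python
# Fetch the WordNet and Open Multilingual Wordnet corpora from Python.
# nltk.download returns True once the package is available locally.
import nltk

for corpus in ("wordnet", "omw"):
    nltk.download(corpus, quiet=True)
```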
### Install

```bash
pip install spacy-wordnet
```


### Supported languages
Almost all languages covered by the Open Multilingual Wordnet are supported.

## Usage

Once you choose the desired language (from the list of supported ones above), you will need to manually download a spaCy model for it. Check the list of available models for each language at [spaCy 2.x](https://v2.spacy.io/models) or [spaCy 3.x](https://spacy.io/usage/models).

### English example

Download example model:
```bash
python -m spacy download en_core_web_sm
```

Run:
```python
import spacy

from spacy_wordnet.wordnet_annotator import WordnetAnnotator

# Load a spaCy model
nlp = spacy.load('en_core_web_sm')
# spaCy 3.x
nlp.add_pipe("spacy_wordnet", after='tagger')
# spaCy 2.x
# nlp.add_pipe(WordnetAnnotator(nlp, name="spacy_wordnet"), after='tagger')
token = nlp('prices')[0]

# The wordnet extension links the spaCy token to the NLTK wordnet interface,
# giving access to its synsets and lemmas
token._.wordnet.synsets()
token._.wordnet.lemmas()

# It also automatically tags the token with wordnet domains
token._.wordnet.wordnet_domains()
```

spaCy WordNet also lets you find synonyms filtered by a domain of interest, for example economy:
```python
economy_domains = ['finance', 'banking']
enriched_sentence = []
sentence = nlp('I want to withdraw 5,000 euros')

# For each token in the sentence
for token in sentence:
    # We get those synsets within the desired domains
    synsets = token._.wordnet.wordnet_synsets_for_domain(economy_domains)
    if not synsets:
        enriched_sentence.append(token.text)
    else:
        lemmas_for_synset = [lemma for s in synsets for lemma in s.lemma_names()]
        # If we found a synset in the economy domains
        # we get the variants and add them to the enriched sentence
        enriched_sentence.append('({})'.format('|'.join(set(lemmas_for_synset))))

# Let's see our enriched sentence
print(' '.join(enriched_sentence))
# >> I (need|want|require) to (draw|withdraw|draw_off|take_out) 5,000 euros
```
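The enrichment loop above boils down to a simple token-by-token substitution. The following dependency-free sketch reproduces just that logic, with a hypothetical `domain_lemmas` dict standing in for the synset lookup, so the pattern can be tried without a spaCy model or WordNet data:

```python
def enrich(tokens, domain_lemmas):
    """Replace each token that has in-domain lemmas with a '(a|b|c)' variant group.

    tokens: list of token strings.
    domain_lemmas: hypothetical mapping of token text -> list of lemma names,
    standing in for token._.wordnet.wordnet_synsets_for_domain(...).
    """
    enriched = []
    for tok in tokens:
        lemmas = domain_lemmas.get(tok, [])
        if not lemmas:
            enriched.append(tok)
        else:
            # sorted() makes the output deterministic; the set() in the
            # example above does not guarantee an order
            enriched.append("({})".format("|".join(sorted(set(lemmas)))))
    return " ".join(enriched)

print(enrich(
    ["I", "want", "to", "withdraw", "5,000", "euros"],
    {"want": ["need", "want", "require"],
     "withdraw": ["draw", "withdraw", "draw_off", "take_out"]},
))
# -> I (need|require|want) to (draw|draw_off|take_out|withdraw) 5,000 euros
```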

### Portuguese example

Download example model:
```bash
python -m spacy download pt_core_news_sm
```

Run:
```python
import spacy

from spacy_wordnet.wordnet_annotator import WordnetAnnotator

# Load a spaCy model
nlp = spacy.load('pt_core_news_sm')
# spaCy 3.x
nlp.add_pipe("spacy_wordnet", after='tagger', config={'lang': nlp.lang})
# spaCy 2.x
# nlp.add_pipe(WordnetAnnotator(nlp.lang), after='tagger')
text = "Eu quero retirar 5.000 euros"
economy_domains = ['finance', 'banking']
enriched_sentence = []
sentence = nlp(text)

# For each token in the sentence
for token in sentence:
    # We get those synsets within the desired domains
    synsets = token._.wordnet.wordnet_synsets_for_domain(economy_domains)
    if not synsets:
        enriched_sentence.append(token.text)
    else:
        lemmas_for_synset = [lemma for s in synsets for lemma in s.lemma_names('por')]
        # If we found a synset in the economy domains
        # we get the variants and add them to the enriched sentence
        enriched_sentence.append('({})'.format('|'.join(set(lemmas_for_synset))))

# Let's see our enriched sentence
print(' '.join(enriched_sentence))
# >> Eu (querer|desejar|esperar) retirar 5.000 euros
```
