| Field | Value |
|-------------------------|------------------------------------------------------------------------|
| Name | spacy-ngram |
| Version | 0.0.3 |
| home_page | |
| Summary | SpaCy pipeline component for adding document or sentence-level ngrams. |
| upload_time | 2023-07-25 19:44:14 |
| maintainer | |
| docs_url | None |
| author | |
| requires_python | >=3.10 |
| license | |
| keywords | nlp, ngrams |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# spacy-ngram
[![Contributors][contributors-shield]][contributors-url]
[![Forks][forks-shield]][forks-url]
[![Stargazers][stars-shield]][stars-url]
[![Issues][issues-shield]][issues-url]
[![MIT License][license-shield]][license-url]
[![LinkedIn][linkedin-shield]][linkedin-url]
<!-- PROJECT LOGO -->
<br />
<div>
<p>
<a href="https://github.com/kpwhri/spacy-ngram">
<!--img src="images/logo.png" alt="Logo"-->
</a>
</p>
<h3 align="center">spacy-ngram</h3>
<p>
SpaCy pipeline component for adding document or sentence-level ngrams.
</p>
</div>
## Table of Contents
* [About the Project](#about-the-project)
* [Getting Started](#getting-started)
  * [Prerequisites](#prerequisites)
  * [Installation](#installation)
* [Usage](#usage)
* [Roadmap](#roadmap)
* [Contributing](#contributing)
* [License](#license)
* [Contact](#contact)
* [Acknowledgements](#acknowledgements)
## About the Project
SpaCy pipeline component for adding document or sentence-level ngrams.
## Getting Started
### Prerequisites
* Python 3.10+
### Installation
1. Install from PyPI:
```sh
pip install spacy-ngram
```
2. This also installs `spacy`, but `spacy` still needs a language model:
   * E.g., download: `python -m spacy download en_core_web_sm`
   * Or, manually download and install with `pip install ...`
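
spaCy models are installed as ordinary Python packages, so one way to confirm that the model step succeeded is to check whether the package can be found. The sketch below uses only the standard library; `model_installed` is a hypothetical helper written for this example and is not part of `spacy-ngram`.

```python
import importlib.util


def model_installed(name: str = 'en_core_web_sm') -> bool:
    """Return True if a top-level package called `name` is importable here."""
    return importlib.util.find_spec(name) is not None


if not model_installed():
    print('Model missing; run: python -m spacy download en_core_web_sm')
```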
## Usage
### Quick Start
`spacy-ngram` creates ngrams of any size and attaches them at the document level, the sentence level, or both.
```python
import spacy
from spacy_ngram import NgramComponent
nlp = spacy.load('en_core_web_sm') # or whatever model you downloaded
nlp.add_pipe('spacy-ngram') # default to document-level ngrams, removing stopwords
text = 'Quark soup is an interacting localized assembly of quarks and gluons.'
doc = nlp(text)
print(doc._.ngram_1)
# ['quark', 'soup', 'interact', 'localize', 'assembly', 'quark', 'gluon']
print(doc._.ngram_2)
# ['quark_soup', 'soup_interact', 'interact_localize', 'localize_assembly', 'assembly_quark', 'quark_gluon']
```
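
The bigrams in the output above look like adjacent unigrams joined with an underscore. The pure-Python sketch below reproduces that shape from the unigram list. `make_ngrams` is a hypothetical helper written for illustration; it is not part of `spacy-ngram`, whose internals may differ.

```python
def make_ngrams(tokens: list[str], n: int, sep: str = '_') -> list[str]:
    """Join each run of `n` adjacent tokens with `sep` (illustrative helper)."""
    return [sep.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


unigrams = ['quark', 'soup', 'interact', 'localize', 'assembly', 'quark', 'gluon']
print(make_ngrams(unigrams, 2))
# ['quark_soup', 'soup_interact', 'interact_localize', 'localize_assembly', 'assembly_quark', 'quark_gluon']
```

A list shorter than `n` yields an empty result, since there is no complete window to join.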
### Quick Reference
`spacy-ngram` creates new extensions under the `Doc` and/or `Span` classes, depending on the parameters (it defaults
to `Doc`). The extension begins with the prefix `ngram_` followed by the level of ngram desired (e.g., `ngram_1`).
* unigram (`1` included in `ngrams` argument): `Doc._.ngram_1`
* bigram (`2` included in `ngrams` argument): `Doc._.ngram_2`
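
Because each attribute name is just `ngram_` plus the size, the extensions can be read in a loop. The sketch below assumes the object was produced by a pipeline configured with the same sizes; `collect_ngrams` is a hypothetical helper, and `fake_doc` is a stand-in so the example runs without a model.

```python
from types import SimpleNamespace


def collect_ngrams(obj, sizes=(1, 2)):
    """Return {size: ngram list} by reading `obj._.ngram_<size>` for each size."""
    return {n: getattr(obj._, f'ngram_{n}') for n in sizes}


# Stand-in for a processed `Doc` (assumption: real objects expose the same names)
fake_doc = SimpleNamespace(_=SimpleNamespace(ngram_1=['quark', 'soup'], ngram_2=['quark_soup']))
print(collect_ngrams(fake_doc))
# {1: ['quark', 'soup'], 2: ['quark_soup']}
```

With a real pipeline, pass the `Doc` (or a sentence `Span` when `sentence_level` is enabled) in place of `fake_doc`.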
### Pipeline Parameters
The pipeline can be parametrized depending on needs. E.g., to process at the sentence-level:
```python
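import spacy
from spacy_ngram import NgramComponent

nlp = spacy.load('en_core_web_sm')  # fresh pipeline, so the component is added only once
nlp.add_pipe('spacy-ngram', config={
    'sentence_level': True,  # initialize sentence-level ngrams
    'doc_level': False,  # skip processing at document-level
    'ngrams': (2, 3),  # bi- and trigram only
})
text = 'Quark soup is an interacting localized assembly of quarks and gluons.'
doc = nlp(text)
sentence = list(doc.sents)[0]  # first sentence, a `Span`

# sentence._.ngram_1 raises AttributeError: unigrams were not requested
print(sentence._.ngram_2)  # list of bigrams
print(sentence._.ngram_3)  # list of trigrams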
```
| Parameter | Type | Default | Description |
|------------------|--------------|----------|------------------------------------------------|
| `ngrams` | `tuple[int]` | `(1, 2)` | 1 for unigram, 2 for bigram, etc. |
| `include_bos` | `bool` | `False` | include `BOS` tags at start of sentence/document |
| `include_eos` | `bool` | `False` | include `EOS` tags at end of sentence/document |
| `sentence_level` | `bool` | `False` | perform ngram-extraction at sentence-level |
| `doc_level` | `bool` | `True` | perform ngram-extraction at document-level |
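
To illustrate what boundary tags mean for the resulting ngrams, here is a pure-Python sketch. It assumes the markers are the literal strings `BOS` and `EOS` and that they are added before ngrams are built; both are assumptions made for this example, and `pad_tokens` is a hypothetical helper rather than the package's implementation.

```python
def pad_tokens(tokens, include_bos=False, include_eos=False, bos='BOS', eos='EOS'):
    """Optionally add boundary markers around a token list (illustrative only)."""
    padded = list(tokens)
    if include_bos:
        padded.insert(0, bos)  # marker before the first token
    if include_eos:
        padded.append(eos)  # marker after the last token
    return padded


padded = pad_tokens(['quark', 'soup'], include_bos=True, include_eos=True)
bigrams = ['_'.join(padded[i:i + 2]) for i in range(len(padded) - 1)]
print(bigrams)
# ['BOS_quark', 'quark_soup', 'soup_EOS']
```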
## Versions
Uses [SEMVER](https://semver.org/).
See https://github.com/kpwhri/spacy-ngram/releases.
## Roadmap
See the [open issues](https://github.com/kpwhri/spacy-ngram/issues) for a list of proposed features (and known issues).
## Contributing
Any contributions you make are **greatly appreciated**.
1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## License
Distributed under the MIT License.
See `LICENSE` or https://kpwhri.mit-license.org for more information.
<!-- CONTACT -->
## Contact
Please use the [issue tracker](https://github.com/kpwhri/spacy-ngram/issues).
<!-- ACKNOWLEDGEMENTS -->
## Acknowledgements
<!-- MARKDOWN LINKS & IMAGES -->
<!-- https://www.markdownguide.org/basic-syntax/#reference-style-links -->
[contributors-shield]: https://img.shields.io/github/contributors/kpwhri/spacy-ngram.svg?style=flat-square
[contributors-url]: https://github.com/kpwhri/spacy-ngram/graphs/contributors
[forks-shield]: https://img.shields.io/github/forks/kpwhri/spacy-ngram.svg?style=flat-square
[forks-url]: https://github.com/kpwhri/spacy-ngram/network/members
[stars-shield]: https://img.shields.io/github/stars/kpwhri/spacy-ngram.svg?style=flat-square
[stars-url]: https://github.com/kpwhri/spacy-ngram/stargazers
[issues-shield]: https://img.shields.io/github/issues/kpwhri/spacy-ngram.svg?style=flat-square
[issues-url]: https://github.com/kpwhri/spacy-ngram/issues
[license-shield]: https://img.shields.io/github/license/kpwhri/spacy-ngram.svg?style=flat-square
[license-url]: https://kpwhri.mit-license.org/
[linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=flat-square&logo=linkedin&colorB=555
[linkedin-url]: https://www.linkedin.com/company/kaiserpermanentewashingtonresearch
<!-- [product-screenshot]: images/screenshot.png -->
Raw data
{
"_id": null,
"home_page": "",
"name": "spacy-ngram",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": "",
"keywords": "nlp,ngrams",
"author": "",
"author_email": "dcronkite <dcronkite+pypi@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/55/42/d27a7e2dea7a42e8ff911a4458926891b657a1f6e46db82ad3162a30c098/spacy_ngram-0.0.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "SpaCy pipeline component for adding document or sentence-level ngrams.",
"version": "0.0.3",
"project_urls": {
"Home": "https://github.com/kpwhri/spacy-ngram"
},
"split_keywords": [
"nlp",
"ngrams"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "bfa5b2dfe976b7e66323c88208c6eacca45d1bc4beb2a3cabb4a9c8fed4e87d7",
"md5": "d801cb119cdf92799cda260641e8b33e",
"sha256": "5cad2ec422d5b2638cf0d46c2a9711f77b01f38e52fdaafe079a306a56a80a11"
},
"downloads": -1,
"filename": "spacy_ngram-0.0.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d801cb119cdf92799cda260641e8b33e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 5565,
"upload_time": "2023-07-25T19:44:13",
"upload_time_iso_8601": "2023-07-25T19:44:13.591313Z",
"url": "https://files.pythonhosted.org/packages/bf/a5/b2dfe976b7e66323c88208c6eacca45d1bc4beb2a3cabb4a9c8fed4e87d7/spacy_ngram-0.0.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "5542d27a7e2dea7a42e8ff911a4458926891b657a1f6e46db82ad3162a30c098",
"md5": "337912e0118059e25582740aa1eeb481",
"sha256": "b84cd14221745828928afc15a888592a95eb79c2408f5160848478ed7cd783cc"
},
"downloads": -1,
"filename": "spacy_ngram-0.0.3.tar.gz",
"has_sig": false,
"md5_digest": "337912e0118059e25582740aa1eeb481",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 8544,
"upload_time": "2023-07-25T19:44:14",
"upload_time_iso_8601": "2023-07-25T19:44:14.571579Z",
"url": "https://files.pythonhosted.org/packages/55/42/d27a7e2dea7a42e8ff911a4458926891b657a1f6e46db82ad3162a30c098/spacy_ngram-0.0.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-07-25 19:44:14",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "kpwhri",
"github_project": "spacy-ngram",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "spacy-ngram"
}