[![GitHub Stars](https://img.shields.io/github/stars/wjbmattingly/number-spacy?style=social)](https://github.com/wjbmattingly/number-spacy)
[![PyPi Version](https://img.shields.io/pypi/v/number-spacy)](https://pypi.org/project/number-spacy/0.0.1/)
[![PyPi Downloads](https://img.shields.io/pypi/dm/number-spacy)](https://pypi.org/project/number-spacy/0.0.1/)
# Number spaCy
![number spacy logo](https://github.com/wjbmattingly/number-spacy/blob/main/images/number-spacy-logo.png?raw=true)

Number spaCy is a custom spaCy pipeline component that improves the recognition of number entities written out as words (for example, "twenty-two") and exposes their parsed numeric values through a spaCy extension attribute. It uses regular expressions to find spelled-out numbers and then relies on the [word2number](https://github.com/akshaynagpal/w2n) library to convert them into numeric values. Each resulting entity, labelled `NUMBER`, stores its value in the custom extension `._.number`.

This lightweight component can be added to an existing spaCy pipeline or to a blank one. If you add it to a pipeline that already contains a named entity recognizer, insert it before the `ner` component, for example with `nlp.add_pipe("find_numbers", before="ner")`.
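
To make the "regular expression, then word-to-number conversion" idea concrete, here is a minimal, dependency-free sketch of the same two-step approach. It is not number-spacy's code. The vocabulary, the `NUMBER_RE` pattern and the helpers `words_to_number` and `find_number_phrases` are hypothetical simplifications. `words_to_number` stands in for word2number and only handles whole numbers below one million.

```python
import re

# Hypothetical, simplified vocabulary standing in for word2number
# (whole numbers below 1,000,000 only).
UNITS = {word: value for value, word in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve thirteen "
    "fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {word: 10 * value for value, word in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"hundred": 100, "thousand": 1000}

# Longest words first so that "seventeen" is tried before "seven".
NUMBER_WORD = "|".join(sorted([*UNITS, *TENS, *SCALES], key=len, reverse=True))
# One number word, optionally followed by more number words joined by spaces,
# hyphens, or "and" (e.g. "twenty-two", "two hundred and five").
# The trailing \b stops partial matches such as "six" inside "sixth".
NUMBER_RE = re.compile(
    rf"\b(?:{NUMBER_WORD})(?:(?:\s+and\s+|[\s-]+)(?:{NUMBER_WORD}))*\b",
    re.IGNORECASE,
)


def words_to_number(phrase: str) -> int:
    """Convert a spelled-out number such as 'twenty-two' into an int."""
    total, current = 0, 0
    for word in re.split(r"[\s-]+", phrase.lower()):
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            # "hundred" multiplies the group built so far ("two hundred" -> 200).
            current = max(current, 1) * SCALES[word]
        elif word == "thousand":
            # "thousand" closes the current group and adds it to the running total.
            total += max(current, 1) * SCALES[word]
            current = 0
        # Any other word (here only the connector "and") contributes nothing.
    return total + current


def find_number_phrases(text: str) -> list[tuple[int, int, str, int]]:
    """Return (start_char, end_char, phrase, value) for each spelled-out number."""
    return [
        (m.start(), m.end(), m.group(), words_to_number(m.group()))
        for m in NUMBER_RE.finditer(text)
    ]


text = ("I have three apples. She gave me twenty-two more, "
        "and now I have twenty-five apples in total.")
for start, end, phrase, value in find_number_phrases(text):
    print(f"Text: {phrase} -> Parsed Number: {value}")
```

Run on the example sentence used later in this README, the loop prints the same three pairs as the spaCy component (`three` -> 3, `twenty-two` -> 22, `twenty-five` -> 25). Inside a spaCy component, character offsets like `start` and `end` could be turned into entity spans with `Doc.char_span(start, end, label="NUMBER")`. That is one plausible design, not a description of how number-spacy is implemented. A bare pattern like this also matches pronoun uses such as "the red one".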
## Installation
To install Number spaCy, execute:
```bash
pip install number-spacy
```
## Usage
### Integrating the Component into your spaCy Pipeline
Import `find_numbers` from `number_spacy`, then add the component to your pipeline by name:
```python
import spacy
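from number_spacy import find_numbers  # importing registers the "find_numbers" component with spaCy
# Start from a blank English pipeline (an existing trained pipeline also works)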
nlp = spacy.blank('en')
# Integrate the component into the pipeline
nlp.add_pipe('find_numbers')
```
### Text Processing with the Pipeline
Once the component has been added, process text as you normally would:
```python
doc = nlp("I have three apples. She gave me twenty-two more, and now I have twenty-five apples in total.")
```
### Retrieving the Parsed Numbers
Loop over the entities in `doc.ents`, keep those labelled `NUMBER`, and read each one's parsed value from the `._.number` extension:
```python
for ent in doc.ents:
    if ent.label_ == "NUMBER":
        print(f"Text: {ent.text} -> Parsed Number: {ent._.number}")
```
This should output:
```
Text: three -> Parsed Number: 3
Text: twenty-two -> Parsed Number: 22
Text: twenty-five -> Parsed Number: 25
```