Name | date-spacy |
Version | 0.0.1 |
home_page | https://github.com/wjbmattingly/date-spacy |
Summary | A spaCy extension for enhanced date and number entity recognition and extraction as structured data. |
upload_time | 2023-08-24 17:00:38 |
maintainer | |
docs_url | None |
author | WJB Mattingly |
requires_python | |
license | |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# Date spaCy
![date spacy logo](https://github.com/wjbmattingly/date-spacy/blob/main/images/date-spacy-logo.png?raw=true)
Date spaCy is a custom spaCy pipeline component that lets you identify date entities in a text and retrieve their parsed values through a spaCy custom extension attribute. It uses regular expressions to find dates and then uses the [dateparser](https://dateparser.readthedocs.io/en/latest/) library to convert them into structured datetime data. One current limitation is that if no year is given, the current year is assumed. The `dateparser` output is stored in a custom entity extension: `._.date`.
This lightweight approach can be added to an existing spaCy pipeline or to a blank model. If you add it to an existing spaCy pipeline, be sure to place it before the NER component.
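To make the find-then-parse idea and the missing-year limitation concrete, here is a minimal standard-library sketch. It is not the package's implementation: the pattern, the `find_dates_sketch` function, and its `today` parameter are hypothetical names used for illustration, and `datetime` stands in for `dateparser`.

```python
import re
from datetime import date, datetime

MONTHS = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]

# Illustrative pattern only (an assumption, not the package's regex):
# "<day> <Month> [<year>]", e.g. "25th August 2023" or "10 September".
DATE_PATTERN = re.compile(
    r"\b(\d{1,2})(?:st|nd|rd|th)?\s+(" + "|".join(MONTHS) + r")(?:\s+(\d{4}))?\b",
    re.IGNORECASE,
)


def find_dates_sketch(text, today=None):
    """Return (matched_text, datetime) pairs; a missing year falls back to today's year."""
    today = today or date.today()
    results = []
    for match in DATE_PATTERN.finditer(text):
        day, month_name, year = match.groups()
        month = MONTHS.index(month_name.lower()) + 1
        try:
            parsed = datetime(int(year) if year else today.year, month, int(day))
        except ValueError:
            # Skip impossible dates such as "31 February".
            continue
        results.append((match.group(0), parsed))
    return results


sample = "The event is scheduled for 25th August 2023. We also have a meeting on 10 September."
for matched, parsed in find_dates_sketch(sample, today=date(2023, 8, 24)):
    print(f"Text: {matched} -> Parsed Date: {parsed}")
```

Passing `today` explicitly keeps the sketch deterministic. With the default, "10 September" resolves to whichever year the code runs in, which is the limitation described above.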
## Installation
To install `date_spacy`, simply run:
```bash
pip install date-spacy
```
## Usage
### Adding the Component to your spaCy Pipeline
First, you'll need to import the `find_dates` component and add it to your spaCy pipeline:
```python
import spacy
from date_spacy import find_dates
# Load your desired spaCy model
nlp = spacy.blank('en')
# Add the component to the pipeline
nlp.add_pipe('find_dates')
```
### Processing Text with the Pipeline
After adding the component, you can process text as usual:
```python
doc = nlp("""The event is scheduled for 25th August 2023.
We also have a meeting on 10 September and another one on the twelfth of October and a
final one on January fourth.""")
```
### Accessing the Parsed Dates
You can iterate over the entities in the `doc` and access the special date extension:
```python
for ent in doc.ents:
    # Only entities labelled DATE are expected to carry a parsed value
    if ent.label_ == "DATE":
        print(f"Text: {ent.text} -> Parsed Date: {ent._.date}")
```
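This will output the following. Dates without a year resolve to the current year, which was 2023 when this example was written: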
```
Text: 25th August 2023 -> Parsed Date: 2023-08-25 00:00:00
Text: 10 September -> Parsed Date: 2023-09-10 00:00:00
Text: twelfth of October -> Parsed Date: 2023-10-12 00:00:00
Text: January fourth -> Parsed Date: 2023-01-04 00:00:00
```
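If you want to pass the parsed values on as structured data, one option is to convert them to ISO 8601 strings. The sketch below assumes `ent._.date` holds a standard `datetime.datetime` object, as the printed output suggests. The `parsed_entities` list is a hand-written stand-in for `(ent.text, ent._.date)` pairs, and `to_records` is a hypothetical helper, not part of the package.

```python
import json
from datetime import datetime

# Stand-in for (ent.text, ent._.date) pairs, copied from the example output above.
parsed_entities = [
    ("25th August 2023", datetime(2023, 8, 25)),
    ("10 September", datetime(2023, 9, 10)),
    ("twelfth of October", datetime(2023, 10, 12)),
    ("January fourth", datetime(2023, 1, 4)),
]


def to_records(pairs):
    """Convert (text, datetime) pairs into JSON-serializable dictionaries."""
    return [
        {"text": text, "date": parsed.date().isoformat()}
        for text, parsed in pairs
    ]


records = to_records(parsed_entities)
print(json.dumps(records, indent=2))
```

In a real pipeline you would build the list with something like `[(ent.text, ent._.date) for ent in doc.ents if ent.label_ == "DATE"]`.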
Raw data
{
"_id": null,
"home_page": "https://github.com/wjbmattingly/date-spacy",
"name": "date-spacy",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "WJB Mattingly",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/fc/88/4db3f2ef3ac8737c81f413a523029f48ef530e5b92111dc7862c5b6ed96a/date_spacy-0.0.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "A spaCy extension for enhanced date and number entity recognition and extraction as structured data.",
"version": "0.0.1",
"project_urls": {
"Homepage": "https://github.com/wjbmattingly/date-spacy"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ab21eb10065730aa93392af1ba902aaff1ccd3a3eb460d8d0392695840c1630a",
"md5": "60da3e7d84dfaf0049ca28d9d22d812f",
"sha256": "b8c8b6bcb60419b8caa81e087168b98a2accfce3784de6c5181ae07b74dd433e"
},
"downloads": -1,
"filename": "date_spacy-0.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "60da3e7d84dfaf0049ca28d9d22d812f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 3926,
"upload_time": "2023-08-24T17:00:36",
"upload_time_iso_8601": "2023-08-24T17:00:36.243850Z",
"url": "https://files.pythonhosted.org/packages/ab/21/eb10065730aa93392af1ba902aaff1ccd3a3eb460d8d0392695840c1630a/date_spacy-0.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "fc884db3f2ef3ac8737c81f413a523029f48ef530e5b92111dc7862c5b6ed96a",
"md5": "8d0f24f20b53aef7dd4995ce671fcdae",
"sha256": "e4e4c21f1030e08fc5da08f6787ce5fce6554c162ad65b63e81af84ca46c47cd"
},
"downloads": -1,
"filename": "date_spacy-0.0.1.tar.gz",
"has_sig": false,
"md5_digest": "8d0f24f20b53aef7dd4995ce671fcdae",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 3671,
"upload_time": "2023-08-24T17:00:38",
"upload_time_iso_8601": "2023-08-24T17:00:38.036454Z",
"url": "https://files.pythonhosted.org/packages/fc/88/4db3f2ef3ac8737c81f413a523029f48ef530e5b92111dc7862c5b6ed96a/date_spacy-0.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-08-24 17:00:38",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "wjbmattingly",
"github_project": "date-spacy",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "date-spacy"
}