date-spacy


- Name: date-spacy
- Version: 0.0.1
- Home page: https://github.com/wjbmattingly/date-spacy
- Summary: A spaCy extension for enhanced date and number entity recognition and extraction as structured data.
- Upload time: 2023-08-24 17:00:38
- Author: WJB Mattingly
# Date spaCy

![date spacy logo](https://github.com/wjbmattingly/date-spacy/blob/main/images/date-spacy-logo.png?raw=true)

Date spaCy is a collection of custom spaCy pipeline components that let you identify date entities in a text and retrieve their parsed values through spaCy's custom extension attributes. It uses regular expressions to find dates and then uses the [dateparser](https://dateparser.readthedocs.io/en/latest/) library to convert them into structured `datetime` data. One current limitation is that when a date has no year, it is assumed to be in the current year. The `dateparser` output is stored in a custom entity extension: `._.date`.
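The regex-then-parse idea can be sketched without spaCy or `dateparser`. This is a minimal, stdlib-only illustration, assuming a deliberately narrow pattern (`<day> <Month> [<year>]`, English month names only). `find_simple_dates`, `DATE_RE`, and `default_year` are hypothetical names, and `datetime.date` stands in for `dateparser`, so this is not the package's implementation; word forms such as "twelfth of October" are out of scope here.

```python
import re
from datetime import date

# Hypothetical month lookup; the real date-spacy patterns are broader than this.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}

# Step 1 (regex): "<day>[st|nd|rd|th] <Month> [<year>]", e.g. "25th August 2023".
DATE_RE = re.compile(
    r"\b(\d{1,2})(?:st|nd|rd|th)?\s+(" + "|".join(MONTHS) + r")(?:\s+(\d{4}))?\b",
    re.IGNORECASE,
)


def find_simple_dates(text, default_year=None):
    """Return (matched_text, date) pairs found in text."""
    if default_year is None:
        # Mirrors the limitation above: a missing year becomes the current year.
        default_year = date.today().year
    results = []
    for match in DATE_RE.finditer(text):
        day, month_name, year = match.groups()
        # Step 2 (parse): turn the matched pieces into a structured value.
        try:
            parsed = date(int(year) if year else default_year,
                          MONTHS[month_name.lower()], int(day))
        except ValueError:  # e.g. "31 February" matches but is not a real date
            continue
        results.append((match.group(0), parsed))
    return results


print(find_simple_dates("Scheduled for 25th August 2023, with a follow-up on 10 September."))
```

Passing `default_year` explicitly makes results reproducible; leaving it out reproduces the "current year" behavior.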

Date spaCy is lightweight and can be added to an existing spaCy pipeline or to a blank one. If you add it to an existing pipeline, be sure to place it before the `ner` component.
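The ordering advice can be expressed with the `before` argument of spaCy's `nlp.add_pipe`. The sketch below uses a hypothetical no-op component, `stand_in_dates`, in place of `find_dates` so it runs with spaCy alone, and an untrained `ner` component in place of a trained model's recognizer (it is never called here).

```python
import spacy
from spacy.language import Language


# Hypothetical no-op stand-in for find_dates (keeps the sketch runnable
# without date-spacy); a real date component would set doc.ents here.
@Language.component("stand_in_dates")
def stand_in_dates(doc):
    return doc


nlp = spacy.blank("en")
# Untrained "ner" stands in for a trained model's recognizer; it is never run here.
nlp.add_pipe("ner")
# before="ner" inserts the date component ahead of the entity recognizer.
nlp.add_pipe("stand_in_dates", before="ner")
print(nlp.pipe_names)  # ['stand_in_dates', 'ner']
```

With a trained pipeline loaded via `spacy.load(...)`, the equivalent call would presumably be `nlp.add_pipe("find_dates", before="ner")`. spaCy's documentation notes (for its `entity_ruler`) that when entities are set before `ner` runs, the entity recognizer respects those spans and adjusts its predictions around them, which is likely the reason for this advice.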

## Installation

To install `date-spacy` (imported in Python as `date_spacy`), run:

```bash
pip install date-spacy
```

## Usage

### Adding the Component to your spaCy Pipeline

First, you'll need to import the `find_dates` component and add it to your spaCy pipeline:

```python
import spacy
from date_spacy import find_dates

# Create a blank English pipeline (or load a trained one with spacy.load)
nlp = spacy.blank('en')

# Add the component to the pipeline
nlp.add_pipe('find_dates')
```
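Adding a component by name works because spaCy looks the name up in its component registry; a function becomes available under a name through the `@Language.component` decorator, and importing `find_dates` presumably triggers that registration. To illustrate the mechanism only (this is not date-spacy's code), here is a toy component, assuming ISO-formatted dates such as `2023-08-24`. It finds matches with a regex, turns them into `DATE` entities, and stores a parsed `datetime` in a custom `Span` extension. The names `toy_iso_dates`, `parsed_date`, `toy_nlp`, and `toy_doc` are hypothetical; `parsed_date` avoids clashing with date-spacy's own `date` extension, and `toy_nlp` keeps the toy pipeline separate from the `nlp` above.

```python
import re
from datetime import datetime

import spacy
from spacy.language import Language
from spacy.tokens import Span

# Toy pattern (assumption): only ISO dates like 2023-08-24 are recognized.
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

# Custom attribute readable as span._.parsed_date; force=True allows re-running.
Span.set_extension("parsed_date", default=None, force=True)


@Language.component("toy_iso_dates")
def toy_iso_dates(doc):
    spans = []
    for match in ISO_DATE.finditer(doc.text):
        # char_span returns None if the match doesn't align with token boundaries.
        span = doc.char_span(match.start(), match.end(), label="DATE")
        if span is None:
            continue
        try:
            value = datetime.strptime(match.group(), "%Y-%m-%d")
        except ValueError:  # e.g. 2023-13-45 matches the regex but is not a date
            continue
        span._.parsed_date = value
        spans.append(span)
    doc.ents = spans  # overwrites any existing entities; fine for a blank pipeline
    return doc


toy_nlp = spacy.blank("en")
toy_nlp.add_pipe("toy_iso_dates")

toy_doc = toy_nlp("Shipped 2023-08-24 and patched 2023-09-01 later")
for ent in toy_doc.ents:
    print(ent.text, ent.label_, ent._.parsed_date)
```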

### Processing Text with the Pipeline

After adding the component, you can process text as usual:

```python
doc = nlp("""The event is scheduled for 25th August 2023.
          We also have a meeting on 10 September and another one on the twelfth of October and a
          final one on January fourth.""")
```

### Accessing the Parsed Dates

You can iterate over the entities in `doc` and read each date's parsed value from the `._.date` extension:

```python
for ent in doc.ents:
    if ent.label_ == "DATE":
        print(f"Text: {ent.text} -> Parsed Date: {ent._.date}")
```

This will output the following (for dates written without a year, the year shown depends on when you run the code):

```
Text: 25th August 2023 -> Parsed Date: 2023-08-25 00:00:00
Text: 10 September -> Parsed Date: 2023-09-10 00:00:00
Text: twelfth of October -> Parsed Date: 2023-10-12 00:00:00
Text: January fourth -> Parsed Date: 2023-01-04 00:00:00
```
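If you need to tell which of these years were written in the text and which were assumed, one option is to check the entity text itself. This is a minimal sketch, assuming that a standalone four-digit number marks an explicit year; `year_was_inferred` is a hypothetical helper, not part of date-spacy, and the heuristic would miss forms such as "'23".

```python
import re

# Heuristic (assumption): any standalone four-digit number counts as a written year.
YEAR_RE = re.compile(r"\b\d{4}\b")


def year_was_inferred(date_text):
    """Return True if the matched date text contains no explicit four-digit year."""
    return YEAR_RE.search(date_text) is None


for text in ["25th August 2023", "10 September", "twelfth of October", "January fourth"]:
    print(f"{text}: year inferred = {year_was_inferred(text)}")

# With the README's pipeline (not executed here), the check could be applied as:
# for ent in doc.ents:
#     if ent.label_ == "DATE" and year_was_inferred(ent.text):
#         print("year assumed for:", ent.text, ent._.date)
```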

            
