<p align="center"><img width="40%" src="docs/icon.png" /></p>
# negspacy: negation for spaCy
[![Build Status](https://dev.azure.com/jenopizzaro/negspacy/_apis/build/status/jenojp.negspacy?branchName=master)](https://dev.azure.com/jenopizzaro/negspacy/_build/latest?definitionId=2&branchName=master) [![Built with spaCy](https://img.shields.io/badge/made%20with%20❤%20and-spaCy-09a3d5.svg)](https://spacy.io) [![pypi Version](https://img.shields.io/pypi/v/negspacy.svg?style=flat-square)](https://pypi.org/project/negspacy/) [![DOI](https://zenodo.org/badge/201071164.svg)](https://zenodo.org/badge/latestdoi/201071164) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg?style=flat-square)](https://github.com/ambv/black)
spaCy pipeline object for negating concepts in text. Based on the NegEx algorithm.
***NegEx - A Simple Algorithm for Identifying Negated Findings and Diseases in Discharge Summaries
Chapman, Bridewell, Hanbury, Cooper, Buchanan***
[https://doi.org/10.1006/jbin.2001.1029](https://doi.org/10.1006/jbin.2001.1029)
## What's new
Version 1.0 is a major version update providing support for spaCy 3.0's new interface for adding pipeline components. As a result, it is not backwards compatible with previous versions of negspacy.
If your project uses spaCy 2.3.5 or earlier, you will need to use version 0.1.9. See [archived readme](https://github.com/jenojp/negspacy/blob/v0.1.9_spacy_2.3.5/README.md).
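The compatibility note above can be turned into a small check, for example in a setup script. This is a minimal sketch only: `negspacy_requirement` is a hypothetical helper, not part of negspacy, and the two pip requirement strings are assumptions derived from the sentences above. spaCy versions between 2.3.5 and 3.0 are not covered by this README, so the sketch refuses to guess for them.

```python
import re


def negspacy_requirement(spacy_version):
    """Map a spaCy version string to a pip requirement for negspacy.

    Hypothetical helper: the pins mirror the compatibility note in this README.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", spacy_version)
    if match is None:
        raise ValueError(f"unrecognized version: {spacy_version!r}")
    # A missing patch number is treated as 0, e.g. "2.1" -> (2, 1, 0).
    version = tuple(int(part or 0) for part in match.groups())
    if version <= (2, 3, 5):
        return "negspacy==0.1.9"
    if version >= (3, 0, 0):
        return "negspacy>=1.0"
    raise ValueError(f"spaCy {spacy_version} is not covered by the README")


print(negspacy_requirement("2.3.5"))  # negspacy==0.1.9
print(negspacy_requirement("3.0.0"))  # negspacy>=1.0
```

In a real project you would pass `spacy.__version__` instead of a literal string.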
## Installation and usage
Install the library.
```bash
pip install negspacy
```
Import library and spaCy.
```python
import spacy
from negspacy.negation import Negex
```
Load a spaCy language model and add the negspacy pipeline component. Filtering on entity types is optional.
```python
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("negex", config={"ent_types":["PERSON","ORG"]})
```
View negations.
```python
doc = nlp("She does not like Steve Jobs but likes Apple products.")
for e in doc.ents:
    print(e.text, e._.negex)
```
```console
Steve Jobs True
Apple False
```
Consider pairing with [scispacy](https://allenai.github.io/scispacy/) to find UMLS concepts in text and process negations.
## NegEx Patterns
* **pseudo_negations** - phrases that are false triggers, ambiguous negations, or double negatives
* **preceding_negations** - negation phrases that precede an entity
* **following_negations** - negation phrases that follow an entity
* **termination** - phrases that cut a sentence into parts for the purposes of negation detection (e.g., "but")
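To make the interplay of the four pattern types concrete, here is a minimal sketch in plain Python. It is not negspacy's implementation: the tiny `TERMSET`, the helper names (`tokenize`, `find_spans`, `is_negated`) and the whole-word matching are assumptions made for illustration, and it works on raw strings rather than spaCy entities.

```python
import re

# Hypothetical, tiny termset for illustration; the built-in termsets are much larger.
TERMSET = {
    "pseudo_negations": ["not only"],
    "preceding_negations": ["not", "no", "denies"],
    "following_negations": ["was ruled out"],
    "termination": ["but", "however"],
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def find_spans(tokens, phrases):
    """Return (start, end) token spans for every occurrence of every phrase."""
    spans = []
    for phrase in phrases:
        words = tokenize(phrase)
        n = len(words)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == words:
                spans.append((i, i + n))
    return spans


def is_negated(sentence, entity, termset=TERMSET):
    """Decide whether the first occurrence of `entity` falls in a negation scope."""
    tokens = tokenize(sentence)
    ent_spans = find_spans(tokens, [entity])
    if not ent_spans:
        return False
    ent_start, ent_end = ent_spans[0]
    pseudo = find_spans(tokens, termset["pseudo_negations"])
    boundaries = find_spans(tokens, termset["termination"])

    def real(spans):
        # Drop triggers that are part of a pseudo negation such as "not only".
        return [s for s in spans
                if not any(s[0] < p[1] and p[0] < s[1] for p in pseudo)]

    def blocked(lo, hi):
        # A termination phrase between trigger and entity ends the scope.
        return any(lo <= b[0] and b[1] <= hi for b in boundaries)

    for _, end in real(find_spans(tokens, termset["preceding_negations"])):
        if end <= ent_start and not blocked(end, ent_start):
            return True
    for start, _ in real(find_spans(tokens, termset["following_negations"])):
        if start >= ent_end and not blocked(ent_end, start):
            return True
    return False


text = "She does not like Steve Jobs but likes Apple products."
print(is_negated(text, "Steve Jobs"))  # True
print(is_negated(text, "Apple"))       # False: "but" terminates the scope of "not"
```

The sketch treats a termination phrase as a hard boundary and ignores any trigger that overlaps a pseudo negation, which is the role the list above assigns to those two categories.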
### Termsets
Designate the termset to use; `en_clinical` is used by default.
* `en` = phrases for general English-language text
* `en_clinical` **DEFAULT** = adds phrases specific to the clinical domain to the general English set
* `en_clinical_sensitive` = adds additional phrases to help rule out historical and possibly irrelevant entities
To set:
```python
import spacy
from negspacy.negation import Negex  # registers the "negex" factory with spaCy
from negspacy.termsets import termset

ts = termset("en")

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(
    "negex",
    config={
        "neg_termset": ts.get_patterns()
    }
)
```
## Additional Functionality
### Change patterns or view patterns in use
Replace all patterns with your own set:
```python
import spacy
from negspacy.negation import Negex

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(
    "negex",
    config={
        "neg_termset": {
            "pseudo_negations": ["might not"],
            "preceding_negations": ["not"],
            "following_negations": ["declined"],
            "termination": ["but", "however"]
        }
    }
)
```
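When replacing every pattern, it can help to check the dictionary before handing it to `nlp.add_pipe`. The helper below is a hypothetical, plain-Python sketch (`check_termset` and `REQUIRED_KEYS` are not part of negspacy); it only assumes that a complete termset uses the four keys listed under NegEx Patterns, each mapped to a list of phrases.

```python
# The four pattern categories named in this README.
REQUIRED_KEYS = {
    "pseudo_negations",
    "preceding_negations",
    "following_negations",
    "termination",
}


def check_termset(patterns):
    """Return `patterns` unchanged if it looks like a complete termset.

    Hypothetical helper for illustration; raises on missing keys,
    unexpected keys, or values that are not lists of strings.
    """
    missing = REQUIRED_KEYS - set(patterns)
    extra = set(patterns) - REQUIRED_KEYS
    if missing or extra:
        raise ValueError(
            f"missing keys: {sorted(missing)}, unexpected keys: {sorted(extra)}"
        )
    for key, phrases in patterns.items():
        if not isinstance(phrases, list) or not all(
            isinstance(p, str) for p in phrases
        ):
            raise TypeError(f"{key} must be a list of strings")
    return patterns


custom = check_termset({
    "pseudo_negations": ["might not"],
    "preceding_negations": ["not"],
    "following_negations": ["declined"],
    "termination": ["but", "however"],
})
```

The checked dictionary can then be passed as the `neg_termset` value in the `config` argument.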
Add and remove individual patterns from the built-in termsets on the fly:
```python
from negspacy.termsets import termset

ts = termset("en")
ts.add_patterns({
    "pseudo_negations": ["my favorite pattern"],
    "termination": ["these are", "great patterns", "but"],
    "preceding_negations": ["wow a negation"],
    "following_negations": ["extra negation"],
})
# OR
ts.remove_patterns(
    {
        "termination": ["these are", "great patterns"],
        "pseudo_negations": ["my favorite pattern"],
        "preceding_negations": ["denied", "wow a negation"],
        "following_negations": ["unlikely", "extra negation"],
    }
)
```
View patterns in use
```python
from negspacy.termsets import termset
ts = termset("en_clinical")
print(ts.get_patterns())
```
### Negations in noun chunks
Depending on the Named Entity Recognition model you are using, you _may_ have negations "chunked together" with nouns. For example:
```python
nlp = spacy.load("en_core_sci_sm")  # scispaCy model, installed separately
doc = nlp("There is no headache.")
for e in doc.ents:
    print(e.text)

# no headache
```
This would cause the NegEx algorithm to miss the preceding negation. To account for this, you can add a `chunk_prefix`:
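```python
import spacy
from negspacy.negation import Negex
from negspacy.termsets import termset

nlp = spacy.load("en_core_sci_sm")
ts = termset("en_clinical")  # same as the default termset
nlp.add_pipe(
    "negex",
    config={
        "neg_termset": ts.get_patterns(),
        "chunk_prefix": ["no"],
    },
    last=True,
)
doc = nlp("There is no headache.")
for e in doc.ents:
    print(e.text, e._.negex)

# no headache True
```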
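The idea behind `chunk_prefix` can be sketched without spaCy: if an entity's own text begins with one of the listed prefixes, the entity is treated as negated. The function below is a hypothetical illustration under that assumption (`starts_with_chunk_prefix` is not a negspacy function, and the library's actual matching may differ); it compares whole words so that "no" does not match "nodule".

```python
def starts_with_chunk_prefix(entity_text, chunk_prefix):
    """Return True if the entity text begins with any prefix, word by word."""
    words = entity_text.lower().split()
    for prefix in chunk_prefix:
        prefix_words = prefix.lower().split()
        # Compare whole words so a prefix never matches inside a longer word.
        if prefix_words and words[:len(prefix_words)] == prefix_words:
            return True
    return False


print(starts_with_chunk_prefix("no headache", ["no"]))  # True
print(starts_with_chunk_prefix("nodule", ["no"]))       # False
```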
## Contributing
[contributing](https://github.com/jenojp/negspacy/blob/master/CONTRIBUTING.md)
## Authors
* Jeno Pizarro
## License
[license](https://github.com/jenojp/negspacy/blob/master/LICENSE)
## Other libraries
This library is featured in the [spaCy Universe](https://spacy.io/universe). Check it out for other useful libraries and inspiration.
If you're looking for a spaCy pipeline object to extract values that correspond to a named entity (e.g., birth dates, account numbers, or laboratory results), take a look at [extractacy](https://github.com/jenojp/extractacy).
<p align="left"><img width="40%" src="https://github.com/jenojp/extractacy/blob/master/docs/icon.png?raw=true" /></p>
Raw data
{
"_id": null,
"home_page": "https://github.com/jenojp/negspacy",
"name": "negspacy",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.6.0",
"maintainer_email": "",
"keywords": "nlp spacy SpaCy negation",
"author": "Jeno Pizarro",
"author_email": "jenopizzaro@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/6b/bd/ef5354a8900d534dd149186092efcd86c083b6eb0a635112da84a7769aa2/negspacy-1.0.4.tar.gz",
"platform": null,
"description": "\n<p align=\"center\"><img width=\"40%\" src=\"docs/icon.png\" /></p>\n\n\n# negspacy: negation for spaCy\n\n[![Build Status](https://dev.azure.com/jenopizzaro/negspacy/_apis/build/status/jenojp.negspacy?branchName=master)](https://dev.azure.com/jenopizzaro/negspacy/_build/latest?definitionId=2&branchName=master) [![Built with spaCy](https://img.shields.io/badge/made%20with%20\u2764%20and-spaCy-09a3d5.svg)](https://spacy.io) [![pypi Version](https://img.shields.io/pypi/v/negspacy.svg?style=flat-square)](https://pypi.org/project/negspacy/) [![DOI](https://zenodo.org/badge/201071164.svg)](https://zenodo.org/badge/latestdoi/201071164) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg?style=flat-square)](https://github.com/ambv/black)\n\nspaCy pipeline object for negating concepts in text. Based on the NegEx algorithm.\n\n***NegEx - A Simple Algorithm for Identifying Negated Findings and Diseases in Discharge Summaries\nChapman, Bridewell, Hanbury, Cooper, Buchanan***\n[https://doi.org/10.1006/jbin.2001.1029](https://doi.org/10.1006/jbin.2001.1029)\n\n## What's new\nVersion 1.0 is a major version update providing support for spaCy 3.0's new interface for adding pipeline components. As a result, it is not backwards compatible with previous versions of negspacy.\n\nIf your project uses spaCy 2.3.5 or earlier, you will need to use version 0.1.9. See [archived readme](https://github.com/jenojp/negspacy/blob/v0.1.9_spacy_2.3.5/README.md).\n\n## Installation and usage\nInstall the library.\n```bash\npip install negspacy\n```\n\nImport library and spaCy.\n```python\nimport spacy\nfrom negspacy.negation import Negex\n```\n\nLoad spacy language model. Add negspacy pipeline object. 
Filtering on entity types is optional.\n```python\nnlp = spacy.load(\"en_core_web_sm\")\nnlp.add_pipe(\"negex\", config={\"ent_types\":[\"PERSON\",\"ORG\"]})\n\n```\n\nView negations.\n```python\ndoc = nlp(\"She does not like Steve Jobs but likes Apple products.\")\n\nfor e in doc.ents:\n\tprint(e.text, e._.negex)\n```\n\n```console\nSteve Jobs True\nApple False\n```\n\nConsider pairing with [scispacy](https://allenai.github.io/scispacy/) to find UMLS concepts in text and process negations.\n\n## NegEx Patterns\n\n* **pseudo_negations** - phrases that are false triggers, ambiguous negations, or double negatives\n* **preceding_negations** - negation phrases that precede an entity\n* **following_negations** - negation phrases that follow an entity\n* **termination** - phrases that cut a sentence in parts, for purposes of negation detection (.e.g., \"but\")\n\n### Termsets\n\nDesignate termset to use, `en_clinical` is used by default.\n\n* `en` = phrases for general english language text\n* `en_clinical` **DEFAULT** = adds phrases specific to clinical domain to general english\n* `en_clinical_sensitive` = adds additional phrases to help rule out historical and possibly irrelevant entities\n\nTo set:\n```python\nfrom negspacy.negation import Negex\nfrom negspacy.termsets import termset\n\nts = termset(\"en\")\n\nnlp = spacy.load(\"en_core_web_sm\")\nnlp.add_pipe(\n \"negex\",\n config={\n \"neg_termset\":ts.get_patterns()\n }\n)\n\n```\n\n## Additional Functionality\n\n### Change patterns or view patterns in use\n\nReplace all patterns with your own set\n```python\nnlp = spacy.load(\"en_core_web_sm\")\nnlp.add_pipe(\n \"negex\", \n config={\n \"neg_termset\":{\n \"pseudo_negations\": [\"might not\"],\n \"preceding_negations\": [\"not\"],\n \"following_negations\":[\"declined\"],\n \"termination\": [\"but\",\"however\"]\n }\n }\n )\n```\n\nAdd and remove individual patterns on the fly from built-in termsets\n```python\nfrom negspacy.termsets import termset\nts = 
termset(\"en\")\nts.add_patterns({\n \"pseudo_negations\": [\"my favorite pattern\"],\n \"termination\": [\"these are\", \"great patterns\", \"but\"],\n \"preceding_negations\": [\"wow a negation\"],\n \"following_negations\": [\"extra negation\"],\n })\n#OR\nts.remove_patterns(\n {\n \"termination\": [\"these are\", \"great patterns\"],\n \"pseudo_negations\": [\"my favorite pattern\"],\n \"preceding_negations\": [\"denied\", \"wow a negation\"],\n \"following_negations\": [\"unlikely\", \"extra negation\"],\n }\n )\n```\n\nView patterns in use\n```python\nfrom negspacy.termsets import termset\nts = termset(\"en_clinical\")\nprint(ts.get_patterns())\n```\n\n\n### Negations in noun chunks\n\nDepending on the Named Entity Recognition model you are using, you _may_ have negations \"chunked together\" with nouns. For example:\n```python\nnlp = spacy.load(\"en_core_sci_sm\")\ndoc = nlp(\"There is no headache.\")\nfor e in doc.ents:\n print(e.text)\n\n# no headache\n```\nThis would cause the Negex algorithm to miss the preceding negation. To account for this, you can add a ```chunk_prefix```:\n\n```python\nnlp = spacy.load(\"en_core_sci_sm\")\nts = termset(\"en_clinical\")\nnlp.add_pipe(\n \"negex\",\n config={\n \"chunk_prefix\": [\"no\"],\n },\n last=True,\n)\ndoc = nlp(\"There is no headache.\")\nfor e in doc.ents:\n print(e.text, e._.negex)\n\n# no headache True\n```\n\n\n## Contributing\n[contributing](https://github.com/jenojp/negspacy/blob/master/CONTRIBUTING.md)\n\n## Authors\n* Jeno Pizarro\n\n## License\n[license](https://github.com/jenojp/negspacy/blob/master/LICENSE)\n\n## Other libraries\n\nThis library is featured in the [spaCy Universe](https://spacy.io/universe). 
Check it out for other useful libraries and inspiration.\n\nIf you're looking for a spaCy pipeline object to extract values that correspond to a named entity (e.g., birth dates, account numbers, or laboratory results) take a look at [extractacy](https://github.com/jenojp/extractacy).\n\n<p align=\"left\"><img width=\"40%\" src=\"https://github.com/jenojp/extractacy/blob/master/docs/icon.png?raw=true\" /></p>\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "A spaCy pipeline object for negation.",
"version": "1.0.4",
"project_urls": {
"Homepage": "https://github.com/jenojp/negspacy"
},
"split_keywords": [
"nlp",
"spacy",
"spacy",
"negation"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "6bbdef5354a8900d534dd149186092efcd86c083b6eb0a635112da84a7769aa2",
"md5": "1d868eb632c357ee4486168fce8c4390",
"sha256": "c068bfbbcd28a63969733c3310a40e557a802b8fbb6b79cd724d6261860abb7c"
},
"downloads": -1,
"filename": "negspacy-1.0.4.tar.gz",
"has_sig": false,
"md5_digest": "1d868eb632c357ee4486168fce8c4390",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6.0",
"size": 13743,
"upload_time": "2023-05-20T18:31:30",
"upload_time_iso_8601": "2023-05-20T18:31:30.103623Z",
"url": "https://files.pythonhosted.org/packages/6b/bd/ef5354a8900d534dd149186092efcd86c083b6eb0a635112da84a7769aa2/negspacy-1.0.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-05-20 18:31:30",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "jenojp",
"github_project": "negspacy",
"travis_ci": true,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "negspacy"
}