| Field | Value |
| --- | --- |
| Name | gender-spacy |
| Version | 0.0.5 |
| Summary | A spaCy component for identifying grammatical gender in English texts. |
| Author | WJB Mattingly |
| Upload time | 2024-04-02 02:17:55 |
| License | None recorded |
| Requirements | None recorded |
![gender spacy logo](https://github.com/sidatasciencelab/gender-spacy/raw/main/images/genderspacy-logo.png)
# About
Gender spaCy is a heuristic and machine-learning pipeline that lets users identify gender ethically by relying on gender-specific context in the text. It is designed to sit alongside a standard spaCy pipeline (only English is currently supported). Most of the pipeline is rules-based, relying on titles and pronouns to identify gender as it is presented in the text. **It is important to note that this pipeline does not seek to assign gender to an individual; rather, it identifies an entity's gender as presented within the context of a text.**
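The title-based part of these rules can be pictured as a lookup on the token just before a person's name. The sketch below is a hypothetical illustration in plain Python, not Gender spaCy's own code: the `TITLE_LABELS` table and the `label_from_title` helper are assumptions, and only the PERSON_* label names come from this README.

```python
from typing import List

# Hypothetical honorific table; Gender spaCy's actual title list may differ.
TITLE_LABELS = {
    "mr": "PERSON_MALE",
    "sir": "PERSON_MALE",
    "mrs": "PERSON_FEMALE",
    "ms": "PERSON_FEMALE",
    "miss": "PERSON_FEMALE",
    "mx": "PERSON_NEUTRAL",
}


def label_from_title(tokens: List[str], start: int) -> str:
    """Return a gender label if a known title precedes the name starting at `start`.

    Names with no gendered title (or a neutral one such as "Dr.") stay PERSON_UNKNOWN.
    """
    if start == 0:
        return "PERSON_UNKNOWN"
    # Normalize "Mrs." -> "mrs" before the lookup.
    previous = tokens[start - 1].lower().rstrip(".")
    return TITLE_LABELS.get(previous, "PERSON_UNKNOWN")


tokens = ["Yesterday", "Mrs.", "Dalloway", "bought", "flowers", "."]
print(label_from_title(tokens, 2))  # PERSON_FEMALE
print(label_from_title(["Dr.", "Rivera"], 1))  # PERSON_UNKNOWN
```

A title only ever upgrades an unknown label here; a name without a gendered title is left for the pronoun and coreference rules to resolve.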
Some Python libraries, such as gender-resolver, assign gender based on how first names are statistically used in a given region. This approach is ethically problematic and less reliable than explicit gender-specific context such as titles and pronouns, so this pipeline does not use these libraries. Instead, entities identified as PERSON by the spaCy NER model are relabeled as PERSON_UNKNOWN spans. Next, the pipeline runs the new experimental coreference resolution model from ExplosionAI and examines every cluster of linked (coreferent) tokens. If a cluster contains a PERSON_UNKNOWN span *and* gender-specific pronouns, the entity's label is changed to a gender-specific label, such as PERSON_FEMALE, PERSON_MALE, or PERSON_NEUTRAL. In addition, nouns linked to a specific person receive the tag "REL_MALE/FEMALE_COREF".
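The relabeling step can be sketched without spaCy by representing spans as `(start, end)` token offsets and coreference clusters as lists of such spans, roughly what a coreference model outputs. This is a hypothetical illustration, not the library's implementation: `PRONOUN_LABELS` and `relabel_entities` are assumed names, and the rule that a cluster's pronouns must all agree on one label before relabeling is an added assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical subset of pronouns; the README cites the UWM pronoun guide as its source.
PRONOUN_LABELS = {
    "she": "PERSON_FEMALE", "her": "PERSON_FEMALE", "hers": "PERSON_FEMALE", "herself": "PERSON_FEMALE",
    "he": "PERSON_MALE", "him": "PERSON_MALE", "his": "PERSON_MALE", "himself": "PERSON_MALE",
    "ze": "PERSON_NEUTRAL", "hir": "PERSON_NEUTRAL", "per": "PERSON_NEUTRAL", "xe": "PERSON_NEUTRAL",
}

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def relabel_entities(tokens: List[str], entities: Dict[Span, str],
                     clusters: List[List[Span]]) -> Dict[Span, str]:
    """Upgrade PERSON_UNKNOWN entities whose coreference cluster contains a gendered pronoun."""
    labels = dict(entities)
    for cluster in clusters:
        # Labels implied by single-token pronoun mentions in this cluster.
        implied = {
            PRONOUN_LABELS[tokens[start].lower()]
            for start, end in cluster
            if end - start == 1 and tokens[start].lower() in PRONOUN_LABELS
        }
        # Assumption: only relabel when the cluster's pronouns agree on exactly one label.
        if len(implied) != 1:
            continue
        label = implied.pop()
        for mention in cluster:
            if labels.get(mention) == "PERSON_UNKNOWN":
                labels[mention] = label
    return labels


tokens = "Maya Angelou was a poet . She published seven autobiographies .".split()
entities = {(0, 2): "PERSON_UNKNOWN"}
clusters = [[(0, 2), (6, 7)]]  # "Maya Angelou" ... "She"
print(relabel_entities(tokens, entities, clusters))  # {(0, 2): 'PERSON_FEMALE'}
```

Requiring agreement keeps a cluster with conflicting pronouns (often a coreference error) at PERSON_UNKNOWN instead of guessing.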
In addition, all gendered pronouns (male, female, and gender-neutral) are identified and labeled as spans. Even transformer models can struggle to parse certain gender-neutral pronouns because they are homonyms of other English words. For example, "per" can function as a preposition ("Per our discussion yesterday, I want to go to the store.") or as a gender-neutral pronoun ("Per went to the store yesterday."). With a few extra rules, Gender spaCy corrects the POS tags for these homonyms and places all pronouns in the span ruler.
Users can access all gender span data under `doc.spans["ruler"]`.
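The "per" correction and pronoun labeling can be sketched on `(text, pos)` pairs standing in for spaCy's `token.text` and `token.pos_`. This is a hypothetical rule, not Gender spaCy's actual one: it assumes "per" is a pronoun when a verb or auxiliary follows it, and the PRONOUN_* labels and helper names are invented for illustration.

```python
from typing import List, Tuple

# Hypothetical pronoun subsets; the README cites the UWM pronoun guide as its source.
FEMALE = {"she", "her", "hers", "herself"}
MALE = {"he", "him", "his", "himself"}
NEUTRAL = {"they", "them", "their", "theirs", "ze", "hir", "zir", "per", "pers", "xe", "xem"}

Tagged = List[Tuple[str, str]]  # (token text, coarse POS tag)


def fix_per(tagged: Tagged) -> Tagged:
    """Retag "per" as PRON when a verb or auxiliary follows it ("Per went ...")."""
    fixed = []
    for i, (text, pos) in enumerate(tagged):
        next_pos = tagged[i + 1][1] if i + 1 < len(tagged) else ""
        if text.lower() == "per" and next_pos in {"VERB", "AUX"}:
            pos = "PRON"
        fixed.append((text, pos))
    return fixed


def pronoun_spans(tagged: Tagged) -> List[Tuple[int, str, str]]:
    """Return (index, text, label) for each PRON token found in the pronoun sets."""
    spans = []
    for i, (text, pos) in enumerate(tagged):
        if pos != "PRON":
            continue
        word = text.lower()
        if word in FEMALE:
            spans.append((i, text, "PRONOUN_FEMALE"))
        elif word in MALE:
            spans.append((i, text, "PRONOUN_MALE"))
        elif word in NEUTRAL:
            spans.append((i, text, "PRONOUN_NEUTRAL"))
    return spans


preposition = [("Per", "ADP"), ("our", "PRON"), ("discussion", "NOUN"), (",", "PUNCT"), ("I", "PRON"), ("want", "VERB")]
pronoun = [("Per", "ADP"), ("went", "VERB"), ("to", "ADP"), ("the", "DET"), ("store", "NOUN")]
print(pronoun_spans(fix_per(preposition)))  # []
print(pronoun_spans(fix_per(pronoun)))  # [(0, 'Per', 'PRONOUN_NEUTRAL')]
```

Running the POS fix before collecting spans matters: a mis-tagged "per" would otherwise be skipped by the `PRON` check.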
# Installation
Because this pipeline relies on spaCy's new experimental coreference resolution model, it is best to install Gender spaCy in a fresh environment.
First, create a new environment:
```bash
conda create --name="gender-spacy" python=3.9
```
Next, activate the environment:
```bash
conda activate gender-spacy
```
Then install Gender spaCy:
```bash
pip install gender-spacy
```
Finally, so that the pipeline can perform coreference resolution, install the spaCy experimental coreference resolution model. The command below installs the model wheel published with the spacy-experimental v0.6.0 release:
```bash
pip install https://github.com/explosion/spacy-experimental/releases/download/v0.6.0/en_coreference_web_trf-3.4.0a0-py3-none-any.whl
```
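After these steps, a quick way to confirm the environment is to check that the relevant modules can be found. The sketch below uses only the standard library's `importlib.util.find_spec`. The module list is an assumption: `gender_spacy` comes from the import in the Usage section, `spacy_experimental` provides the coreference component, and spaCy model wheels conventionally install an importable package named after the model. The Usage example also loads `en_core_web_sm`, which spaCy normally installs with `python -m spacy download en_core_web_sm`.

```python
import importlib.util
import sys
from typing import Dict, Sequence

# Assumed import names for the packages installed above.
REQUIRED_MODULES = ("spacy", "spacy_experimental", "gender_spacy", "en_core_web_sm", "en_coreference_web_trf")


def check_environment(modules: Sequence[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Map each top-level module name to whether the current interpreter can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    for name, found in check_environment().items():
        print(f"{name:<25} {'ok' if found else 'MISSING'}")
```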
# Usage
```python
# import the library
from gender_spacy import gender_spacy as gs
# create the GenderParser nlp class.
# It takes one argument: the name of the spaCy model you wish to use
nlp = gs.GenderParser("en_core_web_sm")
# create a text and pass it to the parser via the process_doc() method.
text = """
Maya Angelou was an American memoirist, popular poet, and civil rights activist. She published seven autobiographies, three books of essays, several books of poetry, and is credited with a list of plays, movies, and television shows spanning over 50 years.
Jerome Allen Seinfeld is an American stand-up comedian, actor, writer, and producer. He is best known for playing a semi-fictionalized version of himself in the sitcom Seinfeld (1989–1998), which he created and wrote with Larry David.
"""
doc = nlp.process_doc(text)
# perform coreference resolution on the doc container
# This part of the library comes from spacy-experimental
doc = nlp.coref_resolution()
# Visualize the result:
nlp.visualize()
```
## Expected Result
![result demo](https://github.com/sidatasciencelab/gender-spacy/raw/main/images/demo.JPG)
# CITATIONS
- Source for gender pronouns: https://uwm.edu/lgbtrc/support/gender-pronouns/
- Source for coreference resolution: https://explosion.ai/blog/coref
- Discussion for coreference code: https://github.com/explosion/spaCy/discussions/11585