sayswho


Name: sayswho
Version: 0.1.2
Home page: https://github.com/afriedman412/sayswho
Summary: Quote identification, attribution and resolution.
Upload time: 2023-07-16 00:46:15
Author: Andy Friedman
Requires Python: >=3.9,<4.0
License: MIT
Keywords: nlp, natural-language-processing, spacy
Requirements: No requirements were recorded.

# SaysWho
**SaysWho** is a Python package for identifying and attributing quotes in text. It uses a combination of logic and grammar to find quotes and their speakers, then uses a [coreferencing model](https://explosion.ai/blog/coref) to better clarify who is speaking. It's built on [Textacy](https://textacy.readthedocs.io/en/latest/) and [SpaCy](https://spacy.io/).

## Notes
- Coreferencing is an experimental feature not fully integrated into SpaCy, and the current pipeline is built on SpaCy 3.4. I haven't had any problems using it with SpaCy 3.5+, but it takes some finesse to navigate the different versions.

- SaysWho grew out of a larger project for analyzing newspaper articles from Lexis between ~250 and ~2000 words, and it is optimized to navigate the syntax and common errors particular to that text.

- The output of this version is kind of open-ended, and possibly not as useful as it could be. HTML viz is coming, but I'm open to any suggestions about how this could be more useful!

## Installation
Install and update using [pip](https://pip.pypa.io/en/stable/):

```
$ pip install sayswho
```

You will probably also need to run the following to work around some versioning issues (see [Notes](#notes)):

```
$ pip install https://github.com/explosion/spacy-experimental/releases/download/v0.6.0/en_coreference_web_trf-3.4.0a0-py3-none-any.whl
$ pip install spacy -U
$ spacy download en_core_web_lg
```
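Because the steps above mix a pinned experimental model with an upgraded SpaCy, it can help to confirm what actually ended up in the environment. The following is only a sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is made up for illustration, and the distribution names are assumptions taken from the install commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names assumed from the install commands above.
for name in ("sayswho", "spacy", "en_core_web_lg", "en_coreference_web_trf"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If any line reports `not installed`, rerunning the matching `pip install` or `spacy download` command is the first thing to try.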

## A Simple Example

##### Sample text adapted from [here](https://sports.yahoo.com/nets-jacque-vaughn-looking-forward-150705556.html):
> Nets Coach Jacque Vaughn was optimistic when discussing Ben Simmons's prospects on NBA TV.
> 
> “It’s been great, being able to check in with Ben," Vaughn said, via Nets Daily. “I look forward to coaching a healthy Ben Simmons. The team is excited to have him healthy, being part of our program and moving forward.
> 
> "He has an innate ability to impact the basketball game on both ends of the floor. So, we missed that in the Philly series and looking forward to it.”
> 
> Simmons arrived in Brooklyn during the 2021-22 season, but did not play that year after a back injury. The 26-year-old would make 42 appearances (33 starts) during a tumult-filled season for Brooklyn.
> 
> “He is on the court. No setbacks," Vaughn later told reporters about Simmons' workouts. “We’ll continue to see him improve through the offseason.”


#### Instantiate `SaysWho` and run `.attribute` on target text.

```python
from sayswho import SaysWho

# `text` is the sample passage above, held in a single Python string.
sw = SaysWho(text)
```


#### See speaker, cue and content of every quote with `.quotes`.


```python
print(sw.quotes)
```

```
[DQTriple(speaker=[Vaughn], cue=[said], content=“It’s been great, being able to check in with Ben,"),
 DQTriple(speaker=[Vaughn], cue=[said], content=“I look forward to coaching a healthy Ben Simmons. The team is excited to have him healthy, being part of our program and moving forward."),
 DQTriple(speaker=[Vaughn], cue=[told], content=“He is on the court. No setbacks,"),
 DQTriple(speaker=[Vaughn], cue=[told], content=“We’ll continue to see him improve through the offseason.”)]
```
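Each item in the list exposes `speaker`, `cue` and `content` fields, so the quotes can be regrouped however is convenient. As a rough sketch, the snippet below groups quote text by speaker. It uses a stand-in `DQTriple` namedtuple holding plain strings so that it runs without loading a model, and the helper `quotes_by_speaker` is a hypothetical name, not part of SaysWho. It assumes that calling `str()` on the real tokens and spans yields their text.

```python
from collections import defaultdict, namedtuple

# Stand-in for the DQTriple objects shown above, with plain strings in place
# of spaCy tokens and spans, so the sketch runs without loading a model.
DQTriple = namedtuple("DQTriple", ["speaker", "cue", "content"])


def quotes_by_speaker(quotes):
    """Map each speaker's text to the list of quote contents attributed to them."""
    grouped = defaultdict(list)
    for quote in quotes:
        # speaker is a list of tokens; join their text to build a dictionary key.
        speaker = " ".join(str(token) for token in quote.speaker)
        grouped[speaker].append(str(quote.content))
    return dict(grouped)


sample = [
    DQTriple(["Vaughn"], ["said"], "It's been great, being able to check in with Ben,"),
    DQTriple(["Vaughn"], ["told"], "He is on the court. No setbacks,"),
]
for speaker, contents in quotes_by_speaker(sample).items():
    print(speaker, len(contents))
```

With a real run, `quotes_by_speaker(sw.quotes)` would be the equivalent call, under the same assumptions.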



#### See resolved entity clusters with `.clusters`.


```python
print(sw.clusters)
```

```
[[Ben Simmons's,
  Ben,
  a healthy Ben Simmons,
  him,
  He,
  Simmons,
  The 26-year-old,
  He,
  Simmons',
  him],
 [Nets Coach Jacque Vaughn, Vaughn, I, Vaughn],
 [Nets, The team, our, we],
 [an innate ability to impact the basketball game on both ends of the floor,
  that,
  it],
 [the 2021-22 season, that year],
 [Brooklyn, Brooklyn, We]]
```



#### Use `.print_clusters()` to see the unique text in each cluster in a more readable form.


```python
sw.print_clusters()
```
```
0 {'Ben', 'He', 'The 26-year-old', 'a healthy Ben Simmons', "Simmons'", "Ben Simmons's", 'Simmons', 'him'}
1 {'I', 'Nets Coach Jacque Vaughn', 'Vaughn'}
2 {'The team', 'our', 'we', 'Nets'}
3 {'it', 'an innate ability to impact the basketball game on both ends of the floor', 'that'}
4 {'that year', 'the 2021-22 season'}
5 {'Brooklyn', 'We'}
```
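The sets printed above do not keep the mentions in the order they appear in the text. If that order matters, a small helper can deduplicate while preserving it. This is a sketch only, not how `.print_clusters()` is implemented; `unique_mentions` is a hypothetical name, and the cluster is written as plain strings standing in for spaCy spans.

```python
def unique_mentions(cluster):
    """Return the distinct mention texts in a cluster, in first-seen order."""
    # dict keys keep insertion order, so this drops repeats without reordering.
    return list(dict.fromkeys(str(mention) for mention in cluster))


cluster = ["Nets Coach Jacque Vaughn", "Vaughn", "I", "Vaughn"]
print(unique_mentions(cluster))
```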


#### Quote/cluster matches are saved to `.quote_matches` as `namedtuples`.


```python
for qm in sw.quote_matches:
    print(qm)
```
```
QuoteClusterMatch(quote_index=0, cluster_index=1)
QuoteClusterMatch(quote_index=1, cluster_index=1)
QuoteClusterMatch(quote_index=2, cluster_index=1)
QuoteClusterMatch(quote_index=3, cluster_index=1)
```
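Since each match is just a pair of indices, it can be combined with the quote and cluster lists to produce a readable speaker name per quote. The sketch below picks the longest mention in the matched cluster as a rough canonical name. That heuristic is an assumption made for illustration, not something SaysWho is known to do, and the helper names and the string-only stand-in data are likewise hypothetical.

```python
from collections import namedtuple

# Stand-in with the same field names as the matches printed above.
QuoteClusterMatch = namedtuple("QuoteClusterMatch", ["quote_index", "cluster_index"])


def longest_mention(cluster):
    """Pick the longest mention text as a rough canonical name for a cluster."""
    return max((str(mention) for mention in cluster), key=len)


def speakers_for_quotes(matches, clusters):
    """Map each quote index to a canonical name from its matched cluster."""
    return {m.quote_index: longest_mention(clusters[m.cluster_index]) for m in matches}


clusters = [
    ["Ben Simmons's", "Ben", "him"],
    ["Nets Coach Jacque Vaughn", "Vaughn", "I", "Vaughn"],
]
matches = [QuoteClusterMatch(0, 1), QuoteClusterMatch(1, 1)]
print(speakers_for_quotes(matches, clusters))
```

With real output, the equivalent call would be `speakers_for_quotes(sw.quote_matches, sw.clusters)`, assuming the cluster members convert to their text with `str()`.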


#### Use `.expand_match()` to view and interpret quote/cluster matches.


```python
sw.expand_match()
```
```
QUOTE : 0
 DQTriple(speaker=[Vaughn], cue=[said], content=“It’s been great, being able to check in with Ben,") 

CLUSTER : 1
 ['Nets Coach Jacque Vaughn', 'Vaughn'] 

QUOTE : 1
 DQTriple(speaker=[Vaughn], cue=[said], content=“I look forward to coaching a healthy Ben Simmons. The team is excited to have him healthy, being part of our program and moving forward.") 

CLUSTER : 1
 ['Nets Coach Jacque Vaughn', 'Vaughn'] 

QUOTE : 2
 DQTriple(speaker=[Vaughn], cue=[told], content=“He is on the court. No setbacks,") 

CLUSTER : 1
 ['Nets Coach Jacque Vaughn', 'Vaughn'] 

QUOTE : 3
 DQTriple(speaker=[Vaughn], cue=[told], content=“We’ll continue to see him improve through the offseason.”) 

CLUSTER : 1
 ['Nets Coach Jacque Vaughn', 'Vaughn'] 
```

    



            
