posextract


Name: posextract
Version: 1.2.3
download:
home_page:
Summary: Grammatical information extraction methods designed for the analysis of historical and contemporary textual corpora.
upload_time: 2023-05-26 20:09:00
maintainer:
docs_url: None
author:
requires_python: >=3.6
license: MIT License Copyright (c) 2021 Steph Buongiorno and Alexander Cerpa Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
keywords: triples, svo, action, agency
VCS:
bugtrack_url:
requirements: No requirements were recorded.
Travis-CI: No Travis.
coveralls test coverage: No coveralls.
# posextract
posextract offers grammatical information extraction methods designed for the analysis of historical and contemporary textual corpora. It traverses the syntactic dependency relations between parts of speech and returns sequences of words that share a grammatical relationship. See [our article]() for more. You can also [install posextract from PyPI with pip](https://pypi.org/project/posextract/).
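
To make the idea of traversing dependency relations concrete, here is a minimal sketch that works on a hand-annotated parse rather than a real parser. The `Token` class, the `svo_triples` function, and the hard-coded labels (`nsubj`, `dobj`, `aux`, written in the style of spaCy's English dependency labels) are assumptions for illustration; this is not posextract's implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    text: str
    dep: str   # dependency label linking this token to its head
    head: int  # index of the head token (the root points to itself)


def svo_triples(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Pair every subject with every direct object attached to the same verb."""
    triples = []
    for verb_idx, verb in enumerate(tokens):
        # Collect the tokens whose head is the current token.
        children = [t for i, t in enumerate(tokens) if t.head == verb_idx and i != verb_idx]
        subjects = [t.text for t in children if t.dep == "nsubj"]
        objects = [t.text for t in children if t.dep == "dobj"]
        triples.extend((s, verb.text, o) for s in subjects for o in objects)
    return triples


# Hand-annotated parse of "Landlords may exercise oppression."
sentence = [
    Token("Landlords", "nsubj", 2),
    Token("may", "aux", 2),
    Token("exercise", "ROOT", 2),
    Token("oppression", "dobj", 2),
    Token(".", "punct", 2),
]

print(svo_triples(sentence))
# [('Landlords', 'exercise', 'oppression')]
```

The auxiliary "may" is attached to the verb but carries neither a subject nor an object label, so this sketch skips it.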

## Usage

- `extract_triples` to extract subject-verb-object (SVO) and subject-verb-adjective complement (SVA) triples
- `extract_adj_noun_pairs` to extract adjective-noun pairs
- `extract_subj_verb_pairs` to extract subject-verb pairs

Required Parameters:

- `input` the name of a csv file or an input string
- `output` the name of the output file

Optional Parameters:

- `--data_column` specify the column to extract triples from.
- `--id_column` specify a unique ID field if a csv file is given.
- `--file-delimiter` specify comma, pipe, or tab. Default is comma.
- `--post-combine-adj` combine triples (adjective predicate with object).
- `--add-auxiliary` extract future and past tense triples.
- `--prep-phrase` extract prepositional phrases. Default is false.
- `--no-compound-noun` extract just the head of the subject or object (e.g. "Indian Government" is extracted as just "Government").
- `--lemma` lemmatize the extracted parts of speech. Default is non-lemmatized.
- `--verbose` print verbose output.

### Examples

#### Interactive: 

Extract grammatical triples.

```
from posextract import grammatical_triples

triples = grammatical_triples.extract(['Landlords may exercise oppression.', 'The soldiers were ill.'])

for triple in triples:
    print(triple)

# Output: Landlords exercise oppression, soldiers were ill
```

Extract grammatical triples with non-default options:

```
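from posextract import grammatical_triples
from posextract.util import TripleExtractorOptions

sent = ['Landlords may exercise oppression.']  # example input
# prep_phrase=True turns on the prepositional phrase option (off by default)
triples = grammatical_triples.extract(sent, TripleExtractorOptions(prep_phrase=True))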
```

Or extract adjectives and the nouns they modify. 

```
from posextract import adj_noun_pairs

adj_noun = adj_noun_pairs.extract()
```

Or extract subjects and their verbs. 

```
from posextract import subj_verb_pairs

subj_verb = subj_verb_pairs.extract()
```

#### Over CLI: 

posextract can extract grammatical triples from text: 

```
python -m posextract.extract_triples "Landlords may exercise oppression." output.csv

# Output: Landlords exercise oppression
```

posextract can extract SVO/SVA relationships separately, or it can combine the adjective as part of an SVO triple with `--post-combine-adj`:

```
python -m posextract.extract_triples "The soldiers were terminally ill." output.csv

# Output: soldiers were terminally, soldiers were ill
```

```
python -m posextract.extract_triples "The soldiers were terminally ill." output.csv --post-combine-adj

# Output: soldiers were terminally ill
```
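
One way to picture the combining step is as a merge of triples that share a subject and verb. The sketch below only illustrates that idea on plain tuples; `combine_triples` is a hypothetical helper, not a posextract function, and the library's actual rule may differ.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def combine_triples(triples: List[Triple]) -> List[Triple]:
    """Merge triples sharing (subject, verb) by joining their last elements in order."""
    grouped: Dict[Tuple[str, str], List[str]] = {}
    for subj, verb, rest in triples:
        grouped.setdefault((subj, verb), []).append(rest)
    return [(s, v, " ".join(parts)) for (s, v), parts in grouped.items()]


separate = [("soldiers", "were", "terminally"), ("soldiers", "were", "ill")]
print(combine_triples(separate))
# [('soldiers', 'were', 'terminally ill')]
```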

If the input is a .csv file, specify the data column and the ID column:

```
python -m posextract.extract_triples --data_column sentence --id_column sentence_id input.csv output.csv
```
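
As a sketch of what a compatible input file might look like, the snippet below writes a small CSV with the `sentence_id` and `sentence` columns named in the command above. The file name `input.csv` matches that command; the rows are made-up sample data, and the comma delimiter is the documented default.

```python
import csv

rows = [
    {"sentence_id": 1, "sentence": "Landlords may exercise oppression."},
    {"sentence_id": 2, "sentence": "The soldiers were terminally ill."},
]

# newline="" lets the csv module control line endings itself.
with open("input.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["sentence_id", "sentence"])
    writer.writeheader()
    writer.writerows(rows)
```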

## For More Information...
... see our Wiki: 
- [About Our Evaluation Data](https://github.com/stephbuon/posextract/wiki/Evaluation-Data-Sets)
- [About the Syntactic Dependency Parser](https://github.com/stephbuon/posextract/wiki/Our-Application-of-spaCy-NLP)
- [How to Use posextract on Databricks](https://github.com/stephbuon/posextract/wiki/Using-posextract-on-Databricks)

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "posextract",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "triples,svo,action,agency",
    "author": "",
    "author_email": "Steph Buongiorno <steph.buon@gmail.com>, Alexander Cerpa <acerpa@smu.edu>",
    "download_url": "https://files.pythonhosted.org/packages/bb/8d/94accbd509c2dca90b15d8a103cc8f0ca9c4f0f9d6cc5f5b6991bbbbfe88/posextract-1.2.3.tar.gz",
    "platform": null,
    "description": "# posextract\nposextract offers grammatical information extraction methods designed for the analysis of historical and contemporary textual corpora. It traverses the syntactic dependency relations between parts-of-speech and returns sequences of words that share a grammatical relationship. See [our article]() for more. You can also [download posextract for pypi with pip](https://pypi.org/project/posextract/). \n\n## Usage\n\n- `extract_triples` to extract subject-verb-object (SVO) and subject-verb-adjective complement (SVA) triples\n- `extract_adj_noun_pairs` to extract adjective-noun pairs\n- `extract_subj_verb_pairs` to extract subject-verb pairs\n\nRequired Paramters: \n\n- `input` can be the name of a csv file or an input string\n- `output` name of the output file\n\nOptional Paramters: \n- `--data_column` specify the column to extract triples from.\n- `--id_column` specify a unique ID field if csv file is given.\n- `--file-delimiter` specify comma, pipe, or tab. Default is comma. \n- `--post-combine-adj` combine triples (adjective predicate with object) \n- `--add-auxiliary` extract future and past tense triples. \n- `--prep-phrase` extract the . Default set to false. \n- `--no-compound-noun` Extract just the subject or object (e.g. \"Indian Government\" is extracted as just \"Government\").\n- `--lemma` specify whether to lemmatize parts-of-speech. Default is non-lemmatized. 
\n- `--verbose` print\n\n### Examples\n\n#### Interactive: \n\nExtract grammatical triples.\n\n```\nfrom posextract import grammatical_triples\n\ntriples = grammatical_triples.extract(['Landlords may exercise oppression.', 'The soldiers were ill.'])\n\nfor triple in triples:\n    print(triple)\n\n# Output: Landlords exercise oppression, soldiers were ill\n```\n\nExtract grammatical triples using different options from default: \n\n```\nfrom posextract.util import TripleExtractorOptions\n\ntriples = grammatical_triples.extract(sent, TripleExtractorOptions(prep_phrase = True))\n```\n\nOr extract adjectives and the nouns they modify. \n\n```\nfrom posextract import adj_noun_pairs\n\nadj_noun = adj_noun_pairs.extract()\n```\n\nOr extract subjects and their verbs. \n\n```\nfrom posextract import subj_verb_pairs\n\nsubj_verb = subj_verb_pairs.extract()\n```\n\n#### Over CLI: \n\nposextract can extract grammatical triples from text: \n\n```\npython -m posextract.extract_triples \"Landlords may exercise oppression.\" output.csv\n\n# Output: Landlords exercise oppression\n```\n\nposextract can extract SVO/SVA relationships separately or it can combine the adjective as part of a SVO triple:\n\n```\npython -m posextract.extract_triples \"The soldiers were terminally ill.\" output.csv --post-combine-adj\n\n# Output: soldiers were terminally, soldiers were ill \n```\n\n```\npython -m posextract.extract_triples \"The soldiers were terminally ill.\" output.csv --post-combine-adj\n\n# Output: soldiers were terminally ill\n```\n\nIf provided a .csv file: \n\n```\npython -m posextract.extract_triples --data_column sentence --id_column sentence_id input.csv output.csv\n```\n\n## For More Information...\n... 
see our Wiki: \n- [About Our Evaluation Data](https://github.com/stephbuon/posextract/wiki/Evaluation-Data-Sets)\n- [About the Syntactic Dependency Parser](https://github.com/stephbuon/posextract/wiki/Our-Application-of-spaCy-NLP)\n- [How to Use posextract on Databricks](https://github.com/stephbuon/posextract/wiki/Using-posextract-on-Databricks)\n",
    "bugtrack_url": null,
    "license": "MIT License  Copyright (c) 2021 Steph Buongiorno and Alexander Cerpa  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
    "summary": "Grammatical information extraction methods designed for the analysis of historical and contemporary textual corpora.",
    "version": "1.2.3",
    "project_urls": {
        "documentation": "https://github.com/stephbuon/posextract",
        "homepage": "https://github.com/stephbuon/posextract",
        "repository": "https://github.com/stephbuon/posextract"
    },
    "split_keywords": [
        "triples",
        "svo",
        "action",
        "agency"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "50dcc05708e6398efc456caf4a872df084c59eac3fc66d053d6f08aac2974df8",
                "md5": "dc461540644f51ce448afb4c76ffec9b",
                "sha256": "8cf42f61c3edfe2f711b3b59ce45baaa818dd1530574e08d893ddae080057dc2"
            },
            "downloads": -1,
            "filename": "posextract-1.2.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "dc461540644f51ce448afb4c76ffec9b",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 19121,
            "upload_time": "2023-05-26T20:08:57",
            "upload_time_iso_8601": "2023-05-26T20:08:57.829650Z",
            "url": "https://files.pythonhosted.org/packages/50/dc/c05708e6398efc456caf4a872df084c59eac3fc66d053d6f08aac2974df8/posextract-1.2.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "bb8d94accbd509c2dca90b15d8a103cc8f0ca9c4f0f9d6cc5f5b6991bbbbfe88",
                "md5": "4c0779817c764003a18f60f7d5157885",
                "sha256": "59a2e4bdef272be81d524a24003c2b6d66a1fd561b3f2f4df0b017b0676fbaf4"
            },
            "downloads": -1,
            "filename": "posextract-1.2.3.tar.gz",
            "has_sig": false,
            "md5_digest": "4c0779817c764003a18f60f7d5157885",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 15133,
            "upload_time": "2023-05-26T20:09:00",
            "upload_time_iso_8601": "2023-05-26T20:09:00.247418Z",
            "url": "https://files.pythonhosted.org/packages/bb/8d/94accbd509c2dca90b15d8a103cc8f0ca9c4f0f9d6cc5f5b6991bbbbfe88/posextract-1.2.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-05-26 20:09:00",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "stephbuon",
    "github_project": "posextract",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "posextract"
}
        