impfic-core


- Name: impfic-core
- Version: 0.9.0
- Home page: https://github.com/impact-and-fiction/impfic-core
- Summary: Utility functions for the Impact and Fiction project
- Upload time: 2024-03-27 14:51:30
- Author: Marijn Koolen
- Requires Python: <=3.12,>=3.8
- License: MIT

# impfic-core

[![GitHub Actions](https://github.com/impact-and-fiction/impfic-core/workflows/tests/badge.svg)](https://github.com/impact-and-fiction/impfic-core/actions)
[![Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.](https://www.repostatus.org/badges/latest/wip.svg)](https://www.repostatus.org/#wip)
[![PyPI](https://img.shields.io/pypi/v/impfic-core)](https://pypi.org/project/impfic-core/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/impfic-core)](https://pypi.org/project/impfic-core/)

Core code base with utility functions shared across the Impact and Fiction project.

## Installing

```shell
pip install impfic-core
```


## Usage

### Dealing with output from different parsers

The `Doc` class of `impfic-core` offers a unified API to parsed documents from different parsers (currently spaCy and Trankit).

```python
import spacy
from trankit import Pipeline

import impfic_core.parse.doc as parse_doc

spacy_nlp = spacy.load('en_core_web_lg')

trankit_nlp = Pipeline('english')

# First paragraph of Moby Dick, taken from Project Gutenberg (https://www.gutenberg.org/cache/epub/2701/pg2701-images.html)
text = """Call me Ishmael. Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world. It is a way I have of driving off the spleen and regulating the circulation. Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral I meet; and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me from deliberately stepping into the street, and methodically knocking people’s hats off—then, I account it high time to get to sea as soon as I can. This is my substitute for pistol and ball. With a philosophical flourish Cato throws himself upon his sword; I quietly take to the ship. There is nothing surprising in this. If they but knew it, almost all men in their degree, some time or other, cherish very nearly the same feelings towards the ocean with me."""
```

`Doc` objects have the following properties: `text` (the whole text string), `sentences`, `tokens`, `entities` and an optional `metadata` (a dictionary with arbitrary keys and values).

```python
# parse with both SpaCy and Trankit
spacy_doc = spacy_nlp(text)
trankit_doc = trankit_nlp(text)

# First, turn SpaCy document object to an impfic Doc
impfic_doc1 = parse_doc.spacy_json_to_doc(spacy_doc.to_json())

# Next, turn Trankit document object to an impfic Doc
impfic_doc2 = parse_doc.trankit_json_to_doc(trankit_doc)

# Show type and length of impfic_core Doc
# Doc length is number of tokens
print('impfic Doc of SpaCy parse:', type(impfic_doc1), len(impfic_doc1))

print('impfic Doc of Trankit parse:', type(impfic_doc2), len(impfic_doc2))
```

Outputs:
```python
>>> impfic Doc of SpaCy parse: <class 'impfic_core.parse.doc.Doc'> 190
>>> impfic Doc of Trankit parse: <class 'impfic_core.parse.doc.Doc'> 226
```

`Sentence` objects have the following properties:

- `id`: ID of the sentence in the document (running numbers)
- `tokens`: a list of `Token` objects
- `entities`: a list of `Entity` objects (named entities identified by the parser)
- `text`: the sentence as a text string
- `start`: the character offset of the start of the sentence within the document
- `end`: the character offset of the end of the sentence within the document
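
The relation between `text`, `start` and `end` can be sketched without running a parser. The snippet below does not use `impfic-core`: `SentenceSketch` is a hypothetical stand-in that mirrors some of the properties listed above, and it assumes that `end` is exclusive (as in Python slicing) and that IDs start at 0. Neither assumption is confirmed by this README.

```python
from dataclasses import dataclass


@dataclass
class SentenceSketch:
    """Hypothetical stand-in mirroring part of the documented Sentence properties."""
    id: int      # running number within the document (assumed to start at 0)
    text: str    # the sentence as a text string
    start: int   # character offset of the sentence start within the document
    end: int     # character offset of the sentence end (assumed exclusive)


doc_text = "Call me Ishmael. This is my substitute for pistol and ball."
sentence_texts = ["Call me Ishmael.", "This is my substitute for pistol and ball."]

sentences = []
search_from = 0
for sent_id, sent_text in enumerate(sentence_texts):
    # locate each sentence in the document to derive its character offsets
    start = doc_text.index(sent_text, search_from)
    end = start + len(sent_text)
    sentences.append(SentenceSketch(sent_id, sent_text, start, end))
    search_from = end

for sent in sentences:
    # under the exclusive-end assumption, slicing the document recovers the sentence
    assert doc_text[sent.start:sent.end] == sent.text
    print(sent.id, sent.start, sent.end, sent.text)
```

With offsets like these, a sentence can be mapped back to its position in the original document text, for example to align annotations from different parsers.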

### Extracting Clausal Units

```python
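# `pattern` is assumed to be the already-imported impfic-core module
# that provides get_verb_clauses
doc = impfic_doc1  # the impfic Doc built from the SpaCy parse above
sent = doc.sentences[5]
print(sent.text)
# each clause is a list of Token objects
clauses = pattern.get_verb_clauses(sent)
for clause in clauses:
    print('clause:', [t.text for t in clause])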
```

```python
With a philosophical flourish Cato throws himself upon his sword; I quietly take to the ship.
clause: ['With', 'a', 'philosophical', 'flourish', 'Cato', 'throws', 'himself', 'upon', 'his', 'sword', ';', '.']
clause: ['I', 'quietly', 'take', 'to', 'the', 'ship']
```


### External Resources

To use the utilities for external resources such as the RBN, point to your copy of those resources
in the settings (`settings.py`). Once that is done, you can use them as follows:

```python
from settings import rbn_file
from impfic_core.resources.rbn import RBN

rbn = RBN(rbn_file)

rbn.has_term('aanbiddelijk') # returns True
```
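
A `settings.py` for the example above might look like the following sketch. Only the variable name `rbn_file` comes from the import shown above; the path is a hypothetical placeholder, and the expected file format of the RBN copy is not specified here.

```python
# settings.py
# Hypothetical placeholder: replace with the location of your own copy of the RBN.
rbn_file = "/path/to/your/copy/of/rbn"
```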

## Anonymisation

Review anonymisation requires a salt hash in a file called `impfic_core/secrets.py`. The repository does not contain this file, so that others cannot recreate the user ID mapping.
An example file is available as `impfic_core/secrets_example.py`. Copy this file to `impfic_core/secrets.py` and update the salt hash to create your own user ID mapping.
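
Why the salt has to stay private can be illustrated with a short sketch. This is not the `impfic-core` implementation: the function name `anonymise_user_id`, the `SALT` value, the use of SHA-256 and the 16-character truncation are all assumptions made for illustration.

```python
import hashlib

# Hypothetical salt; a real one would live in a private file and never be committed
SALT = "replace-with-your-own-secret-salt"


def anonymise_user_id(user_id: str, salt: str = SALT) -> str:
    """Map a user ID to a stable pseudonym with a salted SHA-256 hash."""
    digest = hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()
    # keep a shortened hex digest as the anonymised ID
    return digest[:16]


# the same user ID and salt always give the same pseudonym
print(anonymise_user_id("reviewer_42") == anonymise_user_id("reviewer_42"))
# a different salt gives a different mapping for the same user ID
print(anonymise_user_id("reviewer_42") == anonymise_user_id("reviewer_42", salt="other-salt"))
```

Anyone who knows the salt can hash a list of candidate user IDs and recover the mapping, which is why the salt is kept out of the repository.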



            
