turcy

- Name: turcy
- Version: 0.0.42
- Home page: https://github.com/ChrisChross/turCy
- Summary: A package for German Open Information Extraction
- Upload time: 2023-03-11 10:29:00
- Author: Christian Klose
- Requires Python: >=3.6
- Keywords: openie turcy information extraction spacy
- Requirements: none recorded
# turCy

An Open Information Extraction system designed mainly for German.

### Installation
```bash
pip install turcy
```

### Usage
```python
import spacy
import turcy

nlp = spacy.load("de_core_news_lg")
turcy.add_to_pipe(nlp)
pattern_list = "small"  # or "large"
pipeline_params = {"attach_triple2sentence": {"pattern_list": pattern_list}}
doc = nlp("Nürnberg ist eine Stadt in Deutschland.", component_cfg=pipeline_params)
for sent in doc.sents:
    for triple in sent._.triples:
        (subj, pred, obj) = triple["triple"]
# Out: (Nürnberg, Stadt, Deutschland)
```

turCy can be applied to other languages as well, but this requires some extra work:
no patterns for English are shipped, so you would have to build your own patterns first.
A `pattern_builder` module is available for building patterns.

## How it works 

![img_3.png](img_3.png)

### 1. Building a Pattern 


![img_2.png](img_2.png)

![img_1.png](img_1.png)
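
To make the idea of a pattern more concrete, here is a minimal, self-contained sketch of pattern-based triple extraction over a dependency parse. It does not use turCy or its `pattern_builder` module, and it does not reflect turCy's actual pattern format. The `Token` class, the `PATTERN` dictionary, the helper functions, and the hand-written parse with TIGER-style labels (`sb`, `pd`, `mnr`, `nk`) are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Token:
    """One token of a hand-written, simplified dependency parse."""
    idx: int
    text: str
    dep: str   # label of the arc from this token to its head
    head: int  # index of the head token; the root points to itself


# Hand-annotated parse of the example sentence (illustrative, not parser output).
SENTENCE: List[Token] = [
    Token(0, "Nürnberg", "sb", 1),
    Token(1, "ist", "ROOT", 1),
    Token(2, "eine", "nk", 3),
    Token(3, "Stadt", "pd", 1),
    Token(4, "in", "mnr", 3),
    Token(5, "Deutschland", "nk", 4),
]

# Hypothetical pattern: for each slot, the chain of dependency labels that
# leads from the root of the sentence down to the token filling that slot.
PATTERN: Dict[str, List[str]] = {
    "subj": ["sb"],
    "pred": ["pd"],
    "obj": ["pd", "mnr", "nk"],
}


def follow_path(tokens: List[Token], start: Token, labels: List[str]) -> Optional[Token]:
    """Walk down from `start`, taking the first child with each label in turn."""
    current = start
    for label in labels:
        children = [
            t for t in tokens
            if t.head == current.idx and t.idx != current.idx and t.dep == label
        ]
        if not children:
            return None  # the pattern does not match this sentence
        current = children[0]
    return current


def extract_triple(
    tokens: List[Token], pattern: Dict[str, List[str]]
) -> Optional[Tuple[str, str, str]]:
    """Return (subject, predicate, object) if every slot of the pattern matches."""
    root = next(t for t in tokens if t.dep == "ROOT")
    slots = {name: follow_path(tokens, root, path) for name, path in pattern.items()}
    if any(tok is None for tok in slots.values()):
        return None
    return (slots["subj"].text, slots["pred"].text, slots["obj"].text)


print(extract_triple(SENTENCE, PATTERN))
```

A pattern that does not fit the sentence structure simply yields `None`, which is why a pattern list written for German parses cannot be expected to transfer to other languages unchanged.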


### 2. Extraction 

1. Load the German language model from spaCy.
2. Add turCy to the `nlp` pipeline.
3. Pass the document to the pipeline.
4. Iterate over the sentences in the document and access the triples attached to each sentence.

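```python
import spacy
import turcy


def example():
    # Load the German model without its NER component.
    nlp = spacy.load("de_core_news_lg", exclude=["ner"])
    # Raise spaCy's limit on input length (in characters) for long documents.
    nlp.max_length = 2096700
    turcy.add_to_pipe(nlp)  # apply/use current patterns in list
    # Configure the "attach_triple2sentence" component to use the "small" pattern list.
    pipeline_params = {"attach_triple2sentence": {"pattern_list": "small"}}
    doc = nlp("Nürnberg ist eine Stadt in Deutschland.", component_cfg=pipeline_params)
    for sent in doc.sents:
        print(sent)
        for triple in sent._.triples:
            (subj, pred, obj) = triple["triple"]
            print(f"subject:'{subj}', predicate:'{pred}' and object: '{obj}'")


if __name__ == "__main__":
    example()
```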
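
If you want to collect the extracted triples instead of printing them, a small helper like the following may be useful. This is only a sketch: `triples_to_rows` and `rows_to_tsv` are hypothetical helpers, not part of turCy, and the code assumes each entry is a dictionary whose `"triple"` key holds a `(subj, pred, obj)` sequence, as in the example above. The elements are converted with `str()` because their exact type is not stated here. The demo uses plain strings as a stand-in for `sent._.triples`, so it runs without spaCy or turCy installed.

```python
from typing import Any, Dict, Iterable, List, Tuple


def triples_to_rows(triples: Iterable[Dict[str, Any]]) -> List[Tuple[str, str, str]]:
    """Convert triple dictionaries into plain (subject, predicate, object) string tuples."""
    rows = []
    for entry in triples:
        subj, pred, obj = entry["triple"]
        rows.append((str(subj), str(pred), str(obj)))
    return rows


def rows_to_tsv(rows: Iterable[Tuple[str, str, str]]) -> str:
    """Render the rows as tab-separated text with a header line."""
    lines = ["subject\tpredicate\tobject"]
    lines.extend("\t".join(row) for row in rows)
    return "\n".join(lines)


# Stand-in for `sent._.triples`; real entries would come from the pipeline.
demo_triples = [{"triple": ("Nürnberg", "Stadt", "Deutschland")}]
print(rows_to_tsv(triples_to_rows(demo_triples)))
```

Inside the loop over `doc.sents`, you could call `triples_to_rows(sent._.triples)` and extend one list per document, provided the triples have the shape assumed above.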


### 3. Results 

![img_5.png](img_5.png)

![img_6.png](img_6.png)


            
