# turCy
An Open Information Extraction system designed primarily for German.
### Installation
```bash
pip install turcy
```
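The examples below load spaCy's large German model, which is not installed together with the package. A minimal way to fetch it (a sketch, assuming a standard spaCy installation) is:
```python
import spacy

# Download the large German model used in the examples below; this is the
# programmatic equivalent of `python -m spacy download de_core_news_lg`.
spacy.cli.download("de_core_news_lg")
```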
turCy can be applied to other languages as well, but some extra work is necessary, as no patterns for English are shipped. You would therefore have to build your own patterns first.
For building patterns, a `pattern_builder` module is available.
## How it works
![img_3.png](img_3.png)
### 1. Building a Pattern
![img_2.png](img_2.png)
![img_1.png](img_1.png)
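The figures above sketch how a pattern is built with the `pattern_builder` module. As a rough illustration of the general idea only (not turCy's own pattern format), the hedged sketch below uses spaCy's `DependencyMatcher` to encode a dependency-tree pattern for copular sentences such as the example sentence used throughout this README; the labels (`sb`, `pd`, `nk`) are spaCy's TIGER-based German dependency labels, and the exact parse of the sentence is an assumption.
```python
import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.load("de_core_news_lg")
matcher = DependencyMatcher(nlp.vocab)

# Hypothetical tree pattern for copular sentences like "X ist eine Stadt in Y":
# anchor on the copula, then require a subject, a predicate noun, and a
# prepositional phrase below the predicate (labels assume the TIGER scheme).
pattern = [
    {"RIGHT_ID": "cop", "RIGHT_ATTRS": {"LEMMA": "sein"}},      # "ist"
    {"LEFT_ID": "cop", "REL_OP": ">", "RIGHT_ID": "subj",
     "RIGHT_ATTRS": {"DEP": "sb"}},                              # "Nürnberg"
    {"LEFT_ID": "cop", "REL_OP": ">", "RIGHT_ID": "pred",
     "RIGHT_ATTRS": {"DEP": "pd"}},                              # "Stadt"
    {"LEFT_ID": "pred", "REL_OP": ">", "RIGHT_ID": "prep",
     "RIGHT_ATTRS": {"POS": "ADP"}},                             # "in"
    {"LEFT_ID": "prep", "REL_OP": ">", "RIGHT_ID": "obj",
     "RIGHT_ATTRS": {"DEP": "nk"}},                              # "Deutschland"
]
matcher.add("COPULA_TRIPLE", [pattern])

doc = nlp("Nürnberg ist eine Stadt in Deutschland.")
for _, token_ids in matcher(doc):
    cop, subj, pred, prep, obj = (doc[i] for i in token_ids)
    print((subj.text, pred.text, obj.text))  # expected: ('Nürnberg', 'Stadt', 'Deutschland')
```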
### 2. Extraction
1. Load the German language model from spaCy.
2. Add turCy to the nlp pipeline.
3. Pass the document through the pipeline.
4. Iterate over the sentences in the document and access the triples attached to each sentence.
```python
import spacy
import turcy

def example():
    nlp = spacy.load("de_core_news_lg", exclude=["ner"])
    nlp.max_length = 2096700
    turcy.add_to_pipe(nlp)  # attach the triple-extraction component with the shipped patterns
    pipeline_params = {"attach_triple2sentence": {"pattern_list": "small"}}  # or "large"
    doc = nlp("Nürnberg ist eine Stadt in Deutschland.", component_cfg=pipeline_params)
    for sent in doc.sents:
        print(sent)
        for triple in sent._.triples:
            (subj, pred, obj) = triple["triple"]
            print(f"subject: '{subj}', predicate: '{pred}' and object: '{obj}'")
```
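For the example sentence, this yields the triple `(Nürnberg, Stadt, Deutschland)`.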
### 3. Results
![img_5.png](img_5.png)
![img_6.png](img_6.png)