# turCy
An Open Information Extraction system designed primarily for German.
### Installation
```bash
pip install turcy
```
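The examples below load spaCy's large German model, which is not installed together with the package. A minimal way to fetch it (a sketch, assuming a standard spaCy installation) is:
```python
import spacy

# Download the large German model used in the examples below; this is the
# programmatic equivalent of `python -m spacy download de_core_news_lg`.
spacy.cli.download("de_core_news_lg")
```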
turCy can be applied to other languages as well, but some extra work is necessary, as no patterns for English are shipped. You would therefore have to build your own patterns first.
For building patterns, a `pattern_builder` module is available.
## How it works
![img_3.png](img_3.png)
### 1. Building a Pattern
![img_2.png](img_2.png)
![img_1.png](img_1.png)
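The figures above sketch how a pattern is built with the `pattern_builder` module. As a rough illustration of the general idea only (not turCy's own pattern format), the hedged sketch below uses spaCy's `DependencyMatcher` to encode a dependency-tree pattern for copular sentences such as the example sentence used throughout this README; the labels (`sb`, `pd`, `nk`) are spaCy's TIGER-based German dependency labels, and the exact parse of the sentence is an assumption.
```python
import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.load("de_core_news_lg")
matcher = DependencyMatcher(nlp.vocab)

# Hypothetical tree pattern for copular sentences like "X ist eine Stadt in Y":
# anchor on the copula, then require a subject, a predicate noun, and a
# prepositional phrase below the predicate (labels assume the TIGER scheme).
pattern = [
    {"RIGHT_ID": "cop", "RIGHT_ATTRS": {"LEMMA": "sein"}},      # "ist"
    {"LEFT_ID": "cop", "REL_OP": ">", "RIGHT_ID": "subj",
     "RIGHT_ATTRS": {"DEP": "sb"}},                              # "Nürnberg"
    {"LEFT_ID": "cop", "REL_OP": ">", "RIGHT_ID": "pred",
     "RIGHT_ATTRS": {"DEP": "pd"}},                              # "Stadt"
    {"LEFT_ID": "pred", "REL_OP": ">", "RIGHT_ID": "prep",
     "RIGHT_ATTRS": {"POS": "ADP"}},                             # "in"
    {"LEFT_ID": "prep", "REL_OP": ">", "RIGHT_ID": "obj",
     "RIGHT_ATTRS": {"DEP": "nk"}},                              # "Deutschland"
]
matcher.add("COPULA_TRIPLE", [pattern])

doc = nlp("Nürnberg ist eine Stadt in Deutschland.")
for _, token_ids in matcher(doc):
    cop, subj, pred, prep, obj = (doc[i] for i in token_ids)
    print((subj.text, pred.text, obj.text))  # expected: ('Nürnberg', 'Stadt', 'Deutschland')
```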
### 2. Extraction
1. Load the German language model from spaCy.
2. Add turCy to the nlp pipeline.
3. Pass the document through the pipeline.
4. Iterate over the sentences in the document and access the triples attached to each sentence.
```python
import spacy
import turcy

def example():
    nlp = spacy.load("de_core_news_lg", exclude=["ner"])
    nlp.max_length = 2096700
    turcy.add_to_pipe(nlp)  # attach the triple-extraction component with the shipped patterns
    pipeline_params = {"attach_triple2sentence": {"pattern_list": "small"}}  # or "large"
    doc = nlp("Nürnberg ist eine Stadt in Deutschland.", component_cfg=pipeline_params)
    for sent in doc.sents:
        print(sent)
        for triple in sent._.triples:
            (subj, pred, obj) = triple["triple"]
            print(f"subject: '{subj}', predicate: '{pred}' and object: '{obj}'")
```
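For the example sentence, this yields the triple `(Nürnberg, Stadt, Deutschland)`.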
### 3. Results
![img_5.png](img_5.png)
![img_6.png](img_6.png)