# Extr
> Named Entity Recognition (NER) and Relation Extraction (RE) library using Regular Expressions
<br />
## Install
```
pip install extr
```
## Example
```python
text = 'Ted is a Pitcher.'
```
### 1. Entity Extraction
> Find named entities in text.
```python
import re

from extr import RegEx, RegExLabel
from extr.entities import EntityExtractor

## each label maps to one or more regular expressions
entity_extractor = EntityExtractor([
    RegExLabel('PERSON', [
        RegEx([r'ted'], re.IGNORECASE)
    ]),
    RegExLabel('POSITION', [
        RegEx([r'pitcher'], re.IGNORECASE)
    ]),
])

entities = entity_extractor.get_entities(text)
## entities == [
## <Entity label="POSITION" text="Pitcher" span=(9, 16)>,
## <Entity label="PERSON" text="Ted" span=(0, 3)>
## ]
```
**<i> or add a knowledge base</i>**
```python
import re

from extr import RegEx, RegExLabel
from extr.entities import create_entity_extractor

entity_extractor = create_entity_extractor(
    [
        RegExLabel('POSITION', [
            RegEx([r'pitcher'], re.IGNORECASE)
        ]),
    ],
    ## knowledge base: label -> known terms
    kb={
        'PERSON': ['Ted']
    }
)

entities = entity_extractor.get_entities(text)
## entities == [
## <Entity label="POSITION" text="Pitcher" span=(9, 16)>,
## <Entity label="PERSON" text="Ted" span=(0, 3)>
## ]
```
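To make the idea behind both variants concrete, here is a minimal standard-library sketch of label-to-regex matching with a knowledge base folded in. It is not extr's implementation: `Span`, `kb_to_patterns`, and `find_spans` are hypothetical names invented for illustration, and the choice to escape knowledge-base terms and wrap them in word boundaries is an assumption.

```python
import re
from typing import Dict, List, NamedTuple, Tuple


class Span(NamedTuple):
    """Hypothetical stand-in for an entity: label, matched text, offsets."""
    label: str
    text: str
    start: int
    end: int


def kb_to_patterns(kb: Dict[str, List[str]]) -> List[Tuple[str, re.Pattern]]:
    """Turn each label's terms into one escaped, word-bounded pattern."""
    patterns = []
    for label, terms in kb.items():
        # Try longer terms first so a longer name wins over its prefix.
        ordered = sorted(terms, key=len, reverse=True)
        alternation = '|'.join(re.escape(term) for term in ordered)
        patterns.append((label, re.compile(rf'\b(?:{alternation})\b')))
    return patterns


def find_spans(text: str, patterns: List[Tuple[str, re.Pattern]]) -> List[Span]:
    """Run every labelled pattern over the text and collect the matches."""
    spans = [
        Span(label, match.group(), match.start(), match.end())
        for label, pattern in patterns
        for match in pattern.finditer(text)
    ]
    # Sorted by start offset here; extr's own ordering may differ.
    return sorted(spans, key=lambda span: span.start)


patterns = [('POSITION', re.compile(r'pitcher', re.IGNORECASE))]
patterns += kb_to_patterns({'PERSON': ['Ted']})
spans = find_spans('Ted is a Pitcher.', patterns)
```

In this sketch the knowledge base is simply a shorthand for writing literal-term patterns by hand, which is one plausible reading of what the `kb` argument is for.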
### 2. Visualize Entities in HTML
> Annotate text to display in HTML.
```python
from extr.entities.viewers import HtmlViewer
viewer = HtmlViewer()
viewer.append(text, entities)
html = viewer.create_view(custom_styles="""
    .lb-PERSON {
        background-color: orange;
    }

    .lb-POSITION {
        background-color: yellow;
    }
""")
```
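As a rough illustration of what annotating text for HTML involves, the sketch below wraps each entity span in a `<span>` whose class follows the `lb-<LABEL>` naming used in the custom styles above. The function `annotate_html` is hypothetical, the markup `HtmlViewer` actually emits may differ, and the sketch assumes the spans do not overlap.

```python
import html
from typing import List, Tuple


def annotate_html(text: str, spans: List[Tuple[str, int, int]]) -> str:
    """Wrap each (label, start, end) span in a labelled <span> element."""
    parts = []
    cursor = 0
    for label, start, end in sorted(spans, key=lambda span: span[1]):
        # Escape the plain text between the previous span and this one.
        parts.append(html.escape(text[cursor:start]))
        parts.append(
            f'<span class="lb-{label}">{html.escape(text[start:end])}</span>'
        )
        cursor = end
    # Escape whatever follows the last span.
    parts.append(html.escape(text[cursor:]))
    return ''.join(parts)


markup = annotate_html(
    'Ted is a Pitcher.',
    [('POSITION', 9, 16), ('PERSON', 0, 3)],
)
# markup == '<span class="lb-PERSON">Ted</span> is a <span class="lb-POSITION">Pitcher</span>.'
```

Escaping every text fragment keeps characters such as `<` or `&` in the source text from being interpreted as markup.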
![](https://github.com/dpasse/extr/blob/main/docs/images/annotations.JPG)
### 3. Relation Extraction
> Annotate and Extract Relationships between Entities
```python
from extr.entities import EntityAnnotator
from extr.relations import RelationExtractor, \
                           RegExRelationLabelBuilder

## define relationship between PERSON and POSITION
relationship = RegExRelationLabelBuilder('is_a') \
    .add_e1_to_e2(
        'PERSON', ## e1
        [
            ## define how the relationship exists in nature
            r'\s+is\s+a\s+',
        ],
        'POSITION' ## e2
    ) \
    .build()

relations_to_extract = [relationship]
## `entities` see 'Entity Extraction' above
annotated_text = EntityAnnotator().annotate(text, entities)
relations = RelationExtractor(relations_to_extract).extract(annotated_text, entities)
## relations == [
## <Relation e1="Ted" r="is_a" e2="Pitcher">
## ]
```
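The annotate-then-match idea can be sketched with the standard library alone. Everything below is an assumption made for illustration: the `##LABEL_i##` placeholder format, the tuple layout of an entity, and the `annotate` and `extract` helpers are hypothetical and are not taken from extr's source.

```python
import re
from typing import List, Tuple

Entity = Tuple[str, str, int, int]  # (label, text, start, end)


def annotate(text: str, entities: List[Entity]) -> str:
    """Replace each entity with a hypothetical ##LABEL_i## placeholder."""
    out = text
    # Substitute from right to left so earlier offsets stay valid.
    indexed = sorted(enumerate(entities), key=lambda pair: pair[1][2], reverse=True)
    for index, (label, _, start, end) in indexed:
        out = out[:start] + f'##{label}_{index}##' + out[end:]
    return out


def extract(annotated: str, entities: List[Entity], e1_label: str,
            between: str, e2_label: str, name: str) -> List[Tuple[str, str, str]]:
    """Find e1 placeholders followed by `between` and an e2 placeholder."""
    pattern = re.compile(
        rf'##{e1_label}_(\d+)##(?:{between})##{e2_label}_(\d+)##'
    )
    return [
        (entities[int(m.group(1))][1], name, entities[int(m.group(2))][1])
        for m in pattern.finditer(annotated)
    ]


entities = [('POSITION', 'Pitcher', 9, 16), ('PERSON', 'Ted', 0, 3)]
annotated = annotate('Ted is a Pitcher.', entities)
# annotated == '##PERSON_1## is a ##POSITION_0##.'
relations = extract(annotated, entities, 'PERSON', r'\s+is\s+a\s+', 'POSITION', 'is_a')
# relations == [('Ted', 'is_a', 'Pitcher')]
```

Replacing entities with placeholders first means the relation pattern only has to describe the text between two entities, not the entities themselves.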