glirel


* Name: glirel
* Version: 0.1.1
* Summary: Generalist model for Relation Extraction (Extract any relation types from texts)
* Homepage: https://github.com/jackboyla/GLiREL
* Upload time: 2024-06-07 20:34:40
* Maintainer: Jack Boylan
* Authors: Jack Boylan, Urchade Zaratiana, Nadi Tomeh, Pierre Holat, Thierry Charnois
* Requires Python: >=3.8
* License: Apache-2.0
* Keywords: named-entity-recognition, ner, data-science, natural-language-processing, artificial-intelligence, nlp, machine-learning, transformers
* Requirements: torch, transformers, huggingface_hub, datasets, flair, seqeval, tqdm, spacy, wandb, scipy==1.10.1

# GLiREL: Generalist and Lightweight model for Zero-Shot Relation Extraction

GLiREL is a Relation Extraction model that classifies unseen relation types between the entities in a text. It builds upon the excellent work done by Urchade Zaratiana, Nadi Tomeh, Pierre Holat, and Thierry Charnois on the [GLiNER](https://github.com/urchade/GLiNER) library, which enables efficient zero-shot Named Entity Recognition.

* GLiNER paper: [GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer](https://arxiv.org/abs/2311.08526)

* Train a Zero-shot model: <a href="https://colab.research.google.com/github/jackboyla/GLiREL/blob/main/train.ipynb" target="_blank">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

<!-- <img src="demo.jpg" alt="Demo Image" width="50%"/> -->

---
## Installation

```bash
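# clone the repository first so that `cd GLiREL` works
git clone https://github.com/jackboyla/GLiREL.git
conda create -n glirel python=3.10 -y && conda activate glirel
cd GLiREL && pip install -e . && pip install -r requirements.txt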
```

## To run experiments

```bash
# few_rel
cd data
python process_few_rel.py
cd ..
# adjust config_few_rel.yaml as needed, then train
python train.py --config config_few_rel.yaml --log_dir logs-few-rel --relation_extraction
```

```bash
# wiki_zsl
cd data
curl -L -o wiki_all.json 'https://drive.google.com/uc?export=download&id=1ELFGUIYDClmh9GrEHjFYoE_VI1t2a5nK'
python process_wiki_zsl.py
cd ..
# adjust config_wiki_zsl.yaml as needed, then train
python train.py --config config_wiki_zsl.yaml --log_dir logs-wiki-zsl --relation_extraction
```

## Example training data

Training data is a JSONL file with one JSON object per line. The two records below are pretty-printed for readability; in the file itself, each record occupies a single line. In these examples, span positions are token indices into `tokenized_text` with an exclusive end.
```json
{
  "ner": [
    [7, 8, "Q4914513", "Binsey"],
    [11, 13, "Q19686", "River Thames"]
  ],
  "relations": [
    {
      "head": {"mention": "Binsey", "position": [7, 8], "type": "Q4914513"},
      "tail": {"mention": "River Thames", "position": [11, 13], "type": "Q19686"},
      "relation_id": "P206",
      "relation_text": "located in or next to body of water"
    }
  ],
  "tokenized_text": ["The", "race", "took", "place", "between", "Godstow", "and", "Binsey", "along", "the", "Upper", "River", "Thames", "."]
}
{
  "ner": [
    [9, 11, "Q4386693", "Legislative Assembly"],
    [1, 4, "Q1848835", "Parliament of Victoria"]
  ],
  "relations": [
    {
      "head": {"mention": "Legislative Assembly", "position": [9, 11], "type": "Q4386693"},
      "tail": {"mention": "Parliament of Victoria", "position": [1, 4], "type": "Q1848835"},
      "relation_id": "P361",
      "relation_text": "part of"
    }
  ],
  "tokenized_text": ["The", "Parliament", "of", "Victoria", "consists", "of", "the", "lower", "house", "Legislative", "Assembly", ",", "the", "upper", "house", "Legislative", "Council", "and", "the", "Queen", "of", "Australia", "."]
}
```
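
Before training on your own data, it can help to confirm that every span actually points at the tokens it names. The sketch below is not part of GLiREL: `check_record` is a hypothetical helper, and it assumes (as the examples above suggest) that span ends are exclusive and that a mention equals its tokens joined by single spaces. That second assumption will not hold for mentions containing punctuation tokens.

```python
import json


def check_record(record):
    """Return a list of span problems in one training record (empty if none)."""
    tokens = record["tokenized_text"]
    problems = []

    def check(start, end, mention, where):
        # Assumption: `end` is exclusive, as in the example records.
        if " ".join(tokens[start:end]) != mention:
            problems.append(f"{where} [{start}, {end}] does not spell {mention!r}")

    # Each `ner` entry is [start, end, type, mention].
    for start, end, _type, mention in record["ner"]:
        check(start, end, mention, "ner span")

    # Each relation carries a head and a tail entity with its own position.
    for rel in record["relations"]:
        for role in ("head", "tail"):
            start, end = rel[role]["position"]
            check(start, end, rel[role]["mention"], f"{role} span")
    return problems


# One JSONL line, copied from the first example record.
line = (
    '{"ner": [[7, 8, "Q4914513", "Binsey"], [11, 13, "Q19686", "River Thames"]], '
    '"relations": [{"head": {"mention": "Binsey", "position": [7, 8], "type": "Q4914513"}, '
    '"tail": {"mention": "River Thames", "position": [11, 13], "type": "Q19686"}, '
    '"relation_id": "P206", "relation_text": "located in or next to body of water"}], '
    '"tokenized_text": ["The", "race", "took", "place", "between", "Godstow", "and", '
    '"Binsey", "along", "the", "Upper", "River", "Thames", "."]}'
)
record = json.loads(line)
print(check_record(record))  # prints []
```

To check a whole file, call `check_record(json.loads(line))` for each non-empty line and report the line number alongside any problems.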



## Usage
Once you've installed the GLiREL library, import the `GLiREL` class, load a model with `GLiREL.from_pretrained`, and predict relations with `predict_relations`.

```python
from glirel import GLiREL
import spacy

# Load the pretrained GLiREL checkpoint
model = GLiREL.from_pretrained("jackboyla/glirel_base")

text = "Jack Dorsey's father, Tim Dorsey, is a licensed pilot. Jack met his wife Sarah Paulson in New York in 2003. They have one son, Edward."

# spaCy supplies the tokens and entities.
# Install the model first with: python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')
doc = nlp(text)

# Candidate relation types, given as free-text labels
labels = ['country of origin', 'licensed to broadcast to', 'parent', 'followed by', 'located in or next to body of water', 'spouse', 'child']

tokens = [token.text for token in doc]

# Entities as [start_token, end_token, label, text]
ner = [[ent.start, ent.end, ent.label_, ent.text] for ent in doc.ents]
print(f"Entities detected: {ner}")

# Predict relations among the detected entities for the candidate labels
relations = model.predict_relations(tokens, labels, threshold=0.01, ner=ner)

print('Number of relations:', len(relations))

# Print the predictions from highest to lowest score
sorted_data_desc = sorted(relations, key=lambda x: x['score'], reverse=True)
print("\nDescending Order by Score:")
for item in sorted_data_desc:
    print(f"{item['head_text']} --> {item['label']} --> {item['tail_text']} | score: {item['score']}")
```

### Expected Output

```
Entities detected: [[0, 2, 'PERSON', 'Jack Dorsey'], [5, 7, 'PERSON', 'Tim Dorsey'], [13, 14, 'PERSON', 'Jack'], [17, 19, 'PERSON', 'Sarah Paulson'], [20, 22, 'GPE', 'New York'], [23, 24, 'DATE', '2003'], [27, 28, 'CARDINAL', 'one'], [30, 31, 'PERSON', 'Edward']]
Number of relations: 90

Descending Order by Score:
['Sarah', 'Paulson'] --> spouse --> ['New', 'York'] | score: 0.6608812212944031
['Sarah', 'Paulson'] --> spouse --> ['Jack', 'Dorsey'] | score: 0.6601175665855408
['Edward'] --> spouse --> ['New', 'York'] | score: 0.6493653655052185
['one'] --> spouse --> ['New', 'York'] | score: 0.6480509042739868
['Edward'] --> spouse --> ['Jack', 'Dorsey'] | score: 0.6474933624267578
...
```
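
With a low threshold the model returns many candidates, often several labels for the same entity pair. One way to condense the list is to drop low scores and keep only the best label per (head, tail) pair. The sketch below is an illustration, not part of GLiREL: `best_relation_per_pair` and its `min_score` default are hypothetical, and it relies only on the `head_text`, `tail_text`, `label` and `score` keys shown in the output above. The last sample entry is invented to show the per-pair selection.

```python
def best_relation_per_pair(relations, min_score=0.5):
    """Keep the highest-scoring relation for each (head, tail) pair."""
    best = {}
    for rel in relations:
        if rel["score"] < min_score:
            continue
        # head_text / tail_text are token lists, so convert them to hashable tuples.
        key = (tuple(rel["head_text"]), tuple(rel["tail_text"]))
        if key not in best or rel["score"] > best[key]["score"]:
            best[key] = rel
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)


# The first three entries are copied from the expected output above.
sample = [
    {"head_text": ["Sarah", "Paulson"], "label": "spouse",
     "tail_text": ["New", "York"], "score": 0.6608812212944031},
    {"head_text": ["Sarah", "Paulson"], "label": "spouse",
     "tail_text": ["Jack", "Dorsey"], "score": 0.6601175665855408},
    {"head_text": ["one"], "label": "spouse",
     "tail_text": ["New", "York"], "score": 0.6480509042739868},
    # Hypothetical entry (not real model output): a second label for an existing pair.
    {"head_text": ["Sarah", "Paulson"], "label": "child",
     "tail_text": ["Jack", "Dorsey"], "score": 0.2},
]

for rel in best_relation_per_pair(sample, min_score=0.65):
    print(rel["head_text"], "-->", rel["label"], "-->", rel["tail_text"])
```

A suitable `min_score` depends on the model and the data, so treat 0.65 here as a placeholder rather than a recommendation.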

## Usage with spaCy (TBD)

GLiREL can also be loaded into a regular spaCy NLP pipeline, either a blank English pipeline or any other spaCy model. A worked example and its expected output are still to be added.


            

        