# `Struct-IE`: Structured Information Extraction with Large Language Models
`struct-ie` is a Python library for named entity extraction using a transformer-based model.
## Installation
You can install the `struct-ie` library from PyPI:
```bash
pip install struct_ie
```
## To-Do List
- [x] Implement batch prediction
- [ ] Implement a Trainer fot Instruction Tuning
- [ ] PrefixLM for Instruction Tuning
- [ ] Add RelationExtractor
- [ ] Add GraphExtractor
- [ ] Add JsonExtractor
## Usage
You can try it on google colab: <a href="https://colab.research.google.com/drive/1RjtZ8xWg6KU4ztHiRfSSrEr1UeZr6eZ2?usp=sharing">
<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />
</a>
Here's an example of how to use the `EntityExtractor`:
### 1. Basic Usage
```python
from struct_ie import EntityExtractor
# Define the entity types with descriptions (optional)
entity_types_with_descriptions = {
"Name": "Names of individuals like 'Jane Doe'",
"Award": "Names of awards or honors such as the 'Nobel Prize' or the 'Pulitzer Prize'",
"Date": None,
"Competition": "Names of competitions or tournaments like the 'World Cup' or the 'Olympic Games'",
"Team": None
}
# Initialize the EntityExtractor
extractor = EntityExtractor("Qwen/Qwen2-0.5B-Instruct", entity_types_with_descriptions, device="cpu")
# Example text for entity extraction
text = "Cristiano Ronaldo won the Ballon d'Or. He was the top scorer in the UEFA Champions League in 2018."
# Extract entities from the text
entities = extractor.extract_entities(text)
print(entities)
```
### 2. Usage with a Custom Prompt
```python
from struct_ie import EntityExtractor
# Define the entity types with descriptions (optional)
entity_types_with_descriptions = {
"Name": "Names of individuals like 'Jean-Luc Picard' or 'Jane Doe'",
"Award": "Names of awards or honors such as the 'Nobel Prize' or the 'Pulitzer Prize'",
"Date": None,
"Competition": "Names of competitions or tournaments like the 'World Cup' or the 'Olympic Games'",
"Team": "Names of sports teams or organizations like 'Manchester United' or 'FC Barcelona'"
}
# Initialize the EntityExtractor
extractor = EntityExtractor("Qwen/Qwen2-0.5B-Instruct", entity_types_with_descriptions, device="cpu")
# Example text for entity extraction
text = "Cristiano Ronaldo won the Ballon d'Or. He was the top scorer in the UEFA Champions League in 2018."
# Custom prompt for entity extraction
prompt = "You are an expert on Named Entity Recognition. Extract entities from this text."
# Extract entities from the text using a custom prompt
entities = extractor.extract_entities(text, prompt=prompt)
print(entities)
```
### 3. Usage with Few-shot Examples
```python
from struct_ie import EntityExtractor
# Define the entity types with descriptions (optional)
entity_types_with_descriptions = {
"Name": "Names of individuals like 'Jean-Luc Picard' or 'Jane Doe'",
"Award": "Names of awards or honors such as the 'Nobel Prize' or the 'Pulitzer Prize'",
"Date": None,
"Competition": "Names of competitions or tournaments like the 'World Cup' or the 'Olympic Games'",
"Team": "Names of sports teams or organizations like 'Manchester United' or 'FC Barcelona'"
}
# Initialize the EntityExtractor
extractor = EntityExtractor("Qwen/Qwen2-0.5B-Instruct", entity_types_with_descriptions, device="cpu")
# Example text for entity extraction
text = "Cristiano Ronaldo won the Ballon d'Or. He was the top scorer in the UEFA Champions League in 2018."
# Few-shot examples for improved entity extraction
demonstrations = [
{"input": "Lionel Messi won the Ballon d'Or 7 times.", "output": [("Lionel Messi", "Name"), ("Ballon d'Or", "Award")]}
]
# Extract entities from the text using few-shot examples
entities = extractor.extract_entities(text, few_shot_examples=demonstrations)
print(entities)
```
## License
This project is licensed under the Apache-2.0.
Raw data
{
"_id": null,
"home_page": null,
"name": "struct-ie",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "Urchade Zaratiana <urchade.zaratiana@gmail.com>",
"keywords": "named-entity-recognition, ner, data-science, natural-language-processing, artificial-intelligence, nlp, machine-learning, transformers",
"author": null,
"author_email": "Urchade Zaratiana <urchade.zaratiana@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/42/14/f3afd3999a978fe7296dd8571d13ba18135f8e2a26d006f58aaf2944455c/struct_ie-0.0.2.tar.gz",
"platform": null,
"description": "# `Struct-IE`: Structured Information Extraction with Large Language Models\n\n`struct-ie` is a Python library for named entity extraction using a transformer-based model.\n\n## Installation\n\nYou can install the `struct-ie` library from PyPI:\n\n```bash\npip install struct_ie\n```\n\n## To-Do List\n\n- [x] Implement batch prediction\n- [ ] Implement a Trainer fot Instruction Tuning\n- [ ] PrefixLM for Instruction Tuning\n- [ ] Add RelationExtractor\n- [ ] Add GraphExtractor\n- [ ] Add JsonExtractor\n\n\n## Usage\n\nYou can try it on google colab: <a href=\"https://colab.research.google.com/drive/1RjtZ8xWg6KU4ztHiRfSSrEr1UeZr6eZ2?usp=sharing\">\n <img align=\"center\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" />\n</a>\n\nHere's an example of how to use the `EntityExtractor`:\n\n### 1. Basic Usage\n\n```python\nfrom struct_ie import EntityExtractor\n\n# Define the entity types with descriptions (optional)\nentity_types_with_descriptions = {\n \"Name\": \"Names of individuals like 'Jane Doe'\",\n \"Award\": \"Names of awards or honors such as the 'Nobel Prize' or the 'Pulitzer Prize'\",\n \"Date\": None,\n \"Competition\": \"Names of competitions or tournaments like the 'World Cup' or the 'Olympic Games'\",\n \"Team\": None\n}\n\n# Initialize the EntityExtractor\nextractor = EntityExtractor(\"Qwen/Qwen2-0.5B-Instruct\", entity_types_with_descriptions, device=\"cpu\")\n\n# Example text for entity extraction\ntext = \"Cristiano Ronaldo won the Ballon d'Or. He was the top scorer in the UEFA Champions League in 2018.\"\n\n# Extract entities from the text\nentities = extractor.extract_entities(text)\nprint(entities)\n```\n\n### 2. Usage with a Custom Prompt\n\n```python\nfrom struct_ie import EntityExtractor\n\n# Define the entity types with descriptions (optional)\nentity_types_with_descriptions = {\n \"Name\": \"Names of individuals like 'Jean-Luc Picard' or 'Jane Doe'\",\n \"Award\": \"Names of awards or honors such as the 'Nobel Prize' or the 'Pulitzer Prize'\",\n \"Date\": None,\n \"Competition\": \"Names of competitions or tournaments like the 'World Cup' or the 'Olympic Games'\",\n \"Team\": \"Names of sports teams or organizations like 'Manchester United' or 'FC Barcelona'\"\n}\n\n# Initialize the EntityExtractor\nextractor = EntityExtractor(\"Qwen/Qwen2-0.5B-Instruct\", entity_types_with_descriptions, device=\"cpu\")\n\n# Example text for entity extraction\ntext = \"Cristiano Ronaldo won the Ballon d'Or. He was the top scorer in the UEFA Champions League in 2018.\"\n\n# Custom prompt for entity extraction\nprompt = \"You are an expert on Named Entity Recognition. Extract entities from this text.\"\n\n# Extract entities from the text using a custom prompt\nentities = extractor.extract_entities(text, prompt=prompt)\nprint(entities)\n```\n\n### 3. Usage with Few-shot Examples\n\n```python\nfrom struct_ie import EntityExtractor\n\n# Define the entity types with descriptions (optional)\nentity_types_with_descriptions = {\n \"Name\": \"Names of individuals like 'Jean-Luc Picard' or 'Jane Doe'\",\n \"Award\": \"Names of awards or honors such as the 'Nobel Prize' or the 'Pulitzer Prize'\",\n \"Date\": None,\n \"Competition\": \"Names of competitions or tournaments like the 'World Cup' or the 'Olympic Games'\",\n \"Team\": \"Names of sports teams or organizations like 'Manchester United' or 'FC Barcelona'\"\n}\n\n# Initialize the EntityExtractor\nextractor = EntityExtractor(\"Qwen/Qwen2-0.5B-Instruct\", entity_types_with_descriptions, device=\"cpu\")\n\n# Example text for entity extraction\ntext = \"Cristiano Ronaldo won the Ballon d'Or. He was the top scorer in the UEFA Champions League in 2018.\"\n\n# Few-shot examples for improved entity extraction\ndemonstrations = [\n {\"input\": \"Lionel Messi won the Ballon d'Or 7 times.\", \"output\": [(\"Lionel Messi\", \"Name\"), (\"Ballon d'Or\", \"Award\")]}\n]\n\n# Extract entities from the text using few-shot examples\nentities = extractor.extract_entities(text, few_shot_examples=demonstrations)\nprint(entities)\n```\n\n## License\n\nThis project is licensed under the Apache-2.0.\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "A Python library for structured information extraction with LLMs.",
"version": "0.0.2",
"project_urls": {
"Homepage": "https://github.com/urchade/struct_ie",
"Repository": "https://github.com/urchade/struct_ie"
},
"split_keywords": [
"named-entity-recognition",
" ner",
" data-science",
" natural-language-processing",
" artificial-intelligence",
" nlp",
" machine-learning",
" transformers"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "d29df99456e35982224b6bb06939ab52c34ba20177913836deadd9154e10292a",
"md5": "eacc83f843e4fea648e6a4a5cd2c5b62",
"sha256": "80a7a43c37f19871fa22478f27fe73ba5f6b1e41abb6150e2ada66b73da4fab6"
},
"downloads": -1,
"filename": "struct_ie-0.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "eacc83f843e4fea648e6a4a5cd2c5b62",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 8380,
"upload_time": "2024-08-13T16:46:19",
"upload_time_iso_8601": "2024-08-13T16:46:19.820354Z",
"url": "https://files.pythonhosted.org/packages/d2/9d/f99456e35982224b6bb06939ab52c34ba20177913836deadd9154e10292a/struct_ie-0.0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "4214f3afd3999a978fe7296dd8571d13ba18135f8e2a26d006f58aaf2944455c",
"md5": "7a3414ace856a44651ccb4cc476fdf25",
"sha256": "156c24c128b88b4c7c047bbfb668465ec3cd3ee25b9d78be9b29d66741e24633"
},
"downloads": -1,
"filename": "struct_ie-0.0.2.tar.gz",
"has_sig": false,
"md5_digest": "7a3414ace856a44651ccb4cc476fdf25",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 8058,
"upload_time": "2024-08-13T16:46:22",
"upload_time_iso_8601": "2024-08-13T16:46:22.607111Z",
"url": "https://files.pythonhosted.org/packages/42/14/f3afd3999a978fe7296dd8571d13ba18135f8e2a26d006f58aaf2944455c/struct_ie-0.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-13 16:46:22",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "urchade",
"github_project": "struct_ie",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "struct-ie"
}