# GLiNER-Finetune
`gliner-finetune` is a Python library that generates synthetic data with OpenAI's GPT models, processes that data, and uses it to train a GLiNER model. GLiNER is a framework for training and running Named Entity Recognition (NER) models.
## Features
- **Data Generation**: Use OpenAI's language models to create synthetic training data.
- **Data Processing**: Convert raw synthetic data into a format suitable for NER training.
- **Model Training**: Fine-tune the GLiNER model on the processed synthetic data for improved NER performance.
## Installation
To install the `gliner-finetune` library, use pip:
```bash
pip install gliner-finetune
```
## Quick Start
The following example demonstrates how to generate synthetic data, process it, and train a GLiNER model using the `gliner-finetune` library.
Before running it, make sure you have a `.env` file that defines the `OPENAI_API_KEY` variable.
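
As a quick sanity check before making any API calls, the sketch below reads a `.env` file using only the standard library and reports whether `OPENAI_API_KEY` is defined. The helper name `read_env_file` is hypothetical and not part of `gliner-finetune`; how the library itself loads the file is not described in this README, so treat this as an independent check rather than the library's own mechanism.

```python
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value
        values[key.strip()] = value.strip().strip("\"'")
    return values


env = read_env_file()
if env.get("OPENAI_API_KEY"):
    print("OPENAI_API_KEY found in .env")
else:
    print("OPENAI_API_KEY is missing from .env")
```

The parser only handles plain `KEY=VALUE` lines, which is enough for a single API key; it does not support multi-line values or `export` prefixes.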
### Step 1: Generate Synthetic Data
```python
from gliner_finetune.synthetic import generate_data, create_prompt
import json
# Define your example data
example_data = {
    "text": "The Alpine Swift primarily consumes flying insects such as wasps, bees, and flies. It captures its prey mid-air while swiftly flying through the alpine skies. It nests in high, rocky mountain crevices where it uses feathers and small sticks to construct a simple yet secure nesting environment.",
    "generic_plant_food": [],
    "generic_animal_food": ["flying insects"],
    "plant_food": [],
    "specific_animal_food": ["wasps", "bees", "flies"],
    "location_nest": ["rocky mountain crevices"],
    "item_nest": ["feathers", "small sticks"]
}
# Convert example data to JSON string
json_data = json.dumps(example_data)
# Generate prompt and synthetic data
prompt = create_prompt(json_data)
print(prompt)
# Generate synthetic data with specified number of API calls
num_calls = 3
results = generate_data(json_data, num_calls)
print(results)
```
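
Because the example record pairs a text with lists of labeled strings, it can help to confirm that every labeled string actually occurs in the text before spending API calls on it. The sketch below does that with plain Python. The function name `find_missing_spans` is hypothetical, and the assumption that labels should match the text verbatim is ours; the README does not state that the library requires it.

```python
def find_missing_spans(example):
    """Return (label, span) pairs whose span does not occur verbatim in the text."""
    text = example["text"]
    missing = []
    for label, spans in example.items():
        if label == "text":
            continue
        for span in spans:
            if span not in text:
                missing.append((label, span))
    return missing


# Shortened version of the example record, used only for illustration
example = {
    "text": "The Alpine Swift primarily consumes flying insects such as wasps, bees, and flies.",
    "generic_animal_food": ["flying insects"],
    "specific_animal_food": ["wasps", "bees", "flies"],
    "item_nest": ["feathers"],  # deliberately absent from the shortened text
}

print(find_missing_spans(example))  # [('item_nest', 'feathers')]
```

An empty result means every labeled string can be located in the text.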
### Step 2: Process and Split Data
```python
import json

from gliner_finetune.convert import convert

# Read the generated data from 'synthetic_data/parsed_responses.json'
with open('synthetic_data/parsed_responses.json', 'r') as file:
    data = json.load(file)

# Flatten the list of per-call results into a single list of samples
final_data = [sample for item in data for sample in item]

# Convert and split the data into training, validation, and testing datasets
training_data = convert(final_data, project_path='', train_split=0.8, eval_split=0.2, test_split=0.0,
                        train_file='train.json', eval_file='eval.json', test_file='test.json', overwrite=True)
```
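
To make the flattening step and the split fractions concrete, here is a small standalone sketch. It is not the implementation of `convert`; the helper names `flatten` and `split_counts` and the rounding rule are assumptions chosen for illustration. It shows how a list of per-call result lists becomes one flat list, and how fractions such as 0.8 / 0.2 / 0.0 could translate into sample counts.

```python
def flatten(batches):
    """Flatten a list of lists into one list of samples."""
    return [sample for batch in batches for sample in batch]


def split_counts(n_samples, train_split, eval_split, test_split):
    """Turn split fractions into sample counts (illustrative rounding rule)."""
    if abs(train_split + eval_split + test_split - 1.0) > 1e-9:
        raise ValueError("split fractions should sum to 1.0")
    n_train = round(n_samples * train_split)
    # Never assign more test samples than remain after the training set
    n_test = min(round(n_samples * test_split), n_samples - n_train)
    # The evaluation set takes whatever is left, so no sample is dropped
    n_eval = n_samples - n_train - n_test
    return n_train, n_eval, n_test


# Three hypothetical API calls returning 4, 3, and 3 samples
batches = [
    [{"text": f"sample {i}"} for i in range(4)],
    [{"text": f"sample {i}"} for i in range(4, 7)],
    [{"text": f"sample {i}"} for i in range(7, 10)],
]
samples = flatten(batches)
print(len(samples))                                # 10
print(split_counts(len(samples), 0.8, 0.2, 0.0))   # (8, 2, 0)
```

With `test_split=0.0`, as in the call above, all samples end up in the training and evaluation sets.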
### Step 3: Train the GLiNER Model
```python
from gliner_finetune.train import train_model
# Train the model
train_model(model="urchade/gliner_small-v2.1", train_data="assets/train.json",
            eval_data="assets/eval.json", project="")
```
## Documentation
For more details about the GLiNER model and its capabilities, visit the official repository:
- [GLiNER GitHub Repository](https://github.com/urchade/GLiNER)
Raw data
{
"_id": null,
"home_page": "https://github.com/wjbmattingly/gliner-finetune",
"name": "gliner-finetune",
"maintainer": null,
"docs_url": null,
"requires_python": "<=3.11,>=3.7",
"maintainer_email": null,
"keywords": null,
"author": "William J.B. Mattingly",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/c7/cc/462e250237deeb562a23db455d14409744316114434f18e61ffe6010bcb8/gliner-finetune-0.0.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A library to create synthetic data with OpenAI and train a GLiNER model on that data.",
"version": "0.0.4",
"project_urls": {
"Homepage": "https://github.com/wjbmattingly/gliner-finetune"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "03c2b6ab5dc4a8a3812e81f7f2b15bebf18e5c1bb2958c0a039d0ccc8306c39e",
"md5": "9dfdeccf6dc806773406d9360e8fddf4",
"sha256": "8bf8d67286efa030da09706eab6feeb378a55a6f100d19302ac50b50cbf16acd"
},
"downloads": -1,
"filename": "gliner_finetune-0.0.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9dfdeccf6dc806773406d9360e8fddf4",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<=3.11,>=3.7",
"size": 7797,
"upload_time": "2024-04-15T09:36:46",
"upload_time_iso_8601": "2024-04-15T09:36:46.199684Z",
"url": "https://files.pythonhosted.org/packages/03/c2/b6ab5dc4a8a3812e81f7f2b15bebf18e5c1bb2958c0a039d0ccc8306c39e/gliner_finetune-0.0.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "c7cc462e250237deeb562a23db455d14409744316114434f18e61ffe6010bcb8",
"md5": "042b398c313218d233f7ff6e4e11b4d5",
"sha256": "84e3f092bcd2db8a0d8f8d612d88d7b8d12907b50132c1bdd65b9c382a98a18c"
},
"downloads": -1,
"filename": "gliner-finetune-0.0.4.tar.gz",
"has_sig": false,
"md5_digest": "042b398c313218d233f7ff6e4e11b4d5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<=3.11,>=3.7",
"size": 6771,
"upload_time": "2024-04-15T09:36:49",
"upload_time_iso_8601": "2024-04-15T09:36:49.199623Z",
"url": "https://files.pythonhosted.org/packages/c7/cc/462e250237deeb562a23db455d14409744316114434f18e61ffe6010bcb8/gliner-finetune-0.0.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-04-15 09:36:49",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "wjbmattingly",
"github_project": "gliner-finetune",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "gliner-finetune"
}