gliner-finetune

Name	gliner-finetune
Version	0.0.4
home_page	https://github.com/wjbmattingly/gliner-finetune
Summary	A library to create synthetic data with OpenAI and train a GLiNER model on that data.
upload_time	2024-04-15 09:36:49
maintainer	None
docs_url	None
author	William J.B. Mattingly
requires_python	<=3.11,>=3.7
license	None
keywords
VCS
bugtrack_url
requirements	No requirements were recorded.

# GLiNER-Finetune

`gliner-finetune` is a Python library that generates synthetic data with OpenAI's GPT models, processes that data, and uses it to train a GLiNER model. GLiNER is a framework for training and running Named Entity Recognition (NER) models.

## Features

- **Data Generation**: Leverage OpenAI's powerful language models to create synthetic training data.
- **Data Processing**: Convert raw synthetic data into a format suitable for NER training.
- **Model Training**: Fine-tune the GLiNER model on the processed synthetic data for improved NER performance.

## Installation

To install the `gliner-finetune` library, use pip:

```bash
pip install gliner-finetune
```

## Quick Start

The following example demonstrates how to generate synthetic data, process it, and train a GLiNER model using the `gliner-finetune` library.

Before running the example, make sure you have a `.env` file that sets the `OPENAI_API_KEY` variable.
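
The Quick Start does not say how the `.env` file is read. As a minimal, standard-library-only sketch, the snippet below parses simple `KEY=value` lines and checks that `OPENAI_API_KEY` is present before any API calls are made. The helper `read_env_file` is hypothetical and not part of `gliner-finetune`; it also assumes the file sits in the current working directory.

```python
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=value lines from a .env file into a dict.

    Hypothetical helper for illustration; not part of gliner-finetune.
    """
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


if __name__ == "__main__":
    if "OPENAI_API_KEY" not in read_env_file():
        print("OPENAI_API_KEY is missing from .env")
```

This only covers the plain `KEY=value` form; quoting rules, `export` prefixes, and multi-line values are deliberately left out.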

### Step 1: Generate Synthetic Data

```python
from gliner_finetune.synthetic import generate_data, create_prompt
import json

# Define your example data
example_data = {
    "text": "The Alpine Swift primarily consumes flying insects such as wasps, bees, and flies. It captures its prey mid-air while swiftly flying through the alpine skies. It nests in high, rocky mountain crevices where it uses feathers and small sticks to construct a simple yet secure nesting environment.",
    "generic_plant_food": [],
    "generic_animal_food": ["flying insects"],
    "plant_food": [],
    "specific_animal_food": ["wasps", "bees", "flies"],
    "location_nest": ["rocky mountain crevices"],
    "item_nest": ["feathers", "small sticks"]
}

# Convert example data to JSON string
json_data = json.dumps(example_data)

# Build the prompt from the example and inspect it
prompt = create_prompt(json_data)
print(prompt)

# Generate synthetic data with specified number of API calls
num_calls = 3
results = generate_data(json_data, num_calls)
print(results)
```
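
Because the example record pairs a `text` with lists of labelled spans, it can help to confirm that every span actually occurs in the text before spending API calls on it. The following is a sketch under that assumption: `find_missing_spans` is a hypothetical helper, not a `gliner-finetune` function, and it treats every key other than `text` as an entity label. The record is a shortened version of the one above.

```python
def find_missing_spans(example):
    """Return {label: [spans]} for spans that do not occur verbatim in example["text"]."""
    text = example["text"]
    missing = {}
    for label, spans in example.items():
        if label == "text":
            continue
        absent = [span for span in spans if span not in text]
        if absent:
            missing[label] = absent
    return missing


example = {
    "text": "The Alpine Swift primarily consumes flying insects such as wasps, bees, and flies.",
    "generic_animal_food": ["flying insects"],
    "specific_animal_food": ["wasps", "bees", "flies"],
    "item_nest": ["feathers"],  # deliberately absent from the shortened text
}

print(find_missing_spans(example))
# {'item_nest': ['feathers']}
```

An empty result means every labelled span can be located in the text by exact string match.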

### Step 2: Process and Split Data

```python
import json

from gliner_finetune.convert import convert

# Load the synthetic data from 'synthetic_data/parsed_responses.json'
with open('synthetic_data/parsed_responses.json', 'r') as file:
    data = json.load(file)

# Flatten the data list for processing
final_data = [sample for item in data for sample in item]

# Convert and split the data into training, validation, and testing datasets
training_data = convert(final_data, project_path='', train_split=0.8, eval_split=0.2, test_split=0.0,
                        train_file='train.json', eval_file='eval.json', test_file='test.json', overwrite=True)
```
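
To make the flattening and the split ratios concrete, here is a standard-library sketch. It is not how `convert` is implemented, since the source does not show that. `split_samples` and the toy `data` are hypothetical, and the nested list assumes one inner list of samples per API call.

```python
import random


def split_samples(samples, train_split=0.8, eval_split=0.2, seed=0):
    """Shuffle samples and cut them into train/eval/test lists by proportion."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_split)
    n_eval = int(len(shuffled) * eval_split)
    train = shuffled[:n_train]
    eval_ = shuffled[n_train:n_train + n_eval]
    test = shuffled[n_train + n_eval:]  # whatever remains after train and eval
    return train, eval_, test


# Toy stand-in for parsed_responses.json: one inner list per API call.
data = [[{"text": f"sample {i}-{j}"} for j in range(5)] for i in range(2)]

# Same flattening expression as in Step 2.
final_data = [sample for item in data for sample in item]

train, eval_, test = split_samples(final_data)
print(len(train), len(eval_), len(test))
# 8 2 0
```

In this sketch the counts are truncated, so when the sample count does not divide evenly the leftover samples fall into the test list even if the test proportion is zero.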

### Step 3: Train the GLiNER Model

```python
from gliner_finetune.train import train_model

# Train the model
train_model(model="urchade/gliner_small-v2.1", train_data="assets/train.json", 
            eval_data="assets/eval.json", project="")
```
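
Step 3 reads its data from `assets/train.json` and `assets/eval.json`, so it is worth confirming that those files exist and are non-empty before starting a training run. The check below is a sketch only: `count_records` is a hypothetical helper, and it assumes each file holds a top-level JSON list, which the source does not state.

```python
import json
from pathlib import Path


def count_records(path):
    """Return the number of top-level records in a JSON file, or None if it is missing.

    Assumes the file holds a JSON list; hypothetical helper for illustration.
    """
    file_path = Path(path)
    if not file_path.exists():
        return None
    with file_path.open("r", encoding="utf-8") as handle:
        return len(json.load(handle))


for split_path in ("assets/train.json", "assets/eval.json"):
    count = count_records(split_path)
    if count is None:
        print(f"{split_path}: not found")
    else:
        print(f"{split_path}: {count} records")
```

If either file is reported as missing, compare the paths passed to `train_model` with the `project_path` and file names used in Step 2.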

## Documentation

For more details about the GLiNER model and its capabilities, visit the official repository:

- [GLiNER GitHub Repository](https://github.com/urchade/GLiNER)

Raw data

{
    "_id": null,
    "home_page": "https://github.com/wjbmattingly/gliner-finetune",
    "name": "gliner-finetune",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<=3.11,>=3.7",
    "maintainer_email": null,
    "keywords": null,
    "author": "William J.B. Mattingly",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/c7/cc/462e250237deeb562a23db455d14409744316114434f18e61ffe6010bcb8/gliner-finetune-0.0.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A library to create synthetic data with OpenAI and train a GLiNER model on that data.",
    "version": "0.0.4",
    "project_urls": {
        "Homepage": "https://github.com/wjbmattingly/gliner-finetune"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "03c2b6ab5dc4a8a3812e81f7f2b15bebf18e5c1bb2958c0a039d0ccc8306c39e",
                "md5": "9dfdeccf6dc806773406d9360e8fddf4",
                "sha256": "8bf8d67286efa030da09706eab6feeb378a55a6f100d19302ac50b50cbf16acd"
            },
            "downloads": -1,
            "filename": "gliner_finetune-0.0.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9dfdeccf6dc806773406d9360e8fddf4",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<=3.11,>=3.7",
            "size": 7797,
            "upload_time": "2024-04-15T09:36:46",
            "upload_time_iso_8601": "2024-04-15T09:36:46.199684Z",
            "url": "https://files.pythonhosted.org/packages/03/c2/b6ab5dc4a8a3812e81f7f2b15bebf18e5c1bb2958c0a039d0ccc8306c39e/gliner_finetune-0.0.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c7cc462e250237deeb562a23db455d14409744316114434f18e61ffe6010bcb8",
                "md5": "042b398c313218d233f7ff6e4e11b4d5",
                "sha256": "84e3f092bcd2db8a0d8f8d612d88d7b8d12907b50132c1bdd65b9c382a98a18c"
            },
            "downloads": -1,
            "filename": "gliner-finetune-0.0.4.tar.gz",
            "has_sig": false,
            "md5_digest": "042b398c313218d233f7ff6e4e11b4d5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<=3.11,>=3.7",
            "size": 6771,
            "upload_time": "2024-04-15T09:36:49",
            "upload_time_iso_8601": "2024-04-15T09:36:49.199623Z",
            "url": "https://files.pythonhosted.org/packages/c7/cc/462e250237deeb562a23db455d14409744316114434f18e61ffe6010bcb8/gliner-finetune-0.0.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-15 09:36:49",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "wjbmattingly",
    "github_project": "gliner-finetune",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "gliner-finetune"
}
