autodistill-setfit


- Name: autodistill-setfit
- Version: 0.1.1
- Home page: https://github.com/roboflow/autodistill-setfit
- Summary: Train SetFit models with Autodistill
- Upload time: 2024-06-11 16:25:23
- Author: Roboflow
- Requires Python: >=3.7

<div align="center">
  <p>
    <a align="center" href="" target="_blank">
      <img
        width="850"
        src="https://media.roboflow.com/open-source/autodistill/autodistill-banner.png"
      >
    </a>
  </p>
</div>

# Autodistill SetFit Module

This repository contains the code supporting the SetFit target model trainer for use with [Autodistill](https://github.com/autodistill/autodistill).

SetFit is a framework for fine-tuning Sentence Transformer models using only a few labeled examples of each class you want to train on. SetFit is developed by [Hugging Face](https://github.com/huggingface/setfit).

## Installation

To use the SetFit target model, you will need to install the following dependency:

```bash
pip3 install autodistill-setfit
```

## Quickstart

The SetFit module takes in `.jsonl` files and trains a text classification model.

Each record in the JSONL file should have a `text` entry containing the text to be classified and a `label` entry containing the ground-truth label for that text. This format is returned by Autodistill base text classification models such as `GPTClassifier`.

Here is an example entry of a record used to train a research paper subject classifier:

```json
{"title": "CC-GPX: Extracting High-Quality Annotated Geospatial Data from Common Crawl", "content": "arXiv:2405.11039v1 Announce Type: new \nAbstract: The Common Crawl (CC) corpus....", "classification": "natural language processing"}
```
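The example record above uses `title`, `content`, and `classification` keys, whereas the format described earlier expects `text` and `label`. If your data looks like the example, a conversion step may be needed before training. The sketch below shows one way to do that with the standard library. The mapping (`content` to `text`, `classification` to `label`) and the helper names `to_setfit_record` and `convert_jsonl` are assumptions for illustration, not part of autodistill-setfit.

```python
import json

# Assumed mapping from the example's keys to the keys described above.
KEY_MAP = {"content": "text", "classification": "label"}


def to_setfit_record(record: dict) -> dict:
    """Return a new record containing only `text` and `label` entries."""
    return {new: record[old] for old, new in KEY_MAP.items()}


def convert_jsonl(src_path: str, dst_path: str) -> int:
    """Rewrite every record in src_path into dst_path; return the record count."""
    count = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            converted = to_setfit_record(json.loads(line))
            dst.write(json.dumps(converted) + "\n")
            count += 1
    return count
```

For example, `convert_jsonl("raw.jsonl", "data.jsonl")` would produce a file that can be passed to `train`, where both file names are placeholders.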

```python
from autodistill_setfit import SetFitModel

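target_model = SetFitModel()

# train a model on the labeled JSONL file, writing the result to "model"
target_model.train("./data.jsonl", output="model", epochs=5)

# load the trained model from the output location
target_model = SetFitModel("model")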

# run inference on the new model
pred = target_model.predict("Geospatial data.")

print(pred)
# geospatial
```
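Before calling `train`, it can be useful to confirm that every record has the `text` and `label` entries described above, and to see how many examples each class has, since SetFit is intended for a few examples per class. The following is a minimal sketch using only the standard library. `summarize_jsonl` is a hypothetical helper, not a function provided by autodistill-setfit.

```python
import json
from collections import Counter


def summarize_jsonl(path: str) -> Counter:
    """Check that each record has `text` and `label`; count examples per label."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            missing = {"text", "label"} - set(record)
            if missing:
                raise ValueError(
                    f"line {line_number}: missing {sorted(missing)}"
                )
            counts[record["label"]] += 1
    return counts
```

Calling `summarize_jsonl("./data.jsonl")` returns a `Counter` keyed by label, which makes it easy to spot a class with no examples or a record with a missing field before training starts.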

## License

This project is licensed under an [MIT license](LICENSE).

## 🏆 Contributing

We love your input! Please see the core Autodistill [contributing guide](https://github.com/autodistill/autodistill/blob/main/CONTRIBUTING.md) to get started. Thank you 🙏 to all our contributors!

            
