<div align="center">
<p>
<a align="center" href="" target="_blank">
<img
width="850"
src="https://media.roboflow.com/open-source/autodistill/autodistill-banner.png"
>
</a>
</p>
</div>
# Autodistill SetFit Module
This repository contains the code supporting the SetFit target model trainer for use with [Autodistill](https://github.com/autodistill/autodistill).
SetFit is a framework for fine-tuning Sentence Transformer models using only a few labeled examples per class. SetFit is developed by [Hugging Face](https://github.com/huggingface/setfit).
## Installation
To use the SetFit target model, install the following package:
```bash
pip3 install autodistill-setfit
```
## Quickstart
The SetFit module takes in `.jsonl` files and trains a text classification model.
Each record in the JSONL file should have a `text` entry containing the text to classify and a `label` entry containing the ground-truth label for that text. This format is returned by Autodistill base text classification models like the `GPTClassifier`.
Here is an example record used to train a research paper subject classifier:
```json
{"title": "CC-GPX: Extracting High-Quality Annotated Geospatial Data from Common Crawl", "content": "arXiv:2405.11039v1 Announce Type: new \nAbstract: The Common Crawl (CC) corpus....", "classification": "natural language processing"}
```
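As a minimal sketch of the record layout described in the prose above, the following writes and re-reads a JSONL file whose records carry `text` and `label` fields. The helper names `write_jsonl` and `read_jsonl`, the file name `data.jsonl`, and the sample records are assumptions made for illustration; they are not part of the module's API.

```python
import json
from pathlib import Path


def write_jsonl(records, path):
    """Write one JSON object per line (hypothetical helper)."""
    with Path(path).open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path, required=("text", "label")):
    """Read a JSONL file and check that each record has the required fields."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            missing = [key for key in required if key not in record]
            if missing:
                raise ValueError(f"line {line_number}: missing {missing}")
            records.append(record)
    return records


# Made-up sample records, for illustration only.
samples = [
    {"text": "Extracting annotated geospatial data from web crawls.", "label": "geospatial"},
    {"text": "A new tokenizer for low-resource languages.", "label": "natural language processing"},
]

write_jsonl(samples, "data.jsonl")
print(len(read_jsonl("data.jsonl")), "records ready for training")
```

Checking the fields before training surfaces a malformed record with its line number instead of a failure partway through a training run.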
```python
from autodistill_setfit import SetFitModel

# start from an untrained target model
target_model = SetFitModel()

# train a model on the labeled records and write it to "model"
target_model.train("./data.jsonl", output="model", epochs=5)

# load the trained model from "model"
target_model = SetFitModel("model")

# run inference on the new model
pred = target_model.predict("Geospatial data.")

print(pred)
# geospatial
```
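One way to check a trained model is to compare its predictions against held-out records with known labels. The sketch below is an illustration under stated assumptions: `accuracy` is a hypothetical helper, it relies only on `predict` taking a string and returning a label as in the quickstart, and a stand-in function replaces the trained model so the snippet runs on its own.

```python
def accuracy(predict, records):
    """Fraction of records whose predicted label equals the `label` field.

    `predict` is any callable mapping a string to a label, for example
    `target_model.predict` from the quickstart (assumed to return a string).
    """
    if not records:
        raise ValueError("records must not be empty")
    correct = sum(1 for r in records if predict(r["text"]) == r["label"])
    return correct / len(records)


# Stand-in for a trained model, so this sketch runs without training.
def keyword_predict(text):
    return "geospatial" if "geospatial" in text.lower() else "other"


# Made-up held-out records, for illustration only.
held_out = [
    {"text": "Geospatial data.", "label": "geospatial"},
    {"text": "Sentence embeddings.", "label": "other"},
    {"text": "Map tiles and GPS traces.", "label": "geospatial"},
]

print(accuracy(keyword_predict, held_out))
```

With a trained model, pass `target_model.predict` in place of `keyword_predict`.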
## License
This project is licensed under an [MIT license](LICENSE).
## 🏆 Contributing
We love your input! Please see the core Autodistill [contributing guide](https://github.com/autodistill/autodistill/blob/main/CONTRIBUTING.md) to get started. Thank you 🙏 to all our contributors!