[PyPI](https://pypi.org/project/small-text/)
[conda-forge](https://anaconda.org/conda-forge/small-text)
[Codecov](https://codecov.io/gh/webis-de/small-text)
[Documentation](https://small-text.readthedocs.io/en/v1.4.1/)

[Contributing](CONTRIBUTING.md)
[License](LICENSE)
[DOI](https://zenodo.org/records/13338289)
<p align="center">
<img width="372" height="80" src="https://raw.githubusercontent.com/webis-de/small-text/master/docs/_static/small-text-logo.png" alt="small-text logo" />
</p>

> Active Learning for Text Classification in Python.
<hr>
[Installation](#installation) | [Quick Start](#quick-start) | [Contribution](CONTRIBUTING.md) | [Changelog][changelog] | [**Docs**][documentation_main]

Small-Text provides state-of-the-art **Active Learning** for Text Classification.
Several pre-implemented Query Strategies, Initialization Strategies, and Stopping Criteria are provided,
which can be easily mixed and matched to build active learning experiments or applications.
## Features
- Provides unified interfaces for Active Learning so that you can
  easily mix and match query strategies with classifiers provided by [sklearn](https://scikit-learn.org/), [PyTorch](https://pytorch.org/), or [transformers](https://github.com/huggingface/transformers).
- Supports GPU-based [PyTorch](https://pytorch.org/) models and integrates [transformers](https://github.com/huggingface/transformers)
  so that you can use state-of-the-art Text Classification models for Active Learning.
- GPU is supported but not required. For CPU-only use,
  a lightweight installation requires only a minimal set of dependencies.
- Multiple scientifically evaluated components are pre-implemented and ready to use (Query Strategies, Initialization Strategies, and Stopping Criteria).
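
The mix-and-match idea can be pictured with a minimal sketch in plain Python. It is an illustration only and does not use small-text's own classes: the function names (`random_sampling`, `least_confidence`, `prediction_entropy`, `query`) and the toy probabilities are assumptions made up for this example. Every strategy accepts the same inputs and returns indices, so the calling code stays the same when the strategy is swapped.

```python
import math
import random

# In this sketch, a query strategy is any function that maps class
# probabilities (one row per unlabeled instance) and a sample count to
# the indices of the instances that should be labeled next.


def random_sampling(probas, num_samples, seed=0):
    """Baseline: pick instances uniformly at random."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(len(probas)), num_samples))


def least_confidence(probas, num_samples):
    """Pick the instances whose most likely class has the lowest probability."""
    order = sorted(range(len(probas)), key=lambda i: max(probas[i]))
    return order[:num_samples]


def prediction_entropy(probas, num_samples):
    """Pick the instances whose predicted distribution has the highest entropy."""
    def entropy(row):
        return -sum(p * math.log(p) for p in row if p > 0)

    order = sorted(range(len(probas)), key=lambda i: entropy(probas[i]), reverse=True)
    return order[:num_samples]


def query(strategy, probas, num_samples):
    """The calling code is identical no matter which strategy is plugged in."""
    return strategy(probas, num_samples)


# Hypothetical classifier output for four unlabeled instances, three classes.
probas = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.50, 0.49, 0.01],  # torn between two classes
    [0.34, 0.33, 0.33],  # close to uniform
    [0.70, 0.20, 0.10],
]

for strategy in (random_sampling, least_confidence, prediction_entropy):
    print(strategy.__name__, query(strategy, probas, 2))
```

The two uncertainty-based strategies agree on the near-uniform row but differ on their second pick, which is the kind of difference that makes it worthwhile to compare strategies on your own data.
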
## What is Active Learning?
[Active Learning](https://small-text.readthedocs.io/en/latest/active_learning.html) allows you to efficiently label training data for supervised learning in a scenario where you have little to no labeled data.
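
As a rough sketch of the loop behind this idea, the toy example below runs pool-based active learning with uncertainty sampling on a one-dimensional problem. It is written in plain Python and is not how small-text is implemented: the integer pool, the `oracle` function standing in for a human annotator, and the threshold classifier in `fit` are all assumptions chosen to keep the example short.

```python
def fit(labeled):
    """Toy classifier: threshold halfway between the largest labeled
    negative and the smallest labeled positive."""
    lo = max(x for x, y in labeled if y == 0)
    hi = min(x for x, y in labeled if y == 1)
    return (lo + hi) / 2


def predict(threshold, x):
    return int(x > threshold)


def oracle(x):
    """Stands in for the human annotator (hypothetical ground truth)."""
    return int(x >= 37)


pool = list(range(100))

# Initialization: start with one labeled example per class.
labeled = [(0, oracle(0)), (99, oracle(99))]
unlabeled = [x for x in pool if x not in (0, 99)]

for _ in range(6):
    threshold = fit(labeled)
    # Query: the unlabeled instance closest to the decision boundary is
    # the one the current model is least certain about.
    x = min(unlabeled, key=lambda v: abs(v - threshold))
    unlabeled.remove(x)
    # Update: ask the annotator for a label and retrain in the next round.
    labeled.append((x, oracle(x)))

threshold = fit(labeled)
accuracy = sum(predict(threshold, x) == oracle(x) for x in pool) / len(pool)
print(f"labels used: {len(labeled)}, accuracy on pool: {accuracy:.2f}")
```

In this toy setting, eight labels out of a pool of 100 are enough to recover the decision boundary, because each query is spent where the model is most uncertain.
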
<p align="center">
<img src="https://raw.githubusercontent.com/webis-de/small-text/dev/docs/_static/learning-curve-example.gif?raw=true" alt="Learning curve example for the TREC-6 dataset." width="60%">
</p>

---
## News
- **Version 1.4.1** ([v1.4.1][changelog_1.4.1]) - August 18th, 2024
  - Bugfix release.

- **Version 1.4.0** ([v1.4.0][changelog_1.4.0]) - June 9th, 2024
  - New query strategy: [AnchorSubsampling](https://small-text.readthedocs.io/en/v1.3.3/components/query_strategies.html#small_text.query_strategies.subsampling.AnchorSubsampling) (aka [AnchorAL](https://arxiv.org/abs/2404.05623)).
    Special thanks to [Pietro Lesci](https://github.com/pietrolesci) for the correspondence and code review.

- **Paper published at EACL 2023 🎉**
  - The [paper][paper_arxiv] introducing small-text has been accepted at [EACL 2023](https://2023.eacl.org/). Meet us at the conference in May!
  - Update: the paper was awarded [EACL Best System Demonstration](https://aclanthology.org/2023.eacl-demo.11/). Thank you for your support!

[For a complete list of changes, see the change log.][changelog]

---
## Installation
Small-Text can be easily installed via pip (or conda):
```bash
pip install small-text
```
The command results in a [slim installation][documentation_install] with only the necessary dependencies.
For a full installation via pip, you just need to include the `transformers` extra requirement:
```bash
pip install small-text[transformers]
```
For conda, which lacks the extra requirements feature, a full installation can be achieved as follows:
```bash
conda install -c conda-forge "torch>=1.6.0" "torchtext>=0.7.0" transformers small-text
```
The library requires Python 3.7 or newer. To use a GPU, CUDA 10.1 or newer is required.
More information regarding the installation can be found in the
[documentation][documentation_install].
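
To check which of the two installation variants is present in an environment, a small script along the following lines may help. This is a sketch that relies only on the standard library; the helper name `check_environment` is hypothetical, and it assumes the import names `small_text`, `torch`, and `transformers`.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 7), optional=("torch", "transformers")):
    """Report whether the interpreter and the relevant packages are available."""
    report = {"python_ok": sys.version_info >= min_python}
    for name in ("small_text",) + tuple(optional):
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


print(check_environment())
```

If `torch` or `transformers` is reported as missing, only the slim installation is available in that environment.
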
## Quick Start
For a quick start, see the provided examples for [binary classification](examples/examplecode/binary_classification.py),
[PyTorch multi-class classification](examples/examplecode/pytorch_multiclass_classification.py), and
[transformer-based multi-class classification](examples/examplecode/transformers_multiclass_classification.py),
or check out the notebooks.
### Notebooks
| # | Notebook | Colab |
| --- | --- | --- |
| 1 | [Intro: Active Learning for Text Classification with Small-Text](https://github.com/webis-de/small-text/blob/v1.4.1/examples/notebooks/01-active-learning-for-text-classification-with-small-text-intro.ipynb) | [Open in Colab](https://colab.research.google.com/github/webis-de/small-text/blob/v1.4.1/examples/notebooks/01-active-learning-for-text-classification-with-small-text-intro.ipynb) |
| 2 | [Using Stopping Criteria for Active Learning](https://github.com/webis-de/small-text/blob/v1.4.1/examples/notebooks/02-active-learning-with-stopping-criteria.ipynb) | [Open in Colab](https://colab.research.google.com/github/webis-de/small-text/blob/v1.4.1/examples/notebooks/02-active-learning-with-stopping-criteria.ipynb) |
| 3 | [Active Learning using SetFit](https://github.com/webis-de/small-text/blob/v1.4.1/examples/notebooks/03-active-learning-with-setfit.ipynb) | [Open in Colab](https://colab.research.google.com/github/webis-de/small-text/blob/v1.4.1/examples/notebooks/03-active-learning-with-setfit.ipynb) |
| 4 | [Using SetFit's Zero Shot Capabilities for Cold Start Initialization](https://github.com/webis-de/small-text/blob/v1.4.1/examples/notebooks/04-zero-shot-cold-start.ipynb) | [Open in Colab](https://colab.research.google.com/github/webis-de/small-text/blob/v1.4.1/examples/notebooks/04-zero-shot-cold-start.ipynb) |
### Showcase
- [Tutorial: 👂 Active learning for text classification with small-text][argilla_al_tutorial] (Use small-text conveniently from the [argilla][argilla] UI.)

A full list of showcases can be found [in the docs][documentation_showcase].

🎀 **Would you like to share your use case?** Whether it is a paper, an experiment, a practical application, a thesis, a dataset, or something else, let us know and we will add you to the [showcase section][documentation_showcase] or even here.
## Documentation
Read the latest documentation [here][documentation_main]. Noteworthy pages include:
- [Overview of Query Strategies][documentation_query_strategies]
- [Reproducibility Notes][documentation_reproducibility_notes]
---
## Alternatives
[modAL](https://github.com/modAL-python/modAL), [ALiPy](https://github.com/NUAA-AL/ALiPy), [libact](https://github.com/ntucllab/libact), [ALToolbox](https://github.com/AIRI-Institute/al_toolbox)
## Contribution
Contributions are welcome. Details can be found in [CONTRIBUTING.md](CONTRIBUTING.md).
## Acknowledgments
This software was created by Christopher Schröder ([@chschroeder](https://github.com/chschroeder)) at Leipzig University's [NLP group](http://asv.informatik.uni-leipzig.de/)
which is a part of the [Webis](https://webis.de/) research network.
The encompassing project was funded by the Development Bank of Saxony (SAB) under project number 100335729.
## Citation
Small-Text was introduced in detail in the EACL 2023 System Demonstration paper ["Small-Text: Active Learning for Text Classification in Python"](https://aclanthology.org/2023.eacl-demo.11/), which can be cited as follows:
```bibtex
@inproceedings{schroeder2023small-text,
  title = "Small-Text: Active Learning for Text Classification in Python",
  author = {Schr{\"o}der, Christopher and M{\"u}ller, Lydia and Niekler, Andreas and Potthast, Martin},
  booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations",
  month = may,
  year = "2023",
  address = "Dubrovnik, Croatia",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2023.eacl-demo.11",
  pages = "84--95"
}
```
## License
[MIT License](LICENSE)

[documentation_main]: https://small-text.readthedocs.io/en/v1.4.1/
[documentation_install]: https://small-text.readthedocs.io/en/v1.4.1/install.html
[documentation_query_strategies]: https://small-text.readthedocs.io/en/v1.4.1/components/query_strategies.html
[documentation_showcase]: https://small-text.readthedocs.io/en/v1.4.1/showcase.html
[documentation_reproducibility_notes]: https://small-text.readthedocs.io/en/v1.4.1/reproducibility_notes.html
[changelog]: https://small-text.readthedocs.io/en/latest/changelog.html
[changelog_1.3.2]: https://small-text.readthedocs.io/en/latest/changelog.html#version-1-3-2-2023-08-19
[changelog_1.3.3]: https://small-text.readthedocs.io/en/latest/changelog.html#version-1-3-3-2023-12-29
[changelog_1.4.0]: https://small-text.readthedocs.io/en/latest/changelog.html#version-1-4-0-2024-06-09
[changelog_1.4.1]: https://small-text.readthedocs.io/en/latest/changelog.html#version-1-4-1-2024-08-18
[argilla]: https://github.com/argilla-io/argilla
[argilla_al_tutorial]: https://docs.argilla.io/en/latest/tutorials/notebooks/training-textclassification-smalltext-activelearning.html
[paper_arxiv]: https://arxiv.org/abs/2107.10314