argilla-plugins


- Name: argilla-plugins
- Version: 0.1.1
- Summary: 🔌 Open-source plugins with practical features for Argilla using listeners.
- Upload time: 2023-01-31 15:58:07
- Author: david
- Requires Python: >=3.8,<3.11.0
- License: Apache 2.0

# Argilla Plugins

> 🔌 Open-source plugins for extra features and workflows

**Why?**
Argilla is intentionally programmable: developers can build complex workflows for reading and updating datasets. However, certain workflows and features are shared across use cases and could be simpler from a developer-experience perspective. To facilitate the reuse of these key workflows and empower the community, Argilla Plugins provides a collection of extensions that supercharge your Argilla use cases.
Some of these plugins might eventually be integrated into the [core of Argilla](https://github.com/argilla-io/argilla).

## Quickstart

```bash
pip install argilla-plugins
```

```python
from argilla_plugins.datasets import end_of_life

plugin = end_of_life(
    name="plugin-test",
    end_of_life_in_seconds=100,
    execution_interval_in_seconds=5,
    discard_only=False
)
plugin.start()
```

## How to develop a plugin

1. Pick a cool plugin from the list of topics or our issue overview.
2. Think about an abstraction for the plugin as shown below.
3. Refer to the solution in the issue.
   1. Fork the repo.
   2. Commit your code.
   3. Open a PR.
4. Keep it simple.
5. Have fun.


### Development requirements

#### Function
We want to keep the plugins as abstract as possible, so each plugin has to be usable within 3 lines of code.
```python
from argilla_plugins.topic import plugin  # `topic` and `plugin` are placeholders
my_plugin = plugin(name="dataset_name", ws="workspace", query="query", interval=1.0)
my_plugin.start()
```

#### Variables
The variables `name`, `ws`, and `query` should be reused across all plugins. Some functions might need adaptations such as `name_from` or `query_from`. Whenever possible, reuse existing variable names instead of introducing new ones.

Oh, and don't forget to have fun! 🤓
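
The convention above can be sketched without any Argilla dependency. This is only an illustration: `PollingPlugin`, `run_once`, and `action` are hypothetical names, and the real plugins build on Argilla listeners rather than on this toy polling loop.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PollingPlugin:
    """Toy stand-in for a plugin: calls `action` every `interval` seconds."""

    name: str              # dataset name
    ws: Optional[str]      # workspace
    query: Optional[str]   # query selecting the records to act on
    interval: float        # seconds between executions
    action: Callable[[str, Optional[str], Optional[str]], None]
    _stop: threading.Event = field(
        default_factory=threading.Event, init=False, repr=False
    )
    _thread: Optional[threading.Thread] = field(
        default=None, init=False, repr=False
    )

    def run_once(self) -> None:
        # A single execution, handy for testing without a background thread.
        self.action(self.name, self.ws, self.query)

    def _loop(self) -> None:
        # Event.wait returns False on timeout and True once stop() sets it.
        while not self._stop.wait(self.interval):
            self.run_once()

    def start(self) -> None:
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()


# Usage mirrors the three-line convention: build the plugin, then run it.
calls: List[tuple] = []
plugin = PollingPlugin(
    name="dataset_name",
    ws="workspace",
    query="query",
    interval=1.0,
    action=lambda name, ws, query: calls.append((name, ws, query)),
)
plugin.run_once()
```

Keeping the shared parameter names in one place means every plugin can be configured the same way, whatever it does inside `action`.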

## Topics
### Reporting

**What is it?**
Create interactive reports about dataset activity, dataset features, annotation tasks, model predictions, and more.

Plugins:
- [ ] automated reporting plugin using `datapane`. [issue](https://github.com/argilla-io/argilla-plugins/issues/1)
- [ ] automated reporting plugin for `great-expectations`. [issue](https://github.com/argilla-io/argilla-plugins/issues/2)

### Datasets

**What is it?**
Everything that involves operations at the dataset level, such as dividing work, syncing datasets, and deduplicating records.

Plugins:
- [ ] sync data between datasets.
  - [ ] directional A->B. [issue](https://github.com/argilla-io/argilla-plugins/issues/3)
  - [ ] bi-directional A <-> B. [issue](https://github.com/argilla-io/argilla-plugins/issues/4)
- [ ] remove duplicate records. [issue](https://github.com/argilla-io/argilla-plugins/issues/5)
- [ ] create train test splits. [issue](https://github.com/argilla-io/argilla-plugins/issues/6)
- [ ] set limits to records in datasets
  - [X] end of life time. [issue](https://github.com/argilla-io/argilla-plugins/issues/7)
  - [ ] max # of records. [issue](https://github.com/argilla-io/argilla-plugins/issues/8)
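
As an illustration of a dataset-level operation, here is a minimal sketch of deduplicating records. It is not the planned plugin: the `deduplicate` helper and the plain-dict records with a `text` key are assumptions made for the example.

```python
from typing import Dict, Iterable, List


def deduplicate(records: Iterable[Dict]) -> List[Dict]:
    """Keep the first record for each distinct text.

    Texts are compared after lowercasing and collapsing whitespace.
    """
    seen = set()
    unique = []
    for record in records:
        key = " ".join(record["text"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


records = [
    {"id": 1, "text": "Hello  world"},
    {"id": 2, "text": "hello world"},  # duplicate of id 1 after normalising
    {"id": 3, "text": "Bye"},
]
unique_records = deduplicate(records)
```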

#### End of Life
Automatically delete or discard records after `x` seconds.

```python
from argilla_plugins.datasets import end_of_life

plugin = end_of_life(
    name="plugin-test",
    end_of_life_in_seconds=100,
    execution_interval_in_seconds=5,
    discard_only=False
)
plugin.start()
```
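
The expiry check behind such a plugin can be sketched as follows. This is an assumption about the logic, not the plugin's actual implementation: `split_expired` is a hypothetical helper, and the choice of `event_timestamp` as the reference time is made up for the example.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def split_expired(
    records: List[Dict],
    now: datetime,
    end_of_life_in_seconds: int,
) -> Tuple[List[Dict], List[Dict]]:
    """Return (expired, alive) based on each record's `event_timestamp`."""
    cutoff = now - timedelta(seconds=end_of_life_in_seconds)
    expired = [r for r in records if r["event_timestamp"] <= cutoff]
    alive = [r for r in records if r["event_timestamp"] > cutoff]
    return expired, alive


now = datetime(2023, 1, 31, 12, 0, 0)
records = [
    {"id": 1, "event_timestamp": now - timedelta(seconds=150)},  # past 100 s
    {"id": 2, "event_timestamp": now - timedelta(seconds=30)},   # still fresh
]
expired, alive = split_expired(records, now, end_of_life_in_seconds=100)
```

The expired records would then be either deleted or marked as discarded, presumably depending on `discard_only`.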

### Programmatic Labelling

**What is it?**
Automatically update the `annotations` and `predictions` of `records` based on heuristics.

Plugins:
- [X] annotated spans as gazetteer for labelling. [issue](https://github.com/argilla-io/argilla-plugins/issues/12)
- [ ] vector search queries and similarity threshold. [issue](https://github.com/argilla-io/argilla-plugins/issues/11)
- [ ] use gazetteer for labelling. [issue](https://github.com/argilla-io/argilla-plugins/issues/9)
- [ ] materialize annotations/predictions from rules using Snorkel or a MajorityVoter [issue](https://github.com/argilla-io/argilla-plugins/issues/10)

#### Token Copycat

When we annotate spans in text, as in NER, we can be relatively certain that the same spans should receive the same annotation throughout the entire dataset. We can use this assumption to start annotating or predicting previously unseen data.

```python
from argilla_plugins import token_copycat

plugin = token_copycat(
    name="plugin-test",
    query=None,
    copy_predictions=True,
    word_dict_kb_predictions={"key": {"label": "label", "score": 0}},
    copy_annotations=True,
    word_dict_kb_annotations={"key": {"label": "label", "score": 0}},
    included_labels=["label"],
    case_sensitive=True,
    execution_interval_in_seconds=1,
)
plugin.start()
```
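
The idea can be sketched in plain Python. This is a simplified illustration, not the plugin's implementation: `build_gazetteer` and `copy_spans` are hypothetical helpers, and spans are assumed to be `(label, start, end)` character offsets.

```python
import re
from typing import Dict, List, Tuple

Span = Tuple[str, int, int]  # (label, start_char, end_char)


def build_gazetteer(
    annotated: List[Tuple[str, List[Span]]],
    case_sensitive: bool = True,
) -> Dict[str, str]:
    """Map each annotated surface form to its label (last annotation wins)."""
    gazetteer: Dict[str, str] = {}
    for text, spans in annotated:
        for label, start, end in spans:
            surface = text[start:end]
            gazetteer[surface if case_sensitive else surface.lower()] = label
    return gazetteer


def copy_spans(
    text: str,
    gazetteer: Dict[str, str],
    case_sensitive: bool = True,
) -> List[Span]:
    """Find whole-word occurrences of known surface forms in unseen text."""
    flags = 0 if case_sensitive else re.IGNORECASE
    spans: List[Span] = []
    for surface, label in gazetteer.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, text, flags):
            spans.append((label, match.start(), match.end()))
    return sorted(spans, key=lambda span: span[1])


annotated = [("Paris is in France", [("LOC", 0, 5), ("LOC", 12, 18)])]
gazetteer = build_gazetteer(annotated)
new_spans = copy_spans("I flew from Paris to Rome", gazetteer)
```

Here "Paris" is labelled in the unseen sentence because it was annotated before, while "Rome" is left alone. The `\b` word boundaries assume surface forms that start and end with word characters.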

### Active learning

**What is it?**
A process during which a learning algorithm can interactively query a user (or some other information source) to label new data points.

Plugins:
- [ ] active learning for `TextClassification`.
  - [X] `classy-classification`. [issue](https://github.com/argilla-io/argilla-plugins/issues/13)
  - [ ] `small-text`. [issue](https://github.com/argilla-io/argilla-plugins/issues/15)
- [ ] active learning for `TokenClassification`. [issue](https://github.com/argilla-io/argilla-plugins/issues/17)

```python
from argilla_plugins import classy_learner

plugin = classy_learner(
    name="plugin-test",
    query=None,
    model="all-MiniLM-L6-v2",
    classy_config=None,
    certainty_threshold=0,
    overwrite_predictions=True,
    sample_strategy="fifo",
    min_n_samples=6,
    max_n_samples=20,
    batch_size=1000,
    execution_interval_in_seconds=5,
)
plugin.start()
```
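
To make the querying step concrete, here is a generic uncertainty-sampling sketch. It does not claim to reproduce how `classy_learner` selects samples: the parameter names are borrowed from the call above, but their semantics here are assumptions, and `select_for_annotation` is a hypothetical helper.

```python
from typing import Dict, List


def select_for_annotation(
    predictions: List[Dict],
    certainty_threshold: float,
    max_n_samples: int,
) -> List[Dict]:
    """Pick the least certain predictions, at most `max_n_samples` of them."""
    uncertain = [p for p in predictions if p["score"] < certainty_threshold]
    # Least confident first, so annotators see the most informative records.
    uncertain.sort(key=lambda p: p["score"])
    return uncertain[:max_n_samples]


predictions = [
    {"id": "a", "score": 0.95},
    {"id": "b", "score": 0.55},
    {"id": "c", "score": 0.60},
    {"id": "d", "score": 0.51},
]
to_label = select_for_annotation(
    predictions, certainty_threshold=0.7, max_n_samples=2
)
```

The selected records would be shown to the annotator, and the resulting labels fed back to retrain the model before the next round.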

### Inference endpoints
**What is it?**
Automatically add predictions to records as they are logged into Argilla. This makes it easy to pre-annotate a dataset with an existing model or service.

- [ ] inference with un-authenticated endpoint. [issue](https://github.com/argilla-io/argilla-plugins/issues/16)
- [ ] embed incoming records in the background. [issue](https://github.com/argilla-io/argilla-plugins/issues/18)
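
A minimal sketch of the pre-annotation step, assuming simplified dict records and a `predict` callable that stands in for a model or HTTP endpoint. `add_predictions` and `toy_model` are hypothetical names, not part of the package.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Prediction = List[Tuple[str, float]]  # [(label, score), ...]


def add_predictions(
    records: Iterable[Dict],
    predict: Callable[[str], Prediction],
    overwrite_predictions: bool = False,
) -> List[Dict]:
    """Attach predictions to records that lack one (or to all, if asked)."""
    updated = []
    for record in records:
        if record.get("prediction") is None or overwrite_predictions:
            record = {**record, "prediction": predict(record["text"])}
        updated.append(record)
    return updated


def toy_model(text: str) -> Prediction:
    # Stand-in for a call to a real model or inference endpoint.
    if "good" in text.lower():
        return [("positive", 0.9), ("negative", 0.1)]
    return [("negative", 0.9), ("positive", 0.1)]


incoming = [
    {"text": "Good stuff", "prediction": None},
    {"text": "meh", "prediction": [("positive", 1.0)]},  # already predicted
]
pre_annotated = add_predictions(incoming, toy_model)
```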


### Training endpoints
**What is it?**
Automatically train a model based on dataset annotations.

- [ ] TBD

### Suggestions
Do you have any suggestions? Please [open an issue](https://github.com/argilla-io/argilla-plugins/issues/new/choose) 🤓

            
