crowd-kit


* Name: crowd-kit
* Version: 1.3.0.post0
* Summary: Computational Quality Control for Crowdsourcing
* Upload time: 2024-04-06 09:36:50
* Author: Toloka
* Requires Python: >=3.8
* License: Apache 2.0
* Keywords: crowdsourcing, data labeling, answer aggregation, truth inference, learning from crowds, machine learning, quality control, data quality
# Crowd-Kit: Computational Quality Control for Crowdsourcing

[![Crowd-Kit](https://tlk.s3.yandex.net/crowd-kit/Crowd-Kit-GitHub.png)](https://github.com/Toloka/crowd-kit)

[![PyPI Version][pypi_badge]][pypi_link]
[![GitHub Tests][github_tests_badge]][github_tests_link]
[![Codecov][codecov_badge]][codecov_link]
[![Documentation][docs_badge]][docs_link]
[![Paper][paper_badge]][paper_link]

[pypi_badge]: https://badge.fury.io/py/crowd-kit.svg
[pypi_link]: https://pypi.python.org/pypi/crowd-kit
[github_tests_badge]: https://github.com/Toloka/crowd-kit/actions/workflows/tests.yml/badge.svg?branch=main
[github_tests_link]: https://github.com/Toloka/crowd-kit/actions/workflows/tests.yml
[codecov_badge]: https://codecov.io/gh/Toloka/crowd-kit/branch/main/graph/badge.svg
[codecov_link]: https://codecov.io/gh/Toloka/crowd-kit
[docs_badge]: https://readthedocs.org/projects/crowd-kit/badge/
[docs_link]: https://crowd-kit.readthedocs.io/
[paper_badge]: https://joss.theoj.org/papers/10.21105/joss.06227/status.svg
[paper_link]: https://doi.org/10.21105/joss.06227

**Crowd-Kit** is a Python library that implements commonly used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.

Currently, Crowd-Kit contains:

* implementations of commonly used aggregation methods for categorical, pairwise, textual, and segmentation responses;
* metrics of uncertainty, consistency, and agreement with the aggregate;
* loaders for popular crowdsourced datasets.

Also, the `learning` subpackage contains PyTorch implementations of deep learning from crowds methods and advanced aggregation algorithms.
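To give a sense of what an uncertainty metric measures, here is a minimal pure-Python sketch that computes the Shannon entropy of the labels each task received. It is an illustration only, not Crowd-Kit's implementation: the function name `task_entropy` and the sample records are hypothetical.

```python
import math
from collections import Counter, defaultdict


def task_entropy(records):
    """Return {task: Shannon entropy (in nats) of the labels given to that task}.

    `records` is an iterable of (task, worker, label) tuples.
    Hypothetical helper for illustration; not part of Crowd-Kit.
    """
    labels_by_task = defaultdict(list)
    for task, _worker, label in records:
        labels_by_task[task].append(label)

    entropy = {}
    for task, labels in labels_by_task.items():
        n = len(labels)
        # Sum -p * log(p) over the empirical label distribution of the task.
        entropy[task] = sum(
            -(count / n) * math.log(count / n)
            for count in Counter(labels).values()
        )
    return entropy


# Made-up responses: workers agree on t1 and split evenly on t2.
records = [
    ("t1", "w1", "cat"),
    ("t1", "w2", "cat"),
    ("t2", "w1", "cat"),
    ("t2", "w2", "dog"),
]
entropy = task_entropy(records)
```

A task on which all workers agree has zero entropy, while an even split between two labels gives the two-label maximum, `log(2)`.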

## Installing

To install Crowd-Kit, run the following command: `pip install crowd-kit`. If you also want to use the `learning` subpackage, type `pip install crowd-kit[learning]`.

If you are interested in contributing to Crowd-Kit, use [Pipenv](https://pipenv.pypa.io/en/latest/) to install the library with its dependencies: `pipenv install --dev`. We use [pytest](https://pytest.org/) for testing.

## Getting Started

This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.

First, let us do all the necessary imports.

````python
from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset

import pandas as pd
````

Then, you need to read your annotations into a pandas DataFrame with the columns `task`, `worker`, and `label`. Alternatively, you can download an example dataset:

````python
df = pd.read_csv('results.csv')  # should contain columns: task, worker, label
# df, ground_truth = load_dataset('relevance-2')  # or download an example dataset
````
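The expected layout is a long-format table with one row per response. As a rough sketch, the following standard-library snippet produces CSV text of that shape; the task IDs, worker IDs, and labels are made up for illustration.

```python
import csv
import io

# Hypothetical responses: each row is one worker's label for one task.
rows = [
    {"task": "t1", "worker": "w1", "label": "relevant"},
    {"task": "t1", "worker": "w2", "label": "relevant"},
    {"task": "t1", "worker": "w3", "label": "irrelevant"},
    {"task": "t2", "worker": "w1", "label": "irrelevant"},
    {"task": "t2", "worker": "w3", "label": "irrelevant"},
]

# An in-memory buffer keeps the example side-effect free; replace it with
# open("results.csv", "w", newline="") to write a real file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["task", "worker", "label"])
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

Note that a task does not need a response from every worker, and a worker does not need to label every task.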

Then, you can aggregate the workers' responses using the `fit_predict` method, which follows the **scikit-learn** convention:

````python
aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)
````
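For intuition about what such an aggregation computes, below is a simplified pure-Python sketch of the Dawid-Skene expectation-maximization idea. It is not Crowd-Kit's implementation: the function name `dawid_skene_sketch`, the `eps` smoothing term, and the sample data are assumptions made for this illustration.

```python
from collections import defaultdict


def dawid_skene_sketch(records, n_iter=20, eps=1e-6):
    """Return {task: most probable label} from (task, worker, label) tuples.

    Simplified illustration of Dawid-Skene EM; not Crowd-Kit's code.
    """
    records = list(records)
    labels = sorted({label for _, _, label in records})
    workers = {worker for _, worker, _ in records}
    by_task = defaultdict(list)
    for task, worker, label in records:
        by_task[task].append((worker, label))

    # Initialise task posteriors with normalised vote counts (soft majority vote).
    post = {
        task: {l: sum(a == l for _, a in answers) / len(answers) for l in labels}
        for task, answers in by_task.items()
    }

    for _ in range(n_iter):
        # M-step: class priors and per-worker confusion matrices.
        prior = {l: sum(p[l] for p in post.values()) / len(post) for l in labels}
        counts = {w: {l: {k: eps for k in labels} for l in labels} for w in workers}
        for task, answers in by_task.items():
            for worker, answer in answers:
                for l in labels:
                    counts[worker][l][answer] += post[task][l]
        confusion = {
            w: {
                l: {k: counts[w][l][k] / sum(counts[w][l].values()) for k in labels}
                for l in labels
            }
            for w in workers
        }

        # E-step: recompute each task's posterior over its true label.
        for task, answers in by_task.items():
            scores = {}
            for l in labels:
                score = prior[l]
                for worker, answer in answers:
                    score *= confusion[worker][l][answer]
                scores[l] = score
            total = sum(scores.values())
            post[task] = {l: scores[l] / total for l in labels}

    return {task: max(p, key=p.get) for task, p in post.items()}


# Made-up data: w1 and w2 always agree, w3 disagrees on t1 and t4.
records = [
    ("t1", "w1", "A"), ("t1", "w2", "A"), ("t1", "w3", "B"),
    ("t2", "w1", "B"), ("t2", "w2", "B"), ("t2", "w3", "B"),
    ("t3", "w1", "A"), ("t3", "w2", "A"), ("t3", "w3", "A"),
    ("t4", "w1", "B"), ("t4", "w2", "B"), ("t4", "w3", "A"),
]
result = dawid_skene_sketch(records)
```

The sketch multiplies raw probabilities, so it can underflow on tasks with many responses; a production implementation would typically work in log space.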

[More usage examples](https://github.com/Toloka/crowd-kit/tree/main/examples)

## Implemented Aggregation Methods

Below is the list of currently implemented methods, each marked as already available (✅) or in progress (🟡).

### Categorical Responses

| Method | Status |
| ------------- | :-------------: |
| [Majority Vote](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.majority_vote.MajorityVote) | ✅ |
| [One-coin Dawid-Skene](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.dawid_skene.OneCoinDawidSkene) | ✅ |
| [Dawid-Skene](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.dawid_skene.DawidSkene) | ✅ |
| [Gold Majority Vote](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.gold_majority_vote.GoldMajorityVote) | ✅ |
| [M-MSR](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.m_msr.MMSR) | ✅ |
| [Wawa](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.wawa.Wawa) | ✅ |
| [Zero-Based Skill](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.zero_based_skill.ZeroBasedSkill) | ✅ |
| [GLAD](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.glad.GLAD) | ✅ |
| [KOS](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.kos.KOS) | ✅ |
| [MACE](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.mace.MACE) | ✅ |

### Multi-Label Responses

|Method|Status|
|-|:-:|
|[Binary Relevance](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.multilabel.binary_relevance.BinaryRelevance)|✅|

### Textual Responses

| Method | Status |
| ------------- | :-------------: |
| [RASA](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.embeddings.rasa.RASA) | ✅ |
| [HRRASA](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.embeddings.hrrasa.HRRASA) | ✅ |
| [ROVER](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.texts.rover.ROVER) | ✅ |

### Image Segmentation

| Method | Status |
| ------------------ | :------------------: |
| [Segmentation MV](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.image_segmentation.segmentation_majority_vote.SegmentationMajorityVote) | ✅ |
| [Segmentation RASA](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.image_segmentation.segmentation_rasa.SegmentationRASA) | ✅ |
| [Segmentation EM](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.image_segmentation.segmentation_em.SegmentationEM) | ✅ |

### Pairwise Comparisons

| Method | Status |
| -------------- | :---------------------: |
| [Bradley-Terry](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.pairwise.bradley_terry.BradleyTerry) | ✅ |
| [Noisy Bradley-Terry](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.pairwise.noisy_bt.NoisyBradleyTerry) | ✅ |

### Learning from Crowds

|Method|Status|
|-|:-:|
|[CrowdLayer](https://toloka.ai/docs/crowd-kit/reference/crowdkit.learning.crowd_layer.CrowdLayer)|✅|
|[CoNAL](https://toloka.ai/docs/crowd-kit/reference/crowdkit.learning.conal.CoNAL)|✅|

## Citation

* Ustalov D., Pavlichenko N., Tseitlin B. (2024). [Learning from Crowds with Crowd-Kit](https://doi.org/10.21105/joss.06227). Journal of Open Source Software, 9(96), 6227.

```bibtex
@article{CrowdKit,
  author    = {Ustalov, Dmitry and Pavlichenko, Nikita and Tseitlin, Boris},
  title     = {{Learning from Crowds with Crowd-Kit}},
  year      = {2024},
  journal   = {Journal of Open Source Software},
  volume    = {9},
  number    = {96},
  pages     = {6227},
  publisher = {The Open Journal},
  doi       = {10.21105/joss.06227},
  issn      = {2475-9066},
  eprint    = {2109.08584},
  eprinttype = {arxiv},
  eprintclass = {cs.HC},
  language  = {english},
}
```

## Support and Contributions

Please use [GitHub Issues](https://github.com/Toloka/crowd-kit/issues) to seek support and submit feature requests. We accept contributions to Crowd-Kit via GitHub according to the guidelines in [CONTRIBUTING.md](CONTRIBUTING.md).

## License

© Crowd-Kit team authors, 2020–2024. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.

            
