bcubed-metrics


Name: bcubed-metrics
Version: 1.0.1
Home page: https://github.com/nezumiCodes/bcubed-metrics
Summary: A package to calculate BCUBED precision, recall, and F1-score for clustering evaluation.
Upload time: 2024-09-05 16:06:24
Maintainer: None
Docs URL: None
Author: Vasiliki Nikolaidi
Requires Python: >=3.6
License: None
Keywords: bcubed
Requirements: No requirements were recorded.
# B-Cubed Metrics

<div style="text-align: justify">
A simple Python package to calculate B-Cubed precision, recall, and F1-score for clustering evaluation.
</div>

## What Are B-Cubed Metrics?
<div style="text-align: justify">
The B-Cubed algorithm was first introduced by Bagga, A. and Baldwin, B. (1998) in their paper "Entity-Based Cross-Document Coreferencing Using the Vector Space Model". It compares a predicted clustering with a ground truth (or gold standard) clustering through element-wise precision and recall scores. For each element, precision is the fraction of elements in its predicted cluster (itself included) that share its ground truth category, and recall is the fraction of elements in its ground truth category that also fall in its predicted cluster. The final scores are the means over all elements. Because the scores are computed per element, predicted clusters never have to be matched to ground truth categories, as macro-averaged classification metrics would require. This makes B-Cubed well suited to evaluating unsupervised clustering, whose output clusters carry no class labels.
</div>
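The element-wise definition can be sketched directly on per-element labels. This is a minimal illustration, not part of the package: the function name `bcubed_elementwise` and its input format (two parallel sequences giving each element's predicted cluster ID and ground truth category) are assumptions, and every element is weighted equally.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def bcubed_elementwise(
    predicted: Sequence[Hashable], gold: Sequence[Hashable]
) -> Tuple[float, float]:
    """Hypothetical helper: B-Cubed (precision, recall) from per-element labels.

    predicted[i] is the predicted cluster ID of element i and gold[i] is its
    ground truth category. Every element gets the same weight 1/n.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not predicted:
        raise ValueError("at least one element is required")
    n = len(predicted)
    cluster_size = Counter(predicted)        # size of each predicted cluster
    category_size = Counter(gold)            # size of each ground truth category
    overlap = Counter(zip(predicted, gold))  # elements shared by (cluster, category)

    pairs = list(zip(predicted, gold))
    # Per element: share of its cluster that has its category (precision) ...
    precision = sum(overlap[(c, g)] / cluster_size[c] for c, g in pairs) / n
    # ... and share of its category that sits in its cluster (recall).
    recall = sum(overlap[(c, g)] / category_size[g] for c, g in pairs) / n
    return precision, recall
```

Putting all four elements of `['a', 'a', 'b', 'b']` into one cluster gives precision 0.5 and recall 1.0, while giving each element its own cluster gives precision 1.0 and recall 0.5. This shows how the two scores penalize merging and splitting, respectively.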

<div style="text-align: justify">
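Following the paper, precision and recall for the predicted clustering can be computed from the number of elements of each category in each predicted cluster. The F-score then averages the per-cluster harmonic means of precision and recall:
</div>

$$
Precision = \frac{1}{N}\sum_{j=1}^{k}\sum_{i=1}^{n_j} {\frac{(count \; of \; category \; i \; in \; cluster \; j)^2}{count \; of \; all \; elements \; in \; cluster \; j}}
$$

$$
Recall = \frac{1}{N}\sum_{j=1}^{k}\sum_{i=1}^{n_j} {\frac{(count \; of \; category \; i \; in \; cluster \; j)^2}{count \; of \; all \; elements \; of \; category \; i}}
$$

$$
F\text{-}score = \frac{1}{k}\sum_{j=1}^{k} {\frac{2\times Precision(C_j) \times Recall(C_j)}{Precision(C_j) + Recall(C_j)}}
$$

<div style="text-align: justify">

where $N$ is the total number of elements, $k$ is the number of predicted clusters, and $n_j$ is the number of categories present in predicted cluster $C_j$. The count is squared because each of the elements of category $i$ in cluster $j$ receives the same element-wise score (the count divided by the cluster size for precision, or by the category size for recall), so together they contribute count times that score. $Precision(C_j)$ and $Recall(C_j)$ are the 'partial' precision and recall for cluster $C_j$, so the F-score is the mean of the per-cluster F-scores.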
</div>
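The precision and recall formulas above can be sketched in Python directly on the input format used in the usage example (a list of per-cluster category counts plus a dictionary of category totals). This is an illustrative reimplementation, not the package's code. The function name `bcubed_from_counts` is hypothetical, and the sketch computes only precision and recall, because the exact definition of the per-cluster 'partial' scores behind the F-score is not given here.

```python
from collections import Counter
from typing import Dict, List, Tuple


def bcubed_from_counts(
    predicted_clustering: List[Dict[str, int]],
    ground_truth_clustering: Dict[str, int],
) -> Tuple[float, float]:
    """Hypothetical helper: B-Cubed (precision, recall) from category counts."""
    # Every element must sit in exactly one predicted cluster, so the
    # per-category sums over clusters have to match the ground truth totals.
    summed: Counter = Counter()
    for cluster in predicted_clustering:
        summed.update(cluster)
    if +summed != +Counter(ground_truth_clustering):  # unary + drops zero counts
        raise ValueError("category counts do not match the ground truth totals")
    total = sum(ground_truth_clustering.values())  # N in the formulas
    if total == 0:
        raise ValueError("the clustering contains no elements")

    precision_sum = 0.0
    recall_sum = 0.0
    for cluster in predicted_clustering:
        cluster_size = sum(cluster.values())
        for category, count in cluster.items():
            if count == 0:
                continue  # an empty category contributes nothing
            # The `count` elements of this category each score
            # count / cluster_size (precision) and count / category size (recall).
            precision_sum += count ** 2 / cluster_size
            recall_sum += count ** 2 / ground_truth_clustering[category]
    return precision_sum / total, recall_sum / total
```

For the example data in the usage section, this sketch gives a precision of 127/273 ≈ 0.465 and a recall of 2719/8008 ≈ 0.340. The package's own output is not reproduced here.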


## Installation and Use
<div style="text-align: justify">
Install the package from PyPI with:
</div>

```bash
pip install bcubed-metrics
```
<div style="text-align: justify">
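To use the Bcubed class, import it and pass it two inputs: a list of dictionaries for the predicted clustering (one dictionary per predicted cluster, mapping each category to the number of its elements in that cluster) and a dictionary for the ground truth clustering (mapping each category to its total number of elements):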
</div>

```python
from bcubed_metrics.bcubed import Bcubed

# One dictionary per predicted cluster: category -> number of its elements in that cluster
predicted_clustering = [
    {'blue': 4, 'red': 2, 'green': 1},
    {'blue': 2, 'red': 2, 'green': 3},
    {'blue': 1, 'red': 5},
    {'blue': 1, 'red': 2, 'green': 3},
]

# Total number of elements in each ground truth category
ground_truth_clustering = {'blue': 8, 'red': 11, 'green': 7}

bcubed = Bcubed(predicted_clustering=predicted_clustering, ground_truth_clustering=ground_truth_clustering)

metrics = bcubed.get_metrics()  # returns all metrics as a dictionary

bcubed.print_metrics()  # prints all metrics
```
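If your data are per-element label lists (for example, cluster assignments from a clustering algorithm alongside gold labels), the two inputs above can be built with `collections.Counter`. The helper `labels_to_counts` is hypothetical and not part of the package. It assumes two parallel sequences holding each element's predicted cluster ID and ground truth category.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def labels_to_counts(
    predicted: Sequence[Hashable], gold: Sequence[str]
) -> Tuple[List[Dict[str, int]], Dict[str, int]]:
    """Hypothetical helper: per-element labels -> (predicted, ground truth) dicts."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    # Count the categories inside each predicted cluster.
    per_cluster: Dict[Hashable, Counter] = defaultdict(Counter)
    for cluster_id, category in zip(predicted, gold):
        per_cluster[cluster_id][category] += 1
    # One plain dict per cluster, in order of each cluster ID's first appearance.
    predicted_clustering = [dict(counts) for counts in per_cluster.values()]
    # Total number of elements in each ground truth category.
    ground_truth_clustering = dict(Counter(gold))
    return predicted_clustering, ground_truth_clustering
```

The returned pair can then be passed as `predicted_clustering` and `ground_truth_clustering`, as in the usage example above.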

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/nezumiCodes/bcubed-metrics",
    "name": "bcubed-metrics",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": "bcubed",
    "author": "Vasiliki Nikolaidi",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/03/2e/b4a9579578ad73eb2350eb18fe422ea4f046975f499b7fc07749003a67cd/bcubed_metrics-1.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A package to calculate BCUBED precision, recall, and F1-score for clustering evaluation.",
    "version": "1.0.1",
    "project_urls": {
        "Homepage": "https://github.com/nezumiCodes/bcubed-metrics"
    },
    "split_keywords": [
        "bcubed"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0b43b75c9908abcb520eb52c159268dfe628d57974c82d14f0f4914e86049683",
                "md5": "af3abe05e656307bfdcae54d8fc8bacd",
                "sha256": "c783d4f4c82b4550f6e2ec57d686495ce323df3275bb96e385cfcc8aa24e13dc"
            },
            "downloads": -1,
            "filename": "bcubed_metrics-1.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "af3abe05e656307bfdcae54d8fc8bacd",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 5209,
            "upload_time": "2024-09-05T16:06:22",
            "upload_time_iso_8601": "2024-09-05T16:06:22.742422Z",
            "url": "https://files.pythonhosted.org/packages/0b/43/b75c9908abcb520eb52c159268dfe628d57974c82d14f0f4914e86049683/bcubed_metrics-1.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "032eb4a9579578ad73eb2350eb18fe422ea4f046975f499b7fc07749003a67cd",
                "md5": "be3e68871998337b54cb1178ad33801f",
                "sha256": "b844bbedf124789dcfa5bd47b6f4cd035a683ed807c8731bc6352105e5329319"
            },
            "downloads": -1,
            "filename": "bcubed_metrics-1.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "be3e68871998337b54cb1178ad33801f",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 4409,
            "upload_time": "2024-09-05T16:06:24",
            "upload_time_iso_8601": "2024-09-05T16:06:24.619622Z",
            "url": "https://files.pythonhosted.org/packages/03/2e/b4a9579578ad73eb2350eb18fe422ea4f046975f499b7fc07749003a67cd/bcubed_metrics-1.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-05 16:06:24",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "nezumiCodes",
    "github_project": "bcubed-metrics",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "bcubed-metrics"
}
        