# B-Cubed Metrics
<div style="text-align: justify">
A simple Python package to calculate B-Cubed precision, recall, and F1-score for clustering evaluation.
</div>
## What are B-Cubed Metrics?
<div style="text-align: justify">
The B-Cubed algorithm was first introduced by Bagga and Baldwin (1998) in their paper <em>Entity-Based Cross-Document Coreferencing Using the Vector Space Model</em>. The algorithm compares a predicted clustering against a ground truth (or gold standard) clustering through element-wise precision and recall scores: for each element, the predicted cluster and the ground truth cluster containing it are compared, and the scores are then averaged over all elements. B-Cubed is useful for evaluating unsupervised techniques where cluster labels are not available because, unlike macro-averaged metrics, it operates element by element rather than cluster by cluster.
</div>
<div style="text-align: justify">
From the paper, the following equations compute per-cluster ('partial') precision and recall scores for the predicted clustering, which are then combined into an overall F-score:
</div>
$$
Precision(C)_k = \frac{1}{count \; of \; all \; elements \; in \; cluster \; k}\sum_{i=1}^{n} {\frac{(count \; of \; category \; i \; in \; cluster \; k)^2}{count \; of \; all \; elements \; in \; cluster \; k}}
$$
$$
Recall(C)_k = \frac{1}{count \; of \; all \; elements \; in \; cluster \; k}\sum_{i=1}^{n} {\frac{(count \; of \; category \; i \; in \; cluster \; k)^2}{count \; of \; category \; i \; in \; the \; ground \; truth}}
$$
$$
F\text{-}score = \frac{1}{K}\sum_{k=1}^{K} {\frac{2\times Precision(C)_k \times Recall(C)_k}{Precision(C)_k + Recall(C)_k}}
$$
<div style="text-align: justify">
where $n$ denotes the number of categories present in cluster $k$ and $K$ is the number of predicted clusters. $Precision(C)_k$ and $Recall(C)_k$ are the 'partial' precision and recall scores computed for each cluster.
</div>
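<div style="text-align: justify">
To make the formulas concrete, the snippet below computes the per-cluster scores and the averaged F-score directly from the count-based representation used above. It is a minimal sketch that mirrors the equations, not the package's internal implementation, and the function name <code>bcubed_f_score</code> is chosen here only for illustration.
</div>

```python
def bcubed_f_score(predicted_clusters, ground_truth_totals):
    """predicted_clusters: list of {category: count} dicts, one per predicted cluster.
    ground_truth_totals: {category: total count over the whole dataset}."""
    per_cluster_f = []
    for cluster in predicted_clusters:
        size = sum(cluster.values())  # count of all elements in cluster k
        # Partial precision: average of (category count / cluster size) over every element.
        precision = sum(c * c for c in cluster.values()) / (size * size)
        # Partial recall: average of (category count / ground-truth total for that category).
        recall = sum(c * c / ground_truth_totals[cat] for cat, c in cluster.items()) / size
        per_cluster_f.append(2 * precision * recall / (precision + recall))
    # Overall F-score: mean of the per-cluster F-scores over the K clusters.
    return sum(per_cluster_f) / len(per_cluster_f)

print(bcubed_f_score(
    [{'blue': 4, 'red': 2, 'green': 1}, {'blue': 2, 'red': 2, 'green': 3},
     {'blue': 1, 'red': 5}, {'blue': 1, 'red': 2, 'green': 3}],
    {'blue': 8, 'red': 11, 'green': 7},
))
```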
## Installation and Use
<div style="text-align: justify">
Install the package from a terminal using:
</div>
```bash
pip install bcubed-metrics
```
<div style="text-align: justify">
To use the <code>Bcubed</code> class, import it and provide two inputs: a list of dictionaries describing the predicted clustering (one dictionary of category counts per cluster) and a single dictionary describing the ground truth clustering (total element count per category):
</div>
```python
from bcubed_metrics.bcubed import Bcubed

# Each dictionary describes one predicted cluster: category -> element count.
predicted_clustering = [
    {'blue': 4, 'red': 2, 'green': 1},
    {'blue': 2, 'red': 2, 'green': 3},
    {'blue': 1, 'red': 5},
    {'blue': 1, 'red': 2, 'green': 3}
]

# Total element count per category across the whole dataset.
ground_truth_clustering = {'blue': 8, 'red': 11, 'green': 7}

bcubed = Bcubed(predicted_clustering=predicted_clustering,
                ground_truth_clustering=ground_truth_clustering)

metrics = bcubed.get_metrics()  # returns all metrics as a dictionary
bcubed.print_metrics()          # prints all metrics
```
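<div style="text-align: justify">
If your results are in the common per-item form (one predicted cluster ID and one true label per element), they first need to be converted into the count dictionaries shown above. The helper below is not part of the package; it is only a sketch of one way to build the expected inputs, using hypothetical example data.
</div>

```python
from collections import Counter, defaultdict

from bcubed_metrics.bcubed import Bcubed

# Hypothetical per-item data: a predicted cluster ID and a true label for each element.
predicted_ids = [0, 0, 1, 1, 1, 2]
true_labels = ['blue', 'red', 'blue', 'blue', 'green', 'red']

# Count the true labels that fall inside each predicted cluster.
clusters = defaultdict(Counter)
for cluster_id, label in zip(predicted_ids, true_labels):
    clusters[cluster_id][label] += 1

predicted_clustering = [dict(counts) for counts in clusters.values()]
ground_truth_clustering = dict(Counter(true_labels))

Bcubed(predicted_clustering=predicted_clustering,
       ground_truth_clustering=ground_truth_clustering).print_metrics()
```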