# SSEM (Semantic Similarity Based Evaluation Metrics)
![Generic badge](https://img.shields.io/badge/HuggingFace-NLP-yellow.svg) ![Generic badge](https://img.shields.io/badge/Python-V3.10-blue.svg) ![Generic badge](https://img.shields.io/badge/pip-V3-red.svg) ![Generic badge](https://img.shields.io/badge/Transformers-V4-orange.svg) ![Generic badge](https://img.shields.io/badge/Gensim-V4-blueviolet.svg) [![Downloads](https://static.pepy.tech/personalized-badge/ssem?period=total&units=none&left_color=grey&right_color=green&left_text=Downloads)](https://pepy.tech/project/ssem)
SSEM is a Python library that provides evaluation metrics for natural language processing (NLP) text generation tasks, with support for multiple languages. The library focuses on measuring the semantic similarity between generated text and reference text. It supports several distance metrics: cosine similarity, Euclidean distance, and Pearson correlation.
The library is built on top of the popular Hugging Face Transformers library and is compatible with any pre-trained transformer model. Additionally, it supports parallel processing for faster computation and offers multiple evaluation levels, such as sentence-level, token-level, and Latent Semantic Indexing (LSI) based similarity.
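As a rough illustration (not part of the SSEM API — the function names below are purely for exposition), the three supported metrics compare a pair of embedding vectors like this:

```python
# Illustrative sketch of the three metrics SSEM supports,
# applied to two embedding vectors with NumPy.
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between the vectors, in [-1, 1].
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    # Straight-line distance; smaller means more similar.
    return float(np.linalg.norm(a - b))

def pearson_correlation(a, b):
    # Linear correlation between the two vectors' components.
    return float(np.corrcoef(a, b)[0, 1])

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # b points in the same direction as a
print(cosine_similarity(a, b))    # 1.0 (same direction)
print(euclidean_distance(a, b))
print(pearson_correlation(a, b))  # 1.0 (perfectly linearly related)
```

Note that cosine and Pearson are similarity measures (higher is better), while Euclidean is a distance (lower is better).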
## Developed By
### [Nilesh Verma](https://nileshverma.com "Nilesh Verma")
## Features
- Compatible with any Hugging Face pre-trained transformer models.
- Multiple language support.
- Supports multiple distance metrics: cosine, euclidean, and Pearson correlation.
- Supports different levels of evaluation: sentence-level, token-level, and LSI (Latent Semantic Indexing).
- Supports parallel processing for faster computation.
- Customizable model embeddings.
## Installation
You can install the SSEM library using pip:
```
pip install ssem
```
## How to use SSEM
To use SSEM, you first need to import the library and create an instance of the `SemanticSimilarity` class. You can specify the pre-trained model you want to use, the distance metric, and any custom embeddings.
```python
from ssem import SemanticSimilarity
ssem = SemanticSimilarity(model_name='bert-base-multilingual-cased', metric='cosine', custom_embeddings=None)
```
Once you have created an instance, you can use the `evaluate()` method to calculate the similarity between a list of generated texts and a list of reference texts. You can specify various options such as the number of parallel jobs, the evaluation level, and the output format.
```python
output_sentences = ['This is a generated sentence 1.', 'This is a generated sentence 2.']
reference_sentences = ['This is the reference sentence 1.', 'This is the reference sentence 2.']
similarity_score = ssem.evaluate(output_sentences, reference_sentences, n_jobs=1, level='sentence', output_format='mean')
```
The `evaluate()` method returns a similarity score, which can be a single float value (`'mean'`), a standard deviation value (`'std'`), or both (`'mean_std'`).
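Conceptually, the output formats are simple aggregations over the per-pair similarity scores. A minimal sketch (the `aggregate` helper below is illustrative, not an SSEM internal):

```python
# Illustrative sketch of the output_format options, assuming the
# per-pair similarity scores have already been computed.
import numpy as np

def aggregate(scores, output_format='mean'):
    scores = np.asarray(scores, dtype=float)
    if output_format == 'mean':
        return float(scores.mean())
    if output_format == 'std':
        return float(scores.std())
    if output_format == 'mean_std':
        return float(scores.mean()), float(scores.std())
    raise ValueError(f"unknown output_format: {output_format}")

pair_scores = [0.92, 0.88, 0.90]          # one score per sentence pair
print(aggregate(pair_scores, 'mean'))      # 0.9
print(aggregate(pair_scores, 'mean_std'))  # (mean, std) tuple
```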
```python
print("Similarity score: ", similarity_score)
```
You can use this score to assess the quality of the generated text compared to the reference text.
### Parameters
- `model_name`: The name of the pre-trained transformer model to use. Default is `'bert-base-multilingual-cased'`.
- `metric`: The similarity metric to use. Options are `'cosine'`, `'euclidean'`, and `'pearson'`. Default is `'cosine'`.
- `custom_embeddings`: An optional numpy array containing custom embeddings. Default is `None`.
- `n_jobs`: The number of parallel jobs to use for processing. Default is `1`.
- `level`: The level of evaluation to perform. Options are `'sentence'`, `'token'`, and `'lsi'`. Default is `'sentence'`.
- `output_format`: The format of the output. Options are `'mean'`, `'std'`, and `'mean_std'`. Default is `'mean'`.
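To give intuition for the `'token'` level, here is one common recipe for token-level similarity (greedy max-matching over token embeddings, in the style of BERTScore). This is a hedged sketch of the general technique; SSEM's internal implementation may differ:

```python
# Sketch of token-level similarity via greedy max-matching
# (BERTScore-style). Not SSEM's exact implementation.
import numpy as np

def token_level_similarity(gen_tokens, ref_tokens):
    # gen_tokens, ref_tokens: (n_tokens, dim) arrays of token embeddings.
    gen = gen_tokens / np.linalg.norm(gen_tokens, axis=1, keepdims=True)
    ref = ref_tokens / np.linalg.norm(ref_tokens, axis=1, keepdims=True)
    sim = gen @ ref.T                   # pairwise cosine similarities
    recall = sim.max(axis=0).mean()     # best match for each reference token
    precision = sim.max(axis=1).mean()  # best match for each generated token
    return 2 * precision * recall / (precision + recall)  # F1

gen = np.array([[1.0, 0.0], [0.0, 1.0]])
ref = np.array([[1.0, 0.0], [0.0, 1.0]])
print(token_level_similarity(gen, ref))  # 1.0 for identical token embeddings
```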
## License
SSEM is released under the MIT License.
## References
1. [Evaluation Measures for Text Summarization](https://www.researchgate.net/publication/220106310_Evaluation_Measures_for_Text_Summarization)
2. [BERTScore: Evaluating Text Generation with BERT](https://arxiv.org/abs/1904.09675)
3. [Semantic Similarity Based Evaluation for Abstractive News Summarization](https://aclanthology.org/2021.gem-1.3/)
4. [Evaluation of Semantic Answer Similarity Metrics](https://arxiv.org/abs/2206.12664)
### Please do STAR the repository if it helped you in any way.
More cool features will be added in the future. Feel free to give suggestions, report bugs, and contribute.