# Embeddings Evaluator
A Python package for analyzing and comparing embedding models through pairwise cosine similarity distributions.
## Features
- Pairwise cosine similarity distribution analysis
- Statistical measures:
  * Mean (μ)
  * Standard deviation (σ)
  * Median (m)
  * Peak location and amplitude
- Multi-model comparison visualization
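For intuition about what a pairwise cosine similarity distribution contains, here is a minimal NumPy sketch (not the package's own code), assuming each row of the input is one document embedding. `pairwise_cosine_similarities` is a hypothetical helper and is not part of the `embeddings_evaluator` API.

```python
import numpy as np


def pairwise_cosine_similarities(embeddings: np.ndarray) -> np.ndarray:
    """Return cosine similarities for all unique pairs (i < j) of rows.

    Hypothetical helper for illustration only.
    """
    emb = np.asarray(embeddings, dtype=np.float64)
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    if np.any(norms == 0):
        raise ValueError("zero-length embedding cannot be normalized")
    unit = emb / norms                       # rows scaled to unit length
    sims = unit @ unit.T                     # full n x n cosine similarity matrix
    i, j = np.triu_indices(len(unit), k=1)   # upper triangle, diagonal excluded
    return sims[i, j]


rng = np.random.default_rng(0)
sims = pairwise_cosine_similarities(rng.normal(size=(50, 16)))
print(sims.shape)  # (1225,) = 50 * 49 / 2 unique pairs
```

Keeping only the upper triangle drops the self-similarities (always 1) and the mirrored (j, i) duplicates, which would otherwise skew the distribution. The full n × n matrix grows quadratically with the number of documents, so very large corpora may need a chunked computation.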
## Installation
```bash
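# From PyPI
pip install embeddings-evaluator
# Or, from a clone of https://github.com/vinerya/embeddings_evaluator,
# install the dependencies only
pip install -r requirements.txt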
```
## Usage
```python
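import numpy as np
from embeddings_evaluator import plot_model_comparison
from embeddings_evaluator.comparison import save_comparison_plot

# Placeholder data: replace with your models' embeddings.
# Each array has shape (n_docs, embedding_dim).
rng = np.random.default_rng(0)
embeddings_a = rng.normal(size=(100, 384)).astype(np.float32)
embeddings_b = rng.normal(size=(100, 768)).astype(np.float32)

# Map a display name to each model's embedding matrix
embeddings_dict = {
    "Model A": embeddings_a,
    "Model B": embeddings_b,
}

# Generate the comparison plot and save it to disk
fig = plot_model_comparison(embeddings_dict)
save_comparison_plot(fig, 'comparison.png')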
```
## Example with Faiss Indices
```python
import faiss
import numpy as np
from embeddings_evaluator import plot_model_comparison
from embeddings_evaluator.comparison import save_comparison_plot


def load_faiss_embeddings(index_path):
    """Load all stored vectors from a flat L2 Faiss index."""
    index = faiss.read_index(index_path)
    if isinstance(index, faiss.IndexFlatL2):
        # Flat indices store raw vectors, so they can be reconstructed exactly
        return index.reconstruct_n(0, index.ntotal)
    raise ValueError(f"Unsupported index type: {type(index).__name__}")


# Load multiple models
embeddings_dict = {}
for size in [250, 500, 1000, 2000, 4000]:
    embeddings = load_faiss_embeddings(f"faiss_embeddings/{size}/index.faiss")
    # Normalize each row to unit length for cosine similarity
    embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    embeddings_dict[f"Model {size}"] = embeddings

# Generate the visualization
fig = plot_model_comparison(embeddings_dict)
save_comparison_plot(fig, 'model_comparison.png')
```
## Output
The tool provides:
1. Statistical measures for each model:
   - Mean cosine similarity (μ)
   - Standard deviation (σ)
   - Median (m)
   - Peak location and amplitude
2. Visualization:
   - Overlaid probability density histograms
   - Statistical annotations
   - Peak coordinates
   - Vertical lines at mean values
   - Cosine similarity axis bounded to [0, 1]
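The per-model statistics listed above can be sketched as follows, assuming `sims` is a 1-D array of pairwise cosine similarities. `similarity_summary` is a hypothetical helper, not the package's implementation: it locates the peak as the tallest bin of a density histogram over [0, 1], whereas the package may use a different estimator (for example a kernel density estimate, since scipy is among its dependencies).

```python
import numpy as np


def similarity_summary(sims: np.ndarray, bins: int = 100) -> dict:
    """Summarize a 1-D array of cosine similarities (hypothetical helper)."""
    sims = np.asarray(sims, dtype=np.float64)
    # Density histogram over [0, 1]; values outside this range are ignored
    density, edges = np.histogram(sims, bins=bins, range=(0.0, 1.0), density=True)
    centers = (edges[:-1] + edges[1:]) / 2
    k = int(np.argmax(density))  # index of the tallest bin
    return {
        "mean": float(np.mean(sims)),      # μ
        "std": float(np.std(sims)),        # σ (population standard deviation)
        "median": float(np.median(sims)),  # m
        "peak_x": float(centers[k]),       # peak location
        "peak_y": float(density[k]),       # peak amplitude (density)
    }


rng = np.random.default_rng(1)
print(similarity_summary(rng.uniform(0.2, 0.8, size=1000)))
```

The bin count controls how precisely the peak is located: more bins give a finer location but a noisier amplitude.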
## Requirements
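Python 3.8 or later, plus (versions as pinned in the 1.0.0 package metadata):
- numpy==1.23.5
- pandas==1.5.3
- plotly==5.14.1
- scipy==1.10.1
- faiss-cpu (for Faiss index support)
- pickle5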
## Package Metadata
{
"_id": null,
"home_page": "https://github.com/vinerya/embeddings_evaluator",
"name": "embeddings-evaluator",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "embeddings, similarity, evaluation, faiss, visualization",
"author": "Moudather Chelbi",
"author_email": "moudather.chelbi@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/b6/a3/784a3accb865d0288ca9b4618dc9f630cb6212f448e76092bc037d076df3/embeddings_evaluator-1.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Tool for analyzing and comparing embedding models through pairwise cosine similarity distributions",
"version": "1.0.0",
"project_urls": {
"Homepage": "https://github.com/vinerya/embeddings_evaluator"
},
"split_keywords": [
"embeddings",
" similarity",
" evaluation",
" faiss",
" visualization"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "0806b8948bc7f7fe7e6505a2e6e3b39d37fd3eaa9c1e26900659ca7308a15c67",
"md5": "79d34843742eec0b6fd89204369a99c3",
"sha256": "f0ab83a1de09b8c7720eddc503dcc2820bbe031cd40776d97a9ce7f13c861838"
},
"downloads": -1,
"filename": "embeddings_evaluator-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "79d34843742eec0b6fd89204369a99c3",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 6635,
"upload_time": "2024-10-24T21:47:26",
"upload_time_iso_8601": "2024-10-24T21:47:26.210535Z",
"url": "https://files.pythonhosted.org/packages/08/06/b8948bc7f7fe7e6505a2e6e3b39d37fd3eaa9c1e26900659ca7308a15c67/embeddings_evaluator-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b6a3784a3accb865d0288ca9b4618dc9f630cb6212f448e76092bc037d076df3",
"md5": "6ce77820ac8e1afb449a01a3e717a076",
"sha256": "3222163f40b06b8c13284a48d6fb6a0e1a374c2992a3a948da391aac6ca1c01e"
},
"downloads": -1,
"filename": "embeddings_evaluator-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "6ce77820ac8e1afb449a01a3e717a076",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 6182,
"upload_time": "2024-10-24T21:47:28",
"upload_time_iso_8601": "2024-10-24T21:47:28.177957Z",
"url": "https://files.pythonhosted.org/packages/b6/a3/784a3accb865d0288ca9b4618dc9f630cb6212f448e76092bc037d076df3/embeddings_evaluator-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-24 21:47:28",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "vinerya",
"github_project": "embeddings_evaluator",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "numpy",
"specs": [
[
"==",
"1.23.5"
]
]
},
{
"name": "pandas",
"specs": [
[
"==",
"1.5.3"
]
]
},
{
"name": "plotly",
"specs": [
[
"==",
"5.14.1"
]
]
},
{
"name": "scipy",
"specs": [
[
"==",
"1.10.1"
]
]
},
{
"name": "faiss-cpu",
"specs": []
},
{
"name": "pickle5",
"specs": []
}
],
"lcname": "embeddings-evaluator"
}