embeddings-evaluator


- Name: embeddings-evaluator
- Version: 1.0.0
- Home page: https://github.com/vinerya/embeddings_evaluator
- Summary: Tool for analyzing and comparing embedding models through pairwise cosine similarity distributions
- Upload time: 2024-10-24 21:47:28
- Author: Moudather Chelbi
- Requires Python: >=3.8
- License: None
- Keywords: embeddings, similarity, evaluation, faiss, visualization
- Requirements: numpy, pandas, plotly, scipy, faiss-cpu, pickle5

# Embeddings Evaluator

A Python package for analyzing and comparing embedding models through pairwise cosine similarity distributions.

## Features

- Pairwise cosine similarity distribution analysis
- Statistical measures:
  * Mean (μ)
  * Standard deviation (σ)
  * Median (m)
  * Peak location and amplitude
- Multi-model comparison visualization
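
To make the listed measures concrete, here is a minimal NumPy sketch of how a pairwise cosine similarity distribution and its mean, standard deviation, and median could be computed. It is an illustration only and not necessarily how the package computes them; the function names `pairwise_cosine_similarities` and `summarize` are hypothetical, and the random input is placeholder data.

```python
import numpy as np


def pairwise_cosine_similarities(embeddings: np.ndarray) -> np.ndarray:
    """Return the cosine similarity of every unordered pair of rows.

    Hypothetical helper for illustration; not part of the package API.
    """
    # L2-normalize the rows so that a dot product equals cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T
    # Keep the strict upper triangle: each pair once, no self-similarity.
    i, j = np.triu_indices(len(unit), k=1)
    return sims[i, j]


def summarize(sims: np.ndarray) -> dict:
    """Mean, standard deviation, and median of a similarity sample."""
    return {
        "mean": float(np.mean(sims)),
        "std": float(np.std(sims)),
        "median": float(np.median(sims)),
    }


# Placeholder data: 50 random vectors of dimension 16.
rng = np.random.default_rng(0)
demo = rng.normal(size=(50, 16))
sims = pairwise_cosine_similarities(demo)
stats = summarize(sims)
print(stats)
```

For `n` embeddings this produces `n * (n - 1) / 2` similarity values, so the full similarity matrix may need to be computed in chunks for large collections.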

## Installation

```bash
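# Install the published package from PyPI
pip install embeddings-evaluator

# Or, from a clone of the repository, install the pinned dependencies
pip install -r requirements.txt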
```

## Usage

```python
import numpy as np
from embeddings_evaluator import plot_model_comparison
from embeddings_evaluator.comparison import save_comparison_plot

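# Placeholder data for illustration; replace with your own embeddings.
# Each value is a numpy array of shape (n_docs, embedding_dim).
rng = np.random.default_rng(0)
embeddings_a = rng.normal(size=(1000, 384))
embeddings_b = rng.normal(size=(1000, 768))

# Collect the embeddings in a dictionary keyed by model name
embeddings_dict = {
    "Model A": embeddings_a,
    "Model B": embeddings_b,
}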

# Generate comparison plot
fig = plot_model_comparison(embeddings_dict)
save_comparison_plot(fig, 'comparison.png')
```

## Example with Faiss Indices

```python
import faiss
import numpy as np
from embeddings_evaluator import plot_model_comparison
from embeddings_evaluator.comparison import save_comparison_plot

# Load embeddings from faiss indices
def load_faiss_embeddings(index_path):
    index = faiss.read_index(index_path)
    if isinstance(index, faiss.IndexFlatL2):
        num_vectors = index.ntotal
        dimension = index.d
        embeddings = np.zeros((num_vectors, dimension), dtype=np.float32)
        for i in range(num_vectors):
            embeddings[i] = index.reconstruct(i)
        return embeddings
    raise ValueError("Unsupported index type")

# Load multiple models
embeddings_dict = {}
for size in [250, 500, 1000, 2000, 4000]:
    embeddings = load_faiss_embeddings(f"faiss_embeddings/{size}/index.faiss")
    # Normalize for cosine similarity
    embeddings = embeddings / np.linalg.norm(embeddings, axis=1)[:, np.newaxis]
    embeddings_dict[f"Model {size}"] = embeddings

# Generate visualization
fig = plot_model_comparison(embeddings_dict)
save_comparison_plot(fig, 'model_comparison.png')
```
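
The normalization step in the example divides each row by its L2 norm, which produces NaN values if an index happens to contain an all-zero vector. A minimal sketch of a guarded variant follows; the helper name `l2_normalize` and the `eps` threshold are assumptions made for illustration and are not part of the package.

```python
import numpy as np


def l2_normalize(embeddings: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm, leaving all-zero rows as zeros.

    Hypothetical helper for illustration; `eps` is an assumed threshold.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Clamp tiny norms so the division never produces NaN or inf.
    return embeddings / np.maximum(norms, eps)


# The second row is all zeros and would yield NaN with a plain division.
vectors = np.array([[3.0, 4.0], [0.0, 0.0]], dtype=np.float32)
unit = l2_normalize(vectors)
print(unit)
```

After this step the dot product of any two non-zero rows equals their cosine similarity.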

## Output

The tool provides:

1. Statistical measures for each model:
   - Mean cosine similarity (μ)
   - Standard deviation (σ)
   - Median (m)
   - Peak location and amplitude

2. Visualization:
   - Overlaid probability density histograms
   - Statistical annotations
   - Peak coordinates
   - Vertical lines at mean values
   - Cosine similarity axis bounded to [0, 1]
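
As a rough illustration of the histogram-based outputs, the sketch below builds a probability density histogram over the [0, 1] range and reads off the peak location and amplitude from the tallest bin. The function name `density_peak`, the choice of 50 bins, and the synthetic similarity sample are assumptions; the package may bin and locate peaks differently.

```python
import numpy as np


def density_peak(sims, bins=50, value_range=(0.0, 1.0)):
    """Return (peak_location, peak_amplitude, density, edges).

    Hypothetical helper for illustration. Values outside `value_range`
    are ignored by np.histogram, and the density is normalized over the
    values that fall inside the range.
    """
    density, edges = np.histogram(
        sims, bins=bins, range=value_range, density=True
    )
    centers = (edges[:-1] + edges[1:]) / 2
    k = int(np.argmax(density))  # index of the tallest bin
    return float(centers[k]), float(density[k]), density, edges


# Synthetic similarities clustered around 0.3 (placeholder data).
rng = np.random.default_rng(0)
sims = np.clip(rng.normal(loc=0.3, scale=0.05, size=10_000), 0.0, 1.0)
peak_x, peak_y, density, edges = density_peak(sims)
print(f"peak at x={peak_x:.2f}, amplitude={peak_y:.2f}")
```

Because the histogram is a density, the bar areas sum to 1, which keeps models with different numbers of embeddings comparable on the same axes.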

## Requirements

- numpy
- pandas
- plotly
- scipy
- faiss-cpu (for faiss index support)

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/vinerya/embeddings_evaluator",
    "name": "embeddings-evaluator",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "embeddings, similarity, evaluation, faiss, visualization",
    "author": "Moudather Chelbi",
    "author_email": "moudather.chelbi@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/b6/a3/784a3accb865d0288ca9b4618dc9f630cb6212f448e76092bc037d076df3/embeddings_evaluator-1.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Tool for analyzing and comparing embedding models through pairwise cosine similarity distributions",
    "version": "1.0.0",
    "project_urls": {
        "Homepage": "https://github.com/vinerya/embeddings_evaluator"
    },
    "split_keywords": [
        "embeddings",
        " similarity",
        " evaluation",
        " faiss",
        " visualization"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0806b8948bc7f7fe7e6505a2e6e3b39d37fd3eaa9c1e26900659ca7308a15c67",
                "md5": "79d34843742eec0b6fd89204369a99c3",
                "sha256": "f0ab83a1de09b8c7720eddc503dcc2820bbe031cd40776d97a9ce7f13c861838"
            },
            "downloads": -1,
            "filename": "embeddings_evaluator-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "79d34843742eec0b6fd89204369a99c3",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 6635,
            "upload_time": "2024-10-24T21:47:26",
            "upload_time_iso_8601": "2024-10-24T21:47:26.210535Z",
            "url": "https://files.pythonhosted.org/packages/08/06/b8948bc7f7fe7e6505a2e6e3b39d37fd3eaa9c1e26900659ca7308a15c67/embeddings_evaluator-1.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b6a3784a3accb865d0288ca9b4618dc9f630cb6212f448e76092bc037d076df3",
                "md5": "6ce77820ac8e1afb449a01a3e717a076",
                "sha256": "3222163f40b06b8c13284a48d6fb6a0e1a374c2992a3a948da391aac6ca1c01e"
            },
            "downloads": -1,
            "filename": "embeddings_evaluator-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "6ce77820ac8e1afb449a01a3e717a076",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 6182,
            "upload_time": "2024-10-24T21:47:28",
            "upload_time_iso_8601": "2024-10-24T21:47:28.177957Z",
            "url": "https://files.pythonhosted.org/packages/b6/a3/784a3accb865d0288ca9b4618dc9f630cb6212f448e76092bc037d076df3/embeddings_evaluator-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-24 21:47:28",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "vinerya",
    "github_project": "embeddings_evaluator",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "numpy",
            "specs": [
                [
                    "==",
                    "1.23.5"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    "==",
                    "1.5.3"
                ]
            ]
        },
        {
            "name": "plotly",
            "specs": [
                [
                    "==",
                    "5.14.1"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    "==",
                    "1.10.1"
                ]
            ]
        },
        {
            "name": "faiss-cpu",
            "specs": []
        },
        {
            "name": "pickle5",
            "specs": []
        }
    ],
    "lcname": "embeddings-evaluator"
}
        