chroma-vector-aggregator 0.1.0

- Summary: A package to aggregate embeddings in a Chroma vector store based on metadata columns.
- Home page: https://github.com/vinerya/chroma_vector_aggregator
- Author: Moudather Chelbi
- Requires Python: >=3.8
- License: MIT
- Uploaded: 2024-09-21 20:13:11
# Chroma Embeddings Aggregation Library

This Python library provides a suite of advanced methods for aggregating multiple embeddings associated with a single document or entity into a single representative embedding. It is designed to work with Chroma vector stores and is compatible with LangChain's Chroma integration.

## Table of Contents

- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
  - [Example: Simple Average Aggregation](#example-simple-average-aggregation)
- [Aggregation Methods](#aggregation-methods)
- [Parameters](#parameters)
- [Dependencies](#dependencies)
- [Contributing](#contributing)
- [License](#license)

## Features

- Multiple aggregation methods (average, weighted average, geometric mean, harmonic mean, centroid, PCA, etc.)
- Compatible with Chroma vector stores and LangChain
- Easy-to-use API for aggregating embeddings

## Installation

To install the package, you can use pip:

```bash
pip install chroma_vector_aggregator
```

## Usage

Here's an example demonstrating how to use the library to aggregate embeddings with simple averaging:

### Example: Simple Average Aggregation

```python
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import FakeEmbeddings
from langchain.schema import Document
from chroma_vector_aggregator import aggregate_embeddings

# Create a sample Chroma collection
embeddings = FakeEmbeddings(size=10)
documents = [
    Document(page_content="Test document 1", metadata={"id": "group1"}),
    Document(page_content="Test document 2", metadata={"id": "group1"}),
    Document(page_content="Test document 3", metadata={"id": "group2"}),
    Document(page_content="Test document 4", metadata={"id": "group2"}),
]
chroma_collection = Chroma.from_documents(documents, embeddings)

# Aggregate embeddings using simple averaging
aggregated_collection = aggregate_embeddings(
    chroma_collection=chroma_collection,
    column_name="id",
    method="average"
)

# Use the aggregated collection for similarity search
results = aggregated_collection.similarity_search("Test query", k=2)
```

## Aggregation Methods

- `average`: Compute the arithmetic mean of embeddings.
- `weighted_average`: Compute a weighted average of embeddings.
- `geometric_mean`: Compute the geometric mean across embeddings.
- `harmonic_mean`: Compute the harmonic mean across embeddings.
- `median`: Compute the element-wise median of embeddings.
- `trimmed_mean`: Compute the mean after trimming outliers.
- `centroid`: Use K-Means clustering to find the centroid of the embeddings.
- `pca`: Use Principal Component Analysis to reduce embeddings.
- `exemplar`: Select the embedding that best represents the group.
- `max_pooling`: Take the maximum value for each dimension across embeddings.
- `min_pooling`: Take the minimum value for each dimension across embeddings.
- `entropy_weighted_average`: Weight embeddings by their entropy.
- `attentive_pooling`: Use an attention mechanism to aggregate embeddings.
- `tukeys_biweight`: A robust method to down-weight outliers.
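
For intuition about the element-wise methods above (`average`, `median`, `max_pooling`, `min_pooling`), here is a minimal numpy sketch of what they compute on a toy group of embeddings. This is only an illustration of the math, not the library's internal implementation:

```python
import numpy as np

# Toy group: three 4-dimensional embeddings that share the same metadata value.
group = np.array([
    [0.2, 0.9, 0.1, 0.4],
    [0.4, 0.7, 0.3, 0.2],
    [0.9, 0.8, 0.2, 0.3],
])

# Element-wise aggregations across the group (axis 0 = across embeddings).
average    = group.mean(axis=0)       # arithmetic mean per dimension
median     = np.median(group, axis=0) # element-wise median per dimension
max_pooled = group.max(axis=0)        # maximum value per dimension
min_pooled = group.min(axis=0)        # minimum value per dimension

print(average)     # [0.5 0.8 0.2 0.3]
print(max_pooled)  # [0.9 0.9 0.3 0.4]
```

Each method collapses the group into a single vector of the same dimensionality; they differ only in how each dimension is summarized across the group's embeddings.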

## Parameters

- `chroma_collection`: The Chroma collection to aggregate embeddings from.
- `column_name`: The metadata field by which to aggregate embeddings (e.g., 'id').
- `method`: The aggregation method to use.
- `weights` (optional): Weights for the `weighted_average` method.
- `trim_percentage` (optional): Fraction to trim from each end for `trimmed_mean`.
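
As a sketch of the optional parameters, the calls below (reusing `chroma_collection` from the usage example above) pass `weights` to `weighted_average` and `trim_percentage` to `trimmed_mean`. The exact format expected for `weights` is an assumption here (one weight per embedding in each group); check the source for the accepted shape:

```python
from chroma_vector_aggregator import aggregate_embeddings

# Weighted average. The `weights` format (assumed: one weight per embedding in
# each group) is an assumption, not confirmed by this README.
weighted = aggregate_embeddings(
    chroma_collection=chroma_collection,
    column_name="id",
    method="weighted_average",
    weights=[0.7, 0.3],
)

# Trimmed mean: trim the given fraction (here 10%) from each end of every
# dimension before averaging, to reduce the influence of outliers.
trimmed = aggregate_embeddings(
    chroma_collection=chroma_collection,
    column_name="id",
    method="trimmed_mean",
    trim_percentage=0.1,
)
```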

## Dependencies

- chromadb
- numpy
- scipy
- scikit-learn
- langchain

## Contributing

Contributions are welcome! Please feel free to submit a pull request.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.


            
