voyage-embedders-haystack

Name: voyage-embedders-haystack
Version: 1.7.0
Summary: Haystack component to embed strings and Documents using VoyageAI Embedding models.
Upload time: 2025-10-22 06:17:44
Author: Ashwin Mathur
Maintainer: Ashwin Mathur
Requires Python: >=3.9
Keywords: haystack, voyageai
            [![PyPI](https://img.shields.io/pypi/v/voyage-embedders-haystack)](https://pypi.org/project/voyage-embedders-haystack/)
![PyPI - Downloads](https://img.shields.io/pypi/dm/voyage-embedders-haystack?color=blue&logo=pypi&logoColor=gold)
[![GitHub](https://img.shields.io/github/license/awinml/voyage-embedders-haystack?color=green)](LICENSE)
[![Actions status](https://github.com/awinml/voyage-embedders-haystack/workflows/Test/badge.svg)](https://github.com/awinml/voyage-embedders-haystack/actions)
[![Coverage Status](https://coveralls.io/repos/github/awinml/voyage-embedders-haystack/badge.svg?branch=main)](https://coveralls.io/github/awinml/voyage-embedders-haystack?branch=main)  
[![Types - Mypy](https://img.shields.io/badge/types-Mypy-blue.svg)](https://github.com/python/mypy)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![Code Style - Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

<h1 align="center"> <a href="https://github.com/awinml/voyage-embedders-haystack"> Voyage Embedders and Rankers - Haystack </a> </h1>

Custom [Haystack](https://github.com/deepset-ai/haystack) components for creating embeddings and reranking documents using the [Voyage models](https://voyageai.com/).

Voyage’s embedding models are state-of-the-art in retrieval accuracy. These models outperform top-performing embedding models such as `intfloat/e5-mistral-7b-instruct` and `OpenAI/text-embedding-3-large` on the [MTEB Benchmark](https://github.com/embeddings-benchmark/mteb).

#### What's New

- **[v1.5.0 - 22/01/25]:**

  - The new `VoyageRanker` component can be used to rerank documents using the Voyage reranker models.
  - Matryoshka embeddings and quantized embeddings can now be created using the `output_dimension` and `output_dtype` parameters (see the sketch after this list).

- **[v1.4.0 - 24/07/24]:**

  - The client's maximum timeout and number of retries can now be set on the embedders using the `timeout` and `max_retries` parameters (also shown in the sketch after this list).

- **[v1.3.0 - 18/03/24]:**

  - **Breaking Change:** The import path for the embedders has been changed to `haystack_integrations.components.embedders.voyage_embedders`.
    Please replace all instances of `from voyage_embedders.voyage_document_embedder import VoyageDocumentEmbedder` and `from voyage_embedders.voyage_text_embedder import VoyageTextEmbedder` with  
    `from haystack_integrations.components.embedders.voyage_embedders import VoyageDocumentEmbedder, VoyageTextEmbedder`.
  - The embedders now use the Haystack `Secret` API for authentication. For more information please see the [Secret Management Documentation](https://docs.haystack.deepset.ai/docs/secret-management).

- **[v1.2.0 - 02/02/24]:**

  - **Breaking Change:** `VoyageDocumentEmbedder` and `VoyageTextEmbedder` now accept the `model` parameter instead of `model_name`.
  - The embedders now use the `voyageai.Client.embed()` method instead of the deprecated `get_embedding` and `get_embeddings` functions from the global namespace.
  - Support for the new `truncate` parameter has been added.
  - The default embedding model has been changed from the deprecated "voyage-01" to "voyage-2".
  - The embedders now report the total number of tokens used under the `"total_tokens"` key in the metadata.

- **[v1.1.0 - 13/12/23]:** Added support for the `input_type` parameter in `VoyageTextEmbedder` and `VoyageDocumentEmbedder`.

- **[v1.0.0 - 21/11/23]:** Added `VoyageTextEmbedder` and `VoyageDocumentEmbedder` to embed strings and documents.
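
The sketch below illustrates the v1.4.0 and v1.5.0 options in one place: `output_dimension` and `output_dtype` for Matryoshka and quantized embeddings, plus `timeout` and `max_retries` for the client. The values are illustrative only, and support for reduced dimensions and quantized output types depends on the model, so check the Voyage AI documentation before relying on them.

```python
from haystack_integrations.components.embedders.voyage_embedders import VoyageDocumentEmbedder

# Illustrative values only. Which models accept Matryoshka dimensions or
# quantized output dtypes is defined in the Voyage AI model documentation.
doc_embedder = VoyageDocumentEmbedder(
    model="voyage-2",        # same model name used in the examples below
    output_dimension=512,    # Matryoshka embeddings (v1.5.0)
    output_dtype="int8",     # quantized embeddings (v1.5.0)
    timeout=30,              # client timeout, assumed to be in seconds (v1.4.0)
    max_retries=3,           # retries made by the underlying client (v1.4.0)
)
```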

## Installation

```bash
pip install voyage-embedders-haystack
```

## Usage

You can use Voyage Embedding models with two components: [VoyageTextEmbedder](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/voyage_embedders/voyage_text_embedder.py) and [VoyageDocumentEmbedder](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/voyage_embedders/voyage_document_embedder.py).

To create semantic embeddings for documents, use `VoyageDocumentEmbedder` in your indexing pipeline. For generating embeddings for queries, use `VoyageTextEmbedder`.

The Voyage Reranker models can be used with the [VoyageRanker](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/rankers/voyage/ranker.py) component.
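
A minimal sketch of reranking with `VoyageRanker` is shown below, assuming the component follows the usual Haystack ranker interface (`run(query=..., documents=...)`) and that its import path mirrors the file location linked above; the model name and `top_k` value are illustrative, so check the component's docstring for the exact signature.

```python
from haystack.dataclasses import Document

# Import path assumed from the ranker's file location linked above.
from haystack_integrations.components.rankers.voyage import VoyageRanker

# Model name and top_k are illustrative; see the Voyage AI docs for available reranker models.
ranker = VoyageRanker(model="rerank-2", top_k=2)

docs = [
    Document(content="Paris is the capital of France."),
    Document(content="Berlin is the capital of Germany."),
    Document(content="The Eiffel Tower is located in Paris."),
]

result = ranker.run(query="Where is the Eiffel Tower?", documents=docs)
for doc in result["documents"]:
    print(doc.content, doc.score)
```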

Once you've selected the component that suits your use case, initialize it with the model name and your Voyage AI API key. You can also
set the `VOYAGE_API_KEY` environment variable instead of passing the API key as an argument.
To get an API key, please see the [Voyage AI website](https://www.voyageai.com/).
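
As a rough sketch of the two options, assuming the embedders expose an `api_key` parameter built on Haystack's `Secret` API (per the v1.3.0 changelog entry); the parameter name is an assumption, so check the component's docstring if it differs:

```python
import os

from haystack.utils import Secret
from haystack_integrations.components.embedders.voyage_embedders import VoyageTextEmbedder

# Option 1: rely on the VOYAGE_API_KEY environment variable (usually set outside the script).
os.environ["VOYAGE_API_KEY"] = "your-api-key"
text_embedder = VoyageTextEmbedder(model="voyage-2", input_type="query")

# Option 2: pass the key explicitly via Haystack's Secret API.
# The `api_key` parameter name is an assumption based on the Secret-based
# authentication mentioned in the changelog.
text_embedder = VoyageTextEmbedder(
    model="voyage-2",
    input_type="query",
    api_key=Secret.from_env_var("VOYAGE_API_KEY"),
)
```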

Information about the supported models can be found in the [Voyage AI documentation](https://docs.voyageai.com/).

## Example

You can find all the examples in the [`examples`](https://github.com/awinml/voyage-embedders-haystack/tree/main/examples) folder.

Below is an example semantic search pipeline that uses the [Simple Wikipedia](https://huggingface.co/datasets/pszemraj/simple_wikipedia) dataset from Hugging Face.

Load the dataset:

```python
# Install HuggingFace Datasets using "pip install datasets"
from datasets import load_dataset
from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.writers import DocumentWriter
from haystack.dataclasses import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Import Voyage Embedders
from haystack_integrations.components.embedders.voyage_embedders import VoyageDocumentEmbedder, VoyageTextEmbedder

# Load the first 100 rows of the validation split of the Simple Wikipedia dataset from Hugging Face
dataset = load_dataset("pszemraj/simple_wikipedia", split="validation[:100]")

docs = [
    Document(
        content=doc["text"],
        meta={
            "title": doc["title"],
            "url": doc["url"],
        },
    )
    for doc in dataset
]
```

Index the documents into the `InMemoryDocumentStore` using the `VoyageDocumentEmbedder` and `DocumentWriter`:

```python
doc_store = InMemoryDocumentStore(embedding_similarity_function="cosine")
retriever = InMemoryEmbeddingRetriever(document_store=doc_store)
doc_writer = DocumentWriter(document_store=doc_store)

doc_embedder = VoyageDocumentEmbedder(
    model="voyage-2",
    input_type="document",
)
text_embedder = VoyageTextEmbedder(model="voyage-2", input_type="query")

# Indexing Pipeline
indexing_pipeline = Pipeline()
indexing_pipeline.add_component(instance=doc_embedder, name="DocEmbedder")
indexing_pipeline.add_component(instance=doc_writer, name="DocWriter")
indexing_pipeline.connect("DocEmbedder", "DocWriter")

indexing_pipeline.run({"DocEmbedder": {"documents": docs}})

print(f"Number of documents in Document Store: {len(doc_store.filter_documents())}")
print(f"First Document: {doc_store.filter_documents()[0]}")
print(f"Embedding of first Document: {doc_store.filter_documents()[0].embedding}")
```

Query the Semantic Search Pipeline using the `InMemoryEmbeddingRetriever` and `VoyageTextEmbedder`:

```python
text_embedder = VoyageTextEmbedder(model="voyage-2", input_type="query")

# Query Pipeline
query_pipeline = Pipeline()
query_pipeline.add_component(instance=text_embedder, name="TextEmbedder")
query_pipeline.add_component(instance=retriever, name="Retriever")
query_pipeline.connect("TextEmbedder.embedding", "Retriever.query_embedding")

# Search
results = query_pipeline.run({"TextEmbedder": {"text": "Which year did the Joker movie release?"}})

# Print text from top result
top_result = results["Retriever"]["documents"][0].content
print("The top search result is:")
print(top_result)
```

## Contributing

We welcome contributions from the community! Please take a look at our [contributing guide](CONTRIBUTING.md) for more details on how to get started.

Pull requests are welcome. For major changes, please open an issue first to discuss the proposed changes.

## License

`voyage-embedders-haystack` is distributed under the terms of the [Apache-2.0 license](https://github.com/awinml/voyage-embedders-haystack/blob/main/LICENSE).

Maintained by [Ashwin Mathur](https://github.com/awinml).

            
