Name | neo4j-haystack |
Version | 2.0.1 |
home_page | |
Summary | Integration of Neo4j graph database with Haystack |
upload_time | 2024-01-15 20:37:23 |
maintainer | |
docs_url | None |
author | |
requires_python | >=3.8 |
license | |
keywords | documentstore, haystack, neo4j, semantic-search |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
<h1 align="center">neo4j-haystack</h1>
<p align="center">A <a href="https://docs.haystack.deepset.ai/docs/document_store"><i>Haystack</i></a> Document Store for <a href="https://neo4j.com/"><i>Neo4j</i></a>.</p>
<p align="center">
<a href="https://github.com/prosto/neo4j-haystack/actions?query=workflow%3Aci">
<img alt="ci" src="https://github.com/prosto/neo4j-haystack/workflows/ci/badge.svg" />
</a>
<a href="https://prosto.github.io/neo4j-haystack/">
<img alt="documentation" src="https://img.shields.io/badge/docs-mkdocs%20material-blue.svg?style=flat" />
</a>
<a href="https://pypi.org/project/neo4j-haystack/">
<img alt="pypi version" src="https://img.shields.io/pypi/v/neo4j-haystack.svg" />
</a>
<a href="https://img.shields.io/pypi/pyversions/neo4j-haystack.svg">
<img alt="python version" src="https://img.shields.io/pypi/pyversions/neo4j-haystack.svg" />
</a>
</p>
----
**Table of Contents**
- [Overview](#overview)
- [Installation](#installation)
- [Usage](#usage)
- [License](#license)
## Overview
An integration of the [Neo4j](https://neo4j.com/) graph database with [Haystack v2.0](https://docs.haystack.deepset.ai/v2.0/docs/intro)
by [deepset](https://www.deepset.ai). It uses the Neo4j [vector search index](https://neo4j.com/docs/cypher-manual/current/indexes-for-vector-search/)
to store document embeddings and to perform dense retrieval.
The library allows using Neo4j as a [DocumentStore](https://docs.haystack.deepset.ai/v2.0/docs/document-store) and implements the required [Protocol](https://docs.haystack.deepset.ai/v2.0/docs/document-store#documentstore-protocol) methods. You can start working with the implementation by importing it from the `neo4j_haystack` package:
```python
from neo4j_haystack import Neo4jDocumentStore
```
In addition to `Neo4jDocumentStore`, the library includes the following Haystack components, which can be used in a pipeline:
- [Neo4jEmbeddingRetriever](https://prosto.github.io/neo4j-haystack/reference/neo4j_retriever/#neo4j_haystack.components.neo4j_retriever.Neo4jEmbeddingRetriever) is a typical [retriever component](https://docs.haystack.deepset.ai/v2.0/docs/retrievers) that queries the vector index to find related Documents. The component uses `Neo4jDocumentStore` to query embeddings.
- [Neo4jDynamicDocumentRetriever](https://prosto.github.io/neo4j-haystack/reference/neo4j_retriever/#neo4j_haystack.components.neo4j_retriever.Neo4jDynamicDocumentRetriever) is also a retriever component in the sense that it queries Documents in Neo4j. However, it is decoupled from `Neo4jDocumentStore` and can run an arbitrary [Cypher query](https://neo4j.com/docs/cypher-manual/current/queries/) to extract documents. In practice, it can query Neo4j the same way `Neo4jDocumentStore` does, including vector search.
The `neo4j-haystack` library uses the [Python Driver](https://neo4j.com/docs/api/python-driver/current/api.html#api-documentation) and
[Cypher queries](https://neo4j.com/docs/cypher-manual/current/introduction/) to interact with the Neo4j database, hiding that complexity under the hood.
`Neo4jDocumentStore` stores Documents as graph nodes in Neo4j. Embeddings are stored as a property of the node, while indexing and approximate nearest neighbor (ANN) querying of the embeddings are handled by a dedicated [Vector Index](https://neo4j.com/docs/cypher-manual/current/indexes-for-vector-search/).
```text
                                   +-----------------------------+
                                   |       Neo4j Database        |
                                   +-----------------------------+
                                   |                             |
                                   |      +----------------+     |
                                   |      |    Document    |     |
                write_documents    |      +----------------+     |
          +------------------------+----->|   properties   |     |
          |                        |      |                |     |
+---------+----------+             |      |   embedding    |     |
|                    |             |      +--------+-------+     |
| Neo4jDocumentStore |             |               |             |
|                    |             |               |index/query  |
+---------+----------+             |               |             |
          |                        |      +--------+--------+    |
          |                        |      |  Vector Index   |    |
          +----------------------->|      |                 |    |
                query_embeddings   |      | (for embedding) |    |
                                   |      +-----------------+    |
                                   |                             |
                                   +-----------------------------+
```
In the above diagram:
- `Document` is a Neo4j node (with "Document" label)
- `properties` are Document [attributes](https://docs.haystack.deepset.ai/docs/documents_answers_labels#attributes) stored as part of the node.
- `embedding` is also a property of the Document node (shown separately in the diagram for clarity). It is a vector of type `LIST[FLOAT]`.
- `Vector Index` is where Neo4j indexes embeddings as soon as they are updated on Document nodes.
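To make the role of the vector index more concrete, here is a minimal pure-Python sketch of what a similarity query computes conceptually: score stored embeddings against a query embedding and keep the best `top_k`. It is a brute-force illustration using cosine similarity, not Neo4j's ANN implementation. The function names `cosine_similarity` and `query_embeddings_bruteforce` and the sample vectors are hypothetical and not part of `neo4j-haystack`.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def query_embeddings_bruteforce(
    nodes: Dict[str, List[float]], query_embedding: List[float], top_k: int
) -> List[Tuple[str, float]]:
    """Score every node exhaustively and return the top_k (id, score) pairs."""
    scored = [
        (node_id, cosine_similarity(embedding, query_embedding))
        for node_id, embedding in nodes.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical 3-dimensional embeddings keyed by document id.
nodes = {
    "doc-1": [1.0, 0.0, 0.0],
    "doc-2": [0.0, 1.0, 0.0],
    "doc-3": [0.9, 0.1, 0.0],
}
results = query_embeddings_bruteforce(nodes, [1.0, 0.0, 0.0], top_k=2)
print(results)
```

An exhaustive scan like this costs time proportional to the number of stored nodes per query, which is the cost a dedicated index is meant to avoid.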
## Installation
`neo4j-haystack` can be installed like any other Python library, using pip:
```bash
pip install --upgrade pip # optional
pip install neo4j-haystack
```
## Usage
Once installed, you can use `Neo4jDocumentStore` like any other document store that supports embeddings.
```python
from neo4j_haystack import Neo4jDocumentStore

document_store = Neo4jDocumentStore(
    url="bolt://localhost:7687",
    username="neo4j",
    password="passw0rd",
    database="neo4j",
    embedding_dim=384,
    embedding_field="embedding",
    index="document-embeddings",  # The name of the Vector Index in Neo4j
    node_label="Document",  # The label given to Neo4j nodes which store Documents
)
```
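The connection settings above are hard-coded for brevity. As a sketch of one alternative, the keyword arguments could be assembled from environment variables. The variable names (`NEO4J_URL`, `NEO4J_PASSWORD`, and so on) and the helper `store_settings_from_env` are assumptions made for this example, not something the library defines; only the keyword names come from the constructor call shown above.

```python
import os
from typing import Any, Dict, Mapping


def store_settings_from_env(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Build Neo4jDocumentStore keyword arguments from (hypothetical) env variables."""
    return {
        "url": env.get("NEO4J_URL", "bolt://localhost:7687"),
        "username": env.get("NEO4J_USERNAME", "neo4j"),
        "password": env["NEO4J_PASSWORD"],  # no default: raises KeyError if unset
        "database": env.get("NEO4J_DATABASE", "neo4j"),
        "embedding_dim": int(env.get("NEO4J_EMBEDDING_DIM", "384")),
        "index": env.get("NEO4J_INDEX", "document-embeddings"),
    }


# A plain dict stands in for os.environ so the example runs anywhere.
settings = store_settings_from_env({"NEO4J_PASSWORD": "passw0rd"})
print(settings["url"], settings["embedding_dim"])

# With the library installed, the settings would be passed as:
# document_store = Neo4jDocumentStore(**settings)
```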
Assuming you have a list of documents, you can write (index) them in Neo4j, e.g.:
```python
from typing import List
from haystack import Document
documents: List[Document] = ...
document_store.write_documents(documents)
```
The full list of parameters accepted by `Neo4jDocumentStore` can be found in
[API documentation](https://prosto.github.io/neo4j-haystack/reference/neo4j_store/#neo4j_haystack.document_stores.neo4j_store.Neo4jDocumentStore.__init__).
Note that you need a running instance of the Neo4j database (an in-memory version of Neo4j is not supported). There are several options available:
- [Docker](https://neo4j.com/docs/operations-manual/5/docker/); other deployment options are described in the same Operations Manual
- [AuraDB](https://neo4j.com/cloud/platform/aura-graph-database/) - a fully managed Cloud Instance of Neo4j
- [Neo4j Desktop](https://neo4j.com/docs/desktop-manual/current/) client application
The simplest way to start the database locally is with a Docker container:
```bash
docker run \
    --restart always \
    --publish=7474:7474 --publish=7687:7687 \
    --env NEO4J_AUTH=neo4j/passw0rd \
    neo4j:5.15.0
```
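A freshly started container may need some time before it accepts connections. The following standard-library sketch polls the Bolt port published by the command above until a TCP connection succeeds. The helper name `wait_for_port` is hypothetical, and an open port only indicates that something is listening, not that the database has finished initializing.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Open and immediately close a connection to probe the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not reachable yet, retry until the deadline
    return False
```

For the container above, a call such as `wait_for_port("localhost", 7687, timeout=60.0)` could be made before creating the document store.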
### Retrieving documents
The `Neo4jEmbeddingRetriever` component retrieves documents from Neo4j by querying the vector index with an embedded query. Below is a pipeline that finds documents using a query embedding as well as [metadata filtering](https://docs.haystack.deepset.ai/v2.0/docs/metadata-filtering):
```python
from typing import List

from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from neo4j_haystack import Neo4jEmbeddingRetriever, Neo4jDocumentStore

model_name = "sentence-transformers/all-MiniLM-L6-v2"

document_store = Neo4jDocumentStore(
    url="bolt://localhost:7687",
    username="neo4j",
    password="passw0rd",
    database="neo4j",
    embedding_dim=384,
    index="document-embeddings",
)

pipeline = Pipeline()
pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder(model_name_or_path=model_name))
pipeline.add_component("retriever", Neo4jEmbeddingRetriever(document_store=document_store))
# Feed the embedder's output vector into the retriever's query input
pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

result = pipeline.run(
    data={
        "text_embedder": {"text": "Query to be embedded"},
        "retriever": {
            "top_k": 5,
            "filters": {"field": "release_date", "operator": "==", "value": "2018-12-09"},
        },
    }
)

documents: List[Document] = result["retriever"]["documents"]
```
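The `filters` value passed to the retriever above is a nested dictionary, and several comparisons can be combined under a logical operator. The sketch below shows the shape of such a filter and interprets it against plain property dictionaries in pure Python. It is only meant to illustrate the semantics of the structure; it is not how the library applies filters in Neo4j. The `matches` helper and the `rating` field are hypothetical, and treating a missing field as "no match" is a simplification chosen for this example.

```python
import operator
from typing import Any, Callable, Dict

# Comparison operators supported by this sketch.
COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda left, right: left in right,
    "not in": lambda left, right: left not in right,
}


def matches(filters: Dict[str, Any], properties: Dict[str, Any]) -> bool:
    """Evaluate a nested filter dictionary against a flat property dictionary."""
    op = filters["operator"]
    if "conditions" in filters:  # logical node combining nested filters
        outcomes = [matches(condition, properties) for condition in filters["conditions"]]
        if op == "AND":
            return all(outcomes)
        if op == "OR":
            return any(outcomes)
        raise ValueError(f"Unsupported logical operator in this sketch: {op}")
    field = filters["field"]
    if field not in properties:
        return False  # simplification: a missing field never matches
    return COMPARATORS[op](properties[field], filters["value"])


filters = {
    "operator": "AND",
    "conditions": [
        {"field": "release_date", "operator": "==", "value": "2018-12-09"},
        {"field": "rating", "operator": ">=", "value": 4},
    ],
}
print(matches(filters, {"release_date": "2018-12-09", "rating": 5}))
print(matches(filters, {"release_date": "2018-12-09", "rating": 3}))
```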
### Retrieving documents using Cypher
`Neo4jDynamicDocumentRetriever` is a flexible retriever component which runs a Cypher query to obtain documents. The `Neo4jEmbeddingRetriever` example above could be rewritten without using `Neo4jDocumentStore`:
```python
from typing import List

from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder

from neo4j_haystack import Neo4jClientConfig, Neo4jDynamicDocumentRetriever

client_config = Neo4jClientConfig(
    url="bolt://localhost:7687",
    username="neo4j",
    password="passw0rd",
    database="neo4j",
)

# Vector search followed by a property filter; $-prefixed names are query parameters
cypher_query = """
    CALL db.index.vector.queryNodes($index, $top_k, $query_embedding)
    YIELD node as doc, score
    MATCH (doc) WHERE doc.release_date = $release_date
    RETURN doc{.*, score}, score
    ORDER BY score DESC LIMIT $top_k
    """

embedder = SentenceTransformersTextEmbedder(model_name_or_path="sentence-transformers/all-MiniLM-L6-v2")
retriever = Neo4jDynamicDocumentRetriever(
    client_config=client_config, runtime_parameters=["query_embedding"], doc_node_name="doc"
)

pipeline = Pipeline()
pipeline.add_component("text_embedder", embedder)
pipeline.add_component("retriever", retriever)
pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

result = pipeline.run(
    data={
        "text_embedder": {"text": "Query to be embedded"},
        "retriever": {
            "query": cypher_query,
            "parameters": {"index": "document-embeddings", "top_k": 5, "release_date": "2018-12-09"},
        },
    }
)

documents: List[Document] = result["retriever"]["documents"]
```
Note how query parameters are used in the `cypher_query`:
- `runtime_parameters` is a list of parameter names that become input slots when connecting components in a pipeline. In this example, the `query_embedding` input is connected to the `text_embedder.embedding` output.
- `pipeline.run` passes additional parameters to the `retriever` component, which can be referenced in the `cypher_query`, e.g. `top_k`.
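Because the query draws its parameters from two places (runtime inputs and the `parameters` dictionary), it can be useful to check that every `$name` placeholder is covered before running the pipeline. The following is a small standard-library sketch of such a check. The helpers `find_placeholders` and `missing_parameters` are hypothetical and not part of the library, and the regular expression is a simplification: it does not handle backtick-quoted parameter names and would also match a `$name` inside a string literal.

```python
import re
from typing import Any, Dict, Iterable, Set

# Matches simple $name placeholders in a Cypher query (simplified).
PLACEHOLDER = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def find_placeholders(cypher_query: str) -> Set[str]:
    """Collect the distinct parameter names referenced in the query text."""
    return set(PLACEHOLDER.findall(cypher_query))


def missing_parameters(
    cypher_query: str, runtime_parameters: Iterable[str], parameters: Dict[str, Any]
) -> Set[str]:
    """Return placeholder names supplied neither at runtime nor statically."""
    provided = set(runtime_parameters) | set(parameters)
    return find_placeholders(cypher_query) - provided


cypher_query = """
    CALL db.index.vector.queryNodes($index, $top_k, $query_embedding)
    YIELD node as doc, score
    MATCH (doc) WHERE doc.release_date = $release_date
    RETURN doc{.*, score}, score
    ORDER BY score DESC LIMIT $top_k
    """

complete = {"index": "document-embeddings", "top_k": 5, "release_date": "2018-12-09"}
print(sorted(missing_parameters(cypher_query, ["query_embedding"], complete)))

incomplete = {"index": "document-embeddings", "top_k": 5}
print(sorted(missing_parameters(cypher_query, ["query_embedding"], incomplete)))
```

The first call reports no missing names; the second reports `release_date`, since it is referenced in the query but not supplied.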
## License
`neo4j-haystack` is distributed under the terms of the [MIT](https://spdx.org/licenses/MIT.html) license.
Raw data
{
"_id": null,
"home_page": "",
"name": "neo4j-haystack",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "DocumentStore,Haystack,neo4j,semantic-search",
"author": "",
"author_email": "Sergey Bondarenco <sergey.bondarenco@outlook.com>",
"download_url": "https://files.pythonhosted.org/packages/a3/32/0beab14a9ca7050a42a5eb648934663a726cde6fe68f54d2b46f7151a3bd/neo4j_haystack-2.0.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "Integration of Neo4j graph database with Haystack",
"version": "2.0.1",
"project_urls": {
"Documentation": "https://prosto.github.io/neo4j-haystack",
"Issues": "https://github.com/prosto/neo4j-haystack/issues",
"Source": "https://github.com/prosto/neo4j-haystack"
},
"split_keywords": [
"documentstore",
"haystack",
"neo4j",
"semantic-search"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "761a2e2b9da2cb51c1e18c99ab22fb97ca63cffa556b40b798f731efbc374ffb",
"md5": "405e9c726c966e00f69a89131945d7dd",
"sha256": "b03f21102ec739b7a52580cc3e952ae82da7e2105f2b72e802b73648fb38c7cf"
},
"downloads": -1,
"filename": "neo4j_haystack-2.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "405e9c726c966e00f69a89131945d7dd",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 38560,
"upload_time": "2024-01-15T20:37:21",
"upload_time_iso_8601": "2024-01-15T20:37:21.532574Z",
"url": "https://files.pythonhosted.org/packages/76/1a/2e2b9da2cb51c1e18c99ab22fb97ca63cffa556b40b798f731efbc374ffb/neo4j_haystack-2.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a3320beab14a9ca7050a42a5eb648934663a726cde6fe68f54d2b46f7151a3bd",
"md5": "59f700ec3137c47dd552673770a1dd62",
"sha256": "ff2a04cedd8d3d64067aff3deb84f8923772996fe02971f02e58396c49d1b7f1"
},
"downloads": -1,
"filename": "neo4j_haystack-2.0.1.tar.gz",
"has_sig": false,
"md5_digest": "59f700ec3137c47dd552673770a1dd62",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 578784,
"upload_time": "2024-01-15T20:37:23",
"upload_time_iso_8601": "2024-01-15T20:37:23.071467Z",
"url": "https://files.pythonhosted.org/packages/a3/32/0beab14a9ca7050a42a5eb648934663a726cde6fe68f54d2b46f7151a3bd/neo4j_haystack-2.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-01-15 20:37:23",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "prosto",
"github_project": "neo4j-haystack",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "neo4j-haystack"
}