# RAGStack Graph Store
A hybrid graph store that combines vector-similarity search with explicit edges between chunks.
## Usage
1. Pre-process your documents to populate `metadata` information.
1. Create a Hybrid `GraphStore` and add your LangChain `Document`s.
1. Retrieve documents from the `GraphStore`.
### Populate Metadata
The Graph Store makes use of the following metadata fields on each `Document`:
- `content_id`: If assigned, this specifies the unique ID of the `Document`.
If not assigned, one will be generated.
Set this if you may re-ingest the same document, so that the existing node is overwritten rather than duplicated.
- `links`: A set of `Link`s indicating how this node should be linked to other nodes.
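The role of `content_id` can be illustrated with a small sketch. The helper `content_id_for` and its hashing fallback are hypothetical, not part of the library (which generates IDs its own way); the point is that a stable ID makes re-ingestion an overwrite instead of a duplicate:

```python
import hashlib

def content_id_for(doc_metadata: dict, page_content: str) -> str:
    """Return the explicit content_id if present, else derive a stable one.

    Deriving the ID from the document's source means ingesting the same
    document twice produces the same ID, so the node is overwritten.
    """
    explicit = doc_metadata.get("content_id")
    if explicit is not None:
        return explicit
    # Hypothetical fallback: hash the source URL, or the content itself.
    basis = doc_metadata.get("source", page_content)
    return hashlib.sha256(basis.encode("utf-8")).hexdigest()
```

With this scheme, two ingestions of the same `source` yield the same ID regardless of content changes, which is exactly the overwrite behavior described above.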
#### Hyperlinks
To connect nodes based on hyperlinks, you can use the `HtmlLinkExtractor` as shown below:
```python
from ragstack_knowledge_store.langchain.extractors import HtmlInput, HtmlLinkExtractor

html_link_extractor = HtmlLinkExtractor()

for doc in documents:
    # Use the source URL as a stable ID so re-ingestion overwrites the node.
    doc.metadata["content_id"] = doc.metadata["source"]

    # Add link tags extracted from the page_content to the metadata.
    # Pass the HTML content as a string or a BeautifulSoup object.
    add_links(
        doc,
        html_link_extractor.extract_one(HtmlInput(doc.page_content, doc.metadata["source_url"])),
    )
```
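Conceptually, hyperlink extraction just collects the outgoing URLs from each page so they can become edges. A stdlib-only sketch of that idea (not the library's actual implementation):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class HrefCollector(HTMLParser):
    """Collect absolute URLs from <a href> tags, resolved against a base URL."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.add(urljoin(self.base_url, value))

html = '<p>See <a href="/docs">docs</a> and <a href="https://example.com/faq">FAQ</a>.</p>'
collector = HrefCollector("https://example.com/")
collector.feed(html)
# collector.links == {"https://example.com/docs", "https://example.com/faq"}
```

Nodes whose extracted links point at each other's `source_url` can then be joined by edges in the graph.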
### Store
```python
import cassio
from langchain_openai import OpenAIEmbeddings
from ragstack_knowledge_store import GraphStore
cassio.init(auto=True)
graph_store = GraphStore(embeddings=OpenAIEmbeddings())
# Store the documents
graph_store.add_documents(documents)
```
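On the vector side, `add_documents` embeds each document's content; at query time, the store compares the query embedding against stored embeddings, typically via cosine similarity. A minimal stdlib illustration of that comparison (the library delegates this to the database's vector index):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors: 1.0 means
    identical direction, 0.0 means orthogonal (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```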
### Retrieve
```python
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o")
# Retrieve and generate using the relevant snippets.
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
# Depth 0 - don't traverse edges; equivalent to vector-only search.
# Depth 1 - vector search plus one level of edges.
retriever = graph_store.as_retriever(k=4, depth=1)
template = """You are a helpful technical support bot. You should provide complete answers explaining the options the user has available to address their problem. Answer the question based only on the following context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
def format_docs(docs):
    return "\n\n".join(
        f"From {doc.metadata['content_id']}: {doc.page_content}" for doc in docs
    )

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
```
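The `depth` parameter can be pictured as a breadth-first expansion outward from the vector-search hits. A simplified stdlib sketch of that traversal (the library's real traversal runs against the database, not an in-memory dict):

```python
from collections import deque

def traverse(adjacency: dict, seeds: list, depth: int) -> set:
    """Expand seed nodes breadth-first, following edges up to `depth` hops.

    depth=0 returns just the seeds (vector-only behavior); depth=1 adds
    their direct neighbors; and so on.
    """
    visited = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue  # Reached the hop limit; don't expand further.
        for neighbor in adjacency.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, d + 1))
    return visited

graph = {"a": ["b"], "b": ["c"], "c": []}
# traverse(graph, ["a"], 0) == {"a"}
# traverse(graph, ["a"], 1) == {"a", "b"}
```

Increasing `depth` trades retrieval latency for more context pulled in through the document's link structure.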
## Development
```shell
poetry install --with=dev
# Run Tests
poetry run pytest
```