ragstack-ai-knowledge-store


Name: ragstack-ai-knowledge-store
Version: 0.2.1
Home page: https://github.com/datastax/ragstack-ai
Summary: DataStax RAGStack Graph Store
Upload time: 2024-07-30 10:42:12
Author: DataStax
Requires Python: <3.13,>=3.9
License: BUSL-1.1

# RAGStack Graph Store

Hybrid Graph Store combining vector similarity and edges between chunks.

## Usage

1. Pre-process your documents to populate `metadata` information.
1. Create a Hybrid `GraphStore` and add your LangChain `Document`s.
1. Retrieve documents from the `GraphStore`.

### Populate Metadata

The Graph Store makes use of the following metadata fields on each `Document`:

- `content_id`: If assigned, this specifies the unique ID of the `Document`.
  If not assigned, one will be generated.
  This should be set if you may re-ingest the same document so that it is overwritten rather than being duplicated.
- `links`: A set of `Link`s indicating how this node should be linked to other nodes.
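
To illustrate why a stable `content_id` matters, here is a minimal sketch that uses a plain dictionary as a stand-in for the store. It is not the library's implementation: the `upsert` helper, the in-memory `store`, and the example URL are all hypothetical. It only shows the behavior described above, where an assigned ID overwrites on re-ingestion and a missing ID gets a freshly generated one.

```python
import uuid


def upsert(store: dict, page_content: str, metadata: dict) -> str:
    """Insert or overwrite a document keyed by its content_id (hypothetical helper)."""
    # Reuse the caller's content_id if one was assigned; otherwise generate one.
    content_id = metadata.get("content_id") or uuid.uuid4().hex
    store[content_id] = {
        "page_content": page_content,
        "metadata": {**metadata, "content_id": content_id},
    }
    return content_id


store = {}

# Same content_id twice: the second call overwrites the first.
upsert(store, "version 1", {"content_id": "https://example.com/a"})
upsert(store, "version 2", {"content_id": "https://example.com/a"})

# No content_id: each call gets a new generated ID, so the document is duplicated.
upsert(store, "untracked", {})
upsert(store, "untracked", {})
```

After these calls the store holds three entries: one for the tracked document (with the latest content) and two copies of the untracked one.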

#### Hyperlinks

To connect nodes based on hyperlinks, you can use the `HtmlLinkExtractor` as shown below:

```python
from ragstack_knowledge_store.langchain.extractors import HtmlLinkExtractor

html_link_extractor = HtmlLinkExtractor()

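# NOTE: `add_links` and `HtmlInput` are used below but are not imported in this
# snippet; import them from the module that provides them in your installed version.
for doc in documents:
    # Use the source as a stable ID so re-ingesting overwrites instead of duplicating.
    doc.metadata["content_id"] = doc.metadata["source"]

    # Extract the hyperlinks from the page content and add them to the metadata.
    # The content should be passed as an HTML string or a BeautifulSoup object.
    html_input = HtmlInput(doc.page_content, doc.metadata["source_url"])
    add_links(doc, html_link_extractor.extract_one(html_input))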
```
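
For intuition about what hyperlink extraction involves, the following standard-library sketch collects the `href` of every `<a>` tag and resolves it against a base URL. It is an illustration under stated assumptions, not how `HtmlLinkExtractor` is implemented: the `extract_hyperlinks` function, the `_AnchorCollector` class, and the sample page are hypothetical, and the choice to drop fragments and non-HTTP schemes is an assumption made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class _AnchorCollector(HTMLParser):
    """Collects the raw href value of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_hyperlinks(html: str, base_url: str) -> set[str]:
    """Return absolute, fragment-free HTTP(S) URLs linked from `html`."""
    collector = _AnchorCollector()
    collector.feed(html)
    urls = set()
    for href in collector.hrefs:
        # Resolve relative links against the page URL, then drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")):
            urls.add(absolute)
    return urls


page = (
    '<p><a href="/docs/install#linux">Install</a> '
    '<a href="https://example.org/faq">FAQ</a> '
    '<a href="mailto:a@b.c">Mail</a></p>'
)
links = extract_hyperlinks(page, "https://example.com/guide/index.html")
```

Here `links` contains the two web URLs, with the relative link resolved against the base URL and the `mailto:` link skipped.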

### Store

```python
import cassio
from langchain_openai import OpenAIEmbeddings
from ragstack_knowledge_store import GraphStore

# Initialize the database connection from environment variables.
cassio.init(auto=True)

graph_store = GraphStore(embeddings=OpenAIEmbeddings())

# Store the documents
graph_store.add_documents(documents)
```

### Retrieve

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

# depth=0: don't traverse edges; equivalent to vector-only search.
# depth=1: vector search plus one level of edges.
retriever = graph_store.as_retriever(k=4, depth=1)

template = """You are a helpful technical support bot. You should provide complete answers explaining the options the user has available to address their problem. Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

def format_docs(docs):
    formatted = "\n\n".join(f"From {doc.metadata['content_id']}: {doc.page_content}" for doc in docs)
    return formatted


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
```
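
The effect of the `depth` setting can be sketched with a toy in-memory graph. This is a simplified illustration, not the `GraphStore` retrieval algorithm: the `retrieve` and `cosine` helpers, the two-dimensional vectors, and the edge map are hypothetical, and the real retriever may rank or cap results differently. The sketch selects the top `k` nodes by cosine similarity and then follows edges breadth-first for `depth` hops.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query_vec, vectors, edges, k=4, depth=1):
    """Return node IDs: top-k by similarity, then neighbours up to `depth` hops away."""
    # Step 1: plain vector search.
    ranked = sorted(vectors, key=lambda n: cosine(query_vec, vectors[n]), reverse=True)
    selected = ranked[:k]
    seen = set(selected)
    frontier = list(selected)
    # Step 2: breadth-first expansion along edges, one hop per iteration.
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for neighbour in edges.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        selected.extend(next_frontier)
        frontier = next_frontier
    return selected


vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
edges = {"a": ["c"], "c": ["d"]}
query = [1.0, 0.0]

vector_only = retrieve(query, vectors, edges, k=2, depth=0)
one_hop = retrieve(query, vectors, edges, k=2, depth=1)
two_hops = retrieve(query, vectors, edges, k=2, depth=2)
```

With `depth=0` only the two most similar nodes are returned. Each extra level of depth pulls in nodes that are linked from the previous results even though their own vectors are dissimilar to the query.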

## Development

```shell
poetry install --with=dev

# Run Tests
poetry run pytest
```

            

        