ragstack-ai-knowledge-store

- Name: ragstack-ai-knowledge-store
- Version: 0.0.5
- Home page: https://github.com/datastax/ragstack-ai
- Summary: DataStax RAGStack Graph Store
- Upload time: 2024-06-17 11:53:20
- Author: DataStax
- Requires Python: <4.0,>=3.8.1
- License: BUSL-1.1

# RAGStack Graph Store

A hybrid graph store that combines vector similarity search with edges between chunks.

## Usage

1. Pre-process your documents to populate `metadata` information.
1. Create a Hybrid `GraphStore` and add your LangChain `Document`s.
1. Retrieve documents from the `GraphStore`.

### Populate Metadata

The Graph Store makes use of the following metadata fields on each `Document`:

- `content_id`: The unique ID of the `Document`.
  If not assigned, one is generated.
  Set this if you may re-ingest the same document, so that it is overwritten rather than duplicated.
- `link_tags`: A set of `LinkTag`s indicating how this node should be linked to other nodes.
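
To illustrate why a stable `content_id` matters on re-ingestion, here is a minimal sketch using a plain dictionary as a stand-in for the store. The `upsert` helper and the dict-shaped documents are hypothetical and only model the overwrite-versus-duplicate behavior described above; they are not the library's implementation.

```python
import uuid


def upsert(store: dict, documents: list) -> None:
    """Toy model: key each document by metadata['content_id'], generating one if absent."""
    for doc in documents:
        content_id = doc["metadata"].get("content_id") or uuid.uuid4().hex
        doc["metadata"]["content_id"] = content_id
        store[content_id] = doc  # the same ID overwrites the earlier entry


# Re-ingesting with a stable ID replaces the earlier version.
store = {}
upsert(store, [{"page_content": "v1", "metadata": {"content_id": "https://example.com/a"}}])
upsert(store, [{"page_content": "v2", "metadata": {"content_id": "https://example.com/a"}}])
stable_count = len(store)

# Without an ID, each ingestion gets a fresh generated ID, so the content is duplicated.
anon_store = {}
upsert(anon_store, [{"page_content": "v1", "metadata": {}}])
upsert(anon_store, [{"page_content": "v1", "metadata": {}}])
anon_count = len(anon_store)
```

Using the document's source URL as the `content_id`, as the hyperlink example in this README does, is one simple way to get a stable ID.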

#### Hyperlinks

To connect nodes based on hyperlinks, you can use the `HtmlLinkEdgeExtractor` as shown below:

```python
from ragstack_knowledge_store.langchain.extractors import HtmlLinkEdgeExtractor

html_link_extractor = HtmlLinkEdgeExtractor()

for doc in documents:
    doc.metadata["content_id"] = doc.metadata["source"]

    # Add link tags from the page_content to the metadata.
    # Should be passed the HTML content as a string or BeautifulSoup.
    html_link_extractor.extract_one(doc, doc.page_content)
```
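
For intuition about what hyperlink extraction produces, the following sketch collects the outgoing links of an HTML page using only the Python standard library. It is an illustration under assumptions, not how `HtmlLinkEdgeExtractor` is implemented: the `LinkCollector` class, the `outgoing_links` helper, and the choice to resolve relative URLs and drop fragments are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect absolute, fragment-free URLs from <a href="..."> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page URL, then drop "#fragment".
            absolute, _fragment = urldefrag(urljoin(self.base_url, href))
            self.links.add(absolute)


def outgoing_links(html: str, base_url: str) -> set:
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


html = (
    '<p><a href="/docs/install#linux">Install</a> '
    '<a href="https://example.org/faq">FAQ</a> '
    '<a name="top">no href</a></p>'
)
links = outgoing_links(html, "https://example.com/docs/index.html")
```

Each collected URL corresponds to a potential edge from this page to the document whose `content_id` is that URL, which is why the loop above sets `content_id` from `source`.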

### Store

```python
import cassio
from langchain_openai import OpenAIEmbeddings
from ragstack_knowledge_store import GraphStore

cassio.init(auto=True)

graph_store = GraphStore(embeddings=OpenAIEmbeddings())

# Store the documents
graph_store.add_documents(documents)
```

### Retrieve

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

# Build a chain that retrieves the relevant documents and generates an answer from them.
llm = ChatOpenAI(model="gpt-4o")

# Depth 0: don't traverse edges; equivalent to vector-only search.
# Depth 1: vector search plus one level of edges.
retriever = graph_store.as_retriever(k=4, depth=1)

template = """You are a helpful technical support bot. You should provide complete answers explaining the options the user has available to address their problem. Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

def format_docs(docs):
    formatted = "\n\n".join(f"From {doc.metadata['content_id']}: {doc.page_content}" for doc in docs)
    return formatted


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
```
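
The `depth` parameter can be pictured as a bounded breadth-first expansion from the vector-search hits. The sketch below is a toy model of that idea, not the retriever's actual algorithm: the `retrieve` function, the `seeds` list (standing in for the top-`k` vector results), and the `edges` adjacency map are all hypothetical.

```python
from collections import deque


def retrieve(seeds: list, edges: dict, depth: int) -> list:
    """Return the seed IDs plus every node reachable within `depth` edge hops."""
    visited = set(seeds)
    ordered = list(seeds)  # keep discovery order: seeds first, then neighbors
    queue = deque((node, 0) for node in seeds)
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand past the requested depth
        for neighbor in edges.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                ordered.append(neighbor)
                queue.append((neighbor, dist + 1))
    return ordered


# A small chain of linked chunks: a -> b -> c
edges = {"a": ["b"], "b": ["c"], "c": []}

depth0 = retrieve(["a"], edges, depth=0)  # seeds only (vector-only behavior)
depth1 = retrieve(["a"], edges, depth=1)  # seeds plus direct neighbors
depth2 = retrieve(["a"], edges, depth=2)  # seeds plus two levels of edges
```

In this model, raising `depth` trades a larger context for the chance of pulling in linked chunks that are relevant but not similar enough to surface in the vector search alone.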

## Development

```shell
poetry install --with=dev

# Run Tests
poetry run pytest
```

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/datastax/ragstack-ai",
    "name": "ragstack-ai-knowledge-store",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.8.1",
    "maintainer_email": null,
    "keywords": null,
    "author": "DataStax",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/27/19/afab02250be2ec2262df954a5f99dc24ef0a95666e76bffea5dc3f1860e4/ragstack_ai_knowledge_store-0.0.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "BUSL-1.1",
    "summary": "DataStax RAGStack Graph Store",
    "version": "0.0.5",
    "project_urls": {
        "Documentation": "https://docs.datastax.com/en/ragstack",
        "Homepage": "https://github.com/datastax/ragstack-ai",
        "Repository": "https://github.com/datastax/ragstack-ai"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ed25a1e91b1d87f832760b30bf22976260ac66ef27c130c95269e764c7a92737",
                "md5": "c1ff82cf65bd116b1c6446a4c8cf0f8d",
                "sha256": "c8912fbaaa904341f252e868ec75ee6db8b12d5e24c9ad2b383fd9d7bbf84fe5"
            },
            "downloads": -1,
            "filename": "ragstack_ai_knowledge_store-0.0.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c1ff82cf65bd116b1c6446a4c8cf0f8d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.8.1",
            "size": 13321,
            "upload_time": "2024-06-17T11:53:19",
            "upload_time_iso_8601": "2024-06-17T11:53:19.485882Z",
            "url": "https://files.pythonhosted.org/packages/ed/25/a1e91b1d87f832760b30bf22976260ac66ef27c130c95269e764c7a92737/ragstack_ai_knowledge_store-0.0.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2719afab02250be2ec2262df954a5f99dc24ef0a95666e76bffea5dc3f1860e4",
                "md5": "0c3d69d13128888a344d5664ef493e00",
                "sha256": "2b96196c57ac5c3e7d19a88b3db051b07a68a21bbd3bd63aa31263b5c56b6f40"
            },
            "downloads": -1,
            "filename": "ragstack_ai_knowledge_store-0.0.5.tar.gz",
            "has_sig": false,
            "md5_digest": "0c3d69d13128888a344d5664ef493e00",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.8.1",
            "size": 12319,
            "upload_time": "2024-06-17T11:53:20",
            "upload_time_iso_8601": "2024-06-17T11:53:20.934624Z",
            "url": "https://files.pythonhosted.org/packages/27/19/afab02250be2ec2262df954a5f99dc24ef0a95666e76bffea5dc3f1860e4/ragstack_ai_knowledge_store-0.0.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-06-17 11:53:20",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "datastax",
    "github_project": "ragstack-ai",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "tox": true,
    "lcname": "ragstack-ai-knowledge-store"
}