ragstack-ai-knowledge-store


- Name: ragstack-ai-knowledge-store
- Version: 0.0.4
- Summary: DataStax RAGStack Knowledge Store
- Home page: https://github.com/datastax/ragstack-ai
- Documentation: https://docs.datastax.com/en/ragstack
- Author: DataStax
- License: BUSL-1.1
- Requires Python: <4.0,>=3.8.1
- Upload time: 2024-06-13 14:54:20

# RAGStack Knowledge Store

A hybrid Knowledge Store that combines vector similarity search with edges between chunks.

## Usage

1. Pre-process your documents to populate `metadata` information.
1. Create a Hybrid `KnowledgeStore` and add your LangChain `Document`s.
1. Retrieve documents from the `KnowledgeStore`.

### Populate Metadata

The Knowledge Store makes use of the following metadata fields on each `Document`:

- `content_id`: If assigned, this specifies the unique ID of the `Document`.
  If not assigned, one will be generated.
  Set this if you may re-ingest the same document, so that it is overwritten rather than duplicated.
- `link_tags`: A set of `LinkTag`s indicating how this node should be linked to other nodes.
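
The effect of `content_id` on re-ingestion can be illustrated with a minimal sketch. This is not the library's implementation: the `upsert` helper and the plain `dict` standing in for the store are hypothetical, and the only assumption taken from the text above is that a supplied ID overwrites while a missing ID is generated.

```python
import uuid


def upsert(store: dict, metadata: dict, text: str) -> str:
    """Insert or overwrite an entry keyed by content_id (toy stand-in for a store)."""
    # Reuse the caller-supplied ID, or generate one when it is missing.
    content_id = metadata.get("content_id") or uuid.uuid4().hex
    store[content_id] = text
    return content_id


store = {}

# Same content_id twice: the second call overwrites the first.
upsert(store, {"content_id": "https://example.com/a"}, "first version")
upsert(store, {"content_id": "https://example.com/a"}, "second version")

# No content_id: each call gets a fresh generated ID, so the text is duplicated.
upsert(store, {}, "no id")
upsert(store, {}, "no id")
```

After these calls the toy store holds three entries: one for the explicit ID (containing the second version) and two duplicates with generated IDs.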

#### Hyperlinks

To connect nodes based on hyperlinks, you can use the `HtmlLinkEdgeExtractor` as shown below:

```python
from ragstack_knowledge_store.langchain.extractors import HtmlLinkEdgeExtractor

html_link_extractor = HtmlLinkEdgeExtractor()

for doc in documents:
    doc.metadata["content_id"] = doc.metadata["source"]

    # Add link tags from the page_content to the metadata.
    # Should be passed the HTML content as a string or BeautifulSoup.
    html_link_extractor.extract_one(doc, doc.page_content)
```
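
To give a rough idea of what linking by hyperlink means, the sketch below derives edges between pages using only the Python standard library. It is an illustration under stated assumptions, not how `HtmlLinkEdgeExtractor` works internally: `HrefCollector`, `outgoing_links`, and the `pages` data are hypothetical names, and the rule "an edge exists when one page links to the URL of another known page" is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML string."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def outgoing_links(html: str, base_url: str) -> set:
    """Return absolute, fragment-free URLs linked from the page."""
    parser = HrefCollector()
    parser.feed(html)
    return {urldefrag(urljoin(base_url, href)).url for href in parser.hrefs}


# Hypothetical pages keyed by their URL (the same value used as content_id above).
pages = {
    "https://example.com/a": '<p>See <a href="/b#intro">B</a> and '
    '<a href="https://other.org/">other</a>.</p>',
    "https://example.com/b": "<p>No links here.</p>",
}

# Assumed rule: an edge exists when a page links to the URL of another known page.
edges = {
    (src, dst)
    for src, html in pages.items()
    for dst in outgoing_links(html, src)
    if dst in pages and dst != src
}
```

Here the relative link `/b#intro` resolves to `https://example.com/b`, so the only edge is from page `a` to page `b`; the link to `other.org` is ignored because that page is not in the collection.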

### Store

```python
import cassio
from langchain_openai import OpenAIEmbeddings
from ragstack_knowledge_store import KnowledgeStore

cassio.init(auto=True)

knowledge_store = KnowledgeStore(embeddings=OpenAIEmbeddings())

# Store the documents
knowledge_store.add_documents(documents)
```

### Retrieve

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

# Retrieve and generate using the relevant snippets of the stored documents.
# Depth 0: don't traverse edges; equivalent to vector-only search.
# Depth 1: vector search plus one level of edges.
retriever = knowledge_store.as_retriever(k=4, depth=1)

template = """You are a helpful technical support bot. You should provide complete answers explaining the options the user has available to address their problem. Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

def format_docs(docs):
    # Prefix each chunk with its content_id so the model can see its source.
    return "\n\n".join(
        f"From {doc.metadata['content_id']}: {doc.page_content}" for doc in docs
    )


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
```
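
The `depth` parameter can be pictured with a small, self-contained sketch. This is a toy model rather than the Knowledge Store's actual algorithm: the `cosine` and `retrieve` functions, the two-dimensional vectors, and the edge map are all hypothetical. It also assumes that nodes reached through edges are appended to the `k` vector matches; the real retriever may rank or limit results differently.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query_vec, vectors, edges, k=4, depth=1):
    """Pick the k most similar nodes, then follow edges up to `depth` hops."""
    ranked = sorted(vectors, key=lambda n: cosine(query_vec, vectors[n]), reverse=True)
    found = list(ranked[:k])
    frontier = list(found)
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for neighbor in edges.get(node, ()):
                if neighbor not in found:
                    found.append(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return found


# Hypothetical embeddings and edges for three chunks.
vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0]}
edges = {"a": ["c"], "c": ["b"]}
query = [0.9, 0.1]

depth0 = retrieve(query, vectors, edges, k=1, depth=0)  # vector-only
depth1 = retrieve(query, vectors, edges, k=1, depth=1)  # plus one level of edges
depth2 = retrieve(query, vectors, edges, k=1, depth=2)  # plus two levels of edges
```

With `depth=0` only the closest chunk `a` is returned. With `depth=1` the traversal also returns `c`, which is dissimilar to the query but linked from `a`, and with `depth=2` it reaches `b` through `c`.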

## Development

```shell
poetry install --with=dev

# Run Tests
poetry run pytest
```

            

        