# Multi-Document AutoRetrieval (with Weaviate) Pack
This LlamaPack implements structured hierarchical retrieval over multiple documents, using multiple Weaviate collections.
## CLI Usage
You can download llamapacks directly using `llamaindex-cli`, which comes installed with the `llama-index` python package:
```bash
llamaindex-cli download-llamapack MultiDocAutoRetrieverPack --download-dir ./multidoc_autoretrieval_pack
```
You can then inspect the files at `./multidoc_autoretrieval_pack` and use them as a template for your own project!
## Code Usage
You can download the pack to the `./multidoc_autoretrieval_pack` directory:
```python
from llama_index.core.llama_pack import download_llama_pack

# download and install dependencies
MultiDocAutoRetrieverPack = download_llama_pack(
    "MultiDocAutoRetrieverPack", "./multidoc_autoretrieval_pack"
)
```
From here, you can use the pack. Initializing it requires a few arguments; you can set it up like so:
```python
# setup pack arguments
from llama_index.core.schema import Document, TextNode
from llama_index.core.vector_stores import MetadataInfo, VectorStoreInfo

import weaviate

# cloud
auth_config = weaviate.AuthApiKey(api_key="<api_key>")
client = weaviate.Client(
    "https://<cluster>.weaviate.network",
    auth_client_secret=auth_config,
)

vector_store_info = VectorStoreInfo(
    content_info="Github Issues",
    metadata_info=[
        MetadataInfo(
            name="state",
            description="Whether the issue is `open` or `closed`",
            type="string",
        ),
        ...,
    ],
)

# metadata_nodes is a list of nodes whose metadata summarizes each source document
# docs is the list of source documents
# metadata_nodes and docs must be the same length
metadata_nodes = [TextNode(..., metadata={...}), ...]
docs = [Document(...), ...]

pack = MultiDocAutoRetrieverPack(
    client,
    "<metadata_index_name>",
    "<doc_chunks_index_name>",
    metadata_nodes,
    docs,
    vector_store_info,
    auto_retriever_kwargs={
        # any kwargs for the auto-retriever
        ...
    },
)
```
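For concreteness, here is one hypothetical way the placeholders above could be filled in for the "Github Issues" schema. The issue data, index names, and the `similarity_top_k` value are invented for illustration; the requirement from the pack is simply that `metadata_nodes` and `docs` have the same length, one metadata node per document.

```python
from llama_index.core.schema import Document, TextNode

# hypothetical GitHub-issue data, for illustration only
issues = [
    {
        "title": "Bug: retriever returns empty results",
        "state": "open",
        "body": "Calling retrieve() with certain filters returns no nodes ...",
    },
    {
        "title": "Docs: clarify install instructions",
        "state": "closed",
        "body": "The README should mention the weaviate-client dependency ...",
    },
]

# one metadata node per source document, in the same order
metadata_nodes = [
    TextNode(text=issue["title"], metadata={"state": issue["state"]})
    for issue in issues
]
docs = [Document(text=issue["body"]) for issue in issues]

pack = MultiDocAutoRetrieverPack(
    client,
    "IssueMetadata",  # hypothetical Weaviate index (class) names
    "IssueChunks",
    metadata_nodes,
    docs,
    vector_store_info,
    # forwarded to the underlying auto-retriever; similarity_top_k is a
    # common option, but check the pack source for the full set it accepts
    auto_retriever_kwargs={"similarity_top_k": 2},
)
```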
The `run()` function is a light wrapper around `query_engine.query()`.
```python
response = pack.run("Tell me about a music celebrity.")
```
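Since `run()` just calls `query_engine.query()` under the hood, the result is a regular LlamaIndex `Response` object, so you can print the answer and inspect the source chunks it was grounded on:

```python
# print the synthesized answer
print(str(response))

# inspect the retrieved source chunks and their scores
for node_with_score in response.source_nodes:
    print(node_with_score.score, node_with_score.node.get_content()[:100])
```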
You can also use modules individually.
```python
# use the retriever
retriever = pack.retriever
nodes = retriever.retrieve("query_str")

# use the query engine
query_engine = pack.query_engine
response = query_engine.query("query_str")
```
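Because `pack.retriever` is a standard LlamaIndex retriever, you can also plug it into your own components. Below is a minimal sketch that wraps it in a `RetrieverQueryEngine`; this is not part of the pack itself and assumes an LLM is already configured via `Settings`.

```python
from llama_index.core.query_engine import RetrieverQueryEngine

# build a custom query engine around the pack's retriever
custom_query_engine = RetrieverQueryEngine.from_args(pack.retriever)
response = custom_query_engine.query("query_str")
```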