# Multi-Document AutoRetrieval (with Weaviate) Pack
This LlamaPack implements structured hierarchical retrieval over multiple documents, using multiple Weaviate collections.
## CLI Usage
You can download LlamaPacks directly using `llamaindex-cli`, which comes installed with the `llama-index` Python package:
```bash
llamaindex-cli download-llamapack MultiDocAutoRetrieverPack --download-dir ./multidoc_autoretrieval_pack
```
You can then inspect the files at `./multidoc_autoretrieval_pack` and use them as a template for your own project!
## Code Usage
You can download the pack to the `./multidoc_autoretrieval_pack` directory:
```python
from llama_index.core.llama_pack import download_llama_pack

# download and install dependencies
MultiDocAutoRetrieverPack = download_llama_pack(
    "MultiDocAutoRetrieverPack", "./multidoc_autoretrieval_pack"
)
```
From here, you can use the pack. To initialize it, you need to define a few arguments (see below).
Then, you can set up the pack like so:
```python
# setup pack arguments
from llama_index.core.schema import Document, TextNode
from llama_index.core.vector_stores import MetadataInfo, VectorStoreInfo

import weaviate

# cloud
auth_config = weaviate.AuthApiKey(api_key="<api_key>")
client = weaviate.Client(
    "https://<cluster>.weaviate.network",
    auth_client_secret=auth_config,
)

vector_store_info = VectorStoreInfo(
    content_info="Github Issues",
    metadata_info=[
        MetadataInfo(
            name="state",
            description="Whether the issue is `open` or `closed`",
            type="string",
        ),
        ...,
    ],
)

# metadata_nodes is a list of nodes, each carrying metadata that represents one document
# docs is the list of source documents
# metadata_nodes and docs must be the same length
metadata_nodes = [TextNode(..., metadata={...}), ...]
docs = [Document(...), ...]

pack = MultiDocAutoRetrieverPack(
    client,
    "<metadata_index_name>",
    "<doc_chunks_index_name>",
    metadata_nodes,
    docs,
    vector_store_info,
    auto_retriever_kwargs={
        # any kwargs for the auto-retriever
        ...
    },
)
```
The `run()` function is a light wrapper around `query_engine.query()`.
```python
response = pack.run("Tell me about a music celebrity.")
```
You can also use modules individually.
```python
# use the retriever
retriever = pack.retriever
nodes = retriever.retrieve("query_str")
# use the query engine
query_engine = pack.query_engine
response = query_engine.query("query_str")
```