# Multi-Document AutoRetrieval (with Weaviate) Pack
This LlamaPack implements structured hierarchical retrieval over multiple documents, using multiple @weaviate_io collections.
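The core idea is a two-stage ("recursive") retrieval: first retrieve over a collection of per-document metadata nodes to select relevant documents, then retrieve chunks only from those documents. A plain-Python toy sketch of that pattern (hypothetical data and names; the real pack does both stages against Weaviate collections with an LLM-powered auto-retriever):

```python
# Stage 1 data: one metadata node per document (the pack stores these
# in a dedicated Weaviate collection).
metadata_nodes = [
    {"doc_id": "issue-1", "state": "open", "summary": "bug in parser"},
    {"doc_id": "issue-2", "state": "closed", "summary": "docs typo"},
]

# Stage 2 data: the chunks of each source document.
doc_chunks = {
    "issue-1": ["Parser crashes on empty input.", "Stack trace attached."],
    "issue-2": ["Fixed typo in README."],
}


def retrieve(query_filters: dict) -> list[str]:
    # Stage 1: filter the metadata collection (the "auto-retrieval" step;
    # in the pack, an LLM infers these filters from the query).
    doc_ids = [
        n["doc_id"]
        for n in metadata_nodes
        if all(n.get(k) == v for k, v in query_filters.items())
    ]
    # Stage 2: pull only the chunks belonging to the selected documents.
    return [chunk for d in doc_ids for chunk in doc_chunks[d]]


print(retrieve({"state": "open"}))
# ['Parser crashes on empty input.', 'Stack trace attached.']
```

This is only a conceptual illustration; the pack itself handles embedding, filtering, and chunk retrieval through LlamaIndex and Weaviate.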
## CLI Usage
You can download llamapacks directly using `llamaindex-cli`, which comes installed with the `llama-index` python package:
```bash
llamaindex-cli download-llamapack MultiDocAutoRetrieverPack --download-dir ./multidoc_autoretrieval_pack
```
You can then inspect the files at `./multidoc_autoretrieval_pack` and use them as a template for your own project!
## Code Usage
You can download the pack to the `./multidoc_autoretrieval_pack` directory:
```python
from llama_index.core.llama_pack import download_llama_pack
# download and install dependencies
MultiDocAutoRetrieverPack = download_llama_pack(
"MultiDocAutoRetrieverPack", "./multidoc_autoretrieval_pack"
)
```
From here, you can use the pack. To initialize it, you need to define a few arguments (see below).
Then, you can set up the pack like so:
```python
# setup pack arguments
from llama_index.core.schema import Document, TextNode
from llama_index.core.vector_stores.types import MetadataInfo, VectorStoreInfo

import weaviate
# cloud
auth_config = weaviate.AuthApiKey(api_key="<api_key>")
client = weaviate.Client(
"https://<cluster>.weaviate.network",
auth_client_secret=auth_config,
)
vector_store_info = VectorStoreInfo(
content_info="Github Issues",
metadata_info=[
MetadataInfo(
name="state",
description="Whether the issue is `open` or `closed`",
type="string",
),
...,
],
)
# metadata_nodes is a list of nodes whose metadata represents each document
# docs is the list of source documents
# metadata_nodes and docs must be the same length
metadata_nodes = [TextNode(..., metadata={...}), ...]
docs = [Document(...), ...]
pack = MultiDocAutoRetrieverPack(
    client,
    "<metadata_index_name>",
    "<doc_chunks_index_name>",
    metadata_nodes,
    docs,
    vector_store_info,
    auto_retriever_kwargs={
        # any kwargs for the auto-retriever
        ...
    },
)
```
The `run()` function is a light wrapper around `query_engine.query()`.
```python
response = pack.run("Tell me about a music celebrity.")
```
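Under the hood, the auto-retriever uses an LLM to turn a natural-language query into a semantic query string plus structured metadata filters matching the `VectorStoreInfo` schema. A toy sketch of that mapping, with a hard-coded rule standing in for the LLM (`infer_query_spec` is a hypothetical name, not part of the pack's API):

```python
def infer_query_spec(query: str) -> dict:
    # In the real auto-retriever, an LLM reads the VectorStoreInfo schema
    # and infers filters; here a trivial keyword rule stands in for it.
    filters = {}
    lowered = query.lower()
    if "open" in lowered:
        filters["state"] = "open"
    elif "closed" in lowered:
        filters["state"] = "closed"
    return {"query_str": query, "filters": filters}


spec = infer_query_spec("Tell me about open parser issues")
print(spec["filters"])  # {'state': 'open'}
```

The inferred filters constrain the metadata-collection search, and the remaining query string drives the semantic retrieval.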
You can also use modules individually.
```python
# use the retriever
retriever = pack.retriever
nodes = retriever.retrieve("query_str")
# use the query engine
query_engine = pack.query_engine
response = query_engine.query("query_str")
```