llama-index-packs-timescale-vector-autoretrieval


- Name: llama-index-packs-timescale-vector-autoretrieval
- Version: 0.3.0
- Summary: llama-index packs timescale_vector_autoretrieval integration
- Upload time: 2024-11-18 00:53:48
- Maintainer: cevian
- Author: Your Name
- Requires Python: <4.0,>=3.9
- License: MIT
- Keywords: autoretrieval, index, timescale, vector

# Timescale Vector AutoRetrieval Pack With Hybrid Search on Time

This pack demonstrates auto-retrieval for hybrid search on both similarity and time, using the timescale-vector (PostgreSQL) vector store.
Time-based retrieval of this kind is particularly effective when time is a key attribute of the data, such as:

- News articles (covering various timely topics like politics or business)
- Blog posts, documentation, or other published materials (both public and private)
- Social media posts
- Changelogs
- Messages
- Any timestamped data

Hybrid search on similarity and time is ideal for queries that need results ranked by semantic relevance while filtering by time and date.
For instance: (1) finding posts about NVDA from the past 7 days; (2) finding all news articles from 2020 related to music celebrities.
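
To make the two example queries concrete, here is a minimal sketch of the time windows they imply. The helper names `past_days_window` and `calendar_year_window` are hypothetical and not part of the pack; in the pack itself, the LLM deduces these bounds from the query text.

```python
from datetime import datetime, timedelta, timezone


def past_days_window(days: int, now: datetime) -> tuple[datetime, datetime]:
    """Return (start, end) covering the last `days` days up to `now`."""
    return now - timedelta(days=days), now


def calendar_year_window(year: int) -> tuple[datetime, datetime]:
    """Return (start, end) covering one calendar year; `end` is exclusive."""
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    return start, end


# A fixed "now" keeps the example reproducible (assumption for illustration).
now = datetime(2024, 11, 18, tzinfo=timezone.utc)

recent_start, recent_end = past_days_window(7, now)
year_start, year_end = calendar_year_window(2020)

# ISO 8601 strings are the form the time filters are described in.
print(recent_start.isoformat(), recent_end.isoformat())
print(year_start.isoformat(), year_end.isoformat())
```

Similarity search then ranks only the documents whose timestamps fall inside the chosen window.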

[Timescale Vector](https://www.timescale.com/ai?utm_campaign=vectorlaunch&utm_source=llamaindex&utm_medium=referral) is a PostgreSQL-based vector database that speeds up searches for embeddings within a particular timeframe by using automatic table partitioning to isolate data for particular time ranges.

At runtime, the auto-retriever uses the LLM to infer, from the text of the user's query, the metadata filters (including the time range to search), a top-k value, and the query string for similarity search. The resulting query is then executed against the vector store.
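
As a rough illustration of what the inferred query might contain, the sketch below models it with a plain dataclass and separates the reserved time bounds from ordinary metadata filters. `InferredQuery` and `split_time_filters` are hypothetical names used only for this example; they are not the pack's or LlamaIndex's internal types. The `__start_date` and `__end_date` keys are the reserved names described in the setup example in this README.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

# Reserved filter names that describe the time range rather than node metadata.
TIME_KEYS = ("__start_date", "__end_date")


@dataclass
class InferredQuery:
    """Hypothetical stand-in for what the LLM infers from a user question."""

    query: str
    top_k: Optional[int] = None
    filters: dict = field(default_factory=dict)


def split_time_filters(spec: InferredQuery):
    """Separate the reserved time bounds from ordinary metadata filters."""
    metadata = {k: v for k, v in spec.filters.items() if k not in TIME_KEYS}
    start = spec.filters.get("__start_date")
    end = spec.filters.get("__end_date")
    return (
        metadata,
        datetime.fromisoformat(start) if start else None,
        datetime.fromisoformat(end) if end else None,
    )


# Example values an LLM might plausibly produce (made up for illustration).
spec = InferredQuery(
    query="new features",
    top_k=5,
    filters={
        "author": "Matvey Arye",
        "__start_date": "2023-08-01T00:00:00",
        "__end_date": "2023-11-01T00:00:00",
    },
)

metadata, start, end = split_time_filters(spec)
print(metadata, start, end)
```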

## What is Timescale Vector?

**[Timescale Vector](https://www.timescale.com/ai?utm_campaign=vectorlaunch&utm_source=llamaindex&utm_medium=referral) is PostgreSQL++ for AI applications.**

Timescale Vector enables you to efficiently store and query millions of vector embeddings in `PostgreSQL`.

- Enhances `pgvector` with faster and more accurate similarity search on millions of vectors via a DiskANN-inspired indexing algorithm.
- Enables fast time-based vector search via automatic time-based partitioning and indexing.
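
To see why time-based partitioning helps, consider the following sketch. It assumes fixed-width partitions counted from the Unix epoch, which is a simplification; the actual partition boundaries are decided by the database. The function names are hypothetical. A time-filtered query only needs to look at the partitions that overlap its time range.

```python
from datetime import datetime, timedelta

# Assumed origin for partition boundaries (illustration only).
EPOCH = datetime(1970, 1, 1)


def partition_index(ts: datetime, interval: timedelta) -> int:
    """Index of the fixed-width partition that contains `ts`."""
    return (ts - EPOCH) // interval


def partitions_for_range(start: datetime, end: datetime, interval: timedelta) -> range:
    """Indexes of every partition that overlaps the half-open range [start, end)."""
    first = partition_index(start, interval)
    # Step back one microsecond so an `end` on a boundary does not pull in
    # the following partition.
    last = partition_index(end - timedelta(microseconds=1), interval)
    return range(first, last + 1)


interval = timedelta(days=30)
touched = partitions_for_range(datetime(2023, 8, 1), datetime(2023, 11, 1), interval)
print(f"A three-month query touches {len(touched)} of the 30-day partitions")
```

The smaller the share of partitions a query touches, the less data the similarity search has to consider.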

Timescale Vector is cloud PostgreSQL for AI that scales with you from POC to production:

- Simplifies operations by enabling you to store relational metadata, vector embeddings, and time-series data in a single database.
- Benefits from a rock-solid PostgreSQL foundation with enterprise-grade features like streaming backups and replication, high availability, and row-level security.
- Enables a worry-free experience with enterprise-grade security and compliance.

### How to access Timescale Vector

LlamaIndex users get a 90-day free trial for Timescale Vector. [Sign up here](https://console.cloud.timescale.com/signup?utm_campaign=vectorlaunch&utm_source=llamaindex&utm_medium=referral) for a free cloud vector database.

## CLI Usage

You can download LlamaPacks directly using `llamaindex-cli`, which comes installed with the `llama-index` Python package:

```bash
llamaindex-cli download-llamapack TimescaleVectorAutoretrievalPack --download-dir ./tsv_pack
```

You can then inspect the files at `./tsv_pack` and use them as a template for your own project.

## Code Usage

You can download the pack to the `./tsv_pack` directory:

```python
from llama_index.core.llama_pack import download_llama_pack

# download and install dependencies
TimescaleVectorAutoretrievalPack = download_llama_pack(
    "TimescaleVectorAutoretrievalPack", "./tsv_pack"
)
```

From here, you can use the pack, or inspect and modify the pack in `./tsv_pack`.

Then, you can set up the pack like so:

```python
# setup pack arguments
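import os
from datetime import datetime, timedelta  # `datetime` is used by the example nodes below

from dotenv import load_dotenv, find_dotenv
from llama_index.core.schema import TextNode  # used by the example nodes below
from llama_index.core.vector_stores import MetadataInfo, VectorStoreInfo
from timescale_vector import client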

# this is an example of the metadata describing the nodes. The example is for git commit history.
vector_store_info = VectorStoreInfo(
    content_info="Description of the commits to PostgreSQL. Describes changes made to Postgres",
    metadata_info=[
        MetadataInfo(
            name="commit_hash",
            type="str",
            description="Commit Hash",
        ),
        MetadataInfo(
            name="author",
            type="str",
            description="Author of the commit",
        ),
        # "__start_date" is a special reserved name referring to the starting point for the time of the uuid field.
        MetadataInfo(
            name="__start_date",
            type="datetime in iso format",
            description="All results will be after this datetime",
        ),
        # "__end_date" is a special reserved name referring to the last point for the time of the uuid field.
        MetadataInfo(
            name="__end_date",
            type="datetime in iso format",
            description="All results will be before this datetime",
        ),
    ],
)

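# Each node's `id_` must be a version 1 UUID whose time component is the node's timestamp.
# You can create one with `client.uuid_from_time(datetime_obj)`.
# Replace the placeholder below with your own list of nodes.
nodes = [...]
# An example (requires `datetime` from `datetime` and `TextNode` from `llama_index.core.schema`):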
# nodes = [
#    TextNode(
#        id_=str(client.uuid_from_time(datetime(2021, 1, 1))),
#        text="My very interesting commit message",
#        metadata={
#            "author": "Matvey Arye",
#        },
#    )
# ]

_ = load_dotenv(find_dotenv(), override=True)
service_url = os.environ["TIMESCALE_SERVICE_URL"]

# Choose a time_partition_interval for your data.
# Pick the interval so that most queries touch
# only 1-2 partitions, while all of your data
# fits in fewer than 1,000 partitions.
time_partition_interval = timedelta(days=30)

# create the pack
tsv_pack = TimescaleVectorAutoretrievalPack(
    service_url=service_url,
    table_name="test",
    time_partition_interval=time_partition_interval,
    vector_store_info=vector_store_info,
    nodes=nodes,
)
```
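
The requirement that node IDs be version 1 UUIDs exists because that UUID version carries a timestamp. The standard-library sketch below shows how a datetime can be packed into, and read back from, a version 1 UUID. It is only an illustration of the encoding and is not the implementation of `client.uuid_from_time`: the function names are hypothetical, and the clock sequence and node fields are fixed to zero, so two nodes with the same timestamp would collide. Use `client.uuid_from_time` for real data.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUID timestamps count 100-nanosecond ticks from this date.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_from_datetime(dt: datetime) -> uuid.UUID:
    """Build a version 1 UUID whose timestamp encodes the timezone-aware `dt`."""
    delta = dt - GREGORIAN_EPOCH
    ticks = (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10
    fields = (
        ticks & 0xFFFFFFFF,      # time_low
        (ticks >> 32) & 0xFFFF,  # time_mid
        (ticks >> 48) & 0x0FFF,  # time_hi; version bits are set by version=1
        0,                       # clock_seq_hi_variant (zero for illustration)
        0,                       # clock_seq_low (zero for illustration)
        0,                       # node (zero for illustration)
    )
    return uuid.UUID(fields=fields, version=1)


def datetime_from_uuid1(value: uuid.UUID) -> datetime:
    """Recover the timestamp stored in a version 1 UUID."""
    return GREGORIAN_EPOCH + timedelta(microseconds=value.time // 10)


commit_time = datetime(2021, 1, 1, tzinfo=timezone.utc)
node_id = uuid1_from_datetime(commit_time)
print(node_id, datetime_from_uuid1(node_id).isoformat())
```

Because the timestamp lives in the ID itself, the vector store can filter on time without a separate timestamp column in the node metadata.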

The `run()` function is a light wrapper around `query_engine.query()`.

```python
response = tsv_pack.run(
    "What new features were added in the past three months?"
)
```
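
Conceptually, a light wrapper of this kind just forwards its arguments. The sketch below shows the pattern with stand-in classes; `EchoQueryEngine` and `PackSketch` are hypothetical and only illustrate the delegation, not the pack's actual source.

```python
class EchoQueryEngine:
    """Hypothetical stand-in for a real query engine."""

    def query(self, query_str: str) -> str:
        return f"answer to: {query_str}"


class PackSketch:
    """Minimal illustration of a pack whose run() delegates to its query engine."""

    def __init__(self, query_engine):
        self.query_engine = query_engine

    def run(self, *args, **kwargs):
        # Forward every argument unchanged to query_engine.query().
        return self.query_engine.query(*args, **kwargs)


pack = PackSketch(EchoQueryEngine())
print(pack.run("What new features were added in the past three months?"))
```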

You can also use modules individually.

```python
# use the retriever
retriever = tsv_pack.retriever
nodes = retriever.retrieve("query_str")

# use the query engine
query_engine = tsv_pack.query_engine
response = query_engine.query("query_str")
```

            
