# Timescale Vector AutoRetrieval Pack With Hybrid Search on Time
This pack demonstrates auto-retrieval for hybrid search on both similarity and time, using the Timescale Vector (PostgreSQL) vector store.
This sort of time-based retrieval is particularly effective for data in which time is a key element, such as:
- News articles (covering various timely topics like politics or business)
- Blog posts, documentation, or other published materials (both public and private)
- Social media posts
- Changelogs
- Messages
- Any timestamped data
Hybrid search on similarity and time is ideal for queries that require ranking results by semantic relevance while filtering them by time and date.
For example: (1) finding recent posts about NVDA from the past 7 days, or (2) finding all news articles related to music celebrities from 2020.
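Each of the two example queries boils down to a semantic query plus a time window. As an illustration only, the standard-library sketch below computes those windows. The helper names and the fixed "now" are hypothetical; in the pack, the LLM infers such bounds from the query text rather than calling code like this. Treating a calendar year as a half-open `[start, end)` window is a choice made here, not documented behaviour of the pack.

```python
from datetime import datetime, timedelta, timezone


def past_days_window(days: int, now: datetime) -> tuple[datetime, datetime]:
    """Return (start, end) covering the last `days` days up to `now`."""
    return now - timedelta(days=days), now


def calendar_year_window(year: int) -> tuple[datetime, datetime]:
    """Return a half-open [start, end) window covering one calendar year in UTC."""
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    return start, end


# A fixed "now" keeps the example reproducible; use datetime.now(timezone.utc) in practice.
now = datetime(2024, 11, 18, tzinfo=timezone.utc)

# (1) "recent posts about NVDA in the past 7 days"
start, end = past_days_window(7, now)
print(start.isoformat(), end.isoformat())

# (2) "all news articles related to music celebrities from 2020"
start, end = calendar_year_window(2020)
print(start.isoformat(), end.isoformat())
```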
[Timescale Vector](https://www.timescale.com/ai?utm_campaign=vectorlaunch&utm_source=llamaindex&utm_medium=referral) is a PostgreSQL-based vector database that speeds up searches for embeddings within a particular timeframe by using automatic table partitioning to isolate data by time range.
The auto-retriever uses the LLM at runtime to infer, from the text of each user query, the metadata filters (including the time range to search), a top-k value, and the query string for similarity search. The resulting query is then executed against the vector store.
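To make the auto-retrieval step concrete, the sketch below parses a structured spec of the kind the LLM produces and separates the reserved time keys (`__start_date`, `__end_date`, as used in the setup code) from ordinary metadata filters. The JSON shape, `ParsedQuery`, and `parse_query_spec` are hypothetical illustrations, not the pack's or LlamaIndex's internal format.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

# Hypothetical example of what the LLM might return for
# "What did Matvey Arye change between August and November 2023?"
LLM_OUTPUT = """
{
  "query": "new features",
  "filters": [
    {"key": "author", "value": "Matvey Arye"},
    {"key": "__start_date", "value": "2023-08-01T00:00:00"},
    {"key": "__end_date", "value": "2023-11-01T00:00:00"}
  ],
  "top_k": 5
}
"""


@dataclass
class ParsedQuery:
    query: str
    top_k: Optional[int]
    start_date: Optional[datetime] = None
    end_date: Optional[datetime] = None
    metadata_filters: dict = field(default_factory=dict)


def parse_query_spec(raw: str) -> ParsedQuery:
    """Split the reserved time keys from ordinary metadata filters."""
    spec = json.loads(raw)
    parsed = ParsedQuery(query=spec["query"], top_k=spec.get("top_k"))
    for f in spec.get("filters", []):
        if f["key"] == "__start_date":
            parsed.start_date = datetime.fromisoformat(f["value"])
        elif f["key"] == "__end_date":
            parsed.end_date = datetime.fromisoformat(f["value"])
        else:
            parsed.metadata_filters[f["key"]] = f["value"]
    return parsed


print(parse_query_spec(LLM_OUTPUT))
```

The time bounds end up as `datetime` objects that can restrict the search to a time range, while the remaining filters apply to ordinary metadata fields.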
## What is Timescale Vector?
**[Timescale Vector](https://www.timescale.com/ai?utm_campaign=vectorlaunch&utm_source=llamaindex&utm_medium=referral) is PostgreSQL++ for AI applications.**
Timescale Vector enables you to efficiently store and query millions of vector embeddings in `PostgreSQL`.
- Enhances `pgvector` with faster and more accurate similarity search on millions of vectors via a DiskANN-inspired indexing algorithm.
- Enables fast time-based vector search via automatic time-based partitioning and indexing.
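The benefit of time-based partitioning is that a query restricted to a time range only has to scan the partitions overlapping that range. The standard-library sketch below counts those partitions for fixed-width intervals. Aligning partition boundaries to the Unix epoch is an assumption made for illustration, not a description of Timescale's internals. The same arithmetic can be used to sanity-check a `time_partition_interval` against the guideline in the setup code (most queries touching 1-2 partitions, all data fitting in fewer than 1,000 partitions).

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration: partition boundaries are aligned to the Unix epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def partition_index(ts: datetime, interval: timedelta) -> int:
    """Index of the fixed-width partition that contains `ts`."""
    # timedelta // timedelta performs floor division and returns an int.
    return (ts - EPOCH) // interval


def partitions_touched(start: datetime, end: datetime, interval: timedelta) -> int:
    """Number of partitions overlapping the half-open range [start, end)."""
    first = partition_index(start, interval)
    last = partition_index(end - timedelta(microseconds=1), interval)
    return last - first + 1


interval = timedelta(days=30)
data_start = datetime(2020, 1, 1, tzinfo=timezone.utc)
data_end = datetime(2024, 1, 1, tzinfo=timezone.utc)

# Partitions holding the whole dataset vs. those a "past 7 days" query needs to scan.
total_parts = partitions_touched(data_start, data_end, interval)
query_parts = partitions_touched(data_end - timedelta(days=7), data_end, interval)
print(f"{total_parts} partitions in total, {query_parts} scanned by the query")
```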
Timescale Vector is cloud PostgreSQL for AI that scales with you from POC to production:
- Simplifies operations by enabling you to store relational metadata, vector embeddings, and time-series data in a single database.
- Benefits from a rock-solid PostgreSQL foundation with enterprise-grade features like streaming backups and replication, high availability, and row-level security.
- Enables a worry-free experience with enterprise-grade security and compliance.
### How to access Timescale Vector
LlamaIndex users get a 90-day free trial for Timescale Vector. [Sign up here](https://console.cloud.timescale.com/signup?utm_campaign=vectorlaunch&utm_source=llamaindex&utm_medium=referral) for a free cloud vector database.
## CLI Usage
You can download LlamaPacks directly using `llamaindex-cli`, which comes installed with the `llama-index` Python package:
```bash
llamaindex-cli download-llamapack TimescaleVectorAutoretrievalPack --download-dir ./tsv_pack
```
You can then inspect the files at `./tsv_pack` and use them as a template for your own project.
## Code Usage
You can download the pack into the `./tsv_pack` directory:
```python
from llama_index.core.llama_pack import download_llama_pack
# download and install dependencies
TimescaleVectorAutoretrievalPack = download_llama_pack(
    "TimescaleVectorAutoretrievalPack", "./tsv_pack"
)
```
From here, you can use the pack, or inspect and modify the pack in `./tsv_pack`.
Then, you can set up the pack like so:
```python
# setup pack arguments
import os
from datetime import datetime, timedelta

from dotenv import find_dotenv, load_dotenv
from llama_index.core.schema import TextNode
from llama_index.core.vector_stores import MetadataInfo, VectorStoreInfo
from timescale_vector import client

# This is an example of the metadata describing the nodes; the example is for git commit history.
vector_store_info = VectorStoreInfo(
    content_info="Description of the commits to PostgreSQL. Describes changes made to Postgres",
    metadata_info=[
        MetadataInfo(
            name="commit_hash",
            type="str",
            description="Commit Hash",
        ),
        MetadataInfo(
            name="author",
            type="str",
            description="Author of the commit",
        ),
        # "__start_date" is a special reserved name for the start of the time range,
        # matched against the time component of each node's UUID.
        MetadataInfo(
            name="__start_date",
            type="datetime in iso format",
            description="All results will be after this datetime",
        ),
        # "__end_date" is a special reserved name for the end of the time range,
        # matched against the time component of each node's UUID.
        MetadataInfo(
            name="__end_date",
            type="datetime in iso format",
            description="All results will be before this datetime",
        ),
    ],
)

# Nodes must have their `id_` field set to a V1 UUID with the correct time component.
# This can be achieved by using `client.uuid_from_time(datetime_obj)`.
nodes = [...]
# an example:
# nodes = [
#     TextNode(
#         id_=str(client.uuid_from_time(datetime(2021, 1, 1))),
#         text="My very interesting commit message",
#         metadata={
#             "author": "Matvey Arye",
#         },
#     )
# ]

_ = load_dotenv(find_dotenv(), override=True)
service_url = os.environ["TIMESCALE_SERVICE_URL"]

# Choose a time_partition_interval for your data.
# The interval should be chosen so that most queries
# touch only 1-2 partitions, while all of your data
# fits in fewer than 1,000 partitions.
time_partition_interval = timedelta(days=30)

# create the pack
tsv_pack = TimescaleVectorAutoretrievalPack(
    service_url=service_url,
    table_name="test",
    time_partition_interval=time_partition_interval,
    vector_store_info=vector_store_info,
    nodes=nodes,
)
```
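The setup code requires each node's `id_` to be a version-1 UUID whose time component matches the record's timestamp, and suggests `client.uuid_from_time` for this. For readers curious about what that time component is, the standard-library sketch below packs a `datetime` into the 60-bit timestamp of a version-1 UUID (100-nanosecond intervals since 1582-10-15). It is not `timescale_vector`'s implementation; in practice, use `client.uuid_from_time` as shown above. Treating naive datetimes as UTC and using a random clock sequence and node are assumptions of this sketch.

```python
import random
import uuid
from datetime import datetime, timezone

# Version-1 UUID timestamps count 100-ns intervals since the Gregorian calendar reform.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_from_datetime(dt: datetime) -> uuid.UUID:
    """Build a version-1 UUID whose embedded timestamp is `dt`."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive datetimes are UTC
    delta = dt - GREGORIAN_EPOCH
    # Integer arithmetic avoids float rounding in the 100-ns tick count.
    ticks = (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10
    clock_seq = random.getrandbits(14)
    node = random.getrandbits(48) | (1 << 40)  # multicast bit marks a random node id
    return uuid.UUID(
        fields=(
            ticks & 0xFFFFFFFF,  # time_low
            (ticks >> 32) & 0xFFFF,  # time_mid
            ((ticks >> 48) & 0x0FFF) | (1 << 12),  # time_hi plus version 1
            ((clock_seq >> 8) & 0x3F) | 0x80,  # clock_seq_hi plus RFC 4122 variant
            clock_seq & 0xFF,  # clock_seq_low
            node,
        )
    )


node_id = str(uuid1_from_datetime(datetime(2021, 1, 1)))
print(node_id)
```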
The `run()` function is a light wrapper around `query_engine.query()`.
```python
response = tsv_pack.run(
    "What new features were added in the past three months?"
)
```
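To illustrate what "light wrapper" means here, the sketch below shows the delegation pattern: `run()` forwards its arguments unchanged to the query engine and returns whatever it returns. `AutoretrievalPackSketch` and `EchoQueryEngine` are hypothetical stand-ins, not classes from the pack or from LlamaIndex.

```python
from typing import Any


class AutoretrievalPackSketch:
    """Hypothetical stand-in showing how run() can delegate to a query engine."""

    def __init__(self, query_engine: Any) -> None:
        self.query_engine = query_engine

    def run(self, *args: Any, **kwargs: Any) -> Any:
        # Everything is forwarded unchanged; run() adds no behaviour of its own.
        return self.query_engine.query(*args, **kwargs)


class EchoQueryEngine:
    """Stub engine used only to demonstrate the delegation."""

    def query(self, query_str: str) -> str:
        return f"answer for: {query_str}"


pack = AutoretrievalPackSketch(EchoQueryEngine())
print(pack.run("What new features were added in the past three months?"))
```

Because `run()` adds nothing on top of `query()`, calling `tsv_pack.run(...)` and `tsv_pack.query_engine.query(...)` with the same question should be interchangeable.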
You can also use the pack's modules individually:
```python
# use the retriever
retriever = tsv_pack.retriever
nodes = retriever.retrieve("query_str")
# use the query engine
query_engine = tsv_pack.query_engine
response = query_engine.query("query_str")
```
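`retriever.retrieve()` returns results ranked by similarity. Because every node id embeds a timestamp, the time of each result can be read back from its id, for example to display dates or to re-order results newest first. The sketch uses a hypothetical `RetrievedNode` dataclass in place of LlamaIndex's result objects. With real results, the id should be available as `node_id` on each returned item, but confirm this against your installed `llama-index` version.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Version-1 UUID timestamps count 100-ns intervals since this instant.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


@dataclass
class RetrievedNode:
    """Hypothetical stand-in for a retrieved node: its id plus a similarity score."""

    node_id: str
    score: float


def embedded_time(node_id: str) -> datetime:
    """Read the timestamp carried by a version-1 UUID node id."""
    u = uuid.UUID(node_id)
    if u.version != 1:
        raise ValueError(f"{node_id} is not a version-1 UUID")
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


def newest_first(results: list[RetrievedNode]) -> list[RetrievedNode]:
    """Re-order similarity-ranked results by the time embedded in their ids."""
    return sorted(results, key=lambda r: embedded_time(r.node_id), reverse=True)


# Stub results whose ids are generated now, in order, so each is newer than the last.
results = [RetrievedNode(str(uuid.uuid1()), score) for score in (0.91, 0.87, 0.80)]
for r in newest_first(results):
    print(embedded_time(r.node_id).isoformat(), r.score)
```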
Raw data
{
"_id": null,
"home_page": null,
"name": "llama-index-packs-timescale-vector-autoretrieval",
"maintainer": "cevian",
"docs_url": null,
"requires_python": "<4.0,>=3.9",
"maintainer_email": null,
"keywords": "autoretrieval, index, timescale, vector",
"author": "Your Name",
"author_email": "you@example.com",
"download_url": "https://files.pythonhosted.org/packages/94/2d/5541596dd37061326bc0a2b84d08c9ca3c4555c67fe32b3360366dbea81c/llama_index_packs_timescale_vector_autoretrieval-0.3.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "llama-index packs timescale_vector_autoretrieval integration",
"version": "0.3.0",
"project_urls": null,
"split_keywords": [
"autoretrieval",
" index",
" timescale",
" vector"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "cbc6a870e80ac6f66594d21213ebdbce0d80b9d38dd391eb72b1814f4aa1c0d6",
"md5": "02a4a3e06890fe0368d0764892fb131d",
"sha256": "3a56c8ab0ad9188478b9212846b8a974dc47ffdae7b000989feb6908e71da492"
},
"downloads": -1,
"filename": "llama_index_packs_timescale_vector_autoretrieval-0.3.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "02a4a3e06890fe0368d0764892fb131d",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.9",
"size": 5033,
"upload_time": "2024-11-18T00:53:48",
"upload_time_iso_8601": "2024-11-18T00:53:48.100528Z",
"url": "https://files.pythonhosted.org/packages/cb/c6/a870e80ac6f66594d21213ebdbce0d80b9d38dd391eb72b1814f4aa1c0d6/llama_index_packs_timescale_vector_autoretrieval-0.3.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "942d5541596dd37061326bc0a2b84d08c9ca3c4555c67fe32b3360366dbea81c",
"md5": "ffaa344eabd5bf3f4f1ad63e3c556ec9",
"sha256": "9e3a92afdf5a2a9f37d53d2542c5dec59b25977f6e1856314cff87b2e1c8a9ec"
},
"downloads": -1,
"filename": "llama_index_packs_timescale_vector_autoretrieval-0.3.0.tar.gz",
"has_sig": false,
"md5_digest": "ffaa344eabd5bf3f4f1ad63e3c556ec9",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.9",
"size": 4663,
"upload_time": "2024-11-18T00:53:48",
"upload_time_iso_8601": "2024-11-18T00:53:48.936192Z",
"url": "https://files.pythonhosted.org/packages/94/2d/5541596dd37061326bc0a2b84d08c9ca3c4555c67fe32b3360366dbea81c/llama_index_packs_timescale_vector_autoretrieval-0.3.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-18 00:53:48",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "llama-index-packs-timescale-vector-autoretrieval"
}