# Timescale Vector AutoRetrieval Pack With Hybrid Search on Time
This pack demonstrates performing auto-retrieval for hybrid search based on both similarity and time, using the timescale-vector (PostgreSQL) vectorstore.
Time-based retrieval of this kind is particularly effective for data in which time is a key element, such as:
- News articles (covering various timely topics like politics or business)
- Blog posts, documentation, or other published materials (both public and private)
- Social media posts
- Changelogs
- Messages
- Any timestamped data
Hybrid search on similarity and time is ideal for queries that need results ranked by semantic relevance and filtered by time and date.
For instance: (1) finding recent posts about NVDA from the past 7 days; (2) finding all news articles related to music celebrities from 2020.
[Timescale Vector](https://www.timescale.com/ai?utm_campaign=vectorlaunch&utm_source=llamaindex&utm_medium=referral) is a PostgreSQL-based vector database that provides superior performance when searching for embeddings within a particular timeframe by leveraging automatic table partitioning to isolate data for particular time-ranges.
The auto-retriever will use the LLM at runtime to set metadata filters (including deducing the time-ranges to search), a top-k value, and the query string for similarity search based on the text of user queries. That query will then be executed on the vector store.
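To make that runtime behavior concrete, the sketch below shows the kind of structured query an auto-retriever might derive from "recent posts about NVDA in the past 7 days". The `InferredQuery` dataclass and the `infer_past_days` helper are hypothetical, standard-library stand-ins for illustration only; in the pack the LLM produces these values, and the internal types may differ. The `__start_date` and `__end_date` filter names are the reserved names used in the setup code later in this README.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class InferredQuery:
    """Hypothetical stand-in for what the LLM infers from a user question."""

    query: str  # text used for the similarity search
    top_k: int  # number of results to return
    filters: dict = field(default_factory=dict)  # metadata and time filters


def infer_past_days(topic: str, days: int, now: datetime, top_k: int = 5) -> InferredQuery:
    """Build the query an LLM might infer for '<topic> in the past <days> days'."""
    start = now - timedelta(days=days)
    return InferredQuery(
        query=topic,
        top_k=top_k,
        filters={
            "__start_date": start.isoformat(),
            "__end_date": now.isoformat(),
        },
    )


now = datetime(2024, 2, 22, tzinfo=timezone.utc)
spec = infer_past_days("posts about NVDA", days=7, now=now)
print(spec.filters["__start_date"])  # 2024-02-15T00:00:00+00:00
```

The similarity search then runs only over rows that fall inside the inferred time window.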
## What is Timescale Vector?
**[Timescale Vector](https://www.timescale.com/ai?utm_campaign=vectorlaunch&utm_source=llamaindex&utm_medium=referral) is PostgreSQL++ for AI applications.**
Timescale Vector enables you to efficiently store and query millions of vector embeddings in `PostgreSQL`.
- Enhances `pgvector` with faster and more accurate similarity search on millions of vectors via a DiskANN-inspired indexing algorithm.
- Enables fast time-based vector search via automatic time-based partitioning and indexing.
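As a rough illustration of why time-based partitioning helps, the sketch below counts how many fixed-width partitions a time-filtered query would need to scan. It is a simplified, standard-library model: the `partition_index` and `partitions_touched` helpers and the epoch-aligned partition boundaries are assumptions made for illustration, not Timescale Vector's actual partitioning logic.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def partition_index(ts: datetime, interval: timedelta) -> int:
    """Index of the fixed-width partition containing `ts` (hypothetical model)."""
    return (ts - EPOCH) // interval


def partitions_touched(start: datetime, end: datetime, interval: timedelta) -> int:
    """Number of partitions overlapping the inclusive range [start, end]."""
    return partition_index(end, interval) - partition_index(start, interval) + 1


interval = timedelta(days=30)
end = datetime(2024, 2, 22, tzinfo=timezone.utc)

# A 7-day query can overlap at most two 30-day partitions.
week = partitions_touched(end - timedelta(days=7), end, interval)

# A query spanning roughly ten years overlaps many more.
decade = partitions_touched(end - timedelta(days=3650), end, interval)
print(week, decade)
```

In this model, a narrow time filter lets the database skip most partitions before any vector comparison happens.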
Timescale Vector is cloud PostgreSQL for AI that scales with you from POC to production:
- Simplifies operations by enabling you to store relational metadata, vector embeddings, and time-series data in a single database.
- Benefits from a rock-solid PostgreSQL foundation with enterprise-grade features like streaming backups and replication, high availability, and row-level security.
- Enables a worry-free experience with enterprise-grade security and compliance.
### How to access Timescale Vector
LlamaIndex users get a 90-day free trial for Timescale Vector. [Sign up here](https://console.cloud.timescale.com/signup?utm_campaign=vectorlaunch&utm_source=llamaindex&utm_medium=referral) for a free cloud vector database.
## CLI Usage
You can download LlamaPacks directly using `llamaindex-cli`, which is installed with the `llama-index` Python package:
```bash
llamaindex-cli download-llamapack TimescaleVectorAutoretrievalPack --download-dir ./tsv_pack
```
You can then inspect the files at `./tsv_pack` and use them as a template for your own project.
## Code Usage
You can also download the pack to the `./tsv_pack` directory from Python:
```python
from llama_index.core.llama_pack import download_llama_pack

# download the pack and install its dependencies
TimescaleVectorAutoretrievalPack = download_llama_pack(
    "TimescaleVectorAutoretrievalPack", "./tsv_pack"
)
```
From here, you can use the pack, or inspect and modify the pack in `./tsv_pack`.
Then, you can set up the pack like so:
```python
# set up pack arguments
import os
from datetime import timedelta

from dotenv import find_dotenv, load_dotenv
from llama_index.core.vector_stores.types import MetadataInfo, VectorStoreInfo
from timescale_vector import client

# This is an example of the metadata describing the nodes.
# The example is for git commit history.
vector_store_info = VectorStoreInfo(
    content_info="Description of the commits to PostgreSQL. Describes changes made to Postgres",
    metadata_info=[
        MetadataInfo(
            name="commit_hash",
            type="str",
            description="Commit Hash",
        ),
        MetadataInfo(
            name="author",
            type="str",
            description="Author of the commit",
        ),
        # "__start_date" is a special reserved name referring to the
        # starting point for the time of the uuid field.
        MetadataInfo(
            name="__start_date",
            type="datetime in iso format",
            description="All results will be after this datetime",
        ),
        # "__end_date" is a special reserved name referring to the
        # last point for the time of the uuid field.
        MetadataInfo(
            name="__end_date",
            type="datetime in iso format",
            description="All results will be before this datetime",
        ),
    ],
)

# Nodes must have their `id_` field set to a V1 UUID with the right time component.
# This can be achieved by using `client.uuid_from_time(datetime_obj)`.
nodes = [...]  # placeholder: replace with your own list of nodes
# An example (also requires `from datetime import datetime` and
# `from llama_index.core.schema import TextNode`):
# nodes = [
#     TextNode(
#         id_=str(client.uuid_from_time(datetime(2021, 1, 1))),
#         text="My very interesting commit message",
#         metadata={
#             "author": "Matvey Arye",
#         },
#     )
# ]

# read TIMESCALE_SERVICE_URL from a .env file or the environment
_ = load_dotenv(find_dotenv(), override=True)
service_url = os.environ["TIMESCALE_SERVICE_URL"]

# Choose a time_partition_interval for your data.
# The interval should be chosen so that most queries
# will touch 1-2 partitions, while all your data should
# fit in fewer than 1000 partitions.
time_partition_interval = timedelta(days=30)

# create the pack
tsv_pack = TimescaleVectorAutoretrievalPack(
    service_url=service_url,
    table_name="test",
    time_partition_interval=time_partition_interval,
    vector_store_info=vector_store_info,
    nodes=nodes,
)
```
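The setup code requires each node's `id_` to be a version 1 UUID whose timestamp matches the node's time. The sketch below uses only the standard library to show how a timestamp is packed into, and recovered from, a v1 UUID. The helpers `uuid1_from_datetime` and `datetime_from_uuid1` are hypothetical illustrations of the encoding; in practice, use `client.uuid_from_time` as shown above, whose implementation may differ (for example, in how it fills the non-time bits).

```python
import uuid
from datetime import datetime, timedelta, timezone

# 100 ns ticks between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def uuid1_from_datetime(dt: datetime) -> uuid.UUID:
    """Hypothetical helper: build a v1 UUID whose timestamp is `dt` (naive means UTC)."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    # Whole microseconds since the Unix epoch, converted to 100 ns ticks.
    ticks = (dt - UNIX_EPOCH) // timedelta(microseconds=1) * 10 + UUID_EPOCH_OFFSET
    fields = (
        ticks & 0xFFFFFFFF,      # time_low
        (ticks >> 32) & 0xFFFF,  # time_mid
        (ticks >> 48) & 0x0FFF,  # time_hi (version bits are set by version=1)
        0,                       # clock_seq_hi_variant (zeroed for this sketch)
        0,                       # clock_seq_low (zeroed for this sketch)
        0,                       # node (zeroed for this sketch)
    )
    return uuid.UUID(fields=fields, version=1)


def datetime_from_uuid1(u: uuid.UUID) -> datetime:
    """Recover the UTC timestamp stored in a v1 UUID."""
    return UNIX_EPOCH + timedelta(microseconds=(u.time - UUID_EPOCH_OFFSET) // 10)


commit_time = datetime(2021, 1, 1, tzinfo=timezone.utc)
node_id = uuid1_from_datetime(commit_time)
recovered = datetime_from_uuid1(node_id)
```

Because the time lives in the ID itself, the `__start_date` and `__end_date` filters can be applied against the time component of the UUID field.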
The `run()` function is a light wrapper around `query_engine.query()`.
```python
response = tsv_pack.run(
    "What new features were added in the past three months?"
)
```
You can also use modules individually.
```python
# use the retriever
retriever = tsv_pack.retriever
nodes = retriever.retrieve("query_str")
# use the query engine
query_engine = tsv_pack.query_engine
response = query_engine.query("query_str")
```
Raw data
{
"_id": null,
"home_page": "",
"name": "llama-index-packs-timescale-vector-autoretrieval",
"maintainer": "cevian",
"docs_url": null,
"requires_python": ">=3.8.1,<4.0",
"maintainer_email": "",
"keywords": "autoretrieval,index,timescale,vector",
"author": "Your Name",
"author_email": "you@example.com",
"download_url": "https://files.pythonhosted.org/packages/c2/e2/3a6801cce2b619ed8d846f008c3c44e9d14fc86020275c5c2f623cbc03cb/llama_index_packs_timescale_vector_autoretrieval-0.1.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "llama-index packs timescale_vector_autoretrieval integration",
"version": "0.1.3",
"project_urls": null,
"split_keywords": [
"autoretrieval",
"index",
"timescale",
"vector"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "b5230b7143e8d4a8ebdaf42f0f685f26b9992d43694cdb4999f325a381e75adb",
"md5": "593020b8dc40032c21dd449c924aa500",
"sha256": "86f9565b7ecc911e68fcbf5df5a55afd0b55b761cc13cb585f0fc63476aff19e"
},
"downloads": -1,
"filename": "llama_index_packs_timescale_vector_autoretrieval-0.1.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "593020b8dc40032c21dd449c924aa500",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8.1,<4.0",
"size": 5044,
"upload_time": "2024-02-22T01:43:36",
"upload_time_iso_8601": "2024-02-22T01:43:36.041305Z",
"url": "https://files.pythonhosted.org/packages/b5/23/0b7143e8d4a8ebdaf42f0f685f26b9992d43694cdb4999f325a381e75adb/llama_index_packs_timescale_vector_autoretrieval-0.1.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "c2e23a6801cce2b619ed8d846f008c3c44e9d14fc86020275c5c2f623cbc03cb",
"md5": "2e9a63e69220fa316f2c920fa677fc88",
"sha256": "979258459a6f475b150a57e20a7f3851f293b5d5c72a847dbe44f40ea1fcf340"
},
"downloads": -1,
"filename": "llama_index_packs_timescale_vector_autoretrieval-0.1.3.tar.gz",
"has_sig": false,
"md5_digest": "2e9a63e69220fa316f2c920fa677fc88",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8.1,<4.0",
"size": 4684,
"upload_time": "2024-02-22T01:43:37",
"upload_time_iso_8601": "2024-02-22T01:43:37.649575Z",
"url": "https://files.pythonhosted.org/packages/c2/e2/3a6801cce2b619ed8d846f008c3c44e9d14fc86020275c5c2f623cbc03cb/llama_index_packs_timescale_vector_autoretrieval-0.1.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-02-22 01:43:37",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "llama-index-packs-timescale-vector-autoretrieval"
}