llama-index-packs-timescale-vector-autoretrieval


- Name: llama-index-packs-timescale-vector-autoretrieval
- Version: 0.1.3
- Summary: llama-index packs timescale_vector_autoretrieval integration
- Upload time: 2024-02-22 01:43:37
- Maintainer: cevian
- Requires Python: >=3.8.1,<4.0
- License: MIT
- Keywords: autoretrieval, index, timescale, vector

# Timescale Vector AutoRetrieval Pack With Hybrid Search on Time

This pack demonstrates performing auto-retrieval for hybrid search based on both similarity and time, using the timescale-vector (PostgreSQL) vectorstore.
This kind of time-based retrieval is particularly effective for data in which time is a key element, such as:

- News articles (covering various timely topics like politics or business)
- Blog posts, documentation, or other published materials (both public and private)
- Social media posts
- Changelogs
- Messages
- Any timestamped data

Hybrid search on similarity and time is ideal for queries that need results ranked by semantic relevance and filtered by time and date.
For instance: (1) finding recent posts about NVDA from the past 7 days, or (2) finding all news articles from 2020 related to music celebrities.

[Timescale Vector](https://www.timescale.com/ai?utm_campaign=vectorlaunch&utm_source=llamaindex&utm_medium=referral) is a PostgreSQL-based vector database that provides superior performance when searching for embeddings within a particular timeframe by leveraging automatic table partitioning to isolate data for particular time-ranges.

The auto-retriever will use the LLM at runtime to set metadata filters (including deducing the time-ranges to search), a top-k value, and the query string for similarity search based on the text of user queries. That query will then be executed on the vector store.
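To make this concrete, the sketch below models the kind of structured query the auto-retriever might derive from a question such as "recent posts about NVDA in the past 7 days". The `InferredQuery` class and the `infer_past_days` helper are hypothetical stand-ins for the LLM step and are not part of the pack or of LlamaIndex; only the `__start_date` and `__end_date` filter names come from this README.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class InferredQuery:
    """Hypothetical shape of what the LLM infers from a user question."""

    query: str  # text used for the similarity search
    top_k: int  # number of results to return
    filters: dict = field(default_factory=dict)  # metadata filters


def infer_past_days(
    topic: str, days: int, now: datetime, top_k: int = 5
) -> InferredQuery:
    """Hand-written stand-in for the LLM: turn 'past N days' into a time range."""
    return InferredQuery(
        query=topic,
        top_k=top_k,
        filters={
            "__start_date": (now - timedelta(days=days)).isoformat(),
            "__end_date": now.isoformat(),
        },
    )


now = datetime(2024, 2, 22, tzinfo=timezone.utc)
spec = infer_past_days("posts about NVDA", days=7, now=now)
print(spec.query, spec.top_k)
print(spec.filters["__start_date"], "to", spec.filters["__end_date"])
```

In the real pack the LLM produces these values from the query text and the `VectorStoreInfo` description, so the actual filters and top-k may differ from this hand-coded example.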

## What is Timescale Vector?

**[Timescale Vector](https://www.timescale.com/ai?utm_campaign=vectorlaunch&utm_source=llamaindex&utm_medium=referral) is PostgreSQL++ for AI applications.**

Timescale Vector enables you to efficiently store and query millions of vector embeddings in `PostgreSQL`.

- Enhances `pgvector` with faster and more accurate similarity search on millions of vectors via a DiskANN-inspired indexing algorithm.
- Enables fast time-based vector search via automatic time-based partitioning and indexing.

Timescale Vector is cloud PostgreSQL for AI that scales with you from POC to production:

- Simplifies operations by enabling you to store relational metadata, vector embeddings, and time-series data in a single database.
- Benefits from a rock-solid PostgreSQL foundation with enterprise-grade features like streaming backups and replication, high availability, and row-level security.
- Enables a worry-free experience with enterprise-grade security and compliance.

### How to access Timescale Vector

LlamaIndex users get a 90-day free trial for Timescale Vector. [Sign up here](https://console.cloud.timescale.com/signup?utm_campaign=vectorlaunch&utm_source=llamaindex&utm_medium=referral) for a free cloud vector database.

## CLI Usage

You can download LlamaPacks directly using `llamaindex-cli`, which is installed with the `llama-index` Python package:

```bash
llamaindex-cli download-llamapack TimescaleVectorAutoretrievalPack --download-dir ./tsv_pack
```

You can then inspect the files at `./tsv_pack` and use them as a template for your own project.

## Code Usage

You can download the pack to the `./tsv_pack` directory:

```python
from llama_index.core.llama_pack import download_llama_pack

# download and install dependencies
TimescaleVectorAutoretrievalPack = download_llama_pack(
    "TimescaleVectorAutoretrievalPack", "./tsv_pack"
)
```

From here, you can use the pack, or inspect and modify the pack in `./tsv_pack`.

Then, you can set up the pack like so:

```python
# set up pack arguments
import os
from datetime import datetime, timedelta  # datetime is used by the node example below

from dotenv import load_dotenv, find_dotenv
from llama_index.core.schema import TextNode  # used by the node example below
from llama_index.core.vector_stores.types import MetadataInfo, VectorStoreInfo
from timescale_vector import client

# This is an example of the metadata describing the nodes. The example is for git commit history.
vector_store_info = VectorStoreInfo(
    content_info="Description of the commits to PostgreSQL. Describes changes made to Postgres",
    metadata_info=[
        MetadataInfo(
            name="commit_hash",
            type="str",
            description="Commit Hash",
        ),
        MetadataInfo(
            name="author",
            type="str",
            description="Author of the commit",
        ),
        # "__start_date" is a special reserved name referring to the starting point for the time of the uuid field.
        MetadataInfo(
            name="__start_date",
            type="datetime in iso format",
            description="All results will be after this datetime",
        ),
        # "__end_date" is a special reserved name referring to the last point for the time of the uuid field.
        MetadataInfo(
            name="__end_date",
            type="datetime in iso format",
            description="All results will be before this datetime",
        ),
    ],
)

# nodes have to have their `id_` field set using a V1 UUID with the right time component
# this can be achieved by using `client.uuid_from_time(datetime_obj)`
nodes = [...]  # placeholder: replace with your own list of TextNode objects
# an example:
# nodes = [
#    TextNode(
#        id_=str(client.uuid_from_time(datetime(2021, 1, 1))),
#        text="My very interesting commit message",
#        metadata={
#            "author": "Matvey Arye",
#        },
#    )
# ]

_ = load_dotenv(find_dotenv(), override=True)
service_url = os.environ["TIMESCALE_SERVICE_URL"]

# choose a time_partition_interval for your data
# the interval should be chosen so that most queries
# will touch 1-2 partitions while all your data should
# fit in less than 1000 partitions.
time_partition_interval = timedelta(days=30)

# create the pack
tsv_pack = TimescaleVectorAutoretrievalPack(
    service_url=service_url,
    table_name="test",
    time_partition_interval=time_partition_interval,
    vector_store_info=vector_store_info,
    nodes=nodes,
)
```
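The partition-interval guidance in the comments above (most queries touching 1-2 partitions, all data fitting in fewer than 1000 partitions) can be sanity-checked with a quick calculation. The helpers below are a hypothetical sketch and not part of the pack or of `timescale_vector`; they give rough estimates only, since the actual partition boundaries depend on how the database aligns them.

```python
import math
from datetime import datetime, timedelta


def partitions_for_span(start: datetime, end: datetime, interval: timedelta) -> int:
    """Rough number of partitions needed to cover the data from start to end."""
    return max(1, math.ceil((end - start) / interval))


def partitions_touched(query_window: timedelta, interval: timedelta) -> int:
    """Worst-case partitions a query window overlaps (it may straddle a boundary)."""
    return math.ceil(query_window / interval) + 1


interval = timedelta(days=30)

# Ten years of commit history with 30-day partitions.
total = partitions_for_span(datetime(2014, 1, 1), datetime(2024, 1, 1), interval)
print(total)  # 122, well under the 1000-partition guideline

# A typical "past 7 days" query.
print(partitions_touched(timedelta(days=7), interval))  # 2
```

If typical queries span several months, a longer interval keeps the number of partitions touched per query low, at the cost of larger partitions.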

The `run()` function is a light wrapper around `query_engine.query()`.

```python
response = tsv_pack.run(
    "What new features were added in the past three months?"
)
```

You can also use modules individually.

```python
# use the retriever
retriever = tsv_pack.retriever
nodes = retriever.retrieve("query_str")

# use the query engine
query_engine = tsv_pack.query_engine
response = query_engine.query("query_str")
```
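Because each node `id_` is a version 1 UUID, the timestamp of a retrieved node can be recovered from the ID itself. The sketch below uses only the Python standard library and assumes the IDs follow the standard version 1 layout (a count of 100-nanosecond intervals since 1582-10-15); `datetime_from_uuid1` is a hypothetical helper, not part of the pack, and `uuid.uuid1()` stands in here for an ID produced by `client.uuid_from_time`.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUID timestamps count 100-nanosecond intervals since this date.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def datetime_from_uuid1(value: uuid.UUID) -> datetime:
    """Decode the UTC timestamp embedded in a version 1 UUID."""
    if value.version != 1:
        raise ValueError("expected a version 1 UUID")
    # UUID.time is in 100 ns units; integer-divide by 10 to get microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=value.time // 10)


before = datetime.now(timezone.utc)
node_id = uuid.uuid1()  # stand-in for an ID stored on a node
decoded = datetime_from_uuid1(uuid.UUID(str(node_id)))  # IDs are stored as strings
print(decoded.isoformat())
```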

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "llama-index-packs-timescale-vector-autoretrieval",
    "maintainer": "cevian",
    "docs_url": null,
    "requires_python": ">=3.8.1,<4.0",
    "maintainer_email": "",
    "keywords": "autoretrieval,index,timescale,vector",
    "author": "Your Name",
    "author_email": "you@example.com",
    "download_url": "https://files.pythonhosted.org/packages/c2/e2/3a6801cce2b619ed8d846f008c3c44e9d14fc86020275c5c2f623cbc03cb/llama_index_packs_timescale_vector_autoretrieval-0.1.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "llama-index packs timescale_vector_autoretrieval integration",
    "version": "0.1.3",
    "project_urls": null,
    "split_keywords": [
        "autoretrieval",
        "index",
        "timescale",
        "vector"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b5230b7143e8d4a8ebdaf42f0f685f26b9992d43694cdb4999f325a381e75adb",
                "md5": "593020b8dc40032c21dd449c924aa500",
                "sha256": "86f9565b7ecc911e68fcbf5df5a55afd0b55b761cc13cb585f0fc63476aff19e"
            },
            "downloads": -1,
            "filename": "llama_index_packs_timescale_vector_autoretrieval-0.1.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "593020b8dc40032c21dd449c924aa500",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8.1,<4.0",
            "size": 5044,
            "upload_time": "2024-02-22T01:43:36",
            "upload_time_iso_8601": "2024-02-22T01:43:36.041305Z",
            "url": "https://files.pythonhosted.org/packages/b5/23/0b7143e8d4a8ebdaf42f0f685f26b9992d43694cdb4999f325a381e75adb/llama_index_packs_timescale_vector_autoretrieval-0.1.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c2e23a6801cce2b619ed8d846f008c3c44e9d14fc86020275c5c2f623cbc03cb",
                "md5": "2e9a63e69220fa316f2c920fa677fc88",
                "sha256": "979258459a6f475b150a57e20a7f3851f293b5d5c72a847dbe44f40ea1fcf340"
            },
            "downloads": -1,
            "filename": "llama_index_packs_timescale_vector_autoretrieval-0.1.3.tar.gz",
            "has_sig": false,
            "md5_digest": "2e9a63e69220fa316f2c920fa677fc88",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8.1,<4.0",
            "size": 4684,
            "upload_time": "2024-02-22T01:43:37",
            "upload_time_iso_8601": "2024-02-22T01:43:37.649575Z",
            "url": "https://files.pythonhosted.org/packages/c2/e2/3a6801cce2b619ed8d846f008c3c44e9d14fc86020275c5c2f623cbc03cb/llama_index_packs_timescale_vector_autoretrieval-0.1.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-22 01:43:37",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "llama-index-packs-timescale-vector-autoretrieval"
}
        