llm-elasticsearch-cache

Name: llm-elasticsearch-cache
Version: 0.2.2
Home page: https://github.com/SpazioDati/llm-elasticsearch-cache
Summary: A caching layer for LLMs that exploits Elasticsearch, fully compatible with LangChain caching, both for chat and embeddings models.
Upload time: 2024-04-08 10:14:30
Maintainer: Giacomo Berardi
Author: SpazioDati s.r.l.
Requires Python: <4.0,>=3.10
License: MIT
Keywords: langchain, elasticsearch, openai, llm, chatgpt
# llm-elasticsearch-cache

A caching layer for LLMs that exploits Elasticsearch, fully compatible with LangChain caching, both for chat and embeddings models.

## Install

```shell
pip install llm-elasticsearch-cache
```

## Chat cache usage

The LangChain cache can be used similarly to the
[other cache integrations](https://python.langchain.com/docs/integrations/llms/llm_caching).

### Basic example

```python
from langchain.globals import set_llm_cache
from llmescache.langchain import ElasticsearchCache
from elasticsearch import Elasticsearch

es_client = Elasticsearch(hosts="http://localhost:9200")
set_llm_cache(
    ElasticsearchCache(
        es_client=es_client, 
        es_index="llm-chat-cache", 
        metadata={"project": "my_chatgpt_project"}
    )
)
```
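
Once the cache is set, any LangChain chat model picks it up transparently. A minimal usage sketch, assuming `langchain-openai` is installed and an OpenAI API key is configured (the model name is just an example):

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo")

llm.invoke("Tell me a joke")  # first call hits the LLM and writes to the cache index
llm.invoke("Tell me a joke")  # the identical call is served from Elasticsearch
```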

The `es_index` parameter can also take aliases. This allows the use of 
[ILM: Manage the index lifecycle](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-lifecycle-management.html),
which we suggest considering for managing retention and controlling cache growth.
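
A minimal sketch of pointing the cache at an alias, with hypothetical index and alias names; the backing indices behind the alias can then be rotated by an ILM policy or a rollover job:

```python
from elasticsearch import Elasticsearch
from langchain.globals import set_llm_cache
from llmescache.langchain import ElasticsearchCache

es_client = Elasticsearch(hosts="http://localhost:9200")

# Create a concrete backing index and an alias pointing at it (names are examples).
es_client.indices.create(index="llm-chat-cache-000001")
es_client.indices.put_alias(index="llm-chat-cache-000001", name="llm-chat-cache")

# The cache only ever talks to the alias, so retention is managed behind it.
set_llm_cache(ElasticsearchCache(es_client=es_client, es_index="llm-chat-cache"))
```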

Look at the class docstring for all parameters.

### Index the generated text

The cached data is not searchable by default.
You can customize how the Elasticsearch document is built in order to add indexed text fields,
for example to store the text generated by the LLM.

This is done by subclassing and overriding methods.
The new cache class can also be applied to a pre-existing cache index:

```python
from llmescache.langchain import ElasticsearchCache
from elasticsearch import Elasticsearch
from langchain_core.caches import RETURN_VAL_TYPE
from typing import Any, Dict, List
from langchain.globals import set_llm_cache
import json


class SearchableElasticsearchCache(ElasticsearchCache):

    @property
    def mapping(self) -> Dict[str, Any]:
        mapping = super().mapping
        mapping["mappings"]["properties"]["parsed_llm_output"] = {"type": "text", "analyzer": "english"}
        return mapping
    
    def build_document(self, prompt: str, llm_string: str, return_val: RETURN_VAL_TYPE) -> Dict[str, Any]:
        body = super().build_document(prompt, llm_string, return_val)
        body["parsed_llm_output"] = self._parse_output(body["llm_output"])
        return body

    @staticmethod
    def _parse_output(data: List[str]) -> List[str]:
        return [json.loads(output)["kwargs"]["message"]["kwargs"]["content"] for output in data]


es_client = Elasticsearch(hosts="http://localhost:9200")
set_llm_cache(SearchableElasticsearchCache(es_client=es_client, es_index="llm-chat-cache"))
```
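
Once the generated text is indexed, cached generations can be retrieved with a plain full-text query. A minimal sketch, assuming the index and field names from the example above and an elasticsearch-py 8.x client (older clients take a `body` dict instead of the `query` argument):

```python
from elasticsearch import Elasticsearch

es_client = Elasticsearch(hosts="http://localhost:9200")

# Full-text search on the custom field added by SearchableElasticsearchCache.
response = es_client.search(
    index="llm-chat-cache",
    query={"match": {"parsed_llm_output": "elasticsearch"}},
)
for hit in response["hits"]["hits"]:
    print(hit["_source"]["parsed_llm_output"])
```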

## Embeddings cache usage

Embeddings caching is obtained by using [CacheBackedEmbeddings](https://python.langchain.com/docs/modules/data_connection/text_embedding/caching_embeddings),
instantiated in a slightly different way than shown in the official documentation.

```python
from llmescache.langchain import ElasticsearchStore
from elasticsearch import Elasticsearch
from langchain.embeddings import CacheBackedEmbeddings
from langchain_openai import OpenAIEmbeddings

es_client = Elasticsearch(hosts="http://localhost:9200")

underlying_embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
store = ElasticsearchStore(
    es_client=es_client, 
    es_index="llm-embeddings-cache",
    namespace=underlying_embeddings.model,
    metadata={"project": "my_llm_project"}
)
cached_embeddings = CacheBackedEmbeddings(
    underlying_embeddings, 
    store
)
```
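
Continuing from the snippet above, the wrapper behaves like any other embeddings implementation; only texts missing from the cache are sent to the underlying model:

```python
# First call computes the vectors and stores them in the Elasticsearch index;
# repeating the call returns the cached vectors without hitting the embeddings API.
vectors = cached_embeddings.embed_documents(["hello world", "cache me"])
vectors_again = cached_embeddings.embed_documents(["hello world", "cache me"])
```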

Similarly to the chat cache, one can subclass `ElasticsearchStore` in order to index vectors for search.

```python
from llmescache.langchain import ElasticsearchStore
from typing import Any, Dict, List

class SearchableElasticsearchStore(ElasticsearchStore):

    @property
    def mapping(self) -> Dict[str, Any]:
        mapping = super().mapping
        mapping["mappings"]["properties"]["vector"] = {"type": "dense_vector", "dims": 1536, "index": True, "similarity": "dot_product"}
        return mapping
    
    def build_document(self, llm_input: str, vector: List[float]) -> Dict[str, Any]:
        body = super().build_document(llm_input, vector)
        body["vector"] = vector
        return body
```
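
With the vectors indexed, the cache index itself can be queried with approximate kNN search. A minimal sketch using Elasticsearch 8.x kNN syntax; the query vector is a placeholder and would normally come from embedding the query text:

```python
from elasticsearch import Elasticsearch

es_client = Elasticsearch(hosts="http://localhost:9200")

# Approximate kNN search over the cached vectors (field and index names from above).
response = es_client.search(
    index="llm-embeddings-cache",
    knn={
        "field": "vector",
        "query_vector": [0.1] * 1536,  # placeholder: embed the query text instead
        "k": 5,
        "num_candidates": 50,
    },
)
```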

Be aware that `CacheBackedEmbeddings` does 
[not currently support caching queries](https://api.python.langchain.com/en/latest/embeddings/langchain.embeddings.cache.CacheBackedEmbeddings.html#langchain.embeddings.cache.CacheBackedEmbeddings.embed_query):
this means that text queries, for vector searches, won't be cached.
However, by overriding the `embed_query` method one should be able to implement it easily.
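
A minimal sketch of such an override, assuming query vectors can share the store used for documents (a dedicated store or namespace would keep the two separate); `QueryCachingEmbeddings` is a hypothetical name:

```python
from typing import List

from langchain.embeddings import CacheBackedEmbeddings


class QueryCachingEmbeddings(CacheBackedEmbeddings):
    """Hypothetical extension of CacheBackedEmbeddings that also caches embed_query."""

    def embed_query(self, text: str) -> List[float]:
        # Look the query text up in the document store first.
        cached = self.document_embedding_store.mget([text])[0]
        if cached is not None:
            return cached
        # On a miss, embed with the underlying model and store the result.
        vector = self.underlying_embeddings.embed_query(text)
        self.document_embedding_store.mset([(text, vector)])
        return vector
```

It would be used in place of `CacheBackedEmbeddings` in the example above, e.g. `QueryCachingEmbeddings(underlying_embeddings, store)`.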
            
