langchain-elasticsearch

Name	langchain-elasticsearch
Version	0.3.2
home_page	https://github.com/langchain-ai/langchain-elastic
Summary	An integration package connecting Elasticsearch and LangChain
upload_time	2025-01-20 14:33:36
maintainer	None
docs_url	None
author	None
requires_python	<4.0,>=3.9
license	MIT

# langchain-elasticsearch

This package contains the LangChain integration with Elasticsearch.

## Installation

```bash
pip install -U langchain-elasticsearch
```

## Elasticsearch setup

### Elastic Cloud

You need a running Elasticsearch deployment. The easiest way to start one is through [Elastic Cloud](https://cloud.elastic.co/).
You can sign up for a [free trial](https://www.elastic.co/cloud/cloud-trial-overview).

1. [Create a deployment](https://www.elastic.co/guide/en/cloud/current/ec-create-deployment.html)
2. Get your Cloud ID:
    1. In the [Elastic Cloud console](https://cloud.elastic.co), click "Manage" next to your deployment
    2. Copy the Cloud ID and paste it into the `es_cloud_id` parameter below
3. Create an API key:
    1. In the [Elastic Cloud console](https://cloud.elastic.co), click "Open" next to your deployment
    2. In the left-hand side menu, go to "Stack Management", then to "API Keys"
    3. Click "Create API key"
    4. Enter a name for the API key and click "Create"
    5. Copy the API key and paste it into the `es_api_key` parameter below
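
Rather than pasting the Cloud ID and API key directly into source code, they can be read from the environment. The following is a minimal sketch under that assumption; the variable names `ES_CLOUD_ID` and `ES_API_KEY` and the helper `load_es_credentials` are hypothetical and are not part of this package.

```python
import os
from typing import Dict, Mapping


def load_es_credentials(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect Elastic Cloud credentials as keyword arguments.

    ES_CLOUD_ID and ES_API_KEY are hypothetical names chosen for this
    sketch; the package does not read them on its own.
    """
    missing = [name for name in ("ES_CLOUD_ID", "ES_API_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    # Keys match the `es_cloud_id` and `es_api_key` parameters used below.
    return {"es_cloud_id": env["ES_CLOUD_ID"], "es_api_key": env["ES_API_KEY"]}
```

The returned dictionary can then be unpacked into any constructor that accepts these two parameters, for example `ElasticsearchStore(index_name="your-index-name", embeddings=embeddings, **load_es_credentials())`.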

### Docker

Alternatively, you can run Elasticsearch via Docker as described in the [docs](https://python.langchain.com/docs/integrations/vectorstores/elasticsearch).

## Usage

### ElasticsearchStore

The `ElasticsearchStore` class exposes Elasticsearch as a vector store.

```python
from langchain_elasticsearch import ElasticsearchStore

embeddings = ... # use a LangChain Embeddings class or ElasticsearchEmbeddings

vectorstore = ElasticsearchStore(
    es_cloud_id="your-cloud-id",
    es_api_key="your-api-key",
    index_name="your-index-name",
    embeddings=embeddings,
)
```

### ElasticsearchRetriever

The `ElasticsearchRetriever` class can be used to implement more complex queries.
It is useful for power users, and necessary when the data was ingested outside of LangChain
(for example, by a web crawler).

```python
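from typing import Dict

from langchain_elasticsearch import ElasticsearchRetriever

# Name of the index field that holds the document text (placeholder).
text_field = "your-text-field"


def fuzzy_query(search_query: str) -> Dict:
    # Build a match query that tolerates typos in the search terms.
    return {
        "query": {
            "match": {
                text_field: {
                    "query": search_query,
                    "fuzziness": "AUTO",
                }
            },
        },
    }


fuzzy_retriever = ElasticsearchRetriever.from_es_params(
    es_cloud_id="your-cloud-id",
    es_api_key="your-api-key",
    index_name="your-index-name",
    body_func=fuzzy_query,
    content_field=text_field,
)

# Fuzzy matching lets a misspelled query such as "fooo" match close terms.
fuzzy_retriever.invoke("fooo")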
```
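
Because `body_func` is an ordinary function that returns an Elasticsearch request body, it can be built and inspected without a cluster. The sketch below is one possible variation, not part of the package: a factory that combines the fuzzy match with a `term` filter. The helper name `make_filtered_fuzzy_query` and the field names `text` and `lang` are assumptions made for illustration.

```python
from typing import Any, Callable, Dict


def make_filtered_fuzzy_query(
    text_field: str, filter_field: str, filter_value: str
) -> Callable[[str], Dict[str, Any]]:
    """Return a body function that fuzzy-matches text within a filtered subset."""

    def body_func(search_query: str) -> Dict[str, Any]:
        return {
            "query": {
                "bool": {
                    # Scored clause: typo-tolerant full-text match.
                    "must": [
                        {
                            "match": {
                                text_field: {
                                    "query": search_query,
                                    "fuzziness": "AUTO",
                                }
                            }
                        }
                    ],
                    # Non-scored clause: exact match on a keyword field.
                    "filter": [{"term": {filter_field: filter_value}}],
                }
            }
        }

    return body_func


# Hypothetical field names; replace them with the fields of your own index.
body_func = make_filtered_fuzzy_query("text", "lang", "en")
body = body_func("fooo")
```

The resulting `body_func` can be passed wherever the example above passes `fuzzy_query`.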

### ElasticsearchEmbeddings

The `ElasticsearchEmbeddings` class provides an interface to generate embeddings using a model
deployed in an Elasticsearch cluster.

```python
from langchain_elasticsearch import ElasticsearchEmbeddings

embeddings = ElasticsearchEmbeddings.from_credentials(
    model_id="your-model-id",
    input_field="your-input-field",
    es_cloud_id="your-cloud-id",
    es_api_key="your-api-key",
)
```

### ElasticsearchChatMessageHistory

The `ElasticsearchChatMessageHistory` class stores chat histories in Elasticsearch.

```python
from langchain_elasticsearch import ElasticsearchChatMessageHistory

chat_history = ElasticsearchChatMessageHistory(
    index="your-index-name",
    session_id="your-session-id",
    es_cloud_id="your-cloud-id",
    es_api_key="your-api-key",
)
```


### ElasticsearchCache

A caching layer for LLMs that uses Elasticsearch.

Simple example:

```python
from langchain.globals import set_llm_cache

from langchain_elasticsearch import ElasticsearchCache

set_llm_cache(
    ElasticsearchCache(
        es_url="http://localhost:9200",
        index_name="llm-chat-cache",
        metadata={"project": "my_chatgpt_project"},
    )
)
```

The `index_name` parameter also accepts aliases. This makes it possible to use
[index lifecycle management (ILM)](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-lifecycle-management.html),
which we suggest considering for managing retention and controlling cache growth.

See the class docstring for all parameters.
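
As a rough illustration of the retention idea, an ILM policy is a JSON document with one entry per phase. The sketch below only builds such a document as a Python dictionary; the helper name and the `7d` and `30d` durations are assumptions, and the policy still has to be registered in Elasticsearch (for example through `PUT _ilm/policy/<policy-name>`) and attached to the indices behind the alias.

```python
from typing import Any, Dict


def build_cache_ilm_policy(
    rollover_after: str = "7d", delete_after: str = "30d"
) -> Dict[str, Any]:
    """Build an ILM policy body that rolls the cache index over and expires old data.

    The durations are illustrative defaults, not recommendations.
    """
    return {
        "policy": {
            "phases": {
                # Start a new backing index once the current one is old enough.
                "hot": {"actions": {"rollover": {"max_age": rollover_after}}},
                # Delete a backing index this long after it was rolled over.
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


policy_body = build_cache_ilm_policy()
```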

#### Index the generated text

The cached data is not searchable by default.
You can customize how the Elasticsearch document is built in order to add indexed text fields
that hold, for example, the text generated by the LLM.

This is done by subclassing `ElasticsearchCache` and overriding its methods.
The new cache class can also be applied to a pre-existing cache index:

```python
import json
from typing import Any, Dict, List

from langchain.globals import set_llm_cache
from langchain_core.caches import RETURN_VAL_TYPE

from langchain_elasticsearch import ElasticsearchCache


class SearchableElasticsearchCache(ElasticsearchCache):
    @property
    def mapping(self) -> Dict[str, Any]:
        mapping = super().mapping
        mapping["mappings"]["properties"]["parsed_llm_output"] = {
            "type": "text",
            "analyzer": "english",
        }
        return mapping

    def build_document(
        self, prompt: str, llm_string: str, return_val: RETURN_VAL_TYPE
    ) -> Dict[str, Any]:
        body = super().build_document(prompt, llm_string, return_val)
        body["parsed_llm_output"] = self._parse_output(body["llm_output"])
        return body

    @staticmethod
    def _parse_output(data: List[str]) -> List[str]:
        return [
            json.loads(output)["kwargs"]["message"]["kwargs"]["content"]
            for output in data
        ]


set_llm_cache(
    SearchableElasticsearchCache(
        es_url="http://localhost:9200",
        index_name="llm-chat-cache",
    )
)
```

When overriding the mapping and the document building, 
please only make additive modifications, keeping the base mapping intact.
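
To make the parsing step concrete, the sketch below runs the same key path as `_parse_output` on a hand-written sample. The sample is a simplified assumption: real cached entries carry more keys, and outputs that are not chat messages may be shaped differently, which is why this variant skips entries it cannot parse instead of raising.

```python
import json
from typing import List


def parse_output(data: List[str]) -> List[str]:
    """Extract the message content from serialized LLM outputs.

    Entries that are not valid JSON or lack the expected nesting are skipped.
    """
    contents = []
    for output in data:
        try:
            contents.append(
                json.loads(output)["kwargs"]["message"]["kwargs"]["content"]
            )
        except (json.JSONDecodeError, KeyError, TypeError):
            # Not a serialized chat message in the expected shape.
            continue
    return contents


# Simplified, hand-written sample that follows the nesting used above.
sample = json.dumps(
    {"kwargs": {"message": {"kwargs": {"content": "Paris is the capital of France."}}}}
)
parsed = parse_output([sample, "not json"])
```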

### ElasticsearchEmbeddingsCache

Store and temporarily cache embeddings.

Embeddings are cached through [CacheBackedEmbeddings](https://python.langchain.com/docs/modules/data_connection/text_embedding/caching_embeddings), which can be instantiated with the `CacheBackedEmbeddings.from_bytes_store` method.

```python
from langchain.embeddings import CacheBackedEmbeddings
from langchain_openai import OpenAIEmbeddings

from langchain_elasticsearch import ElasticsearchEmbeddingsCache

underlying_embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

store = ElasticsearchEmbeddingsCache(
    es_url="http://localhost:9200",
    index_name="llm-chat-cache",
    metadata={"project": "my_chatgpt_project"},
    namespace="my_chatgpt_project",
)

embeddings = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings=underlying_embeddings,
    document_embedding_cache=store,
    query_embedding_cache=store,
)
```
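
The idea behind a cache-backed embedder can be shown with a small in-memory stand-in. This is a conceptual sketch only, not how `CacheBackedEmbeddings` or `ElasticsearchEmbeddingsCache` are implemented: the class `TinyCachedEmbedder`, its key scheme, and `fake_embed` are all made up for illustration.

```python
import hashlib
from typing import Callable, Dict, List


class TinyCachedEmbedder:
    """Embed texts, calling the underlying model only for texts not seen before."""

    def __init__(
        self, embed: Callable[[List[str]], List[List[float]]], namespace: str
    ) -> None:
        self._embed = embed
        self._namespace = namespace
        self._store: Dict[str, List[float]] = {}  # stands in for the Elasticsearch index

    def _key(self, text: str) -> str:
        # The namespace keeps entries from different models or projects apart.
        return self._namespace + hashlib.sha256(text.encode("utf-8")).hexdigest()

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # De-duplicate while keeping order, then keep only the cache misses.
        missing = [t for t in dict.fromkeys(texts) if self._key(t) not in self._store]
        if missing:
            for text, vector in zip(missing, self._embed(missing)):
                self._store[self._key(text)] = vector
        return [self._store[self._key(t)] for t in texts]


calls: List[List[str]] = []


def fake_embed(texts: List[str]) -> List[List[float]]:
    # Toy embedding function that records each batch it receives.
    calls.append(list(texts))
    return [[float(len(t))] for t in texts]


embedder = TinyCachedEmbedder(fake_embed, namespace="my_chatgpt_project")
first = embedder.embed_documents(["a", "bb"])
second = embedder.embed_documents(["bb", "ccc"])  # only "ccc" reaches fake_embed
```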

As with the LLM cache, you can subclass `ElasticsearchEmbeddingsCache` in order to index the vectors for search.

```python
from typing import Any, Dict, List
from langchain_elasticsearch import ElasticsearchEmbeddingsCache

class SearchableElasticsearchStore(ElasticsearchEmbeddingsCache):
    @property
    def mapping(self) -> Dict[str, Any]:
        mapping = super().mapping
        mapping["mappings"]["properties"]["vector"] = {
            "type": "dense_vector",
            "dims": 1536,
            "index": True,
            "similarity": "dot_product",
        }
        return mapping

    def build_document(self, llm_input: str, vector: List[float]) -> Dict[str, Any]:
        body = super().build_document(llm_input, vector)
        body["vector"] = vector
        return body
```
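
Note that Elasticsearch's `dot_product` similarity expects vectors of unit length. Whether a given embedding model already returns normalized vectors is something to verify for your model; the helper below is a minimal sketch (the name `l2_normalize` is made up for illustration) that normalizes a vector before it is indexed.

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit length so it is valid for dot_product similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


unit = l2_normalize([3.0, 4.0])
unit_length = math.sqrt(sum(x * x for x in unit))
```

If the vectors cannot be normalized, `cosine` is an alternative value for the `similarity` setting of a `dense_vector` field.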

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/langchain-ai/langchain-elastic",
    "name": "langchain-elasticsearch",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.9",
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/40/19/b53995433281025cd1e57cda94264dffcc0c2a85fac0626f77d3a6d51d61/langchain_elasticsearch-0.3.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "An integration package connecting Elasticsearch and LangChain",
    "version": "0.3.2",
    "project_urls": {
        "Homepage": "https://github.com/langchain-ai/langchain-elastic",
        "Repository": "https://github.com/langchain-ai/langchain-elastic",
        "Source Code": "https://github.com/langchain-ai/langchain-elastic/tree/main/libs/elasticsearch"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "329d3bd4d2ac53b5733eb634189bf97e0c2f997d200ad7f2d7587f5d13f0c684",
                "md5": "020a1512624c13d5d8c7538d112e9db6",
                "sha256": "556b6cdb559f1587d595c6b09a77a25c669dc512c3dec6c485238504ec5d7e35"
            },
            "downloads": -1,
            "filename": "langchain_elasticsearch-0.3.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "020a1512624c13d5d8c7538d112e9db6",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.9",
            "size": 45996,
            "upload_time": "2025-01-20T14:33:35",
            "upload_time_iso_8601": "2025-01-20T14:33:35.692283Z",
            "url": "https://files.pythonhosted.org/packages/32/9d/3bd4d2ac53b5733eb634189bf97e0c2f997d200ad7f2d7587f5d13f0c684/langchain_elasticsearch-0.3.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4019b53995433281025cd1e57cda94264dffcc0c2a85fac0626f77d3a6d51d61",
                "md5": "d8dfbd51fc89eb4d8264d84d94b055e4",
                "sha256": "25be786325eaac6ba517b53ea074b701d465ad4ac8908fa065df3dfc365432e2"
            },
            "downloads": -1,
            "filename": "langchain_elasticsearch-0.3.2.tar.gz",
            "has_sig": false,
            "md5_digest": "d8dfbd51fc89eb4d8264d84d94b055e4",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.9",
            "size": 38185,
            "upload_time": "2025-01-20T14:33:36",
            "upload_time_iso_8601": "2025-01-20T14:33:36.766152Z",
            "url": "https://files.pythonhosted.org/packages/40/19/b53995433281025cd1e57cda94264dffcc0c2a85fac0626f77d3a6d51d61/langchain_elasticsearch-0.3.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-20 14:33:36",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "langchain-ai",
    "github_project": "langchain-elastic",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "langchain-elasticsearch"
}
