# sentry-nodestore-elastic
Sentry nodestore Elasticsearch backend
[![image](https://img.shields.io/pypi/v/sentry-nodestore-elastic.svg)](https://pypi.python.org/pypi/sentry-nodestore-elastic)
Supports Sentry 24.x and Elasticsearch 8.x
Use an Elasticsearch cluster to store node objects from Sentry
By default, self-hosted Sentry uses a PostgreSQL database for both settings and nodestore. Under high load this becomes a bottleneck: the database grows quickly and slows the entire system down.
Switching nodestore to dedicated Elasticsearch cluster provides more scalability:
- An Elasticsearch cluster can be scaled horizontally by adding more data nodes (Postgres cannot)
- Data in Elasticsearch can be sharded and replicated across data nodes, which increases throughput
- Elasticsearch rebalances automatically when new data nodes are added
- Scheduled Sentry cleanup is much faster and more stable with the Elasticsearch nodestore, since it simply deletes old indices (cleanup of a terabyte-sized PostgreSQL nodestore is a huge pain)
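The index-per-day layout is what makes cleanup cheap: retention becomes a matter of dropping whole indices instead of deleting rows. A minimal sketch of how stale indices could be selected, assuming the `sentry-YYYY-MM-DD` naming used by the migration script in this README (the retention length is an illustrative value):

``` python
from datetime import date, timedelta

def indices_to_delete(index_names, today, retention_days=30):
    """Return the daily `sentry-YYYY-MM-DD` indices older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    stale = []
    for name in index_names:
        day = date.fromisoformat(name.removeprefix("sentry-"))
        if day < cutoff:
            stale.append(name)
    return stale

# Each stale index can then be removed in a single call, e.g.
#   es.indices.delete(index=name)
names = ["sentry-2024-01-01", "sentry-2024-04-20"]
print(indices_to_delete(names, today=date(2024, 4, 25)))  # → ['sentry-2024-01-01']
```

Deleting one index is a single metadata operation, regardless of how many events it holds, which is why this scales so much better than row-by-row deletes in Postgres.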
## Installation
Rebuild the Sentry Docker image with the nodestore package installed
``` dockerfile
FROM getsentry/sentry:24.4.1
RUN pip install sentry-nodestore-elastic
```
## Configuration
Set `SENTRY_NODESTORE` in your `sentry.conf.py`
``` python
from sentry.conf.server import *  # default for sentry.conf.py; must come before the overrides below

from elasticsearch import Elasticsearch

es = Elasticsearch(
    ['https://username:password@elasticsearch:9200'],
    http_compress=True,
    request_timeout=60,
    max_retries=3,
    retry_on_timeout=True,
    # Get the certificate fingerprint with:
    # openssl s_client -connect elasticsearch:9200 < /dev/null 2>/dev/null | openssl x509 -fingerprint -noout -in /dev/stdin
    ssl_assert_fingerprint=(
        "PUT_FINGERPRINT_HERE"
    )
)

SENTRY_NODESTORE = 'sentry_nodestore_elastic.ElasticNodeStorage'
SENTRY_NODESTORE_OPTIONS = {
    'es': es,
    'refresh': False,  # ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-refresh.html
    # other ES-related options
}

INSTALLED_APPS = list(INSTALLED_APPS)
INSTALLED_APPS.append('sentry_nodestore_elastic')
INSTALLED_APPS = tuple(INSTALLED_APPS)
```
## Usage
### Set up the Elasticsearch index template
Elasticsearch should be up and running before this step; the command below creates the index template in Elasticsearch
``` shell
sentry upgrade --with-nodestore
```
Alternatively, you can create the index template manually with the JSON below. It can be customized to your needs, but the template name must be `sentry`, because the nodestore init script checks for it.
``` json
{
  "template": {
    "settings": {
      "index": {
        "number_of_shards": "3",
        "number_of_replicas": "0",
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_content"
            }
          }
        }
      }
    },
    "mappings": {
      "dynamic": "false",
      "dynamic_templates": [],
      "properties": {
        "data": {
          "type": "text",
          "index": false,
          "store": true
        },
        "timestamp": {
          "type": "date",
          "store": true
        }
      }
    },
    "aliases": {
      "sentry": {}
    }
  }
}
```
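If you prefer to manage the template from code, the same settings can be expressed as a Python dict and applied with the 8.x client's `put_index_template` API. A sketch, not the package's own init logic; the `sentry-*` index pattern is an assumption based on the daily `sentry-YYYY-MM-DD` index naming:

``` python
# Inner "template" body from the JSON above, as a Python dict
SENTRY_TEMPLATE = {
    "settings": {
        "index": {
            "number_of_shards": "3",
            "number_of_replicas": "0",
            "routing": {"allocation": {"include": {"_tier_preference": "data_content"}}},
        }
    },
    "mappings": {
        "dynamic": "false",
        "dynamic_templates": [],
        "properties": {
            "data": {"type": "text", "index": False, "store": True},
            "timestamp": {"type": "date", "store": True},
        },
    },
    "aliases": {"sentry": {}},
}

# With a connected 8.x client (the `es` object from the configuration above):
# es.indices.put_index_template(
#     name="sentry",              # the nodestore init script checks this name
#     index_patterns=["sentry-*"],
#     template=SENTRY_TEMPLATE,
# )
print(SENTRY_TEMPLATE["mappings"]["properties"]["data"]["type"])  # → text
```

Note that `data` is stored but not indexed: nodestore only ever fetches blobs by `_id`, so skipping full-text indexing keeps writes cheap.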
### Migrate data from the default Postgres nodestore to Elasticsearch
Postgres and Elasticsearch must both be accessible from the machine where you run this code
``` python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
import psycopg2

es = Elasticsearch(
    ['https://username:password@elasticsearch:9200'],
    http_compress=True,
    request_timeout=60,
    max_retries=3,
    retry_on_timeout=True,
    # Get the certificate fingerprint with:
    # openssl s_client -connect elasticsearch:9200 < /dev/null 2>/dev/null | openssl x509 -fingerprint -noout -in /dev/stdin
    ssl_assert_fingerprint=(
        "PUT_FINGERPRINT_HERE"
    )
)

conn = psycopg2.connect(dbname="sentry", user="sentry", password="password", host="hostname", port="5432")

# Estimate the total row count for progress reporting
cur = conn.cursor()
cur.execute("SELECT reltuples AS estimate FROM pg_class WHERE relname = 'nodestore_node'")
count = int(cur.fetchone()[0])
print(f"Estimated rows: {count}")
cur.close()

# A named (server-side) cursor streams rows without loading the whole table into memory
cursor = conn.cursor(name='fetch_nodes')
cursor.execute("SELECT * FROM nodestore_node ORDER BY timestamp ASC")

while True:
    records = cursor.fetchmany(size=2000)
    if not records:
        break

    bulk_data = []
    for node_id, data, timestamp in records:
        doc = {
            'data': data,
            'timestamp': timestamp.isoformat()
        }
        bulk_data.append({
            "_index": f"sentry-{timestamp.strftime('%Y-%m-%d')}",  # one index per day
            "_id": node_id,
            "_source": doc
        })

    bulk(es, bulk_data)
    count -= len(records)
    print(f"Remaining rows: {count}")

cursor.close()
conn.close()
```
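The per-row transformation inside the loop above can be pulled out into a helper, which makes it easy to sanity-check the index naming and document shape before running the full migration. A sketch; the `(id, data, timestamp)` column order follows the `nodestore_node` query used in the script:

``` python
from datetime import datetime

def row_to_action(row):
    """Map one (id, data, timestamp) row from nodestore_node to a bulk index action."""
    node_id, data, ts = row
    return {
        "_index": f"sentry-{ts.strftime('%Y-%m-%d')}",  # one index per day
        "_id": node_id,
        "_source": {"data": data, "timestamp": ts.isoformat()},
    }

action = row_to_action(("abc123", '{"event": "..."}', datetime(2024, 4, 25, 12, 0)))
print(action["_index"])  # → sentry-2024-04-25
```

Because `_id` is set to the nodestore key, re-running the migration over the same rows overwrites documents in place rather than duplicating them, so an interrupted run can simply be restarted.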