# sentry-nodestore-elastic
Sentry nodestore Elasticsearch backend
[![image](https://img.shields.io/pypi/v/sentry-nodestore-elastic.svg)](https://pypi.python.org/pypi/sentry-nodestore-elastic)
Supports Sentry 24.x and Elasticsearch 8.x
Use an Elasticsearch cluster to store node objects from Sentry
By default, self-hosted Sentry uses a PostgreSQL database for both settings and nodestore. Under high load this becomes a bottleneck: the database grows quickly and slows the entire system down.
Switching nodestore to dedicated Elasticsearch cluster provides more scalability:
- An Elasticsearch cluster can be scaled horizontally by adding more data nodes (Postgres cannot)
- Data in Elasticsearch can be sharded and replicated across data nodes, which increases throughput
- Elasticsearch rebalances automatically when new data nodes are added
- Scheduled Sentry cleanup is much faster and more stable with the Elasticsearch nodestore, since it simply deletes old indices (cleanup of a terabyte-sized PostgreSQL nodestore is a huge pain)
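The index-per-day layout is what makes cleanup cheap: retention becomes a matter of dropping whole indices instead of deleting rows. A minimal sketch of how stale indices could be selected, assuming the `sentry-YYYY-MM-DD` naming used by the migration script in this README (the retention length is an illustrative value):

``` python
from datetime import date, timedelta

def indices_to_delete(index_names, today, retention_days=30):
    """Return the daily `sentry-YYYY-MM-DD` indices older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    stale = []
    for name in index_names:
        day = date.fromisoformat(name.removeprefix("sentry-"))
        if day < cutoff:
            stale.append(name)
    return stale

# Each stale index can then be removed in a single call, e.g.
#   es.indices.delete(index=name)
names = ["sentry-2024-01-01", "sentry-2024-04-20"]
print(indices_to_delete(names, today=date(2024, 4, 25)))  # → ['sentry-2024-01-01']
```

Deleting one index is a single metadata operation, regardless of how many events it holds, which is why this scales so much better than row-by-row deletes in Postgres.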
## Installation
Rebuild the Sentry Docker image with the nodestore package installed
``` dockerfile
FROM getsentry/sentry:24.4.1
RUN pip install sentry-nodestore-elastic
```
## Configuration
Set `SENTRY_NODESTORE` in your `sentry.conf.py`
``` python
from sentry.conf.server import *  # default for sentry.conf.py; must come before the overrides below

from elasticsearch import Elasticsearch

es = Elasticsearch(
    ['https://username:password@elasticsearch:9200'],
    http_compress=True,
    request_timeout=60,
    max_retries=3,
    retry_on_timeout=True,
    # Get the certificate fingerprint with:
    # openssl s_client -connect elasticsearch:9200 < /dev/null 2>/dev/null | openssl x509 -fingerprint -noout -in /dev/stdin
    ssl_assert_fingerprint=(
        "PUT_FINGERPRINT_HERE"
    )
)

SENTRY_NODESTORE = 'sentry_nodestore_elastic.ElasticNodeStorage'
SENTRY_NODESTORE_OPTIONS = {
    'es': es,
    'refresh': False,  # ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-refresh.html
    # other ES-related options
}

INSTALLED_APPS = list(INSTALLED_APPS)
INSTALLED_APPS.append('sentry_nodestore_elastic')
INSTALLED_APPS = tuple(INSTALLED_APPS)
```
## Usage
### Set up the Elasticsearch index template
Elasticsearch should be up and running before this step; the command below creates the index template in Elasticsearch
``` shell
sentry upgrade --with-nodestore
```
Alternatively, you can create the index template manually with the JSON below. It can be customized to your needs, but the template name must be `sentry`, because the nodestore init script checks for it.
``` json
{
  "template": {
    "settings": {
      "index": {
        "number_of_shards": "3",
        "number_of_replicas": "0",
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_content"
            }
          }
        }
      }
    },
    "mappings": {
      "dynamic": "false",
      "dynamic_templates": [],
      "properties": {
        "data": {
          "type": "text",
          "index": false,
          "store": true
        },
        "timestamp": {
          "type": "date",
          "store": true
        }
      }
    },
    "aliases": {
      "sentry": {}
    }
  }
}
```
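If you prefer to manage the template from code, the same settings can be expressed as a Python dict and applied with the 8.x client's `put_index_template` API. A sketch, not the package's own init logic; the `sentry-*` index pattern is an assumption based on the daily `sentry-YYYY-MM-DD` index naming:

``` python
# Inner "template" body from the JSON above, as a Python dict
SENTRY_TEMPLATE = {
    "settings": {
        "index": {
            "number_of_shards": "3",
            "number_of_replicas": "0",
            "routing": {"allocation": {"include": {"_tier_preference": "data_content"}}},
        }
    },
    "mappings": {
        "dynamic": "false",
        "dynamic_templates": [],
        "properties": {
            "data": {"type": "text", "index": False, "store": True},
            "timestamp": {"type": "date", "store": True},
        },
    },
    "aliases": {"sentry": {}},
}

# With a connected 8.x client (the `es` object from the configuration above):
# es.indices.put_index_template(
#     name="sentry",              # the nodestore init script checks this name
#     index_patterns=["sentry-*"],
#     template=SENTRY_TEMPLATE,
# )
print(SENTRY_TEMPLATE["mappings"]["properties"]["data"]["type"])  # → text
```

Note that `data` is stored but not indexed: nodestore only ever fetches blobs by `_id`, so skipping full-text indexing keeps writes cheap.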
### Migrate data from the default Postgres nodestore to Elasticsearch
Postgres and Elasticsearch must both be accessible from the machine where you run this code
``` python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
import psycopg2

es = Elasticsearch(
    ['https://username:password@elasticsearch:9200'],
    http_compress=True,
    request_timeout=60,
    max_retries=3,
    retry_on_timeout=True,
    # Get the certificate fingerprint with:
    # openssl s_client -connect elasticsearch:9200 < /dev/null 2>/dev/null | openssl x509 -fingerprint -noout -in /dev/stdin
    ssl_assert_fingerprint=(
        "PUT_FINGERPRINT_HERE"
    )
)

conn = psycopg2.connect(dbname="sentry", user="sentry", password="password", host="hostname", port="5432")

# Estimate the total row count for progress reporting
cur = conn.cursor()
cur.execute("SELECT reltuples AS estimate FROM pg_class WHERE relname = 'nodestore_node'")
count = int(cur.fetchone()[0])
print(f"Estimated rows: {count}")
cur.close()

# A named (server-side) cursor streams rows without loading the whole table into memory
cursor = conn.cursor(name='fetch_nodes')
cursor.execute("SELECT * FROM nodestore_node ORDER BY timestamp ASC")

while True:
    records = cursor.fetchmany(size=2000)
    if not records:
        break

    bulk_data = []
    for node_id, data, timestamp in records:
        doc = {
            'data': data,
            'timestamp': timestamp.isoformat()
        }
        bulk_data.append({
            "_index": f"sentry-{timestamp.strftime('%Y-%m-%d')}",  # one index per day
            "_id": node_id,
            "_source": doc
        })

    bulk(es, bulk_data)
    count -= len(records)
    print(f"Remaining rows: {count}")

cursor.close()
conn.close()
```
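The per-row transformation inside the loop above can be pulled out into a helper, which makes it easy to sanity-check the index naming and document shape before running the full migration. A sketch; the `(id, data, timestamp)` column order follows the `nodestore_node` query used in the script:

``` python
from datetime import datetime

def row_to_action(row):
    """Map one (id, data, timestamp) row from nodestore_node to a bulk index action."""
    node_id, data, ts = row
    return {
        "_index": f"sentry-{ts.strftime('%Y-%m-%d')}",  # one index per day
        "_id": node_id,
        "_source": {"data": data, "timestamp": ts.isoformat()},
    }

action = row_to_action(("abc123", '{"event": "..."}', datetime(2024, 4, 25, 12, 0)))
print(action["_index"])  # → sentry-2024-04-25
```

Because `_id` is set to the nodestore key, re-running the migration over the same rows overwrites documents in place rather than duplicating them, so an interrupted run can simply be restarted.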