Team and repository tags
========================
.. image:: https://governance.openstack.org/tc/badges/monasca-notification.svg
:target: https://governance.openstack.org/tc/reference/tags/index.html
.. Change things from this point on
Notification Engine
===================
This engine reads alarms from Kafka and then notifies the customer using
the configured notification method. Multiple notification and retry
engines can run in parallel, up to one per available Kafka partition.
Zookeeper is used to negotiate access to the Kafka partitions whenever a
new process joins or leaves the working set.
Architecture
============
The notification engine generates notifications using the following
steps:
1. Read alarms from Kafka, with no auto commit. -
   ``monasca_common.kafka.KafkaConsumer`` class
2. Determine the notification types for an alarm by reading from the MySQL
   database. - ``AlarmProcessor`` class
3. Send the notification. - ``NotificationProcessor`` class
4. Add successful notifications to a sent-notification topic. - ``NotificationEngine`` class
5. Add failed notifications to a retry topic. - ``NotificationEngine`` class
6. Commit the offset to Kafka. - ``KafkaConsumer`` class
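The ordering above (send first, commit last) can be sketched with in-memory
stand-ins for the Kafka topics and the MySQL lookup; the class and topic roles
follow the steps above, but the function and variable names here are
illustrative, not the engine's actual code:

```python
# Illustrative sketch of the notification engine's processing order.
# The lists stand in for Kafka topics; the dict stands in for the MySQL
# notification configuration.
def process_alarm(alarm, notification_methods, send, sent_topic, retry_topic):
    """Process one alarm end to end; the offset commit is the caller's last step."""
    # Step 2: determine notification types for the alarm (DB lookup stand-in).
    methods = notification_methods.get(alarm["alarm_definition_id"], [])
    for method in methods:
        # Step 3: attempt delivery.
        if send(method, alarm):
            sent_topic.append((method, alarm))   # step 4: sent-notification topic
        else:
            retry_topic.append((method, alarm))  # step 5: retry topic
    # Step 6 (commit the offset) happens only after all of the above.

sent, retry = [], []
methods = {"def-1": ["email", "webhook"]}
alarm = {"alarm_definition_id": "def-1", "state": "ALARM"}
# Pretend the email succeeds and the webhook fails.
process_alarm(alarm, methods, lambda m, a: m == "email", sent, retry)
```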
The notification engine uses three Kafka topics:
1. ``alarm_topic``: Alarms inbound to the notification engine.
2. ``notification_topic``: Successfully sent notifications.
3. ``notification_retry_topic``: Failed notifications.
A retry engine runs in parallel with the notification engine and gives
any failed notification a configurable number of extra chances at
success.
The retry engine generates notifications using the following steps:
1. Read notification JSON data from Kafka, with no auto commit. - ``KafkaConsumer`` class
2. Rebuild the notification that failed. - ``RetryEngine`` class
3. Send the notification. - ``NotificationProcessor`` class
4. Add successful notifications to a sent-notification topic. - ``RetryEngine`` class
5. Add failed notifications that have not hit the retry limit back to the
   retry topic. - ``RetryEngine`` class
6. Discard failed notifications that have hit the retry limit. - ``RetryEngine`` class
7. Commit the offset to Kafka. - ``KafkaConsumer`` class
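The retry-limit decision in steps 4 through 6 can be sketched as follows; the
``MAX_RETRIES`` constant and the ``retry_count`` field are illustrative stand-ins
for the engine's configurable limit and internal bookkeeping:

```python
MAX_RETRIES = 5  # illustrative; the real limit is configurable

def handle_retry(notification, send, sent_topic, retry_topic):
    """One pass of the retry engine over a previously failed notification."""
    if send(notification):
        sent_topic.append(notification)       # success: sent-notification topic
    elif notification["retry_count"] < MAX_RETRIES:
        notification["retry_count"] += 1      # not at the limit: back to retry topic
        retry_topic.append(notification)
    # else: at the limit, the notification is discarded (step 6)

sent, retries = [], []
failed = {"name": "webhook-1", "retry_count": MAX_RETRIES}
handle_retry(failed, lambda _: False, sent, retries)  # at limit and still failing
```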
The retry engine uses two Kafka topics:
1. ``notification_retry_topic``: Notifications that need to be retried.
2. ``notification_topic``: Successfully sent notifications.
Fault Tolerance
---------------
When reading from the alarm topic, offsets are not committed immediately;
they are committed only after processing. This allows processing to
continue even when some notifications are slow. In the event of a
catastrophic failure, some notifications may have been sent even though
the corresponding alarms were never acknowledged. This is an acceptable
failure mode: it is better to send a notification twice than not at all.
When a major error is encountered, the general approach is to exit the
daemon, which allows the other processes to renegotiate access to the
Kafka partitions. It is also assumed that the notification engine is run
by a process supervisor which restarts it on failure. In this way, any
errors which are not easy to recover from are handled automatically by
the service restarting and the active daemon switching to another
instance.
Though this should cover all errors, there is a risk that an alarm or a
set of alarms is processed and notifications are sent out multiple
times. To minimize this risk, a number of techniques are used:
- Timeouts are implemented for all notification types.
- An alarm TTL is utilized. Any alarm older than the TTL is not
processed.
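The TTL check amounts to comparing the alarm's timestamp against a configured
maximum age. A minimal sketch, in which the 600-second default and the function
name are illustrative assumptions:

```python
import time

ALARM_TTL_SECONDS = 600  # illustrative value; configured in practice

def is_expired(alarm_timestamp, now=None, ttl=ALARM_TTL_SECONDS):
    """Return True when the alarm is older than the TTL and must be skipped."""
    now = time.time() if now is None else now
    return (now - alarm_timestamp) > ttl

# An alarm from an hour ago is past a 10-minute TTL; one from 30s ago is not.
stale = is_expired(alarm_timestamp=1_000_000, now=1_000_000 + 3600)
fresh = is_expired(alarm_timestamp=1_000_000, now=1_000_000 + 30)
```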
Operation
=========
``oslo.config`` is used for handling configuration options. A sample
configuration file ``etc/monasca/notification.conf.sample`` can be
generated by running:
::
tox -e genconfig
To run the service using the default config file location
of `/etc/monasca/notification.conf`:
::
monasca-notification
To run the service and explicitly specify the config file:
::
monasca-notification --config-file /etc/monasca/monasca-notification.conf
Monitoring
----------
StatsD is incorporated into the daemon and sends all statistics to the
StatsD server launched by monasca-agent. The default host and port are
**localhost:8125**.
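StatsD counters are plain UDP datagrams in the ``name:value|c`` text format, so
incrementing, say, ``ConsumedFromKafka`` can be sketched with nothing but the
standard library (the daemon itself uses a StatsD client library rather than
raw sockets; this sketch only shows the wire format):

```python
import socket

def statsd_incr(name, host="localhost", port=8125):
    """Send a one-shot StatsD counter increment (fire-and-forget UDP)."""
    datagram = f"{name}:1|c"  # StatsD counter wire format: name:value|c
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("ascii"), (host, port))
    return datagram

payload = statsd_incr("ConsumedFromKafka")
```

Because the transport is UDP, the send succeeds whether or not a StatsD server
is listening, which is why instrumentation adds negligible overhead.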
- Counters

  - ConsumedFromKafka
  - AlarmsFailedParse
  - AlarmsNoNotification
  - NotificationsCreated
  - NotificationsSentSMTP
  - NotificationsSentWebhook
  - NotificationsSentPagerduty
  - NotificationsSentFailed
  - NotificationsInvalidType
  - AlarmsFinished
  - PublishedToKafka

- Timers

  - ConfigDBTime
  - SendNotificationTime
Plugins
-------
The following notification plugins are available:
- Email
- HipChat
- Jira
- PagerDuty
- Slack
- Webhook
The plugins can be configured via the Monasca Notification config file. In
general you will need to follow these steps to enable a plugin:
- Make sure that the plugin is enabled in the config file
- Make sure that the plugin is configured in the config file
- Restart the Monasca Notification service
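As an illustration, enabling the email plugin might look like the following.
The option names and values here are assumptions for the sketch; consult the
generated sample config file for the authoritative option list in your
release:

```ini
[notification_types]
enabled = email

[email_notifier]
server = smtp.example.org
port = 25
from_addr = monasca@example.org
```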
PagerDuty plugin
~~~~~~~~~~~~~~~~
The PagerDuty plugin supports the PagerDuty v1 Events API. The first step
is to `configure`_ a service in PagerDuty which uses this API. Once
configured, the service will be assigned an integration key. This key should be
used as the ``ADDRESS`` field when creating the notification type, for example:
::
monasca notification-create pd_notification pagerduty a30d5560c5ce4239a6f52a01a15850ca
The default settings for the plugin, including the v1 Events API URL, should
be sufficient to get started, but it is worth checking that the PagerDuty
Events v1 API URL matches the one provided in the example Monasca Notification
config file.
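Under the v1 Events API, triggering an incident is a JSON POST in which the
integration key appears as ``service_key``. A minimal payload can be sketched
as follows; the description text and helper name are illustrative, and the
sketch builds the body without performing the HTTP request:

```python
import json

def build_pagerduty_v1_event(service_key, description):
    """Build a minimal PagerDuty v1 Events API trigger payload."""
    return {
        "service_key": service_key,  # the integration key used as ADDRESS
        "event_type": "trigger",     # v1 API: trigger / acknowledge / resolve
        "description": description,
    }

payload = build_pagerduty_v1_event(
    "a30d5560c5ce4239a6f52a01a15850ca",  # key from the example above
    "dummy_alarm transitioned to ALARM",
)
body = json.dumps(payload)
```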
Slack plugin
~~~~~~~~~~~~
To use the Slack plugin you must first configure an incoming `webhook`_
for the Slack channel you wish to post notifications to. The notification can
then be created as follows:
::
monasca notification-create slack_notification slack https://hooks.slack.com/services/MY/SECRET/WEBHOOK/URL
Note that whilst it is also possible to use a token instead of a webhook,
this approach is now `deprecated`_.
By default the Slack notification will dump all available information into
the alert. For example, a notification may be posted to Slack which looks
like this:
::
{
"metrics":[
{
"dimensions":{
"hostname":"operator"
},
"id":null,
"name":"cpu.user_perc"
}
],
"alarm_id":"20a54a65-44b8-4ac9-a398-1f2d888827d2",
"state":"ALARM",
"alarm_timestamp":1556703552,
"tenant_id":"62f7a7a314904aa3ab137d569d6b4fde",
"old_state":"OK",
"alarm_description":"Dummy alarm",
"message":"Thresholds were exceeded for the sub-alarms: count(cpu.user_perc, deterministic) >= 1.0 with the values: [1.0]",
"alarm_definition_id":"78ce7b53-f7e6-4b51-88d0-cb741e7dc906",
"alarm_name":"dummy_alarm"
}
The format of the above message can be customised with a Jinja template. All fields
from the raw Slack message are available in the template. For example, you may
configure the plugin as follows:
::
[notification_types]
enabled = slack
[slack_notifier]
message_template = /etc/monasca/slack_template.j2
timeout = 10
ca_certs = /etc/ssl/certs/ca-bundle.crt
insecure = False
With the following contents of `/etc/monasca/slack_template.j2`:
::
{{ alarm_name }} has triggered on {% for item in metrics %}host {{ item.dimensions.hostname }}{% if not loop.last %}, {% endif %}{% endfor %}.
With this configuration, the raw Slack message above would be transformed
into:
::
dummy_alarm has triggered on host operator.
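The template's loop is equivalent to joining one ``host <name>`` entry per
metric with commas, which a small stdlib-only sketch can verify (this mimics
the Jinja logic rather than invoking Jinja itself):

```python
def render_slack_message(alarm_name, metrics):
    """Mimic the example Jinja template: one 'host <name>' entry per metric."""
    hosts = ", ".join(
        "host " + m["dimensions"]["hostname"] for m in metrics
    )
    return f"{alarm_name} has triggered on {hosts}."

message = render_slack_message(
    "dummy_alarm",
    [{"dimensions": {"hostname": "operator"}}],
)
```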
Future Considerations
=====================
- More extensive load testing is needed:
  - How fast is the MySQL database? How much load do we put on it?
    Initially it makes most sense to read notification details for each
    alarm, but eventually that information may need to be cached.
  - How expensive are commits to Kafka for every message we read? Should
    we commit every N messages?
  - How efficient is the default Kafka consumer batch size?
  - Currently we can get ~200 notifications per second per
    NotificationEngine instance using webhooks to a local HTTP server. Is
    that fast enough?
  - Are we putting too much load on Kafka at ~200 commits per second?
.. _webhook: https://api.slack.com/incoming-webhooks
.. _deprecated: https://api.slack.com/custom-integrations/legacy-tokens
.. _configure: https://support.pagerduty.com/docs/services-and-integrations#section-events-api-v1