circuit-maintenance-parser


- Name: circuit-maintenance-parser
- Version: 2.6.0
- Home page: https://github.com/networktocode/circuit-maintenance-parser
- Summary: Python library to parse Circuit Maintenance notifications and return a structured data back
- Upload time: 2024-04-04 13:27:34
- Author: Network to Code
- Requires Python: <4.0,>=3.8
- License: Apache-2.0
- Keywords: parser, circuit, maintenance
# circuit-maintenance-parser

`circuit-maintenance-parser` is a Python library that parses circuit maintenance notifications from Network Service Providers (NSPs), converting heterogeneous formats to a well-defined structured format.

## Context

Every network depends on external circuits provided by NSPs who interconnect them to the Internet, to office branches or to
external service providers such as Public Clouds.

These services occasionally require operation windows for upgrades or fixes, which usually take the form of **circuit maintenance periods**.
NSPs generally notify customers of these upcoming events so that customers can take actions to minimize the impact on the regular usage of the related circuits.

The challenge faced by many customers is that almost every NSP defines its own maintenance notification format, even though in the
end the relevant information is mostly the same across NSPs. This library is built to parse notification formats from
several providers and to always return the same object structure, which makes the notifications easier to process afterwards.

The format of this output follows the [BCOP](https://github.com/jda/maintnote-std/blob/master/standard.md) defined
during a NANOG meeting that aimed to promote the usage of the iCalendar format. Indeed, if the NSP is using the
proposed iCalendar format, the parser is straightforward and there is no need to define custom logic, but this library
enables supporting other providers that are not using this proposed practice, getting the same outcome.

You can leverage this library in your automation framework to process circuit maintenance notifications and use the standardized [`Maintenance`](https://github.com/networktocode/circuit-maintenance-parser/blob/develop/circuit_maintenance_parser/output.py) model to handle your received circuit maintenance notifications in a simple way. Every `Maintenance` object contains the following attributes:

- **provider**: identifies the provider of the service that is the subject of the maintenance notification.
- **account**: identifies an account associated with the service that is the subject of the maintenance notification.
- **maintenance_id**: contains text that uniquely identifies (at least within the context of a specific provider) the maintenance that is the subject of the notification.
- **circuits**: list of circuits affected by the maintenance notification and their specific impact. Note that in a maintenance canceled or completed notification, some providers omit the circuit list, so this may be blank for maintenance notifications with a status of CANCELLED or COMPLETED.
- **start**: timestamp that defines the starting date/time of the maintenance in GMT.
- **end**: timestamp that defines the ending date/time of the maintenance in GMT.
- **stamp**: timestamp that defines the update date/time of the maintenance in GMT.
- **organizer**: defines the contact information included in the original notification.
- **status**: defines the overall status or confirmation for the maintenance.¹
- **summary**: human-readable details about this maintenance notification. May be an empty string.
- **sequence**: a sequence number for notifications involving this maintenance window. In practice, this is generally redundant with the **stamp** field and will be defaulted to `1` for most non-iCalendar parsed notifications.²
- **uid**: a unique (?) identifier for a thread of related notifications. In practice, this is generally redundant with the **maintenance_id** field and will be defaulted to `0` for most non-iCalendar parsed notifications.

> Please refer to the [BCOP](https://github.com/jda/maintnote-std/blob/master/standard.md) for more details about the standardized meaning of these attributes.

¹ Per the BCOP, the **status** (`X-MAINTNOTE-STATUS`) is an optional field in iCalendar notifications. However, a `Maintenance` object will always contain a `status` value; when an iCalendar notification omits this field, the `status` is set to `"NO-CHANGE"`, and it's up to the consumer of this library to decide how to handle this case. Parsers of other notification formats are responsible for setting an appropriate value for this field based on the notification contents, and may or may not include `"NO-CHANGE"` as one of the possible reported values.

² Per the BCOP, the **sequence** is a mandatory field in iCalendar notifications. However, some NSPs have been seen to send notifications which, while otherwise consistent with the BCOP, omit the `SEQUENCE` field; in such cases, this library will report a sequence number of `-1`.
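
The two footnotes above leave some decisions to the consumer. As a minimal sketch of what that handling might look like, the example below uses a hypothetical `ParsedNotice` dataclass as a stand-in for the fields discussed (it is not the library's `Maintenance` class), and the `"UNKNOWN"` fallback is an assumption of this sketch:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedNotice:
    """Hypothetical stand-in exposing only the fields used in this sketch."""

    maintenance_id: str
    status: str
    sequence: int


def effective_status(notice: ParsedNotice, previous_status: Optional[str]) -> str:
    """Resolve "NO-CHANGE" against the last status seen for this maintenance."""
    if notice.status == "NO-CHANGE":
        # The notification carried no status: keep what we already knew.
        # "UNKNOWN" is a placeholder chosen for this sketch only.
        return previous_status or "UNKNOWN"
    return notice.status


def has_usable_sequence(notice: ParsedNotice) -> bool:
    """A sequence of -1 signals that the SEQUENCE field was missing."""
    return notice.sequence >= 0


update = ParsedNotice("WorkOrder-31415", "NO-CHANGE", -1)
print(effective_status(update, previous_status="CONFIRMED"))  # CONFIRMED
print(has_usable_sequence(update))  # False
```

When the sequence is not usable, ordering related notifications by `stamp` is one possible fallback, since the two fields are described above as generally redundant.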

## Workflow

1. Instantiate a `Provider`, either directly or via the `init_provider` method, which returns the instance corresponding to the selected type.
2. Get an instance of the `NotificationData` class. This instance groups together `DataParts` which each contain some content and a specific type (that will match a specific `Parser`). For example, a `NotificationData` might describe a received email message, with `DataParts` corresponding to the subject line and body of the email. There are factory methods to initialize a `NotificationData` describing a single chunk of binary data, as well as others to initialize one directly from a raw email message or `email.message.EmailMessage` instance.
3. Each `Provider` uses one or more `Processors` that will be used to build `Maintenances` when the `Provider.get_maintenances(data)` method is called.
4. Each `Processor` class uses one or more `Parsers` to process each type of data that it handles. It can have custom logic to combine the parsed data from multiple `Parsers` to create the final `Maintenance` object.
5. Each `Parser` class supports one or a set of related data types, and implements the `Parser.parse()` method used to retrieve a `Dict` with the relevant keys/values.

<p align="center">
<img src="https://raw.githubusercontent.com/networktocode/circuit-maintenance-parser/develop/docs/images/new_workflow.png" width="800" class="center">
</p>

By default, there is a `GenericProvider` that supports a `SimpleProcessor` using the standard `ICal` `Parser`. This is the easiest way to start using the library when the provider follows the reference iCalendar standard.
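
To make the layering in the workflow more concrete, here is a toy sketch of the `Provider` → `Processor` → `Parser` chain. All three `Toy*` classes are hypothetical stand-ins written for this illustration; they do not reproduce the library's classes, and the "first processor that succeeds wins" behavior is an assumption of the sketch.

```python
from typing import Dict, List, Tuple


class ToyParser:
    """Stand-in parser: turns 'KEY: value' lines into a dict."""

    data_type = "text"

    def parse(self, raw: bytes) -> List[Dict]:
        fields = {}
        for line in raw.decode().splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip().lower()] = value.strip()
        return [fields]


class ToyProcessor:
    """Stand-in processor: runs each parser on the parts whose type matches."""

    def __init__(self, parsers):
        self.parsers = parsers

    def process(self, parts: List[Tuple[str, bytes]]) -> List[Dict]:
        merged: Dict = {}
        for part_type, content in parts:
            for parser in self.parsers:
                if parser.data_type == part_type:
                    for item in parser.parse(content):
                        merged.update(item)
        return [merged]


class ToyProvider:
    """Stand-in provider: tries processors in order, returns the first success."""

    def __init__(self, processors):
        self.processors = processors

    def get_maintenances(self, parts: List[Tuple[str, bytes]]) -> List[Dict]:
        errors = []
        for processor in self.processors:
            try:
                return processor.process(parts)
            except Exception as exc:  # collect the failure and try the next one
                errors.append(str(exc))
        raise ValueError(f"all processors failed: {errors}")


provider = ToyProvider([ToyProcessor([ToyParser()])])
result = provider.get_maintenances([("text", b"STATUS: CONFIRMED\nID: TTN-1")])
print(result)  # [{'status': 'CONFIRMED', 'id': 'TTN-1'}]
```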

### Supported Providers

#### Supported providers using the BCOP standard

- Arelion (previously Telia)
- EuNetworks
- EXA (formerly GTT) (\*)
- NTT
- PacketFabric
- Telstra (\*)

#### Supported providers based on other parsers

- AWS
- AquaComms
- BSO
- Cogent
- Colt
- Crown Castle Fiber
- Equinix
- EXA (formerly GTT) (\*)
- HGC
- Global Cloud Xchange
- Google
- Lumen
- Megaport
- Momentum
- Netflix (AS2906 only)
- Seaborn
- Sparkle
- Telstra (\*)
- Turkcell
- Verizon
- Windstream
- Zayo

(\*) Providers in both lists, with BCOP standard and nonstandard parsers.

> Note: Because these providers do not support the BCOP standard natively, the implemented parsers may have gaps that will be refined as new test cases appear. We encourage you to report related **issues**!

#### LLM-powered Parsers

The library supports an optional parser that leverages Large Language Models (LLMs) to provide best-effort parsing when the provider-specific parsers have not been successful.

> Warning: Some of these integrations, such as OpenAI, require installing extras. Check the [extras section](#extras).

When the appropriate environment variable(s) are set (see below), these LLM parsers are automatically appended after all existing processors for each defined Provider.

> These integrations may incur API usage costs. Use them carefully! As an order of magnitude, parsing one email with the OpenAI gpt-3.5-turbo model costs about $0.004.

These are the currently supported LLM integrations:

- `PARSER_LLM_QUESTION_STR` (Optional), question to overwrite the default one. Change it carefully. It has precedence over `PARSER_LLM_QUESTION_FILEPATH`
- `PARSER_LLM_QUESTION_FILEPATH` (Optional), a path to a file that contains a question to overwrite the default one.

- [OpenAI](https://openai.com/product), these are the supported ENVs:
  - `PARSER_OPENAI_API_KEY` (Required): OpenAI API Key.
  - `PARSER_OPENAI_MODEL` (Optional): The LLM model to use, defaults to "gpt-3.5-turbo".
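
The precedence between the two question variables can be sketched as follows. This is only an illustration of the rule stated above, not the library's code: `resolve_question` and `DEFAULT_QUESTION` are hypothetical names, and the default text is a placeholder.

```python
import os
from typing import Mapping

# Placeholder for this sketch; the library ships its own default question.
DEFAULT_QUESTION = "placeholder default question"


def resolve_question(env: Mapping[str, str] = os.environ) -> str:
    """Pick the LLM question: inline string first, then file, then default."""
    question = env.get("PARSER_LLM_QUESTION_STR")
    if question:
        # The inline string takes precedence over the file path.
        return question
    path = env.get("PARSER_LLM_QUESTION_FILEPATH")
    if path:
        with open(path, encoding="utf-8") as handle:
            return handle.read().strip()
    return DEFAULT_QUESTION


# Both variables set: the file is never opened, so the path need not exist.
print(resolve_question({
    "PARSER_LLM_QUESTION_STR": "What is the maintenance window?",
    "PARSER_LLM_QUESTION_FILEPATH": "/path/that/is/not/read.txt",
}))
```

Passing the environment as a mapping keeps the function easy to exercise without touching the real process environment.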

### Metadata

Each `Maintenance` comes with a `metadata` attribute to provide information about the provider used and the process and parsers used in the successful parsing of the maintenance.

This information helps you validate the content of a `Maintenance`: it may have been generated by an LLM-powered parser, in which case the confidence level is lower than with a pre-defined parser. You can check the `Metadata.generated_by_llm` boolean to find out.
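
One way to act on this flag is to route LLM-generated results to a review queue. The sketch below assumes only that each object has a `metadata` attribute with the boolean `generated_by_llm` field (the name that appears in the metadata output shown later in this document); the `SimpleNamespace` objects are stand-ins for real `Maintenance` instances, and `split_by_confidence` is a hypothetical helper.

```python
from types import SimpleNamespace


def split_by_confidence(maintenances):
    """Separate parser-generated results from LLM-generated ones."""
    trusted, needs_review = [], []
    for maintenance in maintenances:
        if maintenance.metadata.generated_by_llm:
            needs_review.append(maintenance)
        else:
            trusted.append(maintenance)
    return trusted, needs_review


# Stand-in objects carrying only the attributes this sketch reads.
parsed = SimpleNamespace(
    maintenance_id="A", metadata=SimpleNamespace(generated_by_llm=False)
)
guessed = SimpleNamespace(
    maintenance_id="B", metadata=SimpleNamespace(generated_by_llm=True)
)

trusted, needs_review = split_by_confidence([parsed, guessed])
print([m.maintenance_id for m in trusted])       # ['A']
print([m.maintenance_id for m in needs_review])  # ['B']
```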

## Installation

The library is available as a Python package on PyPI and can be installed with pip:
`pip install circuit-maintenance-parser`

### Extras

#### OpenAI

`pip install circuit-maintenance-parser[openai]`

## How to use it?

The library requires two things:

- The `notificationdata`: this is the data that the library will check to extract the maintenance notifications. It can be simple (only one data type and content, such as an iCalendar notification) or more complex (with multiple data parts of different types, such as from an email).
- The `provider` identifier: used to select the proper `Provider` which contains the `processor` logic to take the proper `Parsers` and use the data that they extract. By default, the `GenericProvider` (used when no other provider type is defined) will support parsing of `iCalendar` notifications using the recommended format.

### Python Library

The first step is to define the `Provider` that we will use to parse the notifications. As mentioned, there is a `GenericProvider` that implements the gold standard format and can be reused for any notification that follows it.

```python
from circuit_maintenance_parser import init_provider

generic_provider = init_provider()

print(type(generic_provider))
# <class 'circuit_maintenance_parser.provider.GenericProvider'>
```

However, some `Providers` don't fully implement the standard: some information may be missing (for example, the `organizer` email), or custom logic may be needed to combine information. For these cases, the library offers custom `Providers`:

```python
ntt_provider = init_provider("ntt")

print(type(ntt_provider))
# <class 'circuit_maintenance_parser.provider.NTT'>
```

Once we have the `Provider` ready, we need to initialize the data to process. This is the `NotificationData`, which can be initialized from a simple content and type, or from more complex structures such as an email.

```python
from circuit_maintenance_parser import NotificationData

raw_data = b"""BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Maint Note//https://github.com/maint-notification//
BEGIN:VEVENT
SUMMARY:Maint Note Example
DTSTART;VALUE=DATE-TIME:20151010T080000Z
DTEND;VALUE=DATE-TIME:20151010T100000Z
DTSTAMP;VALUE=DATE-TIME:20151010T001000Z
UID:42
SEQUENCE:1
X-MAINTNOTE-PROVIDER:example.com
X-MAINTNOTE-ACCOUNT:137.035999173
X-MAINTNOTE-MAINTENANCE-ID:WorkOrder-31415
X-MAINTNOTE-IMPACT:OUTAGE
X-MAINTNOTE-OBJECT-ID;X-MAINTNOTE-OBJECT-IMPACT=NO-IMPACT:acme-widgets-as-a-service
X-MAINTNOTE-OBJECT-ID;X-MAINTNOTE-OBJECT-IMPACT=OUTAGE:acme-widgets-as-a-service-2
X-MAINTNOTE-STATUS:TENTATIVE
ORGANIZER;CN="Example NOC":mailto:noone@example.com
END:VEVENT
END:VCALENDAR
"""

data_to_process = NotificationData.init_from_raw("ical", raw_data)

print(type(data_to_process))
# <class 'circuit_maintenance_parser.data.NotificationData'>
```

Finally, we retrieve the maintenances from the data by calling the `get_maintenances` method of the `Provider` instance. The result is a `List` because a notification can contain multiple maintenances:

```python
maintenances = generic_provider.get_maintenances(data_to_process)

print(maintenances[0].to_json())
{
"account": "137.035999173",
"circuits": [
{
"circuit_id": "acme-widgets-as-a-service",
"impact": "NO-IMPACT"
},
{
"circuit_id": "acme-widgets-as-a-service-2",
"impact": "OUTAGE"
}
],
"end": 1444471200,
"maintenance_id": "WorkOrder-31415",
"organizer": "mailto:noone@example.com",
"provider": "example.com",
"sequence": 1,
"stamp": 1444435800,
"start": 1444464000,
"status": "TENTATIVE",
"summary": "Maint Note Example",
"uid": "42"
}
```
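
The `start`, `end` and `stamp` values in this output are integers that read as Unix epoch seconds. A small sketch of turning them into timezone-aware datetimes, using the numbers copied from the example above (`to_utc` is a hypothetical helper, not part of the library):

```python
from datetime import datetime, timezone

# Values copied from the JSON output above.
window = {"start": 1444464000, "end": 1444471200, "stamp": 1444435800}


def to_utc(epoch_seconds: int) -> datetime:
    """Convert epoch seconds into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


start, end = to_utc(window["start"]), to_utc(window["end"])
print(start.isoformat())  # 2015-10-10T08:00:00+00:00
print(end.isoformat())    # 2015-10-10T10:00:00+00:00
print(end - start)        # 2:00:00
```

These match the `DTSTART` and `DTEND` values of the iCalendar input (`20151010T080000Z` and `20151010T100000Z`).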

Notice that we get the same result from the same data with either the `GenericProvider` or the `NTT` provider, because both use exactly the same `Processor` and `Parser`. The only difference is that `NTT` notifications come without `organizer` and `provider`, so the `NTT` `Provider` fills these in with default values. In this case the original notification contains all the necessary information, so the defaults are not used.

```python
ntt_maintenances = ntt_provider.get_maintenances(data_to_process)
# Both providers produce identical results for this notification.
assert ntt_maintenances == maintenances
```

Every maintenance contains a `metadata` attribute that describes how it was parsed:

```python
print(maintenances[0].metadata)
provider='genericprovider' processor="SimpleProcessor" parsers=["ICal"], generated_by_llm=False
```

### CLI

There is also a `cli` entry point `circuit-maintenance-parser` which offers easy access to the library using a few arguments:

- `data-file`: file storing the notification.
- `data-type`: `ical`, `html` or `email`, depending on the data type.
- `provider-type`: to choose the right `Provider`. If empty, the `GenericProvider` is used.

```bash
circuit-maintenance-parser --data-file "/tmp/___ZAYO TTN-00000000 Planned MAINTENANCE NOTIFICATION___.eml" --data-type email --provider-type zayo
Circuit Maintenance Notification #0
{
  "account": "some account",
  "circuits": [
    {
      "circuit_id": "/OGYX/000000/ /ZYO /",
      "impact": "OUTAGE"
    }
  ],
  "end": 1601035200,
  "maintenance_id": "TTN-00000000",
  "organizer": "mr@zayo.com",
  "provider": "zayo",
  "sequence": 1,
  "stamp": 1599436800,
  "start": 1601017200,
  "status": "CONFIRMED",
  "summary": "Zayo will implement planned maintenance to troubleshoot and restore degraded span",
  "uid": "0"
}
```

## How to Extend the Library?

Even though the library aims to include support for as many providers as possible, it's likely that not all the thousands of NSP are supported and you may need to add support for some new one. Adding a new `Provider` is quite straightforward, and in the following example we are adding support for an imaginary provider, ABCDE, that uses HTML notifications.

The first step is creating a new file: `circuit_maintenance_parser/parsers/abcde.py`. This file will contain all the custom parsers needed for the provider, and it will import the base classes for each parser type from `circuit_maintenance_parser.parser`. In the example, we only need to import `Html` and implement in the child class the methods that it requires, in this case `parse_html()`, which returns all the data that this `Parser` can extract (here, a list containing a single `dict`). We also have two helper methods, `_parse_bs` and `_parse_tables`, that implement the logic to navigate the notification data.

```python
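from typing import Dict, List

import bs4  # type: ignore
from bs4.element import ResultSet  # type: ignore

from circuit_maintenance_parser.parser import Html


class HtmlParserABCDE1(Html):
    """Parser for the HTML notifications of the imaginary ABCDE provider."""

    def parse_html(self, soup: ResultSet) -> List[Dict]:
        """Collect the data found in <b> tags and tables into one dict."""
        data = {}
        self._parse_bs(soup.find_all("b"), data)
        self._parse_tables(soup.find_all("table"), data)
        return [data]

    def _parse_bs(self, btags: ResultSet, data: Dict):
        """Fill `data` from the <b> tags (logic omitted)."""
        ...

    def _parse_tables(self, tables: ResultSet, data: Dict):
        """Fill `data` from the tables (logic omitted)."""
        ...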
```

The next step is to create the new `Provider` by defining a new class in `circuit_maintenance_parser/provider.py`. This class inherits from `GenericProvider` and only needs to define two attributes:

- `_processors`: is a `list` of `Processor` instances that uses several data `Parsers`. In this example, we don't need to create a new custom `Processor` because the combined logic serves well (the most likely case), and we only need to use the newly defined `HtmlParserABCDE1` and also the generic `EmailDateParser` that extracts the email date. Also notice that you could have multiple `Processors` with different `Parsers` in this list, supporting several formats.
- `_default_organizer`: This is a default helper to fill the `organizer` attribute in the `Maintenance` if the information is not part of the original notification.

```python
class ABCDE(GenericProvider):
    _processors: List[GenericProcessor] = [
        CombinedProcessor(data_parsers=[EmailDateParser, HtmlParserABCDE1]),
    ]
    _default_organizer = "noc@abcde.com"
```

And expose the new `Provider` in `circuit_maintenance_parser/__init__.py`:

```python
from .provider import (
    GenericProvider,
    ABCDE,
    ...
)

SUPPORTED_PROVIDERS = (
    GenericProvider,
    ABCDE,
    ...
)
```

Last, but not least, you should update the tests!

- Test the new `Parser` in `tests/unit/test_parsers.py`
- Test the new `Provider` logic in `tests/unit/test_e2e.py`

... adding the necessary data samples in `tests/unit/data/abcde/`.

> You can anonymize your IPv4 and IPv6 addresses using `invoke anonymize-ips --local`. Keep in mind that only the IPv4 addresses reserved for documentation purposes (RFC 5737: "192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24") are preserved, in case you need to check these IPs in your test output (unlikely).
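
If you want to check which addresses in a sample fall inside the RFC 5737 documentation ranges listed above, the standard `ipaddress` module is enough. This is an independent sketch and does not show how the `anonymize-ips` task is implemented; `is_rfc5737` is a hypothetical helper name.

```python
import ipaddress

# The three IPv4 documentation ranges named in the note above (RFC 5737).
DOCUMENTATION_NETS = [
    ipaddress.ip_network(net)
    for net in ("192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24")
]


def is_rfc5737(address: str) -> bool:
    """Return True if `address` is an IPv4 documentation address."""
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and any(ip in net for net in DOCUMENTATION_NETS)


print(is_rfc5737("192.0.2.10"))   # True
print(is_rfc5737("8.8.8.8"))      # False
print(is_rfc5737("2001:db8::1"))  # False (IPv6 is outside RFC 5737)
```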

# Contributing

Pull requests are welcome and are automatically built and tested against multiple versions of Python through Travis CI.

The project is following Network to Code software development guidelines and is leveraging:

- Black, Pylint, Mypy, Bandit and pydocstyle for Python linting and formatting.
- Unit and integration tests to ensure the library is working properly.

## Local Development

### Requirements

- Install `poetry`
- Install dependencies and library locally: `poetry install`
- Run CI tests locally: `invoke tests --local`

### How to add a new Circuit Maintenance provider?

1. Define the `Parsers` (inheriting from one of the generic `Parsers` or from a new one) that will extract the data from the notification, which could contain multiple `DataParts`. The `data_type` of the `Parser` and the `DataPart` have to match. The custom `Parsers` will be placed in the `parsers` folder.
2. Update the `unit/test_parsers.py` with the new parsers, providing some data to test and validate the extracted data.
3. Define a new `Provider` inheriting from the `GenericProvider`, defining the `Processors` and the respective `Parsers` to be used. Maybe you can reuse some of the generic `Processors` or maybe you will need to create a custom one. If this is the case, place it in the `processors` folder.
   - The `Provider` also supports the definition of a `_include_filter` and a `_exclude_filter` to limit the notifications that are actually processed, avoiding false positive errors for notification that are not relevant.
4. Update the `unit/test_e2e.py` with the new provider, providing some data to test and validate the final `Maintenances` created.
5. **Expose the new `Provider` class** updating the map `SUPPORTED_PROVIDERS` in `circuit_maintenance_parser/__init__.py` to officially expose the `Provider`.
6. You can run some tests here to verify that your new unit tests do not cause issues with existing tests, and in general they work as expected. You can do this by running `pytest --log-cli-level=DEBUG --capture=tee-sys`. You can narrow down the tests that you want to execute with the `-k` flag. If successful, your results should look similar to the following:

```
-> % pytest --log-cli-level=DEBUG --capture=tee-sys -k test_parsers
...omitted debug logs...
====================================================== 99 passed, 174 deselected, 17 warnings in 10.35s ======================================================
```

7. Run some final CI tests locally with `invoke tests --local` to ensure that there are no linting or formatting issues with your changes. You should aim for a code score of 10/10. See the example below:

```
-> % invoke tests --local
LOCAL - Running command black --check --diff .
All done! ✨ 🍰 ✨
41 files would be left unchanged.
LOCAL - Running command flake8 .
LOCAL - Running command find . -name "*.py" | xargs pylint
************* Module tasks
tasks.py:4:0: W0402: Uses of a deprecated module 'distutils.util' (deprecated-module)

--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
```

### How to debug circuit-maintenance-parser library locally

1. `poetry install` updates the library and its dependencies locally.
2. `circuit-maintenance-parser` is now built with your recent local changes.

If you were to add loggers or debuggers to one of the classes:

```python
class HtmlParserZayo1(Html):
    def parse_bs(self, btags: ResultSet, data: dict):
        """Parse B tag."""
        raise Exception('Debugging exception')
```

After running `poetry install`:

```
-> % circuit-maintenance-parser --data-file ~/Downloads/zayo.eml --data-type email --provider-type zayo
Provider processing failed: Failed creating Maintenance notification for Zayo.
Details:
- Processor CombinedProcessor from Zayo failed due to: Debugging exception
```

> Note: `invoke build` will result in an error due to no Dockerfile. This is expected as the library runs simple pytest testing without a container.

```
-> % invoke build
Building image circuit-maintenance-parser:2.2.2-py3.8
#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 2B done
#1 DONE 0.0s
WARNING: failed to get git remote url: fatal: No remote configured to list refs from.
ERROR: failed to solve: rpc error: code = Unknown desc = failed to solve with frontend dockerfile.v0: failed to read dockerfile: open /var/lib/docker/tmp/buildkit-mount1243547759/Dockerfile: no such file or directory
```

## Questions

For any questions or comments, please check the [FAQ](FAQ.md) first and feel free to swing by the [Network to Code slack channel](https://networktocode.slack.com/) (channel #networktocode).
Sign up [here](http://slack.networktocode.com/)

## License notes

This library uses a Basic World Cities Database by Pareto Software, LLC, the owner of Simplemaps.com: The Provider offers a Basic World Cities Database free of charge. This database is licensed under the Creative Commons Attribution 4.0 license as described at: https://creativecommons.org/licenses/by/4.0/.


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/networktocode/circuit-maintenance-parser",
    "name": "circuit-maintenance-parser",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.8",
    "maintainer_email": null,
    "keywords": "parser, circuit, maintenance",
    "author": "Network to Code",
    "author_email": "opensource@networktocode.com",
    "download_url": "https://files.pythonhosted.org/packages/9b/d7/0b8e4607980cf1c192d8ab6e009ce1e5ee20fe762f28eead57f88fdfe81b/circuit_maintenance_parser-2.6.0.tar.gz",
    "platform": null,
    "description": "# circuit-maintenance-parser\n\n`circuit-maintenance-parser` is a Python library that parses circuit maintenance notifications from Network Service Providers (NSPs), converting heterogeneous formats to a well-defined structured format.\n\n## Context\n\nEvery network depends on external circuits provided by NSPs who interconnect them to the Internet, to office branches or to\nexternal service providers such as Public Clouds.\n\nObviously, these services occasionally require operation windows to upgrade or to fix related issues, and usually, they happen in the form of **circuit maintenance periods**.\nNSPs generally notify customers of these upcoming events so that customers can take actions to minimize the impact on the regular usage of the related circuits.\n\nThe challenge faced by many customers is that almost every NSP defines its own maintenance notification format, even though in the\nend the relevant information is mostly the same across NSPs. This library is built to parse notification formats from\nseveral providers and to return always the same object struct which will make it easier to process them afterwards.\n\nThe format of this output follows the [BCOP](https://github.com/jda/maintnote-std/blob/master/standard.md) defined\nduring a NANOG meeting that aimed to promote the usage of the iCalendar format. Indeed, if the NSP is using the\nproposed iCalendar format, the parser is straightforward and there is no need to define custom logic, but this library\nenables supporting other providers that are not using this proposed practice, getting the same outcome.\n\nYou can leverage this library in your automation framework to process circuit maintenance notifications and use the standardized [`Maintenance`](https://github.com/networktocode/circuit-maintenance-parser/blob/develop/circuit_maintenance_parser/output.py) model to handle your received circuit maintenance notifications in a simple way. 
Every `Maintenance` object contains the following attributes:\n\n- **provider**: identifies the provider of the service that is the subject of the maintenance notification.\n- **account**: identifies an account associated with the service that is the subject of the maintenance notification.\n- **maintenance_id**: contains text that uniquely identifies (at least within the context of a specific provider) the maintenance that is the subject of the notification.\n- **circuits**: list of circuits affected by the maintenance notification and their specific impact. Note that in a maintenance canceled or completed notification, some providers omit the circuit list, so this may be blank for maintenance notifications with a status of CANCELLED or COMPLETED.\n- **start**: timestamp that defines the starting date/time of the maintenance in GMT.\n- **end**: timestamp that defines the ending date/time of the maintenance in GMT.\n- **stamp**: timestamp that defines the update date/time of the maintenance in GMT.\n- **organizer**: defines the contact information included in the original notification.\n- **status**: defines the overall status or confirmation for the maintenance.\u00b9\n- **summary**: human-readable details about this maintenance notification. May be an empty string.\n- **sequence**: a sequence number for notifications involving this maintenance window. In practice, this is generally redundant with the **stamp** field and will be defaulted to `1` for most non-iCalendar parsed notifications.\u00b2\n- **uid**: a unique (?) identifier for a thread of related notifications. 
In practice, this is generally redundant with the **maintenance_id** field and will be defaulted to `0` for most non-iCalendar parsed notifications.\n\n> Please, refer to the [BCOP](https://github.com/jda/maintnote-std/blob/master/standard.md) to more details about the standardized meaning of these attributes.\n\n\u00b9 Per the BCOP, the **status** (`X-MAINTNOTE_STATUS`) is an optional field in iCalendar notifications. However, a `Maintenance` object will always contain a `status` value; in the case where an iCalendar notification omits this field, the `status` will be set to `\"NO-CHANGE\"`, and it's up to the consumer of this library to determine how to appropriately handle this case. Parsers of other notification formats are responsible for setting an appropriate value for this field based on the notification contents, and may or may not include `\"NO-CHANGE\"` as one of the possible reported values.\n\n\u00b2 Per the BCOP, the **sequence** is a mandatory field in iCalendar notifications. However, some NSPs have been seen to send notifications which, while otherwise consistent with the BCOP, omit the `SEQUENCE` field; in such cases, this library will report a sequence number of `-1`.\n\n## Workflow\n\n1. We instantiate a `Provider`, directly or via the `init_provider` method, that depending on the selected type will return the corresponding instance.\n2. Get an instance of the `NotificationData` class. This instance groups together `DataParts` which each contain some content and a specific type (that will match a specific `Parser`). For example, a `NotificationData` might describe a received email message, with `DataParts` corresponding to the subject line and body of the email. There are factory methods to initialize a `NotificationData` describing a single chunk of binary data, as well as others to initialize one directly from a raw email message or `email.message.EmailMessage` instance.\n3. 
Each `Provider` uses one or more `Processors` that will be used to build `Maintenances` when the `Provider.get_maintenances(data)` method is called.\n4. Each `Processor` class uses one or more `Parsers` to process each type of data that it handles. It can have custom logic to combine the parsed data from multiple `Parsers` to create the final `Maintenance` object.\n5. Each `Parser` class supports one or a set of related data types, and implements the `Parser.parse()` method used to retrieve a `Dict` with the relevant keys/values.\n\n<p align=\"center\">\n<img src=\"https://raw.githubusercontent.com/networktocode/circuit-maintenance-parser/develop/docs/images/new_workflow.png\" width=\"800\" class=\"center\">\n</p>\n\nBy default, there is a `GenericProvider` that supports a `SimpleProcessor` using the standard `ICal` `Parser`, being the easiest path to start using the library in case the provider uses the reference iCalendar standard.\n\n### Supported Providers\n\n#### Supported providers using the BCOP standard\n\n- Arelion (previously Telia)\n- EuNetworks\n- EXA (formerly GTT) (\\*)\n- NTT\n- PacketFabric\n- Telstra (\\*)\n\n#### Supported providers based on other parsers\n\n- AWS\n- AquaComms\n- BSO\n- Cogent\n- Colt\n- Crown Castle Fiber\n- Equinix\n- EXA (formerly GTT) (\\*)\n- HGC\n- Global Cloud Xchange\n- Google\n- Lumen\n- Megaport\n- Momentum\n- Netflix (AS2906 only)\n- Seaborn\n- Sparkle\n- Telstra (\\*)\n- Turkcell\n- Verizon\n- Windstream\n- Zayo\n\n(\\*) Providers in both lists, with BCOP standard and nonstandard parsers.\n\n> Note: Because these providers do not support the BCOP standard natively, maybe there are some gaps on the implemented parser that will be refined with new test cases. 
We encourage you to report related **issues**!\n\n#### LLM-powered Parsers\n\nThe library supports an optional parser option leveraging Large Language Models (LLM) to provide best-effort parsing when the specific parsers have not been successful.\n\n> Warning: Some of these integrations, such as OpenAI, require of extras installations parameters. Check the [extras section](#extras)\n\nWhen the appropriate environment variable(s) are set (see below), these LLM parsers are automatically appended after all existing processors for each defined Provider.\n\n> These integrations may involve some costs for API usage. Use it carefully! As an order of magnitude, a parsing of an email with OpenAI GPT gpt-3.5-turbo model costs $0.004.\n\nThese are the currently supported LLM integrations:\n\n- `PARSER_LLM_QUESTION_STR` (Optional), question to overwrite the default one. Change it carefully. It has precedence over `PARSER_LLM_QUESTION_FILEPATH`\n- `PARSER_LLM_QUESTION_FILEPATH` (Optional), a path to a file that contains a question to overwrite the default one.\n\n- [OpenAI](https://openai.com/product), these are the supported ENVs:\n  - `PARSER_OPENAI_API_KEY` (Required): OpenAI API Key.\n  - `PARSER_OPENAI_MODEL` (Optional): The LLM model to use, defaults to \"gpt-3.5-turbo\".\n\n### Metadata\n\nEach `Maintenance` comes with a `metadata` attribute to provide information about the provider used and the process and parsers used in the successful parsing of the maintenance.\n\nThis information is relevant to validate the actual content of the `Maintenance` because it may be generated using an LLM-powered parser which means that the confidence level is lower than using a pre-defined parser. 
The `Metadata.generated_by_llm` boolean tells you whether this was the case.\n\n## Installation\n\nThe library is available as a Python package on PyPI and can be installed with pip:\n`pip install circuit-maintenance-parser`\n\n### Extras\n\n#### OpenAI\n\n`pip install circuit-maintenance-parser[openai]`\n\n## How to use it?\n\nThe library requires two things:\n\n- The `notificationdata`: this is the data that the library will check to extract the maintenance notifications. It can be simple (only one data type and content, such as an iCalendar notification) or more complex (with multiple data parts of different types, such as from an email).\n- The `provider` identifier: used to select the proper `Provider`, which contains the `processor` logic that picks the proper `Parsers` and uses the data that they extract. By default, the `GenericProvider` (used when no other provider type is defined) supports parsing of `iCalendar` notifications using the recommended format.\n\n### Python Library\n\nThe first step is to define the `Provider` that we will use to parse the notifications. 
As mentioned, there is a `GenericProvider` that implements the gold standard format and can be reused for any notification matching the expectations.\n\n```python\nfrom circuit_maintenance_parser import init_provider\n\ngeneric_provider = init_provider()\n\ntype(generic_provider)\n<class 'circuit_maintenance_parser.provider.GenericProvider'>\n```\n\nHowever, some providers don't fully implement the standard: some information may be missing (for example, the `organizer` email), or custom logic may be required to combine information. For these cases, we allow custom `Providers`:\n\n```python\nntt_provider = init_provider(\"ntt\")\n\ntype(ntt_provider)\n<class 'circuit_maintenance_parser.provider.NTT'>\n```\n\nOnce the `Provider` is ready, we need to initialize the data to process. We call it `NotificationData`, and it can be initialized from a simple content and type or from more complex structures, such as an email.\n\n```python\nfrom circuit_maintenance_parser import NotificationData\n\nraw_data = b\"\"\"BEGIN:VCALENDAR\nVERSION:2.0\nPRODID:-//Maint Note//https://github.com/maint-notification//\nBEGIN:VEVENT\nSUMMARY:Maint Note Example\nDTSTART;VALUE=DATE-TIME:20151010T080000Z\nDTEND;VALUE=DATE-TIME:20151010T100000Z\nDTSTAMP;VALUE=DATE-TIME:20151010T001000Z\nUID:42\nSEQUENCE:1\nX-MAINTNOTE-PROVIDER:example.com\nX-MAINTNOTE-ACCOUNT:137.035999173\nX-MAINTNOTE-MAINTENANCE-ID:WorkOrder-31415\nX-MAINTNOTE-IMPACT:OUTAGE\nX-MAINTNOTE-OBJECT-ID;X-MAINTNOTE-OBJECT-IMPACT=NO-IMPACT:acme-widgets-as-a-service\nX-MAINTNOTE-OBJECT-ID;X-MAINTNOTE-OBJECT-IMPACT=OUTAGE:acme-widgets-as-a-service-2\nX-MAINTNOTE-STATUS:TENTATIVE\nORGANIZER;CN=\"Example NOC\":mailto:noone@example.com\nEND:VEVENT\nEND:VCALENDAR\n\"\"\"\n\ndata_to_process = NotificationData.init_from_raw(\"ical\", raw_data)\n\ntype(data_to_process)\n<class 'circuit_maintenance_parser.data.NotificationData'>\n```\n\nFinally, we retrieve the maintenances (it is a `List` because a notification can contain multiple 
maintenances) from the data by calling the `get_maintenances` method of the `Provider` instance:\n\n```python\nmaintenances = generic_provider.get_maintenances(data_to_process)\n\nprint(maintenances[0].to_json())\n{\n\"account\": \"137.035999173\",\n\"circuits\": [\n{\n\"circuit_id\": \"acme-widgets-as-a-service\",\n\"impact\": \"NO-IMPACT\"\n},\n{\n\"circuit_id\": \"acme-widgets-as-a-service-2\",\n\"impact\": \"OUTAGE\"\n}\n],\n\"end\": 1444471200,\n\"maintenance_id\": \"WorkOrder-31415\",\n\"organizer\": \"mailto:noone@example.com\",\n\"provider\": \"example.com\",\n\"sequence\": 1,\n\"stamp\": 1444435800,\n\"start\": 1444464000,\n\"status\": \"TENTATIVE\",\n\"summary\": \"Maint Note Example\",\n\"uid\": \"42\"\n}\n```\n\nNotice that the `GenericProvider` and the `NTT` provider return the same result from the same data, because they use exactly the same `Processor` and `Parser`. The only difference is that `NTT` notifications normally come without `organizer` and `provider`, so the `NTT` `Provider` fills them in with default values. In this case, the original notification contains all the necessary information, so the defaults are not used.\n\n```python\nntt_maintenances = ntt_provider.get_maintenances(data_to_process)\n# Both providers share the same Processor and Parser, so the results match.\nassert ntt_maintenances == maintenances\n```\n\nEvery maintenance carries a `metadata` attribute that describes how it was parsed:\n\n```python\nprint(maintenances[0].metadata)\nprovider='genericprovider' processor=\"SimpleProcessor\" parsers=[\"ICal\"], generated_by_llm=False\n```\n\n### CLI\n\nThere is also a `cli` entry point `circuit-maintenance-parser` which offers easy access to the library using a few arguments:\n\n- `data-file`: file storing the notification.\n- `data-type`: `ical`, `html` or `email`, depending on the data type.\n- `provider-type`: to choose the right `Provider`. 
If empty, the `GenericProvider` is used.\n\n```bash\ncircuit-maintenance-parser --data-file \"/tmp/___ZAYO TTN-00000000 Planned MAINTENANCE NOTIFICATION___.eml\" --data-type email --provider-type zayo\nCircuit Maintenance Notification #0\n{\n  \"account\": \"some account\",\n  \"circuits\": [\n    {\n      \"circuit_id\": \"/OGYX/000000/ /ZYO /\",\n      \"impact\": \"OUTAGE\"\n    }\n  ],\n  \"end\": 1601035200,\n  \"maintenance_id\": \"TTN-00000000\",\n  \"organizer\": \"mr@zayo.com\",\n  \"provider\": \"zayo\",\n  \"sequence\": 1,\n  \"stamp\": 1599436800,\n  \"start\": 1601017200,\n  \"status\": \"CONFIRMED\",\n  \"summary\": \"Zayo will implement planned maintenance to troubleshoot and restore degraded span\",\n  \"uid\": \"0\"\n}\n```\n\n## How to Extend the Library?\n\nEven though the library aims to support as many providers as possible, it is likely that not all of the thousands of NSPs are covered, and you may need to add support for a new one. Adding a new `Provider` is quite straightforward, and in the following example we add support for an imaginary provider, ABCDE, that uses HTML notifications.\n\nThe first step is creating a new file: `circuit_maintenance_parser/parsers/abcde.py`. This file will contain all the custom parsers needed for the provider, and it will import the base classes for each parser type from `circuit_maintenance_parser.parser`. In the example, we only need to import `Html` and implement in the child class the methods required by the base class, in this case `parse_html()`, which returns the data that this `Parser` can extract as a list of `dict`s. 
In this case, we have two helper methods, `_parse_bs` and `_parse_tables`, that implement the logic to navigate the notification data.\n\n```python\nfrom typing import Dict, List\nimport bs4  # type: ignore\nfrom bs4.element import ResultSet  # type: ignore\nfrom circuit_maintenance_parser.parser import Html\n\nclass HtmlParserABCDE1(Html):\n    def parse_html(self, soup: ResultSet) -> List[Dict]:\n        # Collect everything this parser can extract into a single dict.\n        data = {}\n        self._parse_bs(soup.find_all(\"b\"), data)\n        self._parse_tables(soup.find_all(\"table\"), data)\n        return [data]\n\n    def _parse_bs(self, btags: ResultSet, data: Dict):\n        ...\n\n    def _parse_tables(self, tables: ResultSet, data: Dict):\n        ...\n```\n\nThe next step is to create the new `Provider` by defining a new class in `circuit_maintenance_parser/provider.py`. This class, which inherits from `GenericProvider`, only needs to define two attributes:\n\n- `_processors`: a `list` of `Processor` instances, each using one or more data `Parsers`. In this example, we don't need to create a new custom `Processor` because the combined logic serves well (the most likely case), and we only need to use the newly defined `HtmlParserABCDE1` and also the generic `EmailDateParser` that extracts the email date. 
Also notice that you could have multiple `Processors` with different `Parsers` in this list, supporting several formats.\n- `_default_organizer`: a default value used to fill the `organizer` attribute in the `Maintenance` if the information is not part of the original notification.\n\n```python\nclass ABCDE(GenericProvider):\n    _processors: List[GenericProcessor] = [\n        CombinedProcessor(data_parsers=[EmailDateParser, HtmlParserABCDE1]),\n    ]\n    _default_organizer = \"noc@abcde.com\"\n```\n\nAnd expose the new `Provider` in `circuit_maintenance_parser/__init__.py`:\n\n```python\nfrom .provider import (\n    GenericProvider,\n    ABCDE,\n    ...\n)\n\nSUPPORTED_PROVIDERS = (\n    GenericProvider,\n    ABCDE,\n    ...\n)\n```\n\nLast, but not least, you should update the tests!\n\n- Test the new `Parser` in `tests/unit/test_parsers.py`\n- Test the new `Provider` logic in `tests/unit/test_e2e.py`\n\n... adding the necessary data samples in `tests/unit/data/abcde/`.\n\n> You can anonymize your IPv4 and IPv6 addresses using `invoke anonymize-ips --local`. Keep in mind that only IPv4 addresses for documentation purposes (RFC5737: \"192.0.2.0/24\", \"198.51.100.0/24\", \"203.0.113.0/24\") are preserved, in case you need to check these IPs in your test output (unlikely).\n\n# Contributing\n\nPull requests are welcome and are automatically built and tested against multiple versions of Python through Travis CI.\n\nThe project follows Network to Code software development guidelines and leverages:\n\n- Black, Pylint, Mypy, Bandit and pydocstyle for Python linting and formatting.\n- Unit and integration tests to ensure the library is working properly.\n\n## Local Development\n\n### Requirements\n\n- Install `poetry`\n- Install dependencies and library locally: `poetry install`\n- Run CI tests locally: `invoke tests --local`\n\n### How to add a new Circuit Maintenance provider?\n\n1. 
Define the `Parsers` (inheriting from some of the generic `Parsers` or a new one) that will extract the data from the notification, which could contain multiple `DataParts`. The `data_type` of the `Parser` and the `DataPart` have to match. The custom `Parsers` will be placed in the `parsers` folder.\n2. Update the `unit/test_parsers.py` with the new parsers, providing some data to test and validate the extracted data.\n3. Define a new `Provider` inheriting from the `GenericProvider`, defining the `Processors` and the respective `Parsers` to be used. You may be able to reuse some of the generic `Processors`, or you may need to create a custom one. If this is the case, place it in the `processors` folder.\n   - The `Provider` also supports the definition of an `_include_filter` and an `_exclude_filter` to limit the notifications that are actually processed, avoiding false positive errors for notifications that are not relevant.\n4. Update the `unit/test_e2e.py` with the new provider, providing some data to test and validate the final `Maintenances` created.\n5. **Expose the new `Provider` class** by updating the map `SUPPORTED_PROVIDERS` in `circuit_maintenance_parser/__init__.py` to officially expose the `Provider`.\n6. Run the tests to verify that your new unit tests work as expected and do not cause issues with existing tests. You can do this by running `pytest --log-cli-level=DEBUG --capture=tee-sys`. You can narrow down the tests that you want to execute with the `-k` flag. If successful, your results should look similar to the following:\n\n```\n-> % pytest --log-cli-level=DEBUG --capture=tee-sys -k test_parsers\n...omitted debug logs...\n====================================================== 99 passed, 174 deselected, 17 warnings in 10.35s ======================================================\n```\n\n7. Run some final CI tests locally to ensure that there are no linting/formatting issues with your changes. 
You should look to get a code score of 10/10. See the example below: `invoke tests --local`\n\n```\n-> % invoke tests --local\nLOCAL - Running command black --check --diff .\nAll done! \u2728 \ud83c\udf70 \u2728\n41 files would be left unchanged.\nLOCAL - Running command flake8 .\nLOCAL - Running command find . -name \"*.py\" | xargs pylint\n************* Module tasks\ntasks.py:4:0: W0402: Uses of a deprecated module 'distutils.util' (deprecated-module)\n\n--------------------------------------------------------------------\nYour code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)\n```\n\n### How to debug circuit-maintenance-parser library locally\n\n1. `poetry install` updates the library and its dependencies locally.\n2. `circuit-maintenance-parser` is now built with your recent local changes.\n\nIf you were to add loggers or debuggers to one of the classes:\n\n```python\nclass HtmlParserZayo1(Html):\n    def parse_bs(self, btags: ResultSet, data: dict):\n        \"\"\"Parse B tag.\"\"\"\n        raise Exception('Debugging exception')\n```\n\nAfter running `poetry install`:\n\n```\n-> % circuit-maintenance-parser --data-file ~/Downloads/zayo.eml --data-type email --provider-type zayo\nProvider processing failed: Failed creating Maintenance notification for Zayo.\nDetails:\n- Processor CombinedProcessor from Zayo failed due to: Debugging exception\n```\n\n> Note: `invoke build` will result in an error due to no Dockerfile. 
This is expected as the library runs simple pytest testing without a container.\n\n```\n-> % invoke build\nBuilding image circuit-maintenance-parser:2.2.2-py3.8\n#1 [internal] load build definition from Dockerfile\n#1 transferring dockerfile: 2B done\n#1 DONE 0.0s\nWARNING: failed to get git remote url: fatal: No remote configured to list refs from.\nERROR: failed to solve: rpc error: code = Unknown desc = failed to solve with frontend dockerfile.v0: failed to read dockerfile: open /var/lib/docker/tmp/buildkit-mount1243547759/Dockerfile: no such file or directory\n```\n\n## Questions\n\nFor any questions or comments, please check the [FAQ](FAQ.md) first and feel free to swing by the [Network to Code slack channel](https://networktocode.slack.com/) (channel #networktocode).\nSign up [here](http://slack.networktocode.com/)\n\n## License notes\n\nThis library uses a Basic World Cities Database by Pareto Software, LLC, the owner of Simplemaps.com: The Provider offers a Basic World Cities Database free of charge. This database is licensed under the Creative Commons Attribution 4.0 license as described at: https://creativecommons.org/licenses/by/4.0/.\n\n",
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "Python library to parse Circuit Maintenance notifications and return a structured data back",
    "version": "2.6.0",
    "project_urls": {
        "Homepage": "https://github.com/networktocode/circuit-maintenance-parser",
        "Repository": "https://github.com/networktocode/circuit-maintenance-parser"
    },
    "split_keywords": [
        "parser",
        " circuit",
        " maintenance"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2e8840fca3dfdf11d890d5afad6ad2a68096f50e905828d6540bc4377f332a89",
                "md5": "0ce99b194ecf7a48520097702cfaec8d",
                "sha256": "db6b627aa850f2f4ebdd9976a3a6d5c664b4b4fce94e09932a6d593282b1e991"
            },
            "downloads": -1,
            "filename": "circuit_maintenance_parser-2.6.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0ce99b194ecf7a48520097702cfaec8d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.8",
            "size": 1536443,
            "upload_time": "2024-04-04T13:27:32",
            "upload_time_iso_8601": "2024-04-04T13:27:32.222613Z",
            "url": "https://files.pythonhosted.org/packages/2e/88/40fca3dfdf11d890d5afad6ad2a68096f50e905828d6540bc4377f332a89/circuit_maintenance_parser-2.6.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9bd70b8e4607980cf1c192d8ab6e009ce1e5ee20fe762f28eead57f88fdfe81b",
                "md5": "5eec0aa7e005f3ad85632ea8c68f2876",
                "sha256": "6cc7e76ff27861876fbdefc31d223dc15dbe687ff7bd0de474ff2c9218848d73"
            },
            "downloads": -1,
            "filename": "circuit_maintenance_parser-2.6.0.tar.gz",
            "has_sig": false,
            "md5_digest": "5eec0aa7e005f3ad85632ea8c68f2876",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.8",
            "size": 1498336,
            "upload_time": "2024-04-04T13:27:34",
            "upload_time_iso_8601": "2024-04-04T13:27:34.843309Z",
            "url": "https://files.pythonhosted.org/packages/9b/d7/0b8e4607980cf1c192d8ab6e009ce1e5ee20fe762f28eead57f88fdfe81b/circuit_maintenance_parser-2.6.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-04 13:27:34",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "networktocode",
    "github_project": "circuit-maintenance-parser",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "circuit-maintenance-parser"
}
        