kohlrahbi


| Field | Value |
| --- | --- |
| Name | kohlrahbi |
| Version | 0.4.1 |
| Summary | Tool to generate machine readable files from AHB documents |
| upload_time | 2024-03-28 14:35:44 |
| requires_python | >=3.11 |
| license | GPL |
| keywords | ahb, automation, bdew, edi@energy |
| requirements | attrs, click, colorama, colorlog, et-xmlfile, lxml, marshmallow, maus, more-itertools, numpy, openpyxl, packaging, pandas, python-dateutil, python-docx, pytz, six, tomlkit, typing-extensions, tzdata, xlsxwriter |

<p align="center">
  <img src="kohlrahbi-image.png" alt="kohlrahbi-logo" width="512" height="512">
</p>

# KohlrAHBi
![Unittests status badge](https://github.com/Hochfrequenz/kohlrahbi/workflows/Unittests/badge.svg)
![Coverage status badge](https://github.com/Hochfrequenz/kohlrahbi/workflows/Coverage/badge.svg)
![Linting status badge](https://github.com/Hochfrequenz/kohlrahbi/workflows/Linting/badge.svg)
![Black status badge](https://github.com/Hochfrequenz/kohlrahbi/workflows/Black/badge.svg)
![PyPI](https://img.shields.io/pypi/v/kohlrahbi)


Kohlrahbi generates machine-readable files from AHB documents.
Kohlrahbi's sister is [MIG_mose](https://github.com/Hochfrequenz/migmose).

## Rationale
German utilities exchange data using [EDIFACT](https://en.wikipedia.org/wiki/EDIFACT); this is called market communication (mako).
The _Forum Datenformate_ of the BDEW publishes the technical regulations of the EDIFACT-based market communication on [`edi-energy.de`](https://www.edi-energy.de/).
These rules are not stable but change twice a year (in theory) or a few times per year (in reality).

The specific rules, which are binding for every German utility, are more or less formalised in so-called "**A**nwendungs**h**and**b**ücher" (AHBs).
Those AHBs are basically long tables that describe:
> As a utility, if I want to exchange data about business process XYZ with a market partner, then I have to provide the following information: [...]

In total, the regulations from these Anwendungshandbücher span several thousand pages.
And by pages, we really _mean_ pages.
EDIFACT communication is basically the API between German utilities for most of their B2B processes.
However, the technical specifications of this API are
* prose
* on DIN A4 pages.

The Anwendungshandbücher are the epitome of digitization with some good intentions.

Although the AHBs are publicly available as PDF or Word files on `edi-energy.de`, they are hardly accessible in a technical sense:
* You cannot automatically extract information from the AHBs.
* You cannot run automatic comparisons between different versions.
* You cannot automatically test your own API against the set of rules described in the AHBs (as prose).
* You cannot view or visualize the information from the AHBs in any way more intuitive or practical than the raw tables from the AHB files.
* ...and many more...

The root cause of all this inaccessibility is a technical one:
Information that is theoretically structured is published in an unstructured format (PDF or Word), which is not suited for technical specifications in IT.

KohlrAHBi helps you break those chains and access the AHBs as you'd expect from technical specs: easily and automatically, instead of through hours of mindless manual work.

**KohlrAHBi takes the `.docx` files published by `edi-energy.de` as an input and returns truly machine-readable data in a variety of formats (JSON, CSV...) as a result.**

Hence, KohlrAHBi is the key to unlocking any automation potential that relies on information hidden in the Anwendungshandbücher.

We're all hoping for the day of true digitization on which this repository will become obsolete.

## Installation
Kohlrahbi is a Python-based tool.
Therefore, make sure that Python (3.11 or later) is installed on your machine.

We recommend using a virtual environment to keep your system clean.

Create a new virtual environment with
```bash
python -m venv .venv
```

How you activate the virtual environment depends on your operating system.

**Windows**
```
.venv\Scripts\activate
```
**macOS/Linux**
```
source .venv/bin/activate
```
Finally, install the package with

```bash
pip install kohlrahbi
```

## Usage

There are two ways to use kohlrahbi.
1. You can extract all prüfidentifikatoren listed in [all_known_pruefis.toml](src/kohlrahbi/all_known_pruefis.toml),
2. or you can extract one or more specific prüfidentifikatoren.

### Get all Prüfidentifikatoren
If you want to extract all prüfidentifikatoren, you can run the following command.
The following steps assume that you have cloned our [edi_energy_mirror](https://github.com/Hochfrequenz/edi_energy_mirror/) to a neighbouring directory.

```bash
kohlrahbi --input-path ../edi_energy_mirror/edi_energy_de/current --output-path ./output/ --file-type flatahb
```

This will extract all prüfidentifikatoren listed in [all_known_pruefis.toml](src/kohlrahbi/all_known_pruefis.toml) and save them in the provided output path.

### `.docx` Data Sources
kohlrahbi internally relies on a [specific naming schema](https://github.com/Hochfrequenz/kohlrahbi/blob/22a78dc076c7d5f9248cb9e8707b0cc14a2981d3/src/kohlrahbi/read_functions.py#L57) for the `.docx` files: the file name holds information about the EDIFACT format and the validity period of the AHBs contained in the file.
The easiest way to comply with this naming schema is to clone our [edi_energy_mirror](https://github.com/Hochfrequenz/edi_energy_mirror/) repository to your local machine.
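
To illustrate what "the file name holds the validity period" can mean in practice, here is a minimal sketch. It is not kohlrahbi's own parser. It assumes, purely for illustration, that a file name ends in `_<YYYYMMDD>_<YYYYMMDD>.docx` with the valid-until date first and the valid-from date second. The function name, the `ValidityPeriod` type and the example file name are hypothetical; check the linked `read_functions.py` for the schema that is actually used.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional


class ValidityPeriod(NamedTuple):
    """Hypothetical container for the two dates encoded in a file name."""

    valid_until: date
    valid_from: date


# ASSUMPTION (not taken from kohlrahbi): the name ends in
# "_<YYYYMMDD>_<YYYYMMDD>.docx", valid-until first, valid-from second.
_DATES_AT_END = re.compile(r"_(\d{8})_(\d{8})\.docx$")


def parse_validity_period(file_name: str) -> Optional[ValidityPeriod]:
    """Return the dates at the end of the file name, or None if the pattern is absent."""
    match = _DATES_AT_END.search(file_name)
    if match is None:
        return None
    # strptime raises ValueError for impossible dates such as 20231340.
    until, start = (datetime.strptime(group, "%Y%m%d").date() for group in match.groups())
    return ValidityPeriod(valid_until=until, valid_from=start)


# Hypothetical file name, made up for this example.
period = parse_validity_period("UTILMDAHB_example_20240331_20231001.docx")
```

A file whose name does not match the assumed pattern yields `None`, which makes it easy to skip unrelated documents in a directory listing.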

### Get a specific Prüfidentifikator

If you want to extract a specific prüfidentifikator, you can run the following command.

```bash
kohlrahbi --input-path ../edi_energy_mirror/edi_energy_de/current --output-path ./output/ --pruefis 13002 --file-type xlsx
```

You can also provide multiple prüfidentifikatoren.

```bash
kohlrahbi --input-path ../edi_energy_mirror/edi_energy_de/current --output-path ./output/ --pruefis 13002 --pruefis 13003 --pruefis 13005 --file-type csv
```
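
If you want to call the CLI from a script, the repeated `--pruefis` flag can be assembled programmatically. The following is a small sketch that uses only the flags shown above. The helper names `build_kohlrahbi_command` and `run_kohlrahbi` are our own and not part of kohlrahbi.

```python
import subprocess
from typing import Iterable, List


def build_kohlrahbi_command(
    input_path: str,
    output_path: str,
    file_type: str,
    pruefis: Iterable[str] = (),
) -> List[str]:
    """Assemble the argument list; each prüfidentifikator gets its own --pruefis flag."""
    command = ["kohlrahbi", "--input-path", input_path, "--output-path", output_path]
    for pruefi in pruefis:
        command += ["--pruefis", pruefi]
    command += ["--file-type", file_type]
    return command


def run_kohlrahbi(command: List[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(command, check=True)


command = build_kohlrahbi_command(
    "../edi_energy_mirror/edi_energy_de/current",
    "./output/",
    "csv",
    ["13002", "13003", "13005"],
)
# run_kohlrahbi(command)  # requires kohlrahbi to be installed and the mirror to be cloned
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with paths that contain spaces.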
### Results
There is a kohlrahbi-based CI pipeline from the edi_energy_mirror mentioned above to the repository [machine-readable_anwendungshandbuecher](https://github.com/Hochfrequenz/machine-readable_anwendungshandbuecher), where you can find scraped AHBs as JSON, CSV or Excel files.

### Export ConditionKeys and ConditionTexts
Use the `conditions` file type, for example, to export `condition.json` files to [edi_energy_ahb_conditions_and_packages](https://github.com/Hochfrequenz/edi_energy_ahb_conditions_and_packages). This works best if you pass no `--pruefis` flags: in that case, all known "Prüfidentifikatoren" are scanned, so all related conditions are gathered.
```bash
kohlrahbi --file-type conditions --input-path ../edi_energy_mirror/edi_energy_de/current --output-path ./output/edi_energy_ahb_conditions_and_packages/aktuelleFV
```

## Workflow

```mermaid
flowchart TB
    S[Start] --> RD[Read docx]
    RD --> RPT[Read all paragraphs <br> and tables]
    RPT --> I[Start iterating]
    I --> NI[Read next item]
    %% check for text paragraph %%
    NI --> CTP{Text Paragraph?}
    CTP -- Yes --> NI
    CTP -- No --> CCST{Is item just<br>Chapter or Section Title?}
    CCST -- Yes --> CTAenderunghistorie{Is Chapter Title<br>'Änderungshistorie'?}
    CTAenderunghistorie -- Yes --> EXPORT[Export Extract]
    CCST -- No --> CT{Is item a table<br>with prüfis?}
    CT -- Yes --> Extract[Create Extract]
```
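
The decision logic of the flowchart can be sketched in plain Python. This is a simplified model, not kohlrahbi's implementation: the `Item` class and its fields are hypothetical stand-ins for the paragraphs and tables read from the `.docx` file. The flowchart does not show what happens to other titles or to tables without prüfis, so the sketch assumes they are simply skipped.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Item:
    """Hypothetical stand-in for one paragraph or table of the document."""

    kind: str  # "text", "title" or "table"
    text: str = ""
    has_pruefis: bool = False


def collect_extracts(items: Iterable[Item]) -> List[Item]:
    """Walk the items in document order and collect the tables that contain prüfis."""
    extracts: List[Item] = []
    for item in items:
        if item.kind == "text":
            continue  # plain text paragraphs carry no AHB table data
        if item.kind == "title":
            if item.text == "Änderungshistorie":
                break  # the change history marks the end of the AHB tables
            continue  # ASSUMPTION: other chapter or section titles are skipped
        if item.kind == "table" and item.has_pruefis:
            extracts.append(item)
        # ASSUMPTION: tables without prüfis are skipped
    return extracts  # at this point the real tool would export the extract


document = [
    Item("text", "Introduction"),
    Item("title", "1 Some chapter"),
    Item("table", "T1", has_pruefis=True),
    Item("table", "T2", has_pruefis=False),
    Item("title", "Änderungshistorie"),
    Item("table", "T3", has_pruefis=True),
]
extracts = collect_extracts(document)
```

In this example only `T1` is collected: `T2` has no prüfis, and `T3` comes after the "Änderungshistorie" chapter title, where iteration stops.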



## Development

### Setup

To set up the development environment, you have to install the dev dependencies.

```bash
tox -e dev
```

### Run all tests and linters

To run the tests, you can use tox.

```bash
tox
```
See our [Python Template Repository](https://github.com/Hochfrequenz/python_template_repository#how-to-use-this-repository-on-your-machine) for detailed explanations.

## Contribute

You are very welcome to contribute to this repository by opening a pull request against the main branch.

## Related Tools and Context

This repository is part of the [Hochfrequenz Libraries and Tools for a truly digitized market communication](https://github.com/Hochfrequenz/digital_market_communication/).

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "kohlrahbi",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": null,
    "keywords": "ahb, automation, bdew, edi@energy",
    "author": null,
    "author_email": "Kevin Krechan <kevin.krechan@hochfrequenz.de>",
    "download_url": "https://files.pythonhosted.org/packages/d7/2c/fb696e2dc5ee8b3c2b789e0d10fbebf618c35bee63372f760104a3bdb9d0/kohlrahbi-0.4.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GPL",
    "summary": "Tool to generate machine readable files from AHB documents",
    "version": "0.4.1",
    "project_urls": {
        "Changelog": "https://github.com/Hochfrequenz/kohlrahbi/releases",
        "Homepage": "https://github.com/Hochfrequenz/kohlrahbi"
    },
    "split_keywords": [
        "ahb",
        " automation",
        " bdew",
        " edi@energy"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "553df500e58bcf18edc0c7dd519673da4a5543436e4640329c46f51114154985",
                "md5": "290778fe6047a0431cb4c5b0fa1984b4",
                "sha256": "68bc39cfcd6d19c11b86e125a3f37c61fb7735f9b4b4c2e0dba2db75b00509d2"
            },
            "downloads": -1,
            "filename": "kohlrahbi-0.4.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "290778fe6047a0431cb4c5b0fa1984b4",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 62338,
            "upload_time": "2024-03-28T14:35:42",
            "upload_time_iso_8601": "2024-03-28T14:35:42.719814Z",
            "url": "https://files.pythonhosted.org/packages/55/3d/f500e58bcf18edc0c7dd519673da4a5543436e4640329c46f51114154985/kohlrahbi-0.4.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d72cfb696e2dc5ee8b3c2b789e0d10fbebf618c35bee63372f760104a3bdb9d0",
                "md5": "00b827e336967fe8e6e444745223cd6d",
                "sha256": "5f03f11853e1b1ebfa3840bdf177102cb1fbfca952ecae0dbe8c91279cba005a"
            },
            "downloads": -1,
            "filename": "kohlrahbi-0.4.1.tar.gz",
            "has_sig": false,
            "md5_digest": "00b827e336967fe8e6e444745223cd6d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 1106072,
            "upload_time": "2024-03-28T14:35:44",
            "upload_time_iso_8601": "2024-03-28T14:35:44.164674Z",
            "url": "https://files.pythonhosted.org/packages/d7/2c/fb696e2dc5ee8b3c2b789e0d10fbebf618c35bee63372f760104a3bdb9d0/kohlrahbi-0.4.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-28 14:35:44",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Hochfrequenz",
    "github_project": "kohlrahbi",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "attrs",
            "specs": [
                [
                    "==",
                    "23.2.0"
                ]
            ]
        },
        {
            "name": "click",
            "specs": [
                [
                    "==",
                    "8.1.7"
                ]
            ]
        },
        {
            "name": "colorama",
            "specs": [
                [
                    "==",
                    "0.4.6"
                ]
            ]
        },
        {
            "name": "colorlog",
            "specs": [
                [
                    "==",
                    "6.8.2"
                ]
            ]
        },
        {
            "name": "et-xmlfile",
            "specs": [
                [
                    "==",
                    "1.1.0"
                ]
            ]
        },
        {
            "name": "lxml",
            "specs": [
                [
                    "==",
                    "5.1.0"
                ]
            ]
        },
        {
            "name": "marshmallow",
            "specs": [
                [
                    "==",
                    "3.21.1"
                ]
            ]
        },
        {
            "name": "maus",
            "specs": [
                [
                    "==",
                    "0.4.2"
                ]
            ]
        },
        {
            "name": "more-itertools",
            "specs": [
                [
                    "==",
                    "10.2.0"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "==",
                    "1.26.4"
                ]
            ]
        },
        {
            "name": "openpyxl",
            "specs": [
                [
                    "==",
                    "3.1.2"
                ]
            ]
        },
        {
            "name": "packaging",
            "specs": [
                [
                    "==",
                    "24.0"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    "==",
                    "2.2.1"
                ]
            ]
        },
        {
            "name": "python-dateutil",
            "specs": [
                [
                    "==",
                    "2.9.0.post0"
                ]
            ]
        },
        {
            "name": "python-docx",
            "specs": [
                [
                    "==",
                    "1.1.0"
                ]
            ]
        },
        {
            "name": "pytz",
            "specs": [
                [
                    "==",
                    "2024.1"
                ]
            ]
        },
        {
            "name": "six",
            "specs": [
                [
                    "==",
                    "1.16.0"
                ]
            ]
        },
        {
            "name": "tomlkit",
            "specs": [
                [
                    "==",
                    "0.12.4"
                ]
            ]
        },
        {
            "name": "typing-extensions",
            "specs": [
                [
                    "==",
                    "4.10.0"
                ]
            ]
        },
        {
            "name": "tzdata",
            "specs": [
                [
                    "==",
                    "2024.1"
                ]
            ]
        },
        {
            "name": "xlsxwriter",
            "specs": [
                [
                    "==",
                    "3.2.0"
                ]
            ]
        }
    ],
    "tox": true,
    "lcname": "kohlrahbi"
}
        