# ebdamame

| Field | Value |
| --- | --- |
| Name | ebdamame |
| Version | 0.1.7 |
| Summary | A library to scrape .docx files with 'Entscheidungsbaumdiagramm' tables into a truly machine-readable structure |
| Upload time | 2024-09-27 10:19:07 |
| Requires Python | >=3.11 |
| License | GPL |
| Keywords | ebd, energiewirtschaft, marktkommunikation |
| Author email | Hochfrequenz Unternehmensberatung GmbH <info@hochfrequenz.de> |
| Homepage | https://github.com/Hochfrequenz/ebdamame |

![Unittests status badge](https://github.com/Hochfrequenz/ebdamame/workflows/Unittests/badge.svg)
![Coverage status badge](https://github.com/Hochfrequenz/ebdamame/workflows/Coverage/badge.svg)
![Linting status badge](https://github.com/Hochfrequenz/ebdamame/workflows/Linting/badge.svg)
![Formatting status badge](https://github.com/Hochfrequenz/ebdamame/workflows/Formatting/badge.svg)
![PyPi Status Badge](https://img.shields.io/pypi/v/ebdamame)

This repository contains the source code of the Python package [`ebdamame`](https://pypi.org/project/ebdamame) (formerly published as `ebddocx2table`). It extracts (scrapes) machine-readable tables that model a decision tree (Entscheidungsbaumdiagramm, EBD) from .docx files.
These decision trees are part of a regulatory framework for the German energy industry and are used in the inbound validation of market communication (Marktkommunikation).
The machine-readable tables created with this package can be converted into real graphs and diagrams with [`rebdhuhn`](https://pypi.org/project/rebdhuhn) (formerly: `ebdtable2graph`).
Example scraping results are available as .json files in the repository [`machine-readable_entscheidungsbaumdiagramme`](https://github.com/Hochfrequenz/machine-readable_entscheidungsbaumdiagramme/).

## Rationale

Assume that you want to analyse or visualize the Entscheidungsbaumdiagramme (EBD) by EDI@Energy.
The website edi-energy.de, as always, only provides you with PDF or Word files instead of _really_ digitized data.

The package `ebdamame` scrapes the `.docx` files and returns data in a model defined in the "sister" package [`rebdhuhn`](https://pypi.org/project/rebdhuhn) (formerly known as `ebdtable2graph`).

Once you have scraped the data (using this package), you can plot it with [`rebdhuhn`](https://pypi.org/project/rebdhuhn).

## How to use the package

In any case, install the package from PyPI:

```bash
pip install ebdamame
```

### Use as a library

```python
import json
from pathlib import Path

import cattrs

from ebdamame import TableNotFoundError, get_all_ebd_keys, get_ebd_docx_tables  # type:ignore[import]
from ebdamame.docxtableconverter import DocxTableConverter  # type:ignore[import]

docx_file_path = Path("unittests/test_data/ebd20230629_v34.docx")
# Download this .docx file from edi-energy.de or find it in the unit tests of this repository.
# https://github.com/Hochfrequenz/ebddocx2table/blob/main/unittests/test_data/ebd20230629_v34.docx
docx_tables = get_ebd_docx_tables(docx_file_path, ebd_key="E_0003")
converter = DocxTableConverter(
    docx_tables,
    ebd_key="E_0003",
    chapter="MaBiS",
    sub_chapter="7.42.1: AD: Bestellung der Aggregationsebene der Bilanzkreissummenzeitreihe auf Ebene der Regelzone",
)
result = converter.convert_docx_tables_to_ebd_table()
with open(Path("E_0003.json"), "w+", encoding="utf-8") as result_file:
    # the result file can be found here:
    # https://github.com/Hochfrequenz/machine-readable_entscheidungsbaumdiagramme/tree/main/FV2310
    json.dump(cattrs.unstructure(result), result_file, ensure_ascii=False, indent=2, sort_keys=True)
```
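
To check what the snippet above wrote, you can read the JSON file back with the standard library. The following is a minimal sketch, not part of `ebdamame`: the helper names `load_scraped_ebd` and `describe_top_level` are hypothetical, and it assumes only that the file contains a JSON object. It does not rely on any particular field names of the `rebdhuhn` model.

```python
import json
from pathlib import Path
from typing import Any


def load_scraped_ebd(json_path: Path) -> dict[str, Any]:
    """Read a JSON file such as E_0003.json and return its content as a plain dict."""
    with open(json_path, "r", encoding="utf-8") as json_file:
        data = json.load(json_file)
    # Assumption: the unstructured result is a JSON object, not a list or scalar.
    if not isinstance(data, dict):
        raise ValueError(f"Expected a JSON object in {json_path}, got {type(data).__name__}")
    return data


def describe_top_level(data: dict[str, Any]) -> list[str]:
    """Return one line per top-level key (sorted), together with the type of its value."""
    return [f"{key}: {type(value).__name__}" for key, value in sorted(data.items())]


if __name__ == "__main__":
    result_path = Path("E_0003.json")  # the file written by the snippet above
    if result_path.exists():
        for line in describe_top_level(load_scraped_ebd(result_path)):
            print(line)
```

Listing only the top-level keys and their types keeps the sketch independent of the concrete model, so it should also work for other EBD keys.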

### Use as a CLI tool

_to be written_

## How to use this Repository on Your Machine (for development)

Please follow the instructions in our
[Python Template Repository](https://github.com/Hochfrequenz/python_template_repository#how-to-use-this-repository-on-your-machine).
For further information, see the [tox repository](https://github.com/tox-dev/tox).

## Contribute

You are very welcome to contribute to this repository by opening a pull request against the main branch.

## Related Tools and Context

This repository is part of the [Hochfrequenz Libraries and Tools for a truly digitized market communication](https://github.com/Hochfrequenz/digital_market_communication/).
