| Field | Value |
|---|---|
| Name | ebdamame |
| Version | 0.4.0 |
| home_page | None |
| Summary | A library to scrape .docx files with 'Entscheidungsbaumdiagramm' tables into a truly machine-readable structure |
| upload_time | 2024-11-07 12:14:06 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.11 |
| license | GPL |
| keywords | ebd, energiewirtschaft, marktkommunikation |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# ebdamame
[![License: GPL](https://img.shields.io/badge/License-GPL-yellow.svg)](LICENSE)
![Python Versions (officially) supported](https://img.shields.io/pypi/pyversions/ebdamame.svg)
![Unittests status badge](https://github.com/Hochfrequenz/ebdamame/workflows/Unittests/badge.svg)
![Coverage status badge](https://github.com/Hochfrequenz/ebdamame/workflows/Coverage/badge.svg)
![Linting status badge](https://github.com/Hochfrequenz/ebdamame/workflows/Linting/badge.svg)
![Formatting status badge](https://github.com/Hochfrequenz/ebdamame/workflows/Formatting/badge.svg)
![PyPi Status Badge](https://img.shields.io/pypi/v/ebdamame)
This repository contains the source code of the Python package [`ebdamame`](https://pypi.org/project/ebdamame) (formerly published as `ebddocx2table`), which extracts (scrapes) machine-readable tables that model a decision tree (Entscheidungsbaumdiagramm, EBD) from .docx files.
These decision trees are part of a regulatory framework for the German energy industry and are used in the inbound validation (Eingangsprüfung) of market communication.
The machine-readable tables created with this package can be converted into real graphs and diagrams with [`rebdhuhn`](https://pypi.org/project/rebdhuhn) (formerly: `ebdtable2graph`).
Example scraping results are available as .json files in the repository [`machine-readable_entscheidungsbaumdiagramme`](https://github.com/Hochfrequenz/machine-readable_entscheidungsbaumdiagramme/).
## Rationale
Assume that you want to analyse or visualize the Entscheidungsbaumdiagramme (EBD) by EDI@Energy.
The website edi-energy.de, as always, only provides you with PDF or Word files instead of _really_ digitized data.
The package `ebdamame` scrapes the `.docx` files and returns data in a model defined in the "sister" package [`rebdhuhn`](https://pypi.org/project/rebdhuhn) (formerly known as `ebdtable2graph`).
Once you have scraped the data (using this package), you can plot it with [`rebdhuhn`](https://pypi.org/project/rebdhuhn).
## How to use the package
In any case, install the package from PyPI:
```bash
pip install ebdamame
```
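
To confirm that the installation worked and that the interpreter satisfies the `requires_python` constraint (`>=3.11`), a small check like the following may help. It is only a sketch based on the standard library; the helper names `installed_version` and `python_is_supported` are made up for illustration and are not part of `ebdamame`.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


def python_is_supported(minimum: tuple = (3, 11)) -> bool:
    """Check the running interpreter against the minimum version stated in the metadata."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("ebdamame version:", installed_version("ebdamame"))
    print("Python supported:", python_is_supported())
```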
### Use as a library
```python
import json
from pathlib import Path

import cattrs

from ebdamame import TableNotFoundError, get_all_ebd_keys, get_ebd_docx_tables  # type:ignore[import]
from ebdamame.docxtableconverter import DocxTableConverter  # type:ignore[import]

# Download this .docx file from edi-energy.de or find it in the unittests of this repository:
# https://github.com/Hochfrequenz/ebddocx2table/blob/main/unittests/test_data/ebd20230629_v34.docx
docx_file_path = Path("unittests/test_data/ebd20230629_v34.docx")

# Extract the raw docx tables that belong to the EBD with the key E_0003.
docx_tables = get_ebd_docx_tables(docx_file_path, ebd_key="E_0003")
converter = DocxTableConverter(
    docx_tables,
    ebd_key="E_0003",
    ebd_name="E_0003_AD: Bestellung der Aggregationsebene der Bilanzkreissummenzeitreihe auf Ebene der Regelzone",
    chapter="MaBiS",
    section="7.42.1",
)
result = converter.convert_docx_tables_to_ebd_table()

# The result file can be found here:
# https://github.com/Hochfrequenz/machine-readable_entscheidungsbaumdiagramme/tree/main/FV2310
with open(Path("E_0003.json"), "w+", encoding="utf-8") as result_file:
    json.dump(cattrs.unstructure(result), result_file, ensure_ascii=False, indent=2, sort_keys=True)
```
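
When scraping several EBD keys from one file, it can be convenient to collect the keys that could not be found instead of aborting at the first failure. The following is a minimal sketch of that pattern, not part of the package: `scrape_many` is a hypothetical helper, and it assumes that `get_ebd_docx_tables` raises `TableNotFoundError` for a key that is missing from the document. The scraping function and the exception type are passed in as arguments so the helper can be tried out without a `.docx` file.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, List, Tuple, Type


def scrape_many(
    docx_path: Path,
    ebd_keys: Iterable[str],
    get_tables: Callable[..., Any],
    not_found_error: Type[Exception],
) -> Tuple[Dict[str, Any], List[str]]:
    """Call get_tables once per key; return the found tables and the missing keys."""
    found: Dict[str, Any] = {}
    missing: List[str] = []
    for key in ebd_keys:
        try:
            # Same call shape as in the library example: path plus ebd_key keyword.
            found[key] = get_tables(docx_path, ebd_key=key)
        except not_found_error:
            # Assumption: a missing key is signalled by this exception type.
            missing.append(key)
    return found, missing


if __name__ == "__main__":
    # Stand-ins for demonstration; in real use, pass get_ebd_docx_tables and
    # TableNotFoundError imported from ebdamame instead.
    class FakeNotFound(Exception):
        pass

    def fake_get_tables(path: Path, ebd_key: str) -> list:
        if ebd_key == "E_9999":
            raise FakeNotFound(ebd_key)
        return [f"tables of {ebd_key}"]

    print(scrape_many(Path("ebd.docx"), ["E_0003", "E_9999"], fake_get_tables, FakeNotFound))
```

Passing the dependencies as parameters keeps the helper independent of the installed `ebdamame` version; any other exception raised by the scraper still propagates.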
### Use as a CLI tool
_to be written_
## How to use this Repository on Your Machine (for development)
Please follow the instructions in our
[Python Template Repository](https://github.com/Hochfrequenz/python_template_repository#how-to-use-this-repository-on-your-machine).
For further information, see the [tox repository](https://github.com/tox-dev/tox).
## Contribute
You are very welcome to contribute to this repository by opening a pull request against the main branch.
## Related Tools and Context
This repository is part of the [Hochfrequenz Libraries and Tools for a truly digitized market communication](https://github.com/Hochfrequenz/digital_market_communication/).
Raw data
{
"_id": null,
"home_page": null,
"name": "ebdamame",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.11",
"maintainer_email": null,
"keywords": "EBD, Energiewirtschaft, Marktkommunikation",
"author": null,
"author_email": "Hochfrequenz Unternehmensberatung GmbH <info@hochfrequenz.de>",
"download_url": "https://files.pythonhosted.org/packages/dc/ef/159c80dc39751b61725d3bb1b97f290677a6f00bc6c5230c2bb1481ce021/ebdamame-0.4.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GPL",
"summary": "A library to scrape .docx files with 'Entscheidungsbaumdiagramm' tables into a truly machine-readable structure",
"version": "0.4.0",
"project_urls": {
"Changelog": "https://github.com/Hochfrequenz/ebdamame/releases",
"Homepage": "https://github.com/Hochfrequenz/ebdamame"
},
"split_keywords": [
"ebd",
" energiewirtschaft",
" marktkommunikation"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "8aed16b1aa302bcd61af084607bf41df71b5e9824ce387a3ee9d365c5ad38f2c",
"md5": "825ddf777b5f745f61d5a314c582a11f",
"sha256": "e83ceb6c7ad4b483b2a0d639ba251b6a73429a8a256b41576e1813b907341cbb"
},
"downloads": -1,
"filename": "ebdamame-0.4.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "825ddf777b5f745f61d5a314c582a11f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.11",
"size": 25579,
"upload_time": "2024-11-07T12:14:04",
"upload_time_iso_8601": "2024-11-07T12:14:04.950237Z",
"url": "https://files.pythonhosted.org/packages/8a/ed/16b1aa302bcd61af084607bf41df71b5e9824ce387a3ee9d365c5ad38f2c/ebdamame-0.4.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "dcef159c80dc39751b61725d3bb1b97f290677a6f00bc6c5230c2bb1481ce021",
"md5": "fbddc9e4e562ffd35710442879ce05d9",
"sha256": "3081a46753925816b40cd68a7d7a3a3cb5770fa457d54564b30e800806ff6e96"
},
"downloads": -1,
"filename": "ebdamame-0.4.0.tar.gz",
"has_sig": false,
"md5_digest": "fbddc9e4e562ffd35710442879ce05d9",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.11",
"size": 30356,
"upload_time": "2024-11-07T12:14:06",
"upload_time_iso_8601": "2024-11-07T12:14:06.718698Z",
"url": "https://files.pythonhosted.org/packages/dc/ef/159c80dc39751b61725d3bb1b97f290677a6f00bc6c5230c2bb1481ce021/ebdamame-0.4.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-07 12:14:06",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Hochfrequenz",
"github_project": "ebdamame",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"tox": true,
"lcname": "ebdamame"
}
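
The `digests` entries in the raw data can be used to check a downloaded wheel or sdist before installing it. The following is a minimal sketch using only the standard library; the helper names `sha256_of_file` and `matches_recorded_digest` are illustrative and do not belong to `ebdamame` or to any PyPI tooling.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Compute the hex-encoded SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as file:
        for chunk in iter(lambda: file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_digest(path: Path, recorded_sha256: str) -> bool:
    """Compare a file's SHA-256 digest with the value recorded in the metadata."""
    return sha256_of_file(path) == recorded_sha256.strip().lower()


if __name__ == "__main__":
    # Value copied from the "sha256" field of the sdist entry in the raw data.
    recorded = "3081a46753925816b40cd68a7d7a3a3cb5770fa457d54564b30e800806ff6e96"
    sdist = Path("ebdamame-0.4.0.tar.gz")  # assumed to be downloaded beforehand
    if sdist.exists():
        print("digest matches:", matches_recorded_digest(sdist, recorded))
```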