# ebddocx2table

| Field | Value |
| --- | --- |
| Name | ebddocx2table |
| Version | 0.0.9 |
| Summary | A library to scrape .docx files with 'Entscheidungsbaumdiagramm' tables into a truly machine-readable structure |
| Upload time | 2024-03-13 07:51:01 |
| Requires Python | >=3.11 |
| License | GPL |
| Keywords | ebd, energiewirtschaft, marktkommunikation |
| Requirements | none recorded |

> [!IMPORTANT]
> ⚠ This is the last version using the name `ebddocx2table`. Both the repository and the Python package will be renamed to `ebdamame`.

![Unittests status badge](https://github.com/Hochfrequenz/ebd_docx_to_table/workflows/Unittests/badge.svg)
![Coverage status badge](https://github.com/Hochfrequenz/ebd_docx_to_table/workflows/Coverage/badge.svg)
![Linting status badge](https://github.com/Hochfrequenz/ebd_docx_to_table/workflows/Linting/badge.svg)
![Black status badge](https://github.com/Hochfrequenz/ebd_docx_to_table/workflows/Black/badge.svg)
![PyPi Status Badge](https://img.shields.io/pypi/v/ebddocx2table)

This repository contains the source code of the Python package [`ebddocx2table`](https://pypi.org/project/ebddocx2table), which extracts (scrapes) machine-readable tables that model a decision tree (Entscheidungsbaumdiagramm, EBD) from .docx files.
These decision trees are part of a regulatory framework for the German energy industry and are used in the initial validation (Eingangsprüfung) of market communication messages.
The machine-readable tables produced by this package can be converted into actual graphs and diagrams with [`ebdtable2graph`](https://pypi.org/project/ebdtable2graph).
Example scraping results are available as .json files in the repository [`machine-readable_entscheidungsbaumdiagramme`](https://github.com/Hochfrequenz/machine-readable_entscheidungsbaumdiagramme/).

## Rationale

Suppose you want to analyse or visualize the Entscheidungsbaumdiagramme (EBD) published by EDI@Energy.
The website edi-energy.de, as always, only provides PDF or Word files instead of _really_ digitized data.

The package `ebddocx2table` scrapes the `.docx` files and returns the data in a model defined in the "sister" package [`ebdtable2graph`](https://pypi.org/project/ebdtable2graph).

Once you have scraped the data with this package, you can plot it with [`ebdtable2graph`](https://pypi.org/project/ebdtable2graph).

## How to use the package

Whichever way you use it, first install the package from PyPI:

```bash
pip install ebddocx2table
```
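
The package metadata declares `requires_python >=3.11`, so `pip` will refuse to install it on older interpreters. If you want to check up front, a minimal sketch could look like this; the helper name `python_is_supported` is made up for illustration and is not part of the package:

```python
import sys

# Minimum interpreter version, taken from the package metadata (requires_python >=3.11).
MINIMUM_PYTHON = (3, 11)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given interpreter version satisfies the minimum."""
    # Compare only (major, minor); patch level does not matter here.
    return tuple(version_info[:2]) >= MINIMUM_PYTHON


if __name__ == "__main__":
    if python_is_supported():
        print("This Python version is recent enough for ebddocx2table.")
    else:
        required = ".".join(str(part) for part in MINIMUM_PYTHON)
        print(f"Python {required} or newer is required, found {sys.version.split()[0]}.")
```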

### Use as a library

```python
import json
from pathlib import Path

import cattrs

from ebddocx2table import TableNotFoundError, get_all_ebd_keys, get_ebd_docx_tables  # type:ignore[import]
from ebddocx2table.docxtableconverter import DocxTableConverter  # type:ignore[import]

docx_file_path = Path("unittests/test_data/ebd20230629_v34.docx")
# download this .docx File from edi-energy.de or find it in the unittests of this repository.
# https://github.com/Hochfrequenz/ebddocx2table/blob/main/unittests/test_data/ebd20230629_v34.docx
docx_tables = get_ebd_docx_tables(docx_file_path, ebd_key="E_0003")
converter = DocxTableConverter(
    docx_tables,
    ebd_key="E_0003",
    chapter="MaBiS",
    sub_chapter="7.42.1: AD: Bestellung der Aggregationsebene der Bilanzkreissummenzeitreihe auf Ebene der Regelzone",
)
result = converter.convert_docx_tables_to_ebd_table()
with open(Path("E_0003.json"), "w+", encoding="utf-8") as result_file:
    # the result file can be found here:
    # https://github.com/Hochfrequenz/machine-readable_entscheidungsbaumdiagramme/tree/main/FV2310
    json.dump(cattrs.unstructure(result), result_file, ensure_ascii=False, indent=2, sort_keys=True)
```
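
The example above imports `TableNotFoundError` without using it. If you want to scrape several EBD keys from one file, a wrapper along the following lines might help. This is only a sketch: `scrape_many` is a hypothetical helper, and it assumes that `get_ebd_docx_tables` raises `TableNotFoundError` when a key has no table in the document, which the import suggests but this README does not state. The extraction function and the error type are passed in as arguments, so the sketch does not depend on the package being installed.

```python
from pathlib import Path
from typing import Any, Callable, Iterable


def scrape_many(
    docx_file_path: Path,
    ebd_keys: Iterable[str],
    get_tables: Callable[..., Any],
    not_found_error: type[Exception],
) -> tuple[dict[str, Any], list[str]]:
    """Call get_tables for every key; return (tables per key, keys without a table)."""
    found: dict[str, Any] = {}
    missing: list[str] = []
    for ebd_key in ebd_keys:
        try:
            # Same call shape as in the library example: path first, key as keyword.
            found[ebd_key] = get_tables(docx_file_path, ebd_key=ebd_key)
        except not_found_error:
            # Assumption: this error type signals "no table for this key".
            missing.append(ebd_key)
    return found, missing
```

With the package installed, the call would be `scrape_many(docx_file_path, ["E_0003"], get_ebd_docx_tables, TableNotFoundError)`. Each entry in the first return value can then be handed to `DocxTableConverter` as shown above.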

### Use as a CLI tool

_to be written_

## How to use this Repository on Your Machine (for development)

Please follow the instructions in our
[Python Template Repository](https://github.com/Hochfrequenz/python_template_repository#how-to-use-this-repository-on-your-machine).
For further information, see the [Tox Repository](https://github.com/tox-dev/tox).

## Contribute

You are very welcome to contribute to this repository by opening a pull request against the main branch.

## Related Tools and Context

This repository is part of the [Hochfrequenz Libraries and Tools for a truly digitized market communication](https://github.com/Hochfrequenz/digital_market_communication/).

            

## Raw data

{
    "_id": null,
    "home_page": "",
    "name": "ebddocx2table",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": "",
    "keywords": "EBD,Energiewirtschaft,Marktkommunikation",
    "author": "",
    "author_email": "Hochfrequenz Unternehmensberatung GmbH <info@hochfrequenz.de>",
    "download_url": "https://files.pythonhosted.org/packages/95/70/f889ee931274ca962a7765604f0fee72005584d2394d2ee23cff11ebb6ba/ebddocx2table-0.0.9.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GPL",
    "summary": "A scraper to library to scrape .docx files with 'Entscheidungsbaumdiagramm' tables into a truely machine readable structure",
    "version": "0.0.9",
    "project_urls": {
        "Changelog": "https://github.com/Hochfrequenz/ebddocx2table/releases",
        "Homepage": "https://github.com/Hochfrequenz/ebddocx2table"
    },
    "split_keywords": [
        "ebd",
        "energiewirtschaft",
        "marktkommunikation"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0389082aa1d11838224687df8112b773bda1753cd3a204eb126c855423b615b5",
                "md5": "3b3088fcfd172be7c57cb3a23a93f9f3",
                "sha256": "27e1d38fd4afc842e09455b728dcd87a5a512cc018eea22e53dfc177beabda77"
            },
            "downloads": -1,
            "filename": "ebddocx2table-0.0.9-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3b3088fcfd172be7c57cb3a23a93f9f3",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 24508,
            "upload_time": "2024-03-13T07:50:59",
            "upload_time_iso_8601": "2024-03-13T07:50:59.209798Z",
            "url": "https://files.pythonhosted.org/packages/03/89/082aa1d11838224687df8112b773bda1753cd3a204eb126c855423b615b5/ebddocx2table-0.0.9-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9570f889ee931274ca962a7765604f0fee72005584d2394d2ee23cff11ebb6ba",
                "md5": "2171e39ded3ec04f810ed4bf1be5f59f",
                "sha256": "d7538c81093435ce04f8d26dacdb7f4a7b66f252d1d55cb6c5618fb0ba618088"
            },
            "downloads": -1,
            "filename": "ebddocx2table-0.0.9.tar.gz",
            "has_sig": false,
            "md5_digest": "2171e39ded3ec04f810ed4bf1be5f59f",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 32272,
            "upload_time": "2024-03-13T07:51:01",
            "upload_time_iso_8601": "2024-03-13T07:51:01.948123Z",
            "url": "https://files.pythonhosted.org/packages/95/70/f889ee931274ca962a7765604f0fee72005584d2394d2ee23cff11ebb6ba/ebddocx2table-0.0.9.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-13 07:51:01",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Hochfrequenz",
    "github_project": "ebddocx2table",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "tox": true,
    "lcname": "ebddocx2table"
}
        