edi-energy-scraper


| Field | Value |
| --- | --- |
| Name | edi-energy-scraper |
| Version | 0.6.0 |
| Summary | a scraper to mirror edi-energy.de |
| Upload time | 2024-04-19 14:31:28 |
| Requires Python | >=3.11 |
| License | MIT |
| Keywords | ahb, automation, bdew, edi@energy |
| Requirements | aiodns, aiohttp, aiohttp-requests, aiosignal, attrs, beautifulsoup4, brotli, cffi, coworker, frozenlist, idna, marshmallow, maus, more-itertools, multidict, packaging, pycares, pycparser, pypdf, pytz, soupsieve, yarl |

# edi-energy.de scraper

![Unittests status badge](https://github.com/Hochfrequenz/edi_energy_scraper/workflows/Unittests/badge.svg)
![Coverage status badge](https://github.com/Hochfrequenz/edi_energy_scraper/workflows/Coverage/badge.svg)
![Linting status badge](https://github.com/Hochfrequenz/edi_energy_scraper/workflows/Linting/badge.svg)
![Black status badge](https://github.com/Hochfrequenz/edi_energy_scraper/workflows/Black/badge.svg)
![PyPi Status Badge](https://img.shields.io/pypi/v/edi_energy_scraper)

The Python package `edi_energy_scraper` provides easy-to-use methods to mirror the website edi-energy.de.

## Rationale / Why?

If you'd like to be informed about new regulations or data formats published on edi-energy.de, you can either

- visit the site every day and hope that you spot the changes (if this is your favourite hobby),
- or automate the task.

This repository helps you with the latter. It allows you to create an up-to-date copy of edi-energy.de on your local
computer. Unlike a mirror created with `wget` or `curl`, the result has a clean and intuitive directory
structure.

From there you can, for example, commit the files to a VCS (as we do in our [edi_energy_mirror](https://github.com/Hochfrequenz/edi_energy_mirror)) or scrape the PDF/Word files for later use.
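
If a full VCS is more than you need, changes between two mirror runs can also be detected by comparing file hashes. The following is a minimal sketch using only the standard library; `snapshot` and `diff_snapshots` are hypothetical helper names chosen for illustration and are not part of `edi_energy_scraper`.

```python
import hashlib
from pathlib import Path


def snapshot(mirror_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to mirror_dir) to its SHA-256 hex digest."""
    result = {}
    for path in sorted(mirror_dir.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[path.relative_to(mirror_dir).as_posix()] = digest
    return result


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Return the files that were added, removed or changed between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(name for name in old.keys() & new.keys() if old[name] != new[name]),
    }
```

Take one snapshot before a mirror run and one after it; the result of `diff_snapshots` then tells you which documents deserve a closer look.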

We're all hoping for the day of true digitization on which this repository will become obsolete.

## How to use the Package (as a user)

Install via pip:

```bash
pip install edi_energy_scraper
```

Create a directory in which you'd like to save the mirrored data:

```bash
mkdir edi_energy_de
```

Then import it and start the download:

```python
import asyncio
from edi_energy_scraper import EdiEnergyScraper

# add the following lines to enable debug logging to stdout (CLI)
# import logging
# import sys
# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)

async def mirror():
    scraper = EdiEnergyScraper(path_to_mirror_directory="edi_energy_de")
    await scraper.mirror()


if __name__ == "__main__":
    # asyncio.run() creates and closes its own event loop,
    # so no manual loop setup is needed
    asyncio.run(mirror())
```

This creates a directory structure:

```
-|-your_script_cwd.py
 |-edi_energy_de
    |- past (contains archived files)
        |- ahb.pdf
        |- ahb.docx
        |- ...
    |- current (contains files valid as of today)
        |- mig.pdf
        |- mig.docx
        |- ...
    |- future (contains files valid in the future)
        |- allgemeine_festlegungen.pdf
        |- schema.xsd
        |- ...
```
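
Once the mirror exists, the three validity folders can be inspected with a few lines of standard-library Python. This is a minimal sketch that assumes the folder names `past`, `current` and `future` from the structure above; `list_mirrored_files` is a hypothetical helper, not a function of `edi_energy_scraper`.

```python
from pathlib import Path

# Folder names taken from the directory structure shown in the README.
VALIDITY_FOLDERS = ("past", "current", "future")


def list_mirrored_files(mirror_dir: Path) -> dict[str, list[str]]:
    """Return the file names found in each validity folder, sorted alphabetically."""
    listing = {}
    for folder in VALIDITY_FOLDERS:
        folder_path = mirror_dir / folder
        if folder_path.is_dir():
            listing[folder] = sorted(p.name for p in folder_path.iterdir() if p.is_file())
        else:
            listing[folder] = []  # folder missing, e.g. before the first mirror run
    return listing


if __name__ == "__main__":
    for folder, files in list_mirrored_files(Path("edi_energy_de")).items():
        print(f"{folder}: {len(files)} file(s)")
```

A missing folder is reported as an empty list rather than an error, so the script can also run before the first download has finished.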

## How to use this Repository on Your Machine (for development)

Please follow the instructions in
our [Python Template Repository](https://github.com/Hochfrequenz/python_template_repository#how-to-use-this-repository-on-your-machine).
For further information, see the [Tox Repository](https://github.com/tox-dev/tox).

## Contribute

You are very welcome to contribute to this repository by opening a pull request against the main branch.

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "edi-energy-scraper",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": null,
    "keywords": "ahb, automation, bdew, edi@energy",
    "author": null,
    "author_email": "Hochfrequenz Unternehmensberatung GmbH <info+github@hochfrequenz.de>",
    "download_url": "https://files.pythonhosted.org/packages/fe/ab/44b4e1cb5c65225083f412e062dbb166b55a0933aebfeaeb7dc43ed815a5/edi_energy_scraper-0.6.0.tar.gz",
    "platform": null,
    "description": "# edi-energy.de scraper\n\n<!--- you need to replace the `organization/repo_name` in the status badge URLs --->\n\n![Unittests status badge](https://github.com/Hochfrequenz/edi_energy_scraper/workflows/Unittests/badge.svg)\n![Coverage status badge](https://github.com/Hochfrequenz/edi_energy_scraper/workflows/Coverage/badge.svg)\n![Linting status badge](https://github.com/Hochfrequenz/edi_energy_scraper/workflows/Linting/badge.svg)\n![Black status badge](https://github.com/Hochfrequenz/edi_energy_scraper/workflows/Black/badge.svg)\n![PyPi Status Badge](https://img.shields.io/pypi/v/edi_energy_scraper)\n\nThe Python package `edi_energy_scraper` provides easy to use methods to mirror the website edi-energy.de.\n\n### Rationale / Why?\n\nIf you'd like to be informed about new regulations or data formats being published on edi-energy.de you can either\n\n- visit the site every day and hope that you see the changes if this is your favourite hobby,\n- or automate the task.\n\nThis repository helps you with the latter. It allows you to create an up-to-date copy of edi-energy.de on your local\ncomputer. Other than if you mirrored the files using `wget` or `curl`, you'll get a clean and intuitive directory\nstructure.\n\n\nFrom there you can e.g. commit the files into a VCS (like e.g. 
our [edi_energy_mirror](https://github.com/Hochfrequenz/edi_energy_mirror)), scrape the PDF/Word files for later use...\n\nWe're all hoping for the day of true digitization on which this repository will become obsolete.\n\n## How to use the Package (as a user)\n\nInstall via pip:\n\n```bash\npip install edi_energy_scraper\n```\n\nCreate a directory in which you'd like to save the mirrored data:\n\n```bash\nmkdir edi_energy_de\n```\n\nThen import it and start the download:\n\n```python\nimport asyncio\nfrom edi_energy_scraper import EdiEnergyScraper\n\n# add the following lines to enable debug logging to stdout (CLI)\n# import logging\n# import sys\n# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)\n\nasync def mirror():\n    scraper = EdiEnergyScraper(path_to_mirror_directory=\"edi_energy_de\")\n    await scraper.mirror()\n\n\nif __name__ == \"__main__\":\n    loop = asyncio.new_event_loop()\n    asyncio.set_event_loop(loop)\n    asyncio.run(mirror())\n\n```\n\nThis creates a directory structure:\n\n```\n-|-your_script_cwd.py\n |-edi_energy_de\n    |- past (contains archived files)\n        |- ahb.pdf\n        |- ahb.docx\n        |- ...\n    |- current (contains files valid as of today)\n        |- mig.pdf\n        |- mig.docx\n        |- ...\n    |- future (contains files valid in the future)\n        |- allgemeine_festlegungen.pdf\n        |- schema.xsd\n        |- ...\n```\n\n## How to use this Repository on Your Machine (for development)\n\nPlease follow the instructions in\nour [Python Template Repository](https://github.com/Hochfrequenz/python_template_repository#how-to-use-this-repository-on-your-machine)\n. And for further information, see the [Tox Repository](https://github.com/tox-dev/tox).\n\n## Contribute\n\nYou are very welcome to contribute to this template repository by opening a pull request against the main branch.\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "a scraper to mirror edi-energy.de",
    "version": "0.6.0",
    "project_urls": {
        "Changelog": "https://github.com/Hochfrequenz/edi_energy_scraper/releases",
        "Homepage": "https://github.com/Hochfrequenz/edi_energy_scraper"
    },
    "split_keywords": [
        "ahb",
        " automation",
        " bdew",
        " edi@energy"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "eec5839f2ddac09d98369808fba765bff1be29ab1138f94728eba0e6e64c5aaa",
                "md5": "489164a5cf676ca9296570069740b3c9",
                "sha256": "a957a3db94449e26c2a625f6b9c74e4d0e7db6a91aa5c25d6305337cda60909c"
            },
            "downloads": -1,
            "filename": "edi_energy_scraper-0.6.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "489164a5cf676ca9296570069740b3c9",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 10881,
            "upload_time": "2024-04-19T14:31:27",
            "upload_time_iso_8601": "2024-04-19T14:31:27.963517Z",
            "url": "https://files.pythonhosted.org/packages/ee/c5/839f2ddac09d98369808fba765bff1be29ab1138f94728eba0e6e64c5aaa/edi_energy_scraper-0.6.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "feab44b4e1cb5c65225083f412e062dbb166b55a0933aebfeaeb7dc43ed815a5",
                "md5": "ba8bea0438f2cdfb9487cbbe5b7fe714",
                "sha256": "e2c672f81eb94748ef08354fb348c0ffa316dd739e2303bda44129586faf76d1"
            },
            "downloads": -1,
            "filename": "edi_energy_scraper-0.6.0.tar.gz",
            "has_sig": false,
            "md5_digest": "ba8bea0438f2cdfb9487cbbe5b7fe714",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 17907,
            "upload_time": "2024-04-19T14:31:28",
            "upload_time_iso_8601": "2024-04-19T14:31:28.976856Z",
            "url": "https://files.pythonhosted.org/packages/fe/ab/44b4e1cb5c65225083f412e062dbb166b55a0933aebfeaeb7dc43ed815a5/edi_energy_scraper-0.6.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-19 14:31:28",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Hochfrequenz",
    "github_project": "edi_energy_scraper",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "aiodns",
            "specs": [
                [
                    "==",
                    "3.2.0"
                ]
            ]
        },
        {
            "name": "aiohttp",
            "specs": [
                [
                    "==",
                    "3.9.4"
                ]
            ]
        },
        {
            "name": "aiohttp-requests",
            "specs": [
                [
                    "==",
                    "0.2.3"
                ]
            ]
        },
        {
            "name": "aiosignal",
            "specs": [
                [
                    "==",
                    "1.3.1"
                ]
            ]
        },
        {
            "name": "attrs",
            "specs": [
                [
                    "==",
                    "23.2.0"
                ]
            ]
        },
        {
            "name": "beautifulsoup4",
            "specs": [
                [
                    "==",
                    "4.12.3"
                ]
            ]
        },
        {
            "name": "brotli",
            "specs": [
                [
                    "==",
                    "1.1.0"
                ]
            ]
        },
        {
            "name": "cffi",
            "specs": [
                [
                    "==",
                    "1.16.0"
                ]
            ]
        },
        {
            "name": "coworker",
            "specs": [
                [
                    "==",
                    "2.0.1"
                ]
            ]
        },
        {
            "name": "frozenlist",
            "specs": [
                [
                    "==",
                    "1.4.1"
                ]
            ]
        },
        {
            "name": "idna",
            "specs": [
                [
                    "==",
                    "3.7"
                ]
            ]
        },
        {
            "name": "marshmallow",
            "specs": [
                [
                    "==",
                    "3.21.1"
                ]
            ]
        },
        {
            "name": "maus",
            "specs": [
                [
                    "==",
                    "0.4.2"
                ]
            ]
        },
        {
            "name": "more-itertools",
            "specs": [
                [
                    "==",
                    "10.2.0"
                ]
            ]
        },
        {
            "name": "multidict",
            "specs": [
                [
                    "==",
                    "6.0.5"
                ]
            ]
        },
        {
            "name": "packaging",
            "specs": [
                [
                    "==",
                    "24.0"
                ]
            ]
        },
        {
            "name": "pycares",
            "specs": [
                [
                    "==",
                    "4.4.0"
                ]
            ]
        },
        {
            "name": "pycparser",
            "specs": [
                [
                    "==",
                    "2.22"
                ]
            ]
        },
        {
            "name": "pypdf",
            "specs": [
                [
                    "==",
                    "4.2.0"
                ]
            ]
        },
        {
            "name": "pytz",
            "specs": [
                [
                    "==",
                    "2024.1"
                ]
            ]
        },
        {
            "name": "soupsieve",
            "specs": [
                [
                    "==",
                    "2.5"
                ]
            ]
        },
        {
            "name": "yarl",
            "specs": [
                [
                    "==",
                    "1.9.4"
                ]
            ]
        }
    ],
    "tox": true,
    "lcname": "edi-energy-scraper"
}
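
The `requirements` entries in the raw data are nested lists of operator/version pairs. As a minimal sketch, they can be flattened into pip-style pins as follows; `to_pins` is a hypothetical helper name, and the excerpt contains only two of the entries listed above.

```python
import json


def to_pins(raw_data: dict) -> list[str]:
    """Render each requirement entry as 'name<op><version>', e.g. 'aiodns==3.2.0'."""
    pins = []
    for requirement in raw_data.get("requirements", []):
        # Each spec is a two-element list: [operator, version]
        specs = ",".join(f"{op}{version}" for op, version in requirement["specs"])
        pins.append(f"{requirement['name']}{specs}")
    return pins


# A shortened excerpt of the raw data (two of the requirement entries).
excerpt = json.loads("""
{
    "name": "edi-energy-scraper",
    "requirements": [
        {"name": "aiodns", "specs": [["==", "3.2.0"]]},
        {"name": "aiohttp", "specs": [["==", "3.9.4"]]}
    ]
}
""")

print("\n".join(to_pins(excerpt)))
```

For the excerpt this prints `aiodns==3.2.0` and `aiohttp==3.9.4`, one per line, in a form that could be written to a requirements file.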
        