matricula-online-scraper


Name: matricula-online-scraper
Version: 0.5.0
Home page: https://github.com/lsg551/matricula-online-scraper
Summary: Command Line Interface tool for scraping Matricula Online https://data.matricula-online.eu.
Upload time: 2024-06-07 16:26:03
Author: Luis Schulte
Requires Python: <4.0,>=3.12
License: MIT
Keywords: matricula-online, matricula, scraper, parish-registers

# Matricula Online Scraper

![PyPI - Python Version](https://img.shields.io/pypi/pyversions/matricula-online-scraper?logo=python)
![GitHub License](https://img.shields.io/github/license/lsg551/matricula-online-scraper?logo=pypi)
![PyPI - Version](https://img.shields.io/pypi/v/matricula-online-scraper?logo=pypi)

> :warning: This tool is still under development and is NOT yet feature-complete. Expect breaking changes and bugs. Please report any issues.

[Matricula Online](https://data.matricula-online.eu/) is a website that hosts parish registers from various regions across Europe. This CLI tool allows you to fetch data from it and save the data to a file.

---

Our GitHub Workflow automatically scrapes a list of all parishes once a week and pushes it to the [`cache/parishes`](https://github.com/lsg551/matricula-online-scraper/tree/cache/parishes) branch. Download [`parishes.csv.gz`](https://github.com/lsg551/matricula-online-scraper/raw/cache/parishes/parishes.csv.gz) ⚡️

[![Cache Parishes](https://github.com/lsg551/matricula-online-scraper/actions/workflows/cache-parishes.yml/badge.svg)](https://github.com/lsg551/matricula-online-scraper/actions/workflows/cache-parishes.yml)
![GitHub last commit (branch)](https://img.shields.io/github/last-commit/lsg551/matricula-online-scraper/cache%2Fparishes?path=parishes.csv.gz&label=last%20caching&cacheSeconds=43200)

---

Note that this tool does not format or clean the data in any way; the data is saved to a file as-is. This matters because the original data is poorly formatted and contains many inconsistencies. It is up to the user to process the data further.

## 🔧 Installation

Make sure you have Python 3.12 or newer installed. You can then install this script via `pip`:

```console
$ pip install --user matricula-online-scraper
```

Alternatively, you can clone this repository and run the script with [Poetry](https://python-poetry.org).

## 💡 How To Use

```console
$ matricula-online-scraper --help
```

prints the available commands and options, including documentation. The same goes for each subcommand, e.g. `matricula-online-scraper fetch --help`.

The `fetch` command is the primary command for fetching resources from Matricula Online. Its subcommands allow you to scrape different resources; run `matricula-online-scraper fetch --help` to see the available subcommands.

### Example 1:

Fetch all available locations and save them to a `.jsonl` file:

```console
$ matricula-online-scraper fetch locations ./output.jsonl
```
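If you only need the parish list, the weekly cached `parishes.csv.gz` file linked in this README can be read directly instead of scraping. The following is a minimal sketch, not part of this tool: the function names are hypothetical, and it assumes the file is UTF-8 encoded, comma-separated, and starts with a header row. The column names are not documented here, so rows are returned as plain dictionaries keyed by whatever header the file contains.

```python
import csv
import gzip
import io
import urllib.request

# URL taken from this README.
CACHE_URL = (
    "https://github.com/lsg551/matricula-online-scraper"
    "/raw/cache/parishes/parishes.csv.gz"
)


def parse_gzipped_csv(payload: bytes) -> list[dict[str, str]]:
    """Decompress gzip bytes and parse them as CSV with a header row.

    Assumption: the content is UTF-8 and comma-separated.
    """
    text = gzip.decompress(payload).decode("utf-8")
    # newline="" lets the csv module handle line endings itself.
    return list(csv.DictReader(io.StringIO(text, newline="")))


def load_cached_parishes(url: str = CACHE_URL) -> list[dict[str, str]]:
    """Download the cached file and parse it (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return parse_gzipped_csv(response.read())
```

Calling `load_cached_parishes()` performs a single download of the cached file, which avoids putting any load on Matricula Online itself.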

> :warning: This will fetch all parishes from Matricula Online, which may take a few minutes. This data changes only rarely, and frequent scraping puts unnecessary load on the server. Therefore, our GitHub Workflow caches this data once a week and pushes it to [`cache/parishes`](https://github.com/lsg551/matricula-online-scraper/tree/cache/parishes). ⚡️ [Download CSV](https://github.com/lsg551/matricula-online-scraper/raw/cache/parishes/parishes.csv.gz) ⚡️

### Example 2:

Fetch all available registers from one parish in Münster, Germany and save them to a `.jsonl` file:

```console
$ matricula-online-scraper fetch parish ./output.jsonl --urls https://data.matricula-online.eu/en/deutschland/muenster/muenster-st-martini/
```

## License & Contributing

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions, especially bug fixes. Please make sure to follow the [Contributing Guidelines](CONTRIBUTING.md).

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/lsg551/matricula-online-scraper",
    "name": "matricula-online-scraper",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.12",
    "maintainer_email": null,
    "keywords": "matricula-online, matricula, scraper, parish-registers",
    "author": "Luis Schulte",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/1c/8b/cb2658fbf14950935935709f6065afb4f395960d786edfd69ac81d27eec4/matricula_online_scraper-0.5.0.tar.gz",
    "platform": null,
    "description": "# Matricula Online Scraper\n\n![PyPI - Python Version](https://img.shields.io/pypi/pyversions/matricula-online-scraper?logo=python)\n![GitHub License](https://img.shields.io/github/license/lsg551/matricula-online-scraper?logo=pypi)\n![PyPI - Version](https://img.shields.io/pypi/v/matricula-online-scraper?logo=pypi)\n\n> :warning: This tool is still under development and is NOT yet feature-complete. Expect breaking changes and bugs. Please report any issues.\n\n[Matricula Online](https://data.matricula-online.eu/) is a website that hosts parish registers from various regions across Europe. This CLI tool allows you to fetch data from it and save the data to a file.\n\n---\n\nOur GitHub Workflow automatically scrapes a list with all parishes once a week and pushes to [`cache/parishes`](https://github.com/lsg551/matricula-online-scraper/tree/cache/parishes). Download [`parishes.csv`](https://github.com/lsg551/matricula-online-scraper/raw/cache/parishes/parishes.csv.gz) \u26a1\ufe0f\n\n[![Cache Parishes](https://github.com/lsg551/matricula-online-scraper/actions/workflows/cache-parishes.yml/badge.svg)](https://github.com/lsg551/matricula-online-scraper/actions/workflows/cache-parishes.yml)\n![GitHub last commit (branch)](https://img.shields.io/github/last-commit/lsg551/matricula-online-scraper/cache%2Fparishes?path=parishes.csv.gz&label=last%20caching&cacheSeconds=43200)\n\n---\n\nNote that this tool will not format or clean the data in any way. Instead, the data is saved as-is to a file. I mention this because the original data is especially poorly formatted and contains a lot of inconsistencies. It is up to the user to process the data further.\n\n## \ud83d\udd27 Installation\n\nMake sure to have a recent version of Python installed. 
You can then install this script via `pip`:\n\n```console\n$ pip install --user matricula-online-scraper\n```\n\nNevertheless, you can clone this repository and run the script with [Poetry](https://python-poetry.org).\n\n## \ud83d\udca1 How To Use\n\n```console\n$ matricula-online-scraper --help\n```\n\nprints available commands and options, including documentation. Same goes for each subcommand, e.g. `matricula-online-scraper fetch --help`.\n\nThe `fetch` command is the primary command to fetch any resources from Matricula Online. Its subcommands allow you to scrape different resources, run `matricula-online-scraper fetch --help` to see available subcommands.\n\n### Example 1:\n\nFetch all available locations and save them to a `.jsonl` file:\n\n```console\n$ matricula-online-scraper fetch locations ./output.jsonl\n```\n\n> :warning: This will fetch all parishes from Matricula Online, which may take a few minutes. Despite that, this data only changes rarely, but frequent scraping will put unnecessary load on the server. Therefore our GitHub Workflow caches this data once a week and pushes to [`cache/parishes`](https://github.com/lsg551/matricula-online-scraper/tree/cache/parishes). \u26a1\ufe0f [Download CSV](https://github.com/lsg551/matricula-online-scraper/raw/cache/parishes/parishes.csv.gz) \u26a1\ufe0f\n\n### Example 2:\n\nFetch all available register from one parish in M\u00fcnster, Germany and save them to a `.jsonl` file:\n\n```console\n$ matricula-online-scraper fetch parish ./output.jsonl --urls https://data.matricula-online.eu/en/deutschland/muenster/muenster-st-martini/\n```\n\n## License & Contributing\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\nContributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions, especially bug fixes. Please make sure to follow the [Contributing Guidelines](CONTRIBUTING.md).\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Command Line Interface tool for scraping Matricula Online https://data.matricula-online.eu.",
    "version": "0.5.0",
    "project_urls": {
        "Homepage": "https://github.com/lsg551/matricula-online-scraper",
        "Repository": "https://github.com/lsg551/matricula-online-scraper"
    },
    "split_keywords": [
        "matricula-online",
        " matricula",
        " scraper",
        " parish-registers"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "14159dc86b2c0530e67aed091e7cf9253b41f6dbada70c42b2554cb37a2e6b0d",
                "md5": "bc9887866182b14c781df3cd03067fa2",
                "sha256": "2d376628569ad5a07201d4824b02fa66e3685199f26718009c90536c7feff7b5"
            },
            "downloads": -1,
            "filename": "matricula_online_scraper-0.5.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "bc9887866182b14c781df3cd03067fa2",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.12",
            "size": 14665,
            "upload_time": "2024-06-07T16:26:00",
            "upload_time_iso_8601": "2024-06-07T16:26:00.807667Z",
            "url": "https://files.pythonhosted.org/packages/14/15/9dc86b2c0530e67aed091e7cf9253b41f6dbada70c42b2554cb37a2e6b0d/matricula_online_scraper-0.5.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1c8bcb2658fbf14950935935709f6065afb4f395960d786edfd69ac81d27eec4",
                "md5": "4d665c76c1f1bc786a49314841659bb8",
                "sha256": "e31ab30e158c10954a3a98925a1df743ceb6126f66ae01d4536367c5a7a7a184"
            },
            "downloads": -1,
            "filename": "matricula_online_scraper-0.5.0.tar.gz",
            "has_sig": false,
            "md5_digest": "4d665c76c1f1bc786a49314841659bb8",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.12",
            "size": 11248,
            "upload_time": "2024-06-07T16:26:03",
            "upload_time_iso_8601": "2024-06-07T16:26:03.407856Z",
            "url": "https://files.pythonhosted.org/packages/1c/8b/cb2658fbf14950935935709f6065afb4f395960d786edfd69ac81d27eec4/matricula_online_scraper-0.5.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-06-07 16:26:03",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "lsg551",
    "github_project": "matricula-online-scraper",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "matricula-online-scraper"
}
        