hdx-python-scraper


Name: hdx-python-scraper
Version: 2.6.3
home_page: None
Summary: HDX Python scraper utilities to assemble data from multiple sources
upload_time: 2025-02-18 01:32:12
maintainer: None
docs_url: None
author: None
requires_python: >=3.8
license: MIT
keywords: HDX, data assembly, data transformation, scrapers, tabular data
requirements: annotated-types, attrs, cachetools, certifi, cfgv, chardet, charset-normalizer, ckanapi, click, coverage, defopt, distlib, dnspython, docopt, docutils, email-validator, et-xmlfile, filelock, frictionless, google-auth, google-auth-oauthlib, gspread, hdx-python-api, hdx-python-country, hdx-python-utilities, humanize, identify, idna, ijson, inflect, iniconfig, isodate, jinja2, jsonlines, jsonpath-ng, jsonschema, jsonschema-specifications, libhxl, loguru, makefun, markdown-it-py, marko, markupsafe, mdurl, more-itertools, nodeenv, num2words, numpy, oauthlib, openpyxl, packaging, pandas, petl, platformdirs, pluggy, ply, pockets, pre-commit, pyasn1, pyasn1-modules, pydantic, pydantic-core, pygments, pyphonetics, pytest, pytest-cov, python-dateutil, python-io-wrapper, python-slugify, pytz, pyyaml, quantulum3, ratelimit, referencing, regex, requests, requests-file, requests-oauthlib, rfc3986, rich, rpds-py, rsa, ruamel-yaml, ruamel-yaml-clib, setuptools, shellingham, simpleeval, simplejson, six, sphinxcontrib-napoleon, stringcase, structlog, tableschema-to-template, tabulate, tenacity, text-unidecode, typeguard, typer, typing-extensions, tzdata, unidecode, urllib3, validators, virtualenv, wheel, xlrd, xlrd3, xlsx2csv, xlsxwriter, xlwt
[![Build Status](https://github.com/OCHA-DAP/hdx-python-scraper/actions/workflows/run-python-tests.yaml/badge.svg)](https://github.com/OCHA-DAP/hdx-python-scraper/actions/workflows/run-python-tests.yaml)
[![Coverage Status](https://coveralls.io/repos/github/OCHA-DAP/hdx-python-scraper/badge.svg?branch=main&ts=1)](https://coveralls.io/github/OCHA-DAP/hdx-python-scraper?branch=main)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)
[![Downloads](https://img.shields.io/pypi/dm/hdx-python-scraper.svg)](https://pypistats.org/packages/hdx-python-scraper)

The HDX Python Scraper Library helps you develop code that assembles data from one or
more tabular sources in CSV, XLS, XLSX or JSON format. A YAML file specifies what to
read from each source and can apply some transformations to the data. The output is
written to JSON, Google Sheets and/or Excel, and includes the
[Humanitarian Exchange Language (HXL)](https://hxlstandard.org/) hashtags specified in
the YAML file. You can also write custom Python scrapers that conform to a defined
specification; the framework handles the execution of both configurable and custom
scrapers.
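
To make the configuration-driven idea concrete, here is a minimal sketch in plain
Python using only the standard library. It does not use this library's API or its
YAML schema: the configuration keys (`key_column`, `columns`, `hxl_tags`), the
`assemble` function and the sample data are all hypothetical and chosen purely for
illustration. See the documentation for the real configuration format.

```python
import csv
import io
import json

# Hypothetical per-source configuration (illustrative keys, NOT the library's schema).
CONFIG = {
    "key_column": "iso3",
    "columns": {"population": "Population"},  # input column -> output header
    "hxl_tags": {"Population": "#population"},  # output header -> HXL hashtag
}

# Made-up sample input standing in for a CSV source.
SAMPLE_CSV = "iso3,population,ignored\nAFG,100,x\nSDN,250,y\n"


def assemble(csv_text, config):
    """Read the configured columns and return header, HXL and value rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = list(config["columns"].values())
    hxl_row = [config["hxl_tags"][header] for header in headers]
    values = {}
    for row in reader:
        key = row[config["key_column"]]
        # A simple transformation: cast the numeric strings to integers.
        values[key] = [int(row[column]) for column in config["columns"]]
    return {"headers": headers, "hxl": hxl_row, "values": values}


result = assemble(SAMPLE_CSV, CONFIG)
print(json.dumps(result, indent=2))
```

The point of the sketch is the separation of concerns: the configuration names the
columns and hashtags, while the code that reads, transforms and writes stays generic.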

For more information, please read the
[documentation](https://hdx-python-scraper.readthedocs.io/en/latest/).

This library is part of the
[Humanitarian Data Exchange](https://data.humdata.org/) (HDX) project. If you have
humanitarian-related data, please upload your datasets to HDX.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "hdx-python-scraper",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "HDX, data assembly, data transformation, scrapers, tabular data",
    "author": null,
    "author_email": "Michael Rans <rans@email.com>",
    "download_url": "https://files.pythonhosted.org/packages/26/9a/011e1ab6bf68a9b23c104add043082d96b92839ff51fa7f7917df8f063ab/hdx_python_scraper-2.6.3.tar.gz",
    "platform": null,
    "description": "[![Build Status](https://github.com/OCHA-DAP/hdx-python-scraper/actions/workflows/run-python-tests.yaml/badge.svg)](https://github.com/OCHA-DAP/hdx-python-scraper/actions/workflows/run-python-tests.yaml)\n[![Coverage Status](https://coveralls.io/repos/github/OCHA-DAP/hdx-python-scraper/badge.svg?branch=main&ts=1)](https://coveralls.io/github/OCHA-DAP/hdx-python-scraper?branch=main)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)\n[![Downloads](https://img.shields.io/pypi/dm/hdx-python-scraper.svg)](https://pypistats.org/packages/hdx-python-scraper)\n\nThe HDX Python Scraper Library is designed to enable you to easily develop code that\nassembles data from one or more tabular sources that can be csv, xls, xlsx or JSON. It\nuses a YAML file that specifies for each source what needs to be read and allows some\ntransformations to be performed on the data. The output is written to JSON, Google sheets\nand/or Excel and includes the addition of\n[Humanitarian Exchange Language (HXL)](https://hxlstandard.org/) hashtags specified in\nthe YAML file. Custom Python scrapers can also be written that conform to a defined\nspecification and the framework handles the execution of both configurable and custom\nscrapers.\n\nFor more information, please read the\n[documentation](https://hdx-python-scraper.readthedocs.io/en/latest/).\n\nThis library is part of the\n[Humanitarian Data Exchange](https://data.humdata.org/) (HDX) project. If you have\nhumanitarian related data, please upload your datasets to HDX.\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "HDX Python scraper utilities to assemble data from multiple sources",
    "version": "2.6.3",
    "project_urls": {
        "Homepage": "https://github.com/OCHA-DAP/hdx-python-scraper"
    },
    "split_keywords": [
        "hdx",
        " data assembly",
        " data transformation",
        " scrapers",
        " tabular data"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "2f743bc8662e90a01229be06b0e798086f13b1d1f07c7da3986cf2990b8b7d75",
                "md5": "4ac8f11c4e0a897106e67b06218d24bf",
                "sha256": "fbda2264351ba5c41af0b4aed7ddfff5613ada355382ba91fa9a3cb0c7812108"
            },
            "downloads": -1,
            "filename": "hdx_python_scraper-2.6.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4ac8f11c4e0a897106e67b06218d24bf",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 58193,
            "upload_time": "2025-02-18T01:32:14",
            "upload_time_iso_8601": "2025-02-18T01:32:14.773602Z",
            "url": "https://files.pythonhosted.org/packages/2f/74/3bc8662e90a01229be06b0e798086f13b1d1f07c7da3986cf2990b8b7d75/hdx_python_scraper-2.6.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "269a011e1ab6bf68a9b23c104add043082d96b92839ff51fa7f7917df8f063ab",
                "md5": "2e3be61d5195101b316889940bb77197",
                "sha256": "bd02af591bc6adbecc05bf896079f5554981c98a330dc95d1c97f29777deb92a"
            },
            "downloads": -1,
            "filename": "hdx_python_scraper-2.6.3.tar.gz",
            "has_sig": false,
            "md5_digest": "2e3be61d5195101b316889940bb77197",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 6362660,
            "upload_time": "2025-02-18T01:32:12",
            "upload_time_iso_8601": "2025-02-18T01:32:12.227749Z",
            "url": "https://files.pythonhosted.org/packages/26/9a/011e1ab6bf68a9b23c104add043082d96b92839ff51fa7f7917df8f063ab/hdx_python_scraper-2.6.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-18 01:32:12",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "OCHA-DAP",
    "github_project": "hdx-python-scraper",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "annotated-types",
            "specs": [
                [
                    "==",
                    "0.7.0"
                ]
            ]
        },
        {
            "name": "attrs",
            "specs": [
                [
                    "==",
                    "25.1.0"
                ]
            ]
        },
        {
            "name": "cachetools",
            "specs": [
                [
                    "==",
                    "5.5.1"
                ]
            ]
        },
        {
            "name": "certifi",
            "specs": [
                [
                    "==",
                    "2025.1.31"
                ]
            ]
        },
        {
            "name": "cfgv",
            "specs": [
                [
                    "==",
                    "3.4.0"
                ]
            ]
        },
        {
            "name": "chardet",
            "specs": [
                [
                    "==",
                    "5.2.0"
                ]
            ]
        },
        {
            "name": "charset-normalizer",
            "specs": [
                [
                    "==",
                    "3.4.1"
                ]
            ]
        },
        {
            "name": "ckanapi",
            "specs": [
                [
                    "==",
                    "4.8"
                ]
            ]
        },
        {
            "name": "click",
            "specs": [
                [
                    "==",
                    "8.1.8"
                ]
            ]
        },
        {
            "name": "coverage",
            "specs": [
                [
                    "==",
                    "7.6.12"
                ]
            ]
        },
        {
            "name": "defopt",
            "specs": [
                [
                    "==",
                    "6.4.0"
                ]
            ]
        },
        {
            "name": "distlib",
            "specs": [
                [
                    "==",
                    "0.3.9"
                ]
            ]
        },
        {
            "name": "dnspython",
            "specs": [
                [
                    "==",
                    "2.7.0"
                ]
            ]
        },
        {
            "name": "docopt",
            "specs": [
                [
                    "==",
                    "0.6.2"
                ]
            ]
        },
        {
            "name": "docutils",
            "specs": [
                [
                    "==",
                    "0.21.2"
                ]
            ]
        },
        {
            "name": "email-validator",
            "specs": [
                [
                    "==",
                    "2.2.0"
                ]
            ]
        },
        {
            "name": "et-xmlfile",
            "specs": [
                [
                    "==",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "filelock",
            "specs": [
                [
                    "==",
                    "3.17.0"
                ]
            ]
        },
        {
            "name": "frictionless",
            "specs": [
                [
                    "==",
                    "5.18.0"
                ]
            ]
        },
        {
            "name": "google-auth",
            "specs": [
                [
                    "==",
                    "2.38.0"
                ]
            ]
        },
        {
            "name": "google-auth-oauthlib",
            "specs": [
                [
                    "==",
                    "1.2.1"
                ]
            ]
        },
        {
            "name": "gspread",
            "specs": [
                [
                    "==",
                    "6.1.4"
                ]
            ]
        },
        {
            "name": "hdx-python-api",
            "specs": [
                [
                    "==",
                    "6.3.8"
                ]
            ]
        },
        {
            "name": "hdx-python-country",
            "specs": [
                [
                    "==",
                    "3.8.8"
                ]
            ]
        },
        {
            "name": "hdx-python-utilities",
            "specs": [
                [
                    "==",
                    "3.8.3"
                ]
            ]
        },
        {
            "name": "humanize",
            "specs": [
                [
                    "==",
                    "4.12.0"
                ]
            ]
        },
        {
            "name": "identify",
            "specs": [
                [
                    "==",
                    "2.6.7"
                ]
            ]
        },
        {
            "name": "idna",
            "specs": [
                [
                    "==",
                    "3.10"
                ]
            ]
        },
        {
            "name": "ijson",
            "specs": [
                [
                    "==",
                    "3.3.0"
                ]
            ]
        },
        {
            "name": "inflect",
            "specs": [
                [
                    "==",
                    "7.5.0"
                ]
            ]
        },
        {
            "name": "iniconfig",
            "specs": [
                [
                    "==",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "isodate",
            "specs": [
                [
                    "==",
                    "0.7.2"
                ]
            ]
        },
        {
            "name": "jinja2",
            "specs": [
                [
                    "==",
                    "3.1.5"
                ]
            ]
        },
        {
            "name": "jsonlines",
            "specs": [
                [
                    "==",
                    "4.0.0"
                ]
            ]
        },
        {
            "name": "jsonpath-ng",
            "specs": [
                [
                    "==",
                    "1.7.0"
                ]
            ]
        },
        {
            "name": "jsonschema",
            "specs": [
                [
                    "==",
                    "4.23.0"
                ]
            ]
        },
        {
            "name": "jsonschema-specifications",
            "specs": [
                [
                    "==",
                    "2024.10.1"
                ]
            ]
        },
        {
            "name": "libhxl",
            "specs": [
                [
                    "==",
                    "5.2.2"
                ]
            ]
        },
        {
            "name": "loguru",
            "specs": [
                [
                    "==",
                    "0.7.3"
                ]
            ]
        },
        {
            "name": "makefun",
            "specs": [
                [
                    "==",
                    "1.15.6"
                ]
            ]
        },
        {
            "name": "markdown-it-py",
            "specs": [
                [
                    "==",
                    "3.0.0"
                ]
            ]
        },
        {
            "name": "marko",
            "specs": [
                [
                    "==",
                    "2.1.2"
                ]
            ]
        },
        {
            "name": "markupsafe",
            "specs": [
                [
                    "==",
                    "3.0.2"
                ]
            ]
        },
        {
            "name": "mdurl",
            "specs": [
                [
                    "==",
                    "0.1.2"
                ]
            ]
        },
        {
            "name": "more-itertools",
            "specs": [
                [
                    "==",
                    "10.6.0"
                ]
            ]
        },
        {
            "name": "nodeenv",
            "specs": [
                [
                    "==",
                    "1.9.1"
                ]
            ]
        },
        {
            "name": "num2words",
            "specs": [
                [
                    "==",
                    "0.5.14"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "==",
                    "2.2.3"
                ]
            ]
        },
        {
            "name": "oauthlib",
            "specs": [
                [
                    "==",
                    "3.2.2"
                ]
            ]
        },
        {
            "name": "openpyxl",
            "specs": [
                [
                    "==",
                    "3.1.5"
                ]
            ]
        },
        {
            "name": "packaging",
            "specs": [
                [
                    "==",
                    "24.2"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    "==",
                    "2.2.3"
                ]
            ]
        },
        {
            "name": "petl",
            "specs": [
                [
                    "==",
                    "1.7.15"
                ]
            ]
        },
        {
            "name": "platformdirs",
            "specs": [
                [
                    "==",
                    "4.3.6"
                ]
            ]
        },
        {
            "name": "pluggy",
            "specs": [
                [
                    "==",
                    "1.5.0"
                ]
            ]
        },
        {
            "name": "ply",
            "specs": [
                [
                    "==",
                    "3.11"
                ]
            ]
        },
        {
            "name": "pockets",
            "specs": [
                [
                    "==",
                    "0.9.1"
                ]
            ]
        },
        {
            "name": "pre-commit",
            "specs": [
                [
                    "==",
                    "4.1.0"
                ]
            ]
        },
        {
            "name": "pyasn1",
            "specs": [
                [
                    "==",
                    "0.6.1"
                ]
            ]
        },
        {
            "name": "pyasn1-modules",
            "specs": [
                [
                    "==",
                    "0.4.1"
                ]
            ]
        },
        {
            "name": "pydantic",
            "specs": [
                [
                    "==",
                    "2.10.6"
                ]
            ]
        },
        {
            "name": "pydantic-core",
            "specs": [
                [
                    "==",
                    "2.27.2"
                ]
            ]
        },
        {
            "name": "pygments",
            "specs": [
                [
                    "==",
                    "2.19.1"
                ]
            ]
        },
        {
            "name": "pyphonetics",
            "specs": [
                [
                    "==",
                    "0.5.3"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": [
                [
                    "==",
                    "8.3.4"
                ]
            ]
        },
        {
            "name": "pytest-cov",
            "specs": [
                [
                    "==",
                    "6.0.0"
                ]
            ]
        },
        {
            "name": "python-dateutil",
            "specs": [
                [
                    "==",
                    "2.9.0.post0"
                ]
            ]
        },
        {
            "name": "python-io-wrapper",
            "specs": [
                [
                    "==",
                    "0.3.1"
                ]
            ]
        },
        {
            "name": "python-slugify",
            "specs": [
                [
                    "==",
                    "8.0.4"
                ]
            ]
        },
        {
            "name": "pytz",
            "specs": [
                [
                    "==",
                    "2025.1"
                ]
            ]
        },
        {
            "name": "pyyaml",
            "specs": [
                [
                    "==",
                    "6.0.2"
                ]
            ]
        },
        {
            "name": "quantulum3",
            "specs": [
                [
                    "==",
                    "0.9.2"
                ]
            ]
        },
        {
            "name": "ratelimit",
            "specs": [
                [
                    "==",
                    "2.2.1"
                ]
            ]
        },
        {
            "name": "referencing",
            "specs": [
                [
                    "==",
                    "0.36.2"
                ]
            ]
        },
        {
            "name": "regex",
            "specs": [
                [
                    "==",
                    "2024.11.6"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    "==",
                    "2.32.3"
                ]
            ]
        },
        {
            "name": "requests-file",
            "specs": [
                [
                    "==",
                    "2.1.0"
                ]
            ]
        },
        {
            "name": "requests-oauthlib",
            "specs": [
                [
                    "==",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "rfc3986",
            "specs": [
                [
                    "==",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "rich",
            "specs": [
                [
                    "==",
                    "13.9.4"
                ]
            ]
        },
        {
            "name": "rpds-py",
            "specs": [
                [
                    "==",
                    "0.22.3"
                ]
            ]
        },
        {
            "name": "rsa",
            "specs": [
                [
                    "==",
                    "4.9"
                ]
            ]
        },
        {
            "name": "ruamel-yaml",
            "specs": [
                [
                    "==",
                    "0.18.10"
                ]
            ]
        },
        {
            "name": "ruamel-yaml-clib",
            "specs": [
                [
                    "==",
                    "0.2.12"
                ]
            ]
        },
        {
            "name": "setuptools",
            "specs": [
                [
                    "==",
                    "75.8.0"
                ]
            ]
        },
        {
            "name": "shellingham",
            "specs": [
                [
                    "==",
                    "1.5.4"
                ]
            ]
        },
        {
            "name": "simpleeval",
            "specs": [
                [
                    "==",
                    "1.0.3"
                ]
            ]
        },
        {
            "name": "simplejson",
            "specs": [
                [
                    "==",
                    "3.20.1"
                ]
            ]
        },
        {
            "name": "six",
            "specs": [
                [
                    "==",
                    "1.17.0"
                ]
            ]
        },
        {
            "name": "sphinxcontrib-napoleon",
            "specs": [
                [
                    "==",
                    "0.7"
                ]
            ]
        },
        {
            "name": "stringcase",
            "specs": [
                [
                    "==",
                    "1.2.0"
                ]
            ]
        },
        {
            "name": "structlog",
            "specs": [
                [
                    "==",
                    "25.1.0"
                ]
            ]
        },
        {
            "name": "tableschema-to-template",
            "specs": [
                [
                    "==",
                    "0.0.13"
                ]
            ]
        },
        {
            "name": "tabulate",
            "specs": [
                [
                    "==",
                    "0.9.0"
                ]
            ]
        },
        {
            "name": "tenacity",
            "specs": [
                [
                    "==",
                    "9.0.0"
                ]
            ]
        },
        {
            "name": "text-unidecode",
            "specs": [
                [
                    "==",
                    "1.3"
                ]
            ]
        },
        {
            "name": "typeguard",
            "specs": [
                [
                    "==",
                    "4.4.2"
                ]
            ]
        },
        {
            "name": "typer",
            "specs": [
                [
                    "==",
                    "0.15.1"
                ]
            ]
        },
        {
            "name": "typing-extensions",
            "specs": [
                [
                    "==",
                    "4.12.2"
                ]
            ]
        },
        {
            "name": "tzdata",
            "specs": [
                [
                    "==",
                    "2025.1"
                ]
            ]
        },
        {
            "name": "unidecode",
            "specs": [
                [
                    "==",
                    "1.3.8"
                ]
            ]
        },
        {
            "name": "urllib3",
            "specs": [
                [
                    "==",
                    "2.3.0"
                ]
            ]
        },
        {
            "name": "validators",
            "specs": [
                [
                    "==",
                    "0.34.0"
                ]
            ]
        },
        {
            "name": "virtualenv",
            "specs": [
                [
                    "==",
                    "20.29.2"
                ]
            ]
        },
        {
            "name": "wheel",
            "specs": [
                [
                    "==",
                    "0.45.1"
                ]
            ]
        },
        {
            "name": "xlrd",
            "specs": [
                [
                    "==",
                    "2.0.1"
                ]
            ]
        },
        {
            "name": "xlrd3",
            "specs": [
                [
                    "==",
                    "1.1.0"
                ]
            ]
        },
        {
            "name": "xlsx2csv",
            "specs": [
                [
                    "==",
                    "0.8.4"
                ]
            ]
        },
        {
            "name": "xlsxwriter",
            "specs": [
                [
                    "==",
                    "3.2.2"
                ]
            ]
        },
        {
            "name": "xlwt",
            "specs": [
                [
                    "==",
                    "1.3.0"
                ]
            ]
        }
    ],
    "lcname": "hdx-python-scraper"
}
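
The `requirements` array above stores each dependency as a name plus a list of
`[operator, version]` pairs. As a rough sketch of how that structure could be
flattened into pip-style pins, the snippet below parses a two-entry excerpt copied
from the data above. The `to_pins` helper is a hypothetical name introduced here,
not part of any library.

```python
import json

# A two-entry excerpt with the same shape as the "requirements" array above.
RAW = """
{
  "requirements": [
    {"name": "annotated-types", "specs": [["==", "0.7.0"]]},
    {"name": "attrs", "specs": [["==", "25.1.0"]]}
  ]
}
"""


def to_pins(metadata):
    """Turn each requirement entry into a pip-style specifier string."""
    pins = []
    for requirement in metadata["requirements"]:
        # Each spec is an [operator, version] pair; join several with commas.
        spec = ",".join(operator + version for operator, version in requirement["specs"])
        pins.append(requirement["name"] + spec)
    return pins


pins = to_pins(json.loads(RAW))
print("\n".join(pins))
```

Writing the resulting lines to a file would give something usable as a
`requirements.txt`, assuming every entry carries at least one spec as it does here.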
        