libhxl


Field | Value
-- | --
Name | libhxl
Version | 5.2.2
Home page | http://hxlproject.org
Summary | Python support library for the Humanitarian Exchange Language (HXL). See http://hxlstandard.org and https://github.com/HXLStandard/libhxl-python
Upload time | 2024-10-25 09:19:11
Author | David Megginson
Requirements | No requirements were recorded.

libhxl-python
=============

Python support library for the Humanitarian Exchange Language (HXL)
data standard.  The library requires Python 3 (versions prior to 4.6
also supported Python 2.7).

**API docs:** https://hxlstandard.github.io/libhxl-python/ (and in the ``docs/`` folder)

**HXL standard:** http://hxlstandard.org

## Quick start

From the command line (or inside a Python3 virtual environment):

```
$ pip3 install libhxl
```

In your code:

```
import hxl

url = "https://github.com/HXLStandard/libhxl-python/blob/main/tests/files/test_io/input-valid.csv"

data = hxl.data(url).with_rows("#sector=WASH").sort("#country")

for line in data.gen_csv():
    print(line)
```

## Usage

### Reading from a data source

The _hxl.data()_ function reads HXL from a file object, filename, URL,
or list of arrays and makes it available for processing, much like
``$()`` in jQuery. The following reads HXLated data from standard input:

```
import sys
import hxl

dataset = hxl.data(sys.stdin)
```

Most commonly, you will open a dataset via a URL:

```
dataset = hxl.data("https://example.org/dataset.url")
```

To open a local file rather than a URL, use the _allow\_local_ property
of the
[InputOptions](https://hxlstandard.github.io/libhxl-python/input.html#hxl.input.InputOptions)
class:

```
dataset = hxl.data("dataset.xlsx", hxl.InputOptions(allow_local=True))
```

#### Input caching

libhxl uses the Python
[requests](http://docs.python-requests.org/en/master/) library for
opening URLs. If you want to enable caching (for example, to avoid
beating up on your source with repeated requests), your code can use
the [requests_cache](https://pypi.python.org/pypi/requests-cache)
plugin, like this:

    import requests_cache
    requests_cache.install_cache('demo_cache', expire_after=3600)

The default caching backend is a SQLite database at the location specified.


### Filter chains

You can use filters to transform the output, and chain them as
needed. Transformation is lazy, and uses the minimum memory
possible. For example, this command selects only data rows where the
country is "Somalia", sorted by the organisation:

```
transformed = hxl.data(url).with_rows("#country=Somalia").sort("#org")
```

For more on filters see the API documentation for the
[hxl.model.Dataset](https://hxlstandard.github.io/libhxl-python/model.html#hxl.model.Dataset)
class and the
[hxl.filters](https://hxlstandard.github.io/libhxl-python/filters.html)
module.


### Generators

Generators allow the re-serialising of HXL data, returning something that works like an iterator.  Example:

```
for line in hxl.data(url).gen_csv():
    print(line)
```

The following generators are available (you can use the parameters to turn the text headers and HXL tags on or off):

Generator method | Description
-- | --
[gen_raw()](https://hxlstandard.github.io/libhxl-python/model.html#hxl.model.Dataset.gen_raw) | Generate arrays of strings, one row at a time.
[gen_csv()](https://hxlstandard.github.io/libhxl-python/model.html#hxl.model.Dataset.gen_csv) | Generate encoded CSV rows, one row at a time.
[gen_json()](https://hxlstandard.github.io/libhxl-python/model.html#hxl.model.Dataset.gen_json) | Generate JSON output, either as rows or as JSON objects with the HXL hashtags as property names.

### Validation

To validate a HXL dataset against a schema (also in HXL), use the [validate()](https://hxlstandard.github.io/libhxl-python/model.html#hxl.model.Dataset.validate) method at the end of the filter chain:

```
is_valid = hxl.data(url).validate('my-schema.csv')
```

If you don't specify a schema, the library will use a simple, built-in schema:

```
is_valid = hxl.data(url).validate()
```

If you include a callback, you can collect details about the errors and warnings:

```
import sys

def my_callback(error_info):
    # error_info is a HXLValidationException
    sys.stderr.write(str(error_info) + "\n")

is_valid = hxl.data(url).validate(schema='my-schema.csv', callback=my_callback)
```

For more information on validation, see the API documentation for the
[hxl.validation](https://hxlstandard.github.io/libhxl-python/validation.html)
module and the format documentation for [HXL
schemas](https://github.com/HXLStandard/hxl-proxy/wiki/HXL-schemas).


## Command-line scripts

The filters are also available as command-line scripts, installed with
the library. For example,

```
$ hxlcount -t country dataset.csv
```

will perform the same action as

```
import hxl

hxl.data("dataset.csv", hxl.InputOptions(allow_local=True)).count("country").gen_csv()
```

See the API documentation for the
[hxl.scripts](https://hxlstandard.github.io/libhxl-python/scripts.html)
module for more information about the command-line scripts
available. All scripts have an ``-h`` option that gives usage
information.


## Installation

This repository includes a standard Python `setup.py` script for
installing the library and scripts (applications) on your system. In a
Unix-like operating system, you can install using the following
command:

```
python setup.py install
```

If you don't need to install from source, simply run:

```
pip install libhxl
```

Once you've installed, you will be able to include the HXL libraries
from any Python application, and will be able to call scripts like
_hxlvalidate_ from the command line.


## Makefile

There is also a generic Makefile that automates many tasks, including
setting up a Python virtual environment for testing. The Python3 venv
module is required for most of the targets.


```
make build-venv
```

Set up a local Python virtual environment for testing, if it doesn't
already exist. Recreate the virtual environment if `setup.py` has
changed.

```
make test
```

Set up a virtual environment (if missing) and run all the unit tests.

```
make test-install
```

Test a clean installation to verify there are no missing dependencies,
etc.

## License

libhxl-python is released into the Public Domain, and comes with NO
WARRANTY. See [LICENSE.md](./LICENSE.md) for details.

            
