unesco-reader

Name	unesco-reader
Version	2.0.0
home_page	None
Summary	Pythonic access to UNESCO data
upload_time	2024-04-08 10:18:44
maintainer	None
docs_url	None
author	Luca Picci
requires_python	<4,>=3.10
license	MIT

# unesco_reader

[![PyPI](https://img.shields.io/pypi/v/unesco_reader.svg)](https://pypi.org/project/unesco_reader/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/unesco_reader.svg)](https://pypi.org/project/unesco_reader/)
[![Documentation Status](https://readthedocs.org/projects/unesco-reader/badge/?version=latest)](https://unesco-reader.readthedocs.io/en/latest/?badge=latest)
[![codecov](https://codecov.io/gh/lpicci96/unesco_reader/branch/main/graph/badge.svg)](https://codecov.io/gh/lpicci96/unesco_reader)
![Black](https://img.shields.io/badge/code%20style-black-000000.svg)


Pythonic access to UNESCO data

`unesco_reader` is a Python package that provides a simple interface to UNESCO Institute for Statistics (UIS)
data. UIS currently does not offer API access to its data, so users must download zipped files and extract the data themselves.
This process requires several manual steps, explained in the UIS [Python tutorial](https://apiportal.uis.unesco.org/bdds-tutorial). This package simplifies the process by providing a simple
interface to access, explore, and analyze the data, already structured and formatted as pandas DataFrames. It also
lets users view dataset documentation and other information, such as the date of the last update, and retrieve
information about all available datasets from UIS.

### Note
UIS data is expected to become accessible through the [DataCommons](https://datacommons.org/) API in the future. Once available, that API should
be the preferred method to access the data. Future versions of this package may include support for the API,
or the package may be deprecated and remain as a legacy package.

This package works by scraping data from the UIS website. As a result,
it may break if the website structure or data file formats change without notice.
Please report any unexpected errors or issues you encounter. All feedback, suggestions, and contributions are welcome!

## Installation

```bash
$ pip install unesco-reader
```

## Usage

Import the package:
```python
import unesco_reader as uis
```

Retrieve information about all the available datasets from UIS.
```python
uis.info()
```
This function will display all available datasets and relevant information about them.
```
>>>
name                                                               latest_update    theme
-----------------------------------------------------------------  ---------------  ---------
SDG Global and Thematic Indicators                                 February 2024    Education
Other Policy Relevant Indicators (OPRI)                            February 2024    Education
Research and Development (R&D) SDG 9.5                             February 2024    Science
Research and Development (R&D) – Other Policy Relevant Indicators  February 2024    Science
...
```

Retrieve a list of all available datasets from UIS.
```python
uis.available_datasets()
```

```
>>> ['SDG Global and Thematic Indicators',
     'Other Policy Relevant Indicators (OPRI)',
     'Research and Development (R&D) SDG 9.5',
     ...]
```

Optionally, specify a theme to filter the datasets.
```python
uis.available_datasets(theme='Education')
```
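
The effect of the `theme` filter can be pictured with plain Python. This is only an illustrative sketch, not the package's own code: the rows are the four shown in the example `uis.info()` output above, and `names_for_theme` is a hypothetical helper introduced here.

```python
# Rows copied from the example `uis.info()` output: (name, latest_update, theme).
# Only the four rows shown there are included; the real listing is longer.
DATASETS = [
    ("SDG Global and Thematic Indicators", "February 2024", "Education"),
    ("Other Policy Relevant Indicators (OPRI)", "February 2024", "Education"),
    ("Research and Development (R&D) SDG 9.5", "February 2024", "Science"),
    ("Research and Development (R&D) – Other Policy Relevant Indicators", "February 2024", "Science"),
]


def names_for_theme(rows, theme=None):
    """Hypothetical helper: return dataset names, optionally restricted to one theme."""
    return [name for name, _update, row_theme in rows if theme is None or row_theme == theme]


# With no theme, every name is returned; with a theme, only matching rows are kept.
print(names_for_theme(DATASETS, theme="Education"))
```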


To access data for a particular dataset, use the `UIS` class, passing the name of the dataset.
A `UIS` object allows a user to easily access, explore, and analyse the data.
On instantiation, the data is extracted from the UIS website or, if it has already been
extracted, read from the cache (more on caching below).

```python
from unesco_reader import UIS

sdg = UIS("SDG Global and Thematic Indicators")
```

Basic information about the dataset can be accessed using the `info` method.
```python
sdg.info()
```
This will display information about the dataset, such as its name, latest update, and theme.

```
>>>
-------------  ----------------------------------
name           SDG Global and Thematic Indicators
latest update  February 2024
theme          Education
-------------  ----------------------------------
```

Information is also accessible through the attributes of the object.
```python
name = sdg.name
update = sdg.latest_update
theme = sdg.theme
documentation = sdg.readme
```

The `readme` attribute contains the dataset documentation. To display the documentation, use the `display_readme` method.
```python
sdg.display_readme()
```

Various methods exist to access the data.
To access country data:
```python
df = sdg.get_country_data()
```
This will return a pandas DataFrame with the country data, in a structured and expected format.
By default, the DataFrame will not contain metadata. To include metadata in the output, set the `include_metadata` parameter to `True`.
Countries may also be filtered for a specific region by specifying the region's ID in the `region` parameter.
To see the available regions, use the `get_regions` method.

```python
df = sdg.get_country_data(include_metadata=True, region='WB: World')
```

To access regional data:
```python
df = sdg.get_region_data()
```
This will return a pandas DataFrame with the regional data, in a structured and expected format. Note that not all datasets contain regional data.
If the dataset does not contain regional data, an error will be raised. The same applies to any other data that is not available for the particular dataset.
By default, the DataFrame will not contain metadata. To include metadata in the output, set the `include_metadata` parameter to `True`.
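
Because a missing data type raises an error, scripts that loop over several datasets may want to guard these calls. The sketch below is one possible approach, not part of the package: `fetch_or_none` is a hypothetical helper, the two `_has_data` / `_no_data` functions are stand-ins for methods such as `sdg.get_region_data`, and since the concrete exception type is not stated here, the helper catches `Exception` broadly.

```python
def fetch_or_none(getter, **kwargs):
    """Hypothetical helper: call a data-access method, return None if it raises."""
    try:
        return getter(**kwargs)
    except Exception as exc:  # assumption: the exact exception type is not documented here
        print(f"data not available: {exc}")
        return None


# Stand-ins for real methods; they only mimic "data present" and "data missing".
def _has_data(include_metadata=False):
    return {"rows": 3, "metadata": include_metadata}


def _no_data(include_metadata=False):
    raise ValueError("no regional data in this dataset")


available = fetch_or_none(_has_data, include_metadata=True)  # returns the stand-in result
missing = fetch_or_none(_no_data)                            # prints a message, returns None
```

With a real object, the call would look like `fetch_or_none(sdg.get_region_data, include_metadata=True)`.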

Metadata, available countries, available regions, and variables are also accessible through methods of the object.
```python
metadata_df = sdg.get_metadata()
countries_df = sdg.get_countries()
regions_df = sdg.get_regions()
variables_df = sdg.get_variables()
```

To refresh the data and extract the latest data from the UIS website, use the `refresh` method.
```python
sdg.refresh()
```

### Caching

Caching is used to prevent unnecessary requests to the UIS website and enhance performance.
To refresh data returned by functions, use the `refresh` parameter. Caching uses an LRU
(Least Recently Used) approach and stores data in RAM. The cache is cleared when the
program terminates.

```python
uis.info(refresh=True)
uis.available_datasets(refresh=True)
```
`refresh=True` will clear the cache and force extraction of the data and information from the UIS website.

For the `UIS` class, the `refresh` method will clear the cache and extract the latest data from the UIS website.
```python
sdg.refresh()
```

To clear all cached data, use the `clear_all_caches` method.
```python
uis.clear_all_caches()
```


## Contributing

All contributions are welcome! If you find a bug,
or have a suggestion for a new feature or an
improvement to the documentation, please open an issue.
Since this project is under active development,
please check the open issues first to make sure the issue has
not been raised already.

A detailed overview of the contribution process can be found
[here](https://github.com/lpicci96/unesco_reader/blob/main/CONTRIBUTING.md).
By contributing to this project, you agree to abide by its terms.

## License

`unesco_reader` was created by Luca Picci. It is licensed under the terms of the MIT license.

## Credits

`unesco_reader` was created with [`cookiecutter`](https://cookiecutter.readthedocs.io/en/latest/) and the
`py-pkgs-cookiecutter` [template](https://github.com/py-pkgs/py-pkgs-cookiecutter).
