cldfzenodo

Name	cldfzenodo
Version	2.1.2
home_page	https://github.com/cldf/cldfzenodo
Summary	Functionality to retrieve CLDF datasets deposited on Zenodo
upload_time	2024-10-15 12:42:04
maintainer	None
docs_url	None
author	Robert Forkel
requires_python	>=3.8
license	Apache 2.0
keywords	linguistics

# cldfzenodo

[![Build Status](https://github.com/cldf/cldfzenodo/workflows/tests/badge.svg)](https://github.com/cldf/cldfzenodo/actions?query=workflow%3Atests)
[![PyPI](https://img.shields.io/pypi/v/cldfzenodo.svg)](https://pypi.org/project/cldfzenodo)

`cldfzenodo` provides programmatic access to CLDF data deposited on [Zenodo](https://zenodo.org).

**NOTE:** The Zenodo upgrade of October 13, 2023 introduced many changes across the system.
As a result, versions of `cldfzenodo` before 2.0 no longer work. `cldfzenodo` is meant to be
backwards compatible, i.e. it provides the same Python API as `cldfzenodo` 1.x, but may issue
deprecation warnings.
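
While porting code written against the 1.x API, it can help to see any deprecation warnings explicitly rather than relying on the default warning filters. The following is a generic, stdlib-only sketch; `call_and_collect_warnings` is a hypothetical helper, not part of `cldfzenodo`, and it makes no assumption about which calls actually warn.

```python
import warnings
from typing import Any, Callable, List, Tuple


def call_and_collect_warnings(
    func: Callable[..., Any], *args: Any, **kwargs: Any
) -> Tuple[Any, List[str]]:
    """Call func and return its result together with all warnings it raised."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" ensures repeated warnings are not hidden by the default filters.
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    # Report each warning as "<Category>: <message>".
    messages = [f"{w.category.__name__}: {w.message}" for w in caught]
    return result, messages
```

For example, `call_and_collect_warnings(API.get_record, doi='10.5281/zenodo.4762034')` would return the record together with any warning messages issued during the call.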


## Install

```shell
pip install cldfzenodo
```
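
Because versions before 2.0 no longer work with Zenodo, you may want to confirm which version is installed. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the major-version parsing is deliberately simple (for full version comparison, a library such as `packaging` is more robust).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(version_string: str) -> int:
    """Return the major component of a version string such as '2.1.2'."""
    return int(version_string.split(".")[0])


def installed_cldfzenodo_is_supported() -> Optional[bool]:
    """True for cldfzenodo >= 2.0, False for an older release, None if not installed."""
    try:
        installed = version("cldfzenodo")
    except PackageNotFoundError:
        return None
    return major_version(installed) >= 2
```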


## `pycldf` dataset resolver

Upon installation, `cldfzenodo` registers a [`pycldf` dataset resolver](https://pycldf.readthedocs.io/en/latest/ext_discovery.html)
for dataset locators of the form `https://doi.org/10.5281/zenodo.[0-9]+` and `https://zenodo.org/record/[0-9]+`.
Thus, after installation you should be able to retrieve `pycldf.Dataset` instances by running:

```python
>>> from pycldf.ext.discovery import get_dataset
>>> import pathlib
>>> pathlib.Path('wacl').mkdir(exist_ok=True)
>>> ds = get_dataset('https://doi.org/10.5281/zenodo.7322688', pathlib.Path('wacl'))
>>> ds.properties['dc:title']
'World Atlas of Classifier Languages'
```
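
To make the two accepted locator forms concrete, here is a sketch that checks whether a string matches one of them and extracts the Zenodo record number. It is an illustration only, not `cldfzenodo`'s actual resolver code; `zenodo_record_id` is a hypothetical helper, and the real resolver may accept variants (e.g. trailing slashes) that this strict `fullmatch` rejects.

```python
import re
from typing import Optional

# The two locator forms listed above; "[0-9]+" is the Zenodo record number.
ZENODO_LOCATOR_PATTERNS = [
    re.compile(r"https://doi\.org/10\.5281/zenodo\.(?P<id>[0-9]+)"),
    re.compile(r"https://zenodo\.org/record/(?P<id>[0-9]+)"),
]


def zenodo_record_id(locator: str) -> Optional[str]:
    """Return the record number if the locator has one of the two forms, else None."""
    for pattern in ZENODO_LOCATOR_PATTERNS:
        match = pattern.fullmatch(locator)
        if match:
            return match.group("id")
    return None
```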


## CLI

`cldfzenodo` provides a subcommand to be run from [cldfbench](https://github.com/cldf/cldfbench).
To make use of this command, you have to install `cldfbench`, which can be done via
```shell
pip install "cldfzenodo[cli]"
```
You can then download CLDF datasets from Zenodo, identifying them by DOI. For example,
```shell
cldfbench zenodo.download 10.5281/zenodo.4683137 --directory wals-2020.1/
```
downloads WALS Online as a CLDF dataset into `wals-2020.1`:
```shell
$ tree wals-2020.1/
wals-2020.1/
├── areas.csv
├── chapters.csv
├── codes.csv
├── contributors.csv
├── countries.csv
├── examples.csv
├── language_names.csv
├── languages.csv
├── parameters.csv
├── sources.bib
├── StructureDataset-metadata.json
└── values.csv

0 directories, 12 files
```
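
If you want to script such downloads from Python, one option is to call the same command via `subprocess`. This sketch assumes `cldfbench` (installed via `cldfzenodo[cli]`) is on the `PATH`; both function names are hypothetical.

```python
import subprocess
from typing import List


def zenodo_download_command(doi: str, directory: str) -> List[str]:
    """Build the cldfbench command line shown above for a DOI and target directory."""
    return ["cldfbench", "zenodo.download", doi, "--directory", directory]


def run_zenodo_download(doi: str, directory: str) -> None:
    """Run the download.

    Raises FileNotFoundError if cldfbench is not on the PATH and
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    subprocess.run(zenodo_download_command(doi, directory), check=True)
```

For example, `run_zenodo_download('10.5281/zenodo.4683137', 'wals-2020.1/')` runs the command shown above.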


## API

Metadata and data of (potential) CLDF datasets deposited on Zenodo are accessed via `cldfzenodo.Record`
objects. Such objects can be obtained in various ways:
- Via DOI:
  ```python
  >>> from cldfzenodo import API
  >>> rec = API.get_record(doi='10.5281/zenodo.4762034')
  >>> rec.title
  'glottolog/glottolog: Glottolog database 4.4 as CLDF'
  ```
- Via [concept DOI](https://help.zenodo.org/#versioning) and version tag:
  ```python
  >>> from cldfzenodo import API
  >>> rec = API.get_record(conceptdoi='10.5281/zenodo.3260727', version='4.5')
  >>> rec.title
  'glottolog/glottolog: Glottolog database 4.5 as CLDF'
  ```
- From deposits grouped into a Zenodo community:
  ```python
  >>> from cldfzenodo import API
  >>> for rec in API.iter_records(community='dictionaria'):
  ...     print(rec.title)
  ...     break
  ...     
  dictionaria/iquito: Iquito dictionary
  ```
- From search results using keywords:
  ```python
  >>> from cldfzenodo import API
  >>> for rec in API.iter_records(keyword='cldf:Wordlist'):
  ...     print(rec.title)
  ...     break
  ...     
  CLDF dataset accompanying Zariquiey et al.'s "Evolution of Body-Part Terminology in Pano" from 2022
  ```
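
The `print`/`break` loops above stop after the first record. To collect the first few results instead, `itertools.islice` works on any iterable. This is a sketch; `first_titles` is a hypothetical helper that only relies on each record having a `title` attribute, as the examples above show.

```python
from itertools import islice
from typing import Iterable, List


def first_titles(records: Iterable, n: int) -> List[str]:
    """Return the titles of at most the first n records, consuming no more than n items."""
    return [rec.title for rec in islice(records, n)]
```

For example, `first_titles(API.iter_records(community='dictionaria'), 5)` returns up to five titles from the Dictionaria community.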

`cldfzenodo.Record` objects provide sufficient metadata to allow identification and data access:
```python
>>> from cldfzenodo import API
>>> print(API.get_record(doi='10.5281/zenodo.4762034').bibtex)
@misc{zenodo-4762034,
  author    = {Hammarström, Harald and Forkel, Robert and Haspelmath, Martin and Bank, Sebastian},
  title     = {glottolog/glottolog: Glottolog database 4.4 as CLDF},
  keywords  = {cldf:StructureDataset, linguistics},
  publisher = {Zenodo},
  year      = {2021},
  doi       = {10.5281/zenodo.4762034},
  url       = {https://doi.org/10.5281/zenodo.4762034},
  copyright = {Creative Commons Attribution 4.0}
}
```
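
When citing several datasets, the `bibtex` strings of multiple records can be combined into one `.bib` file. This is a minimal sketch; `write_bibliography` is a hypothetical helper, and it writes UTF-8 explicitly because author names such as "Hammarström" contain non-ASCII characters.

```python
from pathlib import Path
from typing import Iterable, Union


def write_bibliography(records: Iterable, path: Union[str, Path]) -> int:
    """Write the BibTeX entries of all records to one .bib file; return the entry count."""
    entries = [rec.bibtex.strip() for rec in records]
    # Separate entries by a blank line and end the file with a newline.
    Path(path).write_text("\n\n".join(entries) + "\n", encoding="utf-8")
    return len(entries)
```

For example, `write_bibliography([API.get_record(doi='10.5281/zenodo.4762034')], 'refs.bib')` writes the entry shown above to `refs.bib`.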

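You can download the full deposit and then access the (possibly multiple) CLDF datasets it contains:
```python
from pycldf import iter_datasets

from cldfzenodo import API

# Download all files of the deposit into my_directory ...
API.get_record(doi='...').download('my_directory')
# ... and iterate over the CLDF datasets found in it.
for cldf in iter_datasets('my_directory'):
    print(cldf)
```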
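
To see which CLDF modules a downloaded deposit contains without loading the datasets, you can look for JSON files whose `dc:conformsTo` property points into the CLDF ontology (e.g. `http://cldf.clld.org/v1.0/terms.rdf#StructureDataset`). This is a simplified, stdlib-only sketch with a hypothetical helper name; for actually working with the data, `pycldf.iter_datasets` as above is the tool to use.

```python
import json
from pathlib import Path
from typing import List, Tuple, Union

# CLDF metadata declares its module via dc:conformsTo, e.g. "...terms.rdf#StructureDataset".
CLDF_TERMS = "http://cldf.clld.org/v1.0/terms.rdf#"


def find_cldf_metadata(directory: Union[str, Path]) -> List[Tuple[Path, str]]:
    """Return (metadata path, CLDF module) pairs for CLDF metadata files below directory."""
    found = []
    for path in sorted(Path(directory).rglob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, UnicodeDecodeError, ValueError):
            continue  # unreadable file or invalid JSON
        conforms_to = data.get("dc:conformsTo") if isinstance(data, dict) else None
        if isinstance(conforms_to, str) and conforms_to.startswith(CLDF_TERMS):
            found.append((path, conforms_to[len(CLDF_TERMS):]))
    return found
```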

Often, however, only the "pure" CLDF data is of interest, not the additional metadata and curation
context, e.g. of [cldfbench](https://github.com/cldf/cldfbench)-curated datasets. In this case, use
`download_dataset`:
```python
from cldfzenodo import API

cldf = API.get_record(doi='...').download_dataset('my_directory')
```

Raw data

{
    "_id": null,
    "home_page": "https://github.com/cldf/cldfzenodo",
    "name": "cldfzenodo",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "linguistics",
    "author": "Robert Forkel",
    "author_email": "dlce.rdm@eva.mpg.de",
    "download_url": "https://files.pythonhosted.org/packages/2c/dc/6207da20bbec31fee2bd0d72e7f9c039f1345ad991fd55fc1cf418f02914/cldfzenodo-2.1.2.tar.gz",
    "platform": "any",
    "bugtrack_url": null,
    "license": "Apache 2.0",
    "summary": "Functionality to retrieve CLDF datasets deposited on Zenodo",
    "version": "2.1.2",
    "project_urls": {
        "Bug Tracker": "https://github.com/cldf/cldfzenodo/issues",
        "Homepage": "https://github.com/cldf/cldfzenodo"
    },
    "split_keywords": [
        "linguistics"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "855c7f9325ebfc3c817cbf5a5842b5502975ae5815eab19ee38496f0a1611e2f",
                "md5": "9a8440c77310c584d18522a5a4cbcae2",
                "sha256": "fd2f13130522af7b9347529dee47b6dc8b3b03c5c62746557b992f4bf257e04f"
            },
            "downloads": -1,
            "filename": "cldfzenodo-2.1.2-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9a8440c77310c584d18522a5a4cbcae2",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": ">=3.8",
            "size": 16899,
            "upload_time": "2024-10-15T12:42:03",
            "upload_time_iso_8601": "2024-10-15T12:42:03.108052Z",
            "url": "https://files.pythonhosted.org/packages/85/5c/7f9325ebfc3c817cbf5a5842b5502975ae5815eab19ee38496f0a1611e2f/cldfzenodo-2.1.2-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2cdc6207da20bbec31fee2bd0d72e7f9c039f1345ad991fd55fc1cf418f02914",
                "md5": "c561c06a8b346b9f116cddb53fb85381",
                "sha256": "479fdb8728a28b70fabd4d9be0d7436c34fc4d11a2afb4b68b5242c26faf2596"
            },
            "downloads": -1,
            "filename": "cldfzenodo-2.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "c561c06a8b346b9f116cddb53fb85381",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 20562,
            "upload_time": "2024-10-15T12:42:04",
            "upload_time_iso_8601": "2024-10-15T12:42:04.645313Z",
            "url": "https://files.pythonhosted.org/packages/2c/dc/6207da20bbec31fee2bd0d72e7f9c039f1345ad991fd55fc1cf418f02914/cldfzenodo-2.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-15 12:42:04",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "cldf",
    "github_project": "cldfzenodo",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "cldfzenodo"
}
