# cldfzenodo
[![Build Status](https://github.com/cldf/cldfzenodo/workflows/tests/badge.svg)](https://github.com/cldf/cldfzenodo/actions?query=workflow%3Atests)
[![PyPI](https://img.shields.io/pypi/v/cldfzenodo.svg)](https://pypi.org/project/cldfzenodo)
`cldfzenodo` provides programmatic access to CLDF data deposited on [Zenodo](https://zenodo.org).
**NOTE:** The Zenodo upgrade of October 13, 2023 introduced quite a few changes in various parts
of the system. As a result, `cldfzenodo` versions before 2.0 can no longer be used. From version 2.0 on,
`cldfzenodo` is meant to be backwards compatible, i.e. it provides the same Python API as `cldfzenodo` 1.x,
but may issue deprecation warnings.
## Install
```shell
pip install cldfzenodo
```
## `pycldf` dataset resolver
`cldfzenodo` registers (upon installation) a [`pycldf` dataset resolver](https://pycldf.readthedocs.io/en/latest/ext_discovery.html)
for dataset locators of the form `https://doi.org/10.5281/zenodo.[0-9]+` and `https://zenodo.org/record/[0-9]+`.
Thus, after installation you should be able to retrieve `pycldf.Dataset` instances by running
```python
>>> from pycldf.ext.discovery import get_dataset
>>> import pathlib
>>> pathlib.Path('wacl').mkdir()
>>> ds = get_dataset('https://doi.org/10.5281/zenodo.7322688', pathlib.Path('wacl'))
>>> ds.properties['dc:title']
'World Atlas of Classifier Languages'
```
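The two locator forms above can be checked with plain regular expressions before handing a string to `get_dataset`. The following is a minimal sketch: the patterns are copied from the text, while the helper name `locator_number` is hypothetical and the code is not taken from the resolver's implementation.

```python
import re

# Locator forms quoted from the text above; the compiled patterns are an
# illustration only, not the resolver's actual implementation.
DOI_LOCATOR = re.compile(r"https://doi\.org/10\.5281/zenodo\.([0-9]+)")
RECORD_LOCATOR = re.compile(r"https://zenodo\.org/record/([0-9]+)")


def locator_number(locator):
    """Return the numeric part of a matching locator, or None (hypothetical helper)."""
    for pattern in (DOI_LOCATOR, RECORD_LOCATOR):
        match = pattern.fullmatch(locator)
        if match:
            return match.group(1)
    return None


print(locator_number("https://doi.org/10.5281/zenodo.7322688"))  # 7322688
print(locator_number("https://example.org/dataset/1"))  # None
```

A check like this only tells you whether a string has one of the documented shapes; whether the deposit actually contains CLDF data is still decided when the dataset is retrieved.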
## CLI
`cldfzenodo` provides a subcommand to be run from [cldfbench](https://github.com/cldf/cldfbench).
To make use of this command, you have to install `cldfbench`, which can be done via
```shell
pip install cldfzenodo[cli]
```
Then you can download CLDF datasets from Zenodo, using the DOI for identification. E.g.
```shell
cldfbench zenodo.download 10.5281/zenodo.4683137 --directory wals-2020.1/
```
will download WALS Online as a CLDF dataset into `wals-2020.1`:
```shell
$ tree wals-2020.1/
wals-2020.1/
├── areas.csv
├── chapters.csv
├── codes.csv
├── contributors.csv
├── countries.csv
├── examples.csv
├── language_names.csv
├── languages.csv
├── parameters.csv
├── sources.bib
├── StructureDataset-metadata.json
└── values.csv
0 directories, 12 files
```
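When several datasets need to be fetched from a script, the same command line can be assembled in Python. This is a rough sketch assuming only the invocation shown above; `download_command` and `run_download` are hypothetical helper names, and actually running the download requires `cldfzenodo[cli]` and network access.

```python
import subprocess


def download_command(doi, directory):
    """Build the argument list for `cldfbench zenodo.download` (hypothetical helper)."""
    return ["cldfbench", "zenodo.download", doi, "--directory", str(directory)]


def run_download(doi, directory):
    """Run the download; raises CalledProcessError if cldfbench exits non-zero."""
    subprocess.run(download_command(doi, directory), check=True)


# Only the command is printed here, so the sketch runs without network access.
print(" ".join(download_command("10.5281/zenodo.4683137", "wals-2020.1/")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with directory names.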
## API
Metadata and data of (potential) CLDF datasets deposited on Zenodo are accessed via `cldfzenodo.Record`
objects. Such objects can be obtained in various ways:
- Via DOI:
```python
>>> from cldfzenodo import API
>>> rec = API.get_record(doi='10.5281/zenodo.4762034')
>>> rec.title
'glottolog/glottolog: Glottolog database 4.4 as CLDF'
```
- Via [concept DOI](https://help.zenodo.org/#versioning) and version tag:
```python
>>> from cldfzenodo import API
>>> rec = API.get_record(conceptdoi='10.5281/zenodo.3260727', version='4.5')
>>> rec.title
'glottolog/glottolog: Glottolog database 4.5 as CLDF'
```
- From deposits grouped into a Zenodo community:
```python
>>> from cldfzenodo import API
>>> for rec in API.iter_records(community='dictionaria'):
...     print(rec.title)
...     break
...
dictionaria/iquito: Iquito dictionary
```
- From search results using keywords:
```python
>>> from cldfzenodo import API
>>> for rec in API.iter_records(keyword='cldf:Wordlist'):
...     print(rec.title)
...     break
...
CLDF dataset accompanying Zariquiey et al.'s "Evolution of Body-Part Terminology in Pano" from 2022
```
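The loops above stop after the first record with `break`. To look at the first few records instead, `itertools.islice` is a common alternative. The sketch below uses a hypothetical `StandInRecord` class so that it runs offline; with the real library, the iterable would be something like `API.iter_records(community='dictionaria')`, and the sketch assumes that iterator yields records lazily.

```python
from dataclasses import dataclass
from itertools import islice


@dataclass
class StandInRecord:
    """Hypothetical stand-in exposing only a `title`, so the sketch runs offline."""
    title: str


def first_titles(records, n):
    """Return the titles of at most the first `n` records of an iterable."""
    return [rec.title for rec in islice(records, n)]


# A generator of stand-in records takes the place of API.iter_records(...).
stand_ins = (StandInRecord(f"dataset {i}") for i in range(100))
print(first_titles(stand_ins, 3))  # ['dataset 0', 'dataset 1', 'dataset 2']
```

Because `islice` stops pulling items once `n` have been produced, no more records are consumed from the iterator than are needed.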
`cldfzenodo.Record` objects provide sufficient metadata to allow identification and data access:
```python
>>> from cldfzenodo import API
>>> print(API.get_record(doi='10.5281/zenodo.4762034').bibtex)
@misc{zenodo-4762034,
author = {Hammarström, Harald and Forkel, Robert and Haspelmath, Martin and Bank, Sebastian},
title = {glottolog/glottolog: Glottolog database 4.4 as CLDF},
keywords = {cldf:StructureDataset, linguistics},
publisher = {Zenodo},
year = {2021},
doi = {10.5281/zenodo.4762034},
url = {https://doi.org/10.5281/zenodo.4762034},
copyright = {Creative Commons Attribution 4.0}
}
```
One can download the full deposit (and access the possibly multiple CLDF datasets it contains):
```python
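from cldfzenodo import API
from pycldf import iter_datasets

# Download the full deposit, then iterate over every CLDF dataset found in it.
API.get_record(doi='...').download('my_directory')
for cldf in iter_datasets('my_directory'):
    pass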
```
Often, though, only the "pure" CLDF data is of interest, and not the additional metadata and curation
context, e.g. of [cldfbench](https://github.com/cldf/cldfbench)-curated datasets. Just the dataset can
be downloaded via
```python
from cldfzenodo import API
cldf = API.get_record(doi='...').download_dataset('my_directory')
```