# pyglottolog
Programmatic access to [Glottolog data](https://github.com/glottolog/glottolog).
[![Build Status](https://github.com/glottolog/pyglottolog/workflows/tests/badge.svg?branch=master)](https://github.com/glottolog/pyglottolog/actions?query=workflow%3Atests+branch%3Amaster)
[![Documentation Status](https://readthedocs.org/projects/pyglottolog/badge/?version=latest)](https://pyglottolog.readthedocs.io/en/latest/?badge=latest)
[![PyPI](https://img.shields.io/pypi/v/pyglottolog.svg)](https://pypi.org/project/pyglottolog)
> [!NOTE]
> Accessing Glottolog data programmatically has become a lot easier with the
> [Glottolog CLDF dataset](https://github.com/glottolog/glottolog-cldf). Thus, `pyglottolog` now
> mostly serves as an internal data curation tool.
## Install
To install `pyglottolog` you need a Python installation on your system, version 3.8 or later. Run
```shell script
pip install pyglottolog
```
This will also install the command line interface `glottolog`.
**Note:** To make use of `pyglottolog` you also need a local copy of the
[Glottolog data](https://github.com/glottolog/glottolog). This can be
- a clone of the [glottolog/glottolog](https://github.com/glottolog/glottolog) repository or your fork of it,
- an unzipped [released version of Glottolog](https://github.com/glottolog/glottolog/releases) from GitHub,
- or an unzipped download of a [released version of Glottolog](https://doi.org/10.5281/zenodo.596479) from ZENODO.
Make sure you remember where this local copy of the data is located - you may
have to pass this location as an option when using `pyglottolog`.
A convenient way to clone the data repository, keep it updated and access it
from `pyglottolog` is provided
by [`cldfbench`](https://pypi.org/project/cldfbench). See the [`README`](https://github.com/cldf/cldfbench#catalogs) for details.
## Python API
Using `pyglottolog`, Glottolog data can be accessed programmatically from within Python programs.
All functionality is mediated through an instance of `pyglottolog.Glottolog`, e.g.
```python
>>> from pyglottolog import Glottolog
>>> glottolog = Glottolog('.')
>>> print(glottolog)
<Glottolog repos v0.2-259-g27ac0ef at /.../glottolog>
```
For details, refer to the [API documentation at readthedocs](https://pyglottolog.readthedocs.io/en/latest/index.html).
## Command line interface
Command line functionality is implemented via sub-commands of `glottolog`. The list of
available sub-commands can be inspected by running
```shell script
$ glottolog -h
usage: glottolog [-h] [--log-level LOG_LEVEL] [--repos REPOS]
                 [--repos-version REPOS_VERSION]
                 COMMAND ...

optional arguments:
  -h, --help            show this help message and exit
  --log-level LOG_LEVEL
                        log level [ERROR|WARN|INFO|DEBUG] (default: 20)
  --repos REPOS         clone of glottolog/glottolog
  --repos-version REPOS_VERSION
                        version of repository data. Requires a git clone!
                        (default: None)

available commands:
  Run "COMAMND -h" to get help for a specific command.

  COMMAND
    create              Create a new languoid directory for a languoid
                        specified by name and level.
    edit                Open a languoid's INI file in a text editor.
    htmlmap             Create an HTML/Javascript map (using leaflet) of
                        Glottolog languoids.
    iso2codes           Map ISO codes to the list of all Glottolog languages
                        and dialects subsumed "under" it.
    langdatastats       List all metadata fields used in languoid INI files
                        and their frequency.
    langsearch          Search Glottolog languoids.
    languoids           Write languoids data to csv files
    refsearch           Search Glottolog references
    searchindex         Index
    show                Display details of a Glottolog object.
    tree                Print the classification tree starting at a specific
                        languoid.
```
### Extracting languoid data
Glottolog data is often integrated with other data or incorporated as reference
data in tools, e.g. as [LanguageTable](https://github.com/cldf/cldf/tree/master/components/languages)
in a [CLDF](https://cldf.clld.org) dataset.
To do this, the LanguageTable from [glottolog/glottolog-cldf](https://github.com/glottolog/glottolog-cldf)
could be copied, or one may use `glottolog`'s `languoids` subcommand, which
dumps basic languoid data into a CSVW file with accompanying metadata:
```shell script
glottolog languoids [--output=OUTDIR] [--version=VERSION]
```
This will create a CSVW package, i.e.
- a CSV table `glottolog-languoids-VERSION.csv`
- and a JSON description `glottolog-languoids-VERSION.csv-metadata.json`
where `VERSION` is the result of running `git describe` on the data repository,
or the version string passed as `--version=VERSION` in case you are running the command
on an export of the repository or a download from ZENODO.
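Once exported, the CSV table can be read with any CSV-aware tool. The following is a minimal sketch using only Python's standard library. `read_languoids` is a hypothetical helper, not part of `pyglottolog`, and the column name `level` is an assumption about the export that should be checked against the JSON description.

```python
import csv


def read_languoids(csv_path, level=None, level_column="level"):
    """Return the rows of the exported table as dicts, optionally filtered by level.

    `level_column` is an assumed column name; consult the accompanying
    `...csv-metadata.json` file for the columns actually present.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    if level is None:
        return rows
    # Keep only rows whose level column matches, e.g. "language".
    return [row for row in rows if row.get(level_column) == level]
```

A call such as `read_languoids("glottolog-languoids-VERSION.csv", level="language")` would then return only the rows marked as languages, provided the column exists under that name.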
### Languoid search
To allow convenient search across all languoid info files, `pyglottolog` comes with functionality
to create and search a [Whoosh](https://whoosh.readthedocs.io/en/latest/intro.html) index. To do
so, run
```shell script
glottolog searchindex
```
This will take a while (about 15 minutes on a reasonably powerful laptop with an SSD) and build an index of
about 800 MB at `build/`.
Now you can search the index, e.g. using alternative names as query:
```shell
$ glottolog langsearch "Abipónok"
1 matches
Abipon [abip1241] language
languoids/tree/guai1249/guai1250/abip1241/md.ini
Abipónok [hu]
1 matches
```
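The path printed for each hit reflects the languoid's position in the classification: each directory below `languoids/tree` is named after a Glottocode, from the top-level family down to the languoid itself. A minimal sketch for recovering that lineage from a result line follows; `lineage_from_path` is a hypothetical helper and assumes paths of exactly the shape shown above.

```python
from pathlib import PurePosixPath


def lineage_from_path(ini_path):
    """Return the Glottocodes on the path, ordered from root to languoid."""
    parts = PurePosixPath(ini_path).parts
    # Keep the directories between the "tree" directory and the INI file.
    start = parts.index("tree") + 1
    return list(parts[start:-1])


print(lineage_from_path("languoids/tree/guai1249/guai1250/abip1241/md.ini"))
# ['guai1249', 'guai1250', 'abip1241']
```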
But you can also exploit the schema defined in
[pyglottolog.fts.get_langs_index](https://github.com/glottolog/pyglottolog/blob/c382b849b5245acba78d8022aadd4de83e73e909/src/pyglottolog/fts.py#L41-L52);
i.e. use fields in [your query](https://whoosh.readthedocs.io/en/latest/querylang.html):
```shell
$ glottolog langsearch "country:PG"
...
Alamblak [alam1246] language
languoids/tree/sepi1257/sepi1258/east2496/alam1246/md.ini
Papua New Guinea (PG)
906 matches
$ glottolog --repos=. langsearch "iso:mal"
...
Malayalam [mala1464] language
languoids/tree/drav1251/sout3133/sout3138/tami1291/tami1292/tami1293/tami1294/tami1297/tami1298/mala1541/mala1464/md.ini
1 matches
```
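When the data repository is not in the current directory, its location is passed with `--repos`, as in the second call above. If searches are driven from a script, the command line can be assembled as sketched below. `langsearch_command` and `run_langsearch` are hypothetical helpers, not part of `pyglottolog`, and running the search assumes that the `glottolog` command is installed and the index has been built.

```python
import subprocess


def langsearch_command(query, repos="."):
    """Build the argument list for `glottolog --repos=REPOS langsearch QUERY`."""
    return ["glottolog", f"--repos={repos}", "langsearch", query]


def run_langsearch(query, repos="."):
    """Run the search and return whatever the command prints to stdout."""
    result = subprocess.run(
        langsearch_command(query, repos),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command fails
    )
    return result.stdout
```

Passing the query as a single list element avoids shell quoting issues with queries such as `country:PG`.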
### Reference search
The same can be done for reference data. To create a Whoosh index with all reference data, run
```shell script
glottolog searchindex
```
Now you can query the index (using the fields described in
[the schema](https://github.com/glottolog/pyglottolog/blob/c382b849b5245acba78d8022aadd4de83e73e909/src/pyglottolog/fts.py#L118-L128)):
```shell
$ glottolog refsearch "author:Haspelmath AND title:Atlas"
...
(13 matches)
```
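Field queries like the one above follow Whoosh's default query language, in which `field:term` clauses can be joined with `AND`. The following small sketch composes such strings; `field_query` is a hypothetical helper, and the field names passed to it must exist in the schema linked above.

```python
def field_query(**fields):
    """Join `field:value` clauses with AND; multi-word values become phrases."""
    clauses = []
    for field, value in fields.items():
        if " " in value:
            # Whoosh treats double-quoted text as a phrase.
            value = f'"{value}"'
        clauses.append(f"{field}:{value}")
    return " AND ".join(clauses)


print(field_query(author="Haspelmath", title="Atlas"))
# author:Haspelmath AND title:Atlas
```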