# CL ToolKit
[![Build Status](https://github.com/cldf/cltoolkit/workflows/tests/badge.svg)](https://github.com/cldf/cltoolkit/actions?query=workflow%3Atests)
[![Documentation Status](https://readthedocs.org/projects/cltoolkit/badge/?version=latest)](https://cltoolkit.readthedocs.io/en/latest/?badge=latest)
[![PyPI](https://img.shields.io/pypi/v/cltoolkit.svg)](https://pypi.org/project/cltoolkit)
A Python Library for the Processing of Cross-Linguistic Data.
By Johann-Mattis List and Robert Forkel.
## Overview
While [pycldf](https://github.com/cldf/pycldf) provides a basic Python API to access cross-linguistic data
encoded in [CLDF](https://cldf.clld.org) datasets,
`cltoolkit` goes one step further, turning the data into full-fledged Python objects rather than
shallow proxies for rows in a CSV file. As with `pycldf`'s ORM package, there is a trade-off
involved: convenient access and a more Pythonic API come at the expense of performance (in particular
memory footprint, but also data load time) and write access. Still, most of today's CLDF datasets (or
aggregations of these) can be processed with `cltoolkit` on reasonable hardware in minutes rather than hours.
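
For example, aggregating one or more CLDF datasets into a `cltoolkit` wordlist looks roughly like the
following sketch. The metadata path is a hypothetical placeholder, and the exact `Wordlist` signature and
collection names should be checked against the API documentation:

```python
# Minimal sketch: aggregate CLDF datasets into a cltoolkit Wordlist.
# "path/to/cldf-metadata.json" is a placeholder for a real CLDF dataset.
from pycldf import Dataset
from cltoolkit import Wordlist

wl = Wordlist(datasets=[
    Dataset.from_metadata("path/to/cldf-metadata.json"),
])
print(len(wl.languages), "languages,", len(wl.forms), "forms")
```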
The main idea behind `cltoolkit` is to make (aggregated) CLDF data easily amenable to the computation
of *linguistic features* in a general sense (e.g. typological features). This is done by
- providing the data for processing code [as Python objects](https://cltoolkit.readthedocs.io/en/latest/models.html),
- providing [a framework](https://cltoolkit.readthedocs.io/en/latest/features.html) that makes feature computation
  as simple as writing a Python function acting on a `cltoolkit.models.Language` object (see the sketch after this list).
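
A feature, in this sense, can be as small as the following sketch. Attribute names such as
`language.forms` and `form.sounds` are assumptions based on the models documentation, not verified API:

```python
# Sketch of a cltoolkit feature: a plain function acting on a Language object.
# `language.forms` and `form.sounds` are assumed attribute names; check
# cltoolkit.models for the exact API.
def mean_word_length(language):
    """Average number of sound segments per form in a language."""
    lengths = [len(form.sounds) for form in language.forms if form.sounds]
    return sum(lengths) / len(lengths) if lengths else None
```

Such a function can then be applied to every `Language` object in an aggregated wordlist, yielding one
feature value per language.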
In general, aggregated CLDF Wordlists provide limited (automated) comparability across datasets (e.g. one could
compare the number of words per language in each dataset). A lot more can be done when datasets use CLDF reference
properties to link to reference catalogs, i.e.
- [link language varieties](https://cldf.clld.org/v1.0/terms.rdf#glottocode) to [Glottolog](https://glottolog.org) languoids,
- [link senses](https://cldf.clld.org/v1.0/terms.rdf#concepticonReference) to [Concepticon concept sets](https://concepticon.clld.org/parameters),
- [link sound segments](https://cldf.clld.org/v1.0/terms.rdf#cltsReference) to [CLTS sounds](https://clts.clld.org/parameters).
`cltoolkit` objects exploit this extended comparability by distinguishing "senses" from "concepts" and
"graphemes" from "sounds", and by providing convenient access to comparable subsets of objects in an aggregation
(see [models.py](src/cltoolkit/models.py)).
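
In code, this distinction might surface as in the sketch below, continuing from the `wl` wordlist created
above. The collection names are assumptions based on the models overview:

```python
# Sketch: dataset-specific vs. catalog-linked collections on a Wordlist.
# "senses" are dataset-specific glosses; "concepts" are those linked to
# Concepticon. Likewise "graphemes" vs. CLTS-linked "sounds".
# Collection names are assumed, not verified against the API.
print(len(wl.senses), "senses,", len(wl.concepts), "Concepticon-linked concepts")
print(len(wl.graphemes), "graphemes,", len(wl.sounds), "CLTS-linked sounds")
```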
See [example.md](example.md) for a walk-through of the typical workflow with `cltoolkit`.