cltoolkit

Name: cltoolkit
Version: 0.2.0
Home page: https://github.com/cldf/cltoolkit
Summary: A Python Library for the Processing of Cross-Linguistic Data
Upload time: 2024-11-28 13:40:28
Author: Johann-Mattis List, Robert Forkel and Frederic Blum
Requires Python: >=3.8
License: MIT
Keywords: linguistics
# CL ToolKit

[![Build Status](https://github.com/cldf/cltoolkit/workflows/tests/badge.svg)](https://github.com/cldf/cltoolkit/actions?query=workflow%3Atests)
[![Documentation Status](https://readthedocs.org/projects/cltoolkit/badge/?version=latest)](https://cltoolkit.readthedocs.io/en/latest/?badge=latest)
[![PyPI](https://img.shields.io/pypi/v/cltoolkit.svg)](https://pypi.org/project/cltoolkit)

A Python Library for the Processing of Cross-Linguistic Data.

By Johann-Mattis List and Robert Forkel.

## Overview

While [pycldf](https://github.com/cldf/pycldf) provides a basic Python API to access cross-linguistic data 
encoded in [CLDF](https://cldf.clld.org) datasets,
`cltoolkit` goes one step further, turning the data into full-fledged Python objects rather than
shallow proxies for rows in a CSV file. Of course, as with `pycldf`'s ORM package, there is a trade-off
involved: you gain convenient access and a more Pythonic API at the expense of performance (in particular
memory footprint, but also data load time) and write access. Still, most of today's CLDF datasets (or aggregations
of these) can be processed with `cltoolkit` on reasonable hardware in minutes rather than hours.

The main idea behind `cltoolkit` is to make (aggregated) CLDF data easily amenable to the computation
of *linguistic features* in a general sense (e.g. typological features). This is done by
- providing the data for processing code [as Python objects](https://cltoolkit.readthedocs.io/en/latest/models.html),
- providing [a framework](https://cltoolkit.readthedocs.io/en/latest/features.html) that makes feature computation 
  as simple as writing a Python function acting on a `cltoolkit.models.Language` object.
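In this framework, a feature is just a function that receives a language object and returns a value. The sketch below illustrates the idea with a hypothetical stand-in class; the real `Language` model (with many more attributes) lives in `cltoolkit.models`.

```python
from dataclasses import dataclass, field

@dataclass
class Language:
    """Hypothetical stand-in for cltoolkit.models.Language -- just enough
    attributes to illustrate the "feature = plain Python function" idea."""
    id: str
    forms: list = field(default_factory=list)

def number_of_forms(language):
    """A minimal feature: the number of word forms attested for a language."""
    return len(language.forms)

lg = Language(id="stan1293", forms=["hand", "foot", "water"])
print(number_of_forms(lg))  # → 3
```

Because features are ordinary functions, they can be tested in isolation and applied uniformly to every language in an aggregation.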

In general, aggregated CLDF Wordlists provide limited (automated) comparability across datasets (e.g. one could
compare the number of words per language in each dataset). A lot more can be done when datasets use CLDF reference
properties to link to reference catalogs, i.e.
- [link language varieties](https://cldf.clld.org/v1.0/terms.rdf#glottocode) to [Glottolog](https://glottolog.org) languoids,
- [link senses](https://cldf.clld.org/v1.0/terms.rdf#concepticonReference) to [Concepticon concept sets](https://concepticon.clld.org/parameters),
- [link sound segments](https://cldf.clld.org/v1.0/terms.rdf#cltsReference) to [CLTS sounds](https://clts.clld.org/parameters).

`cltoolkit` objects exploit this extended comparability by distinguishing "senses" from "concepts" and "graphemes"
from "sounds", and by providing convenient access to comparable subsets of objects in an aggregation
(see [models.py](src/cltoolkit/models.py)).
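The sense/concept distinction can be illustrated with a toy sketch (all data and field names below are hypothetical, not the actual `cltoolkit` data model): every form carries a dataset-local sense gloss, but only senses linked to a Concepticon concept set are comparable across datasets.

```python
# Toy sketch: "senses" are dataset-local glosses, while "concepts" are the
# subset of senses linked to Concepticon concept sets, and hence
# comparable across datasets.
forms = [
    {"form": "mano", "sense": "hand/arm",    "concepticon_id": "1277"},
    {"form": "pie",  "sense": "foot",        "concepticon_id": "1301"},
    {"form": "zape", "sense": "ritual shoe", "concepticon_id": None},
]

senses = {f["sense"] for f in forms}                 # every gloss counts
concepts = {f["concepticon_id"] for f in forms
            if f["concepticon_id"] is not None}      # comparable subset only

print(len(senses), len(concepts))  # → 3 2
```

The same logic carries over to graphemes vs. sounds: dataset-specific transcription symbols become cross-linguistically comparable only once they are linked to CLTS.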

See [example.md](example.md) for a walk-through of the typical workflow with `cltoolkit`.

            
