| Field | Value |
| --- | --- |
| Name | cldflex |
| Version | 0.1.1 |
| home_page | https://fl.mt/cldflex |
| Summary | Convert FLEx data to CLDF-ready CSV. |
| upload_time | 2023-11-07 00:56:22 |
| author | Florian Matter |
| requires_python | >=3.8.1,<4.0.0 |
| license | Apache-2.0 |
# cldflex
Convert FLEx data to CLDF-ready CSV.
[![Versions](https://img.shields.io/pypi/pyversions/cldflex)](https://www.python.org/)
[![PyPI](https://img.shields.io/pypi/v/cldflex.svg)](https://pypi.org/project/cldflex)
[![License](https://img.shields.io/github/license/fmatter/cldflex)](https://www.apache.org/licenses/LICENSE-2.0)
Many descriptive linguists have annotated language data in a FLEx ([SIL's FieldWorks Language Explorer](https://software.sil.org/fieldworks/)) database, which provides perhaps the most popular and accessible assisted segmentation and annotation workflow.
However, a reasonably complete data export is only available in XML, which is not human-friendly and is not readily converted to other data formats.
A data format growing in popularity is the [CLDF standard](https://cldf.clld.org/), a table-based approach with human-readable datasets, designed to be used in [CLLD](https://clld.org/) apps and easily processable by any software that can read [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) files, including [R](https://www.r-project.org/), [pandas](https://pandas.pydata.org/) or spreadsheet applications.
The goal of ``cldflex`` is to convert lexicon and corpus data stored in FLEx to CSV tables, primarily for use in CLDF datasets.
## Installation
`cldflex` is available on [PyPI](https://pypi.org/project/cldflex):
```shell
pip install cldflex
```
## Command line usage
At the moment, there are three commands: ``cldflex corpus`` for `.flextext` files; ``cldflex dictionary`` and `cldflex wordlist` for `.lift` files.
All commands create a number of CSV files.
One can either use [cldfbench](https://github.com/cldf/cldfbench) to create one's own CLDF datasets from these files, or add the `--cldf` argument to create a simple CLDF dataset.
Project-specific [configuration](#configuration) can be passed with `--conf your/config.yaml` or by creating a file named `cldflex.yaml`.
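
Because the output is plain CSV, it can be inspected with nothing more than the Python standard library. The following is a minimal sketch, not a description of actual `cldflex` output: the column names (`ID`, `Form`, `Gloss`) and the sample rows are invented for illustration, and the helper `read_rows` is hypothetical. It assumes the default cell separator `"; "` described under [Configuration](#configuration).

```python
import csv
import io

# Hypothetical stand-in for a CSV file written by cldflex; the column
# names and rows are made up for illustration.
SAMPLE = """ID,Form,Gloss
m1,ka; ga,go
m2,ti,water; river
"""


def read_rows(handle, multi_valued=("Form", "Gloss"), separator="; "):
    """Read CSV rows and split the given columns into lists of values."""
    rows = []
    for row in csv.DictReader(handle):
        for column in multi_valued:
            row[column] = row[column].split(separator)
        rows.append(row)
    return rows


rows = read_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row["ID"], row["Form"], row["Gloss"])
```

For a real file, replace `io.StringIO(SAMPLE)` with `open(path, newline="", encoding="utf-8")` and adjust `multi_valued` to the columns that actually hold several values.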
### `corpus`
Basic usage:
```shell
cldflex corpus texts.flextext
```
Connect the corpus with the lexicon:
```shell
cldflex corpus texts.flextext --lexicon lexicon.lift
```
Create a CLDF dataset:
```shell
cldflex corpus texts.flextext --lexicon lexicon.lift --cldf
```
### `dictionary`
Extract morphemes, morphs, and entries from `lexicon.lift`:
```shell
cldflex dictionary lexicon.lift
```
Create a CLDF dataset with a [`Dictionary`](https://github.com/cldf/cldf/tree/master/modules/Dictionary) module:
```shell
cldflex dictionary lexicon.lift --cldf
```
### `wordlist`
Create a CLDF dataset with a [`Wordlist`](https://github.com/cldf/cldf/tree/master/modules/Wordlist) module:
```shell
cldflex wordlist lexicon.lift --cldf
```
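
When several files need converting, the commands above can be assembled from a script. The sketch below is one possible approach under stated assumptions: `build_command` and `run` are hypothetical helpers, not part of `cldflex`, and they use only the subcommands and flags shown in this section (`--lexicon`, `--cldf`). Since `--lexicon` is only demonstrated for `corpus` here, the helper conservatively rejects it for the other commands.

```python
import subprocess


def build_command(command, source, lexicon=None, cldf=False):
    """Assemble an argument list for one of the cldflex subcommands."""
    if command not in ("corpus", "dictionary", "wordlist"):
        raise ValueError(f"unknown cldflex command: {command}")
    args = ["cldflex", command, str(source)]
    if lexicon is not None:
        # Assumption: only the corpus command is shown with --lexicon.
        if command != "corpus":
            raise ValueError("--lexicon is only shown for the corpus command")
        args += ["--lexicon", str(lexicon)]
    if cldf:
        args.append("--cldf")
    return args


def run(args):
    """Run the assembled command; requires cldflex to be installed."""
    return subprocess.run(args, check=True)


print(build_command("corpus", "texts.flextext", lexicon="lexicon.lift", cldf=True))
```

Calling `run(build_command("wordlist", "lexicon.lift", cldf=True))` would then execute the same command as the shell example above.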
## API usage
The functions corresponding to the commands above are [`cldflex.corpus.convert()`](https://github.com/fmatter/cldflex/blob/4d9962ff53baab68a20ecce34f8623e87f7197ec/src/cldflex/corpus.py#L445) and [`cldflex.lift2csv.convert()`](https://github.com/fmatter/cldflex/blob/4d9962ff53baab68a20ecce34f8623e87f7197ec/src/cldflex/lift2csv.py#L130).
## Configuration
There is no default configuration.
Rather, `cldflex` will guess values for most of the parameters below and tell you what it's doing.
It is suggested to start out configuration-free until something goes wrong or you want to change something.
For CLI usage, create a [YAML](https://yaml.org/) file; for API usage, pass a dict to the `convert` functions.
* `obj_lg`: the object language
* `gloss_lg`: the language used for glossing / translation
* `msa_lg`: the language used for storing POS information
* `lang_id`: the value to be used in the created tables
* `glottocode`: used to look up language metadata from Glottolog
* `csv_cell_separator`: if there are multiple values in a cell (allomorphs, polysemy...), they are by default separated by `"; "`
* `form_slices`: set to `false` if you don't want form slices connecting morphs and word forms
* `mappings`: a dictionary specifying name changes of columns in the created CSV files
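
For API usage, a configuration dict might look like the sketch below. All values are placeholders chosen for illustration, and the direction of the `mappings` entries (old column name to new column name) is an assumption. The helper `unknown_keys` is hypothetical and not part of `cldflex`; it merely checks a dict against the option names listed above to catch typos. Whether `cldflex` itself reports unknown keys is not stated here.

```python
# Option names documented in the list above.
KNOWN_KEYS = {
    "obj_lg", "gloss_lg", "msa_lg", "lang_id", "glottocode",
    "csv_cell_separator", "form_slices", "mappings",
}


def unknown_keys(config):
    """Return the keys in config that are not documented options, sorted."""
    return sorted(set(config) - KNOWN_KEYS)


# All values below are placeholders, not recommendations.
config = {
    "obj_lg": "xyz",
    "gloss_lg": "en",
    "lang_id": "mylang",
    "csv_cell_separator": "; ",
    "form_slices": False,
    # Assumed direction: existing column name -> new column name.
    "mappings": {"Gloss": "Meaning"},
}

print(unknown_keys(config))
print(unknown_keys({"glotocode": "abcd1234"}))
```

The first call prints an empty list; the second reports the misspelled key `glotocode`.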