cldflex


* Name: cldflex
* Version: 0.1.1
* Home page: https://fl.mt/cldflex
* Summary: Convert FLEx data to CLDF-ready CSV.
* Upload time: 2023-11-07 00:56:22
* Author: Florian Matter
* Requires Python: >=3.8.1,<4.0.0
* License: Apache-2.0

# cldflex

Convert FLEx data to CLDF-ready CSV.

[![Versions](https://img.shields.io/pypi/pyversions/cldflex)](https://www.python.org/)
[![PyPI](https://img.shields.io/pypi/v/cldflex.svg)](https://pypi.org/project/cldflex)
[![License](https://img.shields.io/github/license/fmatter/cldflex)](https://www.apache.org/licenses/LICENSE-2.0)


Many descriptive linguists have annotated language data in a FLEx ([SIL's FieldWorks Language Explorer](https://software.sil.org/fieldworks/)) database; FLEx provides perhaps the most popular and accessible assisted segmentation and annotation workflow.
However, a reasonably complete data export is only available in XML, which is not human-friendly and is not readily converted to other data formats.
A data format growing in popularity is the [CLDF standard](https://cldf.clld.org/), a table-based approach with human-readable datasets, designed to be used in [CLLD](https://clld.org/) apps and easily processable by any software that can read [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) files, including [R](https://www.r-project.org/), [pandas](https://pandas.pydata.org/), and spreadsheet applications.
The goal of ``cldflex`` is to convert lexicon and corpus data stored in FLEx to CSV tables, primarily for use in CLDF datasets.

## Installation

`cldflex` is available on [PyPI](https://pypi.org/project/cldflex):
```shell
pip install cldflex
```

## Command line usage
At the moment, there are three commands: ``cldflex corpus`` for `.flextext` files; ``cldflex dictionary`` and `cldflex wordlist` for `.lift` files.
All commands create a number of CSV files.
One can either use [cldfbench](https://github.com/cldf/cldfbench) to create one's own CLDF datasets from these files, or add the `--cldf` argument to create a simple CLDF dataset.
Project-specific [configuration](#configuration) can be passed with `--conf your/config.yaml` or by creating a file named `cldflex.yaml`.
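
Since every command writes several CSV files, it can help to get a quick overview of what was produced. The following is a minimal sketch using only the Python standard library; the function name `summarize_csv_files` is hypothetical and not part of `cldflex`, and the directory to inspect is an assumption, since the output location is not specified here.

```python
import csv
from pathlib import Path


def summarize_csv_files(directory):
    """Return {file name: (column names, row count)} for each CSV file in `directory`."""
    summary = {}
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            rows = list(reader)  # reading the rows also populates reader.fieldnames
            summary[path.name] = (reader.fieldnames or [], len(rows))
    return summary
```

Calling `summarize_csv_files(".")` after running a command in the current directory would list each table with its columns and number of rows, assuming the files were written there.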

### `corpus`
Basic usage:

```shell
cldflex corpus texts.flextext
```

Connect the corpus with the lexicon:

```shell
cldflex corpus texts.flextext --lexicon lexicon.lift
```

Create a CLDF dataset:

```shell
cldflex corpus texts.flextext --lexicon lexicon.lift --cldf
```

### `dictionary`

Extract morphemes, morphs, and entries from `lexicon.lift`:

```shell
cldflex dictionary lexicon.lift
```

Create a CLDF dataset with a [`Dictionary`](https://github.com/cldf/cldf/tree/master/modules/Dictionary) module:

```shell
cldflex dictionary lexicon.lift --cldf
```

### `wordlist`

Create a CLDF dataset with a [`Wordlist`](https://github.com/cldf/cldf/tree/master/modules/Wordlist) module:

```shell
cldflex wordlist lexicon.lift --cldf
```
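
The command-line calls in this section can also be assembled from a Python script, for example when converting several projects in a row. The sketch below only uses the commands and options shown above (`--lexicon`, `--conf`, `--cldf`); the helper names `build_command` and `run_cldflex` are hypothetical, and it is an assumption that `--conf` can be appended after the input file in the same way as the other options.

```python
import subprocess


def build_command(command, input_file, lexicon=None, conf=None, cldf=False):
    """Assemble an argument list for one of the cldflex commands shown above."""
    if command not in {"corpus", "dictionary", "wordlist"}:
        raise ValueError(f"unknown cldflex command: {command}")
    args = ["cldflex", command, str(input_file)]
    if lexicon is not None:
        # The examples above only use --lexicon together with `corpus`.
        if command != "corpus":
            raise ValueError("--lexicon is only shown for the corpus command")
        args += ["--lexicon", str(lexicon)]
    if conf is not None:
        args += ["--conf", str(conf)]
    if cldf:
        args.append("--cldf")
    return args


def run_cldflex(*args, **kwargs):
    """Run the assembled command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(*args, **kwargs), check=True)
```

For instance, `build_command("corpus", "texts.flextext", lexicon="lexicon.lift", cldf=True)` reproduces the last `corpus` example as an argument list, which `run_cldflex` would then execute.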

## API usage
The functions corresponding to the commands above are [`cldflex.corpus.convert()`](https://github.com/fmatter/cldflex/blob/4d9962ff53baab68a20ecce34f8623e87f7197ec/src/cldflex/corpus.py#L445) and [`cldflex.lift2csv.convert()`](https://github.com/fmatter/cldflex/blob/4d9962ff53baab68a20ecce34f8623e87f7197ec/src/cldflex/lift2csv.py#L130).

## Configuration
There is no default configuration.
Rather, `cldflex` will guess values for most of the parameters below and tell you what it's doing.
It is suggested to start out configuration-free until something goes wrong or you want to change something.
For CLI usage, create a [YAML](https://yaml.org/) file; for API usage, pass a dict to the `convert` functions.

* `obj_lg`: the object language
* `gloss_lg`: the language used for glossing / translation
* `msa_lg`: the language used for storing POS information
* `lang_id`: the language ID to be used in the created tables
* `glottocode`: used to look up language metadata from Glottolog
* `csv_cell_separator`: if there are multiple values in a cell (allomorphs, polysemy...), they are by default separated by `"; "`
* `form_slices`: set to `false` if you don't want form slices connecting morphs and word forms
* `mappings`: a dictionary specifying name changes of columns in the created CSV files
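
As an illustration of the dict form, the sketch below collects the parameters listed above in a Python dictionary. All values are placeholders chosen for illustration, not defaults of `cldflex`; the direction of the `mappings` entry (old column name to new column name) is an assumption, and the helpers `check_config` and `split_cell` are hypothetical and not part of the package.

```python
# Parameter names taken from the list above.
KNOWN_KEYS = {
    "obj_lg", "gloss_lg", "msa_lg", "lang_id", "glottocode",
    "csv_cell_separator", "form_slices", "mappings",
}


def check_config(conf):
    """Return the keys in `conf` that are not among the documented parameters."""
    return sorted(set(conf) - KNOWN_KEYS)


# Placeholder values only; replace them with the codes used in your own project.
conf = {
    "obj_lg": "xyz",
    "gloss_lg": "en",
    "msa_lg": "en",
    "lang_id": "mylang",
    "glottocode": "abcd1234",        # placeholder, not a real Glottocode
    "csv_cell_separator": "; ",
    "form_slices": False,
    "mappings": {"Form": "Morph"},   # assumed direction: old name -> new name
}


def split_cell(cell, separator=conf["csv_cell_separator"]):
    """Split a multi-value CSV cell into its individual values."""
    return cell.split(separator) if cell else []
```

Here `check_config` is only a guard against misspelled keys before the dict is handed on, and `split_cell` shows how a cell holding several allomorphs could be taken apart again with the configured separator.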
            
