- Name: pyigt
- Version: 2.1.0
- Home page: https://github.com/cldf/pyigt
- Summary: A Python library for handling inter-linear-glossed text.
- Upload time: 2023-11-28 11:40:21
- Author: Johann-Mattis List and Robert Forkel
- Requires Python: >=3.8
- License: GPL
- Keywords: Chinese linguistics, historical linguistics, computer-assisted language comparison

# pyigt: Handling interlinear glossed text with Python

[![Build Status](https://github.com/cldf/pyigt/workflows/tests/badge.svg)](https://github.com/cldf/pyigt/actions?query=workflow%3Atests)
[![PyPI](https://img.shields.io/pypi/v/pyigt.svg)](https://pypi.org/project/pyigt)
[![Documentation Status](https://readthedocs.org/projects/pyigt/badge/?version=latest)](https://pyigt.readthedocs.io/en/latest/?badge=latest)

This library provides easy access to **I**nterlinear **G**lossed **T**ext (IGT) according
to the [Leipzig Glossing Rules](https://www.eva.mpg.de/lingua/resources/glossing-rules.php), stored as 
[CLDF examples](https://github.com/cldf/cldf/tree/master/components/examples).


## Installation

Installing `pyigt` via pip

```shell
pip install pyigt
```
will install the Python package along with a [command line interface `igt`](#cli).

Note: The methods `Corpus.get_wordlist` and `Corpus.get_profile`, to extract a wordlist and an orthography profile
from a corpus, require the `lingpy` package. To make sure it is installed, install `pyigt` as
```shell
pip install "pyigt[lingpy]"
```
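
If you are unsure whether the optional dependency is present, a quick check from Python can help. The following is a minimal sketch using only the standard library; the helper name `has_module` is made up for this example and is not part of `pyigt`.

```python
from importlib.util import find_spec


def has_module(name):
    """Return True if a top-level module with this name can be found."""
    return find_spec(name) is not None


if not has_module("lingpy"):
    # Only the two methods named in the note above depend on lingpy.
    print("lingpy not found: Corpus.get_wordlist and Corpus.get_profile need it")
```

The check only looks the module up and does not import it, so it is cheap to run before calling either method.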

## CLI

```shell script
$ igt -h
usage: igt [-h] [--log-level LOG_LEVEL] COMMAND ...

optional arguments:
  -h, --help            show this help message and exit
  --log-level LOG_LEVEL
                        log level [ERROR|WARN|INFO|DEBUG] (default: 20)

available commands:
  Run "COMAMND -h" to get help for a specific command.

  COMMAND
    ls                  List IGTs in a CLDF dataset
    stats               Describe the IGTs in a CLDF dataset

```

The `igt ls` command lets you inspect IGTs from the command line. Each example is formatted using the
four standard lines described in the Leipzig Glossing Rules, with analyzed text and
glosses aligned, e.g.
```shell script
$ igt ls tests/fixtures/examples.csv 
Example 1:
zəple: ȵike: peji qeʴlotʂuʁɑ,
zəp-le:       ȵi-ke:       pe-ji       qeʴlotʂu-ʁɑ,
earth-DEF:CL  WH-INDEF:CL  become-CSM  in.the.past-LOC

...

Example 5:
zuɑməɸu oʐgutɑ ipiχuɑȵi,
zuɑmə-ɸu      o-ʐgu-tɑ    i-pi-χuɑ-ȵi,
cypress-tree  one-CL-LOC  DIR-hide-because-ADV

IGT corpus at tests/fixtures/examples.csv
```
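
The alignment of analyzed words and glosses seen above can be approximated in a few lines of Python. This is a sketch of the general idea, not the code `igt ls` uses: the function name `align` and the two-space gap are assumptions, and `len()` counts code points, so wide or combining characters may shift columns in a terminal.

```python
def align(words, glosses, gap=2):
    """Pad each word/gloss pair to a common width so both lines line up."""
    widths = [max(len(w), len(g)) for w, g in zip(words, glosses)]
    sep = " " * gap
    top = sep.join(w.ljust(n) for w, n in zip(words, widths)).rstrip()
    bottom = sep.join(g.ljust(n) for g, n in zip(glosses, widths)).rstrip()
    return top, bottom


# Words and glosses taken from Example 5 above.
top, bottom = align(
    "zuɑmə-ɸu o-ʐgu-tɑ i-pi-χuɑ-ȵi,".split(),
    "cypress-tree one-CL-LOC DIR-hide-because-ADV".split(),
)
print(top)
print(bottom)
```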

`igt ls` can be chained with other command-line tools, such as the commands from the
[csvkit](https://csvkit.readthedocs.io/en/latest/) package, to filter examples:
```shell script
$ csvgrep -c Primary_Text -m"ȵi"  tests/fixtures/examples.csv | csvgrep -c Gloss -m"ADV" |  igt ls -
Example 5:
zuɑməɸu oʐgutɑ ipiχuɑȵi,
zuɑmə-ɸu      o-ʐgu-tɑ    i-pi-χuɑ-ȵi,
cypress-tree  one-CL-LOC  DIR-hide-because-ADV

```
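
The same kind of filter can be written with Python's standard `csv` module when csvkit is not available. The sketch below assumes only the column names `Primary_Text` and `Gloss` used in the command above; the helper `filter_examples` is hypothetical, and the two in-memory rows are abbreviated stand-ins modeled on the output shown earlier, not the contents of the real file.

```python
import csv
import io


def filter_examples(rows, text_contains, gloss_contains):
    """Keep rows whose Primary_Text and Gloss contain the given substrings."""
    return [
        row for row in rows
        if text_contains in row["Primary_Text"] and gloss_contains in row["Gloss"]
    ]


# Abbreviated stand-in for an examples table (illustration only).
sample = io.StringIO(
    "ID,Primary_Text,Gloss\n"
    "1,zəple: ȵike:,earth-DEF:CL WH-INDEF:CL\n"
    "5,ipiχuɑȵi,DIR-hide-because-ADV\n"
)
rows = list(csv.DictReader(sample))
matches = filter_examples(rows, "ȵi", "ADV")
print([row["ID"] for row in matches])
```

Both conditions must hold for a row to be kept, which mirrors piping one `csvgrep` call into another.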


## Python API

The Python API is documented in detail at [readthedocs](https://pyigt.readthedocs.io/en/latest/).
Below is a quick overview.

You can read all IGT examples provided with a CLDF dataset:

```python
>>> from pyigt import Corpus
>>> corpus = Corpus.from_path('tests/fixtures/cldf-metadata.json')
>>> len(corpus)
5
>>> for igt in corpus:
...     print(igt)
...     break
... 
zəple: ȵike: peji qeʴlotʂuʁɑ,
zəp-le:       ȵi-ke:       pe-ji       qeʴlotʂu-ʁɑ,
earth-DEF:CL  WH-INDEF:CL  become-CSM  in.the.past-LOC
```

You can also instantiate individual IGT examples, e.g. to check for validity:
```python
>>> from pyigt import IGT
>>> ex = IGT(phrase="palasi=lu", gloss="priest-and")
>>> ex.check(strict=True, verbose=True)
palasi=lu
priest-and
...
ValueError: Rule 2 violated: Number of morphemes does not match number of morpheme glosses!
```
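
In the example above, the phrase uses the clitic boundary `=` where the gloss uses a hyphen. As a rough illustration of the idea behind Rule 2 of the Leipzig Glossing Rules, and not of how `pyigt` implements `check`, the sketch below compares the boundary markers of each word with those of its gloss. The helper name `boundary_mismatches` and the restriction to `-` and `=` are assumptions made for this example.

```python
import re

# Hyphen marks a morpheme boundary, equals sign a clitic boundary.
BOUNDARY = re.compile(r"[-=]")


def boundary_mismatches(phrase, gloss):
    """Return (word, gloss) pairs whose sequences of boundary markers differ."""
    words, glosses = phrase.split(), gloss.split()
    if len(words) != len(glosses):
        raise ValueError("number of words and number of word glosses differ")
    return [
        (w, g) for w, g in zip(words, glosses)
        if BOUNDARY.findall(w) != BOUNDARY.findall(g)
    ]


print(boundary_mismatches("palasi=lu", "priest-and"))
```

An empty list means every word and its gloss use the same boundary markers in the same order.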
Known gloss abbreviations can be expanded as well:
```python
>>> ex = IGT(phrase="Gila abur-u-n ferma hamišaluǧ güǧüna amuq’-da-č.",
...          gloss="now they-OBL-GEN farm forever behind stay-FUT-NEG", 
...          translation="Now their farm will not stay behind forever.")
>>> ex.pprint()
Gila aburun ferma hamišaluǧ güǧüna amuq’dač.
Gila    abur-u-n      ferma    hamišaluǧ    güǧüna    amuq’-da-č.
now     they-OBL-GEN  farm     forever      behind    stay-FUT-NEG
‘Now their farm will not stay behind forever.’
  OBL = oblique
  GEN = genitive
  FUT = future
  NEG = negation, negative
```
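
The lookup behind such a legend can be pictured as a small table of abbreviations applied to the gloss line. The following sketch is an assumption about the general approach, not `pyigt`'s code: the table holds only the four expansions shown in the output above, and the function name `used_abbreviations` is made up for this example.

```python
import re

# Only the four expansions shown above; a full table would cover many more.
ABBREVIATIONS = {
    "OBL": "oblique",
    "GEN": "genitive",
    "FUT": "future",
    "NEG": "negation, negative",
}


def used_abbreviations(gloss, table=ABBREVIATIONS):
    """Return (abbreviation, expansion) pairs in order of first appearance."""
    found = []
    # Split on whitespace and on the separators that join gloss elements.
    for element in re.split(r"[\s\-=.:]+", gloss):
        if element in table and element not in found:
            found.append(element)
    return [(abbr, table[abbr]) for abbr in found]


for abbr, meaning in used_abbreviations(
    "now they-OBL-GEN farm forever behind stay-FUT-NEG"
):
    print(f"  {abbr} = {meaning}")
```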

And you can go deeper, parsing morphemes and glosses according to the LGR 
(see module [pyigt.lgrmorphemes](src/pyigt/lgrmorphemes.py)):

```python
>>> igt = IGT(phrase="zəp-le: ȵi-ke: pe-ji qeʴlotʂu-ʁɑ,", gloss="earth-DEF:CL WH-INDEF:CL become-CSM in.the.past-LOC")
>>> igt.conformance
<LGRConformance.MORPHEME_ALIGNED: 2>
>>> igt[1, 1].gloss
<Morpheme "INDEF:CL">
>>> igt[1, 1].gloss.elements
[<GlossElement "INDEF">, <GlossElementAfterColon "CL">]
>>> igt[1, 1].morpheme
<Morpheme "ke:">
>>> print(igt[1, 1].morpheme)
ke:
```
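
To see what the index pair in `igt[1, 1]` refers to, it can help to redo the lookup by hand with plain string operations. This is a simplified sketch that assumes words are separated by whitespace, morphemes by hyphens, and gloss elements by colons; the real parser in `pyigt.lgrmorphemes` handles more of the LGR, and the function `lookup` is hypothetical.

```python
def lookup(phrase, gloss, word, morpheme):
    """Return (morpheme, morpheme gloss, gloss elements) at the given indices."""
    form = phrase.split()[word].split("-")[morpheme]
    meaning = gloss.split()[word].split("-")[morpheme]
    return form, meaning, meaning.split(":")


form, meaning, elements = lookup(
    "zəp-le: ȵi-ke: pe-ji qeʴlotʂu-ʁɑ,",
    "earth-DEF:CL WH-INDEF:CL become-CSM in.the.past-LOC",
    1,
    1,
)
print(form, meaning, elements)
```

Both indices are zero-based: the first selects the word, the second the morpheme within that word.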


## See also

- [interlineaR](https://cran.r-project.org/web/packages/interlineaR/index.html) - an R package with similar functionality, but with support for more input formats.



            
