# pyigt: Handling interlinear glossed text with Python
[![Build Status](https://github.com/cldf/pyigt/workflows/tests/badge.svg)](https://github.com/cldf/pyigt/actions?query=workflow%3Atests)
[![PyPI](https://img.shields.io/pypi/v/pyigt.svg)](https://pypi.org/project/pyigt)
[![Documentation Status](https://readthedocs.org/projects/pyigt/badge/?version=latest)](https://pyigt.readthedocs.io/en/latest/?badge=latest)

This library provides easy access to **I**nterlinear **G**lossed **T**ext (IGT) according
to the [Leipzig Glossing Rules](https://www.eva.mpg.de/lingua/resources/glossing-rules.php), stored as
[CLDF examples](https://github.com/cldf/cldf/tree/master/components/examples).
## Installation
Installing `pyigt` via pip
```shell
pip install pyigt
```
will install the Python package along with a [command line interface `igt`](#cli).

Note: The methods `Corpus.get_wordlist` and `Corpus.get_profile`, which extract a wordlist and an orthography profile
from a corpus, require the `lingpy` package. To install it along with `pyigt`, use the `lingpy` extra:
```shell
pip install pyigt[lingpy]
```
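Code that calls `Corpus.get_wordlist` or `Corpus.get_profile` may want to fail early, with an install hint, when `lingpy` is missing. A minimal standard-library sketch; the helper `require_optional` is hypothetical and not part of `pyigt`:

```python
import importlib.util


def require_optional(module: str, extra: str = "lingpy") -> None:
    """Raise ImportError with an install hint if `module` is not installed.

    Hypothetical helper for illustration; not part of pyigt.
    """
    # find_spec returns None for a top-level module that cannot be found
    if importlib.util.find_spec(module) is None:
        raise ImportError(
            f"{module!r} is required for this feature; "
            f"install it via: pip install pyigt[{extra}]"
        )
```

Calling `require_optional("lingpy")` before using the wordlist or profile methods turns a late failure into an immediate, actionable error.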
## CLI
```shell script
$ igt -h
usage: igt [-h] [--log-level LOG_LEVEL] COMMAND ...

optional arguments:
  -h, --help            show this help message and exit
  --log-level LOG_LEVEL
                        log level [ERROR|WARN|INFO|DEBUG] (default: 20)

available commands:
  Run "COMAMND -h" to get help for a specific command.

  COMMAND
    ls                  List IGTs in a CLDF dataset
    stats               Describe the IGTs in a CLDF dataset
```
The `igt ls` command lets you inspect IGTs on the command line, formatted using the
four standard lines described in the Leipzig Glossing Rules, with analyzed text and
glosses aligned. For example:
```shell script
$ igt ls tests/fixtures/examples.csv
Example 1:
zəple: ȵike: peji qeʴlotʂuʁɑ,
zəp-le: ȵi-ke: pe-ji qeʴlotʂu-ʁɑ,
earth-DEF:CL WH-INDEF:CL become-CSM in.the.past-LOC

...

Example 5:
zuɑməɸu oʐgutɑ ipiχuɑȵi,
zuɑmə-ɸu o-ʐgu-tɑ i-pi-χuɑ-ȵi,
cypress-tree one-CL-LOC DIR-hide-because-ADV

IGT corpus at tests/fixtures/examples.csv
```
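The alignment in this output amounts to padding each analyzed word and its gloss to a common column width. A simplified sketch of that idea, not `pyigt`'s own formatter; the function name `align` is hypothetical, and it assumes one display column per code point, which does not hold for combining diacritics or wide characters:

```python
def align(analyzed: str, gloss: str) -> str:
    """Return the analyzed line and the gloss line with word columns aligned."""
    words, glosses = analyzed.split(), gloss.split()
    if len(words) != len(glosses):
        raise ValueError("number of words and glosses differs")
    # Each column is as wide as the longer of the word and its gloss
    widths = [max(len(w), len(g)) for w, g in zip(words, glosses)]
    top = " ".join(w.ljust(n) for w, n in zip(words, widths))
    bottom = " ".join(g.ljust(n) for g, n in zip(glosses, widths))
    # Drop the padding after the last column
    return top.rstrip() + "\n" + bottom.rstrip()


print(align("zəp-le: ȵi-ke: pe-ji qeʴlotʂu-ʁɑ,",
            "earth-DEF:CL WH-INDEF:CL become-CSM in.the.past-LOC"))
```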
Because `igt ls` reads from standard input when given `-` as path, it can be chained with other command-line tools,
such as the commands from the [csvkit](https://csvkit.readthedocs.io/en/latest/) package, for filtering:
```shell script
$ csvgrep -c Primary_Text -m"ȵi" tests/fixtures/examples.csv | csvgrep -c Gloss -m"ADV" | igt ls -
Example 5:
zuɑməɸu oʐgutɑ ipiχuɑȵi,
zuɑmə-ɸu o-ʐgu-tɑ i-pi-χuɑ-ȵi,
cypress-tree one-CL-LOC DIR-hide-because-ADV
```
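If csvkit is not available, the same substring filtering can be sketched with Python's standard `csv` module, writing the matching rows to standard output so they can still be piped into `igt ls -`. This is a minimal sketch, not part of `pyigt`; the function names are hypothetical, and the column names `Primary_Text` and `Gloss` are taken from the `csvgrep` call above:

```python
import csv
import sys
from typing import Dict, Iterable, List, TextIO


def filter_examples(rows: Iterable[Dict[str, str]], **patterns: str) -> List[Dict[str, str]]:
    """Keep rows whose columns contain all given substrings (like chained `csvgrep -m`)."""
    return [
        row for row in rows
        if all(pattern in (row.get(column) or "") for column, pattern in patterns.items())
    ]


def main(path: str, out: TextIO = sys.stdout) -> None:
    """Write the header plus all rows matching both patterns as CSV to `out`."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = filter_examples(reader, Primary_Text="ȵi", Gloss="ADV")
        writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Saved as, say, `filter_examples.py` (a hypothetical name), it could be used as `python filter_examples.py tests/fixtures/examples.csv | igt ls -`.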
## Python API
The Python API is documented in detail at [readthedocs](https://pyigt.readthedocs.io/en/latest/).
Below is a quick overview.
You can read all IGT examples provided with a CLDF dataset:
```python
>>> from pyigt import Corpus
>>> corpus = Corpus.from_path('tests/fixtures/cldf-metadata.json')
>>> len(corpus)
5
>>> for igt in corpus:
...     print(igt)
...     break
...
zəple: ȵike: peji qeʴlotʂuʁɑ,
zəp-le: ȵi-ke: pe-ji qeʴlotʂu-ʁɑ,
earth-DEF:CL WH-INDEF:CL become-CSM in.the.past-LOC
```
or instantiate individual IGT examples, e.g. to check for validity:
```python
>>> from pyigt import IGT
>>> ex = IGT(phrase="palasi=lu", gloss="priest-and")
>>> ex.check(strict=True, verbose=True)
palasi=lu
priest-and
...
ValueError: Rule 2 violated: Number of morphemes does not match number of morpheme glosses!
```
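LGR Rule 2 requires morpheme boundaries to correspond one-to-one between the object-language line and the gloss line, with hyphens marking affix boundaries and equals signs marking clitic boundaries. The following simplified sketch compares the sequence of boundary markers word by word, which is why `palasi=lu` / `priest-and` fails. It is not `pyigt`'s implementation of `IGT.check`; the function names are hypothetical:

```python
import re
from typing import List

# Morpheme boundary markers: hyphen for affixes, equals sign for clitics
BOUNDARY = re.compile(r"[-=]")


def boundary_markers(word: str) -> List[str]:
    """Return a word's boundary markers in order, e.g. 'abur-u-n' -> ['-', '-']."""
    return BOUNDARY.findall(word)


def boundaries_match(phrase: str, gloss: str) -> bool:
    """Check that words and morpheme boundaries in phrase and gloss correspond."""
    words, glosses = phrase.split(), gloss.split()
    if len(words) != len(glosses):
        return False
    return all(boundary_markers(w) == boundary_markers(g) for w, g in zip(words, glosses))


print(boundaries_match("palasi=lu", "priest-and"))  # False: '=' in the phrase vs '-' in the gloss
print(boundaries_match("palasi=lu", "priest=and"))  # True
```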
or to expand known gloss abbreviations:
```python
>>> ex = IGT(phrase="Gila abur-u-n ferma hamišaluǧ güǧüna amuq’-da-č.",
... gloss="now they-OBL-GEN farm forever behind stay-FUT-NEG",
... translation="Now their farm will not stay behind forever.")
>>> ex.pprint()
Gila aburun ferma hamišaluǧ güǧüna amuq’dač.
Gila abur-u-n ferma hamišaluǧ güǧüna amuq’-da-č.
now they-OBL-GEN farm forever behind stay-FUT-NEG
‘Now their farm will not stay behind forever.’
OBL = oblique
GEN = genitive
FUT = future
NEG = negation, negative
```
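Expanding abbreviations boils down to splitting the gloss line into its elements and looking each one up in a table of known abbreviations. A minimal sketch of that idea; the table below is a hypothetical excerpt holding only the four abbreviations from this example, while `pyigt` relies on its own list of known abbreviations:

```python
import re
from typing import Dict, List, Tuple

# Hypothetical excerpt of LGR standard abbreviations (not pyigt's table)
ABBREVIATIONS: Dict[str, str] = {
    "FUT": "future",
    "GEN": "genitive",
    "NEG": "negation, negative",
    "OBL": "oblique",
}


def known_abbreviations(gloss: str) -> List[Tuple[str, str]]:
    """Return (abbreviation, expansion) pairs in order of first occurrence."""
    seen: Dict[str, str] = {}
    # Split on whitespace, morpheme boundaries (- =) and the separators . and :
    for element in re.split(r"[\s\-=.:]+", gloss):
        if element in ABBREVIATIONS and element not in seen:
            seen[element] = ABBREVIATIONS[element]
    return list(seen.items())


for abbr, expansion in known_abbreviations("now they-OBL-GEN farm forever behind stay-FUT-NEG"):
    print(f"{abbr} = {expansion}")
```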
You can also go deeper and parse morphemes and glosses according to the LGR
(see module [pyigt.lgrmorphemes](src/pyigt/lgrmorphemes.py)):
```python
>>> igt = IGT(phrase="zəp-le: ȵi-ke: pe-ji qeʴlotʂu-ʁɑ,", gloss="earth-DEF:CL WH-INDEF:CL become-CSM in.the.past-LOC")
>>> igt.conformance
<LGRConformance.MORPHEME_ALIGNED: 2>
>>> igt[1, 1].gloss
<Morpheme "INDEF:CL">
>>> igt[1, 1].gloss.elements
[<GlossElement "INDEF">, <GlossElementAfterColon "CL">]
>>> igt[1, 1].morpheme
<Morpheme "ke:">
>>> print(igt[1, 1].morpheme)
ke:
```
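In the session above, `igt[1, 1]` addresses word 1 and morpheme 1, counting from zero, and `gloss.elements` splits a gloss into its parts. A stdlib-only sketch of that lookup, with hypothetical function names; it handles only the `-`/`=` boundaries and the `.`/`:` separators, and returns plain strings instead of `pyigt`'s `Morpheme` and `GlossElement` objects:

```python
import re
from typing import List, Tuple


def _morphemes(word: str) -> List[str]:
    """Split a word into morphemes at affix (-) and clitic (=) boundaries."""
    return re.split(r"[-=]", word)


def morpheme_at(phrase: str, gloss: str, word: int, morpheme: int) -> Tuple[str, str]:
    """Return the (morpheme, gloss) pair at a zero-based (word, morpheme) position."""
    return (_morphemes(phrase.split()[word])[morpheme],
            _morphemes(gloss.split()[word])[morpheme])


def gloss_elements(gloss: str) -> List[Tuple[str, str]]:
    """Split a morpheme gloss into (preceding separator, element) pairs."""
    parts = re.split(r"([.:])", gloss)  # keeps separators: ['INDEF', ':', 'CL']
    pairs = [("", parts[0])]  # the first element has no preceding separator
    pairs.extend(zip(parts[1::2], parts[2::2]))
    return pairs


m, g = morpheme_at("zəp-le: ȵi-ke: pe-ji qeʴlotʂu-ʁɑ,",
                   "earth-DEF:CL WH-INDEF:CL become-CSM in.the.past-LOC", 1, 1)
print(m, g)  # ke: INDEF:CL
print(gloss_elements(g))  # [('', 'INDEF'), (':', 'CL')]
```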
## See also
- [interlineaR](https://cran.r-project.org/web/packages/interlineaR/index.html) - an R package with similar functionality that supports more input formats.
Raw data
{
"_id": null,
"home_page": "https://github.com/cldf/pyigt",
"name": "pyigt",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "Chinese linguistics,historical linguistics,computer-assisted language comparison",
"author": "Johann-Mattis List and Robert Forkel",
"author_email": "robert_forkel@eva.mpg.de",
"download_url": "https://files.pythonhosted.org/packages/bb/a8/00b4eb9e7787174b92aef65dd7bf5c3894cad0e5b3445a9a1c0f260d9d47/pyigt-2.1.0.tar.gz",
"platform": "any",
"bugtrack_url": null,
"license": "GPL",
"summary": "A Python library for handling inter-linear-glossed text.",
"version": "2.1.0",
"project_urls": {
"Bug Tracker": "https://github.com/cldf/pyigt/issues",
"Homepage": "https://github.com/cldf/pyigt"
},
"split_keywords": [
"chinese linguistics",
"historical linguistics",
"computer-assisted language comparison"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "2c7e33605fbf945fccd6a95c2c1b1f051925362e3c1c45fac77b552080dad236",
"md5": "8a77b52f8996bdceccd4cf3af2636f17",
"sha256": "b1502bfc6d4c1776baf187bbf25e6618bbf6b7560059173fe22815f310672912"
},
"downloads": -1,
"filename": "pyigt-2.1.0-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "8a77b52f8996bdceccd4cf3af2636f17",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": ">=3.8",
"size": 33225,
"upload_time": "2023-11-28T11:40:19",
"upload_time_iso_8601": "2023-11-28T11:40:19.305151Z",
"url": "https://files.pythonhosted.org/packages/2c/7e/33605fbf945fccd6a95c2c1b1f051925362e3c1c45fac77b552080dad236/pyigt-2.1.0-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "bba800b4eb9e7787174b92aef65dd7bf5c3894cad0e5b3445a9a1c0f260d9d47",
"md5": "7817da3ce1794d1c01b4d9d434f9ff75",
"sha256": "2f9178b70fcb65d1228f05d26c528811c81e2dadb85df40a9a73e1504389f0f3"
},
"downloads": -1,
"filename": "pyigt-2.1.0.tar.gz",
"has_sig": false,
"md5_digest": "7817da3ce1794d1c01b4d9d434f9ff75",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 38708,
"upload_time": "2023-11-28T11:40:21",
"upload_time_iso_8601": "2023-11-28T11:40:21.267160Z",
"url": "https://files.pythonhosted.org/packages/bb/a8/00b4eb9e7787174b92aef65dd7bf5c3894cad0e5b3445a9a1c0f260d9d47/pyigt-2.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-11-28 11:40:21",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "cldf",
"github_project": "pyigt",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "pyigt"
}