algophon

Name	algophon
Version	0.0.9
Summary	Tools for an algorithmic approach to phonology (some useful to computational phonology and morphology more broadly)
upload_time	2024-05-06 15:46:42
author	Caleb Belth
requires_python	>=3.8
license	Apache 2.0
keywords	computational linguistics phonology morphology natural language processing
Homepage	https://github.com/cbelth/algophon

# algophon

**Code for working on computational phonology and morphology in Python.** 

The project is based on code developed by [Caleb Belth](https://cbelth.github.io/) during the course of his PhD; the title of his [dissertation](https://cbelth.github.io/public/assets/documents/belth_dissertation.pdf), *Towards an Algorithmic Account of Phonological Rules and Representations*, serves as the origin for the repository's name *algophon*.

This is a <span style="color:orange">work in progress</span>. The PyPI distribution and documentation will be updated as the project progresses! The initial plan for the project is to include:
1. Handy tools for working with strings of phonological segments.
2. Implementations of computational learning models.

Item (1) will be implemented first.

**Suggestions are welcome!**

## Install

```
pip install algophon
```

## Working With Strings of Segments

The code at the top level of the package provides functionality for working easily with strings of phonological segments.

The following examples assume you have imported the appropriate classes:

```python
>>> from algophon import Seg, SegInv, SegStr, NatClass
```

### Segments: `Seg`

**A class to represent a phonological segment.**

You are unlikely to create `Seg` objects yourself very often; they are usually constructed internally by other parts of the package (in particular, see `SegInv` and `SegStr`). If you do need to, the `Seg` constructor takes the following arguments:
- `ipa`: a `str` IPA symbol
- `features` (optional): a `dict` of features mapping to their corresponding values

```python
>>> seg = Seg(ipa='i', features={'syl': '+', 'voi': '+', 'strid': '0'})
```

What matters more is how `Seg` objects behave, and why they are handy.

<span style="color:green">**First**</span>, in the important respects `Seg` behaves like the `str` IPA segment used to create it.

If you `print` a `Seg` object, it will print its IPA:

```python
>>> print(seg)
i
```

If you compare a `Seg` object to a `str`, it will behave like it is the IPA symbol:

```python
>>> print(seg == 'i')
True
>>> print(seg == 'e')
False
```

A `Seg` object hashes to the same value as its IPA symbol:

```python
>>> print(len({seg, 'i'}))
1
>>> print('i' in {seg}, seg in {'i'})
True True
```
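The string-like behavior above can be achieved by delegating equality and hashing to the IPA symbol. The following is a minimal sketch of that idea using a hypothetical `MiniSeg` class; it is not algophon's actual implementation, only an illustration of why `{seg, 'i'}` collapses to a single element.

```python
class MiniSeg:
    """Hypothetical stand-in for a segment: an IPA symbol plus a feature dict."""

    def __init__(self, ipa, features=None):
        self.ipa = ipa
        self.features = dict(features or {})

    def __str__(self):
        return self.ipa

    def __repr__(self):
        return self.ipa

    def __eq__(self, other):
        # Equal to another MiniSeg or to a plain str with the same IPA symbol.
        if isinstance(other, MiniSeg):
            return self.ipa == other.ipa
        if isinstance(other, str):
            return self.ipa == other
        return NotImplemented

    def __hash__(self):
        # Hash exactly like the IPA string, so sets and dicts treat both as one key.
        return hash(self.ipa)


seg = MiniSeg('i', {'syl': '+', 'voi': '+'})
print(seg == 'i', seg == 'e')      # True False
print(len({seg, 'i'}))             # 1
print('i' in {seg}, seg in {'i'})  # True True
```

The key design constraint is Python's rule that objects which compare equal must have equal hashes; hashing the IPA string is what keeps set and dict lookups consistent with the `str` comparisons.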

<span style="color:green">**Second**</span>, in the important respects `Seg` behaves like a feature bundle (see also the other classes, where other benefits will become clear).

```python
>>> print(seg.features['syl'])
+
```

<span style="color:green">**Third**</span>, `Seg` handles IPA symbols that consist of more than one Unicode character.

```python
>>> tsh = Seg(ipa='t͡ʃ')
>>> print(tsh)
t͡ʃ
>>> print(len(tsh))
1
>>> from algophon.symbols import LONG # see description of symbols below
>>> long_i = Seg(ipa=f'i{LONG}')
>>> print(long_i)
iː
>>> print(len(long_i))
1
```
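To see why a dedicated segment object is useful here, compare how plain Python strings count these symbols. The standard-library snippet below (independent of algophon) shows that `len` on a `str` counts Unicode code points, so the affricate and the long vowel are longer than one "character" even though each is a single segment.

```python
import unicodedata

# Plain Python strings count Unicode code points, not phonological segments.
tsh = 't\u0361\u0283'   # t͡ʃ: t + COMBINING DOUBLE INVERTED BREVE + ʃ
long_i = 'i\u02d0'      # iː: i + MODIFIER LETTER TRIANGULAR COLON

for symbol in (tsh, long_i):
    names = [unicodedata.name(ch) for ch in symbol]
    print(symbol, len(symbol), names)
```

Here `len(tsh)` is 3 and `len(long_i)` is 2, whereas the corresponding `Seg` objects above report a length of 1.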

### Segment Inventory: `SegInv`

**A class to represent an inventory of phonological segments (Seg objects).**

A `SegInv` object is a collection of `Seg` objects. Its constructor requires no arguments but accepts two optional ones:
- `ipa_file_path`: a `str` pointing to a file of segment-feature mappings.
- `sep`: a `str` specifying the column separator of the `ipa_file_path` file.

By default, `SegInv` uses [Panphon](https://github.com/dmort27/panphon) (Mortensen et al., 2016) features. The optional parameters allow you to use your own features. The file at `ipa_file_path` must be formatted as follows:
- The first row must be a header of feature names, separated by `sep` (by default, `\t`).
- The first column must contain the segment IPA symbols (its header cell can contain anything, e.g., `SEG`).
- The remaining columns (in all rows after the header) must contain the feature values.
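To make the expected layout concrete, here is a hypothetical three-segment feature file and a minimal standard-library reader for it. The file contents and the `read_feature_table` helper are illustrative assumptions, not part of algophon; the package's own loader may differ in details such as how it treats blank lines.

```python
import csv
import io

# Hypothetical tab-separated feature file: a header row of feature names,
# with the first column holding the segment IPA (its header cell here is SEG).
FEATURE_FILE = (
    "SEG\tsyl\tson\tvoi\n"
    "i\t+\t+\t+\n"
    "t\t-\t-\t-\n"
    "n\t-\t+\t+\n"
)


def read_feature_table(handle, sep="\t"):
    """Return {ipa: {feature: value}} from a segment-by-feature table."""
    reader = csv.reader(handle, delimiter=sep)
    header = next(reader)
    feature_names = header[1:]  # skip the first (segment) column
    table = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        ipa, values = row[0], row[1:]
        table[ipa] = dict(zip(feature_names, values))
    return table


features = read_feature_table(io.StringIO(FEATURE_FILE))
print(features["n"])  # {'syl': '-', 'son': '+', 'voi': '+'}
```

With a real file you would pass `open(path, encoding="utf-8", newline="")` instead of the `io.StringIO` object.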

When a `SegInv` object is created, it is empty:

```python
>>> seginv = SegInv()
>>> seginv
SegInv of size 0
```

You can add segments with the `add`, `add_segs`, and `add_segs_by_str` methods:

```python
>>> seginv.add('i')
>>> print(seginv.segs)
{i}
>>> seginv.add_segs({'p', 'b', 't', 'd'})
>>> print(seginv.segs)
{b, t, d, i, p}
>>> seginv.add_segs_by_str('eː n t j ə') # segments in str must be space-separated
>>> print(seginv.segs)
{b, t, d, i, j, n, p, ə, eː}
```

`add_segs_by_str` requires the segments to be space-separated because not all IPA symbols are a single character (e.g., `'eː'`). This convention is also consistent with the data format of the [Sigmorphon](https://github.com/sigmorphon) shared tasks, which is commonly used in morphophonology tasks.
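A quick plain-Python illustration of the problem that space separation solves (this uses only built-in `str` methods, not algophon):

```python
word = 'eː n t j ə'

# Splitting on whitespace recovers the five segments, keeping 'eː' intact.
segments = word.split()
print(segments)  # ['eː', 'n', 't', 'j', 'ə']

# Iterating over the unspaced string instead splits 'eː' into two characters.
chars = list(word.replace(' ', ''))
print(chars)     # ['e', 'ː', 'n', 't', 'j', 'ə']
```

Without the spaces, there is no reliable way to tell from the characters alone where one segment ends and the next begins.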

These `add*` methods automatically create `Seg` objects and assign them `features` based on either Panphon (default) or the `ipa_file_path` file.

```python
>>> print(seginv['eː'].features)
{'syl': '+', 'son': '+', 'cons': '-', 'cont': '+', 'delrel': '-', 'lat': '-', 'nas': '-', 'strid': '0', 'voi': '+', 'sg': '-', 'cg': '-', 'ant': '0', 'cor': '-', 'distr': '0', 'lab': '-', 'hi': '-', 'lo': '-', 'back': '-', 'round': '-', 'velaric': '-', 'tense': '+', 'long': '+', 'hitone': '0', 'hireg': '0'}
```

This also demonstrates that `seginv` operates like a dictionary in that you can retrieve and check the existence of segments by their IPA.

```python
>>> 'eː' in seginv
True
```

### Strings of Segments: `SegStr`

**A class to represent a sequence of phonological segments (Seg objects).**

The `SegStr` class handles several tricky aspects of IPA sequences, such as segments that span more than one character. It is common practice to represent IPA sequences in a space-separated fashion; for example, [eːntjə] is represented as `'eː n t j ə'`.

Creating a `SegStr` object requires the following arguments:
  - `segs`: a collection of segments, which can be in any of the following formats:
    - str of IPA symbols, where each symbol is separated by a space ' ' (**most common**)
    - list of IPA symbols
    - list of Seg objects
  - `seginv`: a `SegInv` object

```python
>>> seginv = SegInv() # init SegInv
>>> seq = SegStr('eː n t j ə', seginv)
>>> print(seq)
eːntjə
```

Creating a `SegStr` object automatically adds its segments to the `SegInv` object.

```python
>>> print(seginv.segs)
{ə, t, n, j, eː}
```

For clean visualization, `SegStr` displays the sequence of segments without spaces, as `print(seq)` shows above. Internally, however, a `SegStr` object keeps track of the individual segments:

```python
>>> print(len(seq))
5
>>> seq[0]
eː
>>> type(seq[0]) # indexing returns a Seg object
<class 'algophon.seg.Seg'>
>>> seq[-2:]
jə
>>> type(seq[-2:]) # slicing returns a new SegStr object
<class 'algophon.segstr.SegStr'>
>>> seq[-2:] == 'j ə' # comparison to str objects works as expected
True
>>> seq[-2:] == 'ə n'
False
```
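One way to picture this segment-aware indexing, slicing, and hashing is a tuple of segment strings. The `MiniSegStr` class below is a hypothetical sketch (it stores plain `str` segments rather than `Seg` objects and has no `SegInv`), shown only to illustrate why `len(seq)` is 5, why a slice compares equal to `'j ə'`, and how hashing like the space-separated string can work.

```python
class MiniSegStr:
    """Hypothetical sequence of IPA segments backed by a tuple of strings."""

    def __init__(self, segs):
        # Accept a space-separated str or any iterable of segment strings.
        self.segs = tuple(segs.split() if isinstance(segs, str) else segs)

    def __len__(self):
        return len(self.segs)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return MiniSegStr(self.segs[index])  # a slice is a new sequence
        return self.segs[index]                  # an index yields one segment

    def __str__(self):
        return ''.join(self.segs)  # display without spaces

    def __eq__(self, other):
        if isinstance(other, MiniSegStr):
            return self.segs == other.segs
        if isinstance(other, str):
            # Compare against the space-separated form, matching __hash__.
            return ' '.join(self.segs) == other
        return NotImplemented

    def __hash__(self):
        # Hash like the space-separated string, e.g. hash('eː n t j ə').
        return hash(' '.join(self.segs))


seq = MiniSegStr('eː n t j ə')
print(len(seq), seq[0], seq[-2:])             # 5 eː jə
print(seq[-2:] == 'j ə', seq[-2:] == 'ə n')  # True False
print(len({seq, 'eː n t j ə'}))             # 1
```

Comparing against the space-separated string (rather than re-splitting it) keeps equality and hashing consistent, which is what lets the object and its string form collapse to one set element.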

`SegStr` also implements segment-aware equivalents of useful `str` methods, such as `startswith` and `endswith`.

```python
>>> seq.endswith('j ə')
True
>>> dim_sufx = seq[-2:]
>>> seq.endswith(dim_sufx)
True
>>> seq.startswith(seq[:-2])
True
```
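The difference between character-level and segment-level suffix checks matters for multi-character segments: plain `str.endswith` will happily match part of an affricate, whereas comparing sequences of segments will not. The `seg_endswith` helper below is a hypothetical illustration, not part of algophon.

```python
def seg_endswith(segs, suffix):
    """Return True if the segment sequence `segs` ends with the segments in `suffix`."""
    if len(suffix) > len(segs):
        return False
    return tuple(segs[len(segs) - len(suffix):]) == tuple(suffix)


word = 'a t\u0361\u0283'  # hypothetical two-segment form [atʃ] ending in the affricate t͡ʃ

# Character level: ʃ is the final code point, so this is True.
print(word.replace(' ', '').endswith('\u0283'))     # True
# Segment level: the final segment is t͡ʃ, not ʃ.
print(seg_endswith(word.split(), ['\u0283']))        # False
print(seg_endswith(word.split(), ['t\u0361\u0283']))  # True
```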

A `SegStr` object hashes to the value of its (space-separated) string:

```python
>>> len({seq, 'eː n t j ə'})
1
>>> seq in {'eː n t j ə'}
True
```

### Natural Class: `NatClass`

**A class to represent a natural class, i.e., a set of segments represented intensionally as a conjunction of features.**

```python
>>> son = NatClass(feats={'+son'}, seginv=seginv)
>>> son
[+son]
>>> 'ə' in son
True
>>> 'n' in son
True
>>> 't' in son
False
```

The class also allows you to get the natural class's extension and the extension's complement, relative to the `SegInv` (in our example, only `{ə, t, n, j, eː}` are in `seginv`):

```python
>>> son.extension()
{eː, j, ə, n}
>>> son.extension_complement()
{t}
```

You can also retrieve an extension (or its complement) directly from a `SegInv` object without creating a `NatClass` object:

```python
>>> seginv.extension({'+syl'})
{ə, eː}
>>> seginv.extension_complement({'+syl'})
{j, t, n}
```
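Conceptually, the extension is every inventory segment that satisfies all of the feature specifications, and the complement is the rest of the inventory. The sketch below computes both over a hypothetical toy inventory (feature values chosen for illustration, using the `+`/`-` convention shown above) with hypothetical `extension` and `extension_complement` helpers; it is not algophon's implementation.

```python
# Hypothetical toy inventory: segment -> feature bundle.
INVENTORY = {
    'eː': {'syl': '+', 'son': '+'},
    'ə': {'syl': '+', 'son': '+'},
    'j': {'syl': '-', 'son': '+'},
    'n': {'syl': '-', 'son': '+'},
    't': {'syl': '-', 'son': '-'},
}


def parse_spec(spec):
    """Split a feature spec like '+son' into ('+', 'son')."""
    return spec[0], spec[1:]


def extension(feats, inventory):
    """Segments whose features match every spec in `feats` (a conjunction)."""
    specs = [parse_spec(f) for f in feats]
    return {seg for seg, bundle in inventory.items()
            if all(bundle.get(name) == value for value, name in specs)}


def extension_complement(feats, inventory):
    """Inventory segments that fall outside the natural class."""
    return set(inventory) - extension(feats, inventory)


print(sorted(extension({'+son'}, INVENTORY)))             # ['eː', 'j', 'n', 'ə']
print(sorted(extension_complement({'+syl'}, INVENTORY)))  # ['j', 'n', 't']
```

Because the features are conjoined, adding a specification can only shrink (or keep) the extension, e.g. `{'+syl', '+son'}` picks out just the vowels here.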

### Symbols: The `symbols` module

The `symbols` module (technically just a file...) contains a number of constants that store useful symbols:

```python
LWB = '⋊'
RWB = '⋉'
SYL_BOUNDARY = '.'
PRIMARY_STRESS = 'ˈ'
SEC_STRESS = 'ˌ'
LONG = 'ː'
NASALIZED = '\u0303' # ◌̃
UNDERSPECIFIED = '0'
UNK = '?'
NEG = '¬'
```

These can be accessed like this:

```python
>>> from algophon.symbols import *
>>> NASALIZED
'̃'
>>> f'i{LONG}'
'iː'
```
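As one illustration of how the boundary symbols might be used, the sketch below pads a segment list with `LWB` and `RWB` before extracting three-segment windows, a common way to give word-edge contexts an explicit representation. The two constants are copied from the listing above; the `pad` and `trigrams` helpers are hypothetical and not part of algophon.

```python
LWB = '⋊'  # left word boundary (value copied from algophon.symbols)
RWB = '⋉'  # right word boundary (value copied from algophon.symbols)


def pad(segs):
    """Wrap a list of segments in word-boundary symbols."""
    return [LWB, *segs, RWB]


def trigrams(segs):
    """All contiguous three-segment windows of the padded word."""
    padded = pad(segs)
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


for window in trigrams('eː n t j ə'.split()):
    print(' '.join(window))
```

For `'eː n t j ə'` this yields five windows, the first being `⋊ eː n` and the last `j ə ⋉`.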

## Learning Models

<span style="color:orange">Work in Progress</span>

## Citation

If you use this package in your research, you can use the following citation:

```bibtex
@phdthesis{belth2023towards,
  title={{Towards an Algorithmic Account of Phonological Rules and Representations}},
  author={Belth, Caleb},
  year={2023},
  school={{University of Michigan}}
}
```

## References

- Mortensen, D. R., Littell, P., Bharadwaj, A., Goyal, K., Dyer, C., & Levin, L. (2016, December). Panphon: A resource for mapping IPA segments to articulatory feature vectors. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (pp. 3475-3484).
