# SoundVectors
This lightweight Python package provides a robust tool for translating sounds into phonological feature vectors. It is described in detail in our study "A Generative System for Translating Sounds to Phonological Feature Vectors". If you use the package, we kindly ask you to cite this paper.
> Rubehn, Arne, Jessica Nieder, and Johann-Mattis List (2024): A Generative System for Translating Sounds to Phonological Feature Vectors. +++
[![Build Status](https://github.com/cldf-clts/soundvectors/workflows/tests/badge.svg)](https://github.com/cldf-clts/soundvectors/actions?query=workflow%3Atests)
[![PyPI](https://img.shields.io/pypi/v/soundvectors.svg)](https://pypi.org/project/soundvectors)
## Installation
You can install the `soundvectors` package via `pip`.
```
pip install soundvectors
```
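To confirm from Python that the installation worked, you can query the installed distribution's version using only the standard library. This is a minimal sketch built on `importlib.metadata` (available since Python 3.8); the helper name `installed_version` is hypothetical and not part of `soundvectors`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "soundvectors") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"soundvectors {found}" if found else "soundvectors is not installed")
```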
### Requirements for running the evaluation
If you wish to reproduce the evaluation from our paper, you need some additional dependencies that the core package does not require. To install them, clone this repository and run the following from its root directory:
```
$ pip install -e ".[dev]"
```
You also need to download the evaluation data from Lexibank. For this, `cd` into the `eval` directory and run:
```bash
soundvectors$ cd eval # cd into eval directory
eval$ make download
```
This will clone the [`lexibank-analysed`](https://github.com/lexibank/lexibank-analysed) dataset into the `eval` directory.
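Before running the scripts, it can help to check that the download succeeded. The sketch below assumes that the clone lands in a directory named `lexibank-analysed` inside `eval` (the Makefile may use a different target name); the helper `dataset_present` is hypothetical and not shipped with this repository.

```python
from pathlib import Path


def dataset_present(eval_dir: str = "eval", name: str = "lexibank-analysed") -> bool:
    """Return True if the assumed dataset directory exists and is non-empty."""
    path = Path(eval_dir) / name
    return path.is_dir() and any(path.iterdir())


if __name__ == "__main__":
    # Run from the repository root; prints whether the data seems to be there.
    print("data found" if dataset_present() else "run `make download` in eval/ first")
```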
After running the evaluation scripts, you can clear the data from your disk by running the command:
```bash
eval$ make clear
```
## Usage
The core of this package is the `SoundVectors` class, which translates valid IPA symbols to their corresponding feature vectors.
The recommended way to use `SoundVectors` is to pass a callable transcription system (here, the BIPA system from `pyclts`) via the keyword argument `ts`:
```python
>>> from soundvectors import SoundVectors
>>> from pyclts import CLTS
>>> bipa = CLTS().bipa
>>> sv = SoundVectors(ts=bipa)
>>> sv.get_vec("t")
(1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, -1, -1, -1, -1, -1, 1, 0, -1, -1, -1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0)
```
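Because `get_vec` returns a plain tuple of integers in {-1, 0, 1}, standard comparison measures apply directly. The sketch below compares two such vectors with a Hamming distance and a cosine similarity. The helper names are hypothetical and not part of `soundvectors`, and we make no claim that these are the measures used in the paper's evaluation.

```python
import math
from typing import Sequence


def hamming(u: Sequence[int], v: Sequence[int]) -> int:
    """Count the feature positions at which two vectors differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(u, v))


def cosine(u: Sequence[int], v: Sequence[int]) -> float:
    """Cosine similarity of two vectors; 0.0 if either one is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

For example, `hamming(sv.get_vec("t"), sv.get_vec("d"))` counts how many feature values separate the two stops.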
Alternatively, `get_vec` also accepts a `Sound` object (as returned by `pyclts`, e.g. `bipa["t"]`) or a textual description of the sound's features following the CLTS naming conventions. The resulting vectors are the same:
```python
>>> sv.get_vec("voiceless alveolar stop consonant") == sv.get_vec("t") == sv.get_vec(bipa["t"])
True
```
Instead of obtaining a vector directly, you can also obtain a `FeatureBundle` object:
```python
>>> feature_bundle = sv["t"] # indexing returns a FeatureBundle object (get_vec does so with vectorize=False)
>>> feature_bundle.cons # feature values can be retrieved by attribute access
1
>>> feature_bundle.as_set() # represent feature bundle as set of non-zero feature strings
frozenset({'-son', '-distr', '-cont', '-lab', '-lo', '-long', '+front', '-laryngeal', '-syl', '-delrel', '-voi', '-round', '+cons', '-velaric', '-dorsal', '-back', '-nas', '-pharyngeal', '+ant', '+cor', '-cg', '-sg', '-lat', '-hi'})
>>> str(feature_bundle) # string representation
'+cons,-syl,-son,-cont,-delrel,-lat,-nas,-voi,-sg,-cg,-pharyngeal,-laryngeal,+cor,-dorsal,-lab,-hi,-lo,-back,+front,0_tense,-round,-velaric,-long,+ant,-distr,0_strid,0_hitone,0_hireg,0_loreg,0_rising,0_falling,0_contour,0_backshift,0_frontshift,0_opening,0_closing,0_centering,0_longdistance,0_secondrounded'
>>> feature_bundle.as_vector() # raw vector representation (equal to the return value with vectorize=True)
(1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, -1, -1, -1, -1, -1, 1, 0, -1, -1, -1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0)
```
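The three representations above carry the same information, so the string and set forms can be rebuilt from the raw vector alone. This sketch copies the feature order from the `str(feature_bundle)` output above and assumes `as_vector()` uses the same order (the two outputs above agree position by position). The function names are hypothetical and not part of the package API.

```python
# Feature names in the order shown by str(feature_bundle) above.
FEATURES = (
    "cons", "syl", "son", "cont", "delrel", "lat", "nas", "voi", "sg", "cg",
    "pharyngeal", "laryngeal", "cor", "dorsal", "lab", "hi", "lo", "back",
    "front", "tense", "round", "velaric", "long", "ant", "distr", "strid",
    "hitone", "hireg", "loreg", "rising", "falling", "contour", "backshift",
    "frontshift", "opening", "closing", "centering", "longdistance",
    "secondrounded",
)


def _render(value: int, name: str) -> str:
    """Format one feature value the way str(feature_bundle) does."""
    if value == 1:
        return "+" + name
    if value == -1:
        return "-" + name
    if value == 0:
        return "0_" + name
    raise ValueError(f"unexpected value {value!r} for feature {name!r}")


def vector_to_str(vector) -> str:
    """Rebuild the comma-separated string representation from a raw vector."""
    if len(vector) != len(FEATURES):
        raise ValueError(f"expected {len(FEATURES)} values, got {len(vector)}")
    return ",".join(_render(v, n) for v, n in zip(vector, FEATURES))


def vector_to_set(vector) -> frozenset:
    """Keep only the non-zero features, mirroring as_set()."""
    return frozenset(s for s in vector_to_str(vector).split(",") if not s.startswith("0_"))
```

Applied to the vector of `t` shown above, `vector_to_str` reproduces the string representation and `vector_to_set` reproduces the 24-element set.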
Finally, you can call the `SoundVectors` object directly on a collection of sounds to obtain one vector per sound:
```python
>>> sv(["s", "v"])
[(1, -1, -1, 1, ..., 0),
 (1, -1, -1, 1, ..., 0)]
```
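Since the call returns one vector per input sound, the batch can be analysed with ordinary Python. As a minimal sketch (the helper name is hypothetical and not part of `soundvectors`), the function below reports which feature positions actually vary across a batch, i.e. which features the given sounds contrast on; index `i` refers to the `i`-th feature in the string representation shown earlier.

```python
from typing import List, Sequence


def varying_positions(vectors: Sequence[Sequence[int]]) -> List[int]:
    """Return the indices of features whose value is not constant across all vectors."""
    if not vectors:
        return []
    width = len(vectors[0])
    if any(len(v) != width for v in vectors):
        raise ValueError("all vectors must have the same length")
    # A position varies if the vectors take more than one distinct value there.
    return [i for i in range(width) if len({v[i] for v in vectors}) > 1]
```

For instance, `varying_positions(sv(["t", "d"]))` should include the position of the voicing feature.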
## Evaluation
The `eval` directory contains the code used for the Evaluation section of the paper. To reproduce the results reported there, make sure that you have installed the additional dependencies and downloaded the data (see above). Then run the evaluation scripts; the comment next to each script gives the subsection of the paper it corresponds to:
```bash
$ cd eval
$ python vector_similarities.py # 4.1 & 4.2
$ python equivalence_classes.py # 4.3
$ python distinctiveness.py # 4.4
$ python concordanceline.py # 4.4
```
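The script `equivalence_classes.py` belongs to subsection 4.3. Assuming that an equivalence class is a set of distinct transcriptions that receive the identical feature vector, the grouping step can be sketched with a dictionary keyed by the vector tuple. This is an illustrative sketch, not the script's actual code, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def equivalence_classes(pairs: Iterable[Tuple[str, Sequence[int]]]) -> List[List[str]]:
    """Group sound labels by identical vector; keep only groups with more than one member."""
    groups: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for label, vec in pairs:
        groups[tuple(vec)].append(label)
    return [labels for labels in groups.values() if len(labels) > 1]
```

With the package, the input could be built as `((s, sv.get_vec(s)) for s in sounds)` for some list of transcriptions `sounds`.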