praat-textgrids


* Name: praat-textgrids
* Version: 1.4.0
* Home page: http://github.com/Legisign/Praat-textgrids
* Summary: Manipulation of Praat TextGrids
* Upload time: 2022-10-12 19:59:53
* Author: Legisign.org
* License: GPLv3
* Requirements: none recorded

# praat-textgrids -- Praat TextGrid manipulation in Python

## Description

`textgrids` is a module for handling Praat TextGrid files in any format (short text, long text, or binary). The module implements five classes, from largest to smallest:

* `TextGrid` -- a `dict` with tier names as keys and `Tier`s as values
* `Tier` -- a `list` of either `Interval` or `Point` objects
* `Interval` -- an `object` representing Praat intervals
* `Point` -- a `namedtuple` representing Praat points
* `Transcript` -- a `str` with special methods for transcription handling

All Praat text objects are represented as `Transcript` objects.

The module also exports the following variables:

* `diacritics` -- a `dict` of all diacritics with their Unicode counterparts
* `inline_diacritics` -- a `dict` of inline (symbol-like) diacritics
* `index_diacritics` -- a `dict` of over/understrike diacritics
* `symbols` -- a `dict` of special Praat symbols with their Unicode counterparts
* `vowels` -- a `list` of all vowels in either Praat or Unicode notation

And the following constants (although they are **not** actually constants in Python, they SHOULDN’T be changed):

* `BINARY` -- symbolic name for the binary file format
* `TEXT_LONG` -- symbolic name for the long text file format
* `TEXT_SHORT` -- symbolic name for the short text file format
* `version` -- module version as string

## Version

This file documents `praat-textgrids` version 1.4.0.

## Copyright

Copyright © 2019–22 Legisign.org, Tommi Nieminen <software@legisign.org>

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program.  If not, see <https://www.gnu.org/licenses/>.

## Module contents

### 0. Module properties

Besides `textgrids.version`, which contains the module version number as string, the module exports the following properties:

#### 0.1. symbols

`symbols` is a `dict` that contains all the Praat special notation symbols (as keys) and their Unicode counterparts (as values).

#### 0.2. vowels

`vowels` is a `list` of all vowel symbols in either Praat notation (e.g., `"\as"`) or Unicode. It is used by the `Interval` methods `containsvowel()` and `startswithvowel()`, so changing it (for example, adding new symbols or removing symbols that are used for other purposes in a specific case) will change how those methods behave.
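
The effect of editing the list can be illustrated without the module itself. The following is a minimal sketch: the short `vowels` list and the `contains_vowel()` helper are stand-ins written for this example, not the library's own code, and the real list is assumed to be much longer.

```python
# Stand-in for the module-level vowel list (hypothetical and much
# shorter than the real one).
vowels = ["a", "e", "i", "o", "u", "\\as"]


def contains_vowel(label):
    """Return True if any symbol from the shared list occurs in label."""
    return any(v in label for v in vowels)


before = contains_vowel("sy")   # "y" is not in the list yet
vowels.append("y")              # treat "y" as a vowel from now on
after = contains_vowel("sy")
print(before, after)
```

Because the predicate consults the shared list on every call, the change takes effect immediately and for every caller, which is why edits to `vowels` should be deliberate.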

#### 0.3. diacritics, inline_diacritics, and index_diacritics

`diacritics` is a `dict` of all diacritics in Praat notation (as keys) and their Unicode counterparts (as values).

`inline_diacritics` and `index_diacritics` are subsets of `diacritics`. The former are semantically diacritics but appear as inline symbols; the latter are the "true" diacritics (i.e., under- or overstrikes) that need special handling when transcoding.
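
Why the two groups differ can be seen at the Unicode level. The sketch below uses only the standard `unicodedata` module; the two characters are picked for illustration, and whether the module's tables contain exactly these entries is an assumption.

```python
import unicodedata

# Illustrative characters only; not checked against the module's tables.
inline_example = "\u02b0"   # MODIFIER LETTER SMALL H (aspiration)
index_example = "\u0303"    # COMBINING TILDE (nasalization)


def is_combining(char):
    """True if char is a combining mark that attaches to its neighbor."""
    return unicodedata.combining(char) != 0


for char in (inline_example, index_example):
    print(unicodedata.name(char), is_combining(char))

# A combining mark is a separate code point that follows its base
# character, so "a" plus the tilde is two code points long.
nasal_a = "a" + index_example
print(len(nasal_a))
```

An inline symbol can be substituted like any other character, whereas a combining mark has to end up directly after the character it modifies, which is one plausible reason for the special handling mentioned above.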

#### 0.4. TEXT_LONG, TEXT_SHORT, BINARY

Symbolic constants specifying different file formats in `TextGrid.format()` and `TextGrid.write()` methods. Internally they are just small integers (0, 1, and 2, respectively). The default format is `TEXT_LONG`.

### 1. TextGrid

`TextGrid` is a `collections.OrderedDict` whose keys are tier names (strings) and whose values are `Tier` objects. The constructor takes an optional filename argument for easy loading and parsing of TextGrid files.

#### 1.1. Properties

All the properties of `dict` plus:

* `filename` holds the textgrid filename, if any. `read()` and `write()` methods both set or update it.

#### 1.2. Methods

All the methods of `dict` plus:

* `parse()` -- parse string into a TextGrid
* `read()` -- read (and parse) a TextGrid file
* `tier_from_csv()` -- read a textgrid tier from a CSV file
* `tier_to_csv()` -- write a textgrid tier into a CSV file
* `write()` -- write a TextGrid file

`parse()` takes an obligatory string (or `bytes`) argument which contains textgrid data in any of Praat’s three formats (long text, short text, or binary).

`read()` and `write()` both take an obligatory filename argument.

`write()` can take an optional argument specifying the file format; this can be one of `BINARY` (= `int` 2), `TEXT_LONG` (= `int` 0, the default), or `TEXT_SHORT` (= `int` 1).

`tier_from_csv()` and `tier_to_csv()` both take two obligatory arguments, the tier name and the filename, in that order.

### 2. Tier

`Tier` is a list of either `Interval` or `Point` objects.

**NOTE:** `Tier` only allows adding `Interval` or `Point` objects. Adding anything else or mixing `Interval`s and `Point`s will trigger an exception.
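
The restriction can be pictured as a type-checked list. This is a minimal sketch, not the library's implementation: `CheckedTier` is a hypothetical name, the `Interval` and `Point` stand-ins are plain namedtuples, only `append()` is guarded, and raising `TypeError` is an assumption (the text above only says "an exception").

```python
from collections import namedtuple

# Stand-ins for the module's classes (hypothetical, for illustration).
Interval = namedtuple("Interval", ["text", "xmin", "xmax"])
Point = namedtuple("Point", ["text", "xpos"])


class CheckedTier(list):
    """A list that accepts only Intervals or only Points."""

    def append(self, item):
        if not isinstance(item, (Interval, Point)):
            raise TypeError("only Interval or Point objects are allowed")
        # The first element decides which kind of tier this is.
        if self and type(item) is not type(self[0]):
            raise TypeError("cannot mix Interval and Point objects")
        super().append(item)


tier = CheckedTier()
tier.append(Interval("a", 0.0, 0.5))
try:
    tier.append(Point("b", 0.7))
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(len(tier), mixed_ok)
```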

#### 2.2. Properties

All the properties of `list` plus:

* `is_point_tier` -- `bool`: `True` for a point tier, `False` for an interval tier
* `tier_type` -- `str`, either `"IntervalTier"` or `"PointTier"`

`tier_type` exists principally for the convenience of the formatting functions.

#### 2.3. Methods

All the methods of `list` plus:

* `merge()` -- merge intervals **NOTE** renamed from 1.3!
* `to_csv()` -- convert tier data into a CSV-like list

`merge()` merges given intervals into one. It takes two arguments, `first=` and `last=`, both of which are integer indexes with the usual Python semantics: 0 stands for the first element, -1 for the last element, these being also the defaults. The function raises a `TypeError` if used with a point tier, and `ValueError` if the parameters do not specify a valid slice. **NB!** This is a function and returns the result instead of modifying the `Tier` in place.
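
The described semantics can be sketched as a free function over a plain list. Everything here is a stand-in: the `Interval` dataclass is hypothetical, and joining the labels by simple concatenation is an assumption of this example, since the text above does not say how labels are combined.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """Stand-in for the module's Interval (hypothetical)."""
    text: str
    xmin: float
    xmax: float


def merge(intervals, first=0, last=-1):
    """Return one new Interval spanning intervals[first]..intervals[last]."""
    n = len(intervals)
    # Normalize negative indexes the way Python list indexing does.
    lo = first + n if first < 0 else first
    hi = last + n if last < 0 else last
    if not (0 <= lo <= hi < n):
        raise ValueError("first/last do not specify a valid slice")
    chunk = intervals[lo:hi + 1]
    text = "".join(iv.text for iv in chunk)   # assumption: concatenate labels
    return Interval(text, chunk[0].xmin, chunk[-1].xmax)


tier = [Interval("k", 0.0, 0.1), Interval("a", 0.1, 0.3), Interval("t", 0.3, 0.4)]
merged = merge(tier, first=0, last=1)
print(merged)
print(len(tier))   # the original list is left untouched
```

Returning a new object rather than editing the list mirrors the **NB!** above: the caller decides whether and where the merged interval replaces the originals.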

`to_csv()` returns a CSV-like list. It’s mainly intended to be used from the `TextGrid` level method `tier_to_csv()` but can be called directly if writing to a file is not desired.

### 3. Interval

`Interval` is an `object` class representing one Interval on an IntervalTier.

#### 3.1. Properties

* `dur` -- interval duration (`float`)
* `mid` -- interval midpoint (`float`)
* `text` -- text label (`Transcript`)
* `xmax` -- interval end time (`float`)
* `xmin` -- interval start time (`float`)
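
How `dur` and `mid` relate to `xmin` and `xmax` can be shown with a small stand-in class. This is a sketch under the assumption that both values are derived from the boundaries; the class below is hypothetical and is not the module's `Interval`.

```python
class Interval:
    """Stand-in showing how dur and mid can be derived (hypothetical)."""

    def __init__(self, text, xmin, xmax):
        self.text = text
        self.xmin = xmin
        self.xmax = xmax

    @property
    def dur(self):
        """Duration: end time minus start time."""
        return self.xmax - self.xmin

    @property
    def mid(self):
        """Midpoint: halfway between start and end."""
        return self.xmin + self.dur / 2


syll = Interval("ka", 1.0, 1.5)
print(syll.dur, syll.mid)
```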

#### 3.2. Methods

* `containsvowel()` -- does the interval contain a vowel?
* `endswithvowel()` -- does the interval end with a vowel?
* `startswithvowel()` -- does the interval start with a vowel?
* `timegrid()` -- create a time grid

`containsvowel()`, `endswithvowel()`, and `startswithvowel()` are `bool` functions (or predicates, in Lisp parlance). They check for possible vowels in the `text` property in both Praat notation and Unicode, but can of course give a wrong answer if symbols are used in an unexpected way. They take no arguments. (Internally, `endswithvowel()` first transcodes the text to IPA, removing all diacritics, to simplify the test.)

`timegrid()` returns a list of timepoints (in `float`) evenly distributed from `xmin` to `xmax`. It takes an optional integer argument specifying the number of timepoints desired; the default is 3. It raises a `ValueError` if the argument is not an integer or is less than 1.
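
One possible reading of "evenly distributed" is sketched below as a free function. The endpoint convention is an assumption of this example (both `xmin` and `xmax` are included, and a request for a single point yields the start time); the library's exact convention may differ.

```python
def timegrid(xmin, xmax, num=3):
    """Return num evenly spaced timepoints from xmin to xmax.

    Assumption of this sketch: both endpoints are included, and
    num == 1 yields just the start time.
    """
    if not isinstance(num, int) or num < 1:
        raise ValueError("num must be an integer >= 1")
    if num == 1:
        return [float(xmin)]
    step = (xmax - xmin) / (num - 1)
    return [xmin + step * i for i in range(num)]


print(timegrid(0.0, 1.0))
print(timegrid(2.0, 3.0, 5))
```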

### 4. Point

`Point` is a `namedtuple` representing one Point on a PointTier.

#### 4.1. Properties

* `text` -- text label (`Transcript`)
* `xpos` -- temporal position (`float`)
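
Being a `namedtuple` has practical consequences: a point can be unpacked like a tuple and cannot be modified in place. The sketch below defines its own stand-in `Point` with the standard library; the field order is an assumption of this example.

```python
from collections import namedtuple

# Stand-in definition; the field order is an assumption of this sketch.
Point = namedtuple("Point", ["text", "xpos"])

p = Point(text="H*", xpos=0.42)
label, position = p           # tuple unpacking works
print(p.text, p.xpos, label == p.text)

try:
    p.xpos = 0.5              # namedtuples are immutable
    modified = True
except AttributeError:
    modified = False

moved = p._replace(xpos=0.5)  # returns a new Point instead
print(modified, moved)
```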

### 5. Transcript

`Transcript` is a `str`-derived class with one special method: `transcode()`.

#### 5.1. Properties

All the properties of `str`.

#### 5.2. Methods

All the methods of `str` plus:

* `transcode()` -- convert Praat notation to Unicode or vice versa.

Without arguments, `transcode()` assumes its input to be in Praat notation and converts it to Unicode; no check is made as to whether the input really is in Praat notation, but nothing **should** happen if it isn’t. Users should nevertheless take care to handle any exceptions.

The optional `to_unicode=False` argument inverts the direction of the transcoding, from Unicode to Praat. Again, it is not checked whether the input is in Unicode.

With the optional `retain_diacritics=True` argument, the transcoding does not remove over- and understrike diacritics from the result.
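
The two directions of transcoding can be sketched with a plain lookup table. This is an illustration only: the three-entry `SYMBOLS` table and the free function `transcode()` are stand-ins written for this example, and the diacritic handling controlled by `retain_diacritics` is left out.

```python
# A tiny stand-in table (hypothetical subset, not the module's own data).
SYMBOLS = {
    "\\as": "\u0251",  # ɑ
    "\\sh": "\u0283",  # ʃ
    "\\ng": "\u014b",  # ŋ
}


def transcode(text, to_unicode=True):
    """Replace each known symbol; unknown text passes through unchanged."""
    table = SYMBOLS if to_unicode else {v: k for k, v in SYMBOLS.items()}
    for source, target in table.items():
        text = text.replace(source, target)
    return text


unicode_label = transcode("\\sh\\as\\ng")
praat_label = transcode(unicode_label, to_unicode=False)
print(unicode_label, praat_label)
print(transcode("plain"))   # nothing to replace, returned as-is
```

The last call shows why unchecked input is usually harmless in a table-driven approach: text without any known symbol simply comes back unchanged.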

## Examples

### Snippet 1: list syllable durations

    import sys
    import textgrids

    for arg in sys.argv[1:]:
        # Try to open the file as textgrid
        try:
            grid = textgrids.TextGrid(arg)
        # Discard and try the next one
        except (OSError, textgrids.ParseError, textgrids.BinaryError):
            continue

        # Assume "syllables" is the name of the tier
        # containing syllable information
        for syll in grid['syllables']:
            # Convert Praat to Unicode in the label
            label = syll.text.transcode()
            # Print label and syllable duration, CSV-like
            print('"{}";{}'.format(label, syll.dur))

### Snippet 2: convert any textgrid to binary format

    import sys
    import os.path
    import textgrids

    for arg in sys.argv[1:]:
        name, ext = os.path.splitext(arg)
        try:
            grid = textgrids.TextGrid(arg)
        except (textgrids.ParseError, textgrids.BinaryError):
            print('Not a recognized file format!', file=sys.stderr)
            continue

        # Write a new file
        grid.write(name + '.bin', fmt=textgrids.BINARY)

            
