fst-lookup

 * Name: fst-lookup
 * Version: 2024.7.3
 * Home page: https://github.com/eddieantonio/fst-lookup
 * Summary: Lookup Foma FSTs
 * Upload time: 2024-07-03 13:49:22
 * Author: Eddie Antonio Santos
 * Requires Python: <4.0,>=3.8
 * License: MIT
 * Keywords: fst, lookup, transducer, morphology, foma

FST Lookup
==========

[![Tests](https://github.com/eddieantonio/fst-lookup/actions/workflows/python-package.yml/badge.svg)](https://github.com/eddieantonio/fst-lookup/actions/workflows/python-package.yml)
[![codecov](https://codecov.io/gh/eddieantonio/fst-lookup/branch/master/graph/badge.svg)](https://codecov.io/gh/eddieantonio/fst-lookup)
[![PyPI version](https://img.shields.io/pypi/v/fst-lookup.svg)](https://pypi.org/project/fst-lookup/)
[![calver YYYY.MM.DD](https://img.shields.io/badge/calver-YYYY.MM.DD-22bfda.svg)](http://calver.org/)

Implements lookup for [Foma][] finite state transducers.

Supports Python 3.8 and up.

[Foma]: https://fomafst.github.io/


Install
-------

    pip install fst-lookup

Usage
-----

Import the library, and load an FST from a file:

> Hint: Test this module by [downloading the `eat` FST](https://github.com/eddieantonio/fst-lookup/raw/master/tests/data/eat.fomabin)!

```python
>>> from fst_lookup import FST
>>> fst = FST.from_file('eat.fomabin')
```

### Assumed format of the FSTs

`fst_lookup` assumes that the **lower** label corresponds to the surface
form, while the **upper** label corresponds to the lemma plus its
linguistic tags and features. For example, your `LEXC` will look
something like this (note what is on each side of the colon, `:`):

```lexc
Multichar_Symbols +N +Sg +Pl
Lexicon Root
    cow+N+Sg:cow #;
    cow+N+Pl:cows #;
    goose+N+Sg:goose #;
    goose+N+Pl:geese #;
    sheep+N+Sg:sheep #;
    sheep+N+Pl:sheep #;
```
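
To make the upper/lower convention concrete, here is a toy model of the lexicon above written as plain Python dictionaries. It is only an illustration of which side maps to which; it is not how `fst_lookup` works internally, and the names `upper_to_lower` and `lower_to_upper` are made up for this sketch.

```python
from collections import defaultdict

# The same entries as the LEXC snippet, one "upper:lower" pair per line.
LEXC_ENTRIES = """
cow+N+Sg:cow
cow+N+Pl:cows
goose+N+Sg:goose
goose+N+Pl:geese
sheep+N+Sg:sheep
sheep+N+Pl:sheep
"""

upper_to_lower = defaultdict(list)  # analysis -> surface forms (generation)
lower_to_upper = defaultdict(list)  # surface form -> analyses (analysis)

for entry in LEXC_ENTRIES.split():
    upper, lower = entry.split(":")
    upper_to_lower[upper].append(lower)
    lower_to_upper[lower].append(upper)

print(upper_to_lower["goose+N+Pl"])  # ['geese']
print(lower_to_upper["sheep"])       # ['sheep+N+Sg', 'sheep+N+Pl']
```

Note that one surface form (`sheep`) can map to several analyses, which is why analysis yields zero or more results rather than exactly one.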

If your FST has labels on the opposite sides (that is, the **upper**
label corresponds to the surface form and the **lower** label
corresponds to the lemma and linguistic tags), then instantiate the FST
by providing the `labels="invert"` keyword argument:

```python
fst = FST.from_file('eat-inverted.fomabin', labels="invert")
```
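
If it is unclear which orientation a file uses, one option is to probe both with a word you know the FST should recognize. The helper below is a hypothetical sketch, not part of `fst_lookup`: the name `load_with_orientation_check` is invented here, and it assumes that a wrongly oriented FST yields no analyses for the probe word. It takes the loader as an argument, so in practice you would pass `FST.from_file`.

```python
from typing import Any, Callable


def load_with_orientation_check(
    load: Callable[..., Any], path: str, probe: str
) -> Any:
    """Return the first FST (default labels, then inverted) that analyzes `probe`.

    `load` is expected to behave like `FST.from_file`: it accepts a path and
    an optional `labels="invert"` keyword argument.
    """
    for kwargs in ({}, {"labels": "invert"}):
        fst = load(path, **kwargs)
        # analyze() yields lazily, so any() stops at the first analysis.
        if any(True for _ in fst.analyze(probe)):
            return fst
    raise ValueError(f"neither orientation of {path!r} analyzes {probe!r}")
```

With the library installed, a call might look like `load_with_orientation_check(FST.from_file, 'eat.fomabin', 'eats')`.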

> **Hint**: FSTs originating from the HFST suite are often inverted, so
> try loading the FST inverted first if `.generate()` or `.analyze()`
> aren't working correctly!


### Analyze a word form

To _analyze_ a form (take a word form, and get its linguistic analyses),
call the `analyze()` method:

```python
def analyze(self, surface_form: str) -> Iterator[Analysis]
```

This will yield all possible linguistic analyses produced by the FST.

An analysis is a tuple of strings. The strings are either linguistic
tags, or the _lemma_ (base form of the word).

`FST.analyze()` is a generator, so you must call `list()` to get a list.

```python
>>> list(sorted(fst.analyze('eats')))
[('eat', '+N', '+Mass'),
 ('eat', '+V', '+3P', '+Sg')]
```
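
Because an analysis is a flat tuple, it is often handy to separate the lemma from the tags. The following is a minimal sketch, assuming that every tag starts with `+` (as in the output above) and that everything else belongs to the lemma; `split_analysis` is a hypothetical helper, not part of `fst_lookup`.

```python
from typing import Tuple


def split_analysis(analysis: Tuple[str, ...]) -> Tuple[str, Tuple[str, ...]]:
    """Split an analysis tuple into (lemma, tags).

    Assumption: tags are the elements that start with "+".
    """
    lemma = "".join(part for part in analysis if not part.startswith("+"))
    tags = tuple(part for part in analysis if part.startswith("+"))
    return lemma, tags


print(split_analysis(("eat", "+V", "+3P", "+Sg")))
# ('eat', ('+V', '+3P', '+Sg'))
```

If your FST uses a different tag convention, adjust the `startswith` test accordingly.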


### Generate a word form

To _generate_ a form (take a linguistic analysis, and get its concrete
word forms), call the `generate()` method:

```python
def generate(self, analysis: str) -> Iterator[str]
```

`FST.generate()` is a Python generator, so you must call `list()` to get
a list.

```python
>>> list(fst.generate('eat+V+Past'))
['ate']
```
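
Note that `analyze()` yields tuples while `generate()` takes a single string. To feed an analysis back into `generate()`, the tuple has to be joined first. This sketch assumes the string form is simply the tuple's elements concatenated, which matches the examples in this README; `analysis_to_string` is a hypothetical helper.

```python
from typing import Iterable


def analysis_to_string(analysis: Iterable[str]) -> str:
    """Concatenate the lemma and tags of an analysis into one string."""
    return "".join(analysis)


print(analysis_to_string(("eat", "+V", "+3P", "+Sg")))  # eat+V+3P+Sg
```

With a loaded FST, `list(fst.generate(analysis_to_string(analysis)))` would then round-trip an analysis back to its word forms.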


Contributing
------------

If you plan to contribute code, it is recommended you use [Poetry].
Fork and clone this repository, then install development dependencies
by typing:

    poetry install

Then, do all your development within a virtual environment, managed by
Poetry:

    poetry shell

### Type-checking

This project uses `mypy` to check static types. To invoke it on this
package, type the following:

    mypy -p fst_lookup

### Running tests

To run this project's tests, we use `pytest`:

    poetry run pytest

### C Extension

Building the C extension is handled in `build.py`.

To disable building the C extension, add the following line to `.env`:

```sh
export FST_LOOKUP_BUILD_EXT=False
```

(by default, this is `True`).

To enable debugging flags while working on the C extension, add the
following line to `.env`:

```sh
export FST_LOOKUP_DEBUG=TRUE
```

(by default, this is `False`).
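
The two examples above spell their values differently (`False` and `TRUE`). This README does not say how `build.py` parses them, so the snippet below is not the project's actual code. It is a sketch of one case-insensitive way a build script could read such flags; the helper name `env_flag` and the accepted spellings are assumptions.

```python
import os
from typing import Mapping, Optional

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def env_flag(
    name: str, default: bool, environ: Optional[Mapping[str, str]] = None
) -> bool:
    """Read a boolean flag from the environment, ignoring case."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name} has an unrecognized boolean value: {raw!r}")


# Defaults mirror the ones stated above.
build_ext = env_flag("FST_LOOKUP_BUILD_EXT", default=True)
debug = env_flag("FST_LOOKUP_DEBUG", default=False)
```

Rejecting unknown spellings instead of silently treating them as false makes a typo in `.env` fail loudly.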


### Fixtures

If you are creating or modifying existing test fixtures (i.e., mostly
pre-built FSTs used for testing), you will need the following
dependencies:

 * GNU `make`
 * [Foma][]

Fixtures are stored in `tests/data/`. Here, you will use `make` to
compile all pre-built FSTs from source:

    make

[Poetry]: https://github.com/python-poetry/poetry#poetry-dependency-management-for-python


License
-------

Copyright © 2019–2021 National Research Council Canada.

Licensed under the MIT license.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/eddieantonio/fst-lookup",
    "name": "fst-lookup",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.8",
    "maintainer_email": null,
    "keywords": "fst, lookup, transducer, morphology, foma",
    "author": "Eddie Antonio Santos",
    "author_email": "Eddie.Santos@nrc-cnrc.gc.ca",
    "download_url": "https://files.pythonhosted.org/packages/ba/69/32254dd69be5fa111e2323433c22066478fae3d4e6b347fa19e660355474/fst_lookup-2024.7.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Lookup Foma FSTs",
    "version": "2024.7.3",
    "project_urls": {
        "Bug Tracker": "https://github.com/eddieantonio/fst-lookup/issues",
        "Homepage": "https://github.com/eddieantonio/fst-lookup"
    },
    "split_keywords": [
        "fst",
        " lookup",
        " transducer",
        " morphology",
        " foma"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c7e98e31f377ab3398b134208d98857219c7a0d83ca1b45ad3aadb9661d0e3b1",
                "md5": "1112bbc69e6c0b8362507e8b7f411c55",
                "sha256": "6324a0cb6f45a79251a54da277a60c07915cf66c1dbec41d7da7645fa99f5d4d"
            },
            "downloads": -1,
            "filename": "fst_lookup-2024.7.3-cp38-cp38-macosx_14_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "1112bbc69e6c0b8362507e8b7f411c55",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": "<4.0,>=3.8",
            "size": 18425,
            "upload_time": "2024-07-03T13:49:21",
            "upload_time_iso_8601": "2024-07-03T13:49:21.224411Z",
            "url": "https://files.pythonhosted.org/packages/c7/e9/8e31f377ab3398b134208d98857219c7a0d83ca1b45ad3aadb9661d0e3b1/fst_lookup-2024.7.3-cp38-cp38-macosx_14_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ba6932254dd69be5fa111e2323433c22066478fae3d4e6b347fa19e660355474",
                "md5": "81d7a466bc67de54d84edf42d0878464",
                "sha256": "fe5907921c868c4872985ac9644babb11177d4ce01090265d2f45172e5ee4701"
            },
            "downloads": -1,
            "filename": "fst_lookup-2024.7.3.tar.gz",
            "has_sig": false,
            "md5_digest": "81d7a466bc67de54d84edf42d0878464",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.8",
            "size": 17617,
            "upload_time": "2024-07-03T13:49:22",
            "upload_time_iso_8601": "2024-07-03T13:49:22.879870Z",
            "url": "https://files.pythonhosted.org/packages/ba/69/32254dd69be5fa111e2323433c22066478fae3d4e6b347fa19e660355474/fst_lookup-2024.7.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-07-03 13:49:22",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "eddieantonio",
    "github_project": "fst-lookup",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "fst-lookup"
}
        