pypykko

* Name: pypykko
* Version: 0.3.0
* Home page: https://github.com/fergusq/fst-python
* Summary: A pure-python wrapper for the pykko Finnish morphological analyser and inflector
* Upload time: 2025-10-07 10:20:13
* Author: Théo Salmenkivi-Friberg
* Requires Python: >=3.8
* License: MIT
* Keywords: finnish nlp morphology

# PyPykko

PyPykko is a wrapper around [pykko](https://github.com/pkauppin/pykko). It provides the basic analysis and generation API in an easily installable package.
PyPykko can be installed without compiling anything (as the transducers are pre-compiled) or pulling in any native dependencies (such as hfst).

This package contains versions of all the files in pykko's `tools` directory, slightly modified for kfst compatibility, as well as `constants.py` and `file_tools.py` from the `scripts` directory and `utils.py` from the `scripts` directory (renamed to `scriptutils.py`). It also provides two new modules, `reinflect.py` and `extras.py`. Unlike upstream Pykko at the time of writing, the function `utils.analyze` returns its analyses as `NamedTuple`s (`PykkoAnalysis`) rather than plain, unnamed tuples.

## Installation

PyPykko is available on PyPI and can be installed with pip:

```sh
pip install pypykko
```

## Usage

There are two main Python functions inherited from Pykko proper: `utils.analyze` and `generate.generate_wordform`. Besides these, `reinflect.reinflect` is perhaps a more suitable interface for general reinflection, and `extras.analyze_with_compound_parts` adds bolted-on alignment support.

### reinflect.reinflect

`reinflect.reinflect` tries to reinflect a word as well as it can. The target form can be specified either with a model word (`model`) or with explicit morphological tags (`new_form`). Optionally, it can also be given the form the original word is in (`orig_form`), if known ahead of time, and the word's part of speech (`pos`); as the examples below show, both can change the result.

```py
>>> from pypykko.reinflect import reinflect
>>> reinflect("mökkiammeemme", model="talossa")
{'mökkiammeessa'}
>>> reinflect("esijuosta", model="katselemme")
{'esijuoksemme'}
>>> reinflect("mökkiammeemme", new_form="+sg+nom")
{'mökkiamme'}
>>> reinflect("möhkö", new_form="+pl+ine+ko")
{'möhköissäkö'}
>>> reinflect("viinissä", model="talot")
{'viinet'}
>>> reinflect("viinissä", model="talot", orig_form="+sg+ine")
{'viinit'}
>>> reinflect("hömppäämme", model="juokset", pos="verb")
{'hömppäät'}
>>> reinflect("hömppäämme", model="juokset", pos="noun")
{'hömpät'}
```
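`reinflect` returns a set of candidate forms. A minimal sketch of a batch helper that reinflects several words with the same options and keeps the original word when nothing comes back. `reinflect_or_keep` is a hypothetical name, treating an empty set as failure is an assumption (the README does not say what `reinflect` returns when it cannot reinflect a word), and the reinflection function is passed in so the sketch can be tried without the transducers.

```python
from typing import Callable, Iterable, List, Set


def reinflect_or_keep(
    words: Iterable[str],
    reinflect_fn: Callable[..., Set[str]],
    **kwargs: str,
) -> List[str]:
    """Reinflect each word, keeping the original when no candidate is produced.

    Hypothetical helper: `reinflect_fn` is expected to behave like
    pypykko.reinflect.reinflect (word first, options such as model=, new_form=,
    orig_form= or pos= as keyword arguments, a set of strings returned).
    Treating an empty set as "could not reinflect" is an assumption.
    """
    results = []
    for word in words:
        candidates = reinflect_fn(word, **kwargs)
        # Sort so the choice is deterministic when several candidates come back.
        results.append(sorted(candidates)[0] if candidates else word)
    return results
```

With the package installed, it would be called as `reinflect_or_keep(["mökkiammeemme", "möhkö"], reinflect, model="talossa")` after `from pypykko.reinflect import reinflect`. Picking the alphabetically first candidate is an arbitrary tie-breaker; when the alternatives matter, keep the whole set instead.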


### utils.analyze and extras.analyze\_with\_compound\_parts

`utils.analyze` should be used in most cases:

```py
>>> from pypykko.utils import analyze
>>> analyze("hätkähtäneet")
[PykkoAnalysis(wordform='hätkähtäneet', source='Lexicon', lemma='hätkähtää', pos='verb', homonym='', info='', morphtags='+past+conneg+pl', weight=0.0),
 PykkoAnalysis(wordform='hätkähtäneet', source='Lexicon', lemma='hätkähtää', pos='verb', homonym='', info='', morphtags='+part_past+pl+nom', weight=0.0),
 PykkoAnalysis(wordform='hätkähtäneet', source='Lexicon', lemma='hätkähtänyt', pos='participle', homonym='', info=' ← verb:hätkähtää:+part_past', morphtags='+pl+nom', weight=0.0)]
```
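Because each analysis is a `NamedTuple`, its fields can be read by name. A minimal sketch that collects the distinct (lemma, part of speech) pairs from a list of analyses, e.g. to lemmatize a token. `lemma_pos_pairs` is a hypothetical helper, and `PykkoAnalysis` is redefined locally as a stand-in with the field names printed above so that the snippet runs without the package; with pypykko installed you would pass the output of `analyze` directly.

```python
from typing import Dict, List, NamedTuple, Tuple


class PykkoAnalysis(NamedTuple):
    """Local stand-in mirroring the fields printed by utils.analyze (an assumption)."""
    wordform: str
    source: str
    lemma: str
    pos: str
    homonym: str
    info: str
    morphtags: str
    weight: float


def lemma_pos_pairs(analyses: List[PykkoAnalysis]) -> List[Tuple[str, str]]:
    """Return the distinct (lemma, pos) pairs in the order they first appear."""
    seen: Dict[Tuple[str, str], None] = {}
    for a in analyses:
        # dicts preserve insertion order, so this de-duplicates while keeping order
        seen.setdefault((a.lemma, a.pos), None)
    return list(seen)


# The hätkähtäneet analyses from the example above, rebuilt with the stand-in type.
example = [
    PykkoAnalysis("hätkähtäneet", "Lexicon", "hätkähtää", "verb", "", "",
                  "+past+conneg+pl", 0.0),
    PykkoAnalysis("hätkähtäneet", "Lexicon", "hätkähtää", "verb", "", "",
                  "+part_past+pl+nom", 0.0),
    PykkoAnalysis("hätkähtäneet", "Lexicon", "hätkähtänyt", "participle", "",
                  " ← verb:hätkähtää:+part_past", "+pl+nom", 0.0),
]
print(lemma_pos_pairs(example))
# [('hätkähtää', 'verb'), ('hätkähtänyt', 'participle')]
```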

The fields of each returned `PykkoAnalysis` tuple are:

* `wordform`: The surface form, exactly as given in the input.
* `source`: The source of the word, e.g. `Lexicon` for words known ahead of time, `Guesser|Any` for unknown words, and `Lexicon|Pfx` for words analyzed as compounds of known words.
* `lemma`: The lemma form of the word. Notably, it can contain pipe symbols delimiting compound parts, e.g. `ilma|luukku`. Finnish also has compounds whose parts are inflected separately, so each part gets its own lemma (e.g. `uudenvuoden` -> `uusi|vuosi`).
* `pos`: The part of speech of the word.
* `homonym`: The homonym number of the word (can be empty). For example, the word viini has two senses with slightly different inflection: "wine" (genitive viinin) and "quiver" (genitive viinen). When such homonyms exist but the form itself does not tell them apart (here, the nominative viini), both interpretations are returned:
  ```py
  >>> analyze("viini")
  [PykkoAnalysis(wordform='viini', source='Lexicon', lemma='viini', pos='noun', homonym='1', info='', morphtags='+sg+nom', weight=0.0),
   PykkoAnalysis(wordform='viini', source='Lexicon', lemma='viini', pos='noun', homonym='2', info='', morphtags='+sg+nom', weight=0.0)]
  ```
  When the form is unambiguous (e.g. viinen), only the relevant homonym number is returned:
  ```py
  >>> analyze("viinen")
  [PykkoAnalysis(wordform='viinen', source='Lexicon', lemma='viini', pos='noun', homonym='2', info='', morphtags='+sg+gen', weight=0.0)]
  ```
  When different interpretations of the same form belong to different homonyms, each analysis is annotated with its own homonym number:
  ```py
  >>> analyze("viinin")
  [PykkoAnalysis(wordform='viinin', source='Lexicon', lemma='viini', pos='noun', homonym='2', info='', morphtags='+pl+ins', weight=0.0),
   PykkoAnalysis(wordform='viinin', source='Lexicon', lemma='viini', pos='noun', homonym='1', info='', morphtags='+sg+gen', weight=0.0)]
  ```
* `info`: Either a register annotation or information about a derivation. For example, `⟨coll⟩` marks a colloquial word, and `← verb:kulkea:+part_past` records that the participle kulkenut is derived from the verb kulkea:
  ```py
  >>> analyze("höpsöillä")
  [PykkoAnalysis(wordform='höpsöillä', source='Lexicon', lemma='höpsö', pos='noun', homonym='', info='⟨coll⟩', morphtags='+pl+ade', weight=0.0),
   PykkoAnalysis(wordform='höpsöillä', source='Lexicon', lemma='höpsö', pos='adjective', homonym='', info='⟨coll⟩', morphtags='+pl+ade', weight=0.0)]
  >>> analyze("kulkenut")
  [PykkoAnalysis(wordform='kulkenut', source='Lexicon', lemma='kulkea', pos='verb', homonym='', info='', morphtags='+past+conneg+sg', weight=0.0),
   PykkoAnalysis(wordform='kulkenut', source='Lexicon', lemma='kulkea', pos='verb', homonym='', info='', morphtags='+part_past+sg+nom', weight=0.0),
   PykkoAnalysis(wordform='kulkenut', source='Lexicon', lemma='kulkenut', pos='participle', homonym='', info=' ← verb:kulkea:+part_past', morphtags='+sg+nom', weight=0.0)]
  ```
* `morphtags`: The morphological tags naming the inflectional form, e.g. `+pl+ade` (plural adessive).
* `weight`: The weight the FST assigns to this analysis. Generally, lower weights indicate more probable analyses.
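Since lower weights are generally more probable, a common post-processing step is to keep only the lowest-weight analyses and to split the string fields into their parts. A minimal sketch, assuming tags are always `+`-prefixed and compound boundaries are always marked with `|`, as in the examples above. `most_probable`, `split_tags`, and `compound_parts` are hypothetical helpers, and `PykkoAnalysis` is a local stand-in with the same fields as the package's type so the snippet runs on its own.

```python
from typing import List, NamedTuple


class PykkoAnalysis(NamedTuple):
    """Local stand-in with the same fields as pypykko's analyses (an assumption)."""
    wordform: str
    source: str
    lemma: str
    pos: str
    homonym: str
    info: str
    morphtags: str
    weight: float


def most_probable(analyses: List[PykkoAnalysis]) -> List[PykkoAnalysis]:
    """Keep only the analyses with the lowest weight; ties are all kept."""
    if not analyses:
        return []
    best = min(a.weight for a in analyses)
    return [a for a in analyses if a.weight == best]


def split_tags(morphtags: str) -> List[str]:
    """Split a tag string such as '+pl+ade' into ['pl', 'ade']."""
    return [tag for tag in morphtags.split("+") if tag]


def compound_parts(lemma: str) -> List[str]:
    """Split a compound lemma such as 'ilma|luukku' into ['ilma', 'luukku']."""
    return lemma.split("|")


# The höpsöillä analyses from the example above: both have weight 0.0,
# so neither reading is preferred and both survive the filter.
hopsoilla = [
    PykkoAnalysis("höpsöillä", "Lexicon", "höpsö", "noun", "", "⟨coll⟩", "+pl+ade", 0.0),
    PykkoAnalysis("höpsöillä", "Lexicon", "höpsö", "adjective", "", "⟨coll⟩", "+pl+ade", 0.0),
]
for a in most_probable(hopsoilla):
    print(a.pos, split_tags(a.morphtags))
# noun ['pl', 'ade']
# adjective ['pl', 'ade']
print(compound_parts("ilma|luukku"))
# ['ilma', 'luukku']
```

Keeping ties rather than picking one is deliberate: as the höpsöillä case shows, equal weights mean the FST has no preference, so disambiguation has to come from context.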

`extras.analyze_with_compound_parts` is useful when you need the exact inflected forms of a word's compound parts.
For example, for "isonvarpaan" one might want to know not only that it is a compound of "iso" and "varvas", but also that the parts appear in the forms "ison" and "varpaan".
To support this, `extras.analyze_with_compound_parts` also returns the character ranges of the wordform that match each compound part.

```py
>>> from pypykko.extras import analyze_with_compound_parts
>>> analyze_with_compound_parts("isonvarpaan")
[RangedPykkoAnalysis(wordform='isonvarpaan', source='Lexicon', lemma='iso|varvas', pos='noun', homonym='', info='', morphtags='+sg+gen', weight=0.0, ranges=(range(0, 4), range(4, 11)))]
```
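The ranges index into `wordform`, so slicing recovers the inflected surface form of each part. A minimal sketch that pairs each slice with the corresponding part of the `|`-delimited lemma. `inflected_parts` is a hypothetical helper that takes the three values as plain arguments rather than a `RangedPykkoAnalysis`, so it runs without the package; with pypykko, you would call it as `inflected_parts(a.wordform, a.lemma, a.ranges)` for each analysis `a`. It assumes there is exactly one range per lemma part, in order, as in the example above.

```python
from typing import List, Sequence, Tuple


def inflected_parts(
    wordform: str, lemma: str, ranges: Sequence[range]
) -> List[Tuple[str, str]]:
    """Pair each compound part's surface form with its lemma.

    Assumes `ranges` holds one range per '|'-separated lemma part, in order.
    """
    lemmas = lemma.split("|")
    if len(lemmas) != len(ranges):
        raise ValueError("expected one range per compound part")
    # A step-1 range r covers exactly the characters wordform[r.start:r.stop].
    return [(wordform[r.start:r.stop], part) for r, part in zip(ranges, lemmas)]


print(inflected_parts("isonvarpaan", "iso|varvas", (range(0, 4), range(4, 11))))
# [('ison', 'iso'), ('varpaan', 'varvas')]
```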

### generate.generate\_wordform

`generate_wordform` is a simple API for inflecting words that are in the lexicon. It takes the lemma, the part of speech, and the morphological tags of the target form:

```py
>>> from pypykko.generate import generate_wordform
>>> generate_wordform("höpönassu", "noun", '+pl+abe+ko')
{'höpönassuittako'}
```
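When several forms of the same word are needed, the calls can be collected into a small paradigm table. A minimal sketch; `make_tags` and `paradigm` are hypothetical helpers, the tag-string format is copied from the examples above, and the generator is passed in as a parameter (expected to take lemma, part of speech, and tags positionally and return a set, as `generate_wordform` does above) so the sketch runs without the package.

```python
from typing import Callable, Dict, Iterable, Set


def make_tags(*tags: str) -> str:
    """Build a tag string in the README's format, e.g. '+pl+abe+ko'."""
    return "".join("+" + tag for tag in tags)


def paradigm(
    lemma: str,
    pos: str,
    forms: Iterable[str],
    generate_fn: Callable[[str, str, str], Set[str]],
) -> Dict[str, Set[str]]:
    """Generate several forms of one lexicon word, keyed by tag string."""
    return {tags: generate_fn(lemma, pos, tags) for tags in forms}


print(make_tags("pl", "abe", "ko"))
# +pl+abe+ko
```

With the package installed, `paradigm("höpönassu", "noun", [make_tags("sg", "nom"), make_tags("pl", "abe", "ko")], generate_wordform)` would return one set of forms per requested tag string. Keeping the sets, rather than picking one element, preserves any variant forms the generator produces.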


## License

Like Pykko itself, PyPykko is licensed under the MIT license, as it consists mostly of Pykko's files with minor modifications. See the LICENSE file for details. Note that kfst (and kfst-rs) have less permissive licenses.

Files from Pykko itself are modified from the version in commit 95f3d51f0e94a1e88ab7c750f2bedcb6b3fd5edd. The compiled transducers are from the same commit.
