murre


| Field | Value |
| --- | --- |
| Name | murre |
| Version | 1.4.1 |
| Home page | https://github.com/mikahama/murre |
| Summary | The amazing Murre will normalize non-standard Finnish and Swedish, and dialectalize standard Finnish! |
| Upload time | 2024-08-10 15:39:11 |
| Maintainer | None |
| Docs URL | None |
| Author | Mika Hämäläinen |
| Requires Python | None |
| License | Apache 2.0 |
| Keywords | spoken finnish, spelling normalization |
| Requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| Coveralls test coverage | No coveralls. |

# 🐶 Murre 🐕

[![Downloads](https://pepy.tech/badge/murre)](https://pepy.tech/project/murre)


The amazing Murre (*genitive Murren* 🐕) will normalize non-standard Finnish (puhekieli) to standard Finnish (kirjakieli). 
This repository is maintained by [Mika Hämäläinen](https://mikakalevi.com).

## Installation

This library is designed for Python 3 and may not work on Python 2. Install the package with pip, then run its `murre.download` module:

    pip3 install murre
    python3 -m murre.download
    
## Normalize

To normalize a Finnish sentence, call `normalize_sentence`:

    from murre import normalize_sentence
    
    normalize_sentence("mä syön paljo karkkii")
    >> minä syön paljon karkkia

To normalize several sentences in one call, use `normalize_sentences`:

    from murre import normalize_sentences
    
    sents = ["kissa syö karkkii", "jok laulaa tuol puole", "en tiiä oikee et kuka se o", "kyl on hölömöö"]
    normalize_sentences(sents)
    >> ['kissa syö karkkia', 'joka laulaa tuolla puolen', 'en tiedä oikein että kuka se on', 'kyllä on hölmöä']
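
For longer inputs it can be convenient to feed `normalize_sentences` in fixed-size chunks, for example to report progress between calls. The following is a minimal sketch and not part of Murre: `chunked`, `normalize_in_batches`, and the `batch_size` parameter are hypothetical names. The normalizer is passed in as a callable, so the helper only assumes what the example above shows: a list of sentences goes in and a list of the same length comes out.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def normalize_in_batches(
    sentences: Sequence[str],
    normalize_fn: Callable[[List[str]], List[str]],
    batch_size: int = 32,
) -> List[str]:
    """Hypothetical helper: apply `normalize_fn` batch by batch, keeping order."""
    results: List[str] = []
    for batch in chunked(sentences, batch_size):
        normalized = normalize_fn(list(batch))
        # Assumption: the normalizer returns one output sentence per input sentence.
        if len(normalized) != len(batch):
            raise ValueError("normalizer returned a different number of sentences")
        results.extend(normalized)
    return results
```

With Murre installed, this would be called as `normalize_in_batches(sents, normalize_sentences, batch_size=2)`.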

### Historical Finnish

To normalize (and lemmatize) historical Finnish, run:

    from murre import normalize_sentence
    
    normalize_sentence("paluellen herra caiken", language="fin_hist")
    >> palvella herra kaikki
  
### Swedish

To use the Swedish model, pass `language="swe"`:

    from murre import normalize_sentence
    
    normalize_sentence("int vet ja", language="swe")
    >> inte vet jag

## Generate

Murre can also generate dialectal Finnish from standard Finnish. Pass the sentence and the name of the target dialect:

    from murre import dialectalize_sentence
    dialectalize_sentence("kodin takana on koira", "Inkerinsuomalaismurteet")
    >> 'kojin takan on koira'

Or for multiple sentences:

    from murre import dialectalize_sentences
    sents = ["kissa syö karkkia", "kädellä on perhonen", "kettu juoksee sutta karkuun"]
    dialectalize_sentences(sents,'Kainuu')
    >> ['kissa syöpi karkkia', 'käellä om perhonej', 'kettu juoksee sutta karkuu']


To list the available dialects, call `supported_dialects`:

    from murre import supported_dialects
    supported_dialects()
    >> ['Pohjois-Satakunta', 'Keski-Karjala', 'Kainuu', 'Etelä-Pohjanmaa', 'Etelä-Satakunta', 'Pohjois-Savo', 'Pohjois-Karjala', 'Keski-Pohjanmaa', 'Kaakkois-Häme', 'PohjoinenKeski-Suomi', 'Pohjois-Pohjanmaa', 'PohjoinenVarsinais-Suomi', 'Etelä-Karjala', 'Länsi-Uusimaa', 'Inkerinsuomalaismurteet', 'LäntinenKeski-Suomi', 'Länsi-Satakunta', 'Etelä-Savo', 'Länsipohja', 'Pohjois-Häme', 'EteläinenKeski-Suomi', 'Etelä-Häme', 'Peräpohjola']
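
Dialect names such as `PohjoinenKeski-Suomi` are easy to mistype, so it may help to check a name against the list returned by `supported_dialects()` before calling `dialectalize_sentence`. The sketch below shows one way to do this with the standard library. `resolve_dialect` is a hypothetical helper, not part of Murre. It takes the list of names as an argument so that it runs on its own, and it treats names that differ only in letter case or Unicode composition as equal.

```python
import difflib
import unicodedata
from typing import Iterable


def _key(name: str) -> str:
    """Comparison key: NFC-normalized and case-folded."""
    return unicodedata.normalize("NFC", name).casefold()


def resolve_dialect(name: str, available: Iterable[str]) -> str:
    """Return the matching entry of `available`, or raise with close suggestions."""
    names = list(available)
    by_key = {_key(n): n for n in names}
    match = by_key.get(_key(name))
    if match is not None:
        # Return the spelling used in the list, not the caller's spelling.
        return match
    suggestions = difflib.get_close_matches(name, names, n=3, cutoff=0.6)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise ValueError(f"Unknown dialect {name!r}.{hint}")
```

With Murre installed, a call could look like `dialectalize_sentence("kodin takana on koira", resolve_dialect("kainuu", supported_dialects()))`.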


## Cite

**Normalization (Finnish)**

Niko Partanen, Mika Hämäläinen, and Khalid Alnajjar. (2019). [Dialect Text Normalization to Normative Standard Finnish](https://www.aclweb.org/anthology/D19-5519/). In *the Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT)*.


**Normalization (Swedish)**

Mika Hämäläinen, Niko Partanen and Khalid Alnajjar. (2020). [Normalization of Different Swedish Dialects Spoken in Finland](https://www.researchgate.net/publication/346933795_Normalization_of_Different_Swedish_Dialects_Spoken_in_Finland). In *the Proceedings of the 4th ACM SIGSPATIAL Workshop on Geospatial Humanities*.

**Dialect generation**

Mika Hämäläinen, Niko Partanen, Khalid Alnajjar, Jack Rueter, and Thierry Poibeau. (2020). [Automatic Dialect Adaptation in Finnish and its Effect on Perceived Creativity](https://www.researchgate.net/publication/344157810_Automatic_Dialect_Adaptation_in_Finnish_and_its_Effect_on_Perceived_Creativity). In *Proceedings of the 11th International Conference on Computational Creativity*, pp. 204-211.

**Historical Finnish**

Mika Hämäläinen, Niko Partanen and Khalid Alnajjar. (2021). [Lemmatization of Historical Old Literary Finnish Texts in Modern Orthography](https://www.researchgate.net/publication/352837692_Lemmatization_of_Historical_Old_Literary_Finnish_Texts_in_Modern_Orthography). In *Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN)*.



## Data

The data used in the paper describing dialect generation has been published on Zenodo [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3885341.svg)](https://doi.org/10.5281/zenodo.3885341).

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/mikahama/murre",
    "name": "murre",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "Spoken Finnish, spelling normalization",
    "author": "Mika H\u00e4m\u00e4l\u00e4inen",
    "author_email": "mika@flyforpoints.com",
    "download_url": "https://files.pythonhosted.org/packages/93/ac/07548ded65a591c175072bed592e955f3d66bed4df4322ee504a4d19ad80/murre-1.4.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache 2.0",
    "summary": "The amazing Murre will normalize non-standard Finnish and Swedish, and dialectalize standard Finnish!",
    "version": "1.4.1",
    "project_urls": {
        "Bug Reports": "https://github.com/mikahama/murre/issues",
        "Developer": "https://mikakalevi.com/",
        "Homepage": "https://github.com/mikahama/murre"
    },
    "split_keywords": [
        "spoken finnish",
        " spelling normalization"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "09b4e232c1915b000ee69f9fb17b02272deeeeade02d4b62a89733abb43eabec",
                "md5": "e40d60015b5c01bd18fd2aa027597c2b",
                "sha256": "15fc0a786f0f5f01b28da95a94bbab479b3e0379ae243f0000a9cac7c2a9b527"
            },
            "downloads": -1,
            "filename": "murre-1.4.1-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e40d60015b5c01bd18fd2aa027597c2b",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": null,
            "size": 10320,
            "upload_time": "2024-08-10T15:39:10",
            "upload_time_iso_8601": "2024-08-10T15:39:10.121724Z",
            "url": "https://files.pythonhosted.org/packages/09/b4/e232c1915b000ee69f9fb17b02272deeeeade02d4b62a89733abb43eabec/murre-1.4.1-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "93ac07548ded65a591c175072bed592e955f3d66bed4df4322ee504a4d19ad80",
                "md5": "66318798de617eaa119b0a9cb517d0cb",
                "sha256": "d0b4c3041522e3f7abcdae9513a28f130e7f30eaac2f48f984783759c17c05cb"
            },
            "downloads": -1,
            "filename": "murre-1.4.1.tar.gz",
            "has_sig": false,
            "md5_digest": "66318798de617eaa119b0a9cb517d0cb",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 10930,
            "upload_time": "2024-08-10T15:39:11",
            "upload_time_iso_8601": "2024-08-10T15:39:11.646125Z",
            "url": "https://files.pythonhosted.org/packages/93/ac/07548ded65a591c175072bed592e955f3d66bed4df4322ee504a4d19ad80/murre-1.4.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-10 15:39:11",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mikahama",
    "github_project": "murre",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "murre"
}
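
Each entry under `urls` carries a `sha256` digest, which can be compared against a downloaded file. The following is a minimal standard-library sketch. `sha256_of` and `matches_digest` are hypothetical helper names, and the path is whatever location the archive was saved to.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA-256 with an expected hex string, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source archive, the expected value would be the `sha256` string listed for `murre-1.4.1.tar.gz` above.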
        