# 🐶 Murre 🐕
[![Downloads](https://pepy.tech/badge/murre)](https://pepy.tech/project/murre)

The amazing Murre (*genitive Murren* 🐕) will normalize non-standard Finnish (puhekieli) to standard Finnish (kirjakieli).
This repository is maintained by [Mika Hämäläinen](https://mikakalevi.com).
## Installation
This library is designed for Python 3 and may not work on Python 2. Install the package with pip, then run its download module:

    pip3 install murre
    python3 -m murre.download
## Normalize
To normalize Finnish, run:

    from murre import normalize_sentence

    normalize_sentence("mä syön paljo karkkii")
    >> minä syön paljon karkkia
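If you normalize a large corpus in which the same short utterances recur (common in transcribed speech), caching repeated inputs can save work. This is a minimal sketch, assuming `normalize_sentence` returns the same output every time it is given the same input; `cached_normalizer` is a hypothetical helper, not part of Murre. The normalizer is passed in as an argument so the sketch runs without the models installed.

```python
from functools import lru_cache
from typing import Callable


def cached_normalizer(
    normalize: Callable[[str], str], maxsize: int = 10_000
) -> Callable[[str], str]:
    """Wrap a one-argument normalizer so that repeated inputs are computed once.

    Assumes `normalize` always returns the same output for the same input.
    """
    # lru_cache keeps up to `maxsize` recent results keyed by the input string.
    return lru_cache(maxsize=maxsize)(normalize)


# Usage with Murre (requires the package and its downloaded models):
#   from murre import normalize_sentence
#   normalize = cached_normalizer(normalize_sentence)
#   normalize("mä syön paljo karkkii")
```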
You can normalize multiple sentences at the same time by running:

    from murre import normalize_sentences

    sents = ["kissa syö karkkii", "jok laulaa tuol puole", "en tiiä oikee et kuka se o", "kyl on hölömöö"]
    normalize_sentences(sents)
    >> ['kissa syö karkkia', 'joka laulaa tuolla puolen', 'en tiedä oikein että kuka se on', 'kyllä on hölmöä']
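`normalize_sentences` takes a list of sentences, so running text has to be split first. The sketch below uses a naive regular-expression splitter that breaks after `.`, `!` or `?` followed by whitespace; it will mis-split after abbreviations such as "esim.". `split_sentences` and `normalize_text` are hypothetical helpers, not part of Murre, and since the examples above contain no punctuation, how well Murre handles sentence-final punctuation is an assumption worth checking on your own data. The normalizer is passed in so the sketch runs without the models.

```python
import re
from typing import Callable, List

# Split after '.', '!' or '?' when followed by whitespace (a simple heuristic).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split running text into sentences, dropping empty pieces."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def normalize_text(
    text: str, normalize_sentences: Callable[[List[str]], List[str]]
) -> List[str]:
    """Split text into sentences and normalize them in one batch call."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    return normalize_sentences(sentences)


# Usage with Murre (requires the package and its downloaded models):
#   from murre import normalize_sentences
#   normalize_text("mä syön paljo karkkii. kyl on hölömöö", normalize_sentences)
```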
### Historical Finnish
To normalize (and lemmatize) historical Finnish, run:

    from murre import normalize_sentence

    normalize_sentence("paluellen herra caiken", language="fin_hist")
    >> palvella herra kaikki
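In the example above, the output contains one modern lemma per historical word. Assuming that holds for a given sentence, the tokens can be paired by position, for instance to build a glossary of old spellings. `lemma_pairs` is a hypothetical helper, not part of Murre; it returns an empty list when the token counts differ, because the alignment is then unknown. With the output shown above, it would pair `paluellen` with `palvella` and `caiken` with `kaikki`.

```python
from typing import Callable, List, Tuple


def lemma_pairs(sentence: str, normalize: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Pair each historical token with the output token at the same position.

    Assumes one modern lemma per input token; returns [] when the counts differ.
    """
    source = sentence.split()
    target = normalize(sentence).split()
    if len(source) != len(target):
        return []
    return list(zip(source, target))


# Usage with Murre (requires the package and its downloaded models):
#   from functools import partial
#   from murre import normalize_sentence
#   lemma_pairs("paluellen herra caiken", partial(normalize_sentence, language="fin_hist"))
```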
### Swedish
To use the Swedish model, pass `language="swe"`:

    from murre import normalize_sentence

    normalize_sentence("int vet ja", language="swe")
    >> inte vet jag
## Generate
Murre can also generate different dialects. Pass a standard Finnish sentence and the name of the target dialect:

    from murre import dialectalize_sentence

    dialectalize_sentence("kodin takana on koira", "Inkerinsuomalaismurteet")
    >> 'kojin takan on koira'
Or, for multiple sentences:

    from murre import dialectalize_sentences

    sents = ["kissa syö karkkia", "kädellä on perhonen", "kettu juoksee sutta karkuun"]
    dialectalize_sentences(sents, 'Kainuu')
    >> ['kissa syöpi karkkia', 'käellä om perhonej', 'kettu juoksee sutta karkuu']
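To compare how the same sentences come out in several dialects, you can call `dialectalize_sentences` once per dialect, so each dialect costs one batch call rather than one call per sentence. `dialect_variants` is a hypothetical helper, not part of Murre; it takes the dialectalizing function as an argument so the sketch can be tested without the models.

```python
from typing import Callable, Dict, Iterable, List


def dialect_variants(
    sentences: List[str],
    dialects: Iterable[str],
    dialectalize_sentences: Callable[[List[str], str], List[str]],
) -> Dict[str, List[str]]:
    """Return {dialect name: dialectalized sentences} for each requested dialect."""
    return {dialect: dialectalize_sentences(sentences, dialect) for dialect in dialects}


# Usage with Murre (requires the package and its downloaded models):
#   from murre import dialectalize_sentences
#   dialect_variants(["kodin takana on koira"], ["Kainuu", "Inkerinsuomalaismurteet"], dialectalize_sentences)
```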
The list of available dialects can be obtained with:

    from murre import supported_dialects

    supported_dialects()
    >> ['Pohjois-Satakunta', 'Keski-Karjala', 'Kainuu', 'Etelä-Pohjanmaa', 'Etelä-Satakunta', 'Pohjois-Savo', 'Pohjois-Karjala', 'Keski-Pohjanmaa', 'Kaakkois-Häme', 'PohjoinenKeski-Suomi', 'Pohjois-Pohjanmaa', 'PohjoinenVarsinais-Suomi', 'Etelä-Karjala', 'Länsi-Uusimaa', 'Inkerinsuomalaismurteet', 'LäntinenKeski-Suomi', 'Länsi-Satakunta', 'Etelä-Savo', 'Länsipohja', 'Pohjois-Häme', 'EteläinenKeski-Suomi', 'Etelä-Häme', 'Peräpohjola']
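Several dialect names contain ä or ö, and the same visible name can be stored either with precomposed letters or with a base letter plus a combining diaeresis (the PyPI copy of this README writes `Etelä-Häme` in the combining form). In Python the two forms do not compare equal. A minimal sketch, assuming the dialect argument must match an entry of `supported_dialects()` exactly: the hypothetical helper `resolve_dialect` compares names after NFC normalization and case-folding, returns the library's exact spelling, and otherwise raises an error listing close matches found with `difflib`.

```python
import difflib
import unicodedata
from typing import Iterable


def _key(name: str) -> str:
    # Compose characters (NFC) and case-fold, so "etelä-häme" typed with a
    # precomposed "ä" matches "Etelä-Häme" stored with a combining diaeresis.
    return unicodedata.normalize("NFC", name).casefold()


def resolve_dialect(name: str, dialects: Iterable[str]) -> str:
    """Return the entry of `dialects` matching `name`, ignoring case and Unicode form."""
    by_key = {_key(d): d for d in dialects}
    wanted = _key(name)
    if wanted in by_key:
        return by_key[wanted]
    suggestions = difflib.get_close_matches(wanted, list(by_key), n=3)
    hint = ", ".join(by_key[s] for s in suggestions) or "none"
    raise ValueError(f"Unknown dialect {name!r}; close matches: {hint}")


# Usage with Murre (requires the package and its downloaded models):
#   from murre import dialectalize_sentence, supported_dialects
#   dialect = resolve_dialect("etelä-häme", supported_dialects())
#   dialectalize_sentence("kodin takana on koira", dialect)
```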
## Cite
**Normalization (Finnish)**

Niko Partanen, Mika Hämäläinen, and Khalid Alnajjar. (2019). [Dialect Text Normalization to Normative Standard Finnish](https://www.aclweb.org/anthology/D19-5519/). In *Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT)*.

**Normalization (Swedish)**

Mika Hämäläinen, Niko Partanen, and Khalid Alnajjar. (2020). [Normalization of Different Swedish Dialects Spoken in Finland](https://www.researchgate.net/publication/346933795_Normalization_of_Different_Swedish_Dialects_Spoken_in_Finland). In *Proceedings of the 4th ACM SIGSPATIAL Workshop on Geospatial Humanities*.

**Dialect generation**

Mika Hämäläinen, Niko Partanen, Khalid Alnajjar, Jack Rueter, and Thierry Poibeau. (2020). [Automatic Dialect Adaptation in Finnish and its Effect on Perceived Creativity](https://www.researchgate.net/publication/344157810_Automatic_Dialect_Adaptation_in_Finnish_and_its_Effect_on_Perceived_Creativity). In *Proceedings of the 11th International Conference on Computational Creativity*, pp. 204-211.

**Historical Finnish**

Mika Hämäläinen, Niko Partanen, and Khalid Alnajjar. (2021). [Lemmatization of Historical Old Literary Finnish Texts in Modern Orthography](https://www.researchgate.net/publication/352837692_Lemmatization_of_Historical_Old_Literary_Finnish_Texts_in_Modern_Orthography). In *Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN)*.
## Data
The data used in the paper describing dialect generation has been published on Zenodo [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3885341.svg)](https://doi.org/10.5281/zenodo.3885341).