# betterletter
In a given text, replaces alternative spellings of native characters with their proper spellings[^1]:
![demo](docs/images/demo.gif)
## Installation
```shell
pip install betterletter
```
## Usage
The package [installs a command-line script of the same name](https://python-poetry.org/docs/pyproject/#scripts), so instead of running `python -m betterletter`, you can invoke `betterletter` directly, provided Python's script directory is on your `$PATH`:
```bash
$ betterletter -h
usage: betterletter [-h] [-c] [-f] [-r] [-g] [-d] [--debug] {de}

Tool to replace alternative spellings of native characters (e.g. German
umlauts [ä, ö, ü] etc. [ß]) with the proper native characters. For example,
this problem occurs when no proper keyboard layout was available. This program
is dictionary-based to check if replacements are valid words. By default,
reads from STDIN and writes to STDOUT.

positional arguments:
  {de}             Text language to work with, in ISO 639-1 format.

options:
  -h, --help       show this help message and exit
  -c, --clipboard  Read from and write back to clipboard instead of
                   STDIN/STDOUT.
  -f, --force      Force substitutions and return the text version with the
                   maximum number of substitutions, even if they are illegal
                   words (useful for names).
  -r, --reverse    Reverse mode, where all native characters are simply
                   replaced by their alternative spellings.
  -g, --gui        Stop and open a GUI prompt for confirmation before
                   finishing.
  -d, --diff       Print a diff view of the substitutions to stderr.
  --debug          Output detailed logging information.
```
### Usage Examples
Normal usage:
```bash
$ echo 'Hoeflich fragen waere angebracht!' | betterletter de
Höflich fragen wäre angebracht!
```
Reverse it:
```bash
$ echo 'Höflich fragen wäre angebracht!' | betterletter --reverse de
Hoeflich fragen waere angebracht!
```
Print a diff view to confirm correctness, which is especially useful for longer texts.
The [diff](https://docs.python.org/3/library/difflib.html) is written to STDERR, so it won't interfere with further redirection of STDOUT.
```bash
$ echo 'Hoeflich fragen waere angebracht!' | betterletter --diff de 2> diff.txt
Höflich fragen wäre angebracht!
$ cat diff.txt
- Hoeflich fragen waere angebracht!
?  ^^              ^^
+ Höflich fragen wäre angebracht!
?  ^              ^
```
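
A diff of this shape can be produced with `difflib.ndiff` from the standard library. The following is a minimal sketch of the idea, not the tool's actual implementation; the helper name `diff_lines` is made up for illustration.

```python
import difflib
import sys


def diff_lines(before: str, after: str) -> list[str]:
    """Return an ndiff-style, line-by-line comparison of two texts."""
    return list(
        difflib.ndiff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
        )
    )


if __name__ == "__main__":
    original = "Hoeflich fragen waere angebracht!\n"
    processed = "Höflich fragen wäre angebracht!\n"
    # The diff goes to stderr and the processed text to stdout, so that
    # `2> diff.txt` captures only the diff.
    sys.stderr.writelines(diff_lines(original, processed))
    sys.stdout.write(processed)
```

Keeping the two streams separate is what allows the processed text to be piped onwards while the diff is inspected on its own.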
The tool may be coerced into working with names:
```bash
$ # A name won't be in the dictionary:
$ echo 'Sehr geehrte Frau Huebenstetter, ...' | betterletter de
Sehr geehrte Frau Huebenstetter, ...
$ # But we can force it to work:
$ echo 'Sehr geehrte Frau Huebenstetter, ...' | betterletter --force de
Sehr geehrte Frau Hübenstetter, ...
```
[Clipboard-based](https://pypi.org/project/pyperclip/) workflows are also possible:
```bash
# Nothing happens: clipboard is read and written to silently.
# Paste the processed version from your clipboard.
$ betterletter --clipboard de
```
## Background
Taking German as an example, the native characters and their corresponding alternative spellings (used e.g. when no proper keyboard layout is at hand, or when text is restricted to ASCII) are:
| Native Character | Alternative Spelling |
| :--------------: | :------------------: |
| Ä/ä | Ae/ae |
| Ö/ö | Oe/oe |
| Ü/ü | Ue/ue |
| ẞ/ß | SS/ss |
These pairings are recorded [here](https://github.com/alexpovel/betterletter/blob/master/betterletter/resources/languages.json).
Going from left to right is simple: replace all native characters with their alternative spellings, minding case.
This tool supports that use case through the `--reverse` flag.
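
Because each native character maps to exactly one alternative spelling, this direction needs no dictionary. A minimal sketch, assuming the pairings from the table above are hard-coded (the tool itself reads them from `languages.json`, whose layout is not reproduced here; `to_alternative` is a hypothetical name):

```python
# Pairings copied from the table above; hard-coded for illustration only.
NATIVE_TO_ALTERNATIVE = {
    "Ä": "Ae", "ä": "ae",
    "Ö": "Oe", "ö": "oe",
    "Ü": "Ue", "ü": "ue",
    "ẞ": "SS", "ß": "ss",
}

# str.maketrans accepts single-character keys mapped to replacement strings.
_TABLE = str.maketrans(NATIVE_TO_ALTERNATIVE)


def to_alternative(text: str) -> str:
    """Replace every native character with its alternative spelling."""
    return text.translate(_TABLE)


print(to_alternative("Höflich fragen wäre angebracht!"))
# Hoeflich fragen waere angebracht!
```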
The other direction is much less straightforward: there exist countless words for which alternative spellings occur somewhere as a pattern, yet replacing them with the corresponding native character would be wrong:
| Character | Correct Spelling | Wrong Spelling |
| --------- | ----------------- | -------------- |
| *Ä* | **Ae**rodynamik | Ärodynamik |
| *Ä* | Isr**ae**l | Isräl |
| *Ä* | Schuf**ae**intrag | Schufäintrag |
| *Ö* | K**oe**ffizient | Köffizient |
| *Ö* | Domin**oe**ffekt | Dominöffekt |
| *Ö* | P**oet** | Pöt |
| *Ü* | Abente**ue**r | Abenteür |
| *Ü* | Ma**ue**r | Maür |
| *Ü* | Ste**ue**rung | Steürung |
| *ß* | Me**ss**gerät | Meßgerät |
| *ß* | Me**ss**e | Meße |
| *ß* | Abschlu**ss** | Abschluß |
These are just a few, fairly common, examples.
For this reason, the tool relies on a dictionary lookup; the word lists live in [this directory](https://github.com/alexpovel/betterletter/blob/master/betterletter/resources/dicts/).
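
The idea behind a dictionary-based approach can be sketched as follows: generate every spelling of a word that results from substituting some of its alternative patterns, and accept one only if it is a known word. This is an illustration under simplifying assumptions, not the tool's actual algorithm: `KNOWN_WORDS` is a tiny made-up word list, all function names are hypothetical, and only a leading capital letter is restored.

```python
import itertools
import re

ALTERNATIVE_TO_NATIVE = {"ae": "ä", "oe": "ö", "ue": "ü", "ss": "ß"}
_ALTERNATIVE = re.compile("|".join(ALTERNATIVE_TO_NATIVE))

# Hypothetical stand-in for the real word lists, which are far larger.
KNOWN_WORDS = frozenset({"höflich", "wäre", "mauer", "koeffizient", "messe"})


def candidates(word: str):
    """Yield each variant of a lowercase word with at least one substitution."""
    matches = list(_ALTERNATIVE.finditer(word))
    for choice in itertools.product((False, True), repeat=len(matches)):
        if not any(choice):
            continue  # Skip the unchanged word itself.
        parts, last = [], 0
        for match, replace in zip(matches, choice):
            parts.append(word[last:match.start()])
            parts.append(ALTERNATIVE_TO_NATIVE[match.group()] if replace else match.group())
            last = match.end()
        parts.append(word[last:])
        yield "".join(parts)


def substitute_word(word: str) -> str:
    """Return a substituted spelling only if the word list confirms it."""
    for candidate in candidates(word.lower()):
        if candidate in KNOWN_WORDS:
            if word[0].isupper():
                # Restore a leading capital; other casing is not handled here.
                return candidate[0].upper() + candidate[1:]
            return candidate
    return word


def substitute_text(text: str) -> str:
    """Apply substitute_word to every run of word characters in the text."""
    return re.sub(r"\w+", lambda m: substitute_word(m.group()), text)


print(substitute_text("Hoeflich fragen waere angebracht!"))
# Höflich fragen wäre angebracht!
print(substitute_text("Mauer, Koeffizient, Messe"))
# Mauer, Koeffizient, Messe
```

Words such as *Mauer* are left alone because none of their substituted variants (*Maür*) appears in the word list.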
## Long-form samples
See also the [tests](https://github.com/alexpovel/betterletter/blob/master/tests/).
### de
The input:
> Ueberhaupt braeuchte es mal einen Teststring.
> Saetze ohne Bedeutung, aber mit vielen Umlauten.
> DRPFA-Angehoerige gehoeren haeufig nicht dazu.
> Bindestrich-Woerter spraechen Baende ueber Fehler.
> Doppelgaenger-Doppelgaenger sind doppelt droelfzig.
> Oder Uemlaeuten? Auslaeuten? Leute gaebe es, wuerde man meinen.
> Ueble Nachrede ist naechtens nicht erlaubt.
> Erlaube man dieses, waere es schoen uebertrieben.
> Busse muesste geloest werden, bevor Gruesse zum Gruss kommen.
> Busse sind Geraete, die womoeglich schnell fuehren.
> Voegel sind aehnlich zu Oel.
> Hierfuer ist fuer den droegen Poebel zu beachten, dass Anmassungen zu Gehoerverlust fuehren koennen.
> Stroemelschnoesseldaemel!
is turned into:
> Überhaupt bräuchte es mal einen Teststring.
> Sätze ohne Bedeutung, aber mit vielen Umlauten.
> DRPFA-Angehörige gehören häufig nicht dazu.
> Bindestrich-Wörter sprächen Bände über Fehler.
> Doppelgänger-Doppelgänger sind doppelt droelfzig.
> Oder Uemlaeuten? Auslaeuten? Leute gäbe es, würde man meinen.
> Üble Nachrede ist nächtens nicht erlaubt.
> Erlaube man dieses, wäre es schön übertrieben.
> Buße müsste gelöst werden, bevor Grüße zum Gruß kommen.
> Buße sind Geräte, die womöglich schnell führen.
> Vögel sind ähnlich zu Öl.
> Hierfür ist für den drögen Pöbel zu beachten, dass Anmaßungen zu Gehörverlust führen können.
> Stroemelschnoesseldaemel!
---
Note that some corrections are out of scope for this little script, e.g.:
> Busse
In German, *Busse* and *Buße* are two words of vastly different meaning (*buses* and *penance*, respectively).
Unfortunately, they map to the same alternative spelling of *Busse*.
The tool sees *Busse* (meaning *just that*, with no intent of changing it), notices *Buße* is a legal substitution, and therefore makes it.
The tool has no awareness of context.
Turning substitutions like these off would mean the tool would no longer emit *Buße*, ever.
This could be as undesirable as the current behaviour.
There seems to be no easy resolution.
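
A word list can at least reveal when this situation arises, even though it cannot resolve it. The sketch below is purely illustrative and not something the tool is documented to do; `KNOWN_WORDS` is a made-up excerpt and `is_ambiguous` is a hypothetical helper.

```python
# Hypothetical excerpt of a word list that contains both spellings.
KNOWN_WORDS = frozenset({"busse", "buße", "gruß", "messe"})


def is_ambiguous(word: str) -> bool:
    """True if both the given spelling and its ß-variant are known words."""
    lowered = word.lower()
    substituted = lowered.replace("ss", "ß")
    return (
        substituted != lowered
        and lowered in KNOWN_WORDS
        and substituted in KNOWN_WORDS
    )


print(is_ambiguous("Busse"))  # True: both Busse and Buße are valid words
print(is_ambiguous("Gruss"))  # False: only Gruß is a valid word
print(is_ambiguous("Messe"))  # False: Meße is not a valid word
```

Deciding which of the two spellings was intended would still require awareness of context, which a word list does not provide.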
## Development
This project uses [poetry](https://python-poetry.org/) for dependency management.
Refer to the [poetry config file](https://github.com/alexpovel/betterletter/blob/master/pyproject.toml) for more info (e.g. the required Python modules to install if you don't want to deal with `poetry`).
Using poetry, from the project root, run:
```bash
# Installs virtual environment according to lock file (if available in repo),
# otherwise pyproject.toml:
poetry install
# Run command within that environment:
poetry run python -m betterletter -h
```
Development tasks are all run through `poetry`, within the context of the virtual environment.
Run [`just`](https://github.com/casey/just) (without arguments) for more available commands related to development.
## AutoHotKey
This tool can be integrated with [AutoHotKey](https://www.autohotkey.com/), allowing you to use it at the touch of a button.
This can be used to set up a keyboard shortcut to run this tool in-place, quickly replacing what you need without leaving your text editing environment.
The AutoHotKey file is [here](https://github.com/alexpovel/betterletter/blob/master/betterletter.ahk) and **requires [AutoHotKey v2](https://www.autohotkey.com/v2/)** (check out commits 7dd68f9 and earlier for the AHK v1.1 script).
Follow [this guide](https://www.autohotkey.com/docs/FAQ.htm#Startup) to have the script launch on boot automatically.
[AHK tray icon](icon.ico) generated using <https://favicon.io/favicon-generator/>.
[^1]: In this demo, `Ctrl + C` and `Ctrl + V` are inserted automatically using the [AutoHotKey script](#autohotkey).
The user only selects the desired text and presses the hotkey, amounting to two keystrokes.
The delay between the `Ctrl + C` and `Ctrl + V` keystrokes in the above demo is the script actually doing its work.
First, the script reads in a dictionary from disk. With respect to the input, this takes constant time (*O(1)*): it scales with the dictionary size, not the input size.
Sadly, this fixed cost dominates for short texts.
However, the script scales acceptably with longer inputs (linear, *O(n)*).
**Very long inputs are required for the actual processing to take longer than the initial dictionary I/O.**
Hence, this script could run very fast if it were (re-)designed as a daemon, with the dictionary preloaded in memory.