betterletter

| Field | Value |
| --- | --- |
| Name | betterletter |
| Version | 1.2.1 |
| Home page | https://github.com/alexpovel/betterletter/ |
| Summary | Substitute alternative spellings of native characters (e.g. German umlauts [ae, oe, ue] etc. [ss]) with their correct versions (ä, ö, ü, ß). |
| Upload time | 2023-04-21 20:13:37 |
| Author | Alex Povel |
| Requires Python | >=3.9,<4.0 |
| License | MIT |
| Keywords | spelling, umlaut, substitute, letter, alternative |

# betterletter

In a given text, replaces alternative spellings of native characters with their proper spellings[^1]:

![demo](docs/images/demo.gif)

## Installation

```shell
pip install betterletter
```

## Usage

The package [installs a script of the same name](https://python-poetry.org/docs/pyproject/#scripts), so instead of the usual `python -m betterletter` you can invoke `betterletter` directly, provided Python's script directory is on your `$PATH`:

```bash
$ betterletter -h
usage: betterletter [-h] [-c] [-f] [-r] [-g] [-d] [--debug] {de}

Tool to replace alternative spellings of native characters (e.g. German
umlauts [ä, ö, ü] etc. [ß]) with the proper native characters. For example,
this problem occurs when no proper keyboard layout was available. This program
is dictionary-based to check if replacements are valid words. By default,
reads from STDIN and writes to STDOUT.

positional arguments:
  {de}             Text language to work with, in ISO 639-1 format.

options:
  -h, --help       show this help message and exit
  -c, --clipboard  Read from and write back to clipboard instead of
                   STDIN/STDOUT.
  -f, --force      Force substitutions and return the text version with the
                   maximum number of substitutions, even if they are illegal
                   words (useful for names).
  -r, --reverse    Reverse mode, where all native characters are simply
                   replaced by their alternative spellings.
  -g, --gui        Stop and open a GUI prompt for confirmation before
                   finishing.
  -d, --diff       Print a diff view of the substitutions to stderr.
  --debug          Output detailed logging information.
```

### Usage Examples

Normal usage:

```bash
$ echo 'Hoeflich fragen waere angebracht!' | betterletter de
Höflich fragen wäre angebracht!
```

Reverse it:

```bash
$ echo 'Höflich fragen wäre angebracht!' | betterletter --reverse de
Hoeflich fragen waere angebracht!
```

A diff view is useful for longer texts and for confirming correctness.
The [diff](https://docs.python.org/3/library/difflib.html) is written to STDERR, so it won't interfere with redirecting the processed text.

```bash
$ echo 'Hoeflich fragen waere angebracht!' | betterletter --diff de 2> diff.txt
Höflich fragen wäre angebracht!
$ cat diff.txt
- Hoeflich fragen waere angebracht!
?  ^^              ^^
+ Höflich fragen wäre angebracht!
?  ^              ^
```
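
The diff format shown above resembles what Python's `difflib.ndiff` produces. As a minimal sketch (assuming, without having confirmed it, that the tool builds its diff in a similar way), the same view can be reproduced for two strings like this:

```python
import difflib
import sys

before = "Hoeflich fragen waere angebracht!\n"
after = "Höflich fragen wäre angebracht!\n"

# ndiff compares two lists of lines; lines that are similar but not equal
# get an extra '?' guide line pointing at the characters that differ.
diff_lines = list(
    difflib.ndiff(before.splitlines(keepends=True), after.splitlines(keepends=True))
)

# Send the diff to STDERR so that STDOUT carries only the processed text.
sys.stderr.writelines(diff_lines)
sys.stdout.write(after)
```

Because the two streams are separate, `2> diff.txt` captures only the diff, as in the shell example above.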

The tool may be coerced into working with names:

```bash
$ # A name won't be in the dictionary:
$ echo 'Sehr geehrte Frau Huebenstetter, ...' | betterletter de
Sehr geehrte Frau Huebenstetter, ...
$ # But we can force it to work:
$ echo 'Sehr geehrte Frau Huebenstetter, ...' | betterletter --force de
Sehr geehrte Frau Hübenstetter, ...
```

[Clipboard-based](https://pypi.org/project/pyperclip/) workflows are also possible:

```bash
# Nothing happens: clipboard is read and written to silently.
# Paste the processed version from your clipboard.
$ betterletter --clipboard de
```

## Background

For German, the native characters and their alternative spellings (used, for example, when no proper keyboard layout is at hand or only ASCII is available) are:

| Native Character | Alternative Spelling |
| :--------------: | :------------------: |
|       Ä/ä        |        Ae/ae         |
|       Ö/ö        |        Oe/oe         |
|       Ü/ü        |        Ue/ue         |
|       ẞ/ß        |        SS/ss         |

These pairings are recorded [here](https://github.com/alexpovel/betterletter/blob/master/betterletter/resources/languages.json).

Going from left to right in this table is simple: replace all native characters with their alternative spellings, minding case.
That use case is also supported by this tool (`--reverse` flag).
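
The simple direction can be sketched in a few lines of Python. The mapping below is copied from the table above; the names `NATIVE_TO_ALTERNATIVE` and `to_alternative` are hypothetical and are not taken from the tool's source, which reads its pairings from `languages.json`.

```python
# Hypothetical mapping mirroring the table above (not the tool's own data structure).
NATIVE_TO_ALTERNATIVE = {
    "Ä": "Ae", "ä": "ae",
    "Ö": "Oe", "ö": "oe",
    "Ü": "Ue", "ü": "ue",
    "ẞ": "SS", "ß": "ss",
}

# str.maketrans accepts single-character keys mapped to replacement strings.
TRANSLATION_TABLE = str.maketrans(NATIVE_TO_ALTERNATIVE)


def to_alternative(text: str) -> str:
    """Replace every native character with its alternative spelling."""
    return text.translate(TRANSLATION_TABLE)


print(to_alternative("Höflich fragen wäre angebracht!"))
# prints: Hoeflich fragen waere angebracht!
```

This sketch maps each character independently, so an all-caps word such as `ÜBER` would come out as `UeBER`; handling that is part of what "minding case" requires and is left out here.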

The other direction is much less straightforward: there exist countless words for which alternative spellings occur somewhere as a pattern, yet replacing them with the corresponding native character would be wrong:

| Character | Correct Spelling  | Wrong Spelling |
| --------- | ----------------- | -------------- |
| *Ä*       | **Ae**rodynamik   | Ärodynamik     |
| *Ä*       | Isr**ae**l        | Isräl          |
| *Ä*       | Schuf**ae**intrag | Schufäintrag   |
| *Ö*       | K**oe**ffizient   | Köffizient     |
| *Ö*       | Domin**oe**ffekt  | Dominöffekt    |
| *Ö*       | P**oet**          | Pöt            |
| *Ü*       | Abente**ue**r     | Abenteür       |
| *Ü*       | Ma**ue**r         | Maür           |
| *Ü*       | Ste**ue**rung     | Steürung       |
| *ß*       | Me**ss**gerät     | Meßgerät       |
| *ß*       | Me**ss**e         | Meße           |
| *ß*       | Abschlu**ss**     | Abschluß       |

to name just a few fairly common examples.

As such, this tool is based on a dictionary lookup; see also the [directory containing the dictionaries](https://github.com/alexpovel/betterletter/blob/master/betterletter/resources/dicts/).
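
To illustrate the idea of a dictionary lookup, here is a minimal sketch. It is not the tool's implementation: the word set, the lowercase-only handling, and the names `KNOWN_WORDS`, `candidates`, and `substitute` are all assumptions made for this example. For each word, it generates every spelling obtained by replacing some subset of the alternative spellings and accepts a replacement only if the result is a known word.

```python
import itertools
import re

# Hypothetical, tiny stand-ins; the real tool ships full word lists.
ALTERNATIVE_TO_NATIVE = {"ae": "ä", "oe": "ö", "ue": "ü", "ss": "ß"}
KNOWN_WORDS = {"höflich", "wäre", "mauer", "messe", "koeffizient"}

# Matches any alternative spelling: "ae|oe|ue|ss".
PATTERN = re.compile("|".join(ALTERNATIVE_TO_NATIVE))


def candidates(word: str):
    """Yield every spelling obtained by replacing any subset of the matches."""
    matches = list(PATTERN.finditer(word))
    for choice in itertools.product((False, True), repeat=len(matches)):
        parts, last = [], 0
        for match, replace in zip(matches, choice):
            parts.append(word[last:match.start()])
            parts.append(ALTERNATIVE_TO_NATIVE[match.group()] if replace else match.group())
            last = match.end()
        parts.append(word[last:])
        yield "".join(parts)


def substitute(word: str) -> str:
    """Return a changed spelling only if the dictionary knows it."""
    for candidate in candidates(word):
        if candidate != word and candidate in KNOWN_WORDS:
            return candidate
    return word


for word in ("hoeflich", "waere", "mauer", "koeffizient", "messe"):
    print(word, "->", substitute(word))
```

With this word set, `hoeflich` and `waere` are replaced, while `mauer`, `koeffizient`, and `messe` are left alone because `maür`, `köffizient`, and `meße` are not known words. The last candidate yielded replaces every match, which roughly corresponds to what a forced mode would return; that correspondence is an assumption, not a description of `--force`.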

## Long-form samples

See also the [tests](https://github.com/alexpovel/betterletter/blob/master/tests/).

### de

The input:

> Ueberhaupt braeuchte es mal einen Teststring.
> Saetze ohne Bedeutung, aber mit vielen Umlauten.
> DRPFA-Angehoerige gehoeren haeufig nicht dazu.
> Bindestrich-Woerter spraechen Baende ueber Fehler.
> Doppelgaenger-Doppelgaenger sind doppelt droelfzig.
> Oder Uemlaeuten? Auslaeuten? Leute gaebe es, wuerde man meinen.
> Ueble Nachrede ist naechtens nicht erlaubt.
> Erlaube man dieses, waere es schoen uebertrieben.
> Busse muesste geloest werden, bevor Gruesse zum Gruss kommen.
> Busse sind Geraete, die womoeglich schnell fuehren.
> Voegel sind aehnlich zu Oel.
> Hierfuer ist fuer den droegen Poebel zu beachten, dass Anmassungen zu Gehoerverlust fuehren koennen.
> Stroemelschnoesseldaemel!

is turned into:

> Überhaupt bräuchte es mal einen Teststring.
> Sätze ohne Bedeutung, aber mit vielen Umlauten.
> DRPFA-Angehörige gehören häufig nicht dazu.
> Bindestrich-Wörter sprächen Bände über Fehler.
> Doppelgänger-Doppelgänger sind doppelt droelfzig.
> Oder Uemlaeuten? Auslaeuten? Leute gäbe es, würde man meinen.
> Üble Nachrede ist nächtens nicht erlaubt.
> Erlaube man dieses, wäre es schön übertrieben.
> Buße müsste gelöst werden, bevor Grüße zum Gruß kommen.
> Buße sind Geräte, die womöglich schnell führen.
> Vögel sind ähnlich zu Öl.
> Hierfür ist für den drögen Pöbel zu beachten, dass Anmaßungen zu Gehörverlust führen können.
> Stroemelschnoesseldaemel!

---

Note that some corrections are out of scope for this little script, e.g.:

> Busse

In German, *Busse* and *Buße* are two words of vastly different meaning (*buses* and *penance*, respectively).
Unfortunately, they map to the same alternative spelling, *Busse*.
The tool sees *Busse* (which the writer meant literally, with no intent of changing it), notices that *Buße* is a legal substitution, and therefore makes it.
The tool has no awareness of context.

Turning substitutions like these off would mean the tool would never emit *Buße* again.
This could be as undesirable as the current behaviour.
There seems to be no easy resolution.
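
The underlying loss of information can be shown in a couple of lines. The helper `to_alternative` is hypothetical and covers only the ß/ss pair; it is meant as an illustration, not as the tool's code.

```python
# Hypothetical helper covering only the ß/ss pair.
def to_alternative(word: str) -> str:
    return word.replace("ß", "ss")


# Two distinct words collapse into one alternative spelling,
# so the original cannot be recovered from the spelling alone.
print(to_alternative("Buße"), to_alternative("Busse"))
# prints: Busse Busse
```

Since both words are valid dictionary entries, a lookup alone cannot decide which one was meant.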

## Development

This project uses [poetry](https://python-poetry.org/) for dependency management.
Refer to the [poetry config file](https://github.com/alexpovel/betterletter/blob/master/pyproject.toml) for more info (e.g. the required Python modules to install if you don't want to deal with `poetry`).

Using poetry, from the project root, run:

```bash
# Installs virtual environment according to lock file (if available in repo),
# otherwise pyproject.toml:
poetry install
# Run command within that environment:
poetry run python -m betterletter -h
```

Development tasks are all run through `poetry`, within the context of the virtual environment.

Run [`just`](https://github.com/casey/just) (without arguments) for more available commands related to development.

## AutoHotKey

This tool can be integrated with [AutoHotKey](https://www.autohotkey.com/), allowing you to use it at the touch of a button.
This can be used to set up a keyboard shortcut to run this tool in-place, quickly replacing what you need without leaving your text editing environment.

The AutoHotKey file is [here](https://github.com/alexpovel/betterletter/blob/master/betterletter.ahk) and **requires [AutoHotKey v2](https://www.autohotkey.com/v2/)** (check out commits 7dd68f9 and earlier for the AHK v1.1 script).

Follow [this guide](https://www.autohotkey.com/docs/FAQ.htm#Startup) to have the script launch on boot automatically.

[AHK tray icon](icon.ico) generated using <https://favicon.io/favicon-generator/>.

[^1]: In this demo, `Ctrl + C` and `Ctrl + V` are inserted automatically using the [AutoHotKey script](#autohotkey).
  The user only selects the desired text and presses the hotkey, amounting to two keystrokes.
  The delay between the `Ctrl + C` and `Ctrl + V` keystrokes in the above demo is the script actually doing its work.
  First, the script reads a dictionary from disk. This takes constant time (*O(1)*) with respect to the input: it scales with dictionary size, not input size.
  Sadly, for short texts this fixed cost dominates the runtime.
  However, the script scales acceptably with longer inputs (regular *O(n)*).
  **Very long inputs are required for the actual processing to take longer than the initial dictionary I/O.**
  Hence, this script could run very fast if it were (re-)designed as a daemon, with the dictionary preloaded in memory.


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/alexpovel/betterletter/",
    "name": "betterletter",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.9,<4.0",
    "maintainer_email": "",
    "keywords": "spelling,umlaut,substitute,letter,alternative",
    "author": "Alex Povel",
    "author_email": "python@alexpovel.de",
    "download_url": "https://files.pythonhosted.org/packages/90/b0/79471b60b2b12f9f1fbd67681c2b7372fe06d49b414a660f7ad711800674/betterletter-1.2.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Substitute alternative spellings of native characters (e.g. German umlauts [ae, oe, ue] etc. [ss]) with their correct versions (\u00e4, \u00f6, \u00fc, \u00df).",
    "version": "1.2.1",
    "split_keywords": [
        "spelling",
        "umlaut",
        "substitute",
        "letter",
        "alternative"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c03779fa7b5fa1c6925212f92e5388d9782c27805e666a7eb83bab26fdb8dfe7",
                "md5": "87564f906ff0672211eff63d74c4617d",
                "sha256": "bcdb6f8dbee15a72318131d9329d6d503f5e211423f3011937c502e8de6d2234"
            },
            "downloads": -1,
            "filename": "betterletter-1.2.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "87564f906ff0672211eff63d74c4617d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9,<4.0",
            "size": 6683859,
            "upload_time": "2023-04-21T20:13:35",
            "upload_time_iso_8601": "2023-04-21T20:13:35.760087Z",
            "url": "https://files.pythonhosted.org/packages/c0/37/79fa7b5fa1c6925212f92e5388d9782c27805e666a7eb83bab26fdb8dfe7/betterletter-1.2.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "90b079471b60b2b12f9f1fbd67681c2b7372fe06d49b414a660f7ad711800674",
                "md5": "7d17bb73af8d70462cd75dfab4336c58",
                "sha256": "b0ce3262d60311e56aa235b0bba760d54bdd7d60ad9903558ca1833ce54509fb"
            },
            "downloads": -1,
            "filename": "betterletter-1.2.1.tar.gz",
            "has_sig": false,
            "md5_digest": "7d17bb73af8d70462cd75dfab4336c58",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9,<4.0",
            "size": 6659847,
            "upload_time": "2023-04-21T20:13:37",
            "upload_time_iso_8601": "2023-04-21T20:13:37.333993Z",
            "url": "https://files.pythonhosted.org/packages/90/b0/79471b60b2b12f9f1fbd67681c2b7372fe06d49b414a660f7ad711800674/betterletter-1.2.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-04-21 20:13:37",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "alexpovel",
    "github_project": "betterletter",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "betterletter"
}