# character normalization (especially for Latin letters - linguistic purposes)
## pip install normaltext
The `lookup` function simplifies character analysis, provides replacement
suggestions, and speeds up repeated lookups through memoization.
It is useful for character normalization, text processing,
or any task where character properties and substitutions are relevant.
#### Tested against Windows 10 / Python 3.10 / Anaconda
The lookup function can be used by developers or anyone working
with text processing or character manipulation tasks. It provides
information about a given character and suggests
a replacement based on certain criteria.
### Character Information:
The function retrieves the name of the character using unicodedata.name() and provides a sorted list of words representing the character name.
This can be useful for analyzing and understanding the properties of a character.
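
As a rough illustration of the name lookup described above, here is a minimal sketch that uses only the standard library. The helper `name_words` is a hypothetical name chosen for this example and is not part of normaltext; it only shows what "a sorted list of words representing the character name" can look like.

```python
import unicodedata


def name_words(char: str) -> list[str]:
    """Return the words of the character's Unicode name, sorted alphabetically."""
    # unicodedata.name() raises ValueError for unnamed characters unless a
    # default is given, so an empty string is used as the fallback here.
    name = unicodedata.name(char, "")
    return sorted(name.split())


print(unicodedata.name("\u00e9"))  # LATIN SMALL LETTER E WITH ACUTE
print(name_words("\u00e9"))  # ['ACUTE', 'E', 'LATIN', 'LETTER', 'SMALL', 'WITH']
```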
### Suggested Replacement:
The function suggests a replacement for the character based on the provided criteria.
By considering factors like case sensitivity, printability, and capitalization, the function offers a recommended substitution.
This can be beneficial when you need to transform
or normalize characters in a specific context.
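
The README does not spell out the exact replacement rules, so the following is not normaltext's implementation. It is a sketch of a different, common standard-library technique (NFKD decomposition followed by dropping non-printable code points) that gives a feel for what a suggested replacement with a fallback character looks like. The name `suggest_ascii` is hypothetical.

```python
import string
import unicodedata


def suggest_ascii(char: str, replace: str = "x") -> str:
    """Suggest an ASCII stand-in for a single character (illustrative only)."""
    if char in string.printable:
        return char  # already a printable ASCII character
    # NFKD splits e.g. U+00E9 into "e" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", char)
    kept = "".join(c for c in decomposed if c in string.printable)
    # Characters without an ASCII base (such as U+00DF) fall back to `replace`.
    return kept or replace


print("".join(suggest_ascii(c) for c in "Montr\u00e9al"))  # Montreal
print("".join(suggest_ascii(c) for c in "gr\u00f6\u00dfer"))  # groxer
```

Note that this sketch replaces ß with the fallback character, whereas the normaltext output further down maps it to `s`, so the two approaches are not interchangeable.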
### Memoization and Performance:
The function utilizes the functools.lru_cache decorator, which caches the results of previous function calls.
This means that if the function is called multiple times with the same character,
the result is retrieved from the cache instead of recomputing it.
This caching mechanism can significantly improve the performance of the function when there are repetitive
or redundant character lookups.
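
The effect of `functools.lru_cache` can be observed with a small stand-alone function. The sketch below does not touch normaltext; `char_name` is a hypothetical helper used only to show how cache hits accumulate when the same characters recur.

```python
import functools
import unicodedata


@functools.lru_cache(maxsize=None)
def char_name(char: str) -> str:
    """Look up a character name; repeated calls are served from the cache."""
    return unicodedata.name(char, "")


# 10 characters in total, 6 of them distinct.
for ch in "M\u00e8re, M\u00e8re":
    char_name(ch)

info = char_name.cache_info()
print(info.misses, info.hits)  # 6 4
```

Each distinct character is computed once (a miss), and every repeat is answered from the cache (a hit), which is why text with many recurring characters benefits most.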
### Flexibility:
The lookup function provides optional parameters that allow customization of its behavior.
The case_sens parameter determines whether case sensitivity is considered for replacements.
The replace parameter allows setting a default replacement character.
The add_to_printable parameter enables the addition of extra uppercase characters to the set of printable characters.
These options provide flexibility to adapt the
function to different requirements and use cases.
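
Since `lookup` works on one character at a time, the options usually get passed once per character. A small wrapper can keep them in one place. In the sketch below, `normalize_text` and `demo_lookup` are hypothetical names: `demo_lookup` is only a stand-in with the same keyword arguments and the same `'suggested'` key so that the example runs without the package installed. In real use, `normaltext.lookup` would be passed instead.

```python
from typing import Callable


def normalize_text(text: str, lookup_fn: Callable[..., dict], **options) -> str:
    """Apply a per-character lookup to every character and join the results."""
    return "".join(lookup_fn(ch, **options)["suggested"] for ch in text)


def demo_lookup(char, case_sens=True, replace="x", add_to_printable=""):
    # Stand-in only (NOT normaltext's logic): keep ASCII, replace everything else.
    suggested = char if char.isascii() else replace
    return {"suggested": suggested}


print(normalize_text("no\u00ebl", demo_lookup, replace="?"))  # no?l
```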
```python
from normaltext import lookup

sen = "Montréal, über, 12.89, Mère, Françoise, noël, 889"
norm = ''.join([lookup(k, case_sens=True, replace='x', add_to_printable='')['suggested'] for k in sen])
print(norm)
# Montreal, uber, 12.89, Mere, Francoise, noel, 889
#########################
sen2 = 'kožušček'
norm2 = ''.join([lookup(k, case_sens=True, replace='x', add_to_printable='')['suggested'] for k in sen2])
print(norm2)
# kozuscek
#########################
sen3 = "Falsches Üben von Xylophonmusik quält jeden größeren Zwerg."
# Umlauts are not expanded: ü becomes u, not ue.
norm3 = ''.join([lookup(k, case_sens=True, replace='x', add_to_printable='')['suggested'] for k in sen3])
print(norm3)
# Falsches Uben von Xylophonmusik qualt jeden groseren Zwerg.
#########################
sen4 = "cætera"
norm4 = ''.join([lookup(k, case_sens=True, replace='x', add_to_printable='ae')['suggested'] for k in sen4])
print(norm4)
# caetera
```
Raw data
{
"_id": null,
"home_page": "https://github.com/hansalemaos/normaltext",
"name": "normaltext",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "character normalization",
"author": "Johannes Fischer",
"author_email": "aulasparticularesdealemaosp@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/50/4c/7b4bea694434e75d9aacecee741004fa820376cd69493963f5d81b52a567/normaltext-0.10.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "character normalization (especially for Latin letters - linguistic purposes)",
"version": "0.10",
"project_urls": {
"Homepage": "https://github.com/hansalemaos/normaltext"
},
"split_keywords": [
"character",
"normalization"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "8845a4f21d3f925cc126656c2646f131e70ba7aa7cd065105d8a013abd851a11",
"md5": "7f5c0a8745bd36e5d37140574d678e6e",
"sha256": "89cf4936fb8776619dac04cdc626bf51b3eabcf6ba7592fd05163e13c47ca0f1"
},
"downloads": -1,
"filename": "normaltext-0.10-py3-none-any.whl",
"has_sig": false,
"md5_digest": "7f5c0a8745bd36e5d37140574d678e6e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 6755,
"upload_time": "2023-07-06T08:42:38",
"upload_time_iso_8601": "2023-07-06T08:42:38.940510Z",
"url": "https://files.pythonhosted.org/packages/88/45/a4f21d3f925cc126656c2646f131e70ba7aa7cd065105d8a013abd851a11/normaltext-0.10-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "504c7b4bea694434e75d9aacecee741004fa820376cd69493963f5d81b52a567",
"md5": "6291869b766bcf96aca058dedbe5c6f5",
"sha256": "36c352d1ccba9cea3c35c30d97acfbdeae5385b6928163df514e1d152d53eda9"
},
"downloads": -1,
"filename": "normaltext-0.10.tar.gz",
"has_sig": false,
"md5_digest": "6291869b766bcf96aca058dedbe5c6f5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 4783,
"upload_time": "2023-07-06T08:42:40",
"upload_time_iso_8601": "2023-07-06T08:42:40.608271Z",
"url": "https://files.pythonhosted.org/packages/50/4c/7b4bea694434e75d9aacecee741004fa820376cd69493963f5d81b52a567/normaltext-0.10.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-07-06 08:42:40",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "hansalemaos",
"github_project": "normaltext",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "normaltext"
}