-   Name: precis-i18n
-   Version: 1.1.1
-   Home page: https://github.com/byllyfish/precis_i18n
-   Summary: PRECIS-i18n: Internationalized Usernames and Passwords
-   Upload time: 2024-11-12 21:11:27
-   Author: William W. Fisher
-   License: MIT
-   Keywords: precis, codec, username, password

# PRECIS-i18n: Internationalized Usernames and Passwords

[![MIT licensed](https://img.shields.io/badge/license-MIT-blue.svg)](https://raw.githubusercontent.com/byllyfish/precis_i18n/main/LICENSE.txt)
[![Build Status](https://github.com/byllyfish/precis_i18n/actions/workflows/ci.yml/badge.svg)](https://github.com/byllyfish/precis_i18n/actions/workflows/ci.yml)
[![codecov.io](https://codecov.io/gh/byllyfish/precis_i18n/coverage.svg?branch=main)](https://codecov.io/gh/byllyfish/precis_i18n?branch=main)

If you want your application to accept Unicode user names and passwords,
you must be careful in how you validate and compare them. The PRECIS
framework makes internationalized user names and passwords safer for use
by applications. PRECIS profiles transform Unicode strings into a
canonical form, suitable for comparison.

This module implements the PRECIS Framework as described in:

-   PRECIS Framework: Preparation, Enforcement, and Comparison of
    Internationalized Strings in Application Protocols ([RFC
    8264](https://tools.ietf.org/html/rfc8264))
-   Preparation, Enforcement, and Comparison of Internationalized
    Strings Representing Usernames and Passwords ([RFC
    8265](https://tools.ietf.org/html/rfc8265))
-   Preparation, Enforcement, and Comparison of Internationalized
    Strings Representing Nicknames ([RFC
    8266](https://tools.ietf.org/html/rfc8266))

Requires Python 3.5 or later.

## Usage

Use the `get_profile` function to obtain a profile object, then use its
`enforce` method. The `enforce` method returns a Unicode string.

```pycon
>>> from precis_i18n import get_profile
>>> username = get_profile('UsernameCaseMapped')
>>> username.enforce('Kevin')
'kevin'
>>> username.enforce('\u212Aevin')
'kevin'
>>> username.enforce('\uFF2Bevin')
'kevin'
>>> username.enforce('\U0001F17Aevin')
Traceback (most recent call last):
  ...
UnicodeEncodeError: 'UsernameCaseMapped' codec can't encode character '\U0001f17a' in position 0: DISALLOWED/symbols

```
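
Because `enforce` returns a canonical string, comparing two user-supplied names reduces to comparing their enforced forms. The helper below is a minimal sketch and not part of the library: `usernames_match` is a hypothetical name, and it assumes only that the profile object has the `enforce` method shown above and raises `UnicodeEncodeError` for disallowed input.

```python
def usernames_match(profile, left, right):
    """Return True if both strings enforce to the same canonical form.

    `profile` is any object with an `enforce(str) -> str` method, such as
    the result of `get_profile('UsernameCaseMapped')`. This helper is
    illustrative and not part of precis_i18n.
    """
    try:
        return profile.enforce(left) == profile.enforce(right)
    except UnicodeEncodeError:
        # A disallowed string never matches anything.
        return False
```

With the `username` profile from the example above, `usernames_match(username, 'Kevin', '\u212Aevin')` would be expected to return `True`, since both strings enforce to `'kevin'`.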

Alternatively, you can use the Python `str.encode` API. Import the
`precis_i18n.codec` module to register the PRECIS codec names. Now you
can use the `str.encode` method with any Unicode string. The result will
be a UTF-8 encoded byte string or a `UnicodeEncodeError` if the string
is disallowed.

```pycon
>>> import precis_i18n.codec
>>> 'Kevin'.encode('UsernameCasePreserved')
b'Kevin'
>>> '\u212Aevin'.encode('UsernameCasePreserved')
b'Kevin'
>>> '\uFF2Bevin'.encode('UsernameCasePreserved')
b'Kevin'
>>> '\u212Aevin'.encode('UsernameCaseMapped')
b'kevin'
>>> '\uFF2Bevin'.encode('OpaqueString')
b'\xef\xbc\xabevin'
>>> '\U0001F17Aevin'.encode('UsernameCasePreserved')
Traceback (most recent call last):
  ...
UnicodeEncodeError: 'UsernameCasePreserved' codec can't encode character '\U0001f17a' in position 0: DISALLOWED/symbols

```

## Alternative Unicode Versions

The `get_profile` function uses whatever version of `unicodedata` is
provided by the Python runtime. The Unicode version is tied to the minor
version of the Python runtime. For example, Python 3.7.x uses Unicode 11.0
and Python 3.6.x uses Unicode 9.0.

To use an alternative `unicodedata` implementation, pass the
`unicodedata` keyword argument to `get_profile`.

For example, you could separately install version 12.0 of the
`unicodedata2` module from PyPI. Then pass it to `get_profile` to
retrieve a profile that uses Unicode 12.0.

```pycon
>>> import unicodedata2
>>> from precis_i18n import get_profile
>>> username = get_profile('UsernameCaseMapped', unicodedata=unicodedata2)
>>> username.enforce('Kevin')
'kevin'

```
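
To see which Unicode version a given `unicodedata` implementation provides, you can read its `unidata_version` attribute. This sketch uses only the standard library module. It assumes, without verifying it here, that an alternative module such as `unicodedata2` exposes the same attribute. The function name `unicode_version` is hypothetical.

```python
import unicodedata


def unicode_version(module=unicodedata):
    """Return the module's Unicode database version as a tuple of ints."""
    return tuple(int(part) for part in module.unidata_version.split('.'))


# Prints the version bundled with the running interpreter.
print(unicode_version())
```

Comparing the tuples from two modules is one way to confirm that the alternative implementation is actually newer than the runtime's before passing it to `get_profile`.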

## Supported Profiles and Codecs

Each PRECIS profile has a corresponding codec name. The `CaseMapped`
variant converts the string to lower case for implementing
case-insensitive comparison.

-   UsernameCasePreserved
-   UsernameCaseMapped
-   OpaqueString
-   NicknameCasePreserved
-   NicknameCaseMapped

The `CaseMapped` profiles use Unicode `ToLower` per the latest RFC.
Previous versions of this package used Unicode Default Case Folding.
There are also `CaseMapped` variants that name the case transformation
explicitly. These profile names are deprecated:

-   UsernameCaseMapped:ToLower
-   UsernameCaseMapped:CaseFold
-   NicknameCaseMapped:ToLower
-   NicknameCaseMapped:CaseFold
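
The difference between lowercasing and case folding shows up on only a few characters. As a rough illustration, Python's built-in `str.lower` and `str.casefold` are used below as stand-ins for Unicode `ToLower` and Default Case Folding. They are not the library's own mapping code, which may differ in detail, and `case_mappings` is a hypothetical helper name.

```python
def case_mappings(text):
    """Return (lowercased, case-folded) forms using Python's built-ins."""
    return text.lower(), text.casefold()


for sample in ('Kevin', 'Straße'):
    lowered, folded = case_mappings(sample)
    # The two results differ only when folding expands or replaces a character.
    print(repr(sample), repr(lowered), repr(folded), lowered == folded)
```

For `'Straße'`, lowercasing keeps the `ß` while case folding expands it to `ss`, so two profiles built on different mappings can disagree about whether two names are equal.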

The PRECIS base string classes are also available as codecs:

-   IdentifierClass
-   FreeFormClass

## Userparts and Space Delimited Usernames

The Username profiles in this implementation do not allow spaces. The
Username profiles correspond to the definition of "userparts" in RFC
8265. If you want to allow spaces in your application's user names, you
must split the string first.

```python
import precis_i18n


def enforce_app_username(name):
    """Enforce each space-separated userpart and rejoin them with spaces."""
    profile = precis_i18n.get_profile('UsernameCasePreserved')
    userparts = [profile.enforce(userpart) for userpart in name.split(' ')]
    return ' '.join(userparts)
```

Be aware that a username constructed this way can contain bidirectional
text in the separate userparts.

## Error Messages

A PRECIS profile raises a `UnicodeEncodeError` exception if a string is
disallowed. The `reason` field specifies the kind of error.

Reason                                 | Explanation
-------------------------------------- | ------------------------------------------
DISALLOWED/arabic_indic                | Arabic-Indic digits cannot be mixed with Extended Arabic-Indic digits. (Context)
DISALLOWED/bidi_rule                   | Right-to-left string cannot contain left-to-right characters due to the "Bidi" rule. (Context)
DISALLOWED/controls                    | Control character is not allowed.
DISALLOWED/empty                       | After applying the profile, the result cannot be empty.
DISALLOWED/exceptions                  | Exception character is not allowed.
DISALLOWED/extended_arabic_indic       | Extended Arabic-Indic digits cannot be mixed with Arabic-Indic digits. (Context)
DISALLOWED/greek_keraia                | Greek keraia must be followed by a Greek character. (Context)
DISALLOWED/has_compat                  | Compatibility characters are not allowed.
DISALLOWED/hebrew_punctuation          | Hebrew punctuation geresh or gershayim must be preceded by Hebrew character. (Context)
DISALLOWED/katakana_middle_dot         | Katakana middle dot must be accompanied by a Hiragana, Katakana, or Han character. (Context)
DISALLOWED/middle_dot                  | Middle dot must be surrounded by the letter 'l'. (Context)
DISALLOWED/not_idempotent              | After reapplying the profile, the result is not stable.
DISALLOWED/old_hangul_jamo             | Conjoining Hangul Jamo is not allowed.
DISALLOWED/other                       | Other character is not allowed.
DISALLOWED/other_letter_digits         | Non-traditional letter or digit is not allowed.
DISALLOWED/precis_ignorable_properties | Default ignorable or non-character is not allowed.
DISALLOWED/punctuation                 | Non-ASCII punctuation character is not allowed.
DISALLOWED/spaces                      | Space character is not allowed.
DISALLOWED/symbols                     | Non-ASCII symbol character is not allowed.
DISALLOWED/unassigned                  | Unassigned Unicode character is not allowed.
DISALLOWED/zero_width_joiner           | Zero width joiner must immediately follow a combining virama. (Context)
DISALLOWED/zero_width_nonjoiner        | Zero width non-joiner must immediately follow a combining virama, or appear where it breaks a cursive connection in a formally cursive script. (Context)
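
Application code often needs to turn these reasons into something it can log or show to a user. The sketch below reads the standard attributes of `UnicodeEncodeError` (`object`, `start`, `reason`). The function name `describe_failure` and the shape of the returned dictionary are illustrative assumptions, not part of the library.

```python
def describe_failure(exc):
    """Summarize a UnicodeEncodeError raised by a PRECIS profile."""
    # The reason looks like 'DISALLOWED/symbols'; keep the part after the slash.
    kind = exc.reason.partition('/')[2] or exc.reason
    # Slice rather than index so an empty input string does not raise.
    bad_char = exc.object[exc.start:exc.start + 1]
    return {
        'kind': kind,
        'position': exc.start,
        'character': 'U+%04X' % ord(bad_char) if bad_char else None,
    }


# Build the error from the Usage section by hand, so this sketch runs
# without the library installed.
err = UnicodeEncodeError(
    'UsernameCaseMapped', '\U0001F17Aevin', 0, 1, 'DISALLOWED/symbols')
print(describe_failure(err))
```

In real use, the exception would come from an `except UnicodeEncodeError as exc:` clause around a call to `enforce` or `str.encode`.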


## The Nickname Profile and White Space

When PRECIS processes a string using the `Nickname` profile, one of the
enforcement steps silently removes leading and trailing white space.
Starting with version 1.1, this library uses a more *restrictive*
definition of *white space* in the `Nickname` profile.

-   Versions 1.1 and later treat *only* Unicode category `Zs` as white
    space. If you try to enforce a Nickname that contains white space
    characters like `'\n'`, you will get a `UnicodeEncodeError` with
    reason `DISALLOWED/controls`.
-   Versions 1.0.5 and earlier also treated control characters such as
    `'\n'`, `'\t'`, and `'\r'` as white space when removing
    leading/trailing white space from Nicknames. These legacy white
    space characters were stripped the same way as `Zs` characters.
-   In all versions, *internal* white space (not leading or trailing)
    matches Unicode category `Zs` only.

White space trimming applies to the Nickname profile only.
Here is an example of the current behavior:

```pycon
>>> from precis_i18n import get_profile
>>> nickname = get_profile('NicknameCaseMapped')
>>> nickname.enforce('Kevin\n')
Traceback (most recent call last):
  ...
UnicodeEncodeError: 'NicknameCaseMapped' codec can't encode character '\x0a' in position 5: DISALLOWED/controls

```

In version 1.0.5 and earlier, the `NicknameCaseMapped` profile enforced `"Kevin\n"`
as `"kevin"`.
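
The distinction rests on Unicode general categories: the line feed and tab are control characters (`Cc`), while the ordinary and ideographic spaces are space separators (`Zs`). The following sketch checks this with the standard `unicodedata` module and shows one way an application that depended on the old behavior might strip the legacy characters itself before enforcement. The helper name `strip_legacy_whitespace` is hypothetical, and the result only approximates what version 1.0.5 did, because it does not handle control characters interleaved with `Zs` spaces.

```python
import unicodedata

# Control characters that versions 1.0.5 and earlier trimmed from Nicknames.
LEGACY_WHITESPACE = '\t\n\r'


def strip_legacy_whitespace(value):
    """Remove leading/trailing tab, line feed, and carriage return."""
    return value.strip(LEGACY_WHITESPACE)


# Show the general category of each candidate white space character.
for ch in (' ', '\u3000', '\n', '\t'):
    print('U+%04X' % ord(ch), unicodedata.category(ch))

print(repr(strip_legacy_whitespace('Kevin\n')))
```

The cleaned string can then be passed to `nickname.enforce` as usual.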

## Unicode Version Update Procedure

When Unicode releases a new version, take the following steps to update
internal tables and pass unit tests:

-   Under a version of Python that supports the new Unicode version, run
    the tests using `python -m unittest discover` and check that the
    `test_derived_props` test FAILS due to a missing file.
-   Generate a new `derived-props` file by running
    `PYTHONPATH=. python test/test_derived_props.py > derived-props-VERSION.txt`.
    Rename the file using the Unicode version, and re-run the tests. The
    unit tests will further check that no derived properties in the new
    file contradict the previous values.
-   Check for changes to internal tables used for context rules by
    running `PYTHONPATH=. python tools/check_codepoints.py`. Update the
    corresponding tables in `precis_i18n/unicode.py` if necessary.

            
