unirange


- Name: unirange
- Version: 1.0
- Summary: Unirange is a notation for specifying multiple Unicode codepoints.
- Upload time: 2023-05-28 23:53:41
- Author: WhoAteMyButter
- Requires Python: >=3.11
- License: MIT
- Keywords: unicode, characters, range, unirange
- Requirements: none recorded
# Unirange

[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Pylint](https://img.shields.io/badge/pylint-10.00-ffbf48)](https://pylint.pycqa.org/en/latest/)
[![License](https://img.shields.io/badge/license-MIT-a51931)](https://spdx.org/licenses/MIT.html)
[![PyPi](https://img.shields.io/pypi/v/unirange)](https://pypi.org/project/unirange/)
[![GitLab Release (latest by SemVer)](https://img.shields.io/gitlab/v/release/46367257?sort=semver)](https://gitlab.com/whoatemybutter/unirange/-/releases)

Unirange is a notation for specifying multiple Unicode codepoints.

A unirange comprises comma-delimited **components**.

A **part** is a notation for a single character, like ``A``, ``U+2600``, or ``0x7535``.
It is matched by the regular expression ``!?(?:0x|U\+|&#x)([0-9A-F]{1,7});?|(.)``

A **range** is two **parts** split by ``..`` (two dots) or ``-`` (a hyphen).
It is matched by the regular expression ``(?:PART(?:-|\.\.)PART)``

A **component** comprises either a **range** or a **part**.
It is matched by the regular expression ``(RANGE|PART)``

The full unirange notation is matched by the regular expression ``(?:COMPONENT, ?)*``

Exclusion can be applied to any component by prefixing it with a ``!``.
This will instead perform the *difference* (subtraction) on the current set of characters.

---

## Table of contents

- [📄 About](#-about)
- [📦 Installation](#-installation)
- [🛠 Usage](#-usage)
- [📰 Changelog](#-changelog)
- [📜 License](#-license)

---

## 📄 About

### Component

A component is either a *range* or a *part*.
Components define which characters the unirange includes or excludes.

### Part

A part is a notation for a *single* character.
A *range* contains two parts, split by ``..`` or ``-``.
In the range ``U+2600..U+26FF``, ``U+2600`` and ``U+26FF`` are the parts.

Parts can match any of these regular expressions:

* ``U\+.{1,6}``
* ``&#x.{1,6}``
* ``0x.{1,6}``
* ``.``

A part that contains more than one character and has no prefix is **invalid**.
For example, ``2600`` is not a valid part, but ``U+2600`` is.

> Codepoints can only be written in **hexadecimal**; no other base is supported.
> A decimal reference such as ``&#1234`` is not valid.
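
As a rough illustration of these rules, the sketch below converts a single part into its codepoint using the part regular expression quoted at the top of this README. The helper name `part_to_codepoint` is hypothetical and is not part of the `unirange` API; the library's own parsing may differ.

```python
import re

# Part pattern as quoted in this README: an optionally excluded, prefixed
# hexadecimal codepoint, or exactly one literal character.
PART_PATTERN = re.compile(r"!?(?:0x|U\+|&#x)([0-9A-F]{1,7});?|(.)")


def part_to_codepoint(part: str) -> int:
    """Return the codepoint named by a single part (hypothetical helper)."""
    match = PART_PATTERN.fullmatch(part)
    if match is None:
        # Covers unprefixed multi-character input such as "2600".
        raise ValueError(f"Invalid part: {part!r}")
    hex_digits, literal = match.groups()
    if hex_digits is not None:
        # Prefixed form: the captured digits are always hexadecimal.
        return int(hex_digits, 16)
    # Unprefixed form: a single literal character.
    return ord(literal)


print(hex(part_to_codepoint("U+2600")))
print(hex(part_to_codepoint("0x7535")))
print(hex(part_to_codepoint("A")))
```

Calling `part_to_codepoint("2600")` raises `ValueError`, because the input has more than one character and no prefix.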

### Range

A range is two *parts* separated by ``..`` or ``-``.

#### Implied infinite expansion

If one part of the range (but not both) is absent, this is called **implied infinite expansion** *(IIE)*.
With IIE, the missing boundary is implied to be the lower or upper limit of the Unicode character set.

If the first part is absent, the first part becomes U+0000.
If the second part is absent, it becomes U+10FFFF.
If both parts are absent, *the range is invalid*.

This means that the range ``U+2600..`` will result in characters from U+2600 to U+10FFFF.
It is semantically equivalent to ``U+2600..U+10FFFF``.

This also applies to the reverse: the range ``..U+2600`` will result in characters from U+0000 to U+2600.
Likewise, it is equivalent to ``U+0000..U+2600``.
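
The IIE rules can be sketched as a small function that fills in a missing boundary. This is only an illustration under the rules stated above: `resolve_bounds` and the two constants are hypothetical names, not part of the `unirange` API, and an absent part is modelled here as `None`.

```python
from typing import Optional, Tuple

UNICODE_MIN = 0x0000    # implied lower limit when the first part is absent
UNICODE_MAX = 0x10FFFF  # implied upper limit when the second part is absent


def resolve_bounds(first: Optional[int], second: Optional[int]) -> Tuple[int, int]:
    """Apply implied infinite expansion to a range's two parts (hypothetical helper)."""
    if first is None and second is None:
        # A range with both parts absent is invalid.
        raise ValueError("A range needs at least one part")
    lower = UNICODE_MIN if first is None else first
    upper = UNICODE_MAX if second is None else second
    return lower, upper


# "U+2600.." behaves like "U+2600..U+10FFFF"
print([hex(bound) for bound in resolve_bounds(0x2600, None)])
# "..U+2600" behaves like "U+0000..U+2600"
print([hex(bound) for bound in resolve_bounds(None, 0x2600)])
```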

### Exclusion

To exclude a character from the resulting set, prefix a component with a ``!``.
This removes it from the set, regardless of what earlier components indicate.

For example, ``U+2600..U+26FF, U+2704, !U+2605`` will include the codepoints from U+2600 up to, **but not including**, U+2605,
and then from U+2606 to U+26FF, as well as U+2704.

You can exclude ranges as well. Either part of a range may be prefixed with a ``!`` to mark the whole range as an
exclusion. ``!U+2600..U+267F``, ``U+2600..!U+267F``, and ``!U+2600..!U+267F`` all have the same effect:
no codepoints from U+2600 to U+267F.

**Exclusions must come after the inclusions, or else they will be overridden.**

> The order of your components matters when excluding.
> Components that come after an exclusion and conflict with it *will* override it.
> For example, ``!U+2600..U+2650,U+2600..U+26FF`` will result in the effective range of ``U+2600..U+26FF``.
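
Why the order matters can be seen in a minimal sketch that applies components one after another to a running set: an inclusion is a union, an exclusion is a difference. The `apply_components` function and the tuple layout are assumptions made for illustration only; they are not how `unirange` is necessarily implemented.

```python
from typing import Iterable, Set, Tuple

# (is_exclusion, first codepoint, last codepoint), both ends inclusive.
Component = Tuple[bool, int, int]


def apply_components(components: Iterable[Component]) -> Set[int]:
    """Apply components in order to a running set of codepoints (hypothetical helper)."""
    result: Set[int] = set()
    for is_exclusion, first, last in components:
        codepoints = set(range(first, last + 1))
        if is_exclusion:
            result -= codepoints  # difference: remove from the current set
        else:
            result |= codepoints  # union: add to the current set
    return result


# U+2600..U+26FF, U+2704, !U+2605 (the exclusion comes last, so it takes effect)
late_exclusion = apply_components(
    [(False, 0x2600, 0x26FF), (False, 0x2704, 0x2704), (True, 0x2605, 0x2605)]
)
# !U+2600..U+2650, U+2600..U+26FF (the later inclusion overrides the exclusion)
early_exclusion = apply_components(
    [(True, 0x2600, 0x2650), (False, 0x2600, 0x26FF)]
)

print(0x2605 in late_exclusion)  # False
print(len(early_exclusion))      # 256
```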

---

## 📦 Installation

`unirange` is available on PyPI.
It requires a Python version of **at least 3.11.0.**

To install unirange with pip, run:
```shell
python -m pip install unirange
```

### "externally-managed-environment"

This error occurs on some Linux distributions such as Fedora 38 and Ubuntu 23.04.
It can be solved by either:

1. Using a [virtual environment (venv)](https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/#creating-a-virtual-environment)
2. Using [pipx](https://github.com/pypa/pipx)

---

## 🛠 Usage

Using `unirange` is simple.

```python
>>> import unirange
>>> unirange.unirange_to_characters("A..Z")
{'G', 'D', 'I', 'K', 'X', 'J', 'V', 'O', 'H', 'C', 'A', 'B', 'Y', 'F', 'P', 'W', 'L', 'M', 'R', 'S', 'E', 'T', 'Z', 'N', 'U', 'Q'}

>>> unirange.unirange_to_characters("..0")
{'\x19', '0', '\x1c', '#', '\x14', '\x0c', '\x01', '\x0e', '\r', '\t', '+', '.', '%', '\x18', '\x15', '\x12', '\x16', '\x05', '!', '\x1b', '/', '\x17', '\x0b', '&', '\x1d', '\n', '\x1e', '\x10', '"', "'", '\x04', '\x1a', '(', ' ', '\x08', '\x07', '\x03', ')', '\x1f', '\x02', '\x13', '$', '-', '\x11', ',', '\x00', '*', '\x06', '\x0f'}

>>> unirange.unirange_to_characters("U+2600..U+26FF, !U+2610..")
{'☌', '☍', '☂', '☉', '☏', '☋', '☀', '☄', '☃', '☈', '☆', '☊', '☇', '★', '☁', '☎'}

>>> unirange.unirange_to_characters("U+2600....")
unirange.UnirangeError: Invalid unirange notation: U+2600....

>>> unirange.unirange_to_characters("U+2600..U+10000")
{'쏳', '䔿', '镔', '种', '嗼', '溳', '㟏', '걕', '줿', '죕', '䑀', 'ꕀ', '\ue548', '豴', '촫', '䪻', '䋱', '蹾', '퉙', '烅', '\uea1f', ...}
```

It can also be used from the command line:

```shell
$ python -m unirange U+2600..U+2610
☀ ☁ ☂ ☃ ☄ ★ ☆ ☇ ☈ ☉ ☊ ☋ ☌ ☍ ☎ ☏ ☐
$ python -m unirange U+2600
☀
$ python -m unirange 'U+2600..,!U+2650..'
☀ ☁ ☂ ☃ ☄ ★ ☆ ☇ ☈ ☉ ☊ ☋ ☌ ☍ ☎ ☏ ☐ ☑ ☒ ☓ ☔ ☕ ☖ ☗ ☘ ☙ ☚ ☛ ☜ ☝ ☞ ☟ ☠ ☡ ☢ ☣ ☤ ☥ ☦ ☧ ☨ ☩ ☪ ☫ ☬ ☭ ☮ ☯ ☰ ☱ ☲ ☳ ☴ ☵ ☶ ☷ ☸ ☹ ☺ ☻ ☼ ☽ ☾ ☿ ♀ ♁ ♂ ♃ ♄ ♅ ♆ ♇ ♈ ♉ ♊ ♋ ♌ ♍ ♎ ♏
```

> For some uniranges, you may need to wrap the argument in single quotes (`'`), or else the shell may interpret characters such as `!` itself:
> ```shell
> $ python -m unirange U+2600..,!U+2650..
> bash: !U+2650..: event not found
> $ python -m unirange 'U+2600..,!U+2650..'
> # Works as expected.
> ```

---

## 📰 Changelog

The changelog is at [CHANGELOG.md](CHANGELOG.md).

---

## 📜 License

`unirange` is licensed under the [MIT license](https://spdx.org/licenses/MIT.html).

            

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "unirange",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": "",
    "keywords": "unicode,characters,range,unirange",
    "author": "WhoAteMyButter",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/2d/5b/ba2694d0a77af9ba215985a530d85a7d78523e3c35c1b3f17027627b7388/unirange-1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Unirange is a notation for specifying multiple Unicode codepoints.",
    "version": "1.0",
    "project_urls": {
        "Changelog": "https://gitlab.com/whoatemybutter/unirange/-/blob/master/CHANGELOG.md",
        "Issues": "https://gitlab.com/whoatemybutter/unirange/-/issues",
        "Source": "https://gitlab.com/whoatemybutter/unirange"
    },
    "split_keywords": [
        "unicode",
        "characters",
        "range",
        "unirange"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8c5229a481eb96ac2d653a37b4b6c5130f2864f8fc3c25ec9b97d353517ed1cf",
                "md5": "9e3a019fa9f7fcfebde5be352fd371c1",
                "sha256": "4bfb7187cf6764a3c0de8722d00616f3c02a49f27de03f880414e1bacf2845ae"
            },
            "downloads": -1,
            "filename": "unirange-1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9e3a019fa9f7fcfebde5be352fd371c1",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 8555,
            "upload_time": "2023-05-28T23:53:39",
            "upload_time_iso_8601": "2023-05-28T23:53:39.301698Z",
            "url": "https://files.pythonhosted.org/packages/8c/52/29a481eb96ac2d653a37b4b6c5130f2864f8fc3c25ec9b97d353517ed1cf/unirange-1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2d5bba2694d0a77af9ba215985a530d85a7d78523e3c35c1b3f17027627b7388",
                "md5": "a932e35b3d017369d2cb726c14afb501",
                "sha256": "9ac369421a7d17726d991e09e7b4f9c3ee40bc2c58c24211e91197d0ebd904a9"
            },
            "downloads": -1,
            "filename": "unirange-1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "a932e35b3d017369d2cb726c14afb501",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 9690,
            "upload_time": "2023-05-28T23:53:41",
            "upload_time_iso_8601": "2023-05-28T23:53:41.300425Z",
            "url": "https://files.pythonhosted.org/packages/2d/5b/ba2694d0a77af9ba215985a530d85a7d78523e3c35c1b3f17027627b7388/unirange-1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-05-28 23:53:41",
    "github": false,
    "gitlab": true,
    "bitbucket": false,
    "codeberg": false,
    "gitlab_user": "whoatemybutter",
    "gitlab_project": "unirange",
    "lcname": "unirange"
}
        