holms


Name: holms
Version: 1.6.0
Summary: Text to Unicode code points breakdown
Upload time: 2024-08-06 23:22:24
Requires Python: >=3.10
Keywords: analyzer, breakdown, console, terminal, text, unicode
Requirements: click, es7s.commons, pytermor
            <h1 align="center">
   <!-- es7s/holms -->
   <a href="##"><img align="left" src="https://s3.eu-north-1.amazonaws.com/dp2.dl/readme/es7s/holms/logo.png?v=2" width="160" height="64"></a>
   <a href="##"><img align="center" src="https://s3.eu-north-1.amazonaws.com/dp2.dl/readme/es7s/holms/label.png" width="200" height="64"></a>
   <a href="##"><img align="right" src="https://s3.eu-north-1.amazonaws.com/dp2.dl/readme/empty.png" width="160" height="64"></a>
</h1>
<div align="right">
 <a href="##"><img src="https://img.shields.io/badge/python-3.10-3776AB?logo=python&logoColor=white&labelColor=333333"></a>
 <a href="https://pepy.tech/project/holms/"><img alt="Downloads" src="https://pepy.tech/badge/holms"></a>
 <a href="https://pypi.org/project/holms/"><img alt="PyPI" src="https://img.shields.io/pypi/v/holms"></a>
 <a href='https://coveralls.io/github/es7s/holms?branch=master'><img src='https://coveralls.io/repos/github/es7s/holms/badge.svg?branch=master' alt='Coverage Status' /></a>
 <a href="https://github.com/psf/black"><img alt="Code style: black" src="https://img.shields.io/badge/code%20style-black-000000.svg"></a>
 <a href="##"><img src="https://wakatime.com/badge/user/8eb9e217-791b-436f-b729-81eb63e84b08/project/018b5923-4968-4029-ae8d-3776792f88d5.svg"></a>
</div>
<br>

CLI UTF-8 decomposer for text analysis, capable of displaying Unicode code point
names and categories, along with ASCII control characters, UTF-16 surrogate pair
pieces, parts of invalid UTF-8 sequences as separate bytes, etc.


Motivation
---------------------------

The project grew out of the need for a tool that can quickly identify otherwise
indistinguishable Unicode code points.


Installation
---------------------------
### With `pipx` (recommended)
    pipx install holms

### From git repository
    curl -sS https://raw.githubusercontent.com/es7s/holms/master/install.sh | sh


Basic usage
---------------------------

    Usage: holms run [OPTIONS] [INPUT]
    
      Read data from INPUT file, find all valid UTF-8 byte sequences, decode them and display as
      separate Unicode code points. Use '-' as INPUT to read from stdin instead.

<div align="center">
  <img alt="example001" width="49%" src="https://github.com/es7s/holms/assets/50381946/a9c9bcdd-42d5-4038-a23a-22b91bb7cc7d">
  <img alt="example004" width="49%" src="https://github.com/es7s/holms/assets/50381946/fd1b4bc3-aacc-42af-8442-2db3c3984a13">
  <img alt="example002" width="49%" src="https://github.com/es7s/holms/assets/50381946/0a126747-3b29-44da-9d94-ab5f01a63d68">
  <img alt="example003" width="49%" src="https://github.com/es7s/holms/assets/50381946/8e217ae3-325c-4629-8cda-389882667aa4">
</div>

<details>
   <summary>Plain text output</summary>
   <!-- @sub:example001.png.txt -->

      > holms run  -u - <<<'1₂³⅘↉⏨'
    
      0  U+  31 ▕ 1 ▏ Nd DIGIT ONE
      1  U+2082 ▕ ₂ ▏ No SUBSCRIPT TWO
      4  U+  B3 ▕ ³ ▏ No SUPERSCRIPT THREE
      6  U+2158 ▕ ⅘ ▏ No VULGAR FRACTION FOUR FIFTHS
      9  U+2189 ▕ ↉ ▏ No VULGAR FRACTION ZERO THIRDS
      c  U+23E8 ▕ ⏨ ▏ So DECIMAL EXPONENT SYMBOL

   <!-- @sub -->
   <!-- @sub:example004.png.txt -->

      > holms run  -u - <<<'🌯👄🤡🎈🐳🐍'
    
      00  U1F32F ▕🌯 ▏ So BURRITO
      04  U1F444 ▕👄 ▏ So MOUTH
      08  U1F921 ▕🤡 ▏ So CLOWN FACE
      0c  U1F388 ▕🎈 ▏ So BALLOON
      10  U1F433 ▕🐳 ▏ So SPOUTING WHALE
      14  U1F40D ▕🐍 ▏ So SNAKE

   <!-- @sub -->
   <!-- @sub:example002.png.txt -->

      > holms run  -u - <<<'aаͣāãâȧäåₐᵃa'
    
      00  U+  61 ▕ a ▏ Ll LATIN SMALL LETTER A
      01  U+ 430 ▕ а ▏ Ll CYRILLIC SMALL LETTER A
      03  U+ 363 ▕  ͣ ▏ Mn COMBINING LATIN SMALL LETTER A
      05  U+ 101 ▕ ā ▏ Ll LATIN SMALL LETTER A WITH MACRON
      07  U+  E3 ▕ ã ▏ Ll LATIN SMALL LETTER A WITH TILDE
      09  U+  E2 ▕ â ▏ Ll LATIN SMALL LETTER A WITH CIRCUMFLEX
      0b  U+ 227 ▕ ȧ ▏ Ll LATIN SMALL LETTER A WITH DOT ABOVE
      0d  U+  E4 ▕ ä ▏ Ll LATIN SMALL LETTER A WITH DIAERESIS
      0f  U+  E5 ▕ å ▏ Ll LATIN SMALL LETTER A WITH RING ABOVE
      11  U+2090 ▕ ₐ ▏ Lm LATIN SUBSCRIPT SMALL LETTER A
      14  U+1D43 ▕ ᵃ ▏ Lm MODIFIER LETTER SMALL A
      17  U+FF41 ▕a ▏ Ll FULLWIDTH LATIN SMALL LETTER A

   <!-- @sub -->
   <!-- @sub:example003.png.txt -->

      > holms run  -u - <<<'%‰∞8᪲?¿‽⚠⚠️'
    
      00  U+  25 ▕ % ▏ Po PERCENT SIGN
      01  U+2030 ▕ ‰ ▏ Po PER MILLE SIGN
      04  U+221E ▕ ∞ ▏ Sm INFINITY
      07  U+  38 ▕ 8 ▏ Nd DIGIT EIGHT
      08  U+1AB2 ▕  ᪲ ▏ Mn COMBINING INFINITY
      0b  U+  3F ▕ ? ▏ Po QUESTION MARK
      0c  U+  BF ▕ ¿ ▏ Po INVERTED QUESTION MARK
      0e  U+203D ▕ ‽ ▏ Po INTERROBANG
      11  U+26A0 ▕ ⚠ ▏ So WARNING SIGN
      14  U+26A0 ▕ ⚠ ▏ So WARNING SIGN
      17  U+FE0F ▕  ️ ▏ Mn VARIATION SELECTOR-16

   <!-- @sub -->
</details> 
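
The breakdown shown above can be approximated with Python's standard `unicodedata` module. This is a minimal sketch rather than the application's actual code: the `breakdown` function name and the output layout are assumptions made for illustration. It shows how the byte offsets in the first column follow from the UTF-8 length of each preceding character.

```python
import unicodedata


def breakdown(text: str):
    """Yield (byte_offset, code_point, category, name) for each character."""
    offset = 0
    for ch in text:
        name = unicodedata.name(ch, "<unnamed>")  # control codes have no name
        yield offset, f"U+{ord(ch):04X}", unicodedata.category(ch), name
        offset += len(ch.encode("utf-8"))  # offsets count UTF-8 bytes, not characters


for row in breakdown("1₂³"):
    print(*row)
```

For the first three characters of the first example this yields the offsets 0, 1 and 4, since `₂` occupies three bytes in UTF-8.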


Buffering
---------------------------------

The application works in two modes: **buffered** (the default if INPUT is a
file) and **unbuffered** (default when reading from stdin). Options `-b`/`-u`
explicitly override output mode regardless of the default setting.

In **buffered** mode the result begins to appear only after EOF is encountered
(i.e., the WHOLE file has been read to the buffer). This is suitable for short
and predictable inputs and produces the most compact output with fixed column
sizes.

The **unbuffered** mode comes in handy when input is an endless piped stream:
the results will be displayed in real time, as soon as the type of each byte
sequence is determined, but the output column widths are not fixed and can vary
as the process goes further.

> Despite the name, the app actually uses a tiny (4-byte) input buffer, because
> this is the only way to handle a UTF-8 stream and distinguish valid sequences
> from broken ones; in a truly unbuffered mode the output would consist only of
> 7-bit ASCII characters (`0x00`-`0x7F`) and unrecognized binary data
> (`0x80`-`0xFF`), which is not what the application was made for.
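
To illustrate why a few bytes of buffering are unavoidable, here is a minimal sketch of a streaming decoder that never holds more than four pending bytes. It is not the application's implementation; `seq_len` and `stream_decode` are hypothetical names. Valid sequences are yielded as `str`, while every byte that cannot be part of a valid sequence is yielded as a single-byte `bytes` object.

```python
import io


def seq_len(first: int) -> int:
    """Expected UTF-8 sequence length for a lead byte, or 0 if it cannot lead."""
    if first < 0x80:
        return 1
    if 0xC2 <= first <= 0xDF:
        return 2
    if 0xE0 <= first <= 0xEF:
        return 3
    if 0xF0 <= first <= 0xF4:
        return 4
    return 0  # continuation byte or invalid lead byte


def stream_decode(stream):
    """Yield str for valid sequences and 1-byte bytes objects for the rest."""
    buf = b""
    eof = False
    while not eof or buf:
        if not eof:
            byte = stream.read(1)
            eof = not byte
            buf += byte
        while buf:
            need = seq_len(buf[0])
            if need > len(buf) and not eof:
                break  # the sequence may still be completed by upcoming bytes
            try:
                if need == 0:
                    raise ValueError("not a lead byte")
                yield buf[:need].decode("utf-8")  # raises on broken sequences
                buf = buf[need:]
            except ValueError:  # UnicodeDecodeError is a subclass of ValueError
                yield buf[:1]
                buf = buf[1:]


print(list(stream_decode(io.BytesIO(b"a\xc2\x80\x80"))))
```

Reading one byte at a time and emitting it immediately would make it impossible to tell a lead byte of a valid sequence from a stray one, which is the point of the note above.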


Configuration / Advanced usage
----------------------------------
[//]: # (@sub:help.txt)

    Options:
      -b, --buffered / -u, --unbuffered
                            Explicitly set to wait for EOF before processing the
                            output (buffered), or to stream the results in
                            parallel with reading, as soon as possible
                            (unbuffered). See BUFFERING section above for the
                            details.
      -m, --merge           Replace all sequences of repeating characters with one
                            of each, together with initial length of the sequence.
      -g, --group           Group the input by code points (=count unique), sort
                            descending and display counts instead of normal
                            output. Implies '--merge' and forces buffered ('-b')
                            mode. Specifying the option twice ('-gg') results in
                            grouping by code point category instead, while doing
                            it thrice ('-ggg') makes the app group the input by
                            super categories.
      -f, --format          Comma-separated list of columns to show (order is
                            preserved). Run 'holms format' to see the details.
      -n, --names           Display names instead of abbreviations. Affects `cat`
                            and `block` columns, but only if column in question is
                            already present on the screen. Note that these columns
                            can still display only the beginning of the attribute,
                            unless '-r' is provided.
      -a, --all             Display ALL columns.
      -r, --rigid           By default some columns can be compressed beyond the
                            nominal width, if all current values fit and there is
                            still space left. This option disables column
                            shrinking (but they still will be expanded when
                            needed).
      --decimal             Use decimal byte offsets instead of hexadecimal.
      --alt                 Use alternative notation for control characters: caret
                            notation for ASCII C0, octal notation for ASCII C1.
      --oneline             Discard all newline characters (0x0a LINE FEED) from
                            the input.
      --no-table            Do not format results as a table, just apply the
                            colors to characters (equivalent to '-f char', implies
                            '-b'). Compatible with '-merge', '--format' and even '
                            --group'.
      --no-override         Do not replace control/whitespace code point markers
                            with distinguishable characters ('▯' to '↵', '␣' etc).
                            Run 'holms legend' to see the details.
      -?, --help            Show this message and exit.

[//]: # (@sub)
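
The caret notation mentioned for `--alt` is the conventional way of writing ASCII C0 control codes as `^` followed by a printable character. The sketch below shows that convention only; how exactly the application renders it is not covered here, and the `caret` helper is a hypothetical name.

```python
def caret(code: int) -> str:
    """Caret notation for an ASCII C0 control code (0x00-0x1F) or DEL (0x7F)."""
    if not (0 <= code <= 0x1F or code == 0x7F):
        raise ValueError("not a C0 control code")
    # Flipping bit 6 maps 0x00-0x1F to '@'-'_' and 0x7F to '?'.
    return "^" + chr(code ^ 0x40)


for code in (0x00, 0x0A, 0x1B, 0x7F):
    print(f"0x{code:02X}", caret(code))
```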

Examples
--------------------------

### Output column selection

The `-f`/`--format` option specifies which columns to display. As an
alternative, the `-a`/`--all` option enables all currently available
columns.

<details>
  <summary><b>Column availability depending on operating mode</b></summary>

  <div align="center">
    <img alt="example010" src="https://github.com/es7s/holms/assets/50381946/62a6f354-1f30-4ee8-a8fc-533b1a980e03">
  </div>
</details>

The example below also demonstrates the `-m`/`--merge` option, which tells the
app to collapse runs of repeated characters into one output line while counting
them:

<div align="center">
  <img alt="example005" src="https://github.com/es7s/holms/assets/50381946/6da31546-0e50-4fa0-af69-0b7a8ed5d4c3">
</div>

<details>
   <summary>Plain text output</summary>
   <!-- @sub:example005.png.txt -->

      > holms run -m  phpstan.txt
    
      000  U+2B ▕ + ▏ Sm     PLUS SIGN
      001+ U+2D ▕ - ▏ Pd 27× HYPHEN-MINUS
      01c  U+2B ▕ + ▏ Sm     PLUS SIGN
      01d  U+20 ▕ ␣ ▏ Zs     SPACE
      01e  U+2B ▕ + ▏ Sm     PLUS SIGN
      01f+ U+2D ▕ - ▏ Pd 27× HYPHEN-MINUS
      03a  U+2B ▕ + ▏ Sm     PLUS SIGN
      03b  U+ A ▕ ↵ ▏ Cc     ASCII C0 [LF] LINE FEED
      03c  U+7C ▕ | ▏ Sm     VERTICAL LINE
      03d+ U+20 ▕ ␣ ▏ Zs 27× SPACE
     ...

   <!-- @sub -->
</details>
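
The merging behaviour amounts to run-length grouping. A minimal sketch with `itertools.groupby` follows; `merge_runs` is a hypothetical name and not part of the application.

```python
from itertools import groupby


def merge_runs(text: str):
    """Yield (byte_offset, char, run_length) for each run of identical characters."""
    offset = 0
    for ch, run in groupby(text):
        count = sum(1 for _ in run)
        yield offset, ch, count
        offset += count * len(ch.encode("utf-8"))  # advance by the run's byte size


line = "+" + "-" * 27 + "+ +"
for offset, ch, count in merge_runs(line):
    print(f"{offset:03x} {ch!r} {count}x")
```

For the table border used in the example, the second `+` lands at offset `01c` because the run of 27 hyphens starts at offset 1.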

### Reading from pipeline

There is an official Unicode Consortium data file included in the repository for
test purposes, named [confusables.txt](tests/data/confusables.txt). In the next
example we extract line **#3620** using `sed`, delete all TAB (`0x09`) characters
and feed the result to the application. The result demonstrates various Unicode
dot/bullet code points:

<div align="center">
    <img alt="example006" src="https://github.com/es7s/holms/assets/50381946/78a90c45-d331-46d9-998e-20c6c9a97f12">
</div>

<details>
   <summary>Plain text output</summary>
   <!-- @sub:example006.png.txt -->

      > sed confusables.txt -Ee 's/\t//g' -e '3620!d' |
        holms run  -
    
      00  U+  B7 ▕ · ▏ Po MIDDLE DOT
      02  U+1427 ▕ ᐧ ▏ Lo CANADIAN SYLLABICS FINAL MIDDLE DOT
      05  U+ 387 ▕ · ▏ Po GREEK ANO TELEIA
      07  U+2022 ▕ • ▏ Po BULLET
      0a  U+2027 ▕ ‧ ▏ Po HYPHENATION POINT
      0d  U+2219 ▕ ∙ ▏ Sm BULLET OPERATOR
      10  U+22C5 ▕ ⋅ ▏ Sm DOT OPERATOR
      13  U+30FB ▕・ ▏ Po KATAKANA MIDDLE DOT
      16  U10101 ▕ 𐄁 ▏ Po AEGEAN WORD SEPARATOR DOT
      1a  U+FF65 ▕ ・ ▏ Po HALFWIDTH KATAKANA MIDDLE DOT
      1d  U+   A ▕ ↵ ▏ Cc ASCII C0 [LF] LINE FEED

   <!-- @sub -->
</details>

### Code points / categories statistics

The `-g`/`--group` option can be used to count unique code points and to compute
the occurrence rate of each one:

<div align="center">
  <img alt="example008" src="https://github.com/es7s/holms/assets/50381946/f89be555-cf7e-4766-90b2-61a02140c54e">
</div>

<details>
   <summary>Plain text output</summary>
   <!-- @sub:example008.png.txt -->

      > holms run -g  ./tests/data/confusables.txt
    
     U+  20 ▕ ␣ ▏ Zs  12.5% ███ 62732× SPACE
     U+   9 ▕ ⇥ ▏ Cc   7.3% █▊  36745× ASCII C0 [HT] HORIZONTAL TABULATION
     U+  41 ▕ A ▏ Lu   6.1% █▍  30555× LATIN CAPITAL LETTER A
     U+  49 ▕ I ▏ Lu   5.2% █▏  26063× LATIN CAPITAL LETTER I
     U+  45 ▕ E ▏ Lu   5.0% █▏  24992× LATIN CAPITAL LETTER E
     U+  54 ▕ T ▏ Lu   3.7% ▉   18776× LATIN CAPITAL LETTER T
     U+  4C ▕ L ▏ Lu   3.7% ▉   18763× LATIN CAPITAL LETTER L
     U+200E ▕ ▯ ▏ Cf   3.7% ▉   18494× LEFT-TO-RIGHT MARK
     U+   A ▕ ↵ ▏ Cc   2.9% ▋   14609× ASCII C0 [LF] LINE FEED
     U+  43 ▕ C ▏ Lu   2.9% ▋   14450× LATIN CAPITAL LETTER C
     ...

   <!-- @sub -->
</details>
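
Counting unique code points and their shares can be sketched with `collections.Counter`. The `group_code_points` name is hypothetical, and the sketch ignores the table layout of the real output.

```python
from collections import Counter


def group_code_points(text: str):
    """Return (char, count, share) tuples, most frequent first."""
    counts = Counter(text)
    total = sum(counts.values())
    return [(ch, n, n / total) for ch, n in counts.most_common()]


for ch, n, share in group_code_points("AAB A"):
    print(f"U+{ord(ch):04X} {share:6.1%} {n}x")
```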

When used twice (`-gg`) or thrice (`-ggg`), the application groups the input by
code point category or code point super category, respectively, which can be used
e.g. for frequency analysis:

<div align="center">
  <img alt="example011" src="https://github.com/es7s/holms/assets/50381946/18018b0c-7978-48aa-b3be-4923167bb425">
  <img alt="example012" src="https://github.com/es7s/holms/assets/50381946/1128d864-aad9-4203-ae9c-af2ea0f3ad9f">
</div>

<details>
   <summary>Plain text output</summary>
   <!-- @sub:example011.png.txt -->

      > holms run -gg  ./tests/data/confusables.txt
    
      53.1% ██████████ 266233×  Uppercase_Letter
      12.5% ██▎         62748×  Space_Separator
      10.2% █▉          51356×  Control
       8.5% █▌          42511×  Decimal_Number
       3.7% ▋           18497×  Format
       3.0% ▌           14832×  Other_Letter
       2.0% ▎            9778×  Math_Symbol
       1.8% ▎            9261×  Close_Punctuation
       1.8% ▎            9259×  Open_Punctuation
       1.5% ▎            7525×  Other_Punctuation
     ...

   <!-- @sub -->
   <!-- @sub:example012.png.txt -->

      > holms run -ggg  ./tests/data/confusables.txt
    
      56.7% ██████████ 284074×  Letter
      13.9% ██▍         69853×  Other(C)
      12.5% ██▏         62750×  Separator(Z)
       8.5% █▌          42796×  Number
       5.9% █           29571×  Punctuation
       2.2% ▍           11072×  Symbol
       0.2% ▏             965×  Mark

   <!-- @sub -->
</details>
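
Grouping by category can be sketched in the same way using `unicodedata.category`. The sketch assumes that a "super category" corresponds to the first letter of the two-letter general category (as labels like `Other(C)` and `Separator(Z)` suggest); `group_categories` is a hypothetical name.

```python
import unicodedata
from collections import Counter


def group_categories(text: str, major: bool = False) -> Counter:
    """Count characters by general category ('Lu'), or by major class ('L')."""
    cats = (unicodedata.category(ch) for ch in text)
    return Counter(cat[0] if major else cat for cat in cats)


print(group_categories("Ab 1").most_common())
print(group_categories("Ab 1", major=True).most_common())
```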

### In-place type highlighting

When `--format` is specified as exactly one column, `char` (`--format=char`),
the application omits all other columns and prints the original file contents,
highlighting each character with a color that indicates its Unicode
category.

> Note that ASCII control codes, as well as Unicode ones, are kept
> untouched and invisible.

<div align="center">
  <img alt="example007" src="https://github.com/es7s/holms/assets/50381946/78ca318c-e295-41ff-b37d-d45d95842295">
</div>

<details>
   <summary>Plain text output</summary>
   <!-- @sub:example007.png.txt -->

      > sed chars.txt -nEe 1,12p |
        holms run --format=char  -
    
       ! " # $ % & ' ( ) * + , - . /
     0 1 2 3 4 5 6 7 8 9 : ; < = > ?
     @ A B C D E F G H I J K L M N O
     P Q R S T U V W X Y Z [ \ ] ^ _
     ` a b c d e f g h i j k l m n o
     p q r s t u v w x y z { | } ~
       ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ­ ® ¯
     ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿
     À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï
     Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß
     à á â ã ä å æ ç è é ê ë ì í î ï
     ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ

   <!-- @sub -->
</details>


ASCII Latin letters (`A-Za-z`) are colored 50% gray instead of regular white on
purpose: this can be extremely helpful when the task is to find non-ASCII
character(s) in a massive text of plain ASCII ones, or vice versa.
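
The idea of dimming ASCII letters so that everything else stands out can be sketched with plain ANSI escape sequences. The colors and the `highlight` function below are assumptions chosen for illustration and do not reproduce the application's palette.

```python
GRAY = "\x1b[90m"  # "bright black", rendered as gray by most terminals
RED = "\x1b[91m"   # arbitrary accent color for non-ASCII characters
RESET = "\x1b[0m"


def highlight(text: str) -> str:
    """Dim ASCII letters, emphasize non-ASCII characters, keep the rest as is."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            out.append(f"{GRAY}{ch}{RESET}")
        elif not ch.isascii():
            out.append(f"{RED}{ch}{RESET}")
        else:
            out.append(ch)
    return "".join(out)


# The third letter is CYRILLIC SMALL LETTER A, which looks like Latin "a".
print(highlight("pаypal"))
```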

Below is a real example of broken characters which are the result of two
operations being applied in the wrong order: *UTF-8 decoding* and *URL %-based
unescaping*. This error is different from incorrect codepage selection errors,
which mess up the whole text or a part of it: here all byte sequences are valid
UTF-8 encoded code points, but the result differs from the original and is
nevertheless completely unreadable.

<div align="center">
  <img alt="example015" src="https://github.com/es7s/holms/assets/50381946/738b5bbe-291f-4ade-bf97-66c1e8368281">
</div>
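
One way such damage can arise is sketched below with `urllib.parse`. The exact pipeline that produced the text in the screenshot is not known, so treat this as an illustration under the assumption that each escaped byte was converted to a character on its own before any UTF-8 decoding took place.

```python
from urllib.parse import unquote, unquote_to_bytes

escaped = "%D0%B0"  # percent-encoded UTF-8 bytes of CYRILLIC SMALL LETTER A

# Correct order: unescape to raw bytes first, then decode the bytes as UTF-8.
right = unquote_to_bytes(escaped).decode("utf-8")

# Wrong order: every escaped byte becomes a code point by itself (Latin-1 here),
# so the UTF-8 decoding step never sees the original byte sequence.
wrong = unquote(escaped, encoding="latin-1")

print(right, [f"U+{ord(c):04X}" for c in right])
print(wrong, [f"U+{ord(c):04X}" for c in wrong])
print(wrong.encode("utf-8"))  # still perfectly valid UTF-8, yet unreadable
```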


### ASCII C0 / C1 details

While developing the application I encountered behaviour of the Python
interpreter that seemed strange at first: it encoded C1 control bytes as two
bytes of UTF-8, while C0 control bytes were output as single bytes, as if they
had been encoded in plain ASCII. A bit of research followed.

According to [ISO/IEC 6429 (ECMA-48)](https://www.iso.org/standard/12782.html),
there are two types of ASCII control codes (to be precise, there are many more,
but for our purposes they are mostly irrelevant): C0 and C1. The first type
includes ASCII code points `0x00`-`0x1F` and `0x7F` (some authors also include
the regular space character `0x20` in this list), and its characteristic
property is that all C0 code points are encoded in UTF-8 **exactly the same** as
they are in 7-bit US-ASCII ([ISO/IEC 646](https://www.iso.org/standard/4777.html)).
This helps to determine which encoding is used even for broken byte sequences,
when the task is to tell whether a byte represents a single code point or is
actually a part of a multibyte UTF-8 sequence.

However, C1 control codes are represented by bytes `0x80`-`0x9F`, which are also
valid bytes within multibyte UTF-8 sequences. In order to distinguish the first
type from the second, UTF-8 encodes them as two-byte sequences instead (`0x80` →
`0xC280`, etc.); this applies not only to control codes, but to all other
[ISO/IEC 8859](https://www.iso.org/standard/28245.html) code points starting
from `0x80`.

With this in mind, let's see how the application reflects these differences.
The first command produces several 8-bit ASCII C1 control codes, which are
classified as raw binary/non-UTF-8 data, while the second command's output
consists of the very same code points encoded in UTF-8 (thanks to Python's
transparent Unicode support, we don't even need to think much about
encodings):

<div align="center">
  <img alt="example013" src="https://github.com/es7s/holms/assets/50381946/884d3269-6323-41f1-9eab-6dccd83c5d6d">
</div>

<details>
   <summary>Plain text output</summary>
   <!-- @sub:example013.png.txt -->

      > ( printf "\x80\x90\x9f" && python3 -c 'print("\x80\x90\x9f", end="")' ) |
        holms run --names --decimal --all  -
    
     ⏨0  #0   0x    80  --  ▕ ▯ ▏ NON UTF-8 BYTE 0x80                                      -- Binary
     ⏨1  #1   0x    90  --  ▕ ▯ ▏ NON UTF-8 BYTE 0x90                                      -- Binary
     ⏨2  #2   0x    9f  --  ▕ ▯ ▏ NON UTF-8 BYTE 0x9F                                      -- Binary
    
     ⏨3  #3   0x c2 80 U+80 ▕ ▯ ▏ ASCII C1 [PC] PADDING CHARACTER            Latin-1 Supplem‥ Control
     ⏨5  #4   0x c2 90 U+90 ▕ ▯ ▏ ASCII C1 [DCS] DEVICE CONTROL STRING       Latin-1 Supplem‥ Control
     ⏨7  #5   0x c2 9f U+9F ▕ ▯ ▏ ASCII C1 [APC] APPLICATION PROGRAM COMMAND Latin-1 Supplem‥ Control

   <!-- @sub -->
</details>
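
The same distinction can be checked directly in Python. The snippet below is a small sketch that relies only on the built-in codecs; `is_valid_utf8` is a helper name introduced here for illustration.

```python
c0 = "\x1b"  # ESC, a C0 control code
c1 = "\x80"  # PADDING CHARACTER, a C1 control code

assert c0.encode("utf-8") == b"\x1b"      # the same single byte as in ASCII
assert c1.encode("utf-8") == b"\xc2\x80"  # a two-byte UTF-8 sequence
assert c1.encode("latin-1") == b"\x80"    # a single byte in ISO/IEC 8859-1


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the whole byte string decodes as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(b"\x80\x90\x9f"))                  # raw 8-bit C1 bytes
print(is_valid_utf8("\x80\x90\x9f".encode("utf-8")))   # the same code points in UTF-8
```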

Legend
------------------

The image below illustrates the color scheme developed specifically for the app
to make it easier to distinguish code points of one category from the others.

<div align="center">
  <img alt="example009" src="https://github.com/es7s/holms/assets/50381946/f9cac3b0-adab-45a3-a324-174ad7f06d44">
</div>

The most frequently encountered control codes also have unique replacement
characters, which makes it possible to recognize them without reading the label
or memorizing code point identifiers:

<div align="center">
  <img alt="example014" src="https://github.com/es7s/holms/assets/50381946/2b77d06a-5e3d-4837-973c-78454e687113">
</div>

<details>
<summary><b>Unicode Blocks</b></summary>
    <div align="center">
            <img alt="blocks" src="https://github.com/es7s/holms/assets/50381946/8244553b-fc2d-419e-8b11-388ed0738bad"/>
    </div>
</details>

Changelog
------------------

[CHANGES.rst](CHANGES.rst)

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "holms",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "analyzer, breakdown, console, terminal, text, unicode",
    "author": null,
    "author_email": "Aleksandr Shavykin <0.delameter@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/8f/c0/dde00bb1644b0989641551f572dd6fd08be1e5f81016009070397fa11c60/holms-1.6.0.tar.gz",
    "platform": null,
    "description": "<h1 align=\"center\">\n   <!-- es7s/holms -->\n   <a href=\"##\"><img align=\"left\" src=\"https://s3.eu-north-1.amazonaws.com/dp2.dl/readme/es7s/holms/logo.png?v=2\" width=\"160\" height=\"64\"></a>\n   <a href=\"##\"><img align=\"center\" src=\"https://s3.eu-north-1.amazonaws.com/dp2.dl/readme/es7s/holms/label.png\" width=\"200\" height=\"64\"></a>\n   <a href=\"##\"><img align=\"right\" src=\"https://s3.eu-north-1.amazonaws.com/dp2.dl/readme/empty.png\" width=\"160\" height=\"64\"></a>\n</h1>\n<div align=\"right\">\n <a href=\"##\"><img src=\"https://img.shields.io/badge/python-3.10-3776AB?logo=python&logoColor=white&labelColor=333333\"></a>\n <a href=\"https://pepy.tech/project/holms/\"><img alt=\"Downloads\" src=\"https://pepy.tech/badge/holms\"></a>\n <a href=\"https://pypi.org/project/holms/\"><img alt=\"PyPI\" src=\"https://img.shields.io/pypi/v/holms\"></a>\n <a href='https://coveralls.io/github/es7s/holms?branch=master'><img src='https://coveralls.io/repos/github/es7s/holms/badge.svg?branch=master' alt='Coverage Status' /></a>\n <a href=\"https://github.com/psf/black\"><img alt=\"Code style: black\" src=\"https://img.shields.io/badge/code%20style-black-000000.svg\"></a>\n <a href=\"##\"><img src=\"https://wakatime.com/badge/user/8eb9e217-791b-436f-b729-81eb63e84b08/project/018b5923-4968-4029-ae8d-3776792f88d5.svg\"></a>\n</div>\n<br>\n\nCLI UTF-8 decomposer for text analysis capable of displaying Unicode code point\nnames and categories, along with ASCII control characters, UTF-16 surrogate pair\npieces, invalid UTF-8 sequences parts as separate bytes, etc.\n\n\nMotivation\n---------------------------\n\nA necessity for a tool that can quickly identify otherwise indistinguishable\nUnicode code points.\n\n\nInstallation\n---------------------------\n### With `pipx` (recommended)\n    pipx install holms\n\n### From git repository\n    curl -sS https://github.com/es7s/holms/blob/master/install.sh | sh\n\n\nBasic 
usage\n---------------------------\n\n    Usage: holms run [OPTIONS] [INPUT]\n    \n      Read data from INPUT file, find all valid UTF-8 byte sequences, decode them and display as\n      separate Unicode code points. Use '-' as INPUT to read from stdin instead.\n\n<div align=\"center\">\n  <img alt=\"example001\" width=\"49%\" src=\"https://github.com/es7s/holms/assets/50381946/a9c9bcdd-42d5-4038-a23a-22b91bb7cc7d\">\n  <img alt=\"example004\" width=\"49%\" src=\"https://github.com/es7s/holms/assets/50381946/fd1b4bc3-aacc-42af-8442-2db3c3984a13\">\n  <img alt=\"example002\" width=\"49%\" src=\"https://github.com/es7s/holms/assets/50381946/0a126747-3b29-44da-9d94-ab5f01a63d68\">\n  <img alt=\"example003\" width=\"49%\" src=\"https://github.com/es7s/holms/assets/50381946/8e217ae3-325c-4629-8cda-389882667aa4\">\n</div>\n\n<details>\n   <summary>Plain text output</summary>\n   <!-- @sub:example001.png.txt -->\n\n      > holms run  -u - <<<'1\u2082\u00b3\u2158\u2189\u23e8'\n    \n      0  U+  31 \u2595 1 \u258f Nd DIGIT ONE\n      1  U+2082 \u2595 \u2082 \u258f No SUBSCRIPT TWO\n      4  U+  B3 \u2595 \u00b3 \u258f No SUPERSCRIPT THREE\n      6  U+2158 \u2595 \u2158 \u258f No VULGAR FRACTION FOUR FIFTHS\n      9  U+2189 \u2595 \u2189 \u258f No VULGAR FRACTION ZERO THIRDS\n      c  U+23E8 \u2595 \u23e8 \u258f So DECIMAL EXPONENT SYMBOL\n\n   <!-- @sub -->\n   <!-- @sub:example004.png.txt -->\n\n      > holms run  -u - <<<'\ud83c\udf2f\ud83d\udc44\ud83e\udd21\ud83c\udf88\ud83d\udc33\ud83d\udc0d'\n    \n      00  U1F32F \u2595\ud83c\udf2f \u258f So BURRITO\n      04  U1F444 \u2595\ud83d\udc44 \u258f So MOUTH\n      08  U1F921 \u2595\ud83e\udd21 \u258f So CLOWN FACE\n      0c  U1F388 \u2595\ud83c\udf88 \u258f So BALLOON\n      10  U1F433 \u2595\ud83d\udc33 \u258f So SPOUTING WHALE\n      14  U1F40D \u2595\ud83d\udc0d \u258f So SNAKE\n\n   <!-- @sub -->\n   <!-- @sub:example002.png.txt -->\n\n      > holms run  -u - 
<<<'a\u0430\u0363\u0101\u00e3\u00e2\u0227\u00e4\u00e5\u2090\u1d43\uff41'\n    \n      00  U+  61 \u2595 a \u258f Ll LATIN SMALL LETTER A\n      01  U+ 430 \u2595 \u0430 \u258f Ll CYRILLIC SMALL LETTER A\n      03  U+ 363 \u2595  \u0363 \u258f Mn COMBINING LATIN SMALL LETTER A\n      05  U+ 101 \u2595 \u0101 \u258f Ll LATIN SMALL LETTER A WITH MACRON\n      07  U+  E3 \u2595 \u00e3 \u258f Ll LATIN SMALL LETTER A WITH TILDE\n      09  U+  E2 \u2595 \u00e2 \u258f Ll LATIN SMALL LETTER A WITH CIRCUMFLEX\n      0b  U+ 227 \u2595 \u0227 \u258f Ll LATIN SMALL LETTER A WITH DOT ABOVE\n      0d  U+  E4 \u2595 \u00e4 \u258f Ll LATIN SMALL LETTER A WITH DIAERESIS\n      0f  U+  E5 \u2595 \u00e5 \u258f Ll LATIN SMALL LETTER A WITH RING ABOVE\n      11  U+2090 \u2595 \u2090 \u258f Lm LATIN SUBSCRIPT SMALL LETTER A\n      14  U+1D43 \u2595 \u1d43 \u258f Lm MODIFIER LETTER SMALL A\n      17  U+FF41 \u2595\uff41 \u258f Ll FULLWIDTH LATIN SMALL LETTER A\n\n   <!-- @sub -->\n   <!-- @sub:example003.png.txt -->\n\n      > holms run  -u - <<<'%\u2030\u221e8\u1ab2?\u00bf\u203d\u26a0\u26a0\ufe0f'\n    \n      00  U+  25 \u2595 % \u258f Po PERCENT SIGN\n      01  U+2030 \u2595 \u2030 \u258f Po PER MILLE SIGN\n      04  U+221E \u2595 \u221e \u258f Sm INFINITY\n      07  U+  38 \u2595 8 \u258f Nd DIGIT EIGHT\n      08  U+1AB2 \u2595  \u1ab2 \u258f Mn COMBINING INFINITY\n      0b  U+  3F \u2595 ? \u258f Po QUESTION MARK\n      0c  U+  BF \u2595 \u00bf \u258f Po INVERTED QUESTION MARK\n      0e  U+203D \u2595 \u203d \u258f Po INTERROBANG\n      11  U+26A0 \u2595 \u26a0 \u258f So WARNING SIGN\n      14  U+26A0 \u2595 \u26a0 \u258f So WARNING SIGN\n      17  U+FE0F \u2595  \ufe0f \u258f Mn VARIATION SELECTOR-16\n\n   <!-- @sub -->\n</details> \n\n\nBuffering\n---------------------------------\n\nThe application works in two modes: **buffered** (the default if INPUT is a\nfile) and **unbuffered** (default when reading from stdin). 
Options `-b`/`-u`\nexplicitly override output mode regardless of the default setting.\n\nIn **buffered** mode the result begins to appear only after EOF is encountered\n(i.e., the WHOLE file has been read to the buffer). This is suitable for short\nand predictable inputs and produces the most compact output with fixed column\nsizes.\n\nThe **unbuffered** mode comes in handy when input is an endless piped stream:\nthe results will be displayed in real time, as soon as the type of each byte\nsequence is determined, but the output column widths are not fixed and can vary\nas the process goes further.\n\n> Despite the name, the app actually uses tiny (4 bytes) input buffer, but it's\n> the only way to handle UTF-8 stream and distinguish valid sequences from broken\n> ones; in truly unbuffered mode the output would consist of ASCII-7 characters\n> (`0x00`-`0x7F`) and unrecognized binary data (`0x80`-`0xFF`) only, which is not\n> something the application was made for.\n\n\nConfiguration / Advanced usage\n----------------------------------\n[//]: # (@sub:help.txt)\n\n    Options:\n      -b, --buffered / -u, --unbuffered\n                            Explicitly set to wait for EOF before processing the\n                            output (buffered), or to stream the results in\n                            parallel with reading, as soon as possible\n                            (unbuffered). See BUFFERING section above for the\n                            details.\n      -m, --merge           Replace all sequences of repeating characters with one\n                            of each, together with initial length of the sequence.\n      -g, --group           Group the input by code points (=count unique), sort\n                            descending and display counts instead of normal\n                            output. Implies '--merge' and forces buffered ('-b')\n                            mode. 
Specifying the option twice ('-gg') results in\n                            grouping by code point category instead, while doing\n                            it thrice ('-ggg') makes the app group the input by\n                            super categories.\n      -f, --format          Comma-separated list of columns to show (order is\n                            preserved). Run 'holms format' to see the details.\n      -n, --names           Display names instead of abbreviations. Affects `cat`\n                            and `block` columns, but only if column in question is\n                            already present on the screen. Note that these columns\n                            can still display only the beginning of the attribute,\n                            unless '-r' is provided.\n      -a, --all             Display ALL columns.\n      -r, --rigid           By default some columns can be compressed beyond the\n                            nominal width, if all current values fit and there is\n                            still space left. This option disables column\n                            shrinking (but they still will be expanded when\n                            needed).\n      --decimal             Use decimal byte offsets instead of hexadecimal.\n      --alt                 Use alternative notation for control characters: caret\n                            notation for ASCII C0, octal notation for ASCII C1.\n      --oneline             Discard all newline characters (0x0a LINE FEED) from\n                            the input.\n      --no-table            Do not format results as a table, just apply the\n                            colors to characters (equivalent to '-f char', implies\n                            '-b'). 
Compatible with '-merge', '--format' and even '\n                            --group'.\n      --no-override         Do not replace control/whitespace code point markers\n                            with distinguishable characters ('\u25af' to '\u21b5', '\u2423' etc).\n                            Run 'holms legend' to see the details.\n      -?, --help            Show this message and exit.\n\n[//]: # (@sub)\n\nExamples\n--------------------------\n\n### Output column selection\n\nOption `-f`/`--filter` can be used to specify what columns to display. As an\nalternative, there is an `-a`/`--all` option that enables displaying of all\ncurrently available columns.\n\n<details>\n  <summary><b>Column availability depending on operating mode</b></summary>\n\n  <div align=\"center\">\n    <img alt=\"example010\" src=\"https://github.com/es7s/holms/assets/50381946/62a6f354-1f30-4ee8-a8fc-533b1a980e03\">\n  </div>\n</details>\n\nAlso `-m`/`--merge` option is demonstrated, which tells the app to collapse\nrepetitive characters into one line of the output while counting them:\n\n<div align=\"center\">\n  <img alt=\"example005\" src=\"https://github.com/es7s/holms/assets/50381946/6da31546-0e50-4fa0-af69-0b7a8ed5d4c3\">\n</div>\n\n<details>\n   <summary>Plain text output</summary>\n   <!-- @sub:example005.png.txt -->\n\n      > holms run -m  phpstan.txt\n    \n      000  U+2B \u2595 + \u258f Sm     PLUS SIGN\n      001+ U+2D \u2595 - \u258f Pd 27\u00d7 HYPHEN-MINUS\n      01c  U+2B \u2595 + \u258f Sm     PLUS SIGN\n      01d  U+20 \u2595 \u2423 \u258f Zs     SPACE\n      01e  U+2B \u2595 + \u258f Sm     PLUS SIGN\n      01f+ U+2D \u2595 - \u258f Pd 27\u00d7 HYPHEN-MINUS\n      03a  U+2B \u2595 + \u258f Sm     PLUS SIGN\n      03b  U+ A \u2595 \u21b5 \u258f Cc     ASCII C0 [LF] LINE FEED\n      03c  U+7C \u2595 | \u258f Sm     VERTICAL LINE\n      03d+ U+20 \u2595 \u2423 \u258f Zs 27\u00d7 SPACE\n     ...\n\n   <!-- @sub -->\n</details>\n\n### Reading from pipeline\n\nThere is an 
official Unicode Consortium data file included in the repository for\ntest purposes, named [confusables.txt](tests/data/confusables.txt). In the next\nexample we extract line **#3620** using `sed`, delete all TAB (`0x09`) characters\nand feed the result to the application. The result demonstrates various Unicode\ndot/bullet code points:\n\n<div align=\"center\">\n    <img alt=\"example006\" src=\"https://github.com/es7s/holms/assets/50381946/78a90c45-d331-46d9-998e-20c6c9a97f12\">\n</div>\n\n<details>\n   <summary>Plain text output</summary>\n   <!-- @sub:example006.png.txt -->\n\n      > sed confusables.txt -Ee 'sg' -e '3620!d' |\n      \u00a0\u00a0holms run  -\n    \n      00  U+  B7 \u2595 \u00b7 \u258f Po MIDDLE DOT\n      02  U+1427 \u2595 \u1427 \u258f Lo CANADIAN SYLLABICS FINAL MIDDLE DOT\n      05  U+ 387 \u2595 \u0387 \u258f Po GREEK ANO TELEIA\n      07  U+2022 \u2595 \u2022 \u258f Po BULLET\n      0a  U+2027 \u2595 \u2027 \u258f Po HYPHENATION POINT\n      0d  U+2219 \u2595 \u2219 \u258f Sm BULLET OPERATOR\n      10  U+22C5 \u2595 \u22c5 \u258f Sm DOT OPERATOR\n      13  U+30FB \u2595\u30fb \u258f Po KATAKANA MIDDLE DOT\n      16  U10101 \u2595 \ud800\udd01 \u258f Po AEGEAN WORD SEPARATOR DOT\n      1a  U+FF65 \u2595 \uff65 \u258f Po HALFWIDTH KATAKANA MIDDLE DOT\n      1d  U+   A \u2595 \u21b5 \u258f Cc ASCII C0 [LF] LINE FEED\n\n   <!-- @sub -->\n</details>\n\n### Code points / categories statistics\n\nThe `-g`/`--group` option can be used to count unique code points, and to compute\nthe occurrence rate of each one:\n\n<div align=\"center\">\n  <img alt=\"example008\" src=\"https://github.com/es7s/holms/assets/50381946/f89be555-cf7e-4766-90b2-61a02140c54e\">\n</div>\n\n<details>\n   <summary>Plain text output</summary>\n   <!-- @sub:example008.png.txt -->\n\n      > holms run -g  ./tests/data/confusables.txt\n    \n     U+  20 \u2595 \u2423 \u258f Zs  12.5% \u2588\u2588\u2588 62732\u00d7 SPACE\n     U+   9 \u2595 \u21e5 \u258f Cc   7.3% \u2588\u258a  
36745\u00d7 ASCII C0 [HT] HORIZONTAL TABULATION\n     U+  41 \u2595 A \u258f Lu   6.1% \u2588\u258d  30555\u00d7 LATIN CAPITAL LETTER A\n     U+  49 \u2595 I \u258f Lu   5.2% \u2588\u258f  26063\u00d7 LATIN CAPITAL LETTER I\n     U+  45 \u2595 E \u258f Lu   5.0% \u2588\u258f  24992\u00d7 LATIN CAPITAL LETTER E\n     U+  54 \u2595 T \u258f Lu   3.7% \u2589   18776\u00d7 LATIN CAPITAL LETTER T\n     U+  4C \u2595 L \u258f Lu   3.7% \u2589   18763\u00d7 LATIN CAPITAL LETTER L\n     U+200E \u2595 \u25af \u258f Cf   3.7% \u2589   18494\u00d7 LEFT-TO-RIGHT MARK\n     U+   A \u2595 \u21b5 \u258f Cc   2.9% \u258b   14609\u00d7 ASCII C0 [LF] LINE FEED\n     U+  43 \u2595 C \u258f Lu   2.9% \u258b   14450\u00d7 LATIN CAPITAL LETTER C\n     ...\n\n   <!-- @sub -->\n</details>\n\nWhen used twice (`-gg`) or thrice (`-ggg`), the application groups the input by\ncode point category or code point super category, respectively, which can be used\ne.g. for frequency domain analysis:\n\n<div align=\"center\">\n  <img alt=\"example011\" src=\"https://github.com/es7s/holms/assets/50381946/18018b0c-7978-48aa-b3be-4923167bb425\">\n  <img alt=\"example012\" src=\"https://github.com/es7s/holms/assets/50381946/1128d864-aad9-4203-ae9c-af2ea0f3ad9f\">\n</div>\n\n<details>\n   <summary>Plain text output</summary>\n   <!-- @sub:example011.png.txt -->\n\n      > holms run -gg  ./tests/data/confusables.txt\n    \n      53.1% \u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588 266233\u00d7  Uppercase_Letter\n      12.5% \u2588\u2588\u258e         62748\u00d7  Space_Separator\n      10.2% \u2588\u2589          51356\u00d7  Control\n       8.5% \u2588\u258c          42511\u00d7  Decimal_Number\n       3.7% \u258b           18497\u00d7  Format\n       3.0% \u258c           14832\u00d7  Other_Letter\n       2.0% \u258e            9778\u00d7  Math_Symbol\n       1.8% \u258e            9261\u00d7  Close_Punctuation\n       1.8% \u258e            9259\u00d7  Open_Punctuation\n       1.5% \u258e   
         7525\u00d7  Other_Punctuation\n     ...\n\n   <!-- @sub -->\n   <!-- @sub:example012.png.txt -->\n\n      > holms run -ggg  ./tests/data/confusables.txt\n    \n      56.7% \u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588 284074\u00d7  Letter\n      13.9% \u2588\u2588\u258d         69853\u00d7  Other(C)\n      12.5% \u2588\u2588\u258f         62750\u00d7  Separator(Z)\n       8.5% \u2588\u258c          42796\u00d7  Number\n       5.9% \u2588           29571\u00d7  Punctuation\n       2.2% \u258d           11072\u00d7  Symbol\n       0.2% \u258f             965\u00d7  Mark\n\n   <!-- @sub -->\n</details>\n\n### In-place type highlighting\n\nWhen `--format` is specified as exactly one `char` column (`--format=char`),\nthe application omits all the columns and prints the original file contents,\nwhile highlighting each character with a color that indicates its Unicode\ncategory.\n\n> Note that ASCII control codes, as well as Unicode ones, are kept\nuntouched and invisible.\n\n<div align=\"center\">\n  <img alt=\"example007\" src=\"https://github.com/es7s/holms/assets/50381946/78ca318c-e295-41ff-b37d-d45d95842295\">\n</div>\n\n<details>\n   <summary>Plain text output</summary>\n   <!-- @sub:example007.png.txt -->\n\n      > sed chars.txt -nEe 1,12p |\n      \u00a0\u00a0holms run --format=char  -\n    \n       ! \" # $ % & ' ( ) * + , - . 
/\n     0 1 2 3 4 5 6 7 8 9 : ; < = > ?\n     @ A B C D E F G H I J K L M N O\n     P Q R S T U V W X Y Z [ \\ ] ^ _\n     ` a b c d e f g h i j k l m n o\n     p q r s t u v w x y z { | } ~\n       \u00a1 \u00a2 \u00a3 \u00a4 \u00a5 \u00a6 \u00a7 \u00a8 \u00a9 \u00aa \u00ab \u00ac \u00ad \u00ae \u00af\n     \u00b0 \u00b1 \u00b2 \u00b3 \u00b4 \u00b5 \u00b6 \u00b7 \u00b8 \u00b9 \u00ba \u00bb \u00bc \u00bd \u00be \u00bf\n     \u00c0 \u00c1 \u00c2 \u00c3 \u00c4 \u00c5 \u00c6 \u00c7 \u00c8 \u00c9 \u00ca \u00cb \u00cc \u00cd \u00ce \u00cf\n     \u00d0 \u00d1 \u00d2 \u00d3 \u00d4 \u00d5 \u00d6 \u00d7 \u00d8 \u00d9 \u00da \u00db \u00dc \u00dd \u00de \u00df\n     \u00e0 \u00e1 \u00e2 \u00e3 \u00e4 \u00e5 \u00e6 \u00e7 \u00e8 \u00e9 \u00ea \u00eb \u00ec \u00ed \u00ee \u00ef\n     \u00f0 \u00f1 \u00f2 \u00f3 \u00f4 \u00f5 \u00f6 \u00f7 \u00f8 \u00f9 \u00fa \u00fb \u00fc \u00fd \u00fe \u00ff\n\n   <!-- @sub -->\n</details>\n\n\nASCII Latin letters (`A-Za-z`) are deliberately colored 50% gray instead of the regular\nwhite; this can be extremely helpful when the task is to find\nnon-ASCII characters in a massive text of plain ASCII ones, or vice versa.\n\nBelow is a real example of broken characters which are the result of two\noperations being applied in the wrong order: *UTF-8 decoding* and *URL %-based\nunescaping*. 
This error is different from incorrect codepage selection errors,\nwhich mess up the whole text or a part of it; here all byte sequences are valid UTF-8\nencoded code points, but the result differs from the original and is completely\nunreadable nevertheless.\n\n<div align=\"center\">\n  <img alt=\"example015\" src=\"https://github.com/es7s/holms/assets/50381946/738b5bbe-291f-4ade-bf97-66c1e8368281\">\n</div>\n\n\n### ASCII C0 / C1 details\n\nWhile developing the application I encountered what seemed at first to be strange\nbehaviour of the Python interpreter: it encoded C1 control bytes\nas two bytes of UTF-8, while C0 control bytes were output as single bytes, as if\nthey had been encoded in plain ASCII. This prompted a bit of research.\n\nAccording to [ISO/IEC 6429 (ECMA-48)](https://www.iso.org/standard/12782.html),\nthere are two types of ASCII control codes (to be precise, many more, but for\nour purposes the rest are mostly irrelevant): C0 and C1. The first one includes ASCII\ncode points `0x00`-`0x1F` and `0x7F` (some authors also include the regular space\ncharacter `0x20` in this list), and the characteristic property of this type is\nthat all C0 code points are encoded in UTF-8 **exactly the same** as they are in\n7-bit US-ASCII ([ISO/IEC 646](https://www.iso.org/standard/4777.html)). This\nhelps to determine which encoding is used even for broken byte\nsequences, when the task is to tell whether a byte represents a single code point\nor is actually a part of a multibyte UTF-8 sequence.\n\nHowever, C1 control codes are represented by the bytes `0x80`-`0x9F`, which are also\nvalid bytes within multibyte UTF-8 sequences. 
In order to distinguish the two cases,\nUTF-8 encodes C1 code points as two-byte sequences instead (`0x80` \u2192\n`0xC280`, etc.); this applies not only to control codes, but to all other\n[ISO/IEC 8859](https://www.iso.org/standard/28245.html) code points starting\nfrom `0x80`.\n\nWith this in mind, let's see how the application reflects these differences.\nThe first command produces several 8-bit ASCII C1 control codes, which are\nclassified as raw binary/non-UTF-8 data, while the second command's output\nconsists of the very same code points encoded in UTF-8 (thanks to\nPython's transparent Unicode support, we don't even need to worry much\nabout encodings):\n\n<div align=\"center\">\n  <img alt=\"example013\" src=\"https://github.com/es7s/holms/assets/50381946/884d3269-6323-41f1-9eab-6dccd83c5d6d\">\n</div>\n\n<details>\n   <summary>Plain text output</summary>\n   <!-- @sub:example013.png.txt -->\n\n      > printf \"\\x80\\x90\\x9f\" && python3 -c 'print(\"\\x80\\x90\\x9f\", end=\"\")' |\n      \u00a0\u00a0holms run --names --decimal --all  -\n    \n     \u23e80  #0   0x    80  --  \u2595 \u25af \u258f NON UTF-8 BYTE 0x80                                      -- Binary\n     \u23e81  #1   0x    90  --  \u2595 \u25af \u258f NON UTF-8 BYTE 0x90                                      -- Binary\n     \u23e82  #2   0x    9f  --  \u2595 \u25af \u258f NON UTF-8 BYTE 0x9F                                      -- Binary\n    \n     \u23e83  #3   0x c2 80 U+80 \u2595 \u25af \u258f ASCII C1 [PC] PADDING CHARACTER            Latin-1 Supplem\u2025 Control\n     \u23e85  #4   0x c2 90 U+90 \u2595 \u25af \u258f ASCII C1 [DCS] DEVICE CONTROL STRING       Latin-1 Supplem\u2025 Control\n     \u23e87  #5   0x c2 9f U+9F \u2595 \u25af \u258f ASCII C1 [APC] APPLICATION PROGRAM COMMAND Latin-1 Supplem\u2025 Control\n\n   <!-- @sub -->\n</details>\n\nLegend\n------------------\n\nThe image below illustrates the color scheme developed for the app 
specifically,\nto simplify distinguishing code points of one category from others.\n\n<div align=\"center\">\n  <img alt=\"example009\" src=\"https://github.com/es7s/holms/assets/50381946/f9cac3b0-adab-45a3-a324-174ad7f06d44\">\n</div>\n\nThe most frequently encountered control codes also have unique replacement\ncharacters, which makes it possible to recognize them without reading the label or\nmemorizing code point identifiers:\n\n<div align=\"center\">\n  <img alt=\"example014\" src=\"https://github.com/es7s/holms/assets/50381946/2b77d06a-5e3d-4837-973c-78454e687113\">\n</div>\n\n<details>\n<summary><b>Unicode Blocks</b></summary>\n    <div align=\"center\">\n            <img alt=\"blocks\" src=\"https://github.com/es7s/holms/assets/50381946/8244553b-fc2d-419e-8b11-388ed0738bad\"/>\n    </div>\n</details>\n\nChangelog\n------------------\n\n[CHANGES.rst](CHANGES.rst)\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Text to Unicode code points breakdown",
    "version": "1.6.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/es7s/holms/issues",
        "Changelog": "https://github.com/es7s/holms/blob/master/CHANGES.rst",
        "Homepage": "https://github.com/es7s/holms"
    },
    "split_keywords": [
        "analyzer",
        " breakdown",
        " console",
        " terminal",
        " text",
        " unicode"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0dd09d8f7a7899af19c7b3978baad02ee5edcf91ffc0972034eccec547477ee2",
                "md5": "04fa4123402a2401a31810f0563bf7aa",
                "sha256": "ff8ffa1e71741bfe30eca8a3960d69876018fe16a277e2b9d200888050c110d6"
            },
            "downloads": -1,
            "filename": "holms-1.6.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "04fa4123402a2401a31810f0563bf7aa",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 45632,
            "upload_time": "2024-08-06T23:22:22",
            "upload_time_iso_8601": "2024-08-06T23:22:22.283824Z",
            "url": "https://files.pythonhosted.org/packages/0d/d0/9d8f7a7899af19c7b3978baad02ee5edcf91ffc0972034eccec547477ee2/holms-1.6.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "8fc0dde00bb1644b0989641551f572dd6fd08be1e5f81016009070397fa11c60",
                "md5": "097340c77f7667352cceb63e903ad7f1",
                "sha256": "6b659c27de3f7640feb2bed993c6e46978ff5bf9da54714c3961346d58081282"
            },
            "downloads": -1,
            "filename": "holms-1.6.0.tar.gz",
            "has_sig": false,
            "md5_digest": "097340c77f7667352cceb63e903ad7f1",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 195614,
            "upload_time": "2024-08-06T23:22:24",
            "upload_time_iso_8601": "2024-08-06T23:22:24.014181Z",
            "url": "https://files.pythonhosted.org/packages/8f/c0/dde00bb1644b0989641551f572dd6fd08be1e5f81016009070397fa11c60/holms-1.6.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-06 23:22:24",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "es7s",
    "github_project": "holms",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "click",
            "specs": [
                [
                    "==",
                    "8.1.7"
                ]
            ]
        },
        {
            "name": "es7s.commons",
            "specs": [
                [
                    "==",
                    "1.7.0"
                ]
            ]
        },
        {
            "name": "pytermor",
            "specs": [
                [
                    "==",
                    "2.118.0.dev0"
                ]
            ]
        }
    ],
    "lcname": "holms"
}
        