:Name: dahuffman
:Version: 0.4.2
:Summary: Pure Python Huffman encoder and decoder module
:Upload time: 2024-09-09 07:52:42
:Requires Python: >=3.7
:License: MIT License
:Keywords: compression, decoding, encoding, huffman

dahuffman - Python Module for Huffman Encoding and Decoding
===========================================================


.. image:: https://img.shields.io/github/actions/workflow/status/soxofaan/dahuffman/lint-and-test.yml
    :target: https://github.com/soxofaan/dahuffman/actions/workflows/lint-and-test.yml

.. image:: https://img.shields.io/badge/license-MIT-blue.svg
    :target: https://github.com/soxofaan/dahuffman/blob/master/LICENSE.txt

.. image::  https://img.shields.io/pypi/v/dahuffman
    :target: https://pypi.org/project/dahuffman

.. image:: https://img.shields.io/pypi/pyversions/dahuffman
    :target: https://pypi.org/project/dahuffman

.. image:: https://img.shields.io/pypi/wheel/dahuffman
    :target: https://pypi.org/project/dahuffman
    :alt: PyPI - Wheel

-------------------------

dahuffman is a pure Python module for Huffman encoding and decoding,
commonly used for lossless data compression.

The name of the module refers to the full name of the inventor
of the Huffman code tree algorithm: David Albert Huffman (August 9, 1925 – October 7, 1999).

Features and design
-------------------

- Pure Python implementation, using only the standard library.
- Leverages iterators and generators internally, so it can be used in a streaming fashion.
- Not limited to byte/unicode string input: it can handle other "symbols" or tokens,
  for example chess moves or sequences of categorical data, as long as these symbols
  can be used as keys in dictionaries (meaning they should be hashable).
- Properly handles the end of the encoded bit stream if it does not align with byte boundaries.
- Requires Python 3.7 or later.
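
The hashability requirement can be checked with nothing but the standard library.
The sketch below uses made-up chess moves represented as ``(from_square, to_square)``
tuples; that representation is an assumption for illustration only, not something
the module prescribes. ``collections.Counter`` is dictionary-based, so anything it
can count can also serve as a dictionary key.

.. code-block:: python

    from collections import Counter

    # Hypothetical symbols: chess moves as (from_square, to_square) tuples.
    moves = [("e2", "e4"), ("e7", "e5"), ("g1", "f3"), ("e2", "e4")]
    frequencies = Counter(moves)  # works: tuples of strings are hashable

    # Lists are mutable and therefore unhashable, so they cannot be symbols.
    try:
        Counter([["e2", "e4"]])
        list_symbols_ok = True
    except TypeError:
        list_symbols_ok = False

Here ``frequencies`` maps each move to its count, while the list-based attempt
fails with a ``TypeError``.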

Installation
------------

.. code-block:: bash

    pip install dahuffman

Usage
-----

Basic usage example, where the code table is built based on given symbol frequencies::

    >>> from dahuffman import HuffmanCodec
    >>> codec = HuffmanCodec.from_frequencies(
    ...     {"e": 100, "n": 20, "x": 1, "i": 40, "q": 3}
    ... )
    >>> codec.print_code_table()
    Bits Code  Value Symbol
       5 00000     0 _EOF
       5 00001     1 'x'
       4 0001      1 'q'
       3 001       1 'n'
       2 01        1 'i'
       1 1         1 'e'
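
The table shows the defining property of a Huffman code: the more frequent a symbol,
the shorter its code. As a rough illustration of why, here is a minimal standard-library
sketch of the classic construction, which repeatedly merges the two lightest subtrees.
It is not dahuffman's actual implementation; the ``huffman_code_lengths`` helper and
the ``"_EOF"`` entry with weight 1 are assumptions made for this example, chosen so
that the resulting code lengths can be compared with the ``Bits`` column above.

.. code-block:: python

    import heapq
    from itertools import count


    def huffman_code_lengths(frequencies):
        """Return {symbol: code length in bits} for a Huffman code."""
        tiebreak = count()  # keeps heap entries comparable when weights tie
        # Each heap entry: (total weight, tiebreaker, {symbol: depth so far}).
        heap = [
            (weight, next(tiebreak), {symbol: 0})
            for symbol, weight in frequencies.items()
        ]
        heapq.heapify(heap)
        while len(heap) > 1:
            # Merge the two lightest subtrees; every symbol in them moves
            # one level deeper, i.e. its code grows by one bit.
            weight1, _, left = heapq.heappop(heap)
            weight2, _, right = heapq.heappop(heap)
            merged = {s: depth + 1 for s, depth in {**left, **right}.items()}
            heapq.heappush(heap, (weight1 + weight2, next(tiebreak), merged))
        return heap[0][2]


    # Same frequencies as above, plus an assumed end-of-stream marker.
    lengths = huffman_code_lengths(
        {"e": 100, "n": 20, "x": 1, "i": 40, "q": 3, "_EOF": 1}
    )

For these frequencies, ``lengths`` gives 1 bit for ``'e'``, 2 for ``'i'``, 3 for ``'n'``,
4 for ``'q'`` and 5 for both ``'x'`` and ``"_EOF"``. Only lengths are computed here;
the exact bit patterns depend on how a given implementation breaks ties.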

Encode a string, get the encoded data as ``bytes`` and decode again::

    >>> encoded = codec.encode("exeneeeexniqneieini")
    >>> encoded
    b'\x86|%\x13i@'
    >>> len(encoded)
    6
    >>> codec.decode(encoded)
    'exeneeeexniqneieini'

If desired, work with byte values directly::

    >>> list(encoded)
    [134, 124, 37, 19, 105, 64]
    >>> codec.decode([134, 124, 37, 19, 105, 64])
    'exeneeeexniqneieini'
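
To see where these byte values come from, the following sketch applies the code table
printed earlier by hand. It is an illustration only, not how dahuffman is implemented:
``CODE_TABLE``, ``pack`` and ``unpack`` are names made up for this example, and
appending the ``_EOF`` code followed by zero padding is an assumption about the
details, although it does reproduce the six bytes shown above.

.. code-block:: python

    # Code table copied from the print_code_table() output above.
    CODE_TABLE = {
        "_EOF": "00000", "x": "00001", "q": "0001",
        "n": "001", "i": "01", "e": "1",
    }


    def pack(message):
        """Encode symbols to bytes: codes, then _EOF, then zero padding."""
        bits = "".join(CODE_TABLE[symbol] for symbol in message)
        bits += CODE_TABLE["_EOF"]
        bits += "0" * (-len(bits) % 8)  # pad up to a whole number of bytes
        return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


    def unpack(data):
        """Decode bytes back to a string, stopping at the _EOF code."""
        symbol_for = {code: symbol for symbol, code in CODE_TABLE.items()}
        symbols, current = [], ""
        for bit in "".join(f"{byte:08b}" for byte in data):
            current += bit
            if current in symbol_for:  # prefix-free: first match is a symbol
                if symbol_for[current] == "_EOF":
                    break  # everything after this point is padding
                symbols.append(symbol_for[current])
                current = ""
        return "".join(symbols)


    packed = pack("exeneeeexniqneieini")

The message takes 47 bits including the end marker, so one padding bit rounds it up
to 6 bytes. The end marker tells the decoder where the real data stops, which is why
the padding is never misread as extra symbols.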


You can also "train" the codec by providing it data directly::

    >>> codec = HuffmanCodec.from_data(
    ...    "hello world how are you doing today foo bar lorem ipsum"
    ... )
    >>> codec.encode("do lo er ad od")
    b'^O\x1a\xc4S\xab\x80'
    >>> len(_)
    7


Using it with sequences of symbols (country codes in this example)::

    >>> countries = ["FR", "UK", "BE", "IT", "FR", "IT", "GR", "FR", "NL", "BE", "DE"]
    >>> codec = HuffmanCodec.from_data(countries)
    >>> encoded = codec.encode(["FR", "IT", "BE", "FR", "UK"])
    >>> encoded
    b'L\xca'
    >>> len(encoded)
    2
    >>> codec.decode(encoded)
    ['FR', 'IT', 'BE', 'FR', 'UK']



Doing it in a streaming fashion (generators)::

    >>> import random
    >>> def sample(n, symbols):
    ...     for i in range(n):
    ...         if (n - i) % 5 == 1:
    ...             print(i)
    ...         yield random.choice(symbols)
    ...
    >>> codec = HuffmanCodec.from_data(countries)
    >>> encoded = codec.encode_streaming(sample(16, countries))
    >>> encoded
    <generator object encode_streaming at 0x108bd82d0>
    >>> decoded = codec.decode_streaming(encoded)
    >>> decoded
    <generator object decode_streaming at 0x108bd8370>
    >>> list(decoded)
    0
    5
    10
    15
    ['DE', 'BE', 'FR', 'GR', 'UK', 'BE', 'UK', 'IT', 'UK', 'FR', 'DE', 'IT', 'NL', 'IT', 'FR', 'UK']
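
Note that in the session above the numbers 0, 5, 10 and 15 are printed only once
``list(decoded)`` runs: nothing is sampled or encoded until the final generator is
consumed. The standard-library sketch below shows the same lazy behaviour with a
hand-written bit packer. ``pack_streaming``, ``source`` and ``pulled`` are hypothetical
names for this illustration, and the code is not dahuffman's implementation.

.. code-block:: python

    from itertools import chain

    # Code table copied from the first print_code_table() output in this document.
    TABLE = {"x": "00001", "q": "0001", "n": "001", "i": "01", "e": "1"}
    EOF_CODE = "00000"


    def pack_streaming(symbols):
        """Lazily yield byte values, pulling symbols only as bits are needed."""
        buffer, size = 0, 0
        for code in chain((TABLE[s] for s in symbols), [EOF_CODE]):
            for bit in code:
                buffer = (buffer << 1) | int(bit)
                size += 1
                if size == 8:  # a full byte is ready: hand it over and pause
                    yield buffer
                    buffer, size = 0, 0
        if size:
            yield buffer << (8 - size)  # zero-pad the final partial byte


    pulled = []  # records which input symbols have been consumed so far


    def source(message):
        for symbol in message:
            pulled.append(symbol)
            yield symbol


    stream = pack_streaming(source("exeneeeexniqneieini"))
    first_byte = next(stream)
    pulled_for_first_byte = list(pulled)
    remaining = list(stream)

After the first ``next(stream)`` only four input symbols have been pulled, because
their codes already fill eight bits. The rest of the input is read only when the
remaining bytes are requested.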




Pre-trained codecs
~~~~~~~~~~~~~~~~~~

The ``dahuffman.codecs`` package contains a bunch of pre-trained code tables.
The codecs can be loaded as follows::

    >>> from dahuffman import load_shakespeare
    >>> codec = load_shakespeare()
    >>> codec.print_code_table()
    Bits Code                     Value Symbol
       4 0000                         0 'n'
       4 0001                         1 's'
       4 0010                         2 'h'
       5 00110                        6 'u'
       7 0011100                     28 'k'
       9 001110100                  116 'Y'
      14 00111010100000            3744 '0'
    ...
    >>> len(codec.encode('To be, or not to be; that is the question;'))
    24

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "dahuffman",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "compression, decoding, encoding, huffman",
    "author": null,
    "author_email": "Stefaan Lippens <soxofaan@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/8b/71/a0733bd3a40213f42005bdc60a4fe066bb790100b89c56f47aadf74dc88c/dahuffman-0.4.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "Pure Python Huffman encoder and decoder module",
    "version": "0.4.2",
    "project_urls": {
        "Bug Tracker": "https://github.com/soxofaan/dahuffman/issues",
        "Homepage": "https://github.com/soxofaan/dahuffman"
    },
    "split_keywords": [
        "compression",
        " decoding",
        " encoding",
        " huffman"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "76e7b0beac9a3655219574d2b59cf71346f44d2d5064d9a2ab228ecb256d3069",
                "md5": "fe287a1771b1d64fdcafe392178986e5",
                "sha256": "37968b2102402206367298f62f5b11aa588d3d827ccdf52f475bfccaea2a1732"
            },
            "downloads": -1,
            "filename": "dahuffman-0.4.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "fe287a1771b1d64fdcafe392178986e5",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 18184,
            "upload_time": "2024-09-09T07:52:40",
            "upload_time_iso_8601": "2024-09-09T07:52:40.098691Z",
            "url": "https://files.pythonhosted.org/packages/76/e7/b0beac9a3655219574d2b59cf71346f44d2d5064d9a2ab228ecb256d3069/dahuffman-0.4.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8b71a0733bd3a40213f42005bdc60a4fe066bb790100b89c56f47aadf74dc88c",
                "md5": "b5908688cfed15168b2f7d38b9239347",
                "sha256": "e260e5279e4e4989bab325cc073db1810e914453e61d9210906fee57373e0130"
            },
            "downloads": -1,
            "filename": "dahuffman-0.4.2.tar.gz",
            "has_sig": false,
            "md5_digest": "b5908688cfed15168b2f7d38b9239347",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 17183,
            "upload_time": "2024-09-09T07:52:42",
            "upload_time_iso_8601": "2024-09-09T07:52:42.868369Z",
            "url": "https://files.pythonhosted.org/packages/8b/71/a0733bd3a40213f42005bdc60a4fe066bb790100b89c56f47aadf74dc88c/dahuffman-0.4.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-09 07:52:42",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "soxofaan",
    "github_project": "dahuffman",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "dahuffman"
}
        