# Canonical Huffman Compression
This implementation of the Huffman algorithm uses canonical codes to save space. It is a full-featured implementation that supports saving and loading compressed files, and it also supports a fixed dictionary, which can be useful in some situations.
Using canonical Huffman codes lets us store the Huffman code dictionary as just a combination of the sorted symbols and the number of codes of each length. This is ideal for situations where the number of symbols actually used is significantly smaller than the number of possible symbols.
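To illustrate why that is enough information, here is a small sketch (not part of this package's API; the function name and arguments are hypothetical) that rebuilds canonical codes from the sorted symbols and the per-length counts:
```python
def rebuild_canonical_codes(sorted_symbols, counts_per_length):
    """Hypothetical helper: reconstruct canonical Huffman codes.

    sorted_symbols: symbols ordered by code length, then by symbol value.
    counts_per_length: dict mapping code length -> number of codes of that length.
    """
    codes = {}
    code = 0
    prev_length = 0
    i = 0
    for length in sorted(counts_per_length):
        code <<= (length - prev_length)  # shift left whenever the code length grows
        for _ in range(counts_per_length[length]):
            codes[sorted_symbols[i]] = format(code, '0{}b'.format(length))
            code += 1
            i += 1
        prev_length = length
    return codes

# One 1-bit code and two 2-bit codes:
print(rebuild_canonical_codes(['a', 'b', 'c'], {1: 1, 2: 2}))
# {'a': '0', 'b': '10', 'c': '11'}
```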
# Example of how to use the HuffmanCoding class
This implementation expects its input as a list of integers. In this example, we convert some text to integers using Python's built-in `ord` function.
```python
from canonical_huffman import HuffmanCoding
text = "Lorem ipsum dolor sit amet."
data = [ord(char) for char in text]
huff = HuffmanCoding()
huff.compress(data)
huff.save_compressed('example.bin')
# delete huff object and create a new one to show that there is no data leakage.
del huff
huff2 = HuffmanCoding()
huff2.open_compressed('example.bin')
decompressed = huff2.decompress_file()
if data == decompressed:
    print('huffman successful!')
else:
    print('huffman failed!')
```
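Because the input is just a list of integers, the same round trip works for arbitrary binary files by reading their bytes as values 0-255. The sketch below uses the same calls as the example above; the file paths are placeholders.
```python
from canonical_huffman import HuffmanCoding

# Read a file's bytes as a list of integers (0-255), compress, and save.
with open('input.dat', 'rb') as f:
    data = list(f.read())

huff = HuffmanCoding()
huff.compress(data)
huff.save_compressed('input.huff')

# Reload and verify the round trip.
huff2 = HuffmanCoding()
huff2.open_compressed('input.huff')
restored = huff2.decompress_file()
assert restored == data
```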
# Example of how to use a fixed dictionary
A fixed dictionary is sometimes used when the data is fairly predictable across files, e.g. the distribution of common letters in English text. A fixed dictionary can save space when the size of the dictionary would otherwise be large compared to the size of the data. It is possible to use a fixed dictionary with the current release, but it requires a little more work and management by the user.
```python
from canonical_huffman import HuffmanCoding
# First we need to create the fixed dictionary
huff_fixed = HuffmanCoding()
text = "Lorem ipsum dolor sit amet."
data = [ord(char) for char in text]
fixed_text = "the quick brown fox jumped over the lazy dogs, THE QUICK BROWN FOX JUMPED OVER THE LAZY DOGS."
fixed_data = [ord(char) for char in fixed_text]
huff_fixed.huff_dict.make_dictionary(fixed_data)
# Then we can compress the data with the fixed dictionary
huff_new = HuffmanCoding()
huff_new.huff_dict.canonical_codes = huff_fixed.huff_dict.canonical_codes
huff_new.compress(data, fixed_dict=True)
compressed_data = huff_new.encoded_text
# For now, it is up to the user to save the encoded binary output to a file as they see fit.
# The compressed output should be larger than in the first example, since the fixed
# dictionary is not optimized for this particular data.
print('Fixed dict compressed data size (bytes): ', len(compressed_data) // 8)
```
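Since saving the fixed-dictionary output is left to the user, here is one minimal way it could be persisted. This sketch assumes `encoded_text` is a string of '0' and '1' characters; that is an assumption about the library's internal representation, so check it before relying on this, and remember that the padding length and the fixed dictionary itself must also be stored or agreed upon separately for later decoding.
```python
def bits_to_bytes(bit_string):
    """Pad a '0'/'1' string to a whole number of bytes and pack it."""
    padding = (8 - len(bit_string) % 8) % 8
    bit_string += '0' * padding
    packed = bytes(int(bit_string[i:i + 8], 2) for i in range(0, len(bit_string), 8))
    return packed, padding

packed, padding = bits_to_bytes(compressed_data)  # assumes a '0'/'1' string
with open('fixed_dict_example.bin', 'wb') as f:   # hypothetical output path
    f.write(packed)
```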