![Tests](https://github.com/TkTech/mutf8/workflows/Tests/badge.svg?branch=master)
# mutf-8
This package contains simple pure-Python and C encoders and decoders for
the MUTF-8 character encoding. In most cases, it can also parse the even rarer
CESU-8.
These days, you'll most likely encounter MUTF-8 when working on files or
protocols related to the JVM. Strings in a Java `.class` file are encoded using
MUTF-8, as are strings passed through the JNI and strings exported by the
object serializer.
This library was extracted from [Lawu][], a Python library for working with JVM
class files.
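For readers unfamiliar with the format, MUTF-8 departs from standard UTF-8 in
two ways: U+0000 is written as the two bytes `C0 80`, and code points above
U+FFFF are written as a UTF-16 surrogate pair with each surrogate encoded as
three bytes. The following is a minimal pure-Python sketch of that rule, for
illustration only. `mutf8_encode_sketch` and `_three_bytes` are hypothetical
names and are not part of this package's API.

```python
def _three_bytes(unit: int) -> bytes:
    """Encode a 16-bit value using the three-byte UTF-8 pattern."""
    return bytes((
        0xE0 | (unit >> 12),
        0x80 | ((unit >> 6) & 0x3F),
        0x80 | (unit & 0x3F),
    ))


def mutf8_encode_sketch(text: str) -> bytes:
    """Illustrative MUTF-8 encoder (hypothetical, not the library's code)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            # U+0000 is never written as a raw zero byte.
            out += b"\xC0\x80"
        elif cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x10000:
            out += _three_bytes(cp)
        else:
            # Split into a UTF-16 surrogate pair, then encode each half
            # as its own three-byte sequence (six bytes in total).
            cp -= 0x10000
            out += _three_bytes(0xD800 | (cp >> 10))
            out += _three_bytes(0xDC00 | (cp & 0x3FF))
    return bytes(out)
```

Everything in the Basic Multilingual Plane other than U+0000 comes out
identical to UTF-8, which is why the two encodings are easy to confuse.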
## 🎉 Installation
Install the package from PyPI:
```
pip install mutf8
```
Binary wheels are available for the following:
| | py3.6 | py3.7 | py3.8 | py3.9 |
| ---------------- | ----- | ----- | ----- | ----- |
| OS X (x86_64) | y | y | y | y |
| Windows (x86_64) | y | y | y | y |
| Linux (x86_64) | y | y | y | y |
If no binary wheel is available for your platform, the installer will attempt
to build the C extension from source with any C99 compiler. If the build
fails, the package falls back to a pure-Python version.
## Usage
Encoding and decoding are simple:
```python
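from mutf8 import encode_modified_utf8, decode_modified_utf8

# Any bytes-like object holding MUTF-8 data; b'\xC0\x80' is an encoded U+0000.
byte_like_object = b'\xC0\x80'
text = decode_modified_utf8(byte_like_object)
data = encode_modified_utf8(text)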
```
This module *does not* register itself globally as a codec, since importing
should be side-effect-free.
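If you do want codec-style access, you can opt in yourself. The sketch below
uses the standard library's `codecs.register`. `register_codec` is a
hypothetical helper, not something this package provides, and it ignores the
`errors` argument for simplicity.

```python
import codecs


def register_codec(name, encode_fn, decode_fn):
    """Expose a str->bytes / bytes->str function pair as a named codec."""

    def _encode(text, errors="strict"):
        # Stateless encoders return (output, amount of input consumed).
        # `errors` is accepted for API compatibility but ignored here.
        return encode_fn(text), len(text)

    def _decode(data, errors="strict"):
        data = bytes(data)
        return decode_fn(data), len(data)

    info = codecs.CodecInfo(encode=_encode, decode=_decode, name=name)

    def _search(requested):
        # Return None for unknown names so other codecs are still found.
        return info if requested == name else None

    codecs.register(_search)
    return _search
```

With the package installed, calling
`register_codec("mutf8", encode_modified_utf8, decode_modified_utf8)` once at
startup should then allow `"text".encode("mutf8")` and
`data.decode("mutf8")`.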
## 📈 Benchmarks
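The C extension is significantly faster, often 20x to 40x. In the run below it
completes roughly 22x (decoding) and 25x (encoding) as many operations per
second as the pure-Python version.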
<!-- BENCHMARK START -->
### MUTF-8 Decoding
| Name | Min (μs) | Max (μs) | StdDev | Ops |
|------------------------------|------------|------------|----------|---------------|
| cmutf8-decode_modified_utf8 | 0.00009 | 0.00080 | 0.00000 | 9957678.56358 |
| pymutf8-decode_modified_utf8 | 0.00190 | 0.06040 | 0.00000 | 450455.96019 |
### MUTF-8 Encoding
| Name | Min (μs) | Max (μs) | StdDev | Ops |
|------------------------------|------------|------------|----------|----------------|
| cmutf8-encode_modified_utf8 | 0.00008 | 0.00151 | 0.00000 | 11897361.05101 |
| pymutf8-encode_modified_utf8 | 0.00180 | 0.16650 | 0.00000 | 474390.98091 |
<!-- BENCHMARK END -->
## C Extension
The C extension is optional. If no binary package is available and the
extension cannot be built from source, the pure-Python version will be used
instead. If you want to ensure you're using the C version, import it directly:
```python
from mutf8.cmutf8 import decode_modified_utf8

# Decode a surrogate pair stored as two three-byte sequences (U+20000).
decode_modified_utf8(b'\xED\xA1\x80\xED\xB0\x80')
```
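If you would rather check for the C version without failing on import, a small
guard can help. This is only a sketch: `has_c_extension` is a hypothetical
helper, and the only name taken from this package is the `mutf8.cmutf8` module
path shown above.

```python
import importlib


def has_c_extension(module_name: str = "mutf8.cmutf8") -> bool:
    """Return True if `module_name` can be imported, False otherwise."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # Covers both a missing package and a missing compiled submodule.
        return False
    return True
```

A test suite or startup check could call `has_c_extension()` and log a warning
when it returns `False`.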
[Lawu]: https://github.com/tktech/lawu