Have you ever seen this?
***UnicodeEncodeError: 'XXXXX' codec can't encode character 'XXXXX' in position 15: ordinal ...***
Probably more than once, right? :) After spending too much time hunting for the right codec for a file, I wrote [BruteCodecChecker](https://github.com/hansalemaos/BruteCodecChecker). BruteCodecChecker (MIT license) tries to open a file with every codec available in your environment and prints the results, so that you can pick the one that fits best. It also works with bytes objects.
If, like me, you work with a lot of text files, it will save you a lot of time.
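
The brute-force idea itself can be sketched with the standard library alone. This is not BruteCodecChecker's actual implementation, only a minimal illustration: the helper names `available_codecs` and `try_decode` are made up for this example, and taking the codec list from `encodings.aliases` is an assumption about what "available in your environment" could mean.

```python
import encodings.aliases


def available_codecs():
    """Return the sorted codec names listed in the stdlib alias table."""
    return sorted(set(encodings.aliases.aliases.values()))


def try_decode(data, codecs=None):
    """Decode `data` with each codec; return {codec_name: text} for those that succeed."""
    results = {}
    for name in codecs if codecs is not None else available_codecs():
        try:
            results[name] = data.decode(name)
        except (UnicodeError, LookupError):
            # UnicodeError: the bytes are not valid in this codec.
            # LookupError: codec missing on this platform, or not a text codec.
            continue
    return results


# UTF-8 bytes that start with a byte order mark (BOM)
sample = "Hi there!".encode("utf-8-sig")
for codec, text in try_decode(sample, ["ascii", "cp1252", "utf_8", "utf_8_sig"]).items():
    print(f"{codec:10} {len(text):3} {text!r}")
```

Codecs that reject the bytes simply drop out of the result, and comparing the remaining candidates by eye is usually enough to spot the right one.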
### Install it:
```shell
pip install BruteCodecChecker
```
### Try it:
```python
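from BruteCodecChecker import CodecChecker

# Sample bytes (plain ASCII, so they also decode cleanly as UTF-8)
teststuff = b"""This is a test!
Hi there!
A little test! """

# Write the sample to a file as UTF-8 with a BOM ("utf-8-sig")
testfilename = "test_utf8.tmp"
with open(testfilename, mode="w", encoding="utf-8-sig") as f:
    f.write(teststuff.decode("utf-8-sig"))

codechecker = CodecChecker()

# Try every codec on the file, reading only the first 2 lines per codec
codechecker.try_open_file(testfilename, readlines=2).print_results(
    pause_after_interval=1, items_per_interval=10
)

# Same again, without the line limit or the paging arguments
codechecker.try_open_file(testfilename).print_results()

# The same check for a bytes object instead of a file
codechecker.try_convert_bytes(teststuff.decode("cp850").encode()).print_results(
    pause_after_interval=1, items_per_interval=10
)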
```
### **Output**
```
Codec : palmos
Mode : strict
Length : 32
Converted :
Line: 0 ï»¿This is a test!
Line: 1 Hi there!
Codec : ptcp154
Mode : strict
Length : 32
Converted :
Line: 0 п»ҝThis is a test!
Line: 1 Hi there!
Codec : punycode
Mode : strict
Codec : quopri_codec
Mode : strict
Codec : raw_unicode_escape
Mode : strict
Length : 32
Converted :
Line: 0 ï»¿This is a test!
Line: 1 Hi there!
Codec : rot_13
Mode : strict
Codec : shift_jis
Mode : strict
Codec : shift_jisx0213
Mode : strict
Length : 31
Converted :
Line: 0 鬠ソThis is a test!
Line: 1 Hi there!
Codec : shift_jis_2004
Mode : strict
Length : 31
Converted :
Line: 0 鬠ソThis is a test!
Line: 1 Hi there!
Codec : tis_620
Mode : strict
Length : 32
Converted :
Line: 0 ๏ปฟThis is a test!
Line: 1 Hi there!
Codec : undefined
Mode : strict
Codec : unicode_escape
Mode : strict
Length : 32
Converted :
Line: 0 ï»¿This is a test!
Line: 1 Hi there!
Codec : utf_16
Mode : strict
Codec : utf_16_be
Mode : strict
Codec : utf_16_le
Mode : strict
Codec : utf_32
Mode : strict
Codec : utf_32_be
Mode : strict
Codec : utf_32_le
Mode : strict
Codec : utf_7
Mode : strict
Codec : utf_8
Mode : strict
Length : 30
Converted :
Line: 0 This is a test!
Line: 1 Hi there!
Codec : utf_8_sig
Mode : strict
Length : 29
Converted :
Line: 0 This is a test!
Line: 1 Hi there!
```
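
The differing `Length` values in this output come from the byte order mark (BOM) that `utf-8-sig` writes at the start of the file. The sketch below reproduces that effect with the standard library only. The helper name `decoded_lengths` is made up for this example, and `latin_1` stands in for the single-byte codecs listed above.

```python
import codecs


def decoded_lengths(text, codec_names=("utf_8_sig", "utf_8", "latin_1")):
    """Encode `text` as UTF-8 with a BOM, then return the decoded length per codec."""
    raw = text.encode("utf-8-sig")
    # The encoded data starts with the 3-byte BOM b"\xef\xbb\xbf"
    assert raw.startswith(codecs.BOM_UTF8)
    return {name: len(raw.decode(name)) for name in codec_names}


lengths = decoded_lengths("This is a test!")
# utf_8_sig strips the BOM, utf_8 keeps it as one invisible character (U+FEFF),
# and a single-byte codec turns each of the 3 BOM bytes into its own character.
print(lengths)
```

So a codec whose result is one character longer than the `utf_8_sig` result, or that starts with stray characters such as `ï»¿`, is most likely misreading a BOM.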
Raw data
{
"_id": null,
"home_page": "https://github.com/hansalemaos/BruteCodecChecker",
"name": "BruteCodecChecker",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "codecs,codec,utf,8,sig,16,le,ascii",
"author": "Johannes Fischer",
"author_email": "<aulasparticularesdealemaosp@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/14/e1/98fe9305f40b2982b8905d6c0cb9165ffe12d55e7dfbeedfa1ccb906d6e9/BruteCodecChecker-0.21.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Read files in all available codes in your env, so that you can pick the one that fits best!",
"version": "0.21",
"split_keywords": [
"codecs",
"codec",
"utf",
"8",
"sig",
"16",
"le",
"ascii"
],
"urls": [
{
"comment_text": "",
"digests": {
"md5": "cce05643bcc9ab68a7bd18d9a242ee7a",
"sha256": "7e25711794f0780c664d53d6eb619d7124d5a4ab26e4c18d999f4bbf13a447e4"
},
"downloads": -1,
"filename": "BruteCodecChecker-0.21-py3-none-any.whl",
"has_sig": false,
"md5_digest": "cce05643bcc9ab68a7bd18d9a242ee7a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 7616,
"upload_time": "2022-10-02T05:43:34",
"upload_time_iso_8601": "2022-10-02T05:43:34.037679Z",
"url": "https://files.pythonhosted.org/packages/dd/da/67bc2c7b822ec783af442dde623c6b0daae1c0099b13bba4c8167b313e82/BruteCodecChecker-0.21-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"md5": "0a004f5cf84a3d65ecf7605e4999ac39",
"sha256": "3023af65fbb433d525bcd4e5c93a87f34dafd2ffb035c7ebf9346e766baed2bc"
},
"downloads": -1,
"filename": "BruteCodecChecker-0.21.tar.gz",
"has_sig": false,
"md5_digest": "0a004f5cf84a3d65ecf7605e4999ac39",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 6556,
"upload_time": "2022-10-02T05:43:35",
"upload_time_iso_8601": "2022-10-02T05:43:35.329748Z",
"url": "https://files.pythonhosted.org/packages/14/e1/98fe9305f40b2982b8905d6c0cb9165ffe12d55e7dfbeedfa1ccb906d6e9/BruteCodecChecker-0.21.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2022-10-02 05:43:35",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "hansalemaos",
"github_project": "BruteCodecChecker",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "brutecodecchecker"
}