# Tools for Legacy Taiwan News Data
## User guide
### 1. Obtain legacy Taiwan news data
- The packed data contains some characters that are not human-readable
- The packed data must be encoded in `big5` or `cp950`
- The file extension of the packed data does not matter
- When viewed as text, the packed data should look like this:
```plaintext
?坼#ADte 102004/11/21\#AuID 590005\#AuNm 6全玉明\#GrpN 2B1\#ArID 17\#VerN 10\#Hdr1 4test\#Hdr2 0\#Hdr3 0\#ALno 13\#Word 16\#Spec 1N\#ALon 1Y\#Atth 0\#TyID 590005\#TyGR 2B1\#SDte 102004/11/21\#PbID 0?怕K?B
test
　
　◆
?
```
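The header fields in the sample appear to follow a simple pattern. Each field is introduced by `\#`, followed by a short field name, a space, and a payload that starts with a decimal byte count. For example, `ADte 102004/11/21` carries the 10-byte value `2004/11/21`, `AuNm 6全玉明` carries 6 Big5 bytes (three characters), and `Hdr2 0` is empty. This reading is inferred only from the sample and from the expected output in step 3, not from the library's source. Under that assumption, a minimal header parser might look like the sketch below. `split_length_prefix` and `parse_header` are hypothetical helpers, and the input is assumed to be the header bytes with the surrounding non-text markers (the `?坼` and `?怕K?B` runs above) already removed.

```python
from typing import Dict


def split_length_prefix(payload: bytes) -> bytes:
    """Return the value part of a payload such as b"102004/11/21".

    Assumption inferred from the sample: the payload is a decimal byte
    count n followed by exactly n bytes, so len(str(n)) + n == len(payload).
    """
    for n in range(len(payload)):
        prefix = str(n).encode("ascii")
        if len(prefix) + n == len(payload) and payload.startswith(prefix):
            return payload[len(prefix):]
    raise ValueError(f"no consistent length prefix in {payload!r}")


def parse_header(header: bytes, encoding: str = "cp950") -> Dict[str, str]:
    """Parse b'NAME <len><value>\\#NAME <len><value>...' into a dict.

    Splitting happens on raw bytes before decoding. '#' (0x23) is never a
    Big5 trail byte, but a character whose trail byte is 0x5C ('\\')
    directly followed by '#' would still cause a false split.
    """
    fields: Dict[str, str] = {}
    for chunk in header.split(b"\\#"):
        if not chunk:
            continue
        name, _, payload = chunk.partition(b" ")
        fields[name.decode("ascii")] = split_length_prefix(payload).decode(encoding)
    return fields


# Three fields taken from the sample header, markers removed.
sample = b"ADte 102004/11/21\\#AuNm 6\xa5\xfe\xa5\xc9\xa9\xfa\\#Hdr2 0"
print(parse_header(sample))
```

Searching for a count that is consistent with the payload length avoids guessing how many digits the prefix has, which matters because `ADte` uses a two-digit count while most other fields use one digit.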
### 2. Install our tools
- Using `pip`
```shell
pip install legacy-taiwan-news-data-tools
```
- Using `poetry`
```shell
poetry add legacy-taiwan-news-data-tools
```
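The distribution name used with `pip` and `poetry` (`legacy-taiwan-news-data-tools`) differs from the import name used in step 3 (`legacy_taiwan_news_data_tools`). To confirm which version ended up in the current environment, a small check with the standard library's `importlib.metadata` (available since Python 3.8) could look like this. `installed_version` is a hypothetical helper, not part of the package.

```python
from importlib import metadata
from typing import Optional

DIST_NAME = "legacy-taiwan-news-data-tools"  # name passed to pip/poetry


def installed_version(dist_name: str = DIST_NAME) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print(f"{DIST_NAME} is not installed in this environment")
else:
    print(f"{DIST_NAME} {version}")
```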
### 3. Use our tools to process the packed data
- Decode the packed data into a `dict`, then encode it as UTF-8 JSON
```python
from io import BytesIO

from legacy_taiwan_news_data_tools.decode import decode_dict
from legacy_taiwan_news_data_tools.encode import encode_json

# A packed record in big5/cp950: header fields separated by `\#`, then the article body
packed_data = b'\xc8\xa9\\#ADte 102004/11/21\\#AuID 590005\\#AuNm 6\xa5\xfe\xa5\xc9\xa9\xfa\\#GrpN 2B1\\#ArID 17\\#VerN 10\\#Hdr1 4test\\#Hdr2 0\\#Hdr3 0\\#ALno 13\\#Word 16\\#Spec 1N\\#ALon 1Y\\#Atth 0\\#TyID 590005\\#TyGR 2B1\\#SDte 102004/11/21\\#PbID 0\xc8\xa9\xc8K\xc8B\r\ntest\r\n\xa1@\r\n\xa1@\xa1\xbb\r\n\x00'

# decode_dict reads from a binary file-like object
decoded_packed_data = decode_dict(BytesIO(packed_data))

# Check the decoded data (the body must match the "Data" value in the JSON below)
assert decoded_packed_data['ADte'] == '2004/11/21'
assert decoded_packed_data['PbID'] == ''
assert decoded_packed_data['Data'] == 'test\r\n\u3000\r\n\u3000◆\r\n'

# Encode the decoded data as UTF-8 JSON bytes
encoded_packed_data = encode_json(decoded_packed_data)

# Check the encoded data
assert encoded_packed_data == b'{"ADte":"2004/11/21","AuID":"90005","AuNm":"\xe5\x85\xa8\xe7\x8e\x89\xe6\x98\x8e","GrpN":"B1","ArID":"7","VerN":"0","Hdr1":"test","Hdr2":"","Hdr3":"","ALno":"3","Word":"6","Spec":"N","ALon":"Y","Atth":"","TyID":"90005","TyGR":"B1","SDte":"2004/11/21","PbID":"","Data":"test\\r\\n\xe3\x80\x80\\r\\n\xe3\x80\x80\xe2\x97\x86\\r\\n"}'
```
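The expected output above is compact JSON encoded as UTF-8, so any JSON reader can consume it. The sketch below uses only the standard library: it loads those expected bytes with `json.loads` and checks that `json.dumps` with compact separators and `ensure_ascii=False` reproduces them byte for byte. This is an observation about the sample output, not a description of how `encode_json` is implemented; `load_encoded` and `to_compact_utf8_json` are hypothetical names.

```python
import json
from typing import Any, Dict

# Expected output of encode_json, copied from the example above.
ENCODED = b'{"ADte":"2004/11/21","AuID":"90005","AuNm":"\xe5\x85\xa8\xe7\x8e\x89\xe6\x98\x8e","GrpN":"B1","ArID":"7","VerN":"0","Hdr1":"test","Hdr2":"","Hdr3":"","ALno":"3","Word":"6","Spec":"N","ALon":"Y","Atth":"","TyID":"90005","TyGR":"B1","SDte":"2004/11/21","PbID":"","Data":"test\\r\\n\xe3\x80\x80\\r\\n\xe3\x80\x80\xe2\x97\x86\\r\\n"}'


def load_encoded(encoded: bytes) -> Dict[str, Any]:
    """Parse UTF-8 JSON bytes; json.loads accepts bytes directly."""
    return json.loads(encoded)


def to_compact_utf8_json(record: Dict[str, Any]) -> bytes:
    """Serialize with no spaces and without escaping non-ASCII characters."""
    return json.dumps(record, ensure_ascii=False, separators=(",", ":")).encode("utf-8")


record = load_encoded(ENCODED)
print(record["AuNm"])  # 全玉明
print(repr(record["Data"]))
assert to_compact_utf8_json(record) == ENCODED
```

Because `json.loads` preserves key order in the resulting `dict`, re-serializing the loaded record yields the fields in the same order as the original bytes.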
Raw data
{
"_id": null,
"home_page": "https://github.com/AsherJingkongChen/legacy-taiwan-news-data-tools/blob/main/README.md",
"name": "legacy-taiwan-news-data-tools",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8,<4.0",
"maintainer_email": "",
"keywords": "data,decode,encode,language,news,parse",
"author": "AsherJingkongChen",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/5e/93/24f48ef530fdcffb6c0f95ece1e0591729815e4e44fc96635c6fadbc5373/legacy_taiwan_news_data_tools-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Tools for legacy Taiwan news data",
"version": "0.1.0",
"project_urls": {
"Homepage": "https://github.com/AsherJingkongChen/legacy-taiwan-news-data-tools/blob/main/README.md",
"Repository": "https://github.com/AsherJingkongChen/legacy-taiwan-news-data-tools.git"
},
"split_keywords": [
"data",
"decode",
"encode",
"language",
"news",
"parse"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "7dfa154e85ae52b159dbaa0ed8572d7182d9027698b37c465169eb15e2a518aa",
"md5": "cd26346abfe9117be93188c9c8968168",
"sha256": "2887acbe947c6b83d7082fb285380ece1d1fb7a903eef2ab2cfe32c464f50812"
},
"downloads": -1,
"filename": "legacy_taiwan_news_data_tools-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "cd26346abfe9117be93188c9c8968168",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8,<4.0",
"size": 7377,
"upload_time": "2024-03-03T12:38:01",
"upload_time_iso_8601": "2024-03-03T12:38:01.557673Z",
"url": "https://files.pythonhosted.org/packages/7d/fa/154e85ae52b159dbaa0ed8572d7182d9027698b37c465169eb15e2a518aa/legacy_taiwan_news_data_tools-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "5e9324f48ef530fdcffb6c0f95ece1e0591729815e4e44fc96635c6fadbc5373",
"md5": "ff318935b5e592cb9ba290f7131d62ad",
"sha256": "08b71e585a8c9f69088be98d868281fd52f09874d47734a9eb494aeae42ac8f0"
},
"downloads": -1,
"filename": "legacy_taiwan_news_data_tools-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "ff318935b5e592cb9ba290f7131d62ad",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8,<4.0",
"size": 5267,
"upload_time": "2024-03-03T12:38:03",
"upload_time_iso_8601": "2024-03-03T12:38:03.812022Z",
"url": "https://files.pythonhosted.org/packages/5e/93/24f48ef530fdcffb6c0f95ece1e0591729815e4e44fc96635c6fadbc5373/legacy_taiwan_news_data_tools-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-03-03 12:38:03",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "AsherJingkongChen",
"github_project": "legacy-taiwan-news-data-tools",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "legacy-taiwan-news-data-tools"
}