legacy-taiwan-news-data-tools


- Name: legacy-taiwan-news-data-tools
- Version: 0.1.0
- Home page: https://github.com/AsherJingkongChen/legacy-taiwan-news-data-tools/blob/main/README.md
- Summary: Tools for legacy Taiwan news data
- Upload time: 2024-03-03 12:38:03
- Author: AsherJingkongChen
- Requires Python: >=3.8,<4.0
- License: MIT
- Keywords: data, decode, encode, language, news, parse
- Requirements: none recorded

# Tools for Legacy Taiwan News Data

## User guide

### 1. Obtain some legacy Taiwan news data

- The packed data contains some unreadable characters.
- The packed data should be encoded in `big5` or `cp950`.
- The file extension of the packed data does not matter.
- The packed data should look like this:

```plaintext
?坼#ADte   102004/11/21\#AuID    590005\#AuNm    6全玉明\#GrpN    2B1\#ArID    17\#VerN    10\#Hdr1    4test\#Hdr2    0\#Hdr3    0\#ALno    13\#Word    16\#Spec    1N\#ALon    1Y\#Atth    0\#TyID    590005\#TyGR    2B1\#SDte   102004/11/21\#PbID    0?怕K?B
test
 
 ◆
?
```
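Judging from the sample above, each header field appears to be laid out as `\#`, a four-character key, a length right-aligned in a five-character column, and then that many bytes of value. The following is a minimal sketch of reading that layout with the standard library only. `parse_header_fields` is a hypothetical helper written for illustration, the layout is inferred from this single sample, and it is not the package's own implementation.

```python
def parse_header_fields(header: bytes, encoding: str = "cp950") -> dict:
    """Parse `#KEY<length><value>` fields (layout assumed from the sample)."""
    fields = {}
    pos = header.find(b"#")
    while pos != -1:
        # Four ASCII characters of key follow the '#'.
        key = header[pos + 1 : pos + 5].decode("ascii")
        # The length counts bytes, not characters (3 Big5 characters -> 6).
        length = int(header[pos + 5 : pos + 10].decode("ascii"))
        start = pos + 10
        fields[key] = header[start : start + length].decode(encoding)
        # Skip the value before looking for the next field marker.
        pos = header.find(b"#", start + length)
    return fields


# A shortened copy of the sample header, written as raw bytes.
sample = (
    b"\\#ADte   102004/11/21"
    b"\\#AuNm    6\xa5\xfe\xa5\xc9\xa9\xfa"
    b"\\#Hdr1    4test"
    b"\\#PbID    0"
)
fields = parse_header_fields(sample)
print(fields)
```

Reading the length first, instead of splitting on `\#`, keeps a value intact even if it happens to contain that byte pair.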

### 2. Install our tools

- Using `pip`

  ```shell
  pip install legacy-taiwan-news-data-tools
  ```

- Using `poetry`

  ```shell
  poetry add legacy-taiwan-news-data-tools
  ```

### 3. Use our tools to process the packed data

- Decode the packed data, then encode the result as JSON

  ```python
  from legacy_taiwan_news_data_tools.decode import decode_dict
  from legacy_taiwan_news_data_tools.encode import encode_json
  from io import BytesIO

  packed_data = b'\xc8\xa9\\#ADte   102004/11/21\\#AuID    590005\\#AuNm    6\xa5\xfe\xa5\xc9\xa9\xfa\\#GrpN    2B1\\#ArID    17\\#VerN    10\\#Hdr1    4test\\#Hdr2    0\\#Hdr3    0\\#ALno    13\\#Word    16\\#Spec    1N\\#ALon    1Y\\#Atth    0\\#TyID    590005\\#TyGR    2B1\\#SDte   102004/11/21\\#PbID    0\xc8\xa9\xc8K\xc8B\r\ntest\r\n\xa1@\r\n\xa1@\xa1\xbb\r\n\x00'

  decoded_packed_data = decode_dict(BytesIO(packed_data))

  # Check the decoded data
  assert decoded_packed_data['ADte'] == '2004/11/21'
  assert decoded_packed_data['PbID'] == ''
  assert decoded_packed_data['Data'] == 'test\r\n\u3000\r\n\u3000◆\r\n'

  # Encode the decoded data
  encoded_packed_data = encode_json(decoded_packed_data)

  # Check the encoded data
  assert encoded_packed_data == b'{"ADte":"2004/11/21","AuID":"90005","AuNm":"\xe5\x85\xa8\xe7\x8e\x89\xe6\x98\x8e","GrpN":"B1","ArID":"7","VerN":"0","Hdr1":"test","Hdr2":"","Hdr3":"","ALno":"3","Word":"6","Spec":"N","ALon":"Y","Atth":"","TyID":"90005","TyGR":"B1","SDte":"2004/11/21","PbID":"","Data":"test\\r\\n\xe3\x80\x80\\r\\n\xe3\x80\x80\xe2\x97\x86\\r\\n"}'
  ```
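The bytes in the final assertion above are UTF-8 encoded JSON, so the standard `json` module can read them back without this package. A small sketch, using a shortened copy of that expected output (the variable names are illustrative):

```python
import json

# Shortened copy of the expected JSON bytes from the example above.
encoded = (
    b'{"ADte":"2004/11/21",'
    b'"AuNm":"\xe5\x85\xa8\xe7\x8e\x89\xe6\x98\x8e",'
    b'"Data":"test\\r\\n\xe3\x80\x80\\r\\n\xe3\x80\x80\xe2\x97\x86\\r\\n"}'
)

# json.loads accepts bytes and decodes them as UTF-8.
record = json.loads(encoded)
for key, value in record.items():
    print(f"{key}: {value!r}")
```

This is a convenient way to check that the text survived the conversion from `big5`/`cp950` before the records are passed on to other tools.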

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/AsherJingkongChen/legacy-taiwan-news-data-tools/blob/main/README.md",
    "name": "legacy-taiwan-news-data-tools",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8,<4.0",
    "maintainer_email": "",
    "keywords": "data,decode,encode,language,news,parse",
    "author": "AsherJingkongChen",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/5e/93/24f48ef530fdcffb6c0f95ece1e0591729815e4e44fc96635c6fadbc5373/legacy_taiwan_news_data_tools-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Tools for legacy Taiwan news data",
    "version": "0.1.0",
    "project_urls": {
        "Homepage": "https://github.com/AsherJingkongChen/legacy-taiwan-news-data-tools/blob/main/README.md",
        "Repository": "https://github.com/AsherJingkongChen/legacy-taiwan-news-data-tools.git"
    },
    "split_keywords": [
        "data",
        "decode",
        "encode",
        "language",
        "news",
        "parse"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7dfa154e85ae52b159dbaa0ed8572d7182d9027698b37c465169eb15e2a518aa",
                "md5": "cd26346abfe9117be93188c9c8968168",
                "sha256": "2887acbe947c6b83d7082fb285380ece1d1fb7a903eef2ab2cfe32c464f50812"
            },
            "downloads": -1,
            "filename": "legacy_taiwan_news_data_tools-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "cd26346abfe9117be93188c9c8968168",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8,<4.0",
            "size": 7377,
            "upload_time": "2024-03-03T12:38:01",
            "upload_time_iso_8601": "2024-03-03T12:38:01.557673Z",
            "url": "https://files.pythonhosted.org/packages/7d/fa/154e85ae52b159dbaa0ed8572d7182d9027698b37c465169eb15e2a518aa/legacy_taiwan_news_data_tools-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5e9324f48ef530fdcffb6c0f95ece1e0591729815e4e44fc96635c6fadbc5373",
                "md5": "ff318935b5e592cb9ba290f7131d62ad",
                "sha256": "08b71e585a8c9f69088be98d868281fd52f09874d47734a9eb494aeae42ac8f0"
            },
            "downloads": -1,
            "filename": "legacy_taiwan_news_data_tools-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "ff318935b5e592cb9ba290f7131d62ad",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8,<4.0",
            "size": 5267,
            "upload_time": "2024-03-03T12:38:03",
            "upload_time_iso_8601": "2024-03-03T12:38:03.812022Z",
            "url": "https://files.pythonhosted.org/packages/5e/93/24f48ef530fdcffb6c0f95ece1e0591729815e4e44fc96635c6fadbc5373/legacy_taiwan_news_data_tools-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-03 12:38:03",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "AsherJingkongChen",
    "github_project": "legacy-taiwan-news-data-tools",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "legacy-taiwan-news-data-tools"
}
        