# unihan-etl · [![Python Package](https://img.shields.io/pypi/v/unihan-etl.svg)](https://pypi.org/project/unihan-etl/) [![License](https://img.shields.io/github/license/cihai/unihan-etl.svg)](https://github.com/cihai/unihan-etl/blob/master/LICENSE) [![Code Coverage](https://codecov.io/gh/cihai/unihan-etl/branch/master/graph/badge.svg)](https://codecov.io/gh/cihai/unihan-etl)
An [ETL][etl] tool for the Unicode Han Unification ([UNIHAN](http://www.unicode.org/charts/unihan.html)) database releases. unihan-etl fetches (downloads), unpacks (unzips), and converts the database from the Unicode website into either a flattened, tabular format or a structured, hierarchical format.
unihan-etl serves dual purposes: as a Python library offering an [API](https://unihan-etl.git-pull.com/en/latest/) for accessing data as Python objects, and as a command-line interface ([CLI](https://unihan-etl.git-pull.com/en/latest/cli.html)) for exporting data into CSV, JSON, or YAML formats.
This tool is a component of the [cihai](https://cihai.git-pull.com) suite of CJK-related projects. For a similar tool, see [libUnihan](http://libunihan.sourceforge.net/).
As of v0.31.0, unihan-etl is compatible with UNIHAN Version 15.1.0 ([released on 2023-09-01, revision 35](https://www.unicode.org/reports/tr38/tr38-35.html#History)).
## The UNIHAN database
The [UNIHAN](http://www.unicode.org/charts/unihan.html) database organizes data across multiple files, exemplified below:
```tsv
U+3400	kCantonese		jau1
U+3400	kDefinition		(same as U+4E18 丘) hillock or mound
U+3400	kMandarin		qiū
U+3401	kCantonese		tim2
U+3401	kDefinition		to lick; to taste, a mat, bamboo bark
U+3401	kHanyuPinyin		10019.020:tiàn
U+3401	kMandarin		tiàn
```
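To make the row layout concrete, here is a minimal sketch of pivoting such lines into one record per codepoint. It is not unihan-etl's actual implementation: the `pivot_rows` helper and the inline sample are hypothetical, and the sketch assumes each line holds a codepoint, a field name, and a value separated by whitespace.

```python
from collections import defaultdict

# Inline sample mirroring a few rows; real UNIHAN files are far larger.
SAMPLE = """\
U+3400\tkCantonese\tjau1
U+3400\tkDefinition\t(same as U+4E18 丘) hillock or mound
U+3400\tkMandarin\tqiū
U+3401\tkCantonese\ttim2
"""


def pivot_rows(text):
    """Group 'codepoint field value' lines into one dict per codepoint."""
    records = defaultdict(dict)
    for line in text.splitlines():
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Split on the first two whitespace runs only, so values keep their spaces.
        ucn, field, value = line.split(None, 2)
        record = records[ucn]
        record["ucn"] = ucn
        record["char"] = chr(int(ucn[2:], 16))  # "U+3400" -> "㐀"
        record[field] = value
    return list(records.values())


records = pivot_rows(SAMPLE)
```

Each resulting dict corresponds to one row of the flat output, keyed by field name.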
Values vary in shape and structure depending on their field type.
[kHanyuPinyin](http://www.unicode.org/reports/tr38/#kHanyuPinyin) maps Unicode codepoints to
[Hànyǔ Dà Zìdiǎn](https://en.wikipedia.org/wiki/Hanyu_Da_Zidian), where `10019.020:tiàn` represents
an entry. Other codepoints carry more complex values:
```tsv
U+5EFE	kHanyuPinyin		10513.110,10514.010,10514.020:gǒng
U+5364	kHanyuPinyin		10093.130:xī,lǔ 74609.020:lǔ,xī
```
_kHanyuPinyin_ supports multiple entries delimited by spaces. A colon (":") separates locations in the
work from pinyin readings, and a comma (",") separates multiple locations or readings. This is just one of 90
fields contained in the database.
[etl]: https://en.wikipedia.org/wiki/Extract,_transform,_load
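The delimiter rules for _kHanyuPinyin_ can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the library's own parser: `parse_hanyu_pinyin` and `LOCATION_RE` are hypothetical names, and the location layout (one volume digit, four page digits, a dot, two character digits, one virtual digit) is inferred from the `10019.020` example in this README.

```python
import re

# Assumed layout, inferred from "10019.020": volume=1, page=0019,
# character=02, virtual=0.
LOCATION_RE = re.compile(
    r"(?P<volume>\d)(?P<page>\d{4})\.(?P<character>\d{2})(?P<virtual>\d)"
)


def parse_location(location):
    """Turn '10019.020' into a dict of integers."""
    match = LOCATION_RE.fullmatch(location)
    if match is None:
        raise ValueError(f"unexpected location format: {location!r}")
    return {key: int(value) for key, value in match.groupdict().items()}


def parse_hanyu_pinyin(value):
    """Split a raw kHanyuPinyin value into entries of locations and readings."""
    entries = []
    for entry in value.split(" "):  # spaces delimit entries
        locations, readings = entry.split(":")  # colon: locations vs. readings
        entries.append(
            {
                "locations": [parse_location(loc) for loc in locations.split(",")],
                "readings": readings.split(","),  # commas delimit readings
            }
        )
    return entries


entries = parse_hanyu_pinyin("10093.130:xī,lǔ 74609.020:lǔ,xī")
```

For a single-entry value such as `10019.020:tiàn`, the function returns one entry with one location and one reading.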
## Tabular, "Flat" output
### CSV (default)
```console
$ unihan-etl
```
```csv
char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin
㐀,U+3400,jau1,(same as U+4E18 丘) hillock or mound,,qiū
㐁,U+3401,tim2,"to lick; to taste, a mat, bamboo bark",10019.020:tiàn,tiàn
```
The same rows as YAML, via `$ unihan-etl -F yaml --no-expand`:
```yaml
- char: 㐀
  kCantonese: jau1
  kDefinition: (same as U+4E18 丘) hillock or mound
  kHanyuPinyin: null
  kMandarin: qiū
  ucn: U+3400
- char: 㐁
  kCantonese: tim2
  kDefinition: to lick; to taste, a mat, bamboo bark
  kHanyuPinyin: 10019.020:tiàn
  kMandarin: tiàn
  ucn: U+3401
```
To preview in the CLI, try [tabview](https://github.com/TabViewer/tabview) or
[csvlens](https://github.com/YS-L/csvlens).
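The flat CSV can also be consumed with Python's standard `csv` module. The sketch below reads an inline copy of the sample rows; `SAMPLE_CSV` is illustrative only. With a real export, you would open the file with `newline=""` and `encoding="utf-8"` instead.

```python
import csv
import io

# Inline copy of the sample output; the quoted field contains commas.
SAMPLE_CSV = """\
char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin
㐀,U+3400,jau1,(same as U+4E18 丘) hillock or mound,,qiū
㐁,U+3401,tim2,"to lick; to taste, a mat, bamboo bark",10019.020:tiàn,tiàn
"""

# DictReader uses the header row as keys and handles the quoted field.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Map each character to its definition.
definitions = {row["char"]: row["kDefinition"] for row in rows}
```

Note that `csv.DictReader` returns empty cells as empty strings rather than `None`, so the missing `kHanyuPinyin` for 㐀 comes back as `""`.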
### JSON
```console
$ unihan-etl -F json --no-expand
```
```json
[
  {
    "char": "㐀",
    "ucn": "U+3400",
    "kDefinition": "(same as U+4E18 丘) hillock or mound",
    "kCantonese": "jau1",
    "kHanyuPinyin": null,
    "kMandarin": "qiū"
  },
  {
    "char": "㐁",
    "ucn": "U+3401",
    "kDefinition": "to lick; to taste, a mat, bamboo bark",
    "kCantonese": "tim2",
    "kHanyuPinyin": "10019.020:tiàn",
    "kMandarin": "tiàn"
  }
]
```
Tools:
- View in CLI: [python-fx](https://github.com/cielong/pyfx),
[jless](https://github.com/PaulJuliusMartinez/jless) or
[fx](https://github.com/antonmedv/fx).
- Filter via CLI: [jq](https://github.com/stedolan/jq),
[jql](https://github.com/yamafaktory/jql),
[gojq](https://github.com/itchyny/gojq).
### YAML
```console
$ unihan-etl -F yaml --no-expand
```
```yaml
- char: 㐀
  kCantonese: jau1
  kDefinition: (same as U+4E18 丘) hillock or mound
  kHanyuPinyin: null
  kMandarin: qiū
  ucn: U+3400
- char: 㐁
  kCantonese: tim2
  kDefinition: to lick; to taste, a mat, bamboo bark
  kHanyuPinyin: 10019.020:tiàn
  kMandarin: tiàn
  ucn: U+3401
```
Filter via the CLI with [yq](https://github.com/mikefarah/yq).
## "Structured" output
Codepoints can pack a lot more detail. unihan-etl extracts these values in a uniform
manner and prunes empty values.
To make this possible, unihan-etl exports to JSON, YAML, and Python lists/dicts.
<div class="admonition">
Why not CSV?
Unfortunately, CSV is only suitable for storing table-like information. File formats such as JSON
and YAML accept key-values and hierarchical entries.
</div>
### JSON
```console
$ unihan-etl -F json
```
```json
[
  {
    "char": "㐀",
    "ucn": "U+3400",
    "kDefinition": ["(same as U+4E18 丘) hillock or mound"],
    "kCantonese": ["jau1"],
    "kMandarin": {
      "zh-Hans": "qiū",
      "zh-Hant": "qiū"
    }
  },
  {
    "char": "㐁",
    "ucn": "U+3401",
    "kDefinition": ["to lick", "to taste, a mat, bamboo bark"],
    "kCantonese": ["tim2"],
    "kHanyuPinyin": [
      {
        "locations": [
          {
            "volume": 1,
            "page": 19,
            "character": 2,
            "virtual": 0
          }
        ],
        "readings": ["tiàn"]
      }
    ],
    "kMandarin": {
      "zh-Hans": "tiàn",
      "zh-Hant": "tiàn"
    }
  }
]
```
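Because empty values are pruned from structured output, consumers should not assume every key is present. A minimal sketch of reading the structured JSON follows; `SAMPLE_JSON` is a trimmed copy of the sample output and `hanyu_readings` is a hypothetical helper.

```python
import json

# Trimmed copy of the structured sample; 㐀 has no kHanyuPinyin key at all.
SAMPLE_JSON = """
[
  {
    "char": "㐀",
    "ucn": "U+3400",
    "kDefinition": ["(same as U+4E18 丘) hillock or mound"],
    "kMandarin": {"zh-Hans": "qiū", "zh-Hant": "qiū"}
  },
  {
    "char": "㐁",
    "ucn": "U+3401",
    "kDefinition": ["to lick", "to taste, a mat, bamboo bark"],
    "kHanyuPinyin": [
      {
        "locations": [{"volume": 1, "page": 19, "character": 2, "virtual": 0}],
        "readings": ["tiàn"]
      }
    ],
    "kMandarin": {"zh-Hans": "tiàn", "zh-Hant": "tiàn"}
  }
]
"""


def hanyu_readings(record):
    """Collect every kHanyuPinyin reading; a pruned field yields an empty list."""
    return [
        reading
        for entry in record.get("kHanyuPinyin", [])
        for reading in entry["readings"]
    ]


records = json.loads(SAMPLE_JSON)
readings_by_char = {record["char"]: hanyu_readings(record) for record in records}
```

Using `dict.get` with a default keeps the lookup safe for records where the field was pruned.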
### YAML
```console
$ unihan-etl -F yaml
```
```yaml
- char: 㐀
  kCantonese:
    - jau1
  kDefinition:
    - (same as U+4E18 丘) hillock or mound
  kMandarin:
    zh-Hans: qiū
    zh-Hant: qiū
  ucn: U+3400
- char: 㐁
  kCantonese:
    - tim2
  kDefinition:
    - to lick
    - to taste, a mat, bamboo bark
  kHanyuPinyin:
    - locations:
        - character: 2
          page: 19
          virtual: 0
          volume: 1
      readings:
        - tiàn
  kMandarin:
    zh-Hans: tiàn
    zh-Hant: tiàn
  ucn: U+3401
```
## Features
- automatically downloads UNIHAN from the internet
- strives for accuracy with the specifications described in
[UNIHAN's database design](http://www.unicode.org/reports/tr38/)
- export to JSON, CSV and YAML (requires [pyyaml](http://pyyaml.org/)) via `-F`
- configurable to export specific fields via `-f`
- accounts for encoding conflicts due to the Unicode-heavy content
- designed as a technical proof for future CJK (Chinese, Japanese, Korean) datasets
- core component and dependency of [cihai](https://cihai.git-pull.com), a CJK library
- [data package](http://frictionlessdata.io/data-packages/) support
- expansion of multi-value delimited fields in YAML, JSON and Python dictionaries
- supports Python >= 3.8 and PyPy
If you encounter a problem or have a question, please
[create an issue](https://github.com/cihai/unihan-etl/issues/new).
## Installation
To download and build your own UNIHAN export:
```console
$ pip install --user unihan-etl
```
or by [pipx](https://pypa.github.io/pipx/docs/):
```console
$ pipx install unihan-etl
```
### Developmental releases
[pip](https://pip.pypa.io/en/stable/):
```console
$ pip install --user --upgrade --pre unihan-etl
```
[pipx](https://pypa.github.io/pipx/docs/):
```console
$ pipx install --suffix=@next 'unihan-etl' --pip-args '\--pre' --force
# Usage: unihan-etl@next -F json
```
## Usage
`unihan-etl` offers customizable builds via its command line arguments.
See [unihan-etl CLI arguments](https://unihan-etl.git-pull.com/en/latest/cli.html) for information
on how you can specify columns, files, download URLs, and output destination.
To output CSV, the default format:
```console
$ unihan-etl
```
To output JSON:
```console
$ unihan-etl -F json
```
To output YAML:
```console
$ pip install --user pyyaml
$ unihan-etl -F yaml
```
To output only the kDefinition field in a CSV:
```console
$ unihan-etl -f kDefinition
```
To output multiple fields, separate with spaces:
```console
$ unihan-etl -f kCantonese kDefinition
```
To output to a custom file:
```console
$ unihan-etl --destination ./exported.csv
```
To output to a custom file (templated file extension):
```console
$ unihan-etl --destination ./exported.{ext}
```
See [unihan-etl CLI arguments](https://unihan-etl.git-pull.com/en/latest/cli.html) for advanced
usage examples.
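When driving the CLI from a script, it can help to assemble the argument list programmatically. The sketch below uses only the flags shown in this section (`-F`, `-f`, `--destination`); `build_command` is a hypothetical helper, not part of unihan-etl.

```python
import shlex


def build_command(fmt=None, fields=(), destination=None):
    """Assemble an argv list for the unihan-etl CLI from the flags shown above."""
    argv = ["unihan-etl"]
    if fmt:
        argv += ["-F", fmt]  # output format: csv, json, or yaml
    if fields:
        argv += ["-f", *fields]  # space-separated field names
    if destination:
        argv += ["--destination", destination]
    return argv


argv = build_command(
    fmt="json",
    fields=["kCantonese", "kDefinition"],
    destination="./exported.{ext}",
)

# Print a shell-quoted form for inspection; nothing is executed here.
print(shlex.join(argv))
```

To actually run the export, the list could be passed to `subprocess.run(argv, check=True)`. Keep in mind that this downloads UNIHAN if it is not already cached.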
## Code layout
```console
# cache dir (Unihan.zip is downloaded, contents extracted)
{XDG cache dir}/unihan_etl/

# output dir
{XDG data dir}/unihan_etl/
  unihan.json
  unihan.csv
  unihan.yaml   # (requires pyyaml)

# package dir
unihan_etl/
  core.py       # argparse, download, extract, transform UNIHAN's data
  options.py    # configuration object
  constants.py  # immutable data vars (field to filename mappings, etc)
  expansion.py  # extracting details baked inside of fields
  types.py      # type annotations
  util.py       # utility / helper functions

# test suite
tests/*
```
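For orientation, the `{XDG cache dir}` and `{XDG data dir}` placeholders can be approximated on Linux with the XDG Base Directory defaults. This is a sketch under that assumption; unihan-etl may resolve these paths differently, especially on macOS and Windows, and `xdg_dir` is a hypothetical helper.

```python
import os
from pathlib import Path


def xdg_dir(env_var, fallback):
    """Return $env_var if set and non-empty, else ~/<fallback> (XDG convention)."""
    value = os.environ.get(env_var)
    return Path(value) if value else Path.home() / fallback


# XDG Base Directory defaults: ~/.cache and ~/.local/share
cache_dir = xdg_dir("XDG_CACHE_HOME", ".cache") / "unihan_etl"
data_dir = xdg_dir("XDG_DATA_HOME", ".local/share") / "unihan_etl"

print(cache_dir)
print(data_dir)
```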
## API
The package is Python under the hood, so you can use its full [API].
Example:
```python
>>> from unihan_etl.core import Packager
>>> pkgr = Packager()
>>> hasattr(pkgr.options, 'destination')
True
```
[API]: https://unihan-etl.git-pull.com/en/latest/api.html
## Developing
```console
$ git clone https://github.com/cihai/unihan-etl.git
```
```console
$ cd unihan-etl
```
[Bootstrap your environment and learn more about contributing](https://cihai.git-pull.com/contributing/). We use the same conventions / tools across all cihai projects: `pytest`, `sphinx`, `mypy`, `ruff`, `tmuxp`, and file watcher helpers (e.g. `entr(1)`).
## More information
[![Docs](https://github.com/cihai/unihan-etl/workflows/docs/badge.svg)](https://unihan-etl.git-pull.com/)
[![Build Status](https://github.com/cihai/unihan-etl/workflows/tests/badge.svg)](https://github.com/cihai/unihan-etl/actions?query=workflow%3A%22tests%22)
Raw data
{
"_id": null,
"home_page": "https://unihan-etl.git-pull.com",
"name": "unihan-etl",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.8",
"maintainer_email": null,
"keywords": "unihan, unicode, cjk, yaml, json, chinese, japanese, korean, hanzi, dictionary, dataset",
"author": "Tony Narlock",
"author_email": "tony@git-pull.com",
"download_url": "https://files.pythonhosted.org/packages/6b/0a/840afb05bdbb341bc672eba9fb5da78a0a55f7f5995eff3662493927bc53/unihan_etl-0.34.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Export UNIHAN data of Chinese, Japanese, Korean to CSV, JSON or YAML",
"version": "0.34.0",
"project_urls": {
"Bug Tracker": "https://github.com/cihai/unihan-etl/issues",
"Changes": "https://github.com/cihai/unihan-etl/blob/master/CHANGES",
"Documentation": "https://unihan-etl.git-pull.com",
"Homepage": "https://unihan-etl.git-pull.com",
"Repository": "https://github.com/cihai/unihan-etl"
},
"split_keywords": [
"unihan",
" unicode",
" cjk",
" yaml",
" json",
" chinese",
" japanese",
" korean",
" hanzi",
" dictionary",
" dataset"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "4d9cae602992a46a9773a49deaf765b367d1655996fd565b8fa176fbdc040984",
"md5": "09fae33a3d83e1b90644532d2be7641f",
"sha256": "12c9d45f9697be86497e70189c4b833f406c1936d6e9e511ecbffd68d80648cd"
},
"downloads": -1,
"filename": "unihan_etl-0.34.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "09fae33a3d83e1b90644532d2be7641f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.8",
"size": 58838,
"upload_time": "2024-03-24T17:22:36",
"upload_time_iso_8601": "2024-03-24T17:22:36.742951Z",
"url": "https://files.pythonhosted.org/packages/4d/9c/ae602992a46a9773a49deaf765b367d1655996fd565b8fa176fbdc040984/unihan_etl-0.34.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "6b0a840afb05bdbb341bc672eba9fb5da78a0a55f7f5995eff3662493927bc53",
"md5": "8aa660825a656662a7c557c26987754e",
"sha256": "1a596f28982fc9ee172d50ed44b025c4bf4f7403bba2e14c8933571ea08fba21"
},
"downloads": -1,
"filename": "unihan_etl-0.34.0.tar.gz",
"has_sig": false,
"md5_digest": "8aa660825a656662a7c557c26987754e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.8",
"size": 69407,
"upload_time": "2024-03-24T17:22:39",
"upload_time_iso_8601": "2024-03-24T17:22:39.100111Z",
"url": "https://files.pythonhosted.org/packages/6b/0a/840afb05bdbb341bc672eba9fb5da78a0a55f7f5995eff3662493927bc53/unihan_etl-0.34.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-03-24 17:22:39",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "cihai",
"github_project": "unihan-etl",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"tox": true,
"lcname": "unihan-etl"
}