unihan-etl


- Name: unihan-etl
- Version: 0.34.0
- Home page: https://unihan-etl.git-pull.com
- Summary: Export UNIHAN data of Chinese, Japanese, Korean to CSV, JSON or YAML
- Upload time: 2024-03-24 17:22:39
- Author: Tony Narlock
- Requires Python: <4.0,>=3.8
- License: MIT
- Keywords: unihan, unicode, cjk, yaml, json, chinese, japanese, korean, hanzi, dictionary, dataset

# unihan-etl · [![Python Package](https://img.shields.io/pypi/v/unihan-etl.svg)](https://pypi.org/project/unihan-etl/) [![License](https://img.shields.io/github/license/cihai/unihan-etl.svg)](https://github.com/cihai/unihan-etl/blob/master/LICENSE) [![Code Coverage](https://codecov.io/gh/cihai/unihan-etl/branch/master/graph/badge.svg)](https://codecov.io/gh/cihai/unihan-etl)

An [ETL][etl] tool for the Unicode Han Unification ([UNIHAN](http://www.unicode.org/charts/unihan.html)) database releases. unihan-etl is designed to fetch (download), unpack (unzip), and convert the database from the Unicode website into either a flattened, tabular format or a structured, hierarchical format.

unihan-etl serves dual purposes: as a Python library offering an [API](https://unihan-etl.git-pull.com/en/latest/) for accessing data as Python objects, and as a command-line interface ([CLI](https://unihan-etl.git-pull.com/en/latest/cli.html)) for exporting data into CSV, JSON, or YAML formats.

This tool is a component of the [cihai](https://cihai.git-pull.com) suite of CJK-related projects. For a similar tool, see [libUnihan](http://libunihan.sourceforge.net/).

As of v0.31.0, unihan-etl is compatible with UNIHAN Version 15.1.0 ([released on 2023-09-01, revision 35](https://www.unicode.org/reports/tr38/tr38-35.html#History)).

## The UNIHAN database

The [UNIHAN](http://www.unicode.org/charts/unihan.html) database organizes data across multiple files, exemplified below:

```tsv
U+3400	kCantonese		jau1
U+3400	kDefinition		(same as U+4E18 丘) hillock or mound
U+3400	kMandarin		qiū
U+3401	kCantonese		tim2
U+3401	kDefinition		to lick; to taste, a mat, bamboo bark
U+3401	kHanyuPinyin		10019.020:tiàn
U+3401	kMandarin		tiàn
```
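As a rough illustration of what "flattening" involves, the three-column layout above (codepoint, field, value) can be grouped into one record per codepoint. This is a minimal sketch, not unihan-etl's own implementation: the function name `group_by_codepoint` is hypothetical, and it assumes fields are separated by one or more tab characters, as in the sample.

```python
from __future__ import annotations

import re
from collections import defaultdict

# A few lines copied from the sample above (tab-separated).
SAMPLE = "\n".join(
    [
        "U+3400\tkCantonese\t\tjau1",
        "U+3400\tkMandarin\t\tqiū",
        "U+3401\tkCantonese\t\ttim2",
        "U+3401\tkHanyuPinyin\t\t10019.020:tiàn",
    ]
)


def group_by_codepoint(text: str) -> dict[str, dict[str, str]]:
    """Collect `codepoint <tab> field <tab> value` lines into one dict per codepoint."""
    rows: dict[str, dict[str, str]] = defaultdict(dict)
    for line in text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        # Split on runs of tabs so that doubled tabs do not yield empty fields.
        ucn, field, value = re.split(r"\t+", line, maxsplit=2)
        # "U+3400" -> the character itself.
        rows[ucn]["char"] = chr(int(ucn[2:], 16))
        rows[ucn][field] = value
    return dict(rows)


rows = group_by_codepoint(SAMPLE)
print(rows["U+3401"])
```

Each resulting dict corresponds to one row of the flat output shown later in this document.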

Values vary in shape and structure depending on their field type.
[kHanyuPinyin](http://www.unicode.org/reports/tr38/#kHanyuPinyin) maps Unicode codepoints to
[Hànyǔ Dà Zìdiǎn](https://en.wikipedia.org/wiki/Hanyu_Da_Zidian), where `10019.020:tiàn` represents
an entry. To complicate matters further, some values take more varied forms:

```tsv
U+5EFE	kHanyuPinyin		10513.110,10514.010,10514.020:gǒng
U+5364	kHanyuPinyin		10093.130:xī,lǔ 74609.020:lǔ,xī
```

_kHanyuPinyin_ supports multiple entries delimited by spaces. A ":" (colon) separates locations in the
work from pinyin readings, and a "," (comma) separates multiple locations or readings within an entry.
This is just one of 90 fields contained in the database.
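The delimiter rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the library's parser: the function names are hypothetical, and the breakdown of a location such as `10019.020` into volume (1 digit), page (4 digits), character (2 digits) and a trailing virtual flag (1 digit) is inferred from the structured output shown later in this document.

```python
from __future__ import annotations


def parse_location(location: str) -> dict[str, int]:
    """Split e.g. '10019.020' into volume/page/character/virtual (assumed layout)."""
    page_part, char_part = location.split(".")
    return {
        "volume": int(page_part[0]),
        "page": int(page_part[1:]),
        "character": int(char_part[:2]),
        "virtual": int(char_part[2]),
    }


def parse_khanyupinyin(value: str) -> list[dict]:
    """Parse a raw kHanyuPinyin value into a list of location/reading entries."""
    entries = []
    for entry in value.split(" "):  # spaces delimit entries
        locations, readings = entry.split(":")  # colon: locations vs. readings
        entries.append(
            {
                "locations": [parse_location(loc) for loc in locations.split(",")],
                "readings": readings.split(","),  # commas delimit readings
            }
        )
    return entries


print(parse_khanyupinyin("10093.130:xī,lǔ 74609.020:lǔ,xī"))
```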

[etl]: https://en.wikipedia.org/wiki/Extract,_transform,_load

## Tabular, "Flat" output

### CSV (default)

```console
$ unihan-etl
```

```csv
char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin
㐀,U+3400,jau1,(same as U+4E18 丘) hillock or mound,,qiū
㐁,U+3401,tim2,"to lick; to taste, a mat, bamboo bark",10019.020:tiàn,tiàn
```
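The CSV export can be read back with Python's standard `csv` module. The sketch below uses the two sample rows inline rather than a real export file; `read_flat_csv` is a hypothetical helper name.

```python
import csv
import io

# The sample rows shown above, inlined for a self-contained example.
FLAT_CSV = """char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin
㐀,U+3400,jau1,(same as U+4E18 丘) hillock or mound,,qiū
㐁,U+3401,tim2,"to lick; to taste, a mat, bamboo bark",10019.020:tiàn,tiàn
"""


def read_flat_csv(text):
    """Return one dict per row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_flat_csv(FLAT_CSV)
for row in rows:
    # Missing values come back as empty strings, not None.
    print(row["char"], repr(row["kHanyuPinyin"]))
```

When reading an exported file instead of a string, open it with `newline=""` and `encoding="utf-8"`, as the `csv` documentation recommends.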

With `$ unihan-etl -F yaml --no-expand`:

```yaml
- char: 㐀
  kCantonese: jau1
  kDefinition: (same as U+4E18 丘) hillock or mound
  kHanyuPinyin: null
  kMandarin: qiū
  ucn: U+3400
- char: 㐁
  kCantonese: tim2
  kDefinition: to lick; to taste, a mat, bamboo bark
  kHanyuPinyin: 10019.020:tiàn
  kMandarin: tiàn
  ucn: U+3401
```

To preview in the CLI, try [tabview](https://github.com/TabViewer/tabview) or
[csvlens](https://github.com/YS-L/csvlens).

### JSON

```console
$ unihan-etl -F json --no-expand
```

```json
[
  {
    "char": "㐀",
    "ucn": "U+3400",
    "kDefinition": "(same as U+4E18 丘) hillock or mound",
    "kCantonese": "jau1",
    "kHanyuPinyin": null,
    "kMandarin": "qiū"
  },
  {
    "char": "㐁",
    "ucn": "U+3401",
    "kDefinition": "to lick; to taste, a mat, bamboo bark",
    "kCantonese": "tim2",
    "kHanyuPinyin": "10019.020:tiàn",
    "kMandarin": "tiàn"
  }
]
```

Tools:

- View in CLI: [python-fx](https://github.com/cielong/pyfx),
  [jless](https://github.com/PaulJuliusMartinez/jless) or
  [fx](https://github.com/antonmedv/fx).
- Filter via CLI: [jq](https://github.com/stedolan/jq),
  [jql](https://github.com/yamafaktory/jql),
  [gojq](https://github.com/itchyny/gojq).
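For filtering without an external tool, the flat JSON can also be read with Python's standard `json` module. A minimal sketch, using the sample records above inline; the helper name `with_field` is hypothetical:

```python
import json

# The flat sample records shown above, inlined for a self-contained example.
FLAT_JSON = """
[
  {"char": "㐀", "ucn": "U+3400", "kCantonese": "jau1",
   "kHanyuPinyin": null, "kMandarin": "qiū"},
  {"char": "㐁", "ucn": "U+3401", "kCantonese": "tim2",
   "kHanyuPinyin": "10019.020:tiàn", "kMandarin": "tiàn"}
]
"""


def with_field(records, field):
    """Keep only records where `field` is present and not null."""
    return [record for record in records if record.get(field) is not None]


records = json.loads(FLAT_JSON)
print([record["char"] for record in with_field(records, "kHanyuPinyin")])
```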

### YAML

```console
$ unihan-etl -F yaml --no-expand
```

```yaml
- char: 㐀
  kCantonese: jau1
  kDefinition: (same as U+4E18 丘) hillock or mound
  kHanyuPinyin: null
  kMandarin: qiū
  ucn: U+3400
- char: 㐁
  kCantonese: tim2
  kDefinition: to lick; to taste, a mat, bamboo bark
  kHanyuPinyin: 10019.020:tiàn
  kMandarin: tiàn
  ucn: U+3401
```

Filter via the CLI with [yq](https://github.com/mikefarah/yq).

## "Structured" output

Codepoints can pack a lot more detail. unihan-etl carefully extracts these values in a uniform
manner. Empty values are pruned.

To make this possible, unihan-etl exports to JSON, YAML, and Python lists/dicts.
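Pruning means that a field with no value is simply absent from a structured record, whereas the flat output keeps it as `null`. The idea can be sketched as follows; this is an illustration with a hypothetical `prune_empty` helper, not unihan-etl's own code.

```python
def prune_empty(record):
    """Drop keys whose value is None, an empty string, or an empty container."""
    return {
        key: value
        for key, value in record.items()
        if value not in (None, "", [], {})
    }


flat = {
    "char": "㐀",
    "ucn": "U+3400",
    "kCantonese": "jau1",
    "kHanyuPinyin": None,  # no entry for this codepoint
    "kMandarin": "qiū",
}
print(prune_empty(flat))
```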

<div class="admonition">

Why not CSV?

Unfortunately, CSV is only suitable for storing table-like information. File formats such as JSON
and YAML accept key-values and hierarchical entries.

</div>

### JSON

```console
$ unihan-etl -F json
```

```json
[
  {
    "char": "㐀",
    "ucn": "U+3400",
    "kDefinition": ["(same as U+4E18 丘) hillock or mound"],
    "kCantonese": ["jau1"],
    "kMandarin": {
      "zh-Hans": "qiū",
      "zh-Hant": "qiū"
    }
  },
  {
    "char": "㐁",
    "ucn": "U+3401",
    "kDefinition": ["to lick", "to taste, a mat, bamboo bark"],
    "kCantonese": ["tim2"],
    "kHanyuPinyin": [
      {
        "locations": [
          {
            "volume": 1,
            "page": 19,
            "character": 2,
            "virtual": 0
          }
        ],
        "readings": ["tiàn"]
      }
    ],
    "kMandarin": {
      "zh-Hans": "tiàn",
      "zh-Hant": "tiàn"
    }
  }
]
```
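Because the structured records are plain lists and dicts once loaded, nested fields can be walked directly. The following sketch collects the _kHanyuPinyin_ readings per character from a trimmed copy of the sample above; `readings_by_char` is a hypothetical helper.

```python
import json

# Trimmed copy of the structured sample above.
STRUCTURED_JSON = """
[
  {"char": "㐀", "ucn": "U+3400",
   "kMandarin": {"zh-Hans": "qiū", "zh-Hant": "qiū"}},
  {"char": "㐁", "ucn": "U+3401",
   "kHanyuPinyin": [
     {"locations": [{"volume": 1, "page": 19, "character": 2, "virtual": 0}],
      "readings": ["tiàn"]}
   ],
   "kMandarin": {"zh-Hans": "tiàn", "zh-Hant": "tiàn"}}
]
"""


def readings_by_char(records):
    """Map each character to all of its kHanyuPinyin readings."""
    result = {}
    for record in records:
        readings = []
        # Pruned fields are absent, so fall back to an empty list.
        for entry in record.get("kHanyuPinyin", []):
            readings.extend(entry["readings"])
        result[record["char"]] = readings
    return result


records = json.loads(STRUCTURED_JSON)
print(readings_by_char(records))
```

Note the use of `record.get(...)`: since empty values are pruned, code that consumes structured output should not assume every key exists.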

### YAML

```console
$ unihan-etl -F yaml
```

```yaml
- char: 㐀
  kCantonese:
    - jau1
  kDefinition:
    - (same as U+4E18 丘) hillock or mound
  kMandarin:
    zh-Hans: qiū
    zh-Hant: qiū
  ucn: U+3400
- char: 㐁
  kCantonese:
    - tim2
  kDefinition:
    - to lick
    - to taste, a mat, bamboo bark
  kHanyuPinyin:
    - locations:
        - character: 2
          page: 19
          virtual: 0
          volume: 1
      readings:
        - tiàn
  kMandarin:
    zh-Hans: tiàn
    zh-Hant: tiàn
  ucn: U+3401
```
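Comparing the flat and structured samples suggests how individual fields are expanded: _kDefinition_ appears to be split on semicolons, and _kMandarin_ becomes a mapping with `zh-Hans` and `zh-Hant` keys. The sketch below reproduces that behavior for the sample values only. The function names are hypothetical, and the handling of a two-reading _kMandarin_ value (first for `zh-Hans`, second for `zh-Hant`) is an assumption based on the field's description in UAX #38, not something shown in the samples here.

```python
def expand_kdefinition(value):
    """Split a definition string on semicolons into a list of senses."""
    return [part.strip() for part in value.split(";")]


def expand_kmandarin(value):
    """Map a kMandarin value to zh-Hans / zh-Hant readings.

    With a single reading, both keys share it. With two space-separated
    readings, the first is used for zh-Hans and the second for zh-Hant
    (an assumption, see the note above).
    """
    readings = value.split(" ")
    return {"zh-Hans": readings[0], "zh-Hant": readings[-1]}


print(expand_kdefinition("to lick; to taste, a mat, bamboo bark"))
print(expand_kmandarin("tiàn"))
```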

## Features

- automatically downloads UNIHAN from the internet
- strives for accuracy with the specifications described in
  [UNIHAN's database design](http://www.unicode.org/reports/tr38/)
- export to JSON, CSV and YAML (requires [pyyaml](http://pyyaml.org/)) via `-F`
- configurable to export specific fields via `-f`
- accounts for encoding conflicts due to the Unicode-heavy content
- designed as a technical proof for future CJK (Chinese, Japanese, Korean) datasets
- core component and dependency of [cihai](https://cihai.git-pull.com), a CJK library
- [data package](http://frictionlessdata.io/data-packages/) support
- expansion of multi-value delimited fields in YAML, JSON and Python dictionaries
- supports Python >= 3.8 and pypy

If you encounter a problem or have a question, please
[create an issue](https://github.com/cihai/unihan-etl/issues/new).

## Installation

To download and build your own UNIHAN export:

```console
$ pip install --user unihan-etl
```

or by [pipx](https://pypa.github.io/pipx/docs/):

```console
$ pipx install unihan-etl
```

### Developmental releases

[pip](https://pip.pypa.io/en/stable/):

```console
$ pip install --user --upgrade --pre unihan-etl
```

[pipx](https://pypa.github.io/pipx/docs/):

```console
$ pipx install --suffix=@next 'unihan-etl' --pip-args '\--pre' --force
# Usage: unihan-etl@next -F json
```

## Usage

`unihan-etl` offers customizable builds via its command line arguments.

See [unihan-etl CLI arguments](https://unihan-etl.git-pull.com/en/latest/cli.html) for information
on how you can specify columns, files, download URLs, and output destination.

To output CSV, the default format:

```console
$ unihan-etl
```

To output JSON:

```console
$ unihan-etl -F json
```

To output YAML:

```console
$ pip install --user pyyaml
$ unihan-etl -F yaml
```

To output only the kDefinition field as CSV:

```console
$ unihan-etl -f kDefinition
```

To output multiple fields, separate with spaces:

```console
$ unihan-etl -f kCantonese kDefinition
```

To output to a custom file:

```console
$ unihan-etl --destination ./exported.csv
```

To output to a custom file (templated file extension):

```console
$ unihan-etl --destination ./exported.{ext}
```
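When driving the CLI from a script, it can help to assemble the argument list in one place. The helper below is a hypothetical sketch that only builds the command from the flags shown above (`-F`, `-f`, `--destination`). That these flags can all be combined in a single invocation is an assumption; the helper does not run anything itself.

```python
import shlex


def build_command(fmt="csv", fields=(), destination=None):
    """Assemble a unihan-etl argument list from the options shown above."""
    cmd = ["unihan-etl", "-F", fmt]
    if fields:
        # Multiple fields are passed space-separated after a single -f.
        cmd += ["-f", *fields]
    if destination:
        cmd += ["--destination", destination]
    return cmd


cmd = build_command("json", ["kCantonese", "kDefinition"], "./exported.{ext}")
# shlex.join quotes arguments so the printed line is safe to paste into a shell.
print(shlex.join(cmd))
```

To execute the command, pass the list to `subprocess.run(cmd, check=True)`; using a list rather than a single string avoids shell quoting problems with values such as `{ext}`.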

See [unihan-etl CLI arguments](https://unihan-etl.git-pull.com/en/latest/cli.html) for advanced
usage examples.

## Code layout

```console
# cache dir (Unihan.zip is downloaded, contents extracted)
{XDG cache dir}/unihan_etl/

# output dir
{XDG data dir}/unihan_etl/
  unihan.json
  unihan.csv
  unihan.yaml   # (requires pyyaml)

# package dir
unihan_etl/
  core.py       # argparse, download, extract, transform UNIHAN's data
  options.py    # configuration object
  constants.py  # immutable data vars (field to filename mappings, etc)
  expansion.py  # extracting details baked inside of fields
  types.py      # type annotations
  util.py       # utility / helper functions

# test suite
tests/*
```
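To locate the cache and output directories from a script, the `{XDG cache dir}` and `{XDG data dir}` placeholders can be resolved using the defaults from the XDG Base Directory specification. This is a sketch for Linux-style systems only: `xdg_dir` is a hypothetical helper, and unihan-etl itself may resolve these paths differently, especially on other platforms.

```python
import os
from pathlib import Path

# Environment variable and home-relative fallback for each directory kind,
# as defined by the XDG Base Directory specification.
_XDG_DEFAULTS = {
    "cache": ("XDG_CACHE_HOME", ".cache"),
    "data": ("XDG_DATA_HOME", ".local/share"),
}


def xdg_dir(kind, env=None):
    """Return the assumed unihan_etl directory for 'cache' or 'data'."""
    env = os.environ if env is None else env
    variable, fallback = _XDG_DEFAULTS[kind]
    # An unset or empty variable means "use the default".
    base = env.get(variable) or str(Path.home() / fallback)
    return Path(base) / "unihan_etl"


print(xdg_dir("cache"))
print(xdg_dir("data"))
```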

## API

The package is Python under the hood, so you can use its full [API].
Example:

```python
>>> from unihan_etl.core import Packager
>>> pkgr = Packager()
>>> hasattr(pkgr.options, 'destination')
True
```

[API]: https://unihan-etl.git-pull.com/en/latest/api.html

## Developing

```console
$ git clone https://github.com/cihai/unihan-etl.git
```

```console
$ cd unihan-etl
```

[Bootstrap your environment and learn more about contributing](https://cihai.git-pull.com/contributing/). We use the same conventions / tools across all cihai projects: `pytest`, `sphinx`, `mypy`, `ruff`, `tmuxp`, and file watcher helpers (e.g. `entr(1)`).

## More information

[![Docs](https://github.com/cihai/unihan-etl/workflows/docs/badge.svg)](https://unihan-etl.git-pull.com/)
[![Build Status](https://github.com/cihai/unihan-etl/workflows/tests/badge.svg)](https://github.com/cihai/unihan-etl/actions?query=workflow%3A%22tests%22)


            

Raw data

            {
    "_id": null,
    "home_page": "https://unihan-etl.git-pull.com",
    "name": "unihan-etl",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.8",
    "maintainer_email": null,
    "keywords": "unihan, unicode, cjk, yaml, json, chinese, japanese, korean, hanzi, dictionary, dataset",
    "author": "Tony Narlock",
    "author_email": "tony@git-pull.com",
    "download_url": "https://files.pythonhosted.org/packages/6b/0a/840afb05bdbb341bc672eba9fb5da78a0a55f7f5995eff3662493927bc53/unihan_etl-0.34.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Export UNIHAN data of Chinese, Japanese, Korean to CSV, JSON or YAML",
    "version": "0.34.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/cihai/unihan-etl/issues",
        "Changes": "https://github.com/cihai/unihan-etl/blob/master/CHANGES",
        "Documentation": "https://unihan-etl.git-pull.com",
        "Homepage": "https://unihan-etl.git-pull.com",
        "Repository": "https://github.com/cihai/unihan-etl"
    },
    "split_keywords": [
        "unihan",
        " unicode",
        " cjk",
        " yaml",
        " json",
        " chinese",
        " japanese",
        " korean",
        " hanzi",
        " dictionary",
        " dataset"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4d9cae602992a46a9773a49deaf765b367d1655996fd565b8fa176fbdc040984",
                "md5": "09fae33a3d83e1b90644532d2be7641f",
                "sha256": "12c9d45f9697be86497e70189c4b833f406c1936d6e9e511ecbffd68d80648cd"
            },
            "downloads": -1,
            "filename": "unihan_etl-0.34.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "09fae33a3d83e1b90644532d2be7641f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.8",
            "size": 58838,
            "upload_time": "2024-03-24T17:22:36",
            "upload_time_iso_8601": "2024-03-24T17:22:36.742951Z",
            "url": "https://files.pythonhosted.org/packages/4d/9c/ae602992a46a9773a49deaf765b367d1655996fd565b8fa176fbdc040984/unihan_etl-0.34.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6b0a840afb05bdbb341bc672eba9fb5da78a0a55f7f5995eff3662493927bc53",
                "md5": "8aa660825a656662a7c557c26987754e",
                "sha256": "1a596f28982fc9ee172d50ed44b025c4bf4f7403bba2e14c8933571ea08fba21"
            },
            "downloads": -1,
            "filename": "unihan_etl-0.34.0.tar.gz",
            "has_sig": false,
            "md5_digest": "8aa660825a656662a7c557c26987754e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.8",
            "size": 69407,
            "upload_time": "2024-03-24T17:22:39",
            "upload_time_iso_8601": "2024-03-24T17:22:39.100111Z",
            "url": "https://files.pythonhosted.org/packages/6b/0a/840afb05bdbb341bc672eba9fb5da78a0a55f7f5995eff3662493927bc53/unihan_etl-0.34.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-24 17:22:39",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "cihai",
    "github_project": "unihan-etl",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "tox": true,
    "lcname": "unihan-etl"
}
        