# cnparser

**cnparser** is a Python library for loading and enriching data from the [Corporate Number Publication Site](https://www.houjin-bangou.nta.go.jp/en/), which is provided by the National Tax Agency of Japan. cnparser currently supports parsing only the latest data.
## Installation
----------------------
cnparser can be installed with pip.
```shell:
$ python -m pip install cnparser
```
### GitHub Install
Installing the latest version from GitHub:
```shell:
$ git clone https://github.com/new-village/cnparser
$ cd cnparser
$ python -m pip install .
```
## Usage
This section demonstrates how to use this library to load and process data from the National Tax Agency's [Corporate Number Publication Site](https://www.houjin-bangou.nta.go.jp/).
### Direct Data Loading
To download data for a specific prefecture, pass the prefecture name to the `load` function; it returns a DataFrame containing the data for that prefecture. The name must be written in Roman characters ([list of the supported prefectures](https://github.com/new-village/cnparser/blob/main/cnparser/config/file_id.json)).
If you call `load` without any arguments, data for all prefectures across Japan is downloaded.
```python:
>>> import cnparser
>>> df = cnparser.load("Shimane")
```
### CSV Data Loading
If you already have a downloaded CSV file, use the `read_csv` function. By passing the file path as an argument, you can obtain a DataFrame with headers from the CSV data.
```python:
>>> import cnparser
>>> df = cnparser.read_csv("path/to/data.csv")
```
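
The idea of attaching headers to CSV rows can be sketched with the standard library alone. This is not how `read_csv` is implemented: it assumes the downloaded file has no header row, and the column subset (`corporate_number` is a hypothetical name) and the sample row are made up for illustration.

```python
import csv
import io

# Hypothetical subset of column names; a real file is assumed to have more columns.
FIELDNAMES = ["corporate_number", "name", "post_code"]


def read_rows(stream, fieldnames=FIELDNAMES):
    """Read header-less CSV rows and label each value with a column name."""
    reader = csv.DictReader(stream, fieldnames=fieldnames)
    return [dict(row) for row in reader]


# Made-up sample row standing in for a downloaded file.
sample = io.StringIO('"0000000000001","Example Co.","6900001"\n')
rows = read_rows(sample)
print(rows[0]["name"])
```

Reading every value as a string, as `csv` does, keeps leading zeros in identifiers and postcodes intact.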
### Data Enrichment Functionality
The `enrich` function standardises and transforms the values of specific fields in the loaded DataFrame.
```python:
>>> import cnparser
>>> df = cnparser.enrich(df)
```
By default, `enrich` applies all of the supported processes. To apply only specific processes, pass their names as arguments.
```python:
>>> import cnparser
>>> df = cnparser.enrich(df, "enrich_kana")  # further process names may follow
```
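
The "everything by default, or only the named steps" behaviour can be pictured as a lookup in a registry of functions. The following is a minimal sketch of that pattern, not cnparser's actual code: `apply_processes`, `add_upper`, and `add_length` are hypothetical names, and plain dictionaries stand in for DataFrame rows.

```python
def add_upper(record):
    """Hypothetical process: store an upper-cased copy of the name."""
    return {**record, "name_upper": record["name"].upper()}


def add_length(record):
    """Hypothetical process: store the length of the name."""
    return {**record, "name_length": len(record["name"])}


# Registry mapping process names to functions (names are illustrative only).
PROCESSES = {"add_upper": add_upper, "add_length": add_length}


def apply_processes(records, *names):
    """Apply the named processes, or every registered process if none are named."""
    selected = names or tuple(PROCESSES)
    unknown = [n for n in selected if n not in PROCESSES]
    if unknown:
        raise ValueError(f"unknown process: {unknown}")
    for name in selected:
        records = [PROCESSES[name](r) for r in records]
    return records


data = [{"name": "example"}]
print(apply_processes(data))               # both processes applied
print(apply_processes(data, "add_upper"))  # only the named process applied
```

Rejecting unknown names up front means a typo in a process name fails loudly instead of silently skipping a step.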
The processes supported by the `enrich` function are as follows:
- `enrich_kana`: Adds a standardized furigana column `furigana` to the DataFrame. If `furigana` is NaN, it is filled in by converting `name` to kana. Currently only kanji and katakana can be converted; names written in the alphabet are not supported.
- `enrich_kind`: Adds the label for `kind` to the `legal_entity` column.
- `enrich_post_code`: Adds the postcode formatted as XXX-XXXX to `post_code`.
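
As an illustration of the kind of transformation the postcode step performs, here is a minimal standalone sketch. It is not the library's implementation: `format_post_code` is a hypothetical helper, and it relies only on the fact that Japanese postcodes have seven digits, written as three digits, a hyphen, then four digits.

```python
import re


def format_post_code(value):
    """Return a 7-digit postcode as 'NNN-NNNN', or None if it is not 7 digits."""
    digits = re.sub(r"\D", "", str(value))  # drop hyphens, spaces, etc.
    if len(digits) != 7:
        return None
    return f"{digits[:3]}-{digits[3:]}"


print(format_post_code("6900001"))   # 690-0001
print(format_post_code("690-0001"))  # 690-0001
print(format_post_code("123"))       # None
```

Postcodes should be kept as strings before formatting, since converting them to integers drops leading zeros.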

## Raw data
{
"_id": null,
"home_page": "https://github.com/new-village/cnparser",
"name": "cnparser",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "new-village",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/ea/d8/6f2ecb469371afa6950a5b83c6d09929742a202cbc803d8efdf39250d919/cnparser-1.7.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0 license",
"summary": "cnparser is a parser library of Corporate Number Publication Site data.",
"version": "1.7.0",
"project_urls": {
"Homepage": "https://github.com/new-village/cnparser"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "702eeafe7fb6f745e3117b3071223cac3ff38c26340f804f813bd53288419fea",
"md5": "caf4d48f20b9d8c8fff4483c76e7974c",
"sha256": "87b3cef2a295e066dde1294777ae5c610a56d3c6e81916724c0311d279523809"
},
"downloads": -1,
"filename": "cnparser-1.7.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "caf4d48f20b9d8c8fff4483c76e7974c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 14947,
"upload_time": "2025-08-11T13:22:26",
"upload_time_iso_8601": "2025-08-11T13:22:26.165152Z",
"url": "https://files.pythonhosted.org/packages/70/2e/eafe7fb6f745e3117b3071223cac3ff38c26340f804f813bd53288419fea/cnparser-1.7.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "ead86f2ecb469371afa6950a5b83c6d09929742a202cbc803d8efdf39250d919",
"md5": "806a3861c81a54d87165d9321a1e6e9e",
"sha256": "aa7608b2968d228f87515f3b96555f6b5fefe19c7cab852d9f5e4955eb2e8988"
},
"downloads": -1,
"filename": "cnparser-1.7.0.tar.gz",
"has_sig": false,
"md5_digest": "806a3861c81a54d87165d9321a1e6e9e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 13972,
"upload_time": "2025-08-11T13:22:27",
"upload_time_iso_8601": "2025-08-11T13:22:27.109173Z",
"url": "https://files.pythonhosted.org/packages/ea/d8/6f2ecb469371afa6950a5b83c6d09929742a202cbc803d8efdf39250d919/cnparser-1.7.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-11 13:22:27",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "new-village",
"github_project": "cnparser",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "cnparser"
}