cnparser

Name	cnparser
Version	1.7.0
home_page	https://github.com/new-village/cnparser
Summary	cnparser is a parser library of Corporate Number Publication Site data.
upload_time	2025-08-11 13:22:27
maintainer	None
docs_url	None
author	new-village
requires_python	None
license	Apache-2.0 license
keywords
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# cnparser
[![Test](https://github.com/new-village/cnparser/actions/workflows/test.yaml/badge.svg)](https://github.com/new-village/cnparser/actions/workflows/test.yaml)
![PyPI - Version](https://img.shields.io/pypi/v/cnparser)
  
**cnparser** is a Python library for loading and enriching data from the [Corporate Number Publication Site](https://www.houjin-bangou.nta.go.jp/en/), which is provided by the National Tax Agency of Japan. cnparser currently supports parsing only the latest data.
  
## Installation  
----------------------
cnparser can be installed with pip.
```shell
$ python -m pip install cnparser
```
  
### GitHub Install
To install the latest version from GitHub:  
```shell
$ git clone https://github.com/new-village/cnparser
$ cd cnparser
$ python -m pip install .
```
    
## Usage
This section demonstrates how to use this library to load and process data from the National Tax Agency's [Corporate Number Publication Site](https://www.houjin-bangou.nta.go.jp/).

### Direct Data Loading
To download data for a specific prefecture, pass the prefecture name to the `load` function; it returns a DataFrame containing the data for that prefecture. The prefecture name must be written in Roman characters ([list of the supported prefectures](https://github.com/new-village/cnparser/blob/main/cnparser/config/file_id.json)).  
If you call `load` without any arguments, data for all prefectures across Japan is downloaded.
```python
>>> import cnparser
>>> df = cnparser.load("Shimane")  # download the data for Shimane as a DataFrame
```

### CSV Data Loading
If you already have a downloaded CSV file, use the `read_csv` function. By passing the file path as an argument, you can obtain a DataFrame with headers from the CSV data.
```python
>>> import cnparser
>>> df = cnparser.read_csv("path/to/data.csv")
```
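
As a rough illustration of what "a DataFrame with headers" means, the sketch below attaches column names to header-less CSV rows using only the standard library. It assumes the downloaded file has no header row. The column names (`corporate_number`, `name`, `post_code`) and the sample row are placeholders made up for this example, not cnparser's actual schema, and cnparser itself returns a DataFrame rather than a list of dicts.

```python
import csv
import io

# Placeholder column names for illustration only; not cnparser's real schema.
COLUMNS = ["corporate_number", "name", "post_code"]


def read_headerless_csv(text):
    """Parse header-less CSV text into a list of dicts keyed by COLUMNS."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=COLUMNS)
    return list(reader)


# Made-up sample row standing in for a downloaded file.
sample = "0000000000000,Example Corp,1000000\n"
rows = read_headerless_csv(sample)
print(rows[0]["name"])
```

With real data, `cnparser.read_csv` remains the simpler choice; the sketch only shows the idea of pairing each field with a column name.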

### Data Enrichment Functionality
The `enrich` function standardises and transforms the values of specific fields in the loaded DataFrame. 
```python
>>> import cnparser
>>> df = cnparser.enrich(df)
```

By default, `enrich` applies every supported process. To apply only specific processes, pass their names as arguments.
```python
>>> import cnparser
>>> df = cnparser.enrich(df, "enrich_kana")  # apply only the kana enrichment
```

The processes supported by the `enrich` function are as follows:
- `enrich_kana`: Adds a standardized furigana column `furigana` to the DataFrame. If `furigana` is NaN, it is filled in by converting `name` to kana. Currently only kanji and katakana conversions are supported; alphabet conversions are not.
- `enrich_kind`: Adds the `kind` label to the `legal_entity`.
- `enrich_post_code`: Adds the postcode formatted as XXX-XXXX to `post_code`.
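
The postcode formatting described for `enrich_post_code` can be pictured with a small standalone sketch. It is not the library's implementation: the helper name `format_post_code` is hypothetical, and it assumes Japanese postcodes are seven digits written as three digits, a hyphen, and four digits, with any other value left unchanged.

```python
import re


def format_post_code(value):
    """Return a seven-digit postcode as NNN-NNNN; leave other values untouched.

    Hypothetical helper for illustration, not part of cnparser.
    """
    if value is None:
        return None
    digits = str(value).strip()
    if re.fullmatch(r"[0-9]{7}", digits):
        return f"{digits[:3]}-{digits[3:]}"
    return value


print(format_post_code("6900887"))  # 690-0887
```

Leaving unexpected values untouched, rather than raising an error, is a design choice of this sketch so that a single malformed record would not stop a whole batch.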

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/new-village/cnparser",
    "name": "cnparser",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": null,
    "author": "new-village",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/ea/d8/6f2ecb469371afa6950a5b83c6d09929742a202cbc803d8efdf39250d919/cnparser-1.7.0.tar.gz",
    "platform": null,
    "description": "# cnparser  \n[![Test](https://github.com/new-village/cnparser/actions/workflows/test.yaml/badge.svg)](https://github.com/new-village/cnparser/actions/workflows/test.yaml)\n![PyPI - Version](https://img.shields.io/pypi/v/cnparser)\n  \n**cnparser** is a python library for loading and enrichment [Corporate Number Publication Site](https://www.houjin-bangou.nta.go.jp/en/) data that is provided from National Tax Agency Japan. cnparser only support to parse latest data now.   \n  \n## Installation  \n----------------------\ncnparser is available on pip installation.\n```shell:\n$ python -m pip install cnparser\n```\n  \n### GitHub Install\nInstalling the latest version from GitHub:  \n```shell:\n$ git clone https://github.com/new-village/cnparser\n$ cd cnparser\n$ python setup.py install\n```\n    \n## Usage\nThis section demonstrates how to use this library to load and process data from the National Tax Agency's [Corporate Number Publication Site](https://www.houjin-bangou.nta.go.jp/).\n\n### Direct Data Loading\nTo download data for a specific prefecture, use the `load` function. By passing the prefecture name as an argument, you can obtain a DataFrame containing data for that prefecture.If you wish to download data for a specific prefecture, you must specify the prefecture name in Roman characters ([list of the supported prefectures](https://github.com/new-village/cnparser/blob/main/cnparser/config/file_id.json)).  \nTo execute the `load` function without specifying any arguments, data for all prefectures across Japan will be downloaded. \n```python:\n>>> import cnparser\n>>> df = cnparser.load(\"Shimane\")\n```\n\n### CSV Data Loading\nIf you already have a downloaded CSV file, use the `read_csv` function. 
By passing the file path as an argument, you can obtain a DataFrame with headers from the CSV data.\n```python:\n>>> import cnparser\n>>> df = cnparser.read_csv(\"path/to/data.csv\")\n```\n\n### Data Enrichment Functionality\nThe `enrich` function standardises and transforms the values of specific fields in the loaded DataFrame. \n```python:\n>>> import cnparser\n>>> df = cnparser.enrich(df)\n```\n\nThe functions perform all processing, but it is possible to apply only specific processing by defining specific processing as an argument.\n```python:\n>>> import cnparser\n>>> df = cnparser.enrich(df, \"enrich_kana\" ...)\n```\n\nThe processes supported by the `enrich` function are as follows:\n- `enrich_kana`: Function that adds a standardized furigana column `furigana` to the DataFrame. It handles data entry by converting `name` to kana, if `furigana` is NaN. Note that currently only kanji and katakana conversions are supported. Alphabet conversions are not supported.  \n- `enrich_kind`: Function that adds the `kind` label to the `legal_entity`.  \n- `enrich_post_code`: Function that adds the formatted postcode as XXX-XXX to `post_code`.  \n",
    "bugtrack_url": null,
    "license": "Apache-2.0 license",
    "summary": "cnparser is a parser library of Corporate Number Publication Site data.",
    "version": "1.7.0",
    "project_urls": {
        "Homepage": "https://github.com/new-village/cnparser"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "702eeafe7fb6f745e3117b3071223cac3ff38c26340f804f813bd53288419fea",
                "md5": "caf4d48f20b9d8c8fff4483c76e7974c",
                "sha256": "87b3cef2a295e066dde1294777ae5c610a56d3c6e81916724c0311d279523809"
            },
            "downloads": -1,
            "filename": "cnparser-1.7.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "caf4d48f20b9d8c8fff4483c76e7974c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 14947,
            "upload_time": "2025-08-11T13:22:26",
            "upload_time_iso_8601": "2025-08-11T13:22:26.165152Z",
            "url": "https://files.pythonhosted.org/packages/70/2e/eafe7fb6f745e3117b3071223cac3ff38c26340f804f813bd53288419fea/cnparser-1.7.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "ead86f2ecb469371afa6950a5b83c6d09929742a202cbc803d8efdf39250d919",
                "md5": "806a3861c81a54d87165d9321a1e6e9e",
                "sha256": "aa7608b2968d228f87515f3b96555f6b5fefe19c7cab852d9f5e4955eb2e8988"
            },
            "downloads": -1,
            "filename": "cnparser-1.7.0.tar.gz",
            "has_sig": false,
            "md5_digest": "806a3861c81a54d87165d9321a1e6e9e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 13972,
            "upload_time": "2025-08-11T13:22:27",
            "upload_time_iso_8601": "2025-08-11T13:22:27.109173Z",
            "url": "https://files.pythonhosted.org/packages/ea/d8/6f2ecb469371afa6950a5b83c6d09929742a202cbc803d8efdf39250d919/cnparser-1.7.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-11 13:22:27",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "new-village",
    "github_project": "cnparser",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "cnparser"
}
