# estat_api_dlt_helper

- Name: estat-api-dlt-helper
- Version: 0.1.0
- Summary: A helper library for fetching data with the e-Stat API and loading it with dlt
- Upload time: 2025-07-22 07:27:10
- Requires Python: >=3.11
- Keywords: api, dlt, e-stat, elt, estat, etl, helper

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)

A helper for fetching data from the [e-Stat API](https://www.e-stat.go.jp/api/) and loading it into a destination of your choice.

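## Overview

This library is intended for cases where you fetch data through the e-Stat API and load it into a data platform such as a data warehouse (DWH).

It is a Python library that provides the following two features.

- `parse_response`
  - Parses an API response and builds an Arrow Table in which the data is joined with its metadata.
- `load_estat_data`
  - Works as a wrapper around [dlt (data load tool)](https://dlthub.com/docs/intro), so you can load data into a DWH or similar destination
    just by setting a statistics table ID, a table name, and a few other options.
  - Handles pagination internally, as well as cases such as loading multiple statistics table IDs into the same table.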

## Installation

```bash
pip install estat_api_dlt_helper

# BigQuery
pip install "estat_api_dlt_helper[bigquery]"

# Snowflake
pip install "estat_api_dlt_helper[snowflake]"

# duckdb
pip install "estat_api_dlt_helper[duckdb]"
```

## Usage

This section assumes that you have already registered as an e-Stat API user and obtained an application ID.
Store the application ID in an environment variable.

```bash
export ESTAT_API_KEY=YOUR_APP_ID
```

Windows (PowerShell):

```
$env:ESTAT_API_KEY = "YOUR_APP_ID"
```
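
Because the examples below read the application ID with `os.getenv`, a missing variable would silently send `None` to the API. The following is a minimal sketch of an early check; the helper name `get_app_id` is hypothetical and is not part of this library.

```python
import os
from collections.abc import Mapping


def get_app_id(env: Mapping[str, str] = os.environ, name: str = "ESTAT_API_KEY") -> str:
    """Return the application ID from the environment, or fail with a clear message.

    `get_app_id` is an illustrative helper, not a function of estat_api_dlt_helper.
    """
    app_id = env.get(name, "").strip()
    if not app_id:
        raise RuntimeError(f"Environment variable {name} is not set or is empty")
    return app_id


# Usage with an explicit mapping, so the sketch does not depend on the real environment.
demo_app_id = get_app_id({"ESTAT_API_KEY": "YOUR_APP_ID"})
print(demo_app_id)
```

Passing the environment as a parameter keeps the check easy to test without touching real credentials.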

### Using parse_response

Pass the response from the e-Stat API endpoint `/rest/3.0/app/json/getStatsData` to `parse_response()` to generate an Arrow Table in which the contents of `DATA_INF.VALUE` become the table rows and the contents of `CLASS_INF.CLASS_OBJ` are joined to them as metadata.

How the data is transformed:

| response                                | After processing                                 |
| --------------------------------------- | ------------------------------------------------ |
| ![response](images/2024-11-18-json.jpg) | ![After processing](images/2024-11-18-table.jpg) |

see: [examples](examples/basic_parser_usage.py)

```python
import os
import pandas as pd
import requests

from estat_api_dlt_helper import parse_response

# API endpoint
url = "https://api.e-stat.go.jp/rest/3.0/app/json/getStatsData"

# Parameters for the API request
params = {
    "appId": os.getenv("ESTAT_API_KEY"),
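    "statsDataId": "0000020201",  # Social and demographic statistics, municipal data, basic data
    "cdCat01": "A2101",           # Population from the Basic Resident Register (Japanese nationals)
    "cdArea": "01100,01101",      # Sapporo-shi, Sapporo-shi Chuo-ku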
    "limit": 10
}
try:
    # Make API request
    response = requests.get(url, params=params)
    response.raise_for_status()
    data = response.json()
    # Parse the response into Arrow table
    table = parse_response(data)
    # Print data
    print(table.to_pandas())

except requests.RequestException as e:
    print(f"Error fetching data from API: {e}")
except Exception as e:
    print(f"Error processing data: {e}")
```
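
To make the join described above more concrete, here is a rough sketch in plain Python of how value rows can be matched with class definitions by code. It is an illustration under assumptions, not the implementation of `parse_response()`: the function names are hypothetical, the output is a list of dicts rather than an Arrow Table, and the sample labels and numbers are made up.

```python
from typing import Any


def build_lookup(class_objs: list[dict[str, Any]]) -> dict[str, dict[str, str]]:
    """Map each class id (e.g. "area") to a {code: name} dictionary."""
    lookup: dict[str, dict[str, str]] = {}
    for obj in class_objs:
        classes = obj["CLASS"]
        # A class with a single entry may arrive as one object instead of a list.
        if isinstance(classes, dict):
            classes = [classes]
        lookup[obj["@id"]] = {c["@code"]: c["@name"] for c in classes}
    return lookup


def join_rows(values: list[dict[str, str]], class_objs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Attach a human-readable name to every coded dimension of each value row."""
    lookup = build_lookup(class_objs)
    rows = []
    for value in values:
        row: dict[str, Any] = {"value": value["$"]}
        for key, code in value.items():
            if not key.startswith("@"):
                continue
            dim = key[1:]
            row[dim] = code
            if dim in lookup:
                row[f"{dim}_name"] = lookup[dim].get(code)
        rows.append(row)
    return rows


# Hand-written sample data (hypothetical labels and numbers, not real statistics).
class_objs = [
    {"@id": "cat01", "CLASS": {"@code": "A2101", "@name": "Population (sample label)"}},
    {"@id": "area", "CLASS": [
        {"@code": "01100", "@name": "Sapporo-shi"},
        {"@code": "01101", "@name": "Sapporo-shi Chuo-ku"},
    ]},
]
values = [
    {"@cat01": "A2101", "@area": "01100", "@unit": "persons", "$": "100"},
    {"@cat01": "A2101", "@area": "01101", "@unit": "persons", "$": "200"},
]

rows = join_rows(values, class_objs)
for row in rows:
    print(row)
```

Dimensions without a matching class definition (such as `unit` in this sample) are kept as plain columns.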

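### Using load_estat_data

`load_estat_data` is a wrapper around [dlt (data load tool)](https://dlthub.com/docs/intro) that lets you load fetched data
into a DWH or similar destination with a simple config.

For the destinations you can load into, see the [dlt documentation](https://dlthub.com/docs/dlt-ecosystem/destinations/).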

see: [examples](examples/basic_load_example.py)

```python
# Example using DuckDB
import os
import dlt
import duckdb
from estat_api_dlt_helper import EstatDltConfig, load_estat_data

db = duckdb.connect("estat_demo.duckdb")

# Simple configuration
config = {
    "source": {
        "app_id": os.getenv("ESTAT_API_KEY"),  # (required)
        "statsDataId": "0000020201",  # (required) Social and demographic statistics, municipal data, basic data
        "limit": 100,  # (Optional) Small limit for demo
    },
    "destination": {
        "pipeline_name": "estat_demo",
        "destination": dlt.destinations.duckdb(db),
        "dataset_name": "estat_api_data",
        "table_name": "population_estimates",
        "write_disposition": "replace",  # Replace existing data
    },
}
estat_config = EstatDltConfig(**config)

# Load data with one line
info = load_estat_data(estat_config)
print(info)
```
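
The library handles pagination internally, so you normally do not need to write it yourself. For readers curious about the general idea, the sketch below shows one common way to page through results, assuming that each response exposes a `RESULT_INF` object whose `NEXT_KEY` gives the start position of the next page and is absent on the last page. The names `iter_pages` and `fake_fetch` are hypothetical, and this is not a description of how `load_estat_data` is implemented.

```python
from typing import Any, Callable, Iterator

Page = dict[str, Any]


def iter_pages(fetch_page: Callable[[int, int], Page], limit: int = 100) -> Iterator[Page]:
    """Yield pages until a response no longer reports a NEXT_KEY.

    `fetch_page(start_position, limit)` is a caller-supplied function that
    performs the actual request; it is an assumption of this sketch.
    """
    start = 1
    while True:
        page = fetch_page(start, limit)
        yield page
        next_key = page.get("RESULT_INF", {}).get("NEXT_KEY")
        if next_key is None:
            break
        start = int(next_key)


def fake_fetch(start: int, limit: int) -> Page:
    """Stand-in for a real request: pretends the table has 250 records."""
    total = 250
    end = min(start + limit - 1, total)
    result: dict[str, int] = {"FROM_NUMBER": start, "TO_NUMBER": end}
    if end < total:
        result["NEXT_KEY"] = end + 1
    return {"RESULT_INF": result}


pages = list(iter_pages(fake_fetch, limit=100))
ranges = [(p["RESULT_INF"]["FROM_NUMBER"], p["RESULT_INF"]["TO_NUMBER"]) for p in pages]
print(ranges)
```

Separating the request function from the loop makes it possible to exercise the paging logic with a stub, as done here, before pointing it at the real API.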

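### Using load_estat_data (Advanced)

`load_estat_data()` makes loading possible with a simple configuration. If you want finer control over dlt's settings and features (such as `dlt.transformer` or `bigquery_adapter`),
you can also create the dlt resource and the dlt pipeline separately and work with them in the same way as any existing dlt code.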

see: [examples (resource)](examples/resource_example.py)

see: [examples (pipeline)](examples/pipeline_example.py)

## Development

```bash
# Install development dependencies
uv sync

# Run tests
uv run pytest

# Format code
uv run ruff format src/
```

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "estat-api-dlt-helper",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": null,
    "keywords": "api, dlt, e-stat, elt, estat, etl, helper",
    "author": null,
    "author_email": "K-Oxon <ko1011qfp@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/5d/eb/291c6c58da74fbd86ceccff3cf517831f34003fea3241d52bab73bc30bc0/estat_api_dlt_helper-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A helper library for fetching data with the e-Stat API and loading it with dlt",
    "version": "0.1.0",
    "project_urls": {
        "Homepage": "https://github.com/K-Oxon/estat_api_dlt_helper",
        "Issues": "https://github.com/K-Oxon/estat_api_dlt_helper/issues",
        "Repository": "https://github.com/K-Oxon/estat_api_dlt_helper"
    },
    "split_keywords": [
        "api",
        " dlt",
        " e-stat",
        " elt",
        " estat",
        " etl",
        " helper"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "6f3908104bd06eb2270fc4950a7a115fc6947fc551c86436c238a87f61edf537",
                "md5": "d4cef97faa100fc34dffc53b5cc764c4",
                "sha256": "ed13177acddb658b0f1682876f784835cf7f8d8c479b00cf2c16a6502d56b135"
            },
            "downloads": -1,
            "filename": "estat_api_dlt_helper-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d4cef97faa100fc34dffc53b5cc764c4",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 23597,
            "upload_time": "2025-07-22T07:27:09",
            "upload_time_iso_8601": "2025-07-22T07:27:09.554841Z",
            "url": "https://files.pythonhosted.org/packages/6f/39/08104bd06eb2270fc4950a7a115fc6947fc551c86436c238a87f61edf537/estat_api_dlt_helper-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "5deb291c6c58da74fbd86ceccff3cf517831f34003fea3241d52bab73bc30bc0",
                "md5": "267e00a571d062375c85735128c44695",
                "sha256": "b276f6f0b4b80e5d2862be5f6f34945af63e1d0d736df3cf8f8e06ae6f6d627e"
            },
            "downloads": -1,
            "filename": "estat_api_dlt_helper-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "267e00a571d062375c85735128c44695",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 463341,
            "upload_time": "2025-07-22T07:27:10",
            "upload_time_iso_8601": "2025-07-22T07:27:10.999940Z",
            "url": "https://files.pythonhosted.org/packages/5d/eb/291c6c58da74fbd86ceccff3cf517831f34003fea3241d52bab73bc30bc0/estat_api_dlt_helper-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-22 07:27:10",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "K-Oxon",
    "github_project": "estat_api_dlt_helper",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "estat-api-dlt-helper"
}
        