:Name: vietnam-provinces
:Version: 0.5.0
:Home page: https://github.com/sunshine-tech/VietnamProvinces.git
:Summary: Library to provide list of Vietnam administrative divisions (tỉnh thành, quận huyện, phường xã).
:Upload time: 2023-05-07 11:47:45
:Author: Nguyễn Hồng Quân
:Requires Python: >=3.7,<4.0
:License: GPL-3.0-or-later
:Keywords: vietnam, administrative, division, locality

================
VietnamProvinces
================

|image love| |image pypi|

[`Tiếng Việt <vietnamese_>`_]

A library that provides a list of Vietnam's administrative divisions (tỉnh thành, quận huyện, phường xã), with the names and codes defined by the `General Statistics Office of Viet Nam <gso_vn_>`_ (Tổng cục Thống kê).

Example:

.. code-block:: json

    {
        "name": "Tỉnh Cà Mau",
        "code": 96,
        "codename": "tinh_ca_mau",
        "division_type": "tỉnh",
        "phone_code": 290,
        "districts": [
            {
                "name": "Huyện Đầm Dơi",
                "code": 970,
                "codename": "huyen_dam_doi",
                "division_type": "huyện",
                "wards": [
                    {
                        "name": "Thị trấn Đầm Dơi",
                        "code": 32152,
                        "codename": "thi_tran_dam_doi",
                        "division_type": "thị trấn"
                    },
                    {
                        "name": "Xã Tạ An Khương",
                        "code": 32155,
                        "codename": "xa_ta_an_khuong",
                        "division_type": "xã"
                    }
                ]
            }
        ]
    }

This library provides the data in two forms:

1. JSON

This form suits applications that do not need to access the data often and are content to load the JSON and extract information from it. The JSON files are stored in the *data* folder. You can get the file path from the ``vietnam_provinces.NESTED_DIVISIONS_JSON_PATH`` variable.

Note that this variable holds only the path of the file, not its content. It is up to the application developer to choose how to parse the JSON. For example:

.. code-block:: python

    import orjson
    import rapidjson
    from vietnam_provinces import NESTED_DIVISIONS_JSON_PATH

    # With rapidjson
    with NESTED_DIVISIONS_JSON_PATH.open(encoding='utf-8') as f:
        data = rapidjson.load(f)

    # With orjson (parses bytes directly)
    data = orjson.loads(NESTED_DIVISIONS_JSON_PATH.read_bytes())

Because the data set is large (10609 wards across Viet Nam), loading it this way is slow.
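
As a rough sketch of what "loading the JSON and extracting information" can look like with only the standard library, the snippet below walks a nested structure shaped like the example above to find a ward by its code. The inline ``SAMPLE`` data and the ``find_ward`` helper are illustrative assumptions, not part of the library. With the real file you would parse the content of ``NESTED_DIVISIONS_JSON_PATH`` instead, and this sketch assumes that file holds a list of province objects.

.. code-block:: python

    import json
    from typing import Any, Dict, List, Optional

    # Hypothetical inline sample, shaped like the example above.
    SAMPLE = """
    [
      {
        "name": "Tỉnh Cà Mau",
        "code": 96,
        "districts": [
          {
            "name": "Huyện Đầm Dơi",
            "code": 970,
            "wards": [
              {"name": "Thị trấn Đầm Dơi", "code": 32152},
              {"name": "Xã Tạ An Khương", "code": 32155}
            ]
          }
        ]
      }
    ]
    """


    def find_ward(provinces: List[Dict[str, Any]], ward_code: int) -> Optional[Dict[str, Any]]:
        """Scan every province and district for the ward with the given code."""
        for province in provinces:
            for district in province.get("districts", []):
                for ward in district.get("wards", []):
                    if ward["code"] == ward_code:
                        return ward
        return None


    provinces = json.loads(SAMPLE)
    ward = find_ward(provinces, 32155)
    print(ward["name"] if ward else "not found")

A linear scan like this visits wards one by one, which is acceptable for occasional lookups but adds to the cost of parsing the whole file first.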


2. Python data type

This form is useful for applications that need to access the data more often. The divisions are built as ``Enum`` classes, which you can import in Python code:

.. code-block:: python

    >>> from vietnam_provinces.enums import ProvinceEnum, ProvinceDEnum, DistrictEnum, DistrictDEnum

    >>> ProvinceEnum.P_77
    <ProvinceEnum.P_77: Province(name='Tỉnh Bà Rịa - Vũng Tàu', code=77, division_type=<VietNamDivisionType.TINH: 'tỉnh'>, codename='tinh_ba_ria_vung_tau', phone_code=254)>

    >>> ProvinceDEnum.BA_RIA_VUNG_TAU
    <ProvinceDEnum.BA_RIA_VUNG_TAU: Province(name='Tỉnh Bà Rịa - Vũng Tàu', code=77, division_type=<VietNamDivisionType.TINH: 'tỉnh'>, codename='tinh_ba_ria_vung_tau', phone_code=254)>

    >>> DistrictEnum.D_624
    <DistrictEnum.D_624: District(name='Thị xã Ayun Pa', code=624, division_type=<VietNamDivisionType.THI_XA: 'thị xã'>, codename='thi_xa_ayun_pa', province_code=64)>

    >>> DistrictDEnum.AYUN_PA_GL
    <DistrictDEnum.AYUN_PA_GL: District(name='Thị xã Ayun Pa', code=624, division_type=<VietNamDivisionType.THI_XA: 'thị xã'>, codename='thi_xa_ayun_pa', province_code=64)>

    >>> from vietnam_provinces.enums.wards import WardEnum, WardDEnum

    >>> WardEnum.W_7450
    <WardEnum.W_7450: Ward(name='Xã Đông Hưng', code=7450, division_type=<VietNamDivisionType.XA: 'xã'>, codename='xa_dong_hung', district_code=218)>

    >>> WardDEnum.BG_DONG_HUNG_7450
    <WardDEnum.BG_DONG_HUNG_7450: Ward(name='Xã Đông Hưng', code=7450, division_type=<VietNamDivisionType.XA: 'xã'>, codename='xa_dong_hung', district_code=218)>


Loading wards this way is much faster than the JSON option.

They are built as ``Enum`` so that library users can take advantage of the auto-completion in IDEs and code editors during development, which helps prevent typos.

The ward enum has two variants:

- ``WardEnum``: Member names are formed from the numeric ward code (``W_28912``). It helps look up a ward by its code, which is the most common use case.

- ``WardDEnum``: Member names are more readable (``D`` means "descriptive"), which makes application code easier to reason about. For example, looking at ``WardDEnum.BT_PHAN_RI_CUA_22972``, a programmer can guess that this ward is "Phan Rí Cửa" in "Bình Thuận" province.

Similarly, the other levels (district, province) also have two enum variants.

Example of looking up a ``Ward``, ``District`` or ``Province`` by its numeric code:

.. code-block:: python

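    from vietnam_provinces.enums import ProvinceEnum

    # Assume that you are loading user info from your database
    # (load_user_info is a placeholder for your own code)
    user_info = load_user_info()

    province_code = user_info['province_code']
    # Members are named P_<code>; .value is the Province object
    province = ProvinceEnum[f'P_{province_code}'].value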
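
Looking up an ``Enum`` member by name raises ``KeyError`` when no member matches, which can happen if a stored code is stale or mistyped. The following is a minimal sketch of a tolerant lookup. The ``lookup_by_code`` helper and the ``DemoProvinceEnum`` stand-in are hypothetical names used only so the snippet runs without the library installed; the same helper should work with any ``Enum`` whose members follow the ``<prefix>_<code>`` naming shown above.

.. code-block:: python

    from enum import Enum
    from typing import Optional, Type, TypeVar

    E = TypeVar("E", bound=Enum)


    def lookup_by_code(enum_cls: Type[E], prefix: str, code: int) -> Optional[E]:
        """Return the member named ``<prefix>_<code>``, or None if there is none."""
        # __members__ maps member names to members, so .get() avoids KeyError.
        return enum_cls.__members__.get(f"{prefix}_{code}")


    # Hypothetical stand-in for ProvinceEnum, holding plain names as values.
    class DemoProvinceEnum(Enum):
        P_77 = "Tỉnh Bà Rịa - Vũng Tàu"
        P_96 = "Tỉnh Cà Mau"


    found = lookup_by_code(DemoProvinceEnum, "P", 96)
    missing = lookup_by_code(DemoProvinceEnum, "P", 1000)
    print(found.value if found else "unknown province")
    print(missing.value if missing else "unknown province")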

Unlike ``ProvinceDEnum`` and ``DistrictDEnum``, ``WardDEnum`` includes the ward code in its member names. This is because too many Vietnamese wards share the same name. There is no way to build a unique ID for every ward from pure Latin letters (with Vietnamese diacritics stripped), even if we add district and province info to the ID. Take "Xã Đông Thành" and "Xã Đông Thạnh" as an example: both belong to "Huyện Bình Minh" of "Vĩnh Long", and both produce the ID name "DONG_THANH". Although Python allows Unicode in identifiers, such as "ĐÔNG_THẠNH", this is not practical yet because the code formatter (`Black`_) still normalizes it to the Latin form.

Because ``WardEnum`` has many records (10609 in February 2021) and may not be needed in some applications, I moved it to a separate module so that it is not loaded into the application automatically.
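
The name collision can be reproduced with a few lines of standard-library code. This is only a sketch of one common way to strip diacritics, not necessarily how this project generates its member names; ``to_member_name`` is a hypothetical helper.

.. code-block:: python

    import unicodedata


    def to_member_name(name: str) -> str:
        """Strip Vietnamese diacritics and build an upper-case identifier."""
        # 'Đ' and 'đ' have no Unicode decomposition, so map them explicitly.
        name = name.replace("Đ", "D").replace("đ", "d")
        # NFKD splits each accented letter into a base letter plus combining marks.
        decomposed = unicodedata.normalize("NFKD", name)
        ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
        return "_".join(ascii_only.upper().split())


    print(to_member_name("Đông Thành"))
    print(to_member_name("Đông Thạnh"))

Both calls print ``DONG_THANH``, so the two wards cannot be told apart without an extra discriminator such as the ward code.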


The member values of these enums, the ``Province``, ``District`` and ``Ward`` data types, can be imported from the top level of ``vietnam_provinces``:

.. code-block:: python

    >>> from vietnam_provinces import Province, District, Ward


Install
-------

.. code-block:: sh

    pip3 install vietnam-provinces


This library is compatible with Python 3.7+.


Development
-----------

For development, this project includes a tool that converts data from government sources.

The tool doesn't crawl data from government websites directly because the data rarely changes (it is not worth developing a feature that is needed only about once every ten years) and because those websites provide data in unfriendly Microsoft Office formats.

The tool has been tested on Linux only (it may not run on Windows).

Update data
~~~~~~~~~~~

In the future, when the authorities reorganize administrative divisions, we will need to collect this data again from the GSOVN website. To do so:

- Go to: https://danhmuchanhchinh.gso.gov.vn/ (this URL may change when `GSOVN <gso_vn_>`_ replaces their software).
- Find the "Xuất Excel" button.
- Tick the "Quận Huyện Phường Xã" checkbox.
- Click the button to export and download the list of units as an Excel (XLS) file.
- Use LibreOffice to convert the Excel file to a CSV file. For example, we name it *Xa_2023-05-07.csv*.
- Run this tool to convert the data to JSON format:

.. code-block:: sh

    python3 -m dev -i dev/seed-data/Xa_2023-05-07.csv -o vietnam_provinces/data/nested-divisions.json

You can run

.. code-block:: sh

    python3 -m dev --help

to see more options of that tool.

Note that this tool is only available in the source folder (cloned from Git). It is not included in the distributable Python package.
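
To give a feel for the kind of transformation involved in turning a flat CSV into nested JSON, here is a simplified, self-contained sketch. It is not the project's actual tool: the column names in ``FLAT_CSV`` and the ``nest_rows`` helper are hypothetical, and the real export uses different headers and carries more fields.

.. code-block:: python

    import csv
    import io
    import json
    from typing import Any, Dict, Iterable, List

    # Hypothetical flat input, one row per ward. Real column names differ.
    FLAT_CSV = """province_name,province_code,district_name,district_code,ward_name,ward_code
    Tỉnh Cà Mau,96,Huyện Đầm Dơi,970,Thị trấn Đầm Dơi,32152
    Tỉnh Cà Mau,96,Huyện Đầm Dơi,970,Xã Tạ An Khương,32155
    """.replace("\n    ", "\n")  # drop the indentation added for readability


    def nest_rows(rows: Iterable[Dict[str, str]]) -> List[Dict[str, Any]]:
        """Group flat ward rows into province -> district -> ward nesting."""
        provinces: Dict[int, Dict[str, Any]] = {}
        districts: Dict[int, Dict[str, Any]] = {}
        for row in rows:
            p_code = int(row["province_code"])
            d_code = int(row["district_code"])
            if p_code not in provinces:
                provinces[p_code] = {"name": row["province_name"], "code": p_code, "districts": []}
            if d_code not in districts:
                districts[d_code] = {"name": row["district_name"], "code": d_code, "wards": []}
                provinces[p_code]["districts"].append(districts[d_code])
            districts[d_code]["wards"].append({"name": row["ward_name"], "code": int(row["ward_code"])})
        return list(provinces.values())


    nested = nest_rows(csv.DictReader(io.StringIO(FLAT_CSV)))
    print(json.dumps(nested, ensure_ascii=False, indent=2))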


Generate Python code
~~~~~~~~~~~~~~~~~~~~

.. code-block:: sh

    python3 -m dev -i dev/seed-data/Xa_2023-05-07.csv -f python


Data source
~~~~~~~~~~~

- Name and code of provinces, districts and wards:  `General Statistics Office of Viet Nam <gso_vn_>`_.
- Phone area code: `Thái Bình province's department of Information and Communication <tb_ic_>`_.


Credit
------

Brought to you by `Nguyễn Hồng Quân <quan_>`_, after nights and weekends.


.. |image love| image:: https://madewithlove.now.sh/vn?heart=true&colorA=%23ffcd00&colorB=%23da251d
.. |image pypi| image:: https://badgen.net/pypi/v/vietnam-provinces
   :target: https://pypi.org/project/vietnam-provinces/
.. _vietnamese: README.vi_VN.rst
.. _gso_vn: https://www.gso.gov.vn/
.. _tb_ic: https://sotttt.thaibinh.gov.vn/tin-tuc/buu-chinh-vien-thong/tra-cuu-ma-vung-dien-thoai-co-dinh-mat-dat-ma-mang-dien-thoa2.html
.. _dataclass: https://docs.python.org/3/library/dataclasses.html
.. _fast-enum: https://pypi.org/project/fast-enum/
.. _pydantic: https://pypi.org/project/pydantic/
.. _Black: https://github.com/psf/black
.. _quan: https://quan.hoabinh.vn

            
