# french-cities
This repo contains the documentation of the Python french-cities package, a
package aimed at improving the referencing of municipalities in French 🇫🇷
datasets.
# Documentation
Full documentation with use cases is available at
[https://tgrandje.github.io/french-cities/](https://tgrandje.github.io/french-cities/).
For now, it is only available in French.
Any help is welcome to build a multilingual documentation website.
Until then, basic English documentation will remain available in the present README.
# Why french-cities?
Do you have data:
* where municipalities are located through approximate addresses or geographical 🗺️ coordinates?
* where municipalities are referenced by their postal codes and their labels 😮?
* where departments are written in full text 🔡?
* where spellings are dubious (for instance, torturing the _<del>Loire</del> Loir-et-Cher_) or obsolete
(for instance, referencing _Templeuve_, a city known as _Templeuve-en-Pévèle_ since 2015)?
* or compiled over the years, with cities' codes forming a patchwork of multiple 🤯 vintages?
**Then 'french-cities' is for you 🫵!**
# Installation
`pip install french-cities`
# Configuration
## Setting INSEE's API keys
`french-cities` uses `pynsee` under the hood. For it to work, you need to set
up your credentials. You can set up to four environment variables:
* insee_key
* insee_secret
* http_proxy (if accessing the web from behind a corporate proxy)
* https_proxy (if accessing the web from behind a corporate proxy)
Please refer to [`pynsee`'s documentation](https://pynsee.readthedocs.io/en/latest/api_subscription.html)
for help configuring access to the API.
Note that setting the proxy environment variables will configure the proxy for both `pynsee`
and `geopy`.
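As a minimal sketch, the four variables listed above can also be set from Python before calling any `french-cities` function. The `configure_environment` helper and all the values below are placeholders invented for this example; they are not part of `french-cities` or `pynsee`.

```python
import os


def configure_environment(key, secret, proxy=None):
    """Set the environment variables listed above (hypothetical helper)."""
    os.environ["insee_key"] = key
    os.environ["insee_secret"] = secret
    if proxy is not None:
        # Only needed when accessing the web from behind a corporate proxy
        os.environ["http_proxy"] = proxy
        os.environ["https_proxy"] = proxy


# Placeholder values: replace them with your own credentials and proxy
configure_environment(
    "my-insee-key",
    "my-insee-secret",
    proxy="http://proxy.example.com:8080",
)
```

Setting the variables at the operating system level (or in your shell profile) works just as well and keeps credentials out of your scripts.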
## Session management
Note that `pynsee` and `geopy` each use their own web session. Any Session object
you pass to `french-cities` will **NOT** be shared with `pynsee` or `geopy`.
This is why `french-cities` functions accept a session as an argument,
even if you have already configured the corporate proxy through environment
variables for `pynsee` and `geopy`.
## Basic usage
### Retrieve departments' codes
`french-cities` can retrieve departments' codes from postal codes, official
(COG/INSEE) codes or labels.
Working from postal codes makes use of the BAN (Base Adresse Nationale)
and should return correct results. "Cedex" codes are only partially
covered by the BAN, so [OpenDataSoft's API](https://public.opendatasoft.com/explore/dataset/correspondance-code-cedex-code-insee/api/?flg=fr&q=code%3D68013&lang=fr),
built upon [Christian Quest's work](https://public.opendatasoft.com/explore/dataset/correspondance-code-cedex-code-insee/information/?flg=fr&q=code%3D68013&lang=fr),
is used as a complement.
This consumes the freemium API and no authentication is included:
users of the present package should check the API's current legal terms
directly on OpenDataSoft's website.
Working from official codes may sometimes give empty results (when working on an old
dataset containing cities that have moved to another department, which is rare).
This is deliberate: the algorithm mostly uses the first characters of the cities' codes
(which is fast and 99% accurate) instead of using an API (which is
slow though foolproof).
```
from french_cities import find_departements
import pandas as pd

df = pd.DataFrame(
    {
        "code_postal": ["59800", "97133", "20000"],
        "code_commune": ["59350", "97701", "2A004"],
        "communes": ["Lille", "Saint-Barthélémy", "Ajaccio"],
        "deps": ["59", "977", "2A"],
    }
)
# Retrieve the department from postal codes, official codes, then labels
df = find_departements(df, source="code_postal", alias="dep_A", type_field="postcode")
df = find_departements(df, source="code_commune", alias="dep_B", type_field="insee")
df = find_departements(df, source="communes", alias="dep_C", type_field="label")
print(df)
```
For complete documentation on `find_departements`, type `help(find_departements)`.
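The prefix-based shortcut mentioned above (using the first characters of a city's code) can be illustrated with a short sketch. This is not the package's implementation: the `department_from_insee` helper is hypothetical, and it assumes that overseas codes (starting with 97 or 98) carry a three-character department while all other codes carry a two-character one.

```python
def department_from_insee(code: str) -> str:
    """Guess a department from the leading characters of a city code.

    Hypothetical helper, for illustration only.
    """
    # Overseas codes start with 97 or 98 and use three characters
    if code[:2] in {"97", "98"}:
        return code[:3]
    # Everywhere else (Corsica's 2A/2B included), two characters are enough
    return code[:2]


for code in ["59350", "97701", "2A004"]:
    print(code, "->", department_from_insee(code))
# 59350 -> 59
# 97701 -> 977
# 2A004 -> 2A
```

Such a shortcut never queries an API, which is why it is fast, and why it cannot know about a city that has moved to another department since the dataset was produced.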
### Retrieve cities' codes
`french-cities` can retrieve cities' codes from multiple fields. It will work
out basic mistakes (up to a certain limit).
The columns used by the algorithm can be (in the order of precedence used by
the algorithm):
* 'x' and 'y' (in that case, epsg must be explicitly given)
* 'postcode' and 'city'
* 'address', 'postcode' and 'city'
* 'department' and 'city'
Note that the algorithm can (and will) make errors when using xy coordinates on an
older vintage (i.e. different from the current one) in the case of historic
splitting of cities (the geographic files are not vintaged yet).
The lexical (postcode, city, address, department) recognition is based on
Python fuzzy matching, the BAN (Base Adresse Nationale) API or the Nominatim
API of OSM (if activated). The algorithm discards low-scoring
results, but failures may still occur.
```
from french_cities import find_city
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        {
            "x": 2.294694,
            "y": 48.858093,
            "location": "Tour Eiffel",
            "dep": "75",
            "city": "Paris",
            "address": "5 Avenue Anatole France",
            "postcode": "75007",
            "target": "75056",
        },
        {
            "x": 8.738962,
            "y": 41.919216,
            "location": "mairie",
            "dep": "2A",
            "city": "Ajaccio",
            "address": "Antoine Sérafini",
            "postcode": "20000",
            "target": "2A004",
        },
        {
            "x": -52.334990,
            "y": 4.938194,
            "location": "mairie",
            "dep": "973",
            "city": "Cayenne",
            "address": "1 rue de Rémire",
            "postcode": "97300",
            "target": "97302",
        },
        {
            # No coordinates and a wrong postcode (Lyon's instead of Lille's)
            "x": np.nan,
            "y": np.nan,
            "location": "Erreur code postal Lille/Lyon",
            "dep": "59",
            "city": "Lille",
            "address": "1 rue Faidherbe",
            "postcode": "69000",
            "target": "59350",
        },
    ]
)
# x and y are longitudes/latitudes, hence EPSG:4326
df = find_city(df, epsg=4326)
print(df)
```
For complete documentation on `find_city`, type
`help(find_city)`.
**Note**: to activate `geopy` (Nominatim API from OpenStreetMap) as a last
resort, you will need to use the argument `use_nominatim_backend=True`.
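To illustrate the score-threshold idea behind the fuzzy matching mentioned in this section, here is a minimal sketch based on the standard library's `difflib`. It is not the matcher used by `french-cities`: the `best_match` helper, the candidate list and the 0.8 cutoff are assumptions chosen for the example.

```python
from difflib import get_close_matches


def best_match(label, candidates, cutoff=0.8):
    """Return the closest candidate, or None if nothing reaches the cutoff.

    Hypothetical helper, for illustration only.
    """
    matches = get_close_matches(label, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


departments = ["Loir-et-Cher", "Loire", "Loiret", "Cher"]

# A misspelled label is still close enough to its correct spelling
print(best_match("Loire-et-Cher", departments))  # Loir-et-Cher

# A label resembling no candidate is rejected rather than guessed
print(best_match("Zzz", departments))  # None
```

Raising the cutoff reduces the risk of wrong matches at the price of more unmatched rows; lowering it does the opposite.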
### Set vintage to cities' codes
`french-cities` can try to project a given dataframe into a set vintage,
starting from an unknown vintage (or even a non-vintaged dataset, which is
often the case).
Errors may occur for split cities, as the starting vintage is unknown
(or nonexistent).
If the starting vintage is known, you can make use of
INSEE's projection API (with `pynsee`). Note that this might prove slower, as
each row triggers a request to the API (which allows up to
30 requests/minute).
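To get a feel for what this rate limit implies, the arithmetic can be sketched as follows. The `minimum_minutes` helper is hypothetical and simply assumes one request per row at the 30 requests/minute quoted above, ignoring network latency and any caching.

```python
import math


def minimum_minutes(n_rows: int, requests_per_minute: int = 30) -> int:
    """Lower bound (in minutes) for projecting n_rows, one request per row.

    Hypothetical helper, for illustration only.
    """
    return math.ceil(n_rows / requests_per_minute)


print(minimum_minutes(1000))  # 34
```

Under these assumptions, a dataset of 1,000 rows needs at least about half an hour.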
Basically, the algorithm of `french-cities` will try to see if a given city
code exists in the desired vintage:
* if yes, it will be kept (with the aforementioned approximation regarding
restored cities);
* if not, it will look in older vintages and make use of INSEE's projection API.
This algorithm will also:
* convert communal districts' codes into cities' codes;
* convert delegated or associated cities' codes into their parent's.
```
from french_cities import set_vintage
import pandas as pd

df = pd.DataFrame(
    [
        ["07180", "Fusion"],
        ["02077", "Commune déléguée"],
        ["02564", "Commune nouvelle"],
        ["75101", "Arrondissement municipal"],
        ["59298", "Commune associée"],
        ["99999", "Code erroné"],
        ["14472", "Oudon"],
    ],
    columns=["A", "Test"],
    index=["A", "B", "C", "D", 1, 2, 3],
)
# Project the codes stored in column "A" into the 2023 vintage
df = set_vintage(df, 2023, field="A")
print(df)
```
For complete documentation on `set_vintage`, type
`help(set_vintage)`.
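The decision steps listed in this section can be sketched on toy data as follows. This is only an illustration of the logic, not the code of `set_vintage`: the `project_code` function, its arguments, the lookup tables and the order in which older vintages are scanned are all assumptions, and `OLD01`/`NEW01` are made-up codes. Only the mapping of Paris' 1st arrondissement (75101) to Paris (75056) reflects real codes.

```python
def project_code(code, target_codes, parents, older_vintages, project):
    """Toy version of the decision steps (hypothetical, for illustration)."""
    # Districts and delegated/associated cities are mapped to their parent
    code = parents.get(code, code)
    # A code that exists in the desired vintage is kept as it is
    if code in target_codes:
        return code
    # Otherwise, search older vintages (most recent first: an assumption)
    for year in sorted(older_vintages, reverse=True):
        if code in older_vintages[year]:
            return project(code, year)
    # Unknown code: nothing can be returned
    return None


parents = {"75101": "75056"}  # Paris 1st arrondissement -> Paris
target_codes = {"75056", "NEW01"}  # made-up target vintage
older_vintages = {2015: {"OLD01"}}  # made-up older vintage
mergers = {"OLD01": "NEW01"}  # made-up merger


def fake_projection(code, year):
    """Stand-in for a call to INSEE's projection API."""
    return mergers[code]


for code in ["75101", "OLD01", "99999"]:
    print(code, "->", project_code(code, target_codes, parents, older_vintages, fake_projection))
# 75101 -> 75056
# OLD01 -> NEW01
# 99999 -> None
```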
## External documentation
`french-cities` makes use of multiple APIs. Please read:
* [documentation](https://adresse.data.gouv.fr/api-doc/adresse) (in French) on API Adresse
* [documentation](https://public.opendatasoft.com/explore/dataset/correspondance-code-cedex-code-insee/api/?flg=fr&q=code%3D68013&lang=fr) (in French) on OpenDataSoft API
* [Nominatim Usage Policy](https://operations.osmfoundation.org/policies/nominatim/)
## Support
In case of bugs, please open an issue [on the repo](https://github.com/tgrandje/french-cities/issues).
## Contribution
Any help is welcome.
## Author
Thomas GRANDJEAN (DREAL Hauts-de-France, service Information, Développement Durable et Évaluation Environnementale, pôle Promotion de la Connaissance).
## Licence
GPL-3.0-or-later
## Project Status
Stable.