cl-hubeau


* Name: cl-hubeau
* Version: 0.5.0
* Home page: https://tgrandje.github.io/cl-hubeau
* Summary: Hubeau client to collect data from the different APIs
* Upload time: 2024-10-17 16:12:02
* Maintainer: Thomas Grandjean
* Author: Thomas Grandjean
* Requires Python: <4.0,>=3.9
* License: GPL-3.0-or-later
* Keywords: france, water, hydrology

# cl-hubeau

Simple Hub'eau client for Python

This package is currently under active development.
Every API on [Hub'eau](https://hubeau.eaufrance.fr/) will be covered by this package in
due time.

At this stage, the following APIs are covered by cl-hubeau:
* [piezometry/piézométrie](https://hubeau.eaufrance.fr/page/api-piezometrie)
* [hydrometry/hydrométrie](https://hubeau.eaufrance.fr/page/api-hydrometrie)
* [drinking water quality/qualité de l'eau potable](https://hubeau.eaufrance.fr/page/api-qualite-eau-potable)
* [superficial waterbodies quality/qualité physico-chimique des cours d'eau](https://hubeau.eaufrance.fr/page/api-qualite-cours-deau)

For help on the keyword arguments available for each endpoint, please refer
directly to the Hub'eau documentation (they are not covered by the current
documentation).

Each function from cl-hubeau is meant to be consistent with its Hub'eau
counterpart, with the exception of the `size` and `page` or `cursor`
arguments (those are set automatically by cl-hubeau to crawl along the
results).
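
To illustrate what "crawling along the results" means, here is a minimal sketch of page-based crawling. It does not reproduce cl-hubeau's internal code: `crawl_pages` and `fake_fetch` are hypothetical names, and the payload shape (`count` and `data` keys) is an assumption made for the example.

```python
from typing import Callable, Iterator


def crawl_pages(fetch_page: Callable[[int, int], dict], size: int = 1000) -> Iterator[dict]:
    """Yield every record by requesting successive pages until all are read.

    `fetch_page(page, size)` is a hypothetical callable standing in for one
    HTTP request; it is assumed to return {"count": total, "data": [...]}.
    """
    page = 1
    seen = 0
    while True:
        payload = fetch_page(page, size)
        rows = payload["data"]
        yield from rows
        seen += len(rows)
        # Stop when the server has nothing left or the announced total is reached.
        if not rows or seen >= payload["count"]:
            break
        page += 1


# Fake backend holding 25 records, used here instead of a real API call.
RECORDS = [{"id": i} for i in range(25)]


def fake_fetch(page: int, size: int) -> dict:
    start = (page - 1) * size
    return {"count": len(RECORDS), "data": RECORDS[start:start + size]}


results = list(crawl_pages(fake_fetch, size=10))
print(len(results))
```

With cl-hubeau itself, none of this is needed: the package handles the paging arguments for you.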

## Parallelization

`cl-hubeau` already uses simple multithreading pools to perform requests.
In order not to endanger the web servers and to share resources among users, a
rate limiter is set to 10 queries per second. This limiter should work fine on
any given machine, whatever the context (even with a new parallelization
overlay).

However, `cl-hubeau` should **NOT** be used in containers or pods with
parallelization. There is currently no way of tracking the query rate
across multiple machines, and greedy queries may end up blacklisted by the
team managing Hub'eau.
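
The following sketch shows why such a limiter protects a single machine but not a cluster. It is an illustration only, not cl-hubeau's actual limiter: the `RateLimiter` class is hypothetical, and its state lives in the memory of one process, so two containers would each allow their own 10 queries per second.

```python
import threading
import time


class RateLimiter:
    """Allow at most `rate` calls per second across all threads of one process."""

    def __init__(self, rate: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = clock()

    def wait(self) -> None:
        """Block until the caller is allowed to send its next query."""
        with self._lock:
            now = self._clock()
            # Reserve the next free slot, then release the lock before sleeping.
            slot = max(now, self._next_slot)
            self._next_slot = slot + self.min_interval
        delay = slot - now
        if delay > 0:
            self._sleep(delay)


limiter = RateLimiter(rate=10)
start = time.monotonic()
for _ in range(5):
    limiter.wait()  # a real request would be sent right after this call
elapsed = time.monotonic() - start
print(f"5 calls took {elapsed:.2f}s")
```

Because the lock and the next free slot are plain Python objects, nothing is shared between machines, which is the limitation described above.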


## Configuration

First of all, you will need API keys from INSEE to use some high-level operations,
which may loop over cities' official codes. Please refer to pynsee's
[API subscription tutorial](https://pynsee.readthedocs.io/en/latest/api_subscription.html)
for help.

## Basic examples

### Clean cache

```python
from cl_hubeau.utils import clean_all_cache

# Call the function (parentheses required) to clean all cached data
clean_all_cache()
```

### Piezometry

Three high-level functions are available (plus one class for low-level operations).

Get all piezometers (cached for 30 days):

```python
from cl_hubeau import piezometry
gdf = piezometry.get_all_stations()
```

Get chronicles for the first 100 piezometers (cached for 30 days):

```python
df = piezometry.get_chronicles(gdf["code_bss"].head(100).tolist())
```

Get realtime data for the first 100 piezometers:

A small cache is stored to allow for realtime consumption (it expires after
only 15 minutes). Please use this functionality responsibly!

```python
df = piezometry.get_realtime_chronicles(gdf["code_bss"].head(100).tolist())
```

Low-level class to perform the same tasks:

Note that:

* the API forbids results of more than 20k rows, so you may need inner loops
* cache handling will be your responsibility, notably for realtime data

```python
with piezometry.PiezometrySession() as session:
    df = session.get_chronicles(code_bss="07548X0009/F")
    df = session.get_stations(code_departement=['02', '59', '60', '62', '80'], format="geojson")
    df = session.get_chronicles_real_time(code_bss="07548X0009/F")
```
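
One possible form for the "inner loops" mentioned above is to split a long period into shorter windows and send one low-level query per window. The sketch below only builds the windows; `date_windows` is a hypothetical helper, the 90-day length is arbitrary, and the names of the date-range kwargs to pass to the session are not given here and should be checked in the Hub'eau documentation.

```python
from datetime import date, timedelta


def date_windows(start: date, end: date, days: int):
    """Split [start, end] into consecutive, non-overlapping windows of at most `days` days."""
    windows = []
    cursor = start
    while cursor <= end:
        stop = min(cursor + timedelta(days=days - 1), end)
        windows.append((cursor, stop))
        cursor = stop + timedelta(days=1)
    return windows


windows = date_windows(date(2020, 1, 1), date(2020, 12, 31), days=90)
for begin, stop in windows:
    # Each window would feed one low-level call through the endpoint's
    # date-range kwargs (names to be checked in the Hub'eau documentation).
    print(begin.isoformat(), stop.isoformat())
```

The resulting dataframes can then be concatenated; a smaller window is the obvious adjustment if a single one still exceeds the limit.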

### Hydrometry

Four high-level functions are available (plus one class for low-level operations).


Get all stations (cached for 30 days):

```python
from cl_hubeau import hydrometry 
gdf = hydrometry.get_all_stations()
```

Get all sites (cached for 30 days):

```python
gdf = hydrometry.get_all_sites()
```

Get observations for the first 5 sites (cached for 30 days):
_Note that this will also work with stations (instead of sites)._

```python
df = hydrometry.get_observations(gdf["code_site"].head(5).tolist())
```

Get realtime data for the first 5 sites:

A small cache is stored to allow for realtime consumption (it expires after
only 15 minutes). Please use this functionality responsibly!


```python
df = hydrometry.get_realtime_observations(gdf["code_site"].head(5).tolist())
```

Low-level class to perform the same tasks:

Note that:

* the API forbids results of more than 20k rows, so you may need inner loops
* cache handling will be your responsibility, notably for realtime data

```python
with hydrometry.HydrometrySession() as session:
    df = session.get_stations(code_station="K437311001")
    df = session.get_sites(code_departement=['02', '59', '60', '62', '80'], format="geojson")
    df = session.get_realtime_observations(code_entite="K437311001")
    df = session.get_observations(code_entite="K437311001")

```
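
Since caching is left to you at this level, a small time-based cache is one way to avoid hammering the realtime endpoint. This is a minimal sketch under stated assumptions: `TTLCache` and `fake_query` are hypothetical names, the cache lives only in memory, and the 15-minute lifetime simply mirrors the one quoted above for the high-level helpers.

```python
import time


class TTLCache:
    """Keep values for `ttl` seconds, then force a refresh."""

    def __init__(self, ttl: float, clock=time.monotonic):
        self.ttl = ttl
        self._clock = clock
        self._store = {}

    def get_or_fetch(self, key, fetch):
        """Return the cached value for `key`, calling `fetch()` when missing or expired."""
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = fetch()
        self._store[key] = (now, value)
        return value


calls = []


def fake_query():
    # Stand-in for a low-level session call such as get_realtime_observations.
    calls.append(1)
    return {"n_calls": len(calls)}


cache = TTLCache(ttl=15 * 60)  # 15 minutes
first = cache.get_or_fetch("K437311001", fake_query)
second = cache.get_or_fetch("K437311001", fake_query)
print(first is second, len(calls))
```

In real use, `fake_query` would be replaced by a function wrapping the session call for the given station or site code.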

### Drinking water quality

Two high-level functions are available (plus one class for low-level operations).


Get all water networks (UDI) (cached for 30 days):

```python
from cl_hubeau import drinking_water_quality 
df = drinking_water_quality.get_all_water_networks()
```

Get the sanitary control results for nitrates on all networks of Paris, Lyon & Marseille
(cached for 30 days):

```python
networks = drinking_water_quality.get_all_water_networks()
networks = networks[
    networks.nom_commune.isin(["PARIS", "MARSEILLE", "LYON"])
    ]["code_reseau"].unique().tolist()

df = drinking_water_quality.get_control_results(
    codes_reseaux=networks,
    code_parametre="1340"
)
```

Note that this query is heavy, even though it is already restricted to nitrates.
In theory, you could also query the API without specifying the substance you're tracking,
but you may hit the 20k-row threshold and trigger an exception.
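
If a broad query does hit the threshold, one workaround is to split the list of network codes into smaller batches and query them one at a time. The sketch below shows only the batching step; `batched` is a hypothetical helper, the codes are made up, and the batch size of 20 is arbitrary. Whether the high-level function already splits its input this way is not stated here.

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Hypothetical network codes; real ones come from get_all_water_networks().
codes = [f"{i:09d}" for i in range(1, 46)]
batches = list(batched(codes, 20))
print([len(b) for b in batches])
```

Each batch would then be passed as `codes_reseaux`, and the resulting dataframes concatenated.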

You can also call the same function using official city codes directly:
```python
df = drinking_water_quality.get_control_results(
    codes_communes=['59350'],
    code_parametre="1340"
)
```

Low-level class to perform the same tasks:

Note that:

* the API forbids results of more than 20k rows, so you may need inner loops
* cache handling will be your responsibility

```python
with drinking_water_quality.DrinkingWaterQualitySession() as session:
    df = session.get_cities_networks(nom_commune="LILLE")
    df = session.get_control_results(code_departement='02', code_parametre="1340")

```

### Superficial waterbodies quality

Four high-level functions are available (plus one class for low-level operations).


Get all stations (cached for 30 days):

```python
from cl_hubeau import superficial_waterbodies_quality 
df = superficial_waterbodies_quality.get_all_stations()
```

Get all operations (cached for 30 days):

```python
from cl_hubeau import superficial_waterbodies_quality
df = superficial_waterbodies_quality.get_all_operations()
```

Note that this query is heavy; users should restrict it to a given territory.
For instance, you could use:
```python
df = superficial_waterbodies_quality.get_all_operations(code_region="11")
```

Get all environmental conditions:

```python
from cl_hubeau import superficial_waterbodies_quality
df = superficial_waterbodies_quality.get_all_environmental_conditions()
```

Note that this query is heavy; users should restrict it to a given territory.
For instance, you could use:
```python
df = superficial_waterbodies_quality.get_all_environmental_conditions(code_region="11")
```

Get all physicochemical analyses:
```python
from cl_hubeau import superficial_waterbodies_quality
df = superficial_waterbodies_quality.get_all_analysis()
```

Note that this query is heavy; users should restrict it to a given territory
and to given parameters. For instance, you could use:
```python
df = superficial_waterbodies_quality.get_all_analysis(
    code_departement="59", 
    code_parametre="1313"
    )
```


Low-level class to perform the same tasks:

Note that:

* the API forbids results of more than 20k rows, so you may need inner loops
* cache handling will be your responsibility

```python
with superficial_waterbodies_quality.SuperficialWaterbodiesQualitySession() as session:
    df = session.get_stations(code_commune="59183")
    df = session.get_operations(code_commune="59183")
    df = session.get_environmental_conditions(code_commune="59183")
    df = session.get_analysis(code_commune='59183', code_parametre="1340")

```

            

Raw data

            {
    "_id": null,
    "home_page": "https://tgrandje.github.io/cl-hubeau",
    "name": "cl-hubeau",
    "maintainer": "Thomas Grandjean",
    "docs_url": null,
    "requires_python": "<4.0,>=3.9",
    "maintainer_email": "thomas.grandjean@developpement-durable.gouv.fr",
    "keywords": "france, water, hydrology",
    "author": "Thomas Grandjean",
    "author_email": "thomas.grandjean@developpement-durable.gouv.fr",
    "download_url": "https://files.pythonhosted.org/packages/3c/f4/dc5423226134c16ce20a369ba84b7db2bc2661cc608e60f87af73805deed/cl_hubeau-0.5.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GPL-3.0-or-later",
    "summary": "Hubeau client to collect data from the different APIs",
    "version": "0.5.0",
    "project_urls": {
        "Documentation": "https://tgrandje.github.io/cl-hubeau",
        "Homepage": "https://tgrandje.github.io/cl-hubeau",
        "Repository": "https://github.com/tgrandje/cl-hubeau/"
    },
    "split_keywords": [
        "france",
        " water",
        " hydrology"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d67f41b07027cb692785f7b83fb4c0ea0abae8d462f85aeeb5ef8167ef64a0d8",
                "md5": "99c26c72037887e56048f5d40b46ff6d",
                "sha256": "3fa647cc97cd7c07d9b3138537f2800bf6f4df4ad2fd02a88e06e7df96c57434"
            },
            "downloads": -1,
            "filename": "cl_hubeau-0.5.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "99c26c72037887e56048f5d40b46ff6d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.9",
            "size": 31899,
            "upload_time": "2024-10-17T16:11:59",
            "upload_time_iso_8601": "2024-10-17T16:11:59.684821Z",
            "url": "https://files.pythonhosted.org/packages/d6/7f/41b07027cb692785f7b83fb4c0ea0abae8d462f85aeeb5ef8167ef64a0d8/cl_hubeau-0.5.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3cf4dc5423226134c16ce20a369ba84b7db2bc2661cc608e60f87af73805deed",
                "md5": "06ca17557ca520602431a798e2bbb91a",
                "sha256": "514e8d35ccbf3c005d93970caa52a50f1922d656447059aca0823fa208091a4c"
            },
            "downloads": -1,
            "filename": "cl_hubeau-0.5.0.tar.gz",
            "has_sig": false,
            "md5_digest": "06ca17557ca520602431a798e2bbb91a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.9",
            "size": 22747,
            "upload_time": "2024-10-17T16:12:02",
            "upload_time_iso_8601": "2024-10-17T16:12:02.092675Z",
            "url": "https://files.pythonhosted.org/packages/3c/f4/dc5423226134c16ce20a369ba84b7db2bc2661cc608e60f87af73805deed/cl_hubeau-0.5.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-17 16:12:02",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "tgrandje",
    "github_project": "cl-hubeau",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "cl-hubeau"
}
        