# cl-hubeau

Simple hub'eau client for Python

This package is currently under active development.
Every API on [Hub'eau](https://hubeau.eaufrance.fr/) will be covered by this package in due time.
At this stage, the following APIs are covered by cl-hubeau:
* [phytopharmaceuticals transactions/vente et achat de produits phytopharmaceutiques](https://hubeau.eaufrance.fr/page/api-vente-achat-phytos)
* [watercourses flow/écoulement des cours d'eau](https://hubeau.eaufrance.fr/page/api-ecoulement)
* [drinking water quality/qualité de l'eau potable](https://hubeau.eaufrance.fr/page/api-qualite-eau-potable)
* [hydrobiology/hydrobiologie](https://hubeau.eaufrance.fr/page/api-hydrobiologie)
* [hydrometry/hydrométrie](https://hubeau.eaufrance.fr/page/api-hydrometrie)
* [superficial waterbodies quality/qualité des cours d'eau](https://hubeau.eaufrance.fr/page/api-qualite-cours-deau)
* [ground waterbodies quality/qualité des nappes](https://hubeau.eaufrance.fr/page/api-qualite-nappes)
* [piezometry/piézométrie](https://hubeau.eaufrance.fr/page/api-piezometrie)
For help on the available kwargs for each endpoint, please refer
directly to the documentation on `hub'eau` (this will not be covered
by the current documentation).
Assume that each function from `cl-hubeau` is consistent with
its `hub'eau` counterpart, with the exception of the `size` and
`page` or `cursor` arguments (those are set automatically by
`cl-hubeau` to crawl along the results).
## Parallelization
`cl-hubeau` already uses simple multithreading pools to perform requests.
In order not to endanger the webservers and to share resources among users, a
rate limiter is set to 10 queries per second. This limiter should work fine on
any given machine, whatever the context (even with an additional parallelization
overlay).
However `cl-hubeau` should **NOT** be used in containers (or pods) with
parallelization. There is currently no way of tracking the queries' rate
among multiple machines: greedy queries may end up blacklisted by the
team managing Hub'eau.
## Configuration
Starting with `pynsee 0.2.0`, no API keys are needed anymore.
## Support
In case of bugs, please open an issue [on the repo](https://github.com/tgrandje/cl-hubeau/issues).
You will find in the present README a basic documentation in english.
For further information, please refer to:
* the docstrings (which are mostly up-to-date);
* the complete documentation (in French) available [here](https://tgrandje.github.io/cl-hubeau/).
## Contribution
Any help is welcome. Please refer to the [CONTRIBUTING file](https://github.com/tgrandje/cl-hubeau/CONTRIBUTING.md).
## Licence
GPL-3.0-or-later
## Project Status
This package is currently under active development.
## Basic examples
### Clean cache
```python
from cl_hubeau.utils import clean_all_cache
clean_all_cache()
```
### 20k results limit
`Hub'Eau` currently sets a limit of 20k results for any query. To circumvent
this, `cl-hubeau` defines upper-level functions which may slightly differ from
the low-level classes (which try to mimic `hub'eau`'s standard behaviour).
The upper-level functions all use loops to avoid hitting the 20k results
threshold. For any query that *could* accept time range parameters, time ranges
will be automatically added to your desired query (if not already specified);
when the 20k result threshold is reached, the time ranges will be split
in two (thus bypassing that threshold). If you ever hit the 20k limit nonetheless,
please get in touch and submit an issue.
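The splitting strategy described above can be sketched as follows. This is an illustrative stand-alone sketch, not cl-hubeau's actual internals; `fetch` stands in for any hypothetical function returning at most a capped number of rows for a given time range:

```python
from datetime import date, timedelta

def fetch_all(fetch, start: date, end: date, threshold: int = 20_000):
    """Illustrative sketch of the time-range splitting strategy:
    query a range, and recursively halve it whenever the result
    count hits the API's ceiling (not cl-hubeau's actual code)."""
    rows = fetch(start, end)
    if len(rows) < threshold or start >= end:
        # below the cap (or range cannot be split further): done
        return rows
    # split the range in two and recurse on each half
    mid = start + (end - start) // 2
    return (
        fetch_all(fetch, start, mid, threshold)
        + fetch_all(fetch, mid + timedelta(days=1), end, threshold)
    )
```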
### Configuring `cl-hubeau`
#### General configuration
`cl-hubeau` configuration can be accessed by the following code:
```python
from cl_hubeau import _config
print(_config)
```
This configuration (stored as a dictionary) can be altered any time you want.
For instance, if you want to alter the default cache expiration, you could do
the following:
```python
from cl_hubeau import _config
from datetime import timedelta

# set a one year expiration for the multi-purpose cache
_config["DEFAULT_EXPIRE_AFTER"] = timedelta(days=365)

# set a one hour expiration for realtime datasets
_config["DEFAULT_EXPIRE_AFTER_REALTIME"] = timedelta(hours=1)
```
Note that you can also alter the number of threads used to query `Hub'eau`.
Nonetheless, a rate limit of 10 queries/second is imposed by
`cl-hubeau` to avoid overloading the server.
As a consequence, you should only *reduce* the `THREADS` configuration
(if your machine has trouble with that) and never increase it (which shouldn't
have any effect).
Also note that the query rate displayed on `tqdm`'s progress bar does not
reflect the query rate on `Hub'Eau`: the cursor/page iterations of one subquery
are **not** displayed. Hence a displayed 2 it/s might very well be
a 10 requests/s load on `Hub'Eau`'s server.
#### Proxies
`cl-hubeau` executes two types of http(s) requests:
* some made by `pynsee` to gather INSEE & IGN datasets;
* some made by `cl-hubeau` itself to gather `Hub'Eau` datasets.
To work behind corporate proxies, it should be enough to configure two environment
variables:
* http_proxy
* https_proxy
You can also set the proxies using a dictionary as an argument when creating
sessions (low-level classes from `cl-hubeau`).
Note that `pynsee` stores those proxies in a [configuration file](https://github.com/InseeFrLab/pynsee/blob/0ba3e2e5b753c5c032f2b53d7fc042e995bbef04/pynsee/utils/init_conn.py#L55).
In case of trouble, don't hesitate to delete that file manually.
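As a sketch of the session-level option (the `proxies` argument reflects the behaviour described above; the proxy URL itself is of course hypothetical):

```python
from cl_hubeau import piezometry

proxies = {
    "http": "http://proxy.example.com:3128",   # hypothetical corporate proxy
    "https": "http://proxy.example.com:3128",
}

# pass the proxies dictionary when creating a low-level session
with piezometry.PiezometrySession(proxies=proxies) as session:
    df = session.get_stations(code_departement="59")
```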
### Phytopharmaceuticals transactions
4 high level functions are available (and one class for low level operations).
Note that the high level functions introduce new arguments (`filter_regions` and `filter_departements`)
to better target territorial data.
Get all active substances bought (uses a 30 days caching):
```python
from cl_hubeau import phytopharmaceuticals_transactions as pt
df = pt.get_all_active_substances_bought()

# or to get regional data:
df = pt.get_all_active_substances_bought(
    type_territoire="Région", code_territoire="32"
)

# or to get departmental data:
df = pt.get_all_active_substances_bought(
    type_territoire="Département", filter_regions="32"
)

# or to get postcode-zoned data:
df = pt.get_all_active_substances_bought(
    type_territoire="Zone postale", filter_departements=["59", "62"]
)
```
Get all phytopharmaceutical products bought (uses a 30 days caching):
```python
from cl_hubeau import phytopharmaceuticals_transactions as pt
df = pt.get_all_phytopharmaceutical_products_bought()

# or to get regional data:
df = pt.get_all_phytopharmaceutical_products_bought(
    type_territoire="Région", code_territoire="32"
)

# or to get departmental data:
df = pt.get_all_phytopharmaceutical_products_bought(
    type_territoire="Département", filter_regions="32"
)

# or to get postcode-zoned data:
df = pt.get_all_phytopharmaceutical_products_bought(
    type_territoire="Zone postale", filter_departements=["59", "62"]
)
```
Get all active substances sold (uses a 30 days caching):
```python
from cl_hubeau import phytopharmaceuticals_transactions as pt
df = pt.get_all_active_substances_sold()

# or to get regional data:
df = pt.get_all_active_substances_sold(
    type_territoire="Région", code_territoire="32"
)

# or to get departmental data:
df = pt.get_all_active_substances_sold(
    type_territoire="Département", filter_regions="32"
)
```
Get all phytopharmaceutical products sold (uses a 30 days caching):
```python
from cl_hubeau import phytopharmaceuticals_transactions as pt
df = pt.get_all_phytopharmaceutical_products_sold()

# or to get regional data:
df = pt.get_all_phytopharmaceutical_products_sold(
    type_territoire="Région", code_territoire="32"
)

# or to get departmental data:
df = pt.get_all_phytopharmaceutical_products_sold(
    type_territoire="Département", filter_regions="32"
)
```
Low level class to perform the same tasks:
Note that:

* the API forbids results of more than 20k rows, so you may need inner loops
* the cache handling will be your responsibility
```python
from cl_hubeau import phytopharmaceuticals_transactions as pt

with pt.PhytopharmaceuticalsSession() as session:
    df = session.active_substances_sold(
        annee_min=2010,
        annee_max=2015,
        code_territoire=["32"],
        type_territoire="Région",
    )
    df = session.phytopharmaceutical_products_sold(
        annee_min=2010,
        annee_max=2015,
        code_territoire=["32"],
        type_territoire="Région",
        eaj="Oui",
        unite="l",
    )
    df = session.active_substances_bought(
        annee_min=2010,
        annee_max=2015,
        code_territoire=["32"],
        type_territoire="Région",
    )
    df = session.phytopharmaceutical_products_bought(
        code_territoire=["32"],
        type_territoire="Région",
        eaj="Oui",
        unite="l",
    )
```
### Watercourses flow
3 high level functions are available (and one class for low level operations).
Get all stations (uses a 30 days caching):
```python
from cl_hubeau import watercourses_flow
df = watercourses_flow.get_all_stations()
```
Get all observations (uses a 30 days caching):
```python
from cl_hubeau import watercourses_flow
df = watercourses_flow.get_all_observations()
```
Note that this query is heavy; users should restrict it to a given territory when possible.
For instance, you could use:
```python
df = watercourses_flow.get_all_observations(code_region="11")
```
Get all campaigns:
```python
from cl_hubeau import watercourses_flow
df = watercourses_flow.get_all_campaigns()
```
Low level class to perform the same tasks:
Note that:

* the API forbids results of more than 20k rows, so you may need inner loops
* the cache handling will be your responsibility
```python
with watercourses_flow.WatercoursesFlowSession() as session:
    df = session.get_stations(code_departement="59")
    df = session.get_campaigns(code_campagne=[12])
    df = session.get_observations(code_station="F6640008")
```
### Drinking water quality
2 high level functions are available (and one class for low level operations).
Get all water networks (UDI) (uses a 30 days caching):
```python
from cl_hubeau import drinking_water_quality
df = drinking_water_quality.get_all_water_networks()
```
Get the sanitary controls' results for nitrates on all networks of Paris, Lyon & Marseille
(uses a 30 days caching):
```python
networks = drinking_water_quality.get_all_water_networks(code_region=["11", "84", "93"])
networks = networks[
    networks.nom_commune.isin(["PARIS", "MARSEILLE", "LYON"])
]["code_reseau"].unique().tolist()

df = drinking_water_quality.get_control_results(
    code_reseau=networks, code_parametre="1340"
)
df = df[df.nom_commune.isin(["PARIS", "MARSEILLE", "LYON"])]
```
Note that this query is heavy, even though it is already restricted to nitrates.
In theory, you could also query the API without specifying the substance you're tracking,
but this has not been tested.
You can also call the same function, using official city codes directly:
```python
df = drinking_water_quality.get_control_results(
    code_commune=["59350"],
    code_parametre="1340"
)
```
Low level class to perform the same tasks:
Note that:

* the API forbids results of more than 20k rows, so you may need inner loops
* the cache handling will be your responsibility
```python
with drinking_water_quality.DrinkingWaterQualitySession() as session:
    df = session.get_cities_networks(nom_commune="LILLE")
    df = session.get_control_results(code_departement="02", code_parametre="1340")
```
### Hydrobiology
3 high level functions are available (and one class for low level operations).
Get all stations (uses a 30 days caching):
```python
from cl_hubeau import hydrobiology
df = hydrobiology.get_all_stations()
```
Get the taxa identified on stations in Paris (uses a 30 days caching):
```python
df = hydrobiology.get_all_taxa(code_commune=["75056"])
```
Note that this query is heavy if not restricted to areas and/or timeranges.
In theory, you could query the API without arguments, but this has not been
tested (this should not be possible on standard machines because of the
RAM consumption).
Get the indexes identified on stations in Paris (uses a 30 days caching):
```python
df = hydrobiology.get_all_indexes(code_commune=["75056"])
```
Note that this query is heavy if not restricted to areas and/or timeranges.
In theory, you could query the API without arguments, but this has not been
tested (this should not be possible on standard machines because of the
RAM consumption).
Low level class to perform the same tasks:
Note that:

* the API forbids results of more than 20k rows, so you may need inner loops
* the cache handling will be your responsibility
```python
with hydrobiology.HydrobiologySession() as session:
    df = session.get_stations(code_commune="75056")
    df = session.get_taxa(code_commune="75056")
    df = session.get_indexes(code_commune="75056")
```
### Hydrometry
4 high level functions are available (and one class for low level operations).
Get all stations (uses a 30 days caching):
```python
from cl_hubeau import hydrometry
gdf = hydrometry.get_all_stations()
```
Get all sites (uses a 30 days caching):
```python
gdf = hydrometry.get_all_sites()
```
Get observations for the first 5 sites (uses a 30 days caching):
_Note that this will also work with stations (instead of sites)._
```python
df = hydrometry.get_observations(gdf["code_site"].head(5).tolist())
```
Get realtime data for the first 5 sites (uses a 15 minutes caching):
A small cache is stored to allow for realtime consumption (it expires after
only 15 minutes). Please use this functionality responsibly!
```python
df = hydrometry.get_realtime_observations(gdf["code_site"].head(5).tolist())
```
Low level class to perform the same tasks:
Note that:

* the API forbids results of more than 20k rows, so you may need inner loops
* the cache handling will be your responsibility, notably for realtime data
```python
with hydrometry.HydrometrySession() as session:
    df = session.get_stations(code_station="K437311001")
    df = session.get_sites(code_departement=["02", "59", "60", "62", "80"], format="geojson")
    df = session.get_realtime_observations(code_entite="K437311001")
    df = session.get_observations(code_entite="K437311001")
```
### Superficial waterbodies quality
4 high level functions are available (and one class for low level operations).
Get all stations (uses a 30 days caching):
```python
from cl_hubeau import superficial_waterbodies_quality
df = superficial_waterbodies_quality.get_all_stations()
```
Get all operations (uses a 30 days caching):
```python
from cl_hubeau import superficial_waterbodies_quality
df = superficial_waterbodies_quality.get_all_operations()
```
Note that this query is heavy; users should restrict it to a given territory.
For instance, you could use:
```python
df = superficial_waterbodies_quality.get_all_operations(code_region="11")
```
Get all environmental conditions:
```python
from cl_hubeau import superficial_waterbodies_quality
df = superficial_waterbodies_quality.get_all_environmental_conditions()
```
Note that this query is heavy; users should restrict it to a given territory.
For instance, you could use:
```python
df = superficial_waterbodies_quality.get_all_environmental_conditions(code_region="11")
```
Get all physicochemical analyses:
```python
from cl_hubeau import superficial_waterbodies_quality
df = superficial_waterbodies_quality.get_all_analyses()
```
Note that this query is heavy; users should restrict it to a given territory
and given parameters. For instance, you could use:
```python
df = superficial_waterbodies_quality.get_all_analyses(
    code_departement="59",
    code_parametre="1313"
)
```
Low level class to perform the same tasks:
Note that:

* the API forbids results of more than 20k rows, so you may need inner loops
* the cache handling will be your responsibility
```python
with superficial_waterbodies_quality.SuperficialWaterbodiesQualitySession() as session:
    df = session.get_stations(code_commune="59183")
    df = session.get_operations(code_commune="59183")
    df = session.get_environmental_conditions(code_commune="59183")
    df = session.get_analyses(code_commune="59183", code_parametre="1340")
```
### Ground waterbodies quality
2 high level functions are available (and one class for low level operations).
Get all stations (uses a 30 days caching):
```python
from cl_hubeau import ground_water_quality
df = ground_water_quality.get_all_stations()
```
Get the test results for nitrates:
```python
df = ground_water_quality.get_all_analyses(code_param="1340")
```
Note that this query is heavy, even though it is already restricted to nitrates, and it
may fail. In theory, you could even query the API without specifying the substance
you're tracking, but you would hit the 20k threshold and trigger an exception.
In practice, you should call the same function with a territorial restriction or with
specific `bss_id`s.
For instance, you could use official city codes directly:
```python
df = ground_water_quality.get_all_analyses(
    num_departement=["59"],
    code_param="1340"
)
```
Note: a bit of caution is needed here, as the arguments are **NOT** the same
in the two endpoints. Please have a look at the documentation on
[hubeau](https://hubeau.eaufrance.fr/page/api-qualite-nappes#/qualite-nappes/analyses).
For instance, the city code is called `"code_insee_actuel"` on the analyses endpoint
and `"code_commune"` on the stations endpoint.
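Concretely, the same city-level restriction would be spelled differently on each endpoint (a sketch, assuming both kwargs are simply forwarded to hub'eau):

```python
from cl_hubeau import ground_water_quality

# stations endpoint: the city filter is "code_commune"
stations = ground_water_quality.get_all_stations(code_commune="59183")

# analyses endpoint: the same filter is "code_insee_actuel"
analyses = ground_water_quality.get_all_analyses(
    code_insee_actuel="59183", code_param="1340"
)
```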
Low level class to perform the same tasks:
Note that:

* the API forbids results of more than 20k rows, so you may need inner loops
* the cache handling will be your responsibility
```python
with ground_water_quality.GroundWaterQualitySession() as session:
    df = session.get_stations(bss_id="01832B0600")
    df = session.get_analyses(
        bss_id=["BSS000BMMA"],
        code_param="1461",
    )
```
### Piezometry
3 high level functions are available (and one class for low level operations).
Get all piezometers (uses a 30 days caching):
```python
from cl_hubeau import piezometry
gdf = piezometry.get_all_stations()
```
Get chronicles for the first 100 piezometers (uses a 30 days caching):
```python
df = piezometry.get_chronicles(gdf["code_bss"].head(100).tolist())
```
Get realtime data for the first 100 piezometers:
A small cache is stored to allow for realtime consumption (it expires after
only 15 minutes). Please use this functionality responsibly!
```python
df = piezometry.get_realtime_chronicles(gdf["code_bss"].head(100).tolist())
```
Low level class to perform the same tasks:
Note that:

* the API forbids results of more than 20k rows, so you may need inner loops
* the cache handling will be your responsibility, notably for realtime data
```python
with piezometry.PiezometrySession() as session:
    df = session.get_chronicles(code_bss="07548X0009/F")
    df = session.get_stations(code_departement=["02", "59", "60", "62", "80"], format="geojson")
    df = session.get_chronicles_real_time(code_bss="07548X0009/F")
```
### Convenience functions
In order to ease queries on hydrographic territories, some convenience functions
have been added to this module.
In this process, we harvest official geodatasets which are not available on hub'eau;
afterwards, simple geospatial joins are performed with the latest geodataset of French cities.
These are **convenience** tools and there **will** be approximations (the geographical precision
of both datasets might not match).
#### SAGE (Schéma d'Aménagement et de Gestion des Eaux)
You can retrieve a SAGE's communal components using the following snippet:
```python
from cl_hubeau.utils import cities_for_sage
d = cities_for_sage()
```
The official geodataset is eaufrance's SAGE.
Raw data
{
"_id": null,
"home_page": "https://tgrandje.github.io/cl-hubeau",
"name": "cl-hubeau",
"maintainer": "Thomas Grandjean",
"docs_url": null,
"requires_python": "<4.0,>=3.9",
"maintainer_email": "thomas.grandjean@developpement-durable.gouv.fr",
"keywords": "france, water, hydrology",
"author": "Thomas Grandjean",
"author_email": "thomas.grandjean@developpement-durable.gouv.fr",
"download_url": "https://files.pythonhosted.org/packages/19/25/64e8230140caba31145bb492d8f47cfe21d845031fcb1fb2929e9ae8287c/cl_hubeau-0.11.0.tar.gz",
"platform": null,
"description": "# cl-hubeau\n\n\n[](https://pypi.python.org/pypi/cl-hubeau/)\n\n\n[](https://github.com/psf/black)\n\n\n\n\n\n\n\n\n\n\nSimple hub'eau client for python\n\nThis package is currently under active development.\nEvery API on [Hub'eau](hubeau.eaufrance.fr/) will be covered by this package in\ndue time.\n\nAt this stage, the following APIs are covered by cl-hubeau:\n* [phytopharmaceuticals transactions/vente et achat de produits phytopharmaceutiques](https://hubeau.eaufrance.fr/page/api-vente-achat-phytos)\n* [watercourses flow/\u00e9coulement des cours d'eau](https://hubeau.eaufrance.fr/page/api-ecoulement)\n* [drinking water quality/qualit\u00e9 de l'eau potable](https://hubeau.eaufrance.fr/page/api-qualite-eau-potable)\n* [hydrobiology/hydrobiologie](https://hubeau.eaufrance.fr/page/api-hydrobiologie)\n* [hydrometry/hydrom\u00e9trie](https://hubeau.eaufrance.fr/page/api-hydrometrie)\n* [superficial waterbodies quality/qualit\u00e9 des cours d'eau](https://hubeau.eaufrance.fr/page/api-qualite-cours-deau)\n* [ground waterbodies quality/qualit\u00e9 des nappes](https://hubeau.eaufrance.fr/page/api-qualite-nappes)\n* [piezometry/pi\u00e9zom\u00e9trie](https://hubeau.eaufrance.fr/page/api-piezometrie)\n\n\nFor any help on available kwargs for each endpoint, please refer\ndirectly to the documentation on `hub'eau` (this will not be covered\nby the current documentation).\n\nAssume that each function from `cl-hubeau` will be consistent with\nit's `hub'eau` counterpart, with the exception of the `size` and\n`page` or `cursor` arguments (those will be set automatically by\n`cl-hubeau` to crawl allong the results).\n\n## Parallelization\n\n`cl-hubeau` already uses simple multithreading pools to perform requests.\nIn order not to endanger the webservers and share ressources among users, a\nrate limiter is set to 10 queries per second. 
This limiter should work fine on\nany given machine, whatever the context (even with a new parallelization\noverlay).\n\nHowever `cl-hubeau` should **NOT** be used in containers (or pods) with\nparallelization. There is currently no way of tracking the queries' rate\namong multiple machines: greedy queries may end up blacklisted by the\nteam managing Hub'eau.\n\n\n## Configuration\n\nStarting with `pynsee 0.2.0`, no API keys are needed anymore.\n\n## Support\n\nIn case of bugs, please open an issue [on the repo](https://github.com/tgrandje/cl-hubeau/issues).\n\nYou will find in the present README a basic documentation in english.\nFor further information, please refer to :\n* the docstrings (which are mostly up-to-date);\n* the complete documentation (in french) available [here](https://tgrandje.github.io/cl-hubeau/).\n\n## Contribution\nAny help is welcome. Please refer to the [CONTRIBUTING file](https://github.com/tgrandje/cl-hubeau/CONTRIBUTING.md).\n\n## Licence\nGPL-3.0-or-later\n\n## Project Status\n\nThis package is currently under active development.\n\n## Basic examples\n\n### Clean cache\n\n```python\nfrom cl_hubeau.utils import clean_all_cache\nclean_all_cache()\n```\n\n### 20k results limit\n\n`Hub'Eau` has currently a limit set to 20k results for any query. To circumvente\nthis, `cl-hubeau` defines upper-level functions which may slightly differ from\nthe low-level classes (which try to mimick `hub'eau`'s standard beahviour).\nThe upper-level functions are all using loops to avoid reaching the 20k results\nthreshold. For any query that *could* accept time ranges parameters, time ranges\nwill be automatically added to your desired query (if not already specified);\nin case of reaching the 20k result threshold, the timeranges will be splitted\nin two (thus bypassing that threshold). 
If you ever reach the 20k nonetheless,\nplease get in touch and submit an issue.\n\n### configuring `cl-hubeau`\n\n#### general configuration\n\n`cl-hubeau` configuration can be accessed by the following code:\n\n```\nfrom cl_hubeau import _config\nprint(_config)\n```\n\nThis configuration (stored as a dictionnary) can be altered any time you want.\nFor instance, if you want to alter the default cache expiration, you could do\nthe following:\n\n```\nfrom cl_hubeau import _config\nfrom datetime import timedelta\n\n# set a one year cache for multi-purpose cache\n_config[\"DEFAULT_EXPIRE_AFTER\"] = datetime.timedelta(day=365)\n\n# set a one hour cache of realtime datasets\n_config[\"DEFAULT_EXPIRE_AFTER_REALTIME\"] = datetime.timedelta(day=365)\n```\n\nNote that you can also alter the number of threads used to query `Hub'eau`.\nNonetheless, there is also a ratelimit of 10 queries/second imposed by\n`cl-hubeau` to avoid overloading the server.\nAs a consequence, you should only *reduce* the `THREADS` configuration\n(if your machine has trouble with that) and never increase it (which shouldn't\nhave any effect).\n\nAlso note that the query rate you will see on `tqdm`'s progress bar does not\nreflect the query rate of `Hub'Eau` : the cursor/page iterations of one subquery\nwill **not** be displayed. 
Hence a 2 it/s displayed might very well be\na 10 requests/s load on `Hub'Eau`'s server.\n\n#### proxies\n\n`cl-hubeau` executes two types of http(s) requests:\n\n* some made by `pynsee` to gather INSEE & IGN datasets;\n* some made by `cl-hubeau` itself to gather `Hub'Eau` datasets.\n\nTo work behind corporate proxies, it should be enough to configure two environment\nvariables :\n\n* http_proxy\n* https_proxy\n\nYou can also set the proxies using a dictionnary as an argument when creating\nsessions (low-level classes from `cl-hubeau`).\n\nNote that `pynsee` store those proxies in a [configuration file](https://github.com/InseeFrLab/pynsee/blob/0ba3e2e5b753c5c032f2b53d7fc042e995bbef04/pynsee/utils/init_conn.py#L55).\nIn case of troubles, don't hesitate to manually delete that file.\n\n\n### Phyopharmaceuticals transactions\n\n4 high level functions are available (and one class for low level operations).\n\nNote that high level functions introduce new arguments (`filter_regions` and `filter_departements`\nto better target territorial data.\n\nGet all active substances bought (uses a 30 days caching):\n\n```python\nfrom cl_hubeau import phytopharmaceuticals_transactions as pt\ndf = pt.get_all_active_substances_bought()\n\n# or to get regional data:\ndf = pt.get_all_active_substances_bought(\n type_territoire=\"R\u00e9gion\", code_territoire=\"32\"\n )\n\n# or to get departemantal data:\ndf = pt.get_all_active_substances_bought(\n type_territoire=\"D\u00e9partement\", filter_regions=\"32\"\n )\n\n# or to get postcode-zoned data:\ndf = pt.get_all_active_substances_bought(\n type_territoire=\"Zone postale\", filter_departements=[\"59\", \"62\"]\n )\n```\n\nGet all phytopharmaceutical products bought (uses a 30 days caching):\n\n```python\nfrom cl_hubeau import phytopharmaceuticals_transactions as pt\ndf = pt.get_all_phytopharmaceutical_products_bought()\n\n# or to get regional data:\ndf = pt.get_all_phytopharmaceutical_products_bought(\n type_territoire=\"R\u00e9gion\", 
code_territoire=\"32\"\n )\n\n# or to get departemantal data:\ndf = pt.get_all_phytopharmaceutical_products_bought(\n type_territoire=\"D\u00e9partement\", filter_regions=\"32\"\n )\n\n# or to get postcode-zoned data:\ndf = pt.get_all_phytopharmaceutical_products_bought(\n type_territoire=\"Zone postale\", filter_departements=[\"59\", \"62\"]\n )\n```\n\nGet all active substances sold (uses a 30 days caching):\n\n```python\nfrom cl_hubeau import phytopharmaceuticals_transactions as pt\ndf = pt.get_all_active_substances_sold()\n\n# or to get regional data:\ndf = pt.get_all_active_substances_sold(\n type_territoire=\"R\u00e9gion\", code_territoire=\"32\"\n )\n\n# or to get departemantal data:\ndf = pt.get_all_active_substances_sold(\n type_territoire=\"D\u00e9partement\", filter_regions=\"32\"\n )\n```\n\nGet all phytopharmaceutical products sold (uses a 30 days caching):\n\n```python\nfrom cl_hubeau import phytopharmaceuticals_transactions as pt\ndf = pt.get_all_phytopharmaceutical_products_sold()\n\n# or to get regional data:\ndf = pt.get_all_phytopharmaceutical_products_sold(\n type_territoire=\"R\u00e9gion\", code_territoire=\"32\"\n )\n\n# or to get departemantal data:\ndf = pt.get_all_phytopharmaceutical_products_sold(\n type_territoire=\"D\u00e9partement\", filter_regions=\"32\"\n )\n```\n\nLow level class to perform the same tasks:\n\nNote that :\n\n* the API is forbidding results > 20k rows and you may need inner loops\n* the cache handling will be your responsibility\n\n```python\nwith pt.PhytopharmaceuticalsSession() as session:\n df = session.active_substances_sold(\n annee_min=2010,\n annee_max=2015,\n code_territoire=[\"32\"],\n type_territoire=\"R\u00e9gion\",\n )\n df = session.phytopharmaceutical_products_sold(\n annee_min=2010,\n annee_max=2015,\n code_territoire=[\"32\"],\n type_territoire=\"R\u00e9gion\",\n eaj=\"Oui\",\n unite=\"l\",\n )\n df = session.active_substances_bought(\n annee_min=2010,\n annee_max=2015,\n code_territoire=[\"32\"],\n 
type_territoire=\"R\u00e9gion\",\n )\n df = session.phytopharmaceutical_products_bought(\n code_territoire=[\"32\"],\n type_territoire=\"R\u00e9gion\",\n eaj=\"Oui\",\n unite=\"l\",\n )\n\n```\n\n### Watercourses flow\n\n3 high level functions are available (and one class for low level operations).\n\nGet all stations (uses a 30 days caching):\n\n```python\nfrom cl_hubeau import watercourses_flow\ndf = watercourses_flow.get_all_stations()\n```\n\nGet all observations (uses a 30 days caching):\n\n```python\nfrom cl_hubeau import watercourses_flow\ndf = watercourses_flow.get_all_observations()\n```\n\nNote that this query is heavy, users should restrict it to a given territory when possible.\nFor instance, you could use :\n```python\ndf = watercourses_flow.get_all_observations(code_region=\"11\")\n```\n\nGet all campaigns:\n\n```python\nfrom cl_hubeau import watercourses_flow\ndf = watercourses_flow.get_all_campaigns()\n```\n\nLow level class to perform the same tasks:\n\n\nNote that :\n\n* the API is forbidding results > 20k rows and you may need inner loops\n* the cache handling will be your responsibility\n\n```python\nwith watercourses_flow.WatercoursesFlowSession() as session:\n df = session.get_stations(code_departement=\"59\")\n df = session.get_campaigns(code_campagne=[12])\n df = session.get_observations(code_station=\"F6640008\")\n\n```\n\n### Drinking water quality\n\n2 high level functions are available (and one class for low level operations).\n\n\nGet all water networks (UDI) (uses a 30 days caching):\n\n```python\nfrom cl_hubeau import drinking_water_quality\ndf = drinking_water_quality.get_all_water_networks()\n```\n\nGet the sanitary controls's results for nitrates on all networks of Paris, Lyon & Marseille\n(uses a 30 days caching) for nitrates\n\n```python\nnetworks = drinking_water_quality.get_all_water_networks(code_region=[\"11\", \"84\", \"93\"])\nnetworks = networks[\n networks.nom_commune.isin([\"PARIS\", \"MARSEILLE\", \"LYON\"])\n 
][\"code_reseau\"].unique().tolist()\n\ndf = drinking_water_quality.get_control_results(\n code_reseau=networks, code_parametre=\"1340\"\n)\ndf = df[df.nom_commune.isin([\"PARIS\", \"MARSEILLE\", \"LYON\"])]\n```\n\nNote that this query is heavy, even if this was already restricted to nitrates.\nIn theory, you could also query the API without specifying the substance you're tracking,\nbut this has not been tested.\n\nYou can also call the same function, using official city codes directly:\n```python\ndf = drinking_water_quality.get_control_results(\n code_commune=['59350'],\n code_parametre=\"1340\"\n)\n```\n\nLow level class to perform the same tasks:\n\n\nNote that :\n\n* the API is forbidding results > 20k rows and you may need inner loops\n* the cache handling will be your responsibility\n\n```python\nwith drinking_water_quality.DrinkingWaterQualitySession() as session:\n df = session.get_cities_networks(nom_commune=\"LILLE\")\n df = session.get_control_results(code_departement='02', code_parametre=\"1340\")\n\n```\n\n### Hydrobiology\n\n3 high level functions are available (and one class for low level operations).\n\n\nGet all stations (uses a 30 days caching):\n\n```python\nfrom cl_hubeau import hydrobiology\ndf = hydrobiology.get_all_water_networks()\n```\n\nGet the taxa identified on stations in Paris (uses a 30 days caching):\n\n```python\ndf = hydrobiology.get_all_taxa(code_commune=[\"75056\"])\n```\n\nNote that this query is heavy if not restricted to areas and/or timeranges.\nIn theory, you could query the API without arguments, but this has not been\ntested (this should not be possible on standard machines because of the\nRAM consumption).\n\nGet the indexes identified on stations in Paris (uses a 30 days caching):\n\n```python\ndf = hydrobiology.get_all_indexes(code_commune=[\"75056\"])\n```\n\nNote that this query is heavy if not restricted to areas and/or timeranges.\nIn theory, you could query the API without arguments, but this has not 
been tested (this should not be possible on standard machines because of the
RAM consumption).

Low level class to perform the same tasks:

Note that:

* the API forbids results > 20k rows and you may need inner loops
* the cache handling will be your responsibility

```python
with hydrobiology.HydrobiologySession() as session:
    df = session.get_stations(code_commune="75056")
    df = session.get_taxa(code_commune="75056")
    df = session.get_indexes(code_commune="75056")
```

### Hydrometry

4 high level functions are available (and one class for low level operations).

Get all stations (uses a 30-day cache):

```python
from cl_hubeau import hydrometry
gdf = hydrometry.get_all_stations()
```

Get all sites (uses a 30-day cache):

```python
gdf = hydrometry.get_all_sites()
```

Get observations for the first 5 sites (uses a 30-day cache):
_Note that this will also work with stations (instead of sites)._

```python
df = hydrometry.get_observations(gdf["code_site"].head(5).tolist())
```

Get realtime data for the first 5 sites:

A small cache is stored to allow for realtime consumption (the cache expires after
only 15 minutes).
Please adopt responsible usage of this functionality!

```python
df = hydrometry.get_realtime_observations(gdf["code_site"].head(5).tolist())
```

Low level class to perform the same tasks:

Note that:

* the API forbids results > 20k rows and you may need inner loops
* the cache handling will be your responsibility, notably for realtime data

```python
with hydrometry.HydrometrySession() as session:
    df = session.get_stations(code_station="K437311001")
    df = session.get_sites(code_departement=["02", "59", "60", "62", "80"], format="geojson")
    df = session.get_realtime_observations(code_entite="K437311001")
    df = session.get_observations(code_entite="K437311001")
```

### Superficial waterbodies quality

4 high level functions are available (and one class for low level operations).

Get all stations (uses a 30-day cache):

```python
from cl_hubeau import superficial_waterbodies_quality
df = superficial_waterbodies_quality.get_all_stations()
```

Get all operations (uses a 30-day cache):

```python
from cl_hubeau import superficial_waterbodies_quality
df = superficial_waterbodies_quality.get_all_operations()
```

Note that this query is heavy; users should restrict it to a given territory.
For instance, you could use:

```python
df = superficial_waterbodies_quality.get_all_operations(code_region="11")
```

Get all environmental conditions:

```python
from cl_hubeau import superficial_waterbodies_quality
df = superficial_waterbodies_quality.get_all_environmental_conditions()
```

Note that this query is heavy; users should restrict it to a given territory.
For instance, you could use:

```python
df = superficial_waterbodies_quality.get_all_environmental_conditions(code_region="11")
```

Get all physicochemical analyses:

```python
from cl_hubeau import superficial_waterbodies_quality
df = superficial_waterbodies_quality.get_all_analyses()
```

Note that this
query is heavy; users should restrict it to a given territory
and given parameters. For instance, you could use:

```python
df = superficial_waterbodies_quality.get_all_analyses(
    code_departement="59",
    code_parametre="1313"
)
```

Low level class to perform the same tasks:

Note that:

* the API forbids results > 20k rows and you may need inner loops
* the cache handling will be your responsibility

```python
with superficial_waterbodies_quality.SuperficialWaterbodiesQualitySession() as session:
    df = session.get_stations(code_commune="59183")
    df = session.get_operations(code_commune="59183")
    df = session.get_environmental_conditions(code_commune="59183")
    df = session.get_analyses(code_commune="59183", code_parametre="1340")
```

### Ground waterbodies quality

2 high level functions are available (and one class for low level operations).

Get all stations (uses a 30-day cache):

```python
from cl_hubeau import ground_water_quality
df = ground_water_quality.get_all_stations()
```

Get the test results for nitrates:

```python
df = ground_water_quality.get_all_analyses(code_param="1340")
```

Note that this query is heavy, even though it is already restricted to nitrates, and that it
may fail. In theory, you could even query the API without specifying the substance
you're tracking, but you will hit the 20k threshold and trigger an exception.

In practice, you should call the same function with a territorial restriction or with
specific `bss_id`s.
For instance, you could use official department codes directly:

```python
df = ground_water_quality.get_all_analyses(
    num_departement=["59"],
    code_param="1340"
)
```

Note: a bit of caution is needed here, as the arguments are **NOT** the same
in the two endpoints.
Please have a look at the documentation on
[hubeau](https://hubeau.eaufrance.fr/page/api-qualite-nappes#/qualite-nappes/analyses).
For instance, the city's number is called `"code_insee_actuel"` on the analyses' endpoint
and `"code_commune"` on the stations'.

Low level class to perform the same tasks:

Note that:

* the API forbids results > 20k rows and you may need inner loops
* the cache handling will be your responsibility

```python
with ground_water_quality.GroundWaterQualitySession() as session:
    df = session.get_stations(bss_id="01832B0600")
    df = session.get_analyses(
        bss_id=["BSS000BMMA"],
        code_param="1461",
    )
```

### Piezometry

3 high level functions are available (and one class for low level operations).

Get all piezometers (uses a 30-day cache):

```python
from cl_hubeau import piezometry
gdf = piezometry.get_all_stations()
```

Get chronicles for the first 100 piezometers (uses a 30-day cache):

```python
df = piezometry.get_chronicles(gdf["code_bss"].head(100).tolist())
```

Get realtime data for the first 100 piezometers:

A small cache is stored to allow for realtime consumption (the cache expires after
only 15 minutes).
Please adopt responsible usage of this functionality!

```python
df = piezometry.get_realtime_chronicles(gdf["code_bss"].head(100).tolist())
```

Low level class to perform the same tasks:

Note that:

* the API forbids results > 20k rows and you may need inner loops
* the cache handling will be your responsibility, notably for realtime data

```python
with piezometry.PiezometrySession() as session:
    df = session.get_chronicles(code_bss="07548X0009/F")
    df = session.get_stations(code_departement=["02", "59", "60", "62", "80"], format="geojson")
    df = session.get_chronicles_real_time(code_bss="07548X0009/F")
```

### Convenience functions

In order to ease queries on hydrographic territories, some convenience functions
have been added to this module.

In this process, we harvest official geodatasets which are not available on hub'eau;
afterwards, simple geospatial joins are performed with the latest geodataset of
French cities.

These are **convenience** tools and there **will** be approximations (the geographical
precision of both datasets might not match).

#### SAGE (Schéma d'Aménagement et de Gestion des Eaux)

You can retrieve a SAGE's communal components using the following snippet:

```python
from cl_hubeau.utils import cities_for_sage

d = cities_for_sage()
```

The official geodataset is eaufrance's SAGE.
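As an illustration of how such a communal mapping might be combined with hub'eau station data, here is a self-contained sketch using toy tables rather than real responses. The tabular shape of the SAGE/city mapping and the `code_sage`/`code_commune` column names are assumptions made for the example; check the actual output of `cities_for_sage()` before relying on them:

```python
import pandas as pd

# Hypothetical tabular shape of the SAGE/city mapping: one row per
# (SAGE, city) pair. The real cities_for_sage() output may differ.
sage = pd.DataFrame(
    {
        "code_sage": ["SAGE03001", "SAGE03001", "SAGE03002"],
        "code_commune": ["59350", "59183", "75056"],
    }
)

# Toy stand-in for a hub'eau stations table restricted to a few cities.
stations = pd.DataFrame(
    {
        "code_station": ["S1", "S2", "S3"],
        "code_commune": ["59350", "75056", "62041"],
    }
)

# Plain attribute join on the INSEE city code: each station inherits the
# SAGE covering its city; stations outside any SAGE end up with NaN.
joined = stations.merge(sage, on="code_commune", how="left")
print(joined[["code_station", "code_sage"]])
```

Since the join key is the INSEE city code rather than a geometry, no geospatial computation is needed here, at the cost of the communal-level approximations mentioned above.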
"bugtrack_url": null,
"license": "GPL-3.0-or-later",
"summary": "Hubeau client to collect data from the different APIs",
"version": "0.11.0",
"project_urls": {
"Documentation": "https://tgrandje.github.io/cl-hubeau",
"Homepage": "https://tgrandje.github.io/cl-hubeau",
"Repository": "https://github.com/tgrandje/cl-hubeau/"
},
"split_keywords": [
"france",
" water",
" hydrology"
],
"upload_time": "2025-08-01 11:42:50",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "tgrandje",
"github_project": "cl-hubeau",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "cl-hubeau"
}