datagouv-client

Name	datagouv-client
Version	0.1.1
home_page	https://www.data.gouv.fr
Summary	Wrapper for the data.gouv.fr API
upload_time	2025-06-06 11:04:17
maintainer	None
docs_url	None
author	Etalab
requires_python	None
license	https://spdx.org/licenses/MIT.html#licenseText
keywords	api wrapper datagouv
requirements	No requirements were recorded.

# **datagouv-client**
This package is a Python wrapper for the data.gouv.fr API. It lets you interact with datasets and resources on all three platforms (production, aka `www`, plus `demo` and `dev`). You can install it from PyPI:
```bash
pip install datagouv-client
```
The package requires an environment running Python 3.10 or later (`python>=3.10`).

## Use

### Getting existing datasets and resources
If you only want to retrieve existing objects (i.e. you don't want to modify them on datagouv), a workflow could look like this:
```python
from datagouv import Dataset, Resource

dataset = Dataset("5d13a8b6634f41070a43dff3")  # you can find a dataset's id in the `Informations` tab of its landing page

# you can now access a bunch of info of the dataset
print(dataset.title)
print(dataset.description)
print(dataset.created_at)
print(dataset)  # this displays all the attributes of the dataset as a dict

# and of course its resources, which are all Resource instances
for res in dataset.resources:
    print(res.title)
    print(res.url)  # this is the download URL of the resource
    print(res.id)  # the id of the resource itself
    print(res.dataset_id)  # the id of the dataset the resource belongs to
    print(res)  # this displays all the attributes of the resource as a dict

# if you are only interested in a specific resource
resource = Resource("f868cca6-8da1-4369-a78d-47463f19a9a3")  # you can find a resource's id in its `Métadonnées` tab
print(resource)

# you can also access a dataset from one of its resources
d = resource.dataset()  # NB: this is a method, and returns an instance of Dataset

# you can also download a resource locally (NB: make sure the parent folders already exist)
resource.download("./file.csv")  # this saves the resource in your working directory as "file.csv"

# you can also download some or all of a dataset's resources (NB: make sure the parent folders already exist)
# the files are named `resource_id.format` (for instance f868cca6-8da1-4369-a78d-47463f19a9a3.csv)
d.download_resources(
    folder="data",  # if not specified, saves them into your working directory
    resources_types=["main", "documentation"],  # default is only main resources
)
```
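Because the download methods expect the parent folders to exist already, a small standard-library helper can prepare them first. This is only a sketch and not part of `datagouv-client`: `prepare_download_path` is a hypothetical name, and the file name it builds simply mirrors the `resource_id.format` convention described above.

```python
import tempfile
from pathlib import Path


def prepare_download_path(folder: str, resource_id: str, fmt: str) -> Path:
    """Create `folder` (and any missing parents) and return the expected file path."""
    target_dir = Path(folder)
    target_dir.mkdir(parents=True, exist_ok=True)
    return target_dir / f"{resource_id}.{fmt}"


# Demonstrated in a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp:
    path = prepare_download_path(
        f"{tmp}/data/raw", "f868cca6-8da1-4369-a78d-47463f19a9a3", "csv"
    )
    print(path.parent.is_dir())  # True
    print(path.name)  # f868cca6-8da1-4369-a78d-47463f19a9a3.csv
```

The resulting path could then be passed to `resource.download(str(path))`, or the folder to `download_resources(folder=...)`.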

> NB: If you want to get objects from demo or dev, you must use a client:
```python
from datagouv import Client, Dataset, Resource

dataset = Dataset("5d13a8b6634f41070a43dff3", _client=Client("demo"))
```

### Interacting with objects online
If you want to modify objects on the datagouv platforms, you will need to create an authenticated client:
```python
from datagouv import Client

client = Client(
    environment="www",  # here you can set which platform the client will interact with, default is production
    api_key="MY_SECRET_API_KEY",  # your API key, that grants your rights on the platform
)
```
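Rather than hard-coding the key as in the snippet above, you may prefer to read it from the environment. The sketch below is not part of `datagouv-client`: the helper names and the `DATAGOUV_API_KEY_<ENV>` variable names are assumptions made for illustration. The URL helper only swaps the platform prefix (`www`, `demo` or `dev`) in the admin page address.

```python
import os

ENVIRONMENTS = ("www", "demo", "dev")


def admin_page_url(environment: str = "www") -> str:
    """Return the admin page of `environment`, where the API key is displayed."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    return f"https://{environment}.data.gouv.fr/fr/admin/me/"


def api_key_from_env(environment: str = "www") -> str:
    """Read the key from DATAGOUV_API_KEY_<ENV> (a hypothetical variable name)."""
    variable = f"DATAGOUV_API_KEY_{environment.upper()}"
    key = os.environ.get(variable)
    if not key:
        raise RuntimeError(
            f"set {variable}; the key is shown on {admin_page_url(environment)}"
        )
    return key


print(admin_page_url("demo"))  # https://demo.data.gouv.fr/fr/admin/me/
```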
> NB: you can find your API key on https://www.data.gouv.fr/fr/admin/me/ (don't forget to change the prefix to get the key from the right environment).

Once your client is set up, you can instantiate datasets and resources from it. Of course, **you will only be allowed to modify objects according to your rights** (that is, objects created by you or by an organization you belong to):
```python
dataset = client.dataset("5d13a8b6634f41070a43dff3")
# this is also a Dataset instance, with all the same attributes as above, but since you're authenticated, you have access to new methods

dataset.update({"title": "A brand new title"})  # update the dataset online with the payload you give, and also update the attributes of the object
print(dataset.title)  # -> "A brand new title"
dataset.delete()  # delete the dataset, use with caution!

# you can also modify the extras (`payload` is a placeholder for the extras you want to set or remove)
dataset.update_extras(payload)
dataset.delete_extras(payload)

# the methods are the same for resources
for idx, res in enumerate(dataset.resources):
    res.update({"title": f"Resource n°{idx + 1}"})
    print(res.title)  # -> "Resource n°X"
    # delete every third resource, starting with the first one
    if idx % 3 == 0:
        res.delete()
```

With an authenticated client, you are also allowed to create datasets and resources on the environment you specified:
```python
dataset = client.dataset().create(
    {
        "title": "New dataset",
        "description": "A description is required",
        "organization": "646b7187b50b2a93b1ae3d45",  # the organization that will own the dataset
    },
)  # this creates a dataset with the values you specified, and instantiates a Dataset
dataset.update({"tags": ["environment", "water"]})
```
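Since the API rejects incomplete payloads, it can help to check a payload locally before calling `create`. The helper below is a hypothetical sketch, not a library feature; the example above only states that a description is required, so treating `title` as required too is an assumption.

```python
def missing_fields(payload: dict, required=("title", "description")) -> list[str]:
    """Return the required keys that are absent or empty in `payload`."""
    return [key for key in required if not payload.get(key)]


payload = {"title": "New dataset"}
print(missing_fields(payload))  # ['description']
```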
There are two types of resources on datagouv:
- `static`: a file is uploaded directly to the platform
- `remote`: references the URL of a file that is stored somewhere else on the internet

You have two options to create a resource (of any type):
- from the client itself, by specifying the id of the dataset you want to include it into (you must have the rights on the dataset):
```python
# to create a static resource from a file
resource = client.resource().create_static(
    file_to_upload="path/to/your/file.txt",
    payload={"title": "New static resource"},
    dataset_id="5d13a8b6634f41070a43dff3",
)  # this creates a static resource with the values you specified, and instantiates a Resource

# to create a remote resource from a URL
resource = client.resource().create_remote(
    payload={"url": "http://example.com/file.txt", "title": "New remote resource"},
    dataset_id="5d13a8b6634f41070a43dff3",
)  # this creates a remote resource with the values you specified, and instantiates a Resource
```
- from the dataset you want to include it into (you must have the rights on the dataset), in which case you don't have to specify the `dataset_id`:
```python
dataset = client.dataset("5d13a8b6634f41070a43dff3")
# to create a static resource from a file
resource = dataset.create_static(
    file_to_upload="path/to/your/file.txt",
    payload={"title": "New static resource"},
)  # this creates a static resource with the values you specified, and instantiates a Resource

# to create a remote resource from a URL
resource = dataset.create_remote(
    payload={"url": "http://example.com/file.txt", "title": "New remote resource"},
)  # this creates a remote resource with the values you specified, and instantiates a Resource

# to update the file of a static resource
resource.update({"title": "New title"}, file_to_upload="path/to/your/new_file.txt")
```
> NB: If you are not planning to use an object's attributes, you may prevent the initial API call using `fetch=False`, in order not to unnecessarily ping the API.
```python
dataset = client.dataset("5d13a8b6634f41070a43dff3", fetch=False)
print(dataset.title)  # -> this will fail because the attributes are not set from the initial call
# but you can update the object as usual
dataset.update({"title": "New title"})
print(dataset.title)  # -> "New title"   because the attributes are set from the response
```
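To make the behaviour above concrete, here is a toy model of the pattern. It is emphatically not the `datagouv-client` implementation: `LazyRecord` and its methods are hypothetical, and the "requests" are faked so that the sketch runs offline. It only illustrates why attributes are missing until a response has been loaded.

```python
class LazyRecord:
    """Toy stand-in for an API object (hypothetical, not the library's class)."""

    def __init__(self, record_id: str, fetch: bool = True):
        self.id = record_id
        if fetch:
            self._load(self._fake_get())

    def _fake_get(self) -> dict:
        # Stands in for the initial GET request
        return {"title": "Title from the API"}

    def _load(self, metadata: dict) -> None:
        # Copy every key of a response onto the instance
        for key, value in metadata.items():
            setattr(self, key, value)

    def update(self, payload: dict) -> None:
        # Stands in for the update request, whose response echoes the new state
        self._load(dict(payload))


record = LazyRecord("abc", fetch=False)
print(hasattr(record, "title"))  # False
record.update({"title": "New title"})
print(record.title)  # New title
```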

### Advanced features
Many datagouv endpoints are paginated, which can make it tedious to retrieve all objects. An instance of `Client` has a method to create an iterator from any endpoint that returns paginated data:
```python
for obj in client.get_all_from_api_query(
    "api/1/datasets/?organization=534fff81a3a7292c64a77e5c",  # get all datasets from a specific organization
    mask="data{id,title,resources{id,title}}",  # you can apply a mask to retrieve only specific fields of the objects
):
    print(f"Dataset {obj['title']} has {len(obj['resources'])} resources")
```
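For intuition, iterating over a paginated endpoint usually boils down to following a "next page" link until it runs out. The generator below is a generic sketch, not the code behind `get_all_from_api_query`: `iterate_pages` and `fetch_page` are hypothetical names, and the `data` and `next_page` keys are assumptions about the response layout. The page fetcher is injected so that the example runs without network access.

```python
from typing import Callable, Iterator, Optional


def iterate_pages(first_url: str, fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every object of a paginated endpoint, following `next_page` links."""
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        yield from page["data"]
        url = page.get("next_page")


# Fake two-page API so the sketch runs offline
fake_api = {
    "page-1": {"data": [{"id": "a"}, {"id": "b"}], "next_page": "page-2"},
    "page-2": {"data": [{"id": "c"}], "next_page": None},
}
ids = [obj["id"] for obj in iterate_pages("page-1", fake_api.__getitem__)]
print(ids)  # ['a', 'b', 'c']
```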

## Contribution
Contributions and feedback are welcome! Main guidelines:
- as few API calls as possible (use responses to create/update objects)
- build on the existing code

Remember to format, lint, and sort imports with [Ruff](https://docs.astral.sh/ruff/) before committing (checks will remind you anyway):
```bash
pip install .[dev]
ruff check --fix .
ruff format .
```

## Release
The release process uses [bump'X](https://github.com/datagouv/bumpx).

Raw data

{
    "_id": null,
    "home_page": "https://www.data.gouv.fr",
    "name": "datagouv-client",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "api wrapper datagouv",
    "author": "Etalab",
    "author_email": "opendatateam@data.gouv.fr",
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": "https://spdx.org/licenses/MIT.html#licenseText",
    "summary": "Wrapper for the data.gouv.fr API",
    "version": "0.1.1",
    "project_urls": {
        "Documentation": "https://www.data.gouv.fr/fr/dataservices/api-catalogue-des-donnees-ouvertes-data-gouv-fr/",
        "Homepage": "https://www.data.gouv.fr",
        "Source": "https://github.com/datagouv/datagouv_client"
    },
    "split_keywords": [
        "api",
        "wrapper",
        "datagouv"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "d6b1d030be1aeb816b7546a0aed2145906369e3ba5dd1532c3631a3eb7f5d08b",
                "md5": "573a6fc7422557f73eb6d9c3234e35e5",
                "sha256": "f4681eadbf4f855b750d705099a77729988cb849f1c15b8909b8bd23da1e223b"
            },
            "downloads": -1,
            "filename": "datagouv_client-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "573a6fc7422557f73eb6d9c3234e35e5",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 15400,
            "upload_time": "2025-06-06T11:04:17",
            "upload_time_iso_8601": "2025-06-06T11:04:17.840330Z",
            "url": "https://files.pythonhosted.org/packages/d6/b1/d030be1aeb816b7546a0aed2145906369e3ba5dd1532c3631a3eb7f5d08b/datagouv_client-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-06-06 11:04:17",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "datagouv",
    "github_project": "datagouv_client",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "circle": true,
    "lcname": "datagouv-client"
}
