| Field | Value |
| --- | --- |
| Name | mobisurvstd |
| Version | 0.2.0 |
| home_page | None |
| Summary | A tool to standardize French mobility survey datasets (EMC2, EGT, EMP, etc.) into a unified format. |
| upload_time | 2025-07-18 16:32:46 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.10 |
| license | None |
| keywords | mobility survey, polars, data format |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# MobiSurvStd
*A tool to standardize French mobility survey datasets (EMC², EGT, EMP, etc.) into a unified format.*
📚 [Documentation](https://mobisurvstd.github.io/MobiSurvStd)
📦 [View on PyPI](https://pypi.org/project/mobisurvstd/)
<span style="color:blue">🇫🇷 [Version française](./README.fr.md)</span>
## Table of Contents
- [Introduction](#introduction)
- [Mobility Surveys in France](#mobility-surveys-in-france)
- [Why MobiSurvStd?](#why-mobisurvstd)
- [Installation](#installation)
- [Getting Started](#getting-started)
- [Case Study: Bicycle Use](#case-study-bicycle-use)
- [Supported Surveys](#supported-surveys)
- [Legal Notice](#legal-notice)
- [Contributing](#contributing)
## Introduction
MobiSurvStd (Mobility Survey Standardizer) is an easy-to-use Python library and command-line interface that converts many French mobility surveys
(EMC², EGT, EMP, etc.) to a single, clean, standardized format.
---
## Mobility Surveys in France
In France, despite recent efforts by CEREMA to create a standard format for mobility surveys (the
EMC² surveys), various formats co-exist:
- [**EMC²**](https://www.cerema.fr/fr/activites/mobilites/connaissance-modelisation-evaluation-mobilite/enquetes-mobilite-emc2)
(Enquête mobilité certifiée CEREMA): mobility surveys for many French territories (since 2018);
- [**EGT H2020**](https://omnil.fr/actualites/les-resultats-de-la-derniere-enquete-globale-transport)
(Enquête Globale Transport, Île-de-France Mobilités): mobility survey for Île-de-France (2018–2020;
incomplete due to COVID-19);
- [**EMP**](https://www.statistiques.developpement-durable.gouv.fr/resultats-detailles-de-lenquete-mobilite-des-personnes-de-2019)
(Enquête mobilité des personnes, SDES): national mobility survey (2019);
- [**EMG**](https://www.institutparisregion.fr/mobilite-et-transports/deplacements/enquete-regionale-sur-la-mobilite-des-franciliens/)
(Enquête Mobilité par GPS, Institut Paris Région): mobility survey for Île-de-France (2022–2023)
with individuals tracked over several days through GPS.
Surveys based on earlier formats are also still in use today:
- **EDVM** (Enquêtes Déplacements Villes Moyennes, CEREMA): mobility surveys for medium-sized cities
(until 2018);
- **EDGT** (Enquêtes Déplacements Grands Territoires, CEREMA): mobility surveys for peripheral areas
(until 2018);
- [**EGT 2010**](https://omnil.fr/egt-2010) (Enquête Globale Transport, Île-de-France Mobilités):
previous version of the Île-de-France mobility survey;
- [**ENTD**](https://www.statistiques.developpement-durable.gouv.fr/enquete-nationale-transports-et-deplacements-entd-2008)
(Enquête nationale transports et déplacements, SDES): former national mobility survey (2008).
---
## Why MobiSurvStd?
The existing formats all have the same drawbacks:
- Data are stored in CSV files that are not always straightforward to read (Which separator?
Which encoding? What are the variable datatypes?).
- Variable names and modalities are not always clear (e.g., in the EMC² format, variable "P2"
represents the gender of the person, with modality 1 for a man and 2 for a woman).
- Joining two datasets is hard and not well documented (e.g., in the EMC² format, to join the
persons with their household, the variables to use are "METH", "ZFM" and "ECH" for the households
and "DMET", "ZFD" and "ECH" for the persons).
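To make these drawbacks concrete, here is a minimal sketch of what working with the raw EMC² layout can look like. It uses only the Python standard library and toy data: the values, the `;` separator, and the household column `M6` are assumptions made for illustration, and the helper functions are hypothetical (they are not part of MobiSurvStd).

```python
import csv
import io

# Toy extracts in the raw EMC² layout. The values are made up, and "M6" is a
# hypothetical household column used only to show that the join worked.
HOUSEHOLDS_CSV = "METH;ZFM;ECH;M6\n1;101;1;2\n1;102;7;0\n"
PERSONS_CSV = "DMET;ZFD;ECH;P2\n1;101;1;1\n1;101;1;2\n1;102;7;2\n"

# Modalities of variable P2, as described in the list above.
GENDER_LABELS = {"1": "man", "2": "woman"}


def read_rows(text, delimiter=";"):
    """Parse CSV text into a list of dicts (every value stays a string)."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def join_persons_to_households(persons, households):
    """Attach household columns to each person using the three-part key."""
    by_key = {(h["METH"], h["ZFM"], h["ECH"]): h for h in households}
    joined = []
    for person in persons:
        household = by_key[(person["DMET"], person["ZFD"], person["ECH"])]
        joined.append({**household, **person, "gender": GENDER_LABELS[person["P2"]]})
    return joined


rows = join_persons_to_households(read_rows(PERSONS_CSV), read_rows(HOUSEHOLDS_CSV))
for row in rows:
    print(row["ZFD"], row["ECH"], row["gender"], row["M6"])
```

Every step here (separator, key columns, code tables) has to be looked up in the survey documentation, and has to be redone for each format.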
Additionally, when working with different territories or periods, it is often necessary to write
similar code several times because the formats differ.
MobiSurvStd addresses these issues by converting the supported survey formats to a [well-defined
Parquet format](https://mobisurvstd.github.io/MobiSurvStd/format/index.html).
See [this example](https://mobisurvstd.github.io/MobiSurvStd/problem-example.html) to understand how
MobiSurvStd can simplify your workflow.
---
## Installation
Install the library with pip:
```bash
pip install mobisurvstd
```
---
## Getting Started
Download the
[EMP 2019 survey](https://www.statistiques.developpement-durable.gouv.fr/resultats-detailles-de-lenquete-mobilite-des-personnes-de-2019)
_(Données individuelles anonymisées (fichiers au format CSV))_, then run the following command:
```bash
python -m mobisurvstd emp_2019_donnees_individuelles_anonymisees_novembre2024.zip standardized_emp2019 --survey-type emp2019
```
It will standardize the EMP 2019 survey and save the resulting Parquet files in the
`standardized_emp2019` directory.
These Parquet files can then be analyzed with, e.g., [polars](https://pola.rs/) or
[pandas](https://pandas.pydata.org) in Python, or [arrow](https://arrow.apache.org/docs/r/) in R.
A detailed definition of the Parquet format used by MobiSurvStd is available
[here](https://mobisurvstd.github.io/MobiSurvStd/format/index.html).
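As a starting point for exploring the output, the sketch below loads every Parquet file found in an output directory with polars. It assumes the Parquet files sit directly at the top level of the directory and deliberately does not assume any file or column names; refer to the format definition linked above for the actual layout. Both helper functions are hypothetical, not part of MobiSurvStd.

```python
from pathlib import Path


def list_tables(directory):
    """Map each Parquet file's stem to its path (top level of `directory` only)."""
    return {path.stem: path for path in sorted(Path(directory).glob("*.parquet"))}


def load_tables(directory):
    """Read every Parquet file found by `list_tables` into a polars DataFrame."""
    # polars is a third-party package; it is imported here so that
    # `list_tables` can be used without it.
    import polars as pl

    return {name: pl.read_parquet(path) for name, path in list_tables(directory).items()}
```

For example, `load_tables("standardized_emp2019")` would return a dictionary of DataFrames, and printing each DataFrame's `shape` and `schema` gives a quick overview of what was produced.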
You can also standardize surveys programmatically in Python with the `standardize` function:
```python
import mobisurvstd

mobisurvstd.standardize(
    "emp_2019_donnees_individuelles_anonymisees_novembre2024.zip",
    "standardized_emp2019",
    survey_type="emp2019",
)
```
For more, check the [User Guide](https://mobisurvstd.github.io/MobiSurvStd/howto.html).
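When several surveys need to be standardized, the `standardize` call shown above can be wrapped in a loop. The following is a minimal sketch under stated assumptions: `plan_jobs` and `run` are hypothetical helpers, the `standardized_<archive name>` directory naming is an arbitrary choice, and `standardize` is called only with the arguments shown in the example above.

```python
from pathlib import Path

# (source archive, survey type) pairs. Only the EMP 2019 entry comes from this
# README; add your own archives here.
JOBS = [
    ("emp_2019_donnees_individuelles_anonymisees_novembre2024.zip", "emp2019"),
]


def plan_jobs(jobs, output_root):
    """Return (source, output_dir, survey_type) triples, one output directory per source."""
    root = Path(output_root)
    return [
        (source, str(root / f"standardized_{Path(source).stem}"), survey_type)
        for source, survey_type in jobs
    ]


def run(jobs, output_root="standardized"):
    """Standardize every job in turn."""
    # Imported here so that `plan_jobs` can be checked without the package installed.
    import mobisurvstd

    for source, output_dir, survey_type in plan_jobs(jobs, output_root):
        mobisurvstd.standardize(source, output_dir, survey_type=survey_type)
```

Calling `run(JOBS)` would then write each standardized survey to its own subdirectory of `standardized`.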
---
## Case Study: Bicycle Use
The following graph represents the share of bicycle trips for EMC², EDGT, and EGT surveys.
The circle colors represent the average number of bicycles in the surveyed households.
The circle sizes represent the expected number of trips in the surveyed area.
The graph has been generated from the code in
[analyses/bicycle_shares.py](analyses/bicycle_shares.py).

The following map represents the share of bicycle trips within INSEE municipalities.
Only municipalities with more than 30 surveyed trips are shown.
The map has been generated from the code in
[analyses/bicycle_shares_by_insee.py](analyses/bicycle_shares_by_insee.py).
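The kind of computation behind such a map can be sketched as follows. This is not the code from `analyses/bicycle_shares_by_insee.py`: the record layout, the mode label `"bicycle"`, and the use of sample weights are assumptions made for illustration, and the trip records are made up.

```python
from collections import defaultdict

# Hypothetical trip records: (INSEE municipality code, main mode, sample weight).
TRIPS = (
    [("75056", "bicycle", 1.0)] * 10
    + [("75056", "car", 1.0)] * 30
    + [("35238", "bicycle", 2.0)] * 5
)


def bicycle_shares(trips, min_trips=30):
    """Weighted share of bicycle trips per municipality.

    Only municipalities with more than `min_trips` surveyed trips are kept.
    """
    counts = defaultdict(int)
    total_weight = defaultdict(float)
    bicycle_weight = defaultdict(float)
    for code, mode, weight in trips:
        counts[code] += 1
        total_weight[code] += weight
        if mode == "bicycle":
            bicycle_weight[code] += weight
    return {
        code: bicycle_weight[code] / total_weight[code]
        for code in counts
        if counts[code] > min_trips and total_weight[code] > 0
    }


shares = bicycle_shares(TRIPS)
print(shares)
```

In this toy data the second municipality has only 5 surveyed trips, so it is filtered out by the 30-trip threshold.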

---
## Supported Surveys
Currently, MobiSurvStd supports the following survey types:
- `emp2019`
- `emc2`
- `egt2020`
- `egt2010`
- `edgt`
See [Survey Types](https://mobisurvstd.github.io/MobiSurvStd/surveys.html) for more.
Support for the following survey types is planned: `edvm`, `entd`, `emg`.
If you know of another survey format that could be integrated, feel free to open an issue on GitHub.
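In a script that processes user-supplied survey types, it can help to check the type before starting a long conversion. The helper below is a hypothetical sketch, not part of MobiSurvStd: the two sets are hard-coded from the lists above and will need updating as support evolves.

```python
# Survey types listed above as supported and as planned (hard-coded from this README).
SUPPORTED_SURVEY_TYPES = frozenset({"emp2019", "emc2", "egt2020", "egt2010", "edgt"})
PLANNED_SURVEY_TYPES = frozenset({"edvm", "entd", "emg"})


def check_survey_type(survey_type):
    """Return the normalized survey type, or raise if it cannot be used yet."""
    name = survey_type.strip().lower()
    if name in SUPPORTED_SURVEY_TYPES:
        return name
    if name in PLANNED_SURVEY_TYPES:
        raise NotImplementedError(f"Survey type {name!r} is planned but not supported yet.")
    raise ValueError(
        f"Unknown survey type {name!r}; expected one of {sorted(SUPPORTED_SURVEY_TYPES)}."
    )
```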
MobiSurvStd covers only French mobility survey formats.
If other countries have similar survey formats, they might be easily integrated into MobiSurvStd (or
a variant of it).
---
## Legal Notice
<span style="color:red">
⚠️ <strong>MobiSurvStd does not anonymize the data.</strong>
If you are working with confidential datasets (e.g., EMC² or EGT surveys), you must apply the same
confidentiality rules to the standardized data as to the original data.
In particular, <strong>you must not share the standardized data</strong> if your confidentiality
agreement prohibits sharing the original data.
</span>
---
## Contributing
If you think you have found a bug, have a suggestion, or want to integrate a new format,
feel free to open an issue on GitHub or to create a pull request.
Raw data
{
"_id": null,
"home_page": null,
"name": "mobisurvstd",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "mobility survey, polars, data format",
"author": null,
"author_email": "Lucas Javaudin <lucas@lucasjavaudin.com>",
"download_url": "https://files.pythonhosted.org/packages/cb/06/575a0fc694ddc7876898711e6a1908c22154e3f867f5d5c8b54fc46d721f/mobisurvstd-0.2.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A tool to standardize French mobility survey datasets (EMC2, EGT, EMP, etc.) into a unified format.",
"version": "0.2.0",
"project_urls": {
"Documentation": "https://mobisurvstd.github.io/MobiSurvStd/index.html",
"Homepage": "https://github.com/MobiSurvStd/MobiSurvStd",
"Issues": "https://github.com/MobiSurvStd/MobiSurvStd/issues",
"Repository": "https://github.com/MobiSurvStd/MobiSurvStd"
},
"split_keywords": [
"mobility survey",
" polars",
" data format"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "b709f4cc727c5102eaac77b1d22f807e3db7b3f9147178b99b12c252c02c450c",
"md5": "efbdbcc1b0aa56138433f89ad34f8b39",
"sha256": "45b423a1077cf8d109cc8304e759c087d4395b663b21a1e19cafeb7fc3cf698b"
},
"downloads": -1,
"filename": "mobisurvstd-0.2.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "efbdbcc1b0aa56138433f89ad34f8b39",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 154266,
"upload_time": "2025-07-18T16:32:43",
"upload_time_iso_8601": "2025-07-18T16:32:43.936967Z",
"url": "https://files.pythonhosted.org/packages/b7/09/f4cc727c5102eaac77b1d22f807e3db7b3f9147178b99b12c252c02c450c/mobisurvstd-0.2.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "cb06575a0fc694ddc7876898711e6a1908c22154e3f867f5d5c8b54fc46d721f",
"md5": "0723e788b34c2ed3bcd6a34f23c9897d",
"sha256": "b1f114f73184ea54cc08f3c9ec9cea05d38e6277650828ca738cf88f24e90076"
},
"downloads": -1,
"filename": "mobisurvstd-0.2.0.tar.gz",
"has_sig": false,
"md5_digest": "0723e788b34c2ed3bcd6a34f23c9897d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 129639,
"upload_time": "2025-07-18T16:32:46",
"upload_time_iso_8601": "2025-07-18T16:32:46.258101Z",
"url": "https://files.pythonhosted.org/packages/cb/06/575a0fc694ddc7876898711e6a1908c22154e3f867f5d5c8b54fc46d721f/mobisurvstd-0.2.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-18 16:32:46",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "MobiSurvStd",
"github_project": "MobiSurvStd",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "mobisurvstd"
}