dp-mobility-report


:Name: dp-mobility-report
:Version: 0.2.11
:Home page: https://github.com/FreeMoveProject/dp_mobility_report
:Summary: Create a report for mobility data with differential privacy guarantees.
:Upload time: 2024-03-04 15:22:00
:Author: Alexandra Kapp
:Requires Python: >=3.8
:License: MIT license
:Keywords: dp_mobility_report

============================================================
Differentially Private Mobility Report (DpMobilityReport)
============================================================


.. image:: https://img.shields.io/pypi/v/dp_mobility_report.svg
        :target: https://pypi.python.org/pypi/dp_mobility_report

        
.. image:: https://readthedocs.org/projects/dp-mobility-report/badge/?version=latest
        :target: https://dp-mobility-report.readthedocs.io/en/latest/?version=latest
        :alt: Documentation Status




* Free software: MIT license
* Documentation: https://dp-mobility-report.readthedocs.io.


``dp_mobility_report``: A Python package to create a mobility report with differential privacy (DP) guarantees, especially for urban human mobility data.


Quickstart 
**************

Install
==========

.. code-block:: bash

        pip install dp-mobility-report

or from GitHub:

.. code-block:: bash

        pip install git+https://github.com/FreeMoveProject/dp_mobility_report


Data preparation
====================

**df**: 

* A pandas ``DataFrame``. 
* Expected columns: user ID ``uid``, trip ID ``tid``, timestamp ``datetime``, and latitude ``lat`` and longitude ``lng`` in CRS EPSG:4326. ``datetime`` is expected to be a datetime-like string, e.g., in the format ``yyyy-mm-dd hh:mm:ss``. If ``datetime`` contains ``int`` values instead, they are interpreted as sequence positions, which suits datasets that consist only of sequences without timestamps. (The format closely follows that of the `scikit-mobility`_ ``TrajDataFrame``.)
* Here you can find an `example dataset`_.
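
Before building a report, it can help to confirm that a CSV file carries the expected columns. The following is a minimal sketch using only the standard library; the helper name ``missing_columns`` and the inline sample rows are illustrative assumptions and not part of ``dp_mobility_report``.

.. code-block:: python

        import csv
        import io

        # Column names listed under "Expected columns" above.
        REQUIRED_COLUMNS = ("uid", "tid", "datetime", "lat", "lng")


        def missing_columns(csv_text):
            """Return the required column names that are absent from the CSV header."""
            reader = csv.reader(io.StringIO(csv_text))
            header = next(reader, [])
            return [col for col in REQUIRED_COLUMNS if col not in header]


        # Made-up sample rows, for illustration only.
        sample = (
            "uid,tid,datetime,lat,lng\n"
            "1,10,2020-12-13 13:00:00,52.52,13.40\n"
            "1,10,2020-12-13 13:15:00,52.53,13.41\n"
        )
        print(missing_columns(sample))
        print(missing_columns("uid,tid,lat,lng\n1,10,52.52,13.40\n"))

An empty list means all expected columns are present; the second call reports that ``datetime`` is missing.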

**tessellation**: 

* A geopandas ``GeoDataFrame`` with polygons. 
* Expected columns: ``tile_id``. 
* The tessellation is used for spatial aggregations of the data. 
* Here you can find an `example tessellation`_. 
* If you don't have a tessellation, you can use this code to `create a tessellation`_.
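
As a rough idea of what a simple tessellation can look like, the sketch below builds a regular grid of rectangular tiles as a GeoJSON ``FeatureCollection``, each tile carrying a ``tile_id``. It is not the linked script: the function name ``grid_tessellation``, the bounding box, and the choice of string IDs are assumptions made for illustration.

.. code-block:: python

        import json


        def grid_tessellation(min_lng, min_lat, max_lng, max_lat, n_cols, n_rows):
            """Build a GeoJSON FeatureCollection of rectangular tiles with a tile_id."""
            d_lng = (max_lng - min_lng) / n_cols
            d_lat = (max_lat - min_lat) / n_rows
            features = []
            for row in range(n_rows):
                for col in range(n_cols):
                    west = min_lng + col * d_lng
                    south = min_lat + row * d_lat
                    east, north = west + d_lng, south + d_lat
                    # GeoJSON rings are closed: the first position is repeated at the end.
                    ring = [
                        [west, south],
                        [east, south],
                        [east, north],
                        [west, north],
                        [west, south],
                    ]
                    features.append(
                        {
                            "type": "Feature",
                            # String IDs are an assumption of this sketch.
                            "properties": {"tile_id": str(row * n_cols + col)},
                            "geometry": {"type": "Polygon", "coordinates": [ring]},
                        }
                    )
            return {"type": "FeatureCollection", "features": features}


        # Illustrative bounding box; adjust it to the extent of your data.
        tiles = grid_tessellation(13.30, 52.45, 13.50, 52.55, n_cols=4, n_rows=2)
        geojson_text = json.dumps(tiles)
        print(len(tiles["features"]))

Written to a file, ``geojson_text`` could be loaded with ``geopandas.read_file``. GeoJSON coordinates are longitude/latitude in WGS 84, which corresponds to EPSG:4326.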


Create a DpMobilityReport
===================================

.. code-block:: python

        import pandas as pd
        import geopandas as gpd
        from dp_mobility_report import DpMobilityReport

        df = pd.read_csv(
            "https://raw.githubusercontent.com/FreeMoveProject/dp_mobility_report/main/tests/test_files/test_data.csv"
        )
        tessellation = gpd.read_file(
            "https://raw.githubusercontent.com/FreeMoveProject/dp_mobility_report/main/tests/test_files/test_tessellation.geojson"
        )

        report = DpMobilityReport(df, tessellation, privacy_budget=10, max_trips_per_user=5)

        report.to_file("my_mobility_report.html")


The parameter ``privacy_budget`` (in terms of *epsilon*-DP) determines how much noise is added to the data. The budget is split between all analyses of the report.
If the value is set to ``None``, no noise is added, i.e., the report carries no privacy guarantee.
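
To build intuition for why a smaller budget means more noise, the sketch below splits a total budget evenly across a few analyses and perturbs a count with Laplace noise of scale ``sensitivity / epsilon``. The even split, the analysis names, the sensitivity value, and the use of the Laplace mechanism are assumptions made for illustration; they do not describe the package's internal implementation.

.. code-block:: python

        import random


        def split_budget(total_epsilon, analyses):
            """Assign each analysis an equal share of the total privacy budget."""
            share = total_epsilon / len(analyses)
            return {name: share for name in analyses}


        def laplace_noise(scale, rng):
            """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
            return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


        def noisy_count(true_count, epsilon, sensitivity, rng):
            """Laplace mechanism: add noise with scale = sensitivity / epsilon."""
            return true_count + laplace_noise(sensitivity / epsilon, rng)


        rng = random.Random(42)
        # Hypothetical analysis names, used only for this sketch.
        budgets = split_budget(10, ["analysis_a", "analysis_b", "analysis_c", "analysis_d"])
        print(budgets)
        print(noisy_count(1000, budgets["analysis_a"], sensitivity=5, rng=rng))

Because the per-analysis shares add up to the total, more analyses leave less budget, and hence more noise, for each of them.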

The parameter ``max_trips_per_user`` specifies the maximum number of trips a single user can contribute to the dataset. If a user is represented with more trips, a random sample of ``max_trips_per_user`` trips is drawn for that user.
If the value is set to ``None``, the full dataset is used. Note that deriving the maximum number of trips per user from the data violates the differential privacy guarantee. Thus, ``None`` should only be used in combination with ``privacy_budget=None``.
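
The sampling step described above can be pictured as follows. This is a minimal sketch on plain dictionaries, not the package's code; the function name ``sample_trips`` and the record layout are assumptions.

.. code-block:: python

        import random
        from collections import defaultdict


        def sample_trips(records, max_trips_per_user, seed=None):
            """Keep at most max_trips_per_user randomly chosen trips per user."""
            rng = random.Random(seed)
            trips_by_user = defaultdict(set)
            for rec in records:
                trips_by_user[rec["uid"]].add(rec["tid"])
            kept = set()
            for uid, tids in trips_by_user.items():
                tids = sorted(tids)  # fixed order so a given seed is reproducible
                if len(tids) > max_trips_per_user:
                    tids = rng.sample(tids, max_trips_per_user)
                kept.update((uid, tid) for tid in tids)
            # All records of a kept trip are retained.
            return [rec for rec in records if (rec["uid"], rec["tid"]) in kept]


        # Made-up records: user 1 has five trips, user 2 has one.
        records = [{"uid": 1, "tid": t} for t in range(5)] + [{"uid": 2, "tid": 100}]
        sampled = sample_trips(records, max_trips_per_user=2, seed=1)
        print(len(sampled))

Users at or below the limit are left untouched; only users above it lose trips.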

Please refer to the `documentation`_ for information on further parameters. Here you can find information on the `analyses`_ of the report.

Example HTML reports can be found in the examples_ folder.


Create a BenchmarkReport 
================================

A benchmark report evaluates the similarity of two (differentially private) mobility reports from one or two mobility datasets. This can be based on two datasets (``df_base`` and ``df_alternative``) or on one dataset (``df_base``) with different privacy settings.
The arguments ``df``, ``privacy_budget``, ``user_privacy``, ``max_trips_per_user`` and ``budget_split`` can differ between the two reports; they are set by appending the suffix ``_base`` or ``_alternative`` to the argument name (e.g., ``privacy_budget_base``). All other arguments apply to both reports.
For the evaluation, `similarity measures`_ (namely the (mean) absolute percentage error (PE), the Jensen-Shannon divergence (JSD), the Kullback-Leibler divergence (KLD), and the earth mover's distance (EMD)) are computed to quantify the statistical similarity for each analysis.
The evaluation, i.e., the benchmark report, is generated as an HTML file with the ``.to_file()`` method.
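
For readers unfamiliar with these measures, the sketch below computes KLD, JSD, and a one-dimensional EMD for two small histograms using textbook definitions. It is an illustration only: the package may normalize, bin, or handle zero counts differently, and the function names here are assumptions.

.. code-block:: python

        import math


        def normalize(counts):
            """Turn non-negative counts into a probability distribution."""
            total = sum(counts)
            return [c / total for c in counts]


        def kld(p, q):
            """Kullback-Leibler divergence D(p || q) in bits; infinite if q lacks support."""
            total = 0.0
            for pi, qi in zip(p, q):
                if pi == 0:
                    continue
                if qi == 0:
                    return math.inf
                total += pi * math.log2(pi / qi)
            return total


        def jsd(p, q):
            """Jensen-Shannon divergence in bits; lies between 0 and 1."""
            m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
            return 0.5 * kld(p, m) + 0.5 * kld(q, m)


        def emd_1d(p, q):
            """Earth mover's distance for histograms on equally spaced bins (unit spacing)."""
            distance, cumulative = 0.0, 0.0
            for pi, qi in zip(p, q):
                cumulative += pi - qi
                distance += abs(cumulative)
            return distance


        # Made-up histograms for a base and an alternative report.
        base = normalize([40, 30, 20, 10])
        alternative = normalize([35, 30, 25, 10])
        print(kld(base, alternative), jsd(base, alternative), emd_1d(base, alternative))

Unlike KLD, JSD is symmetric and stays finite when one histogram has empty bins, while EMD additionally takes the distance between bins into account.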


Benchmark of two different datasets 
---------------------------------------

This example creates a benchmark report with similarity measures for two mobility datasets, called *base* and *alternative* in the following. This is intended to compare different datasets with the same or no privacy budget.

.. code-block:: python

        import pandas as pd
        import geopandas as gpd
        from dp_mobility_report import BenchmarkReport

        # -- insert paths --
        df_base = pd.read_csv("mobility_dataset_base.csv")
        df_alternative = pd.read_csv("mobility_dataset_alternative.csv")
        tessellation = gpd.read_file("tessellation.gpkg")

        benchmark_report = BenchmarkReport(
            df_base=df_base, tessellation=tessellation, df_alternative=df_alternative
        )

        # Dictionary containing the similarity measures for each analysis
        similarity_measures = benchmark_report.similarity_measures
        # The measure selection indicates which similarity measure
        # (e.g. KLD, JSD, EMD, PE) has been selected for each analysis
        measure_selection = benchmark_report.measure_selection

        # If you do not want to access the selection of similarity measures
        # but e.g. the Jensen-Shannon divergence for all analyses:
        jsd = benchmark_report.jsd

        # benchmark_report.to_file("my_benchmark_mobility_report.html")


The parameter ``measure_selection`` specifies which similarity measure is chosen per analysis for the ``similarity_measures`` dictionary, an attribute of the ``BenchmarkReport``.
By default, a specific similarity measure is set for each analysis; the defaults can be accessed with ``dp_mobility_report.default_measure_selection()``.
The default for individual analyses can be overridden as follows:

.. code-block:: python

        from dp_mobility_report import BenchmarkReport, default_measure_selection
        from dp_mobility_report import constants as const

        # print the default measure selection
        print(default_measure_selection())

        # change the default for visits_per_tile from EMD to JSD.
        # For the other analyses, the default measure is retained
        custom_measure_selection = {const.VISITS_PER_TILE: const.JSD}

        benchmark_report = BenchmarkReport(
            df_base=df_base,
            tessellation=tessellation,
            df_alternative=df_alternative,
            measure_selection=custom_measure_selection,
        )
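
Conceptually, a custom selection only replaces the entries it names and leaves the rest at their defaults. The sketch below mimics that behavior with plain dictionaries. Apart from ``visits_per_tile`` defaulting to EMD, which the comment above mentions, the keys and values are hypothetical placeholders and not the package's actual constants or defaults.

.. code-block:: python

        # Hypothetical defaults; only the visits_per_tile entry is taken from the text above.
        defaults = {
            "visits_per_tile": "emd",
            "analysis_x": "jsd",
            "analysis_y": "kld",
        }
        custom = {"visits_per_tile": "jsd"}

        # Later entries win, so custom choices override defaults key by key.
        effective = {**defaults, **custom}
        print(effective)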



Benchmark of the same dataset with different privacy settings
-------------------------------------------------------------------

This example creates a BenchmarkReport with similarity measures for the same mobility dataset with different privacy settings (``privacy_budget``, ``user_privacy``, ``max_trips_per_user`` and ``budget_split``) to assess the utility loss of the privacy budget for the different analyses. 

.. code-block:: python

        import pandas as pd
        import geopandas as gpd
        from dp_mobility_report import BenchmarkReport

        # -- insert paths --
        df_base = pd.read_csv("mobility_dataset_base.csv")
        tessellation = gpd.read_file("tessellation.gpkg")

        benchmark_report = BenchmarkReport(
            df_base=df_base,
            tessellation=tessellation,
            privacy_budget_base=None,
            privacy_budget_alternative=5,
            max_trips_per_user_base=None,
            max_trips_per_user_alternative=4,
        )

        similarity_measures = benchmark_report.similarity_measures

        # benchmark_report.to_file("my_benchmark_mobility_report.html")
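
The kind of utility loss such a comparison surfaces can be previewed with a toy experiment: perturb a set of counts at several budgets and compare the resulting percentage error. This is a sketch under stated assumptions (Laplace noise, sensitivity 1, made-up counts) and does not reproduce what ``BenchmarkReport`` computes.

.. code-block:: python

        import random


        def laplace_noise(scale, rng):
            """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
            return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


        def mean_abs_percentage_error(true_counts, epsilon, rng, sensitivity=1.0):
            """Average relative deviation (in percent) of noisy counts from true counts."""
            errors = []
            for count in true_counts:
                noisy = count + laplace_noise(sensitivity / epsilon, rng)
                errors.append(abs(noisy - count) / count)
            return 100 * sum(errors) / len(errors)


        rng = random.Random(0)
        true_counts = [120, 80, 45, 300, 15] * 200  # made-up counts
        errors_by_epsilon = {
            epsilon: mean_abs_percentage_error(true_counts, epsilon, rng)
            for epsilon in (0.1, 1, 5, 10)
        }
        for epsilon, error in errors_by_epsilon.items():
            print(epsilon, round(error, 2))

The error shrinks as the budget grows, which is the trade-off a benchmark report quantifies per analysis.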



Please refer to the `documentation`_ for information on further parameters.


Examples
*********

Berlin mobility data simulated using the `DLR TAPAS`_ Model: [`Code used for Berlin`_]

* `Report of Berlin without DP`_
* `Report of Berlin with DP epsilon=1`_

Madrid `CRTM survey`_ data: [`Code used for Madrid`_]

* `Report of Madrid without DP`_
* `Report of Madrid with DP epsilon=10`_

Beijing `Geolife`_ dataset: [`Code used for Beijing`_]

* `Report of Beijing without DP`_
* `Report of Beijing with DP epsilon=50`_

Benchmark Report: [`Code used for Benchmarkreport of Berlin`_]

* `Benchmarkreport of Berlin without DP and with DP epsilon=1`_

(The `code of the data preprocessing`_ shows how to obtain the required format.)

Citing
******
If you use dp-mobility-report, please cite the `following paper`_:

.. code-block::

        @article{doi:10.1080/17489725.2022.2148008,
                        author = {Alexandra Kapp and Saskia Nuñez von Voigt and Helena Mihaljević and Florian Tschorsch},
                        title = {Towards mobility reports with user-level privacy},
                        journal = {Journal of Location Based Services},
                        volume = {17},
                        number = {2},
                        pages = {95-121},
                        year  = {2023},
                        publisher = {Taylor & Francis},
                        doi = {10.1080/17489725.2022.2148008}
        }


Credits
*******

This package was highly inspired by the `pandas-profiling/pandas-profiling`_ and `scikit-mobility`_ packages.

This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.


This package was developed as part of the FreeMove project, which is funded by:

.. image:: https://www.freemove.space/assets/images/bmbf-logo.svg

 
.. _`example dataset`: https://github.com/FreeMoveProject/dp_mobility_report/blob/main/tests/test_files/test_data.csv
.. _`example tessellation`: https://github.com/FreeMoveProject/dp_mobility_report/blob/main/tests/test_files/test_tessellation.geojson
.. _`create a tessellation`:  https://github.com/FreeMoveProject/dp_mobility_report/blob/main/examples/create_tessellation.py
.. _documentation: https://dp-mobility-report.readthedocs.io/en/latest/modules.html
.. _analyses: https://dp-mobility-report.readthedocs.io/en/latest/analyses.html
.. _`similarity measures`: https://dp-mobility-report.readthedocs.io/en/latest/similarity_measures.html
.. _`DLR TAPAS`: https://github.com/DLR-VF/TAPAS
.. _`Report of Berlin without DP`: https://freemoveproject.github.io/dp_mobility_report/examples/html/berlin_noPrivacy.html
.. _`Report of Berlin with DP epsilon=1`: https://freemoveproject.github.io/dp_mobility_report/examples/html/berlin.html
.. _`Code used for Berlin`: https://github.com/FreeMoveProject/dp_mobility_report/blob/main/examples/example_berlin.py
.. _`CRTM survey`: https://crtm.maps.arcgis.com/apps/MinimalGallery/index.html?appid=a60bb2f0142b440eadee1a69a11693fc
.. _`Report of Madrid without DP`: https://freemoveproject.github.io/dp_mobility_report/examples/html/madrid_noPrivacy.html
.. _`Report of Madrid with DP epsilon=10`: https://freemoveproject.github.io/dp_mobility_report/examples/html/madrid.html
.. _`Code used for Madrid`: https://github.com/FreeMoveProject/dp_mobility_report/blob/main/examples/example_madrid.py
.. _`Geolife`: https://www.microsoft.com/en-us/download/details.aspx?id=52367
.. _`Report of Beijing without DP`: https://freemoveproject.github.io/dp_mobility_report/examples/html/geolife_noPrivacy.html
.. _`Report of Beijing with DP epsilon=50`: https://freemoveproject.github.io/dp_mobility_report/examples/html/geolife.html
.. _`Code used for Beijing`: https://github.com/FreeMoveProject/dp_mobility_report/blob/main/examples/example_geolife.py
.. _`Benchmarkreport of Berlin without DP and with DP epsilon=1`: https://freemoveproject.github.io/dp_mobility_report/examples/html/berlin_benchmark.html
.. _`Code used for Benchmarkreport of Berlin`: https://github.com/FreeMoveProject/dp_mobility_report/blob/main/examples/example_benchmark.py
.. _`code of the data preprocessing`: https://github.com/FreeMoveProject/evaluation_dp_mobility_report/blob/main/01_preprocess_evaluation_data.py
.. _`following paper`: https://www.tandfonline.com/doi/full/10.1080/17489725.2022.2148008
.. _`pandas-profiling/pandas-profiling`: https://github.com/pandas-profiling/pandas-profiling
.. _`scikit-mobility`: https://github.com/scikit-mobility
.. _Cookiecutter: https://github.com/audreyr/cookiecutter
.. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage


History
*********
0.2.11 (2024-03-04)
===================
* Fix copy-paste errors in validation of preprocessing
* Fix bug: get_trips_over_time can never reach "month" condition

0.2.10 (2024-01-03)
===================
* Fix to work with pandas 2.2.0rc0 update

0.2.9 (2023-08-17)
==================
* Fix to work with pandas 2.1.0rc0 update

0.2.8 (2023-04-03)
==================
* Bug fix: smape of trips per day

0.2.7 (2023-03-30)
==================
* Update requirements

0.2.6 (2023-03-24)
==================
* Bug fix: shape mismatch in similarity_measures for edge case (only counts in bin "inf")

0.2.5 (2023-03-24)
==================
* Bug fix: compatibility with pandas >= 2.0 and pandas < 2.0

0.2.4 (2023-03-23)
==================
* Enhance HTML design 
* Include info texts for all analyses
* Include documentation for differential privacy and an info box about DP in the report
* Enhance documentation
* Add option for ``subtitle`` in DpMobilityReport and BenchmarkReport to name the report.

0.2.3 (2023-02-13)
==================
* Bug fix: handle if no visit is within the tessellation
* Bug fix: handle if no OD trip is within the tessellation
* Bug fix: unify histogram bins rounding issue

0.2.2 (2023-02-01)
==================
* Bug fix: exclude user_time_delta if there is no user with at least two trips.
* Bug fix: set max_trips_per_user correctly if user_privacy=False.
* Enhancement: do not exclude jump_length and travel_time if no tessellation is given

0.2.1 (2023-01-24)
==================
* Bug fix: Correct range of scale for visits per time and tile map. 

0.2.0 (2023-01-23)
==================
* Create a BenchmarkReport class that evaluates the similarity of two (differentially private) mobility reports from one or two mobility datasets and creates an HTML output similar to the DpMobilityReport.

0.1.8 (2023-01-16)
==================
* Refine handling of OD Analysis input data:
    * warn if there are no trips with more than a single record and exclude OD Analysis
    * use all trips for travel time and jump length computation instead of only trips inside tessellation.

0.1.7 (2023-01-10)
==================
* Restructuring of HTML headlines.

0.1.6 (2023-01-09)
==================
* Refactoring of template files.

0.1.5 (2022-12-12)
==================
* Remove scikit-mobility dependency and refactor od flow visualization.

0.1.4 (2022-12-07)
==================
* Remove Google Fonts from HTML.

0.1.3 (2022-12-05)
==================
* Handle FutureWarning of pandas.

0.1.2 (2022-11-24)
==================
* Enhanced documentation for all properties of ``DpMobilityReport`` class

0.1.1 (2022-10-27)
==================
* fix bug: prevent error "key ``trips`` not found" in ``trips_over_time`` if sum of ``trip_count`` is 0

0.1.0 (2022-10-21)
==================
* make tessellation an Optional parameter
* allow DataFrames without timestamps but sequence numbering instead (i.e., ``integer`` for ``timestamp`` column)
* allow to set seed for reproducible sampling of the dataset (according to ``max_trips_per_user``)

0.0.8 (2022-10-20)
==================
* Fixes addressing deprecation warnings.

0.0.7 (2022-10-17)
==================

* parameter for a custom split of the privacy budget between different analyses
* extend 'analysis_selection' to include single analyses instead of entire segments
* parameter for 'analysis_exclusion' instead of selection
* bug fix: include all possible categories for days and hour of days
* bug fix: show correct percentage of outliers
* show 95% confidence-interval instead of upper and lower bound
* show privacy budget and confidence interval for each analysis

0.0.6 (2022-09-30)
==================

* Remove scaling of counts to match a consistent trip_count / record_count (from ds_statistics) in visits_per_tile, visits_per_time_tile and od_flows. Scaling was implemented to keep the report consistent, though it is removed for now as it introduces new issues.
* Minor bug fixes in the visualization: outliers were not correctly converted into percentage. 

0.0.5 (2022-08-26)
==================

* Bug fix: correct scaling of timewindow counts.

0.0.4 (2022-08-22)
==================

* Simplify naming: from :code:`MobilityDataReport` to :code:`DpMobilityReport`
* Simplify import: from :code:`from dp_mobility_report import md_report.MobilityDataReport` to :code:`from dp_mobility_report import DpMobilityReport`
* Enhance documentation: change style and correctly include API reference.

0.0.3 (2022-07-22)
==================

* Fix broken link.

0.0.2 (2022-07-22)
==================

* First release to PyPI.
* It includes all basic functionality, though still in alpha version and under development.

0.0.1 (2021-12-16)
==================

* First version used for evaluation in Alexandra Kapp, Saskia Nuñez von Voigt, Helena Mihaljević & Florian Tschorsch (2022) Towards mobility reports with user-level privacy, Journal of Location Based Services, DOI: 10.1080/17489725.2022.2148008.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/FreeMoveProject/dp_mobility_report",
    "name": "dp-mobility-report",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "dp_mobility_report",
    "author": "Alexandra Kapp",
    "author_email": "alexandra.kapp@htw-berlin.de",
    "download_url": "https://files.pythonhosted.org/packages/72/80/e537576a6a8b25b88fba073742dab3a56cf855896d9387b7bc8116eb47c9/dp-mobility-report-0.2.11.tar.gz",
    "platform": null,
    "description": "============================================================\nDifferentially Private Mobility Report (DpMobilityReport)\n============================================================\n\n\n.. image:: https://img.shields.io/pypi/v/dp_mobility_report.svg\n        :target: https://pypi.python.org/pypi/dp_mobility_report\n\n        \n.. image:: https://readthedocs.org/projects/dp-mobility-report/badge/?version=latest\n        :target: https://dp-mobility-report.readthedocs.io/en/latest/?version=latest\n        :alt: Documentation Status\n\n\n\n\n* Free software: MIT license\n* Documentation: https://dp-mobility-report.readthedocs.io.\n\n\n``dp_mobility_report``: A python package to create a mobility report with differential privacy (DP) guarantees, especially for urban human mobility data. \n\n\nQuickstart \n**************\n\nInstall\n==========\n\n.. code-block:: bash\n\n        pip install dp-mobility-report\n\nor from GitHub:\n\n.. code-block:: bash\n\n        pip install git+https://github.com/FreeMoveProject/dp_mobility_report\n\n\nData preparation\n====================\n\n**df**: \n\n* A pandas ``DataFrame``. \n* Expected columns: User ID ``uid``, Trip ID ``tid``, timestamp ``datetime`` (expected is a datetime-like string, e.g., in the format ``yyyy-mm-dd hh:mm:ss``. If ``datetime`` contains ``int`` values, it is interpreted as sequence positions, i.e., if the dataset only consists of sequences without timestamps), latitude and longitude in CRS EPSG:4326 ``lat`` and ``lng``. (We thereby closely followed the format of the `scikit-mobility`_ ``TrajDataFrame``.)\n* Here you can find an `example dataset`_.\n\n**tessellation**: \n\n* A geopandas ``GeoDataFrame`` with polygons. \n* Expected columns: ``tile_id``. \n* The tessellation is used for spatial aggregations of the data. \n* Here you can find an `example tessellation`_. 
\n* If you don't have a tessellation, you can use this code to `create a tessellation`_.\n\n\nCreate a DpMobilityReport\n===================================\n\n.. code-block:: python\n\n        import pandas as pd\n        import geopandas as gpd\n        from dp_mobility_report import DpMobilityReport\n\n        df = pd.read_csv(\n            \"https://raw.githubusercontent.com/FreeMoveProject/dp_mobility_report/main/tests/test_files/test_data.csv\"\n        )\n        tessellation = gpd.read_file(\n            \"https://raw.githubusercontent.com/FreeMoveProject/dp_mobility_report/main/tests/test_files/test_tessellation.geojson\"\n        )\n\n        report = DpMobilityReport(df, tessellation, privacy_budget=10, max_trips_per_user=5)\n\n        report.to_file(\"my_mobility_report.html\")\n\n\nThe parameter ``privacy_budget`` (in terms of *epsilon*-DP) determines how much noise is added to the data. The budget is split between all analyses of the report.\nIf the value is set to ``None`` no noise (i.e., no privacy guarantee) is applied to the report.\n\nThe parameter ``max_trips_per_user`` specifies how many trips a user can contribute to the dataset at most. If a user is represented with more trips, a random sample is drawn according to ``max_trips_per_user``.\nIf the value is set to ``None`` the full dataset is used. Note, that deriving the maximum trips per user from the data violates the differential privacy guarantee. Thus, ``None`` should only be used in combination with ``privacy_budget=None``.\n\nPlease refer to the `documentation`_ for information on further parameters. Here you can find information on the `analyses`_ of the report.\n\nExample HTMLs can be found in the examples_ folder.\n\n\nCreate a BenchmarkReport \n================================\n\nA benchmark report evaluate the similarity of two (differentially private) mobility reports from one or two mobility datasets. 
This can be based on two datasets (``df_base`` and ``df_alternative``) or one dataset (``df_base``)) with different privacy settings.\nThe arguments ``df``, ``privacy_budget``, ``user_privacy``, ``max_trips_per_user`` and ``budget_split`` can differ for the two datasets set with the according ending ``_base`` and ``_alternative``. The other arguments are the same for both reports.\nFor the evaluation, `similarity measures`_ (namely the (mean) absolute percentage error (PE), Jensen-Shannon divergence (JSD), Kullback-Leibler divergence (KLD), and the earth mover's distance (EMD)) are computed to quantify the statistical similarity for each analysis.\nThe evaluation, i.e., benchmark report, will be generated as an HTML file, using the ``.to_file()`` method.\n\n\nBenchmark of two different datasets \n---------------------------------------\n\nThis example creates a benchmark report with similarity measures for two mobility datasets, called *base* and *alternative* in the following. This is intended to compare different datasets with the same or no privacy budget.\n\n.. code-block:: python\n\n        import pandas as pd\n        import geopandas as gpd\n        from dp_mobility_report import BenchmarkReport\n\n        # -- insert paths --\n        df_base = pd.read_csv(\"mobility_dataset_base.csv\")\n        df_alternative = pd.read_csv(\"mobility_dataset_alternative.csv\")\n        tessellation = gpd.read_file(\"tessellation.gpkg\")\n\n        benchmark_report = BenchmarkReport(\n            df_base=df_base, tesselation=tessellation, df_alternative=df_alternative\n        )\n\n        # Dictionary containing the similarity measures for each analysis\n        similarity_measures = benchmark_report.similarity_measures\n        # The measure selection indicates which similarity measure\n        # (e.g. 
KLD, JSD, EMD, PE) has been selected for each analysis\n        measure_selection = benchmark_report.measure_selection\n\n        # If you do not want to access the selection of similarity measures\n        # but e.g. the Jensen-Shannon divergence for all analyses:\n        jsd = benchmark_report.jsd\n\n        # benchmark_report.to_file(\"my_benchmark_mobility_report.html\")\n\n\nThe parameter ``measure_selection`` specifies which similarity measures should be chosen for the ``similarity_measures`` dictionary that is an attribute of the ``BenchmarkReport``. \nThe default is set to a specific set of similarity measures for each analysis which can be accessed by ``dp_mobility_report.default_measure_selection()``. \nThe default of single analyses can be overwritten as shown in the following:\n\n.. code-block:: python\n\n        from dp_mobility_report import BenchmarkReport, default_measure_selection\n        from dp_mobility_report import constants as const\n\n        # print the default measure selection\n        print(default_measure_selection())\n\n        # change default of EMD for visits_per_tile to JSD.\n        # For the other analyses the default measure is remained\n        custom_measure_selection = {const.VISITS_PER_TILE: const.JSD}\n\n        benchmark_report = BenchmarkReport(\n            df_base=df_base,\n            tesselation=tessellation,\n            df_alternative=df_alternative,\n            measure_selection=custom_measure_selection,\n        )\n\n\n\nBenchmark of the same dataset with different privacy settings\n-------------------------------------------------------------------\n\nThis example creates a BenchmarkReport with similarity measures for the same mobility dataset with different privacy settings (``privacy_budget``, ``user_privacy``, ``max_trips_per_user`` and ``budget_split``) to assess the utility loss of the privacy budget for the different analyses. \n\n.. 
code-block:: python\n\n        import pandas as pd\n        import geopandas as gpd\n        from dp_mobility_report import BenchmarkReport\n\n        # -- insert paths --\n        df_base = pd.read_csv(\"mobility_dataset_base.csv\")\n        tessellation = gpd.read_file(\"tessellation.gpkg\")\n\n        benchmark_report = BenchmarkReport(\n            df_base=df_base,\n            tesselation=tessellation,\n            privacy_budget_base=None,\n            privacy_budget_alternative=5,\n            max_trips_per_user_base=None,\n            max_trips_per_user_alternative=4,\n        )\n\n        similarity_measures = benchmark_report.similarity_measures\n\n        # benchmark_report.to_file(\"my_benchmark_mobility_report.html\")\n\n\n\nPlease refer to the `documentation`_ for information on further parameters.\n\n\nExamples\n*********\n\nBerlin mobility data simulated using the `DLR TAPAS`_ Model: [`Code used for Berlin`_]\n\n* `Report of Berlin without DP`_\n* `Report of Berlin with DP epsilon=1`_\n\nMadrid `CRTM survey`_ data: [`Code used for Madrid`_]\n\n* `Report of Madrid without DP`_\n* `Report of Madrid with DP epsilon=10`_\n\nBeijing `Geolife`_ dataset: [`Code used for Beijing`_]\n\n* `Report of Beijing without DP`_\n* `Report of Beijing with DP epsilon=50`_\n\nBenchmark Report: [`Code used for Benchmarkreport of Berlin`_]\n\n* `Benchmarkreport of Berlin without DP and with DP epsilon=1`_\n\n(Here you find the `code of the data preprocessing`_ to obtain the needed format)\n\nCiting\n******\nif you use dp-mobility-report please cite the `following paper`_:\n\n.. 
code-block::\n\n        @article{doi:10.1080/17489725.2022.2148008,\n                        author = {Alexandra Kapp and Saskia Nu\u00f1ez von Voigt and Helena Mihaljevi\u0107 and Florian Tschorsch},\n                        title = {Towards mobility reports with user-level privacy},\n                        journal = {Journal of Location Based Services},\n                        volume = {17},\n                        number = {2},\n                        pages = {95-121},\n                        year  = {2023},\n                        publisher = {Taylor & Francis},\n                        doi = {10.1080/17489725.2022.2148008}\n        }\n\n\nCredits\n========\n\nThis package was highly inspired by the `pandas-profiling/pandas-profiling`_ and `scikit-mobility`_ packages.\n\nThis package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.\n\n\nThis package was developed as part of the freemove project which is funded by:\n\n.. image:: https://www.freemove.space/assets/images/bmbf-logo.svg\n\n \n.. _`example dataset`: https://github.com/FreeMoveProject/dp_mobility_report/blob/main/tests/test_files/test_data.csv\n.. _`example tessellation`: https://github.com/FreeMoveProject/dp_mobility_report/blob/main/tests/test_files/test_tessellation.geojson\n.. _`create a tessellation`:  https://github.com/FreeMoveProject/dp_mobility_report/blob/main/examples/create_tessellation.py\n.. _documentation: https://dp-mobility-report.readthedocs.io/en/latest/modules.html\n.. _analyses: https://dp-mobility-report.readthedocs.io/en/latest/analyses.html\n.. _`similarity measures`: https://dp-mobility-report.readthedocs.io/en/latest/similarity_measures.html\n.. _`DLR TAPAS`: https://github.com/DLR-VF/TAPAS\n.. _`Report of Berlin without DP`: https://freemoveproject.github.io/dp_mobility_report/examples/html/berlin_noPrivacy.html\n.. 
_`Report of Berlin with DP epsilon=1`: https://freemoveproject.github.io/dp_mobility_report/examples/html/berlin.html\n.. _`Code used for Berlin`: https://github.com/FreeMoveProject/dp_mobility_report/blob/main/examples/example_berlin.py\n.. _`CRTM survey`: https://crtm.maps.arcgis.com/apps/MinimalGallery/index.html?appid=a60bb2f0142b440eadee1a69a11693fc\n.. _`Report of Madrid without DP`: https://freemoveproject.github.io/dp_mobility_report/examples/html/madrid_noPrivacy.html\n.. _`Report of Madrid with DP epsilon=10`: https://freemoveproject.github.io/dp_mobility_report/examples/html/madrid.html\n.. _`Code used for Madrid`: https://github.com/FreeMoveProject/dp_mobility_report/blob/main/examples/example_madrid.py\n.. _`Geolife`: https://www.microsoft.com/en-us/download/details.aspx?id=52367\n.. _`Report of Beijing without DP`: https://freemoveproject.github.io/dp_mobility_report/examples/html/geolife_noPrivacy.html\n.. _`Report of Beijing with DP epsilon=50`: https://freemoveproject.github.io/dp_mobility_report/examples/html/geolife.html\n.. _`Code used for Beijing`: https://github.com/FreeMoveProject/dp_mobility_report/blob/main/examples/example_geolife.py\n.. _`Benchmarkreport of Berlin without DP and with DP epsilon=1`: https://freemoveproject.github.io/dp_mobility_report/examples/html/berlin_benchmark.html\n.. _`Code used for Benchmarkreport of Berlin`: https://github.com/FreeMoveProject/dp_mobility_report/blob/main/examples/example_benchmark.py\n.. _`code of the data preprocessing`: https://github.com/FreeMoveProject/evaluation_dp_mobility_report/blob/main/01_preprocess_evaluation_data.py\n.. _`following paper`: https://www.tandfonline.com/doi/full/10.1080/17489725.2022.2148008\n.. _`pandas-profiling/pandas-profiling`: https://github.com/pandas-profiling/pandas-profiling\n.. _`scikit-mobility`: https://github.com/scikit-mobility\n.. _Cookiecutter: https://github.com/audreyr/cookiecutter\n.. 
_`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage\n\n\nHistory\n*********\n0.2.11 (2024-03-04)\n===================\n* Fix copy-paste errors in validation of preprocessing\n* Fix bug: get_trips_over_time can never reach \"month\" condition\n\n0.2.10 (2024-01-03)\n===================\n* Fix to work with pandas 2.2.0rc0 update\n\n0.2.9 (2023-08-17)\n==================\n* Fix to work with pandas 2.1.0rc0 update\n\n0.2.8 (2023-04-03)\n==================\n* Bug fix: smape of trips per day\n\n0.2.7 (2023-03-30)\n==================\n* Update requirements\n\n0.2.6 (2023-03-24)\n==================\n* Bug fix: shape mismatch in similarity_measures for edge case (only counts in bin \"inf\")\n\n0.2.5 (2023-03-24)\n==================\n* Bug fix: compatibility with pandas >= 2.0 and pandas < 2.0\n\n0.2.4 (2023-03-23)\n==================\n* Enhance HTML design \n* Include info texts for all analyses\n* Include documentation for differential privacy and an info box about DP in the report\n* Enhance documentation\n* Add option for `subtitle` in DpMobilityReport and BenchmarkReport to name the report.\n\n0.2.3 (2023-02-13)\n==================\n* Bug fix: handle if no visit is within the tessallation\n* Bug fix: handle if no OD trip is within the tessallation\n* Bug fix: unify histogram bins rounding issue\n\n0.2.2 (2023-02-01)\n==================\n* Bug fix: exclude user_time_delta if there is no user with at least two trips.\n* Bug fix: set max_trips_per_user correctly if user_privacy=False.\n* Enhancement: do not exclude jump_length and travel_time if no tessellation is given\n\n0.2.1 (2023-01-24)\n==================\n* Bug fix: Correct range of scale for visits per time and tile map. 
\n\n0.2.0 (2023-01-23)\n==================\n* Create a BenchmarkReport class that evaluates the similarity of two (differentially private) mobility reports from one or two mobility datasets and creates an HTML output similar to the DpMobilityReport.\n\n0.1.8 (2023-01-16)\n==================\n* Refine handling of OD Analysis input data:\n    * warn if there are no trips with more than a single record and exclude OD Analysis\n    * use all trips for travel time and jump length computation instead of only trips inside tessellation.\n\n0.1.7 (2023-01-10)\n==================\n* Restructuring of HTML headlines.\n\n0.1.6 (2023-01-09)\n==================\n* Refactoring of template files.\n\n0.1.5 (2022-12-12)\n==================\n* Remove scikit-mobility dependency and refactor od flow visualization.\n\n0.1.4 (2022-12-07)\n==================\n* Remove Google Fonts from HTML.\n\n0.1.3 (2022-12-05)\n==================\n* Handle FutureWarning of pandas.\n\n0.1.2 (2022-11-24)\n==================\n* Enhanced documentation for all properties of `DpMobilityReport` class\n\n0.1.1 (2022-10-27)\n==================\n* fix bug: prevent error \"key `trips` not found\" in `trips_over_time` if sum of `trip_count` is 0\n\n0.1.0 (2022-10-21)\n==================\n* make tessellation an Optional parameter\n* allow DataFrames without timestamps but sequence numbering instead (i.e., `integer` for `timestamp` column)\n* allow setting a seed for reproducible sampling of the dataset (according to `max_trips_per_user`)\n\n0.0.8 (2022-10-20)\n==================\n* Fixes addressing deprecation warnings.\n\n0.0.7 (2022-10-17)\n==================\n\n* parameter for a custom split of the privacy budget between different analyses\n* extend 'analysis_selection' to include single analyses instead of entire segments\n* parameter for 'analysis_exclusion' instead of selection\n* bug fix: include all possible categories for days and hours of the day\n* bug fix: show correct percentage of outliers\n* show 95% 
confidence interval instead of upper and lower bound\n* show privacy budget and confidence interval for each analysis\n\n0.0.6 (2022-09-30)\n==================\n\n* Remove scaling of counts to match a consistent trip_count / record_count (from ds_statistics) in visits_per_tile, visits_per_time_tile and od_flows. Scaling was implemented to keep the report consistent, but it is removed for now because it introduces new issues.\n* Minor bug fixes in the visualization: outliers were not correctly converted into percentages. \n\n0.0.5 (2022-08-26)\n==================\n\n* Bug fix: correct scaling of time window counts.\n\n0.0.4 (2022-08-22)\n==================\n\n* Simplify naming: from :code:`MobilityDataReport` to :code:`DpMobilityReport`\n* Simplify import: from :code:`from dp_mobility_report import md_report.MobilityDataReport` to :code:`from dp_mobility_report import DpMobilityReport`\n* Enhance documentation: change style and correctly include API reference.\n\n0.0.3 (2022-07-22)\n==================\n\n* Fix broken link.\n\n0.0.2 (2022-07-22)\n==================\n\n* First release to PyPI.\n* It includes all basic functionality, though still in alpha version and under development.\n\n0.0.1 (2021-12-16)\n==================\n\n* First version used for evaluation in Alexandra Kapp, Saskia Nu\u00f1ez von Voigt, Helena Mihaljevi\u0107 & Florian Tschorsch (2022) Towards mobility reports with user-level privacy, Journal of Location Based Services, DOI: 10.1080/17489725.2022.2148008.\n",
    "bugtrack_url": null,
    "license": "MIT license",
    "summary": "Create a report for mobility data with differential privacy guarantees.",
    "version": "0.2.11",
    "project_urls": {
        "Homepage": "https://github.com/FreeMoveProject/dp_mobility_report"
    },
    "split_keywords": [
        "dp_mobility_report"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7280e537576a6a8b25b88fba073742dab3a56cf855896d9387b7bc8116eb47c9",
                "md5": "1a66b399750e08c1cfe07777d1e3db7c",
                "sha256": "b482f4742be265729295f103e64815bbb1250d6ae41aa359c44879fd4fffef76"
            },
            "downloads": -1,
            "filename": "dp-mobility-report-0.2.11.tar.gz",
            "has_sig": false,
            "md5_digest": "1a66b399750e08c1cfe07777d1e3db7c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 124927,
            "upload_time": "2024-03-04T15:22:00",
            "upload_time_iso_8601": "2024-03-04T15:22:00.179462Z",
            "url": "https://files.pythonhosted.org/packages/72/80/e537576a6a8b25b88fba073742dab3a56cf855896d9387b7bc8116eb47c9/dp-mobility-report-0.2.11.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-04 15:22:00",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "FreeMoveProject",
    "github_project": "dp_mobility_report",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "tox": true,
    "lcname": "dp-mobility-report"
}
        