popmon

:Name: popmon
:Version: 1.4.6
:Summary: Monitor the stability of a pandas or spark dataset
:Upload time: 2023-07-18 10:30:55
:Requires Python: >=3.7
:Author: ING Analytics Wholesale Banking <wbaa@ing.com>
:Keywords: pandas, spark, data-science, data-analysis, monitoring, statistics, python, jupyter, ipython
:License: Copyright 2023 ING Analytics Wholesale Banking. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

===========================
Population Shift Monitoring
===========================

|build| |docs| |release| |release_date| |downloads| |ruff|

|logo|

`popmon` is a package that allows one to check the stability of a dataset.
`popmon` works with both **pandas** and **spark datasets**.

`popmon` creates histograms of features binned in time-slices,
and compares the stability of the profiles_ and distributions of
those histograms using `statistical tests <https://popmon.readthedocs.io/en/latest/comparisons.html>`_, both over time and with respect to a reference.
It works with numerical, ordinal and categorical features, and the histograms can be higher-dimensional; for example, it can also track correlations between any two features.
`popmon` can **automatically flag** and alert on **changes observed over time**, such
as trends, shifts, peaks, outliers, anomalies, changing correlations, etc,
using monitoring business rules.
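
To make the idea of time-sliced histograms concrete, the sketch below bins a categorical feature per ISO week and flags the weeks whose distribution differs from a reference by more than a threshold. It is a plain-Python illustration only, not popmon's implementation: the weekly slicing, the ``max_abs_diff`` distance and the 0.2 threshold are assumptions chosen for readability, whereas popmon itself applies the statistical tests and monitoring business rules described above.

.. code-block:: python

  from collections import Counter, defaultdict
  from datetime import date


  def histograms_per_week(records):
      """Build one normalized histogram (category -> fraction) per ISO week."""
      counts = defaultdict(Counter)
      for day, category in records:
          iso_year, iso_week, _ = day.isocalendar()
          counts[(iso_year, iso_week)][category] += 1
      return {
          week: {cat: n / sum(c.values()) for cat, n in c.items()}
          for week, c in counts.items()
      }


  def max_abs_diff(p, q):
      """Largest absolute difference in bin fractions between two histograms."""
      return max(abs(p.get(b, 0.0) - q.get(b, 0.0)) for b in set(p) | set(q))


  def flag_shifts(records, reference, threshold=0.2):
      """Return the weeks whose histogram deviates from the reference by more than threshold."""
      hists = histograms_per_week(records)
      return sorted(week for week, h in hists.items() if max_abs_diff(h, reference) > threshold)


  records = [(date(2020, 1, 6), g) for g in "MFMF"] + [(date(2020, 1, 13), g) for g in "MMMF"]
  print(flag_shifts(records, reference={"M": 0.5, "F": 0.5}))  # [(2020, 3)]

Here only the second week (ISO week 3 of 2020) is flagged, because the share of ``"M"`` rises from 0.5 to 0.75.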

|example|

|histograms|

Announcements
=============

Spark 3.0
---------

With Spark 3.0, based on Scala 2.12, make sure to pick up the correct `histogrammar` jar files:

.. code-block:: python

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.config(
      "spark.jars.packages",
      "io.github.histogrammar:histogrammar_2.12:1.0.20,io.github.histogrammar:histogrammar-sparksql_2.12:1.0.20",
  ).getOrCreate()

For Spark 2.X compiled against Scala 2.11, simply replace 2.12 with 2.11 in the string above.
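
Since only the Scala suffix differs between the two cases, a small helper can build the ``spark.jars.packages`` value. ``histogrammar_packages`` below is a hypothetical convenience function, not part of popmon or histogrammar; it assumes the artifact names and the 1.0.20 version shown above.

.. code-block:: python

  def histogrammar_packages(scala_version="2.12", histogrammar_version="1.0.20"):
      """Comma-separated Maven coordinates for the spark.jars.packages setting (hypothetical helper)."""
      artifacts = ("histogrammar", "histogrammar-sparksql")
      return ",".join(
          f"io.github.histogrammar:{name}_{scala_version}:{histogrammar_version}"
          for name in artifacts
      )


  # Spark 3.x (Scala 2.12) uses the default; Spark 2.x built against Scala 2.11 passes "2.11".
  print(histogrammar_packages("2.11"))
  # The result can then be passed to SparkSession.builder.config("spark.jars.packages", ...).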

Examples
========

- `Flight Delays and Cancellations Kaggle data <https://crclz.com/popmon/reports/flight_delays_report.html>`_
- `Synthetic data (code example below) <https://crclz.com/popmon/reports/test_data_report.html>`_

Documentation
=============

The entire `popmon` documentation including tutorials can be found at `read-the-docs <https://popmon.readthedocs.io>`_.


Notebooks
=========

.. list-table::
   :widths: 80 20
   :header-rows: 1

   * - Tutorial
     - Colab link
   * - `Basic tutorial <https://nbviewer.jupyter.org/github/ing-bank/popmon/blob/master/popmon/notebooks/popmon_tutorial_basic.ipynb>`_
     - |notebook_basic_colab|
   * - `Detailed example (featuring configuration, Apache Spark and more) <https://nbviewer.jupyter.org/github/ing-bank/popmon/blob/master/popmon/notebooks/popmon_tutorial_advanced.ipynb>`_
     - |notebook_advanced_colab|
   * - `Incremental datasets (online analysis) <https://nbviewer.jupyter.org/github/ing-bank/popmon/blob/master/popmon/notebooks/popmon_tutorial_incremental_data.ipynb>`_
     - |notebook_incremental_data_colab|
   * - `Report interpretation (step-by-step guide) <https://nbviewer.jupyter.org/github/ing-bank/popmon/blob/master/popmon/notebooks/popmon_tutorial_reports.ipynb>`_
     - |notebook_reports_colab|

Check it out
============

The `popmon` library requires Python 3.7+ and is pip friendly. To get started, simply do:

.. code-block:: bash

  $ pip install popmon

or check out the code from our GitHub repository:

.. code-block:: bash

  $ git clone https://github.com/ing-bank/popmon.git
  $ pip install -e popmon

where in this example the code is installed in editable mode (option ``-e``).

You can now use the package in Python with:

.. code-block:: python

  import popmon

**Congratulations, you are now ready to use the popmon library!**

Quick run
=========

As a quick example, you can do:

.. code-block:: python

  import pandas as pd
  import popmon
  from popmon import resources

  # open synthetic data
  df = pd.read_csv(resources.data("test.csv.gz"), parse_dates=["date"])
  df.head()

  # generate a stability report, automatically binning the listed features
  # (importing popmon automatically adds this functionality to a dataframe)
  report = df.pm_stability_report(time_axis="date", features=["date:age", "date:gender"])

  # to show the output of the report in a Jupyter notebook you can simply run:
  report

  # or save the report to file
  report.to_file("monitoring_report.html")
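
Each entry of ``features`` reads as a colon-separated list of histogram axes with the time axis (here ``date``) first, so ``"date:age"`` describes a two-dimensional histogram and ``"date:isActive:age"`` a three-dimensional one. The helper below is a hypothetical illustration of that naming convention, not a popmon function.

.. code-block:: python

  def histogram_axes(feature):
      """Split a feature string like "date:isActive:age" into its histogram axes (hypothetical helper)."""
      axes = [axis.strip() for axis in feature.split(":")]
      if any(not axis for axis in axes):
          raise ValueError(f"empty axis name in {feature!r}")
      return axes


  for feature in ["date:age", "date:gender", "date:isActive:age"]:
      axes = histogram_axes(feature)
      print(f"{feature}: {len(axes)}-D histogram, time axis {axes[0]!r}, other axes {axes[1:]}")
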

To specify your own binning and the features you want to report on, do:

.. code-block:: python

  # time-axis specifications alone; all other features are auto-binned.
  report = df.pm_stability_report(
      time_axis="date", time_width="1w", time_offset="2020-1-6"
  )

  # histogram selections. Here 'date' is the first axis of each histogram.
  features = [
      "date:isActive",
      "date:age",
      "date:eyeColor",
      "date:gender",
      "date:latitude",
      "date:longitude",
      "date:isActive:age",
  ]

  # Specify your own binning specifications for individual features or combinations thereof.
  # This bin specification uses open-ended ("sparse") histograms; unspecified features get
  # auto-binned. The time-axis binning, when specified here, needs to be in nanoseconds.
  bin_specs = {
      "longitude": {"bin_width": 5.0, "bin_offset": 0.0},
      "latitude": {"bin_width": 5.0, "bin_offset": 0.0},
      "age": {"bin_width": 10.0, "bin_offset": 0.0},
      "date": {
          "bin_width": pd.Timedelta("4w").value,
          "bin_offset": pd.Timestamp("2015-1-1").value,
      },
  }

  # generate stability report
  report = df.pm_stability_report(features=features, bin_specs=bin_specs, time_axis=True)

These examples also work with spark dataframes.
You can see the output of such example notebook code `here <https://crclz.com/popmon/reports/test_data_report.html>`_.
For all available examples, please see the `tutorials <https://popmon.readthedocs.io/en/latest/tutorials.html>`_ at read-the-docs.
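
The bin specifications above can be checked by hand. Assuming the usual sparse-histogram convention, in which a value ``x`` falls into bin ``floor((x - bin_offset) / bin_width)``, the sketch below reproduces that arithmetic in plain Python, including the nanosecond conversion that ``pd.Timedelta(...).value`` and ``pd.Timestamp(...).value`` perform for the ``date`` axis. All helper names are hypothetical.

.. code-block:: python

  import math
  from datetime import datetime, timedelta, timezone

  EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


  def bin_index(value, bin_width, bin_offset):
      """Index i of the bin [offset + i * width, offset + (i + 1) * width) containing value."""
      return math.floor((value - bin_offset) / bin_width)


  def timedelta_to_ns(delta):
      """Exact number of nanoseconds in a timedelta (timedelta resolution is 1 microsecond)."""
      return (delta // timedelta(microseconds=1)) * 1_000


  def timestamp_to_ns(moment):
      """Nanoseconds since the Unix epoch for a timezone-aware datetime."""
      return timedelta_to_ns(moment - EPOCH)


  # Numeric features: bin 3 of "age" is [30, 40); bin -1 of "longitude" is [-5, 0).
  print(bin_index(37.0, bin_width=10.0, bin_offset=0.0))  # 3
  print(bin_index(-3.2, bin_width=5.0, bin_offset=0.0))  # -1

  # Time axis: should match pd.Timedelta("4w").value and pd.Timestamp("2015-1-1").value.
  date_width = timedelta_to_ns(timedelta(weeks=4))
  date_offset = timestamp_to_ns(datetime(2015, 1, 1, tzinfo=timezone.utc))
  print(date_width, date_offset)  # 2419200000000000 1420070400000000000
  moment = timestamp_to_ns(datetime(2015, 3, 1, tzinfo=timezone.utc))
  print(bin_index(moment, date_width, date_offset))  # 2, the 4-week slice starting 2015-02-26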

Pipelines for monitoring dataset shift
======================================
Advanced users can leverage popmon's modular data pipeline to customize their workflow.
Visualization of the pipeline can be useful when debugging, or for didactic purposes.
There is a `script <https://github.com/ing-bank/popmon/tree/master/tools/>`_ included with the package that you can use.
The plotting is configurable: depending on the options, the result can be used to understand the data flow, the high-level components and the (re)use of datasets.

|pipeline|

*Example pipeline visualization (click to enlarge)*
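
The general idea of such a pipeline, in which each step reads datasets from a shared datastore and stores a new one, while a diagram draws the resulting data flow, can be sketched in a few lines. The ``Step`` class and the functions below are hypothetical and much simpler than popmon's own pipeline modules.

.. code-block:: python

  from dataclasses import dataclass
  from typing import Callable, List


  @dataclass
  class Step:
      """A pipeline step that reads datastore keys and stores its result under a new key."""

      name: str
      read_keys: List[str]
      store_key: str
      func: Callable


  def run_pipeline(steps, datastore):
      """Run the steps in order; each one adds its output to the shared datastore dict."""
      for step in steps:
          inputs = [datastore[key] for key in step.read_keys]
          datastore[step.store_key] = step.func(*inputs)
      return datastore


  def data_flow_edges(steps):
      """Return (source, target) edges of the dataset -> step -> dataset graph."""
      edges = []
      for step in steps:
          edges.extend((key, step.name) for key in step.read_keys)
          edges.append((step.name, step.store_key))
      return edges


  steps = [
      Step("histogram", ["raw"], "hists", lambda raw: {k: len(v) for k, v in raw.items()}),
      Step("compare", ["hists", "reference"], "diffs",
           lambda hists, ref: {k: hists[k] - ref[k] for k in hists}),
      Step("flag", ["diffs"], "alerts",
           lambda diffs: sorted(k for k, d in diffs.items() if abs(d) > 1)),
  ]
  store = run_pipeline(steps, {"raw": {"w1": [1, 2, 3], "w2": [1]}, "reference": {"w1": 3, "w2": 3}})
  print(store["alerts"])
  for source, target in data_flow_edges(steps):
      print(f"{source} -> {target}")

Feeding such edges to a graph-drawing tool yields a picture of the data flow, similar in spirit to the visualization above.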

Reports and integrations
========================
By default, the data shift computations that popmon performs are displayed in a self-contained HTML report.
This format is favourable in many real-world environments, where access may be restricted.
Moreover, reports can be easily shared with others.

Access to the datastore means that it's possible to integrate popmon into almost any workflow.
For example, one could store the histogram data in a PostgreSQL database, load it into Grafana and benefit from Grafana's visualisation and alert-handling features (e.g. sending an email or Slack message upon an alert).
This may be interesting to teams that are already invested in a particular dashboarding tool.
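
As a minimal sketch of the database route, the example below writes per-time-slice results into a table and queries the alerts a dashboard might display. It uses Python's built-in ``sqlite3`` as a stand-in for PostgreSQL, and the table layout, column names and rows are hypothetical rather than popmon's datastore format.

.. code-block:: python

  import sqlite3

  # Hypothetical per-time-slice results; real values would come from popmon's computations.
  rows = [
      ("age", "2015-01-05", 0.8, "green"),
      ("age", "2015-01-12", 3.4, "red"),
      ("gender", "2015-01-05", 1.2, "yellow"),
  ]

  conn = sqlite3.connect(":memory:")  # stand-in for a PostgreSQL connection
  conn.execute(
      "CREATE TABLE stability (feature TEXT, time_slice TEXT, score REAL, traffic_light TEXT)"
  )
  conn.executemany("INSERT INTO stability VALUES (?, ?, ?, ?)", rows)
  conn.commit()

  # A dashboarding tool could poll a query like this one and raise an alert on new rows.
  alerts = conn.execute(
      "SELECT feature, time_slice FROM stability WHERE traffic_light = 'red' ORDER BY time_slice"
  ).fetchall()
  print(alerts)  # [('age', '2015-01-12')]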

Possible integrations are:

+----------------+---------------+
| |grafana_logo| | |kibana_logo| |
+----------------+---------------+
| Grafana        | Kibana        |
+----------------+---------------+

Resources on how to integrate popmon are available in the `examples directory <https://github.com/ing-bank/popmon/tree/master/examples/integrations>`_.
Contributions of additional or improved integrations are welcome!

.. |grafana_logo| image:: https://upload.wikimedia.org/wikipedia/commons/a/a1/Grafana_logo.svg
    :alt: Grafana logo
    :height: 120
    :target: https://github.com/grafana/grafana

.. |kibana_logo| image:: https://miro.medium.com/max/1400/1*HW_x9ZvIbUkyaqHstsB1ig.png
    :alt: Kibana logo
    :height: 120
    :target: https://github.com/elastic/kibana

Comparison and profile extensions
---------------------------------

External libraries or custom functionality can be easily added to Profiles_ and Comparisons_.
If you have developed an extension that could be of general use, please consider contributing it to the package.

Popmon currently integrates:

* `Diptest <https://github.com/RUrlus/diptest>`_

A Python/C++ implementation of Hartigan & Hartigan's dip test for unimodality.
The dip test checks a sample for multimodality by taking the maximum difference, over all sample points, between the empirical distribution function and the unimodal distribution function that minimizes that maximum difference.
Apart from unimodality, it makes no further assumptions about the form of the null distribution.

To enable this extension, install diptest using ``pip install diptest`` or ``pip install popmon[diptest]``.
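
Because diptest is an optional dependency, code that relies on it can check whether it is installed first. The sketch below assumes the ``diptest.diptest`` function of the diptest package, which returns the dip statistic and a p-value; the wrapper ``dip_test_if_available`` is hypothetical and does not show how popmon calls the extension internally.

.. code-block:: python

  import importlib.util
  import random


  def dip_test_if_available(sample):
      """Run the optional dip test; return (dip, p_value), or None when diptest is not installed."""
      if importlib.util.find_spec("diptest") is None:
          return None
      import diptest
      import numpy as np

      dip, p_value = diptest.diptest(np.asarray(sample, dtype=float))
      return float(dip), float(p_value)


  # Two well-separated normal components: a clearly bimodal sample.
  rng = random.Random(0)
  sample = [rng.gauss(-3, 1) for _ in range(500)] + [rng.gauss(3, 1) for _ in range(500)]
  result = dip_test_if_available(sample)
  if result is None:
      print("diptest is not installed: pip install diptest")
  else:
      dip, p_value = result
      print(f"dip={dip:.3f}, p={p_value:.3g}")  # a small p-value suggests multimodality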

Resources
=========

Presentations
-------------

+------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+-------------------+-------------------------+
| Title                                                                                          | Host                                                                                             | Date              | Speaker                 |
+------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+-------------------+-------------------------+
| popmon: Analysis Package for Dataset Shift Detection                                           | `SciPy Conference 2022 <https://www.scipy2022.scipy.org/>`_                                      | July 13, 2022     | Simon Brugman           |
+------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+-------------------+-------------------------+
| Popmon - population monitoring made easy                                                       | `Big Data Technology Warsaw Summit 2021 <https://bigdatatechwarsaw.eu/>`_                        | February 25, 2021 | Simon Brugman           |
+------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+-------------------+-------------------------+
| Popmon - population monitoring made easy                                                       | `Data Lunch @ Eneco <https://www.eneco.nl/>`_                                                    | October 29, 2020  | Max Baak, Simon Brugman |
+------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+-------------------+-------------------------+
| Popmon - population monitoring made easy                                                       | `Data Science Summit 2020 <https://dssconf.pl/en/>`_                                             | October 16, 2020  | Max Baak                |
+------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+-------------------+-------------------------+
| `Population Shift Monitoring Made Easy: the popmon package <https://youtu.be/PgaQpxzT_0g>`_    | `Online Data Science Meetup @ ING WBAA <https://www.meetup.com/nl-NL/Tech-Meetups-ING/events/>`_ | July 8, 2020      | Tomas Sostak            |
+------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+-------------------+-------------------------+
| `Popmon: Population Shift Monitoring Made Easy <https://www.youtube.com/watch?v=HE-3YeVYqPY>`_ | `PyData Fest Amsterdam 2020 <https://amsterdam.pydata.org/>`_                                    | June 16, 2020     | Tomas Sostak            |
+------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+-------------------+-------------------------+
| Popmon: Population Shift Monitoring Made Easy                                                  | `Amundsen Community Meetup <https://github.com/amundsen-io/amundsen>`_                           | June 4, 2020      | Max Baak                |
+------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+-------------------+-------------------------+


Articles
--------

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------+---------------------------------------------+
| Title                                                                                                                                                                                             | Date             | Author                                      |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------+---------------------------------------------+
|`POPMON v1.0.0: The Dataset-Shift Pokémon <https://medium.com/wbaa/popmon-v1-0-0-the-dataset-shift-pok%C3%A9mon-7dea9cb49a71>`_                                                                    | Aug 3, 2022      | Pradyot Patil                               |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------+---------------------------------------------+
|`Monitoring Model Drift with Python <https://medium.com/broadhorizon-cmotions/monitoring-model-drift-with-python-b9e15ca16b18>`_                                                                   | April 16, 2022   | Jeanine Schoonemann                         |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------+---------------------------------------------+
|`The Statistics Underlying the Popmon Hood <https://www.theanalyticslab.nl/the-statistics-underlying-the-popmon-hood/>`_                                                                           | April 15, 2022   | Jurriaan Nagelkerke and Jeanine Schoonemann |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------+---------------------------------------------+
|`popmon: code breakfast session <https://simonbrugman.nl/2021/11/09/popmon-code-breakfast.html>`_                                                                                                  | November 9, 2021 | Simon Brugman                               |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------+---------------------------------------------+
| `Population Shift Analysis: Monitoring Data Quality with Popmon <https://www.codemotion.com/magazine/dev-hub/big-data-analyst/popmon-data-quality-monitoring/>`_                                  | May 21, 2021     | Vito Gentile                                |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------+---------------------------------------------+
| `Popmon Open Source Package — Population Shift Monitoring Made Easy <https://medium.com/wbaa/population-monitoring-open-source-1ce3139d8c3a>`_                                                    | May 20, 2020     | Nicole Mpozika                              |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------+---------------------------------------------+


Software
--------

- `Kedro-popmon <https://github.com/stephanecollot/kedro-popmon>`_ is a plugin to integrate popmon reporting with Kedro. This plugin allows you to automate popmon feature and output stability monitoring. Package created by `Marian Dabrowski <https://www.linkedin.com/in/marian-dabrowski/>`_ and `Stephane Collot <https://github.com/stephanecollot/>`_.

Project contributors
====================

This package was authored by ING Analytics Wholesale Banking (INGA WB).
Special thanks to the following people who have contributed to the development of this package: `Ahmet Erdem <https://github.com/aerdem4>`_, `Fabian Jansen <https://github.com/faab5>`_, `Nanne Aben <https://github.com/nanne-aben>`_, Mathieu Grimal.


Citing popmon
=============
If ``popmon`` has been relevant in your work, and you would like to acknowledge the project in your publication, we suggest citing the following paper:

* Brugman, S., Sostak, T., Patil, P., Baak, M. *popmon: Analysis Package for Dataset Shift Detection*. Proceedings of the 21st Python in Science Conference. 161-168 (2022). (`link <https://conference.scipy.org/proceedings/scipy2022/popmon.html>`_)

*In BibTeX format:*

.. code-block:: bibtex

    @InProceedings{ popmon-proc-scipy-2022,
      author    = { {S}imon {B}rugman and {T}omas {S}ostak and {P}radyot {P}atil and {M}ax {B}aak },
      title     = { popmon: {A}nalysis {P}ackage for {D}ataset {S}hift {D}etection },
      booktitle = { {P}roceedings of the 21st {P}ython in {S}cience {C}onference },
      pages     = { 161 - 168 },
      year      = { 2022 },
      editor    = { {M}eghann {A}garwal and {C}hris {C}alloway and {D}illon {N}iederhut and {D}avid {S}hupe },
    }



Contact and support
===================

* Issues & Ideas & Support: https://github.com/ing-bank/popmon/issues

Please note that INGA WB provides support only on a best-effort basis.

License
=======
Copyright INGA WB. `popmon` is completely free, open-source and licensed under the `MIT license <https://en.wikipedia.org/wiki/MIT_License>`_.

.. |logo| image:: https://raw.githubusercontent.com/ing-bank/popmon/master/docs/source/assets/popmon-logo.png
    :alt: POPMON logo
    :target: https://github.com/ing-bank/popmon
.. |example| image:: https://raw.githubusercontent.com/ing-bank/popmon/master/docs/source/assets/report_overview.png
    :alt: Traffic Light Overview
.. |histograms| image:: https://raw.githubusercontent.com/ing-bank/popmon/master/docs/source/assets/histogram_inspector.png
    :alt: Histogram inspector
.. |pipeline| image:: https://raw.githubusercontent.com/ing-bank/popmon/master/docs/source/assets/pipeline.png
    :alt: Pipeline Visualization
    :target: https://github.com/ing-bank/popmon/files/7417124/pipeline_amazingpipeline_subgraphs_unversioned.pdf
.. |build| image:: https://github.com/ing-bank/popmon/workflows/build/badge.svg
    :alt: Build status
.. |ruff| image:: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/charliermarsh/ruff/main/assets/badge/v1.json
    :alt: Ruff
    :target: https://github.com/charliermarsh/ruff
.. |docs| image:: https://readthedocs.org/projects/popmon/badge/?version=latest
    :alt: Package docs status
    :target: https://popmon.readthedocs.io
.. |release| image:: https://img.shields.io/github/v/release/ing-bank/popmon
    :alt: Latest GitHub release
    :target: https://github.com/ing-bank/popmon/releases
.. |release_date| image:: https://img.shields.io/github/release-date/ing-bank/popmon
    :alt: GitHub Release Date
    :target: https://github.com/ing-bank/popmon/releases

.. |notebook_basic_colab| image:: https://colab.research.google.com/assets/colab-badge.svg
    :alt: Open in Colab
    :target: https://colab.research.google.com/github/ing-bank/popmon/blob/master/popmon/notebooks/popmon_tutorial_basic.ipynb
.. |notebook_advanced_colab| image:: https://colab.research.google.com/assets/colab-badge.svg
    :alt: Open in Colab
    :target: https://colab.research.google.com/github/ing-bank/popmon/blob/master/popmon/notebooks/popmon_tutorial_advanced.ipynb
.. |notebook_incremental_data_colab| image:: https://colab.research.google.com/assets/colab-badge.svg
    :alt: Open in Colab
    :target: https://colab.research.google.com/github/ing-bank/popmon/blob/master/popmon/notebooks/popmon_tutorial_incremental_data.ipynb
.. |notebook_reports_colab| image:: https://colab.research.google.com/assets/colab-badge.svg
    :alt: Open in Colab
    :target: https://colab.research.google.com/github/ing-bank/popmon/blob/master/popmon/notebooks/popmon_tutorial_reports.ipynb
.. |downloads| image:: https://pepy.tech/badge/popmon
    :alt: PyPi downloads
    :target: https://pepy.tech/project/popmon

.. _profiles: https://popmon.readthedocs.io/en/latest/profiles.html
.. _comparisons: https://popmon.readthedocs.io/en/latest/comparisons.html

            

29, 2020  | Max Baak, Simon Brugman |\n+------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+-------------------+-------------------------+\n| Popmon - population monitoring made easy                                                       | `Data Science Summit 2020 <https://dssconf.pl/en/>`_                                             | October 16, 2020  | Max Baak                |\n+------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+-------------------+-------------------------+\n| `Population Shift Monitoring Made Easy: the popmon package <https://youtu.be/PgaQpxzT_0g>`_    | `Online Data Science Meetup @ ING WBAA <https://www.meetup.com/nl-NL/Tech-Meetups-ING/events/>`_ | July 8, 2020      | Tomas Sostak            |\n+------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+-------------------+-------------------------+\n| `Popmon: Population Shift Monitoring Made Easy <https://www.youtube.com/watch?v=HE-3YeVYqPY>`_ | `PyData Fest Amsterdam 2020 <https://amsterdam.pydata.org/>`_                                    | June 16, 2020     | Tomas Sostak            |\n+------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+-------------------+-------------------------+\n| Popmon: Population Shift Monitoring Made Easy                                                  | `Amundsen Community Meetup <https://github.com/amundsen-io/amundsen>`_                           | June 4, 2020      | Max Baak                
|\n+------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+-------------------+-------------------------+\n\n\nArticles\n--------\n\n+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------+---------------------------------------------+\n| Title                                                                                                                                                                                             | Date             | Author                                      |\n+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------+---------------------------------------------+\n|`POPMON v1.0.0: The Dataset-Shift Pok\u00e9mon <https://medium.com/wbaa/popmon-v1-0-0-the-dataset-shift-pok%C3%A9mon-7dea9cb49a71>`_                                                                    | Aug 3, 2022      | Pradyot Patil                               |\n+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------+---------------------------------------------+\n|`Monitoring Model Drift with Python <https://medium.com/broadhorizon-cmotions/monitoring-model-drift-with-python-b9e15ca16b18>`_                                                                   | April 16, 2022   | Jeanine Schoonemann                         
|\n+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------+---------------------------------------------+\n|`The Statistics Underlying the Popmon Hood <https://www.theanalyticslab.nl/the-statistics-underlying-the-popmon-hood/>`_                                                                           | April 15, 2022   | Jurriaan Nagelkerke and Jeanine Schoonemann |\n+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------+---------------------------------------------+\n|`popmon: code breakfast session <https://simonbrugman.nl/2021/11/09/popmon-code-breakfast.html>`_                                                                                                  | November 9, 2021 | Simon Brugman                               |\n+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------+---------------------------------------------+\n| `Population Shift Analysis: Monitoring Data Quality with Popmon <https://www.codemotion.com/magazine/dev-hub/big-data-analyst/popmon-data-quality-monitoring/>`_                                  | May 21, 2021     | Vito Gentile                                |\n+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------+---------------------------------------------+\n| `Popmon Open Source Package \u2014 Population Shift Monitoring Made Easy 
<https://medium.com/wbaa/population-monitoring-open-source-1ce3139d8c3a>`_                                                    | May 20, 2020     | Nicole Mpozika                              |\n+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------+---------------------------------------------+\n\n\nSoftware\n--------\n\n- `Kedro-popmon <https://github.com/stephanecollot/kedro-popmon>`_ is a plugin to integrate popmon reporting with kedro. This plugin allows you to automate the process of popmon feature and output stability monitoring. Package created by `Marian Dabrowski <https://www.linkedin.com/in/marian-dabrowski/>`_ and `Stephane Collot <https://github.com/stephanecollot/>`_.\n\nProject contributors\n====================\n\nThis package was authored by ING Analytics Wholesale Banking (INGA WB).\nSpecial thanks to the following people who have contributed to the development of this package: `Ahmet Erdem <https://github.com/aerdem4>`_, `Fabian Jansen <https://github.com/faab5>`_, `Nanne Aben <https://github.com/nanne-aben>`_, Mathieu Grimal.\n\n\nCiting popmon\n=============\nIf ``popmon`` has been relevant in your work, and you would like to acknowledge the project in your publication, we suggest citing the following paper:\n\n* Brugman, S., Sostak, T., Patil, P., Baak, M. *popmon: Analysis Package for Dataset Shift Detection*. Proceedings of the 21st Python in Science Conference. 161-168 (2022). (`link <https://conference.scipy.org/proceedings/scipy2022/popmon.html>`_)\n\n*In BibTeX format:*\n\n.. 
code-block:: bibtex\n\n    @InProceedings{ popmon-proc-scipy-2022,\n      author    = { {S}imon {B}rugman and {T}omas {S}ostak and {P}radyot {P}atil and {M}ax {B}aak },\n      title     = { popmon: {A}nalysis {P}ackage for {D}ataset {S}hift {D}etection },\n      booktitle = { {P}roceedings of the 21st {P}ython in {S}cience {C}onference },\n      pages     = { 161 - 168 },\n      year      = { 2022 },\n      editor    = { {M}eghann {A}garwal and {C}hris {C}alloway and {D}illon {N}iederhut and {D}avid {S}hupe },\n    }\n\n\n\nContact and support\n===================\n\n* Issues & Ideas & Support: https://github.com/ing-bank/popmon/issues\n\nPlease note that INGA WB provides support only on a best-effort basis.\n\nLicense\n=======\nCopyright INGA WB. `popmon` is completely free, open-source and licensed under the `MIT license <https://en.wikipedia.org/wiki/MIT_License>`_.\n\n.. |logo| image:: https://raw.githubusercontent.com/ing-bank/popmon/master/docs/source/assets/popmon-logo.png\n    :alt: POPMON logo\n    :target: https://github.com/ing-bank/popmon\n.. |example| image:: https://raw.githubusercontent.com/ing-bank/popmon/master/docs/source/assets/report_overview.png\n    :alt: Traffic Light Overview\n.. |histograms| image:: https://raw.githubusercontent.com/ing-bank/popmon/master/docs/source/assets/histogram_inspector.png\n    :alt: Histogram inspector\n.. |pipeline| image:: https://raw.githubusercontent.com/ing-bank/popmon/master/docs/source/assets/pipeline.png\n    :alt: Pipeline Visualization\n    :target: https://github.com/ing-bank/popmon/files/7417124/pipeline_amazingpipeline_subgraphs_unversioned.pdf\n.. |build| image:: https://github.com/ing-bank/popmon/workflows/build/badge.svg\n    :alt: Build status\n.. |ruff| image:: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/charliermarsh/ruff/main/assets/badge/v1.json\n    :alt: Ruff\n    :target: https://github.com/charliermarsh/ruff\n.. 
|docs| image:: https://readthedocs.org/projects/popmon/badge/?version=latest\n    :alt: Package docs status\n    :target: https://popmon.readthedocs.io\n.. |release| image:: https://img.shields.io/github/v/release/ing-bank/popmon\n    :alt: Latest GitHub release\n    :target: https://github.com/ing-bank/popmon/releases\n.. |release_date| image:: https://img.shields.io/github/release-date/ing-bank/popmon\n    :alt: GitHub Release Date\n    :target: https://github.com/ing-bank/popmon/releases\n\n.. |notebook_basic_colab| image:: https://colab.research.google.com/assets/colab-badge.svg\n    :alt: Open in Colab\n    :target: https://colab.research.google.com/github/ing-bank/popmon/blob/master/popmon/notebooks/popmon_tutorial_basic.ipynb\n.. |notebook_advanced_colab| image:: https://colab.research.google.com/assets/colab-badge.svg\n    :alt: Open in Colab\n    :target: https://colab.research.google.com/github/ing-bank/popmon/blob/master/popmon/notebooks/popmon_tutorial_advanced.ipynb\n.. |notebook_incremental_data_colab| image:: https://colab.research.google.com/assets/colab-badge.svg\n    :alt: Open in Colab\n    :target: https://colab.research.google.com/github/ing-bank/popmon/blob/master/popmon/notebooks/popmon_tutorial_incremental_data.ipynb\n.. |notebook_reports_colab| image:: https://colab.research.google.com/assets/colab-badge.svg\n    :alt: Open in Colab\n    :target: https://colab.research.google.com/github/ing-bank/popmon/blob/master/popmon/notebooks/popmon_tutorial_reports.ipynb\n.. |downloads| image:: https://pepy.tech/badge/popmon\n    :alt: PyPi downloads\n    :target: https://pepy.tech/project/popmon\n\n.. _profiles: https://popmon.readthedocs.io/en/latest/profiles.html\n.. _comparisons: https://popmon.readthedocs.io/en/latest/comparisons.html\n",
    "bugtrack_url": null,
    "license": "Copyright 2023 ING Analytics Wholesale Banking  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
    "summary": "Monitor the stability of a pandas or spark dataset",
    "version": "1.4.6",
    "project_urls": {
        "repository": "https://github.com/ing-bank/popmon"
    },
    "split_keywords": [
        "pandas",
        "spark",
        "data-science",
        "data-analysis",
        "monitoring",
        "statistics",
        "python",
        "jupyter",
        "ipython"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1d7497953eeddd22d134a7dc84dedc3fecb42a1090937b2231fbe0eb682e3ae4",
                "md5": "6a96dd2859ecda8d4fe1719f77890d66",
                "sha256": "938f930752b1bf05b50e939b2ece16acaa2ea87338600bc11b94d0fa86bd3ee2"
            },
            "downloads": -1,
            "filename": "popmon-1.4.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6a96dd2859ecda8d4fe1719f77890d66",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 2866508,
            "upload_time": "2023-07-18T10:30:53",
            "upload_time_iso_8601": "2023-07-18T10:30:53.785312Z",
            "url": "https://files.pythonhosted.org/packages/1d/74/97953eeddd22d134a7dc84dedc3fecb42a1090937b2231fbe0eb682e3ae4/popmon-1.4.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e2977672be3dfb76f61210fb974e12bd96aa6ae68f378cd8b7c10df273a1ab81",
                "md5": "e8339a65ab6bebf6a61d324e81ef4bc6",
                "sha256": "4d413ba8407549d3a268bbce5ed07d84c86c74712b2d5b7f4518fb710833ea39"
            },
            "downloads": -1,
            "filename": "popmon-1.4.6.tar.gz",
            "has_sig": false,
            "md5_digest": "e8339a65ab6bebf6a61d324e81ef4bc6",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 2809761,
            "upload_time": "2023-07-18T10:30:55",
            "upload_time_iso_8601": "2023-07-18T10:30:55.873867Z",
            "url": "https://files.pythonhosted.org/packages/e2/97/7672be3dfb76f61210fb974e12bd96aa6ae68f378cd8b7c10df273a1ab81/popmon-1.4.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-07-18 10:30:55",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ing-bank",
    "github_project": "popmon",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "popmon"
}
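The README above states that time-axis binning in `bin_specs` must be given in nanoseconds, and builds the `date` entry from `pd.Timedelta("4w").value` and `pd.Timestamp("2015-1-1").value`. As a minimal sketch, assuming timezone-naive timestamps, the snippet below computes the same integers with only the Python standard library. The helper `time_bin_index` is hypothetical and not part of popmon's API; it assumes fixed-width bins anchored at the offset, i.e. `floor((t - offset) / width)`, purely to illustrate how the two numbers relate.

```python
from datetime import datetime, timedelta

# datetime/timedelta resolve to microseconds, so multiplying by 1000
# yields an exact integer nanosecond count.
NS_PER_US = 1_000
EPOCH = datetime(1970, 1, 1)


def timedelta_to_ns(delta: timedelta) -> int:
    """Return the length of a timedelta as integer nanoseconds."""
    # timedelta // timedelta returns an int, so no float rounding occurs.
    return (delta // timedelta(microseconds=1)) * NS_PER_US


def timestamp_to_ns(ts: datetime) -> int:
    """Return a naive datetime as integer nanoseconds since the Unix epoch."""
    return timedelta_to_ns(ts - EPOCH)


# Same values as the README's "date" bin spec, computed without pandas.
bin_width = timedelta_to_ns(timedelta(weeks=4))
bin_offset = timestamp_to_ns(datetime(2015, 1, 1))


def time_bin_index(ts: datetime, width_ns: int, offset_ns: int) -> int:
    """Hypothetical helper: index of the fixed-width time bin containing ts.

    Assumes bins of the form floor((t - offset) / width); this is an
    illustration, not popmon's own implementation.
    """
    # Python's // floors toward negative infinity, so dates before the
    # offset fall into negative bin indices.
    return (timestamp_to_ns(ts) - offset_ns) // width_ns


print(bin_width)   # 2419200000000000
print(bin_offset)  # 1420070400000000000
print(time_bin_index(datetime(2015, 2, 10), bin_width, bin_offset))  # 1
```

Integer floor division keeps the arithmetic exact: epoch timestamps in nanoseconds (around 1.4e18 here) exceed the 2**53 range in which floats represent integers exactly, so converting to float seconds first would lose precision.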
        