Pandas Diff
===========

:Name: pandas-diff
:Version: 1.4.7
:Summary: Python utility to extract differences between two pandas dataframes.
:Home page: https://github.com/jaimevalero/pandas_diff
:Author: Jaime Valero
:License: MIT license
:Requires Python: >=3.6
:Keywords: pandas_diff
:Upload time: 2023-09-17 08:51:50

|CodeFactor| |Python 3|

Installation
------------

Install pandas_diff with pip:

.. code:: bash

   pip install pandas_diff

Usage/Examples
--------------

.. code:: python

   import pandas as pd
   import pandas_diff as pd_diff

   # Create two example dataframes
   df_infinity_war = pd.DataFrame([
       {"hero": "hulk", "power": "strength"},
       {"hero": "black_widow", "power": "spy"},
       {"hero": "thor", "hammers": 0},
       {"hero": "thor", "hammers": 1},
   ])
   df_endgame = pd.DataFrame([
       {"hero": "hulk", "power": "smart"},
       {"hero": "captain marvel", "power": "strength"},
       {"hero": "thor", "hammers": 2},
   ])

   # Get differences, using the key "hero"
   df = pd_diff.get_diffs(df_infinity_war, df_endgame, "hero")

   df

   #    operation  object_keys   object_values                                        object_json  attribute_changed  old_value  new_value
   # 0     create       [hero]  captain marvel  {'hero': 'captain marvel', 'power': 'strength'...                NaN        NaN        NaN
   # 1     delete       [hero]     black_widow  {'hero': 'black_widow', 'power': 'spy', 'hamme...                NaN        NaN        NaN
   # 2     modify       [hero]            thor     {'hero': 'thor', 'power': nan, 'hammers': 2.0}            hammers          1          2
   # 3     modify       [hero]            hulk  {'hero': 'hulk', 'power': 'smart', 'hammers': ...              power   strength      smart
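
The comparison behind the table above can be pictured with a small
pure-Python sketch. It is not the pandas_diff implementation:
``keyed_diff`` and ``make_record`` are hypothetical names, the record
layout only mirrors some of the columns shown in the output, and the
sketch assumes that when a key is repeated (as ``thor`` is in the first
dataframe) the last row wins.

.. code:: python

   def make_record(operation, value, attribute=None, old=None, new=None):
       """Build one change record (hypothetical layout)."""
       return {
           "operation": operation,
           "object_values": value,
           "attribute_changed": attribute,
           "old_value": old,
           "new_value": new,
       }


   def keyed_diff(old_rows, new_rows, key):
       """Compare two lists of dicts by `key` and return change records."""
       # Index rows by key; a later duplicate overwrites an earlier one (assumption).
       old = {row[key]: row for row in old_rows}
       new = {row[key]: row for row in new_rows}
       changes = []

       # Keys only in the new data were created.
       for value in sorted(new.keys() - old.keys()):
           changes.append(make_record("create", value))
       # Keys only in the old data were deleted.
       for value in sorted(old.keys() - new.keys()):
           changes.append(make_record("delete", value))
       # Keys in both: emit one record per attribute whose value differs.
       for value in sorted(old.keys() & new.keys()):
           for attribute in sorted(set(old[value]) | set(new[value])):
               before = old[value].get(attribute)
               after = new[value].get(attribute)
               if before != after:
                   changes.append(make_record("modify", value, attribute, before, after))
       return changes


   infinity_war = [
       {"hero": "hulk", "power": "strength"},
       {"hero": "black_widow", "power": "spy"},
       {"hero": "thor", "hammers": 0},
       {"hero": "thor", "hammers": 1},
   ]
   endgame = [
       {"hero": "hulk", "power": "smart"},
       {"hero": "captain marvel", "power": "strength"},
       {"hero": "thor", "hammers": 2},
   ]

   changes = keyed_diff(infinity_war, endgame, "hero")
   for change in changes:
       print(change)

The sketch sorts keys to keep its output deterministic, so the row order
may differ from what pandas_diff returns.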

Why pandas_diff? Use cases
--------------------------

Migrating from batch to an event driven architecture
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

At work, we run many data pipelines that pull information from external
platforms (Active Directory, GitHub, Jira). Each run loads the new data
by replacing the entire table.

With pandas_diff we detect how the infrastructure changes between
executions and stream those change events into a Kafka cluster, so
other teams can subscribe to the events they care about. Also, by adding
a pandas_diff step to the master pipeline, every item in our project has
its life-cycle events tracked.
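
A minimal sketch of the "diff rows to events" step, using only the
standard library. The rows below are hand-written dicts that mirror the
columns of the example output; with a real result they could be obtained
with ``df.to_dict("records")``. The function name
``diff_rows_to_events``, the ``<source>.<operation>`` topic naming, and
the payload layout are assumptions for illustration, and publishing to
Kafka is deliberately left out.

.. code:: python

   import json


   def diff_rows_to_events(rows, source):
       """Turn diff rows into (topic, JSON payload) pairs ready for publishing."""
       events = []
       for row in rows:
           # Hypothetical topic scheme: one topic per source and operation.
           topic = f"{source}.{row['operation']}"
           payload = json.dumps(
               {
                   "keys": row["object_keys"],
                   "object": row["object_values"],
                   "attribute": row["attribute_changed"],
                   "old_value": row["old_value"],
                   "new_value": row["new_value"],
               },
               sort_keys=True,
           )
           events.append((topic, payload))
       return events


   rows = [
       {"operation": "create", "object_keys": ["hero"], "object_values": "captain marvel",
        "attribute_changed": None, "old_value": None, "new_value": None},
       {"operation": "modify", "object_keys": ["hero"], "object_values": "hulk",
        "attribute_changed": "power", "old_value": "strength", "new_value": "smart"},
   ]

   events = diff_rows_to_events(rows, source="heroes")
   for topic, payload in events:
       print(topic, payload)

Routing each operation to its own topic is one possible design; it lets
a consumer subscribe only to, say, deletions.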

Events log
~~~~~~~~~~

For every item in a table, pandas_diff gives you an event log that you
can use to audit how resources are being consumed.
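
One way to keep such a log, sketched with the standard library only:
each run appends its diff rows, stamped with the run time, to a JSON
Lines file, and the history of a single item is read back by filtering
on ``object_values``. The helper names ``append_to_log`` and
``history_of``, the ``run_time`` field, and the sample dates are
assumptions; an in-memory ``io.StringIO`` stands in for a real file.

.. code:: python

   import io
   import json
   from datetime import datetime, timezone


   def append_to_log(log_file, rows, run_time):
       """Append one JSON line per diff row, stamped with the run time."""
       log_file.seek(0, io.SEEK_END)  # always write at the end of the log
       for row in rows:
           entry = {"run_time": run_time.isoformat(), **row}
           log_file.write(json.dumps(entry) + "\n")


   def history_of(log_file, object_value):
       """Return every logged event for one item, oldest first."""
       log_file.seek(0)
       entries = (json.loads(line) for line in log_file)
       return [entry for entry in entries if entry["object_values"] == object_value]


   log = io.StringIO()  # stand-in for open("events.jsonl", "a+")

   # First run: thor appears.
   append_to_log(
       log,
       [{"operation": "create", "object_values": "thor",
         "attribute_changed": None, "old_value": None, "new_value": None}],
       datetime(2023, 9, 1, tzinfo=timezone.utc),
   )
   # Second run: thor and hulk change.
   append_to_log(
       log,
       [{"operation": "modify", "object_values": "thor",
         "attribute_changed": "hammers", "old_value": 1, "new_value": 2},
        {"operation": "modify", "object_values": "hulk",
         "attribute_changed": "power", "old_value": "strength", "new_value": "smart"}],
       datetime(2023, 9, 17, tzinfo=timezone.utc),
   )

   thor_history = history_of(log, "thor")
   for entry in thor_history:
       print(entry["run_time"], entry["operation"], entry["attribute_changed"])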

Reconciliation
~~~~~~~~~~~~~~

Reconcile a data source against the source of truth. For example, you
have a CMDB that holds information about virtual machines. Because there
are several ways to create those VMs, you use pandas_diff to replicate
the state of the infrastructure to the CMDB.
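
The reconciliation step can be sketched as replaying diff rows onto the
target. In the sketch below the CMDB is modelled as a plain dict keyed
by VM name, and the rows are hand-written in the shape of the example
output. It assumes the diff was computed with the CMDB as the first
(old) dataframe and the real infrastructure as the second (new) one, and
that ``object_json`` holds the full new record; ``apply_diff`` and the
VM names are hypothetical.

.. code:: python

   def apply_diff(target, rows):
       """Apply create/delete/modify rows to `target`, a dict keyed by item name."""
       for row in rows:
           name = row["object_values"]
           if row["operation"] == "create":
               # New item: copy the whole record into the target.
               target[name] = dict(row["object_json"])
           elif row["operation"] == "delete":
               # Item no longer exists at the source: drop it if present.
               target.pop(name, None)
           elif row["operation"] == "modify":
               # Single attribute changed: update just that attribute.
               target.setdefault(name, {})[row["attribute_changed"]] = row["new_value"]
       return target


   cmdb = {
       "vm-01": {"name": "vm-01", "cpus": 2},
       "vm-02": {"name": "vm-02", "cpus": 2},
   }
   rows = [
       {"operation": "create", "object_values": "vm-03",
        "object_json": {"name": "vm-03", "cpus": 8},
        "attribute_changed": None, "old_value": None, "new_value": None},
       {"operation": "delete", "object_values": "vm-01",
        "object_json": {"name": "vm-01", "cpus": 2},
        "attribute_changed": None, "old_value": None, "new_value": None},
       {"operation": "modify", "object_values": "vm-02",
        "object_json": {"name": "vm-02", "cpus": 4},
        "attribute_changed": "cpus", "old_value": 2, "new_value": 4},
   ]

   apply_diff(cmdb, rows)
   print(cmdb)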

Features
--------

-  Filtering of columns

Roadmap
-------

-  Support for a standalone app

Documentation
-------------

`Documentation <https://pandas-diff.readthedocs.io/en/latest/>`__

.. |CodeFactor| image:: https://www.codefactor.io/repository/github/jaimevalero/pandas_diff/badge
   :target: https://www.codefactor.io/repository/github/jaimevalero/pandas_diff
.. |Python 3| image:: https://pyup.io/repos/github/jaimevalero/pandas_diff/python-3-shield.svg
   :target: https://pyup.io/repos/github/jaimevalero/pandas_diff/




History
-------

0.7.18 (2021-12-05)
~~~~~~~~~~~~~~~~~~~

-  Add codacy badge

0.7.19 (2021-12-05)
~~~~~~~~~~~~~~~~~~~

-  Feat filter column

0.7.20 (2021-12-05)
~~~~~~~~~~~~~~~~~~~

-  Feat filter column

0.7.21 (2021-12-05)
~~~~~~~~~~~~~~~~~~~

-  Add filter fest

0.7.22 (2021-12-06)
~~~~~~~~~~~~~~~~~~~

-  Add condition that keys exist in the dataframes

1.1.0 (2021-12-06)
~~~~~~~~~~~~~~~~~~

-  Add condition that keys exist in the dataframes

1.2.0 (2021-12-06)
~~~~~~~~~~~~~~~~~~

-  Improve doc

1.3.0 (2021-12-06)
~~~~~~~~~~~~~~~~~~

-  Remove workflows

1.4.0 (2021-12-06)
~~~~~~~~~~~~~~~~~~

-  Remove workflows

1.4.0 (2023-09-01)
~~~~~~~~~~~~~~~~~~

-  Improve doc

1.4.1 (2023-09-01)
~~~~~~~~~~~~~~~~~~

-  Improve doc

1.4.2 (2023-09-17)
~~~~~~~~~~~~~~~~~~

-  Bugfix version string

1.4.3 (2023-09-17)
~~~~~~~~~~~~~~~~~~

-  Bugfix version tag

1.4.4 (2023-09-17)
~~~~~~~~~~~~~~~~~~

-  Bugfix version tag

1.4.5 (2023-09-17)
~~~~~~~~~~~~~~~~~~

-  Bugfix history string

1.4.6 (2023-09-17)
~~~~~~~~~~~~~~~~~~

-  Bugfix history string

1.4.7 (2023-09-17)
~~~~~~~~~~~~~~~~~~

-  Bugfix release description




            
