pdistmap


| Field | Value |
| --- | --- |
| Name | pdistmap |
| Version | 0.4.0 |
| Home page | https://github.com/rehanguha/pdistmap |
| Summary | This package helps to find the overlap percentage of two probability distributions. |
| Upload time | 2024-12-07 15:41:35 |
| Maintainer | None |
| Docs URL | None |
| Author | Rehan Guha |
| Requires Python | >=3.9 |
| License | Apache-2.0 |
| Keywords | probability, distributions, statistics |
| Requirements | contourpy, cycler, fonttools, importlib-resources, kiwisolver, matplotlib, numpy, packaging, pillow, pyparsing, python-dateutil, scipy, six, zipp |
| Travis-CI | No Travis. |
| Coveralls test coverage | No coveralls. |

# PDistMap

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.14257979.svg)](https://doi.org/10.5281/zenodo.14257979)


This package calculates the overlap between two probability distributions, a measure with applications in both academic and industrial settings. For instance, when a machine learning clustering algorithm is run several times, it may assign a different number or name to the same cluster in each run, making it hard for the end user to map clusters from one run to the next.
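
For reference, the standard overlapping coefficient of two densities $f$ and $g$ is written out below. That the package's result corresponds to exactly this quantity is an assumption, based on the class name `KDEIntersection` and on the example output lying between 0 and 1; the README does not state the formula.

$$
\mathrm{OVL}(f, g) = \int_{-\infty}^{\infty} \min\bigl(f(x),\, g(x)\bigr)\, dx, \qquad 0 \le \mathrm{OVL}(f, g) \le 1
$$

A value of 1 means the two densities coincide, and a value of 0 means they share no probability mass.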

## Example Use Cases

- **Machine Learning Clustering:** When a clustering algorithm is run several times, the cluster identifiers may change between runs, making clusters hard to track and compare. This package helps map clusters across runs by calculating the overlap between the distributions of the data assigned to each cluster. For example, if a data scientist runs k-means several times, the cluster labels may differ in each run; measuring the overlap between clusters from different runs lets them match corresponding clusters and keep their analysis consistent.

- **Anomaly Detection:** The package can compare the distribution of data points under normal and anomalous conditions, helping to identify anomalies and quantify their extent. For instance, in a network security application, the distribution of network traffic under normal conditions can be compared with the distribution during a suspected attack; a low overlap indicates a large deviation and may point to a security breach.

- **Quality Control:** In manufacturing, the package can compare the distribution of measurements from different batches or production runs to check consistency and detect deviations. For example, a quality control engineer can compare the distribution of product dimensions from two production runs to check that both meet the required specifications and to flag deviations that need to be addressed.

- **Market Research:** The package can compare the distribution of survey responses or customer preferences across demographic groups or time periods, giving insight into market trends and changes in consumer behavior. For instance, a market researcher can compare the distribution of customer satisfaction scores from two regions to identify differences and tailor marketing strategies accordingly.

- **Healthcare Analytics:** The package can compare the distribution of patient outcomes or treatment responses across groups, aiding the evaluation of treatment effectiveness and the identification of potential disparities. For example, a healthcare analyst can compare the distribution of recovery times for patients receiving two different treatments to see how strongly the outcomes differ between the treatments.
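
The clustering use case can be sketched roughly as follows. This is a minimal illustration and not the package's implementation: `histogram_overlap` and `match_clusters` are hypothetical helpers, the overlap is approximated with normalized histograms rather than a KDE, and the data is synthetic.

```python
import numpy as np


def histogram_overlap(a, b, bins=20):
    """Approximate the overlap of two samples as the shared area of their
    normalized histograms (a simple stand-in for a KDE-based overlap)."""
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    edges = np.linspace(lo, hi, bins + 1)
    # density=True makes each histogram integrate to 1 over the shared range
    pa, _ = np.histogram(a, bins=edges, density=True)
    pb, _ = np.histogram(b, bins=edges, density=True)
    return float(np.sum(np.minimum(pa, pb) * np.diff(edges)))


def match_clusters(run1, run2):
    """For each cluster label in run1, pick the run2 label with the largest overlap."""
    return {
        label1: max(run2, key=lambda label2: histogram_overlap(values1, run2[label2]))
        for label1, values1 in run1.items()
    }


rng = np.random.default_rng(0)
# Hypothetical one-dimensional feature values per cluster for two clustering runs;
# the second run finds the same groups but labels them differently.
run1 = {0: rng.normal(10, 1, 200), 1: rng.normal(20, 1, 200)}
run2 = {"a": rng.normal(20, 1, 200), "b": rng.normal(10, 1, 200)}

print(match_clusters(run1, run2))  # {0: 'b', 1: 'a'}
```

Picking the best match independently per cluster keeps the sketch short; if two clusters in one run could claim the same partner, a one-to-one assignment step would be needed instead.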

## Installation

```bash
pip install pdistmap
```

## How to use it

### Method 1

```python
import numpy as np
from pdistmap.set import KDEIntersection

# Two one-dimensional samples to compare
A = np.array([25, 40, 70, 65, 69, 75, 80, 85])
B = np.array([25, 40, 70, 65, 69, 75, 80, 85, 81, 90])

# Intersection area of the two distributions
area = KDEIntersection(A, B).intersection_area()
print(area)  # Expected output: 0.8752770150023454

# Pass plot=True to also display the KDE plot
KDEIntersection(A, B).intersection_area(plot=True)
```

![Sample Image](artifact/KDE_Plot.png)
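
To make the idea of a KDE intersection area concrete, here is a minimal NumPy-only sketch. It assumes a Gaussian kernel with Scott's rule bandwidth and trapezoidal integration on a fixed grid; `gaussian_kde_pdf` and `kde_overlap` are hypothetical names, and the package's actual bandwidth, grid, and integration choices are not documented here, so the result need not match the value shown above.

```python
import numpy as np


def gaussian_kde_pdf(sample, grid):
    """Evaluate a Gaussian KDE of `sample` on `grid` (Scott's rule bandwidth)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    bandwidth = sample.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per data point, averaged over the sample
    z = (grid[:, None] - sample[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))


def kde_overlap(a, b, points=2000, pad=3.0):
    """Integrate min(f, g) over a grid that covers both samples plus padding."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    spread = max(a.std(ddof=1), b.std(ddof=1))
    lo = min(a.min(), b.min()) - pad * spread
    hi = max(a.max(), b.max()) + pad * spread
    grid = np.linspace(lo, hi, points)
    shared = np.minimum(gaussian_kde_pdf(a, grid), gaussian_kde_pdf(b, grid))
    # Trapezoidal rule written out explicitly
    return float(np.sum((shared[1:] + shared[:-1]) * np.diff(grid)) / 2)


A = np.array([25, 40, 70, 65, 69, 75, 80, 85])
B = np.array([25, 40, 70, 65, 69, 75, 80, 85, 81, 90])

print(kde_overlap(A, B))  # a value between 0 and 1
```

Comparing a sample with itself should give a value very close to 1, which is a quick sanity check for any implementation of this kind.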


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/rehanguha/pdistmap",
    "name": "pdistmap",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "probability, distributions, statistics",
    "author": "Rehan Guha",
    "author_email": "rehanguha29@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/c0/36/c9f338d2c0cbc839d5c346c920ca232ed38f1938a1cca8442ed1ce353443/pdistmap-0.4.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "This package helps to find the overlap percentage of two probability distributions.",
    "version": "0.4.0",
    "project_urls": {
        "Homepage": "https://github.com/rehanguha/pdistmap",
        "Repository": "https://github.com/rehanguha/pdistmap"
    },
    "split_keywords": [
        "probability",
        " distributions",
        " statistics"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f8478526221e898b881a2ad85977144e4b871a03a0266c69f6bae92cb4cc5a49",
                "md5": "78e02139261c3ee4daed49d7f89238f7",
                "sha256": "5e004c5e70f1d0639eee11c28ecd8d0d89fcda8347bcfa6d1421e08bf36027a5"
            },
            "downloads": -1,
            "filename": "pdistmap-0.4.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "78e02139261c3ee4daed49d7f89238f7",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 11343,
            "upload_time": "2024-12-07T15:41:33",
            "upload_time_iso_8601": "2024-12-07T15:41:33.588452Z",
            "url": "https://files.pythonhosted.org/packages/f8/47/8526221e898b881a2ad85977144e4b871a03a0266c69f6bae92cb4cc5a49/pdistmap-0.4.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c036c9f338d2c0cbc839d5c346c920ca232ed38f1938a1cca8442ed1ce353443",
                "md5": "99d92302d03cee470505935fc4d0cfd0",
                "sha256": "52b880603184800cb4ae85b689c7a8456447c35f1a8cf480b1513ae1b023d97e"
            },
            "downloads": -1,
            "filename": "pdistmap-0.4.0.tar.gz",
            "has_sig": false,
            "md5_digest": "99d92302d03cee470505935fc4d0cfd0",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 10298,
            "upload_time": "2024-12-07T15:41:35",
            "upload_time_iso_8601": "2024-12-07T15:41:35.721776Z",
            "url": "https://files.pythonhosted.org/packages/c0/36/c9f338d2c0cbc839d5c346c920ca232ed38f1938a1cca8442ed1ce353443/pdistmap-0.4.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-07 15:41:35",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "rehanguha",
    "github_project": "pdistmap",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "contourpy",
            "specs": [
                [
                    "==",
                    "1.3.0"
                ]
            ]
        },
        {
            "name": "cycler",
            "specs": [
                [
                    "==",
                    "0.12.1"
                ]
            ]
        },
        {
            "name": "fonttools",
            "specs": [
                [
                    "==",
                    "4.53.1"
                ]
            ]
        },
        {
            "name": "importlib-resources",
            "specs": [
                [
                    "==",
                    "6.4.5"
                ]
            ]
        },
        {
            "name": "kiwisolver",
            "specs": [
                [
                    "==",
                    "1.4.7"
                ]
            ]
        },
        {
            "name": "matplotlib",
            "specs": [
                [
                    "==",
                    "3.9.2"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "==",
                    "2.0.2"
                ]
            ]
        },
        {
            "name": "packaging",
            "specs": [
                [
                    "==",
                    "24.1"
                ]
            ]
        },
        {
            "name": "pillow",
            "specs": [
                [
                    "==",
                    "10.4.0"
                ]
            ]
        },
        {
            "name": "pyparsing",
            "specs": [
                [
                    "==",
                    "3.1.4"
                ]
            ]
        },
        {
            "name": "python-dateutil",
            "specs": [
                [
                    "==",
                    "2.9.0.post0"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    "==",
                    "1.13.1"
                ]
            ]
        },
        {
            "name": "six",
            "specs": [
                [
                    "==",
                    "1.16.0"
                ]
            ]
        },
        {
            "name": "zipp",
            "specs": [
                [
                    "==",
                    "3.20.2"
                ]
            ]
        }
    ],
    "lcname": "pdistmap"
}
        