pltstat

- Name: pltstat
- Version: 0.9.7
- Home page: https://github.com/trojanskehesten/pltstat
- Summary: A Python Library for Statistical Data Visualization
- Upload time: 2025-03-03 04:27:19
- Author: Dmitrii Beregovoi
- Requires Python: ~=3.12
- License: BSD 3-Clause License, see LICENSE file
- Keywords: matplotlib, statistics, visualization, dataanalysis
- Requirements: numpy, pandas, matplotlib, seaborn, rpy2, umap-learn, scipy, scikit-learn, phik

# pltstat: A Python Library for Statistical Data Visualization

`pltstat` is a Python library designed to facilitate the visualization of statistical data analysis. This library includes a variety of tools and methods to streamline data exploration, statistical computation, and graphical representation.

---

## Installation

### Requirements

Before installing, make sure that:

1. You are using **Python 3.12**. You can check your Python version by running:

   ```bash
   python --version
   ```

   You can download Python 3.12 from the [official Python website](https://www.python.org/downloads/release/python-3120/).

2. The **R language** is installed on your system, because the `rpy2` library (used in this project) requires it. You can download R from the [official R website](https://cloud.r-project.org/).

    Check the installed version of R:

    ```bash
    R --version
    ```

3. The `R_HOME` environment variable is set correctly and R is accessible from the command line. The following command prints the current value of `R_HOME`, or `None` if it is not set:
   
    ```bash
    python -c "import os; print(os.getenv('R_HOME'))"
    ```

4. If `R_HOME` is not set, configure it for your platform:

- **Windows**:  
  1. Locate the R installation directory (e.g., `C:\Program Files\R\R-4.x.x`).
  2. Add the path to `R_HOME`:  
     - Open "System Properties" > "Environment Variables."
     - Under "System Variables," click "New" or "Edit" and set:
       - **Variable name**: `R_HOME`
       - **Variable value**: Path to your R directory (e.g., `C:\Program Files\R\R-4.x.x`).

- **macOS/Linux**:  
  Add the following line to your shell configuration file (`~/.bashrc`, `~/.zshrc`, or `~/.bash_profile`), adjusting the path to match your R installation (running `R RHOME` prints it):  
  ```bash
  export R_HOME=/usr/lib/R
  ```

### Installation

To install the `pltstat` library, simply run the following command:

```bash
pip install pltstat
```

This installs the library together with its required dependencies, which are listed in the project's `requirements.txt` file.

After installing the package, you can start using `pltstat` by importing the necessary modules in your Python scripts.
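
The prerequisites listed under Requirements can also be checked from Python before importing `pltstat`. The snippet below is a minimal sketch using only the standard library; the function name `check_prerequisites` and the result keys are hypothetical and not part of `pltstat`. It assumes that any Python version from 3.12 upward within the 3.x series is acceptable, which is how the `~=3.12` specifier in the package metadata reads.

```python
import os
import shutil
import sys


def check_prerequisites():
    """Report whether each prerequisite from the Requirements list appears to be met."""
    return {
        # The README asks for Python 3.12; "~=3.12" also admits later 3.x releases.
        "python_3_12_or_newer": sys.version_info[:2] >= (3, 12),
        # True if an executable named "R" can be found on the PATH.
        "r_on_path": shutil.which("R") is not None,
        # True if R_HOME is set to a non-empty value.
        "r_home_set": bool(os.environ.get("R_HOME")),
    }


for name, ok in check_prerequisites().items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```

A `missing` entry does not always mean the import will fail, but it points to the step worth revisiting first.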

### Example

```python
import pandas as pd
from pltstat import twofeats as tf

# Data creation (both lists must have the same length, here 31 values each):
data = {
    "gender": ["male", "female", "female", "male", "male", "female", "female", "male", "male",
               "female", "male", "female", "male", "male", "female", "male", "female", "male",
               "female", "male", "female", "male", "female", "male", "female", "male", "female",
               "female", "male", "male", "male"],
    "age": [22, 20, 17, 16, 19, 17, 11, 29, 24, 12, 22, 20, 19, 16, 11, 29, 24, 20, 16, 22,
            17, 29, 24, 16, 17, 29, 22, 19, 22, 22, 24]
}

df = pd.DataFrame(data)

# Boxplot creation:
tf.boxplot(df, "gender", "age")
```
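
The boxplot above compares a numeric feature across two groups. This README does not state which statistical test `pltstat` applies to such a comparison, so the following is only an illustrative sketch: it uses SciPy's Mann-Whitney U test on the first eight rows of the example data to show how a p-value for a two-group comparison can be computed by hand. The variable names are chosen for this example only.

```python
import pandas as pd
from scipy.stats import mannwhitneyu

# First eight rows of the example data above.
df = pd.DataFrame({
    "gender": ["male", "female", "female", "male", "male", "female", "female", "male"],
    "age": [22, 20, 17, 16, 19, 17, 11, 29],
})

# Split the numeric column by group.
male_age = df.loc[df["gender"] == "male", "age"]
female_age = df.loc[df["gender"] == "female", "age"]

# Two-sided test of whether the two age distributions differ.
stat, p_value = mannwhitneyu(male_age, female_age, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.3f}")
```

With samples this small the p-value is only a rough indication; the point is the shape of the calculation, not the result.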

---

## File Descriptions

### Python Modules

- **[`__init__.py`](__init__.py)**
  - Marks the directory as a Python package. This file allows you to import modules from the `pltstat` package.

- **[`singlefeat.py`](singlefeat.py)**
  - Dedicated to the analysis and visualization of single-variable features, including plotting functions such as pie charts, count plots, and histograms.

- **[`twofeats.py`](twofeats.py)**
  - Provides tools for analyzing interactions between two features. Includes functions for creating crosstabs, computing correlations, and visualizing results using violin plots, boxplots, and distribution box plots. These functions also display p-values and other statistical metrics to summarize relationships between the two features.

- **[`multfeats.py`](multfeats.py)**
  - Provides tools for analyzing relationships between multiple features. Includes visualization functions for analyzing missing data, comparing distributions, and visualizing dimensionality reductions. It also provides methods for creating heatmaps that display correlations and p-values, including Spearman's correlation, Mann-Whitney p-values, and Phik correlations.

- **[`circle.py`](circle.py)**
  - Contains functions and methods related to circular statistical visualizations, such as radar charts or circular histograms.

- **[`cm.py`](cm.py)**
  - Contains custom colormap utilities for visualizations, such as rendering correlation matrices or creating two-colored maps for p-values with a threshold (e.g., alpha).

- **[`corr_methods.py`](corr_methods.py)**
  - Includes methods for calculating correlation matrices and related statistical relationships.

- **[`in_out.py`](in_out.py)**
  - Provides utilities for reading, writing, and preprocessing input and output data files.
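
The correlation heatmaps described for `multfeats.py` are built on pairwise correlation matrices. As a rough illustration of that idea, and not of `pltstat`'s own API, the sketch below computes a Spearman correlation matrix with pandas and draws it with plain matplotlib. The data values and column names are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: income rises with age, visits fall with age.
df = pd.DataFrame({
    "age": [25, 30, 22, 27, 35, 41],
    "income": [30, 42, 28, 35, 50, 61],
    "visits": [9, 7, 10, 8, 4, 3],
})

# Rank-based (Spearman) correlation between every pair of columns.
corr = df.corr(method="spearman")

# Draw the matrix as a heatmap on a fixed [-1, 1] colour scale.
fig, ax = plt.subplots()
image = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(corr.columns)), labels=corr.columns)
ax.set_yticks(range(len(corr.index)), labels=corr.index)
fig.colorbar(image, ax=ax, label="Spearman correlation")
```

Fixing the colour scale to the full correlation range keeps heatmaps comparable across datasets. Call `fig.savefig(...)` or `plt.show()` to view the result.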

### Other Files

- **[`.gitignore`](.gitignore)**
  - Specifies intentionally untracked files to ignore in the repository, such as virtual environments and temporary files.

- **`README.md`**
  - This file provides an overview of the project, including file descriptions and usage instructions.

- **[`requirements.txt`](requirements.txt)**
  - Lists the Python dependencies required to run the library. Install them using:
    ```bash
    pip install -r requirements.txt
    ```

---

## Getting Started

1. Clone the repository:
   ```bash
   git clone https://github.com/trojanskehesten/pltstat.git
   ```

2. Navigate to the project directory:
   ```bash
   cd pltstat
   ```

3. **Python Version**: This library is compatible with [Python 3.12](https://www.python.org/downloads/release/python-3120/). Ensure you have this version installed before running the project.

4. **R Installation**: Ensure that the [R language is installed](https://cloud.r-project.org/) on your system, as the `rpy2` library (used in this project) requires it.

5. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

6. Explore the modules and utilize the library in your projects.
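
After step 5, it can be useful to confirm which dependency versions were actually installed. The sketch below uses the standard library's `importlib.metadata`; the dependency names come from the requirements listed for this package, while the helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as listed in the package requirements.
DEPENDENCIES = [
    "numpy", "pandas", "matplotlib", "seaborn", "rpy2",
    "umap-learn", "scipy", "scikit-learn", "phik",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, installed in installed_versions(DEPENDENCIES).items():
    print(f"{name}: {installed or 'not installed'}")
```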

---

## Usage
Each module in `pltstat` can be imported and used on its own. Import the module you need and call its functions to visualize your statistical data. For example:

```python
import pandas as pd
from pltstat import singlefeat as sf

data = {
    "Age": [25, 30, 22, 27, 35],
    "A/B Test Group": ["A", "B", "A", "B", "A"],
}
df = pd.DataFrame(data)

# Example: Plot a pie chart of the A/B test groups
sf.pie(df["A/B Test Group"])
```
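
A pie chart of a categorical column shows each category's share of the rows. To inspect the same shares numerically, plain pandas is enough; the short sketch below reuses the data from the example above and does not rely on any `pltstat` function.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 30, 22, 27, 35],
    "A/B Test Group": ["A", "B", "A", "B", "A"],
})

# Fraction of rows that fall into each group.
shares = df["A/B Test Group"].value_counts(normalize=True)
print(shares)
```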

---

## Contributing
Contributions are welcome! If you'd like to improve the library or fix issues, please:
1. Fork the repository.
2. Create a new branch.
3. Make your changes and commit them.
4. Submit a pull request.

---

## License
This project is licensed under the BSD 3-Clause License. See the [LICENSE](LICENSE) file for details.

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/trojanskehesten/pltstat",
    "name": "pltstat",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "~=3.12",
    "maintainer_email": null,
    "keywords": "matplotlib, statistics, visualization, dataanalysis",
    "author": "Dmitrii Beregovoi",
    "author_email": "dimaforth@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/b1/31/ae4ba05d7bed24471ecb26a7a648e9e3579f228989b28aa5a19879fd6c10/pltstat-0.9.7.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "BSD 3-Clause License, see LICENSE file",
    "summary": "A Python Library for Statistical Data Visualization",
    "version": "0.9.7",
    "project_urls": {
        "Download": "https://github.com/trojanskehesten/pltstat/archive/refs/tags/v0.9.7.zip",
        "Homepage": "https://github.com/trojanskehesten/pltstat"
    },
    "split_keywords": [
        "matplotlib",
        " statistics",
        " visualization",
        " dataanalysis"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b131ae4ba05d7bed24471ecb26a7a648e9e3579f228989b28aa5a19879fd6c10",
                "md5": "31411f6985bd9dddc863b1422a291fbf",
                "sha256": "0356da8c7d775abe8690a194db658a65bbf80e48e201fad6f1eeac9e84ff8858"
            },
            "downloads": -1,
            "filename": "pltstat-0.9.7.tar.gz",
            "has_sig": false,
            "md5_digest": "31411f6985bd9dddc863b1422a291fbf",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "~=3.12",
            "size": 31781,
            "upload_time": "2025-03-03T04:27:19",
            "upload_time_iso_8601": "2025-03-03T04:27:19.334116Z",
            "url": "https://files.pythonhosted.org/packages/b1/31/ae4ba05d7bed24471ecb26a7a648e9e3579f228989b28aa5a19879fd6c10/pltstat-0.9.7.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-03-03 04:27:19",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "trojanskehesten",
    "github_project": "pltstat",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "numpy",
            "specs": [
                [
                    "~=",
                    "2.0.2"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    "~=",
                    "2.2.3"
                ]
            ]
        },
        {
            "name": "matplotlib",
            "specs": [
                [
                    "~=",
                    "3.10.0"
                ]
            ]
        },
        {
            "name": "seaborn",
            "specs": [
                [
                    "~=",
                    "0.13.2"
                ]
            ]
        },
        {
            "name": "rpy2",
            "specs": [
                [
                    "~=",
                    "3.5.17"
                ]
            ]
        },
        {
            "name": "umap-learn",
            "specs": [
                [
                    "~=",
                    "0.5.7"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    "~=",
                    "1.14.1"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    "~=",
                    "1.6.0"
                ]
            ]
        },
        {
            "name": "phik",
            "specs": [
                [
                    "~=",
                    "0.12.4"
                ]
            ]
        }
    ],
    "lcname": "pltstat"
}
        