# Statistical Analysis Toolkit
Welcome to `statanalysis`, a repository of statistical methods and tools for data analysis enthusiasts. Inspired by my completion of a Coursera certificate in statistics, it turns a broad range of statistical concepts into working implementations: prediction metrics, regression analysis, hypothesis testing, confidence intervals, and the estimation of population parameters and models.
Written in Python, `statanalysis` provides modules and utilities aimed at beginners in statistics, data science, and research. While taking that statistics certification, I chose to solidify my knowledge by implementing the methods myself instead of relying solely on existing modules. I believe there is no better way to understand a statistical formula than to implement it in code, document it thoroughly, and validate the results with tests.
So I rewrote common statistical learning tools and gathered them in this repository, which offers direct access to my implementations and aims for simplicity without compromising accuracy. Furthermore, these implementations are tested against established libraries such as [scipy.stats](https://docs.scipy.org/doc/scipy/reference/stats.html), [statsmodels](https://www.statsmodels.org/stable/index.html), and [scikit-learn](https://scikit-learn.org/stable/modules/classes.html) to check that their results agree with these reference implementations.
Whether you're a novice or an experienced data analyst, `statanalysis` aims to make your statistical analysis simpler and more transparent. Dive in and explore the statistical methods and techniques it offers.
## Features
1. **Utility Functions:**
   - **Module:** `utils_md`
   - **Description:** The `utils_md` module provides a collection of helper functions for various statistical tasks, including data preprocessing, standard deviation estimation, and computation of probabilities and percentiles.
1. **Hypothesis Validation:**
   - **Module:** `hyp_vali_md`
   - **Description:** The `hyp_vali_md` module includes functions for hypothesis validation, such as checking residuals and coefficients and conducting hypothesis tests. Features include:
     - **Constraint Checking:** Functions for verifying constraints, such as checking whether values fall within specific ranges.
     - **Hypothesis Sample Size:** Tools for ensuring that sample sizes meet the minimum required by a given hypothesis test.
1. **Confidence Interval Estimation:**
   - **Module:** `conf_inte_md`
   - **Description:** The `conf_inte_md` module offers methods for estimating confidence intervals for population parameters, such as proportions and means. Features include:
     - **One-sample Proportion:** Functions for estimating confidence intervals for a population proportion based on a single sample.
     - **Two-sample Mean:** Methods for computing confidence intervals for the difference between two population means, for both paired and unpaired data.
1. **Hypothesis Testing:**
   - **Module:** `hyp_testi_md`
   - **Description:** This module provides a suite of functions for hypothesis testing, covering a variety of scenarios:
     - **Testing Population Proportions:** Methods for testing hypotheses about population proportions using z-tests.
     - **Comparing Means:** Functions for testing hypotheses that compare the means of two or more populations, using t-tests and ANOVA.
1. **Model Estimation:**
   - **Module:** `mdl_esti_md`
   - **Description:** The `mdl_esti_md` module houses classes and functions dedicated to model estimation. Notable features include:
     - **Linear Regression:** Implementation of linear regression models, including ordinary least squares (OLS) and robust regression.
     - **Logistic Regression:** Classes for logistic regression analysis, enabling binary classification with probability predictions.
     - **Multiple Regression:** Tools for multiple regression analysis, exploring the relationship between several independent variables and a dependent variable.
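To give a feel for the computations behind the proportion z-test and sample-size features, here is a minimal standalone sketch using only the Python standard library. It is not the `hyp_testi_md` or `hyp_vali_md` API: `check_sample_size` and `one_proportion_ztest` are hypothetical names, and the threshold of 10 expected successes and failures is the common textbook rule of thumb, assumed here rather than taken from the library.

```python
from math import sqrt
from statistics import NormalDist


def check_sample_size(n: int, p0: float, min_count: float = 10) -> bool:
    """Assumed success-failure rule: n*p0 and n*(1 - p0) must both reach min_count."""
    return n * p0 >= min_count and n * (1 - p0) >= min_count


def one_proportion_ztest(successes: int, n: int, p0: float) -> tuple[float, float]:
    """Two-sided z-test of H0: p = p0, with the standard error computed under H0."""
    if not check_sample_size(n, p0):
        raise ValueError("sample too small for the normal approximation")
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p = one_proportion_ztest(successes=60, n=100, p0=0.5)
print(f"z = {z:.3f}, p-value = {p:.4f}")  # prints: z = 2.000, p-value = 0.0455
```

The sample-size check runs before the test because the z-test relies on the normal approximation to the binomial, which is unreliable for small samples.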
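For a single predictor, the OLS estimates behind the linear regression feature have a closed form: the slope is $S_{xy}/S_{xx}$ and the intercept is $\bar{y} - \text{slope}\cdot\bar{x}$. The sketch below is a hypothetical `ols_fit` helper written with the standard library only. It is not the `mdl_esti_md` API and covers just simple (one-predictor) regression.

```python
from statistics import fmean


def ols_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Closed-form OLS estimates (intercept, slope) for y = b0 + b1 * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = fmean(x), fmean(y)
    # Centered cross-product and sum of squares
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


b0, b1 = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # prints: 1.0 2.0 (the data follow y = 1 + 2x exactly)
```

Multiple regression generalizes this to the matrix form $\hat{\beta} = (X^\top X)^{-1} X^\top y$, which in practice is solved with a linear-algebra routine rather than by explicit inversion.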
## Repository Structure
The repository is organized into two main folders:
1. **`statanalysis/` Folder:**
   This folder contains the following modules:
   - **`utils_md`:** Module for utility functions, offering a collection of helper functions for statistical tasks.
   - **`hyp_vali_md`:** Module for hypothesis validation, containing functions for checking residuals and coefficients and conducting hypothesis tests.
   - **`conf_inte_md`:** Module for confidence interval estimation, providing methods for estimating confidence intervals for proportions and means.
   - **`hyp_testi_md`:** Module for hypothesis testing, including functions for conducting hypothesis tests on proportions and means.
   - **`mdl_esti_md`:** Module for model estimation, including classes and functions for linear regression, logistic regression, and multiple regression.
2. **`tests/` Folder:**
   This folder contains tests for all the methods listed above.
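The tests follow a simple pattern: run a hand-written implementation and compare its output with a trusted reference. Below is a minimal pytest-style sketch of that pattern. `sample_std` is a hypothetical helper, and the standard library's `statistics.stdev` stands in as the reference so the sketch runs without extra dependencies. The repository's own suite compares against scipy.stats, statsmodels, and scikit-learn instead.

```python
import math
import statistics


def sample_std(values: list[float]) -> float:
    """Unbiased sample standard deviation (divides by n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def test_sample_std_matches_reference():
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    # The hand-written result must agree with the reference implementation
    assert math.isclose(sample_std(data), statistics.stdev(data), rel_tol=1e-12)


if __name__ == "__main__":
    test_sample_std_matches_reference()
```

Using a relative tolerance instead of exact equality avoids spurious failures when two correct implementations sum floating-point terms in a different order.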
## Usage
To use the statistical analysis features of this library, either clone the repository or install the package from PyPI, depending on your needs.
### **Clone the Repository:**
Clone the repository to your local machine using the following command:
```bash
git clone https://github.com/hermann-web/some-common-statistical-methods
```
### **Install the Library from PyPI:**
Install the library from PyPI using pip:
```bash
pip install stat-analysis
```
Choose the option that best suits your needs and get started with your statistical analysis.
### **Import Modules or Functions:**
In your Python script, import the desired modules or functions using the following syntax:
```python
from statanalysis import utils_md, hyp_vali_md, conf_inte_md, hyp_testi_md, mdl_esti_md
```
### **Perform Statistical Analysis:**
Use the imported functions and classes to perform a wide range of statistical analysis tasks on your data. For example:
```python
from statanalysis import conf_inte_md

# Example: compute a confidence interval for a population proportion
confidence_interval = conf_inte_md.IC_PROPORTION_ONE(sample_size=100, parameter=0.5, confidence=0.95)
```
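As a point of reference, the textbook (Wald) confidence interval for a single proportion is $\hat{p} \pm z^{*}\sqrt{\hat{p}(1-\hat{p})/n}$. The standalone sketch below uses the same inputs as the example above. It assumes that `parameter` is the sample proportion and that `IC_PROPORTION_ONE` follows the Wald formula, neither of which is confirmed here, so the library's output may differ. `wald_proportion_ci` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist


def wald_proportion_ci(p_hat: float, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wald confidence interval for a population proportion (assumed formula)."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be in (0, 1)")
    # Two-sided critical value, about 1.96 for 95% confidence
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = wald_proportion_ci(p_hat=0.5, n=100, confidence=0.95)
print(f"95% CI: ({low:.3f}, {high:.3f})")  # prints: 95% CI: (0.402, 0.598)
```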
Leverage advanced statistical techniques and methodologies provided by the modules to analyze your data effectively.
Additionally, if you prefer to browse documentation in a more structured format, you can refer to the documentation files included in the repository, which provide detailed information about the library's features and usage. There is a [detailed one](./docs/detailled-docu.md) and a [more concise one](./docs/concise-docu.md).
## Additional Information
- The repository includes a comprehensive test suite in the [tests](./tests/) folder that validates the accuracy and consistency of the implemented methods against industry-standard libraries such as scipy.stats, statsmodels, and scikit-learn.
- The package is available on PyPI as `stat-analysis` for easy installation and use in statistical analysis projects.
- For detailed explanations and references, refer to the respective sections in the code files.
- Further insights and explanations on statistical concepts can be found in the provided links.
- For inquiries or assistance regarding the repository, please contact [Hermann Agossou](mailto:hermannagossou7[at]gmail.com).
Raw data
{
"_id": null,
"home_page": "https://github.com/Hermann-web/some-common-statistical-methods",
"name": "stat-analysis",
"maintainer": "Hermann Agossou",
"docs_url": null,
"requires_python": "<4.0,>=3.9",
"maintainer_email": "agossouhermann7@gmail.com",
"keywords": "statistics, data analysis, confidence intervals, hypothesis testing, model estimation, regression, statistical learning",
"author": "Hermann Agossou",
"author_email": "agossouhermann7@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/a4/37/334ee9ed963ec63145f52a1c80698bc8438da0a38854679fa0106d7157b9/stat_analysis-1.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache License",
"summary": "A Python library providing hands on implementation of a collection of common statistical methods for data analysis.",
"version": "1.0.0",
"project_urls": {
"Documentation": "https://github.com/Hermann-web/some-common-statistical-methods/blob/main/docs/detailled-docu.md",
"Homepage": "https://github.com/Hermann-web/some-common-statistical-methods",
"Repository": "https://github.com/Hermann-web/some-common-statistical-methods"
},
"split_keywords": [
"statistics",
" data analysis",
" confidence intervals",
" hypothesis testing",
" model estimation",
" regression",
" statistical learning"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "272edcb82db08a8d0f63bf345ad7af56b1d3658031b8fa4720b6a95fbf139ac7",
"md5": "0d1212491fcb8a73a316e242a6f6c8ce",
"sha256": "7aff9e0128aec649cd5d991b53ad064371913f67416e0b595d57f4543551ec62"
},
"downloads": -1,
"filename": "stat_analysis-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "0d1212491fcb8a73a316e242a6f6c8ce",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.9",
"size": 49165,
"upload_time": "2024-04-04T19:33:41",
"upload_time_iso_8601": "2024-04-04T19:33:41.014590Z",
"url": "https://files.pythonhosted.org/packages/27/2e/dcb82db08a8d0f63bf345ad7af56b1d3658031b8fa4720b6a95fbf139ac7/stat_analysis-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a437334ee9ed963ec63145f52a1c80698bc8438da0a38854679fa0106d7157b9",
"md5": "4aa2d8a52ae9819d31d2dd7d3367797d",
"sha256": "5bae3c5a15d56e7c6d2c23445f4fa23ed7e2d280a51a1e31fe9e9b902dfe2e72"
},
"downloads": -1,
"filename": "stat_analysis-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "4aa2d8a52ae9819d31d2dd7d3367797d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.9",
"size": 40551,
"upload_time": "2024-04-04T19:33:43",
"upload_time_iso_8601": "2024-04-04T19:33:43.389935Z",
"url": "https://files.pythonhosted.org/packages/a4/37/334ee9ed963ec63145f52a1c80698bc8438da0a38854679fa0106d7157b9/stat_analysis-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-04-04 19:33:43",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Hermann-web",
"github_project": "some-common-statistical-methods",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "statsmodels",
"specs": []
},
{
"name": "numpy",
"specs": []
},
{
"name": "pandas",
"specs": []
},
{
"name": "seaborn",
"specs": []
}
],
"lcname": "stat-analysis"
}