# HLR - Hierarchical Linear Regression in Python
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7683809.svg)](https://doi.org/10.5281/zenodo.7683809) [![image](https://img.shields.io/pypi/v/HLR.svg)](https://pypi.python.org/pypi/HLR) [![CI testing](https://github.com/teanijarv/HLR/actions/workflows/testing.yml/badge.svg)](https://github.com/teanijarv/HLR/actions/workflows/testing.yml) [![Documentation Status](https://readthedocs.org/projects/hlr-hierarchical-linear-regression/badge/?version=latest)](https://hlr-hierarchical-linear-regression.readthedocs.io/en/latest/?version=latest)
HLR is a simple Python package for running hierarchical linear regression.
## Features
HLR works with pandas dataframes, uses SciPy, statsmodels and pingouin under the hood, runs diagnostic tests to check the regression assumptions, and plots figures with matplotlib and seaborn.
## Installation
HLR is meant to be used with Python 3.9 or above, and has been tested on Python 3.9-3.12.
#### Dependencies
- [NumPy](https://numpy.org/)
- [SciPy](https://www.scipy.org/)
- [Pandas](https://pandas.pydata.org/)
- [statsmodels](https://www.statsmodels.org/)
- [pingouin](https://pingouin-stats.org/)
- [matplotlib](https://matplotlib.org/)
- [seaborn](https://seaborn.pydata.org/)
#### User installation
To install HLR, run this command in your terminal:
`pip install hlr`
This is the preferred method to install HLR, as it will always install the most recent stable release.
If you don’t have [pip](https://pip.pypa.io/) installed, this [Python installation guide](http://docs.python-guide.org/en/latest/starting/installation/) can walk you through the process.
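After installing, a quick check can confirm that the interpreter meets the stated Python 3.9 requirement and that the package is visible to it. This is a minimal sketch using only the standard library; the helper name `check_hlr_install` is made up for this illustration, and `HLR` is assumed to be the distribution name as published on PyPI.

```python
import sys
from importlib import metadata


def check_hlr_install(dist_name="HLR"):
    """Return (python_ok, installed_version); version is None if not installed."""
    # HLR states that it requires Python 3.9 or above
    python_ok = sys.version_info >= (3, 9)
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return python_ok, version


python_ok, version = check_hlr_install()
print(f"Python >= 3.9: {python_ok}")
print(f"HLR version: {version if version else 'not installed'}")
```

If the version prints as "not installed", the package was most likely installed into a different environment than the one running the script.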
## Usage
The example below imports the module, runs a hierarchical linear regression, summarises the results, runs the assumption tests, and plots the diagnostics.
```python
import pandas as pd
from HLR import HierarchicalLinearRegression
# Example dataframe; it contains the columns referenced below
nba = pd.read_csv('NBA_train.csv')
# Define the models for hierarchical regression, including the predictors for each model
X = {1: ['PTS'],
     2: ['PTS', 'ORB'],
     3: ['PTS', 'ORB', 'BLK']}
# Define the outcome variable
y = 'W'
# Initiate the HLR object (missing_data and ols_params are optional parameters)
hreg = HierarchicalLinearRegression(nba, X, y, ols_params=None)
# Generate a summarised report of HLR
hreg.summary()
# Run diagnostics on all the models (displayed output below only shows the first model)
hreg.diagnostics(verbose=True)
# Different plots (see docs for more)
fig1 = hreg.plot_studentized_residuals_vs_fitted()
fig2 = hreg.plot_qq_residuals()
fig3 = hreg.plot_influence()
fig4 = hreg.plot_std_residuals()
fig5 = hreg.plot_histogram_std_residuals()
fig_list = hreg.plot_partial_regression()
```
Output:
| | Model Level | Predictors | N (observations) | DF (residuals) | DF (model) | R-squared | F-value | P-value (F) | SSE | SSTO | MSE (model) | MSE (residuals) | MSE (total) | Beta coefs | P-values (beta coefs) | Failed assumptions (check!) | R-squared change | F-value change | P-value (F change) |
|---|-----:|-------------------------------------:|-----------------:|---------------:|-----------:|----------:|----------:|-------------:|--------------:|---------:|-------------:|----------------:|------------:|--------------------------------------------------:|--------------------------------------------------:|--------------------------------------------------:|-----------------:|---------------:|-------------------:|
| 0 | 1 | [PTS] | 835.0 | 833.0 | 1.0 | 0.089297 | 81.677748 | 1.099996e-18 | 123292.827686 | 135382.0 | 12089.172314 | 148.010597 | 162.328537 | {'Constant': -13.846261266053896, 'points': 0.... | {'Constant': 0.023091997486255577, 'points': 1... | [Homoscedasticity, Normality] | NaN | NaN | NaN |
| 1 | 2 | [PTS, ORB] | 835.0 | 832.0 | 2.0 | 0.168503 | 84.302598 | 4.591961e-34 | 112569.697267 | 135382.0 | 11406.151367 | 135.300117 | 162.328537 | {'Constant': -14.225561767669713, 'points': 0.... | {'Constant': 0.014660145903221372, 'points': 1... | [Normality, Multicollinearity] | 0.079206 | 79.254406 | 3.372595e-18 |
| 2 | 3 | [PTS, ORB, BLK] | 835.0 | 831.0 | 3.0 | 0.210012 | 73.638176 | 3.065838e-42 | 106950.174175 | 135382.0 | 9477.275275 | 128.700571 | 162.328537 | {'Constant': -21.997353037483723, 'points': 0.... | {'Constant': 0.00015712851466562279, 'points':... | [Normality, Multicollinearity, Outliers/Levera... | 0.041509 | 43.663545 | 6.962046e-11 |
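The `R-squared change` and `F-value change` columns compare each model with the one before it. As a rough check, they can be recomputed from the table with the standard F-change formula for nested OLS models. This is a sketch of that textbook formula, not a copy of HLR's internal code, and the function name `f_change` is hypothetical.

```python
def f_change(r2_prev, r2_curr, df_resid_curr, n_added):
    """R-squared change and F-change between two nested OLS models.

    F = (delta_R2 / n_added) / ((1 - r2_curr) / df_resid_curr)
    """
    r2_delta = r2_curr - r2_prev
    f_value = (r2_delta / n_added) / ((1.0 - r2_curr) / df_resid_curr)
    return r2_delta, f_value


# Values copied (rounded) from the summary table above
levels = [
    # (model level, R-squared, DF residuals)
    (1, 0.089297, 833.0),
    (2, 0.168503, 832.0),
    (3, 0.210012, 831.0),
]

for (_, r2_prev, df_prev), (level, r2_curr, df_curr) in zip(levels, levels[1:]):
    n_added = int(df_prev - df_curr)  # predictors added at this level
    r2_delta, f_value = f_change(r2_prev, r2_curr, df_curr, n_added)
    print(f"Level {level}: R2 change = {r2_delta:.6f}, F change = {f_value:.2f}")
```

Because the inputs are rounded, the results agree with the table only to about two decimal places. The matching p-value could be obtained with `scipy.stats.f.sf(f_value, n_added, df_resid_curr)`.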
```
Model Level 1 Diagnostics:
Independence of residuals (Durbin-Watson test):
DW stat: 1.9913212248708367
Passed: True
Linearity (Pearson r):
PTS: {'Pearson r': 0.29882561440469596, 'p-value': 1.099996182226575e-18, 'Passed': True}
Linearity (Rainbow test):
Rainbow Stat: 0.9145095390107386
p-value: 0.8189528030224006
Passed: True
Homoscedasticity (Breusch-Pagan test):
Lagrange Stat: 5.183865793060617
p-value: 0.022797547646224846
Passed: False
Homoscedasticity (Goldfeld-Quandt test):
F-Stat: 1.0462467498084154
p-value: 0.3225733517317874
Passed: True
Multicollinearity (pairwise correlations):
Correlations: {}
Passed: True
Multicollinearity (Variance Inflation Factors):
VIFs: {}
Passed: True
Outliers (extreme standardized residuals):
Indices: []
Passed: True
Outliers (high Cooks distance):
Indices: []
Passed: True
Normality (mean of residuals):
Mean: 4.465782367986833e-14
Passed: True
Normality (Shapiro-Wilk test):
SW Stat: 0.9873111844062805
p-value: 1.2462886616049218e-06
Passed: False
Model Level 2 Diagnostics:
...
```
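For readers unfamiliar with the first check in the output, the Durbin-Watson statistic can be computed by hand from a model's residuals. The sketch below uses only the standard library. The 1.5 to 2.5 "pass" band is a common rule of thumb and an assumption here, not necessarily the threshold HLR applies, and the residual values are made up for illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals (ranges from 0 to 4)."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Made-up residuals, for illustration only
residuals = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, -0.2]
dw = durbin_watson(residuals)

# Assumed rule of thumb: values near 2 suggest no first-order autocorrelation
passed = 1.5 <= dw <= 2.5
print(f"DW stat: {dw:.3f}, Passed: {passed}")
```

In practice the same statistic is available as `statsmodels.stats.stattools.durbin_watson`, which takes the residuals of a fitted model.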
![diagnostic_plot1](https://i.imgur.com/22kFc0F.jpeg) | ![diagnostic_plot2](https://i.imgur.com/j8l6qJs.png)
:-------------------------:|:-------------------------:
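The `X` dictionary in the usage example lists the full predictor set for every level by hand. With many levels, it can instead be built from the blocks of predictors entered at each step. A minimal sketch, where `build_nested_models` is a hypothetical helper and not part of HLR:

```python
from itertools import accumulate


def build_nested_models(blocks):
    """Turn per-level predictor blocks into cumulative model definitions.

    blocks: list of lists, the predictors newly entered at each level.
    Returns {1: [...], 2: [...], ...} where each level keeps all earlier predictors.
    """
    cumulative = accumulate(blocks, lambda acc, block: acc + block)
    return {level: preds for level, preds in enumerate(cumulative, start=1)}


X = build_nested_models([['PTS'], ['ORB'], ['BLK']])
print(X)
# {1: ['PTS'], 2: ['PTS', 'ORB'], 3: ['PTS', 'ORB', 'BLK']}
```

The resulting dictionary has the same shape as the hand-written `X` in the usage example, so it can be passed in the same way.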
#### Documentation (WIP)
A more comprehensive overview of how to use HLR is available in the documentation:
<https://hlr-hierarchical-linear-regression.readthedocs.io>
## Citation
Please cite the package in your work using its Zenodo DOI.
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7683809.svg)](https://doi.org/10.5281/zenodo.7683809)
#### Example
Anijärv, T. E., Mitchell, J. and Boyle, R. (2024) ‘teanijarv/HLR: v0.2.3’. Zenodo. https://doi.org/10.5281/zenodo.7683808
```
@software{toomas_erik_anijarv_2024_7683808,
author = {Toomas Erik Anijärv and Jules Mitchell and Rory Boyle},
title = {teanijarv/HLR: v0.2.3},
month = mar,
year = 2024,
publisher = {Zenodo},
version = {v0.2.3},
doi = {10.5281/zenodo.7683808},
url = {https://doi.org/10.5281/zenodo.7683808}
}
```
## Development
The HLR package was created and is maintained by [Toomas Erik Anijärv](https://www.toomaserikanijarv.com). It is updated in spare time, so contributions are more than welcome!
This program is provided with no warranty of any kind and is still under development. However, the code has been checked and validated against multiple equivalent analyses conducted in SPSS.
#### To-do
Contributions from someone with more packaging experience, particularly on testing and the deployment process, would be very welcome, as would help with writing documentation.
- Make the dict values within the dataframe easier to read
- Add t-statistics for coefficients
- Add a regression type option (e.g. for logistic regression)
#### Contributors
- [Toomas Erik Anijärv](https://github.com/teanijarv)
- [Rory Boyle](https://github.com/rorytboyle)
- [Jules Mitchell](https://github.com/JulesMitchell)
- [Cate Scanlon](https://github.com/catescanlon)
=======
History
=======
0.1.0 (2023-02-24)
------------------
* First release on PyPI.
0.1.4 (2023-03-09)
------------------
* Fixed pairwise correlations threshold for multicollinearity assumption testing (0.3 -> 0.7)
* Fixed the figure size of partial regression plots
* Added titles to diagnostic plots
* Fixed the VIF to match with SPSS output by adding the constant to X
0.1.5 (2023-04-06)
------------------
* Added standardised beta coefficients to model output
* Added partial and semi-partial correlations (unique variance) to model output
* Fixed F-change degrees of freedom calculation
* Fixed F-change p-value calculation
0.2.0 (2024-03-02)
------------------
* Overall project restructuring for optimisation
0.2.1 (2024-03-03)
------------------
* Added an option to modify the OLS parameters used in the HLR
0.2.2 (2024-03-03)
------------------
* Updated documentation
0.2.3 (2024-03-07)
------------------
* Plotting functions now return a matplotlib figure object
=======
License
=======
GNU General Public License v3