# pyrifreg
A Python package for Recentered Influence Function (RIF) regression analysis. Provides tools for analyzing distributional effects in econometrics and data science applications.
## Installation
You can install the package using pip:
```bash
pip install pyrifreg
```
## Features
- Implementation of Recentered Influence Function (RIF) regression
- Support for various distributional statistics (mean, quantiles, variance, Gini, etc.)
- Easy-to-use API for regression analysis
- Integration with pandas and scikit-learn
## Quick Start
```python
import numpy as np
import pandas as pd
from pyrifreg import RIFRegression
# Create sample data
X = np.random.randn(1000, 2)
y = np.random.randn(1000)
# Initialize and fit RIF regression
median_rif = RIFRegression(statistic='quantile', q=0.5)
median_rif.fit(X, y)
# Get regression results
results = median_rif.summary()
print(results)
```
You can find more examples in [example.py](https://github.com/vyasenov/pyrifreg/blob/main/example.py).
## Examples
You can find detailed usage examples in the `examples/` directory.
## Background
### From Conditional to Unconditional Effects
Many regression models focus on *conditional* statistics like:
$$
\mathbb{E}[Y \mid X = x]
$$
or conditional quantiles
$$
Q_\tau(Y \mid X = x).
$$
But policy questions often concern how a variable such as education or income affects the *entire* distribution of an outcome, not just its conditional mean or conditional quantiles. For example:
* How would expanding access to college change the 90th percentile of the wage distribution?
* What is the effect of a tax policy on income inequality or the Gini index?
Instead of looking at changes within subgroups (conditional on $X$), RIF regression helps us estimate how changes in covariates shift the *overall*, or *unconditional*, distribution of $Y$.
Let $F_Y$ be the original distribution of $Y$, and suppose an intervention shifts it to $G_Y$. For a statistic $\nu$ (like the mean, a quantile, or variance), we want to estimate:
$$
\Delta\nu = \nu(G_Y) - \nu(F_Y),
$$
i.e., how that statistic changes when the distribution shifts. RIF regression provides a way to estimate how different variables contribute to such shifts.
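To make $\Delta\nu$ concrete, the NumPy sketch below assumes a hypothetical intervention that adds a constant $c$ to every outcome. The simulated sample `y_before` stands in for $F_Y$ and `y_after` for $G_Y$. Under a pure location shift, the mean and every quantile should move by exactly $c$ while the variance stays the same. The lognormal data and the `delta` helper are illustrative only and are not part of pyrifreg.

```python
import numpy as np

# Simulated outcomes standing in for draws from F_Y (illustrative data)
rng = np.random.default_rng(0)
y_before = rng.lognormal(mean=0.0, sigma=0.5, size=10_000)

# Hypothetical intervention: every outcome increases by c, giving draws from G_Y
c = 0.3
y_after = y_before + c

def delta(stat, before, after):
    """Delta nu = nu(G_Y) - nu(F_Y), evaluated on the two samples."""
    return stat(after) - stat(before)

delta_mean = delta(np.mean, y_before, y_after)
delta_p90 = delta(lambda v: np.quantile(v, 0.9), y_before, y_after)
delta_var = delta(np.var, y_before, y_after)

print(f"change in mean: {delta_mean:.4f}")
print(f"change in p90:  {delta_p90:.4f}")
print(f"change in var:  {delta_var:.2e}")
```

A multiplicative intervention (for example `y_before * 1.1`) would instead change the variance while leaving the Gini index unchanged, which is why the choice of $\nu$ matters.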
### Influence Functions (IF)
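The influence function measures how sensitive a statistic $T$ (the functional $\nu$ above, viewed as a map from distributions to numbers) is to a small perturbation of the underlying distribution. More precisely, it tells us how much an individual observation $y$ influences a statistic such as the mean or a quantile.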
Formally, imagine a slightly perturbed distribution:
$$
F_\varepsilon = (1 - \varepsilon) F + \varepsilon\, \delta_y,
$$
where $\delta_y$ is a point mass at $y$. Then the influence function is:
$$
\mathrm{IF}(y; T, F) = \lim_{\varepsilon \to 0} \frac{T(F_\varepsilon) - T(F)}{\varepsilon}.
$$
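This is the first-order (directional) derivative of $T$ at $F$ in the direction of $\delta_y$: adding a small probability mass $\varepsilon$ at $y$ changes $T$ by approximately $\varepsilon \cdot \mathrm{IF}(y; T, F)$.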
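The limit definition can be checked numerically. The sketch below uses plain NumPy rather than pyrifreg: it treats an equal-weight sample as the distribution $F$, mixes in a point mass of size $\varepsilon$ at $y$, and compares the finite-difference quotient with the known closed-form influence function of the variance, $\mathrm{IF}(y; \sigma^2, F) = (y - \mu)^2 - \sigma^2$. The helper names are hypothetical.

```python
import numpy as np

# Empirical distribution F: equal weights on the sample points
rng = np.random.default_rng(1)
sample = rng.normal(loc=2.0, scale=1.5, size=5_000)
weights = np.full(sample.size, 1.0 / sample.size)

def weighted_variance(values, w):
    """Variance functional T(F) for a discrete distribution with weights w."""
    mu = np.sum(w * values)
    return np.sum(w * (values - mu) ** 2)

def numerical_if(y, values, w, stat, eps=1e-6):
    """(T(F_eps) - T(F)) / eps with F_eps = (1 - eps) F + eps * delta_y."""
    values_eps = np.append(values, y)        # add a support point at y
    w_eps = np.append((1.0 - eps) * w, eps)  # mix in a point mass of size eps at y
    return (stat(values_eps, w_eps) - stat(values, w)) / eps

# Closed form for the variance: IF(y; var, F) = (y - mu)^2 - sigma^2
mu = np.sum(weights * sample)
sigma2 = weighted_variance(sample, weights)
y0 = 4.0
approx = numerical_if(y0, sample, weights, weighted_variance)
exact = (y0 - mu) ** 2 - sigma2
print(f"finite-difference IF: {approx:.4f}, closed form: {exact:.4f}")
```

The finite-difference error here is of order $\varepsilon (y - \mu)^2$, so the two numbers agree to several decimals.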
### Recentered Influence Functions (RIF)
Because the influence function has mean zero under $F$, that is, $\mathbb{E}_F[\mathrm{IF}(Y; T, F)] = 0$, its average carries no information about the level of the statistic. To obtain a variable whose mean equals $T(F)$, we “recenter” it by adding the statistic back:
$$
\mathrm{RIF}(y; T, F) = T(F) + \mathrm{IF}(y; T, F).
$$
Now, the expected value of the RIF is equal to the statistic itself:
$$
\mathbb{E}_F[\mathrm{RIF}(Y; T, F)] = T(F).
$$
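This makes the RIF a natural outcome variable for regression. By the law of iterated expectations, $T(F) = \mathbb{E}\big[\mathbb{E}[\mathrm{RIF}(Y; T, F) \mid X]\big]$, so a model for the conditional expectation of the RIF given $X$ links the distribution of the covariates to the statistic $T$, and hence changes in covariates to changes in $T$.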
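As a sketch of these properties, the code below uses the standard closed-form RIFs of the mean ($\mathrm{RIF} = y$) and of the variance ($\mathrm{RIF} = (y - \mu)^2$), checks that each averages to its statistic, and verifies the iterated-expectations identity with a simulated binary covariate. The data-generating process and helper names are assumptions made for illustration, not pyrifreg code.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
# Hypothetical data: a binary covariate that shifts both the level and the spread of y
x = rng.binomial(1, 0.4, size=n)
y = 1.0 + 0.5 * x + (1.0 + x) * rng.normal(size=n)

def rif_mean(y):
    """RIF of the mean: mu + (y - mu) = y."""
    return y.astype(float)

def rif_variance(y):
    """RIF of the variance: sigma^2 + [(y - mu)^2 - sigma^2] = (y - mu)^2."""
    return (y - y.mean()) ** 2

for name, rif, stat in [("mean", rif_mean, np.mean), ("variance", rif_variance, np.var)]:
    r = rif(y)
    # The sample average of the RIF reproduces the statistic itself
    assert np.isclose(r.mean(), stat(y))
    # Law of iterated expectations: T(F) = sum over g of P(X = g) * E[RIF | X = g]
    via_groups = sum((x == g).mean() * r[x == g].mean() for g in (0, 1))
    print(f"{name}: T(F) = {stat(y):.4f}, E[E[RIF | X]] = {via_groups:.4f}")

# Contrast: averaging the within-group (conditional) variances misses the between-group part
within = sum((x == g).mean() * y[x == g].var() for g in (0, 1))
print(f"average within-group variance = {within:.4f} vs total variance = {np.var(y):.4f}")
```

The last line shows why the recentered function is useful: group-wise averages of the variance RIF recover the *overall* variance, whereas averaging the within-group (conditional) variances omits the between-group component.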
### RIF Regression
RIF regression works in two main steps:
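1. Estimate the target statistic $T(F)$ (e.g., the median or the Gini index) from the sample and evaluate the influence function at each observation.
2. Construct the RIF pseudo-outcome $r_i = \mathrm{RIF}(y_i; T, \hat F)$ for each observation and regress it on the covariates $x_i$ (including a constant) by linear regression, typically OLS: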
$$
r_i = x_i^\top \beta + \varepsilon_i.
$$
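Because $x_i$ includes a constant, the linear projection satisfies $T(F) = \mathbb{E}[X]^\top \beta$. Each coefficient $\beta_j$ can then be interpreted as the approximate marginal effect on the statistic of interest of a small location shift in the distribution of $X_j$, holding the other covariates fixed (Firpo, Fortin & Lemieux, 2009). This is a local, first-order approximation rather than an exact counterfactual.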
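The two steps can be sketched with NumPy alone, here for the variance, where $T(F) + \mathrm{IF}(y; T, F)$ simplifies to $(y - \mu)^2$. The `rif_ols` helper and the simulated data are hypothetical; pyrifreg's own estimator may differ in details such as how the design matrix and standard errors are handled. The sketch also checks two facts that follow from the text above: the RIF of the mean is $y$ itself, so RIF regression for the mean reduces to ordinary OLS, and OLS with an intercept fits sample means exactly, so the fitted model evaluated at $\bar x$ reproduces the sample statistic.

```python
import numpy as np

def rif_ols(X, r):
    """Step 2: OLS of the RIF pseudo-outcome r on [1, X]; returns [intercept, slopes...]."""
    design = np.column_stack([np.ones(len(r)), X])
    beta, *_ = np.linalg.lstsq(design, r, rcond=None)
    return beta

rng = np.random.default_rng(3)
n = 5_000
X = rng.normal(size=(n, 2))
# Hypothetical outcome whose spread grows with the first covariate
y = 1.0 + X @ np.array([0.8, -0.3]) + np.exp(0.5 * X[:, 0]) * rng.normal(size=n)

# Step 1: estimate T(F) (here the variance) and build the pseudo-outcome T + IF
mu_hat, var_hat = y.mean(), y.var()
r_var = var_hat + ((y - mu_hat) ** 2 - var_hat)

beta_var = rif_ols(X, r_var)
beta_mean = rif_ols(X, y)  # the RIF of the mean is y itself, so this is plain OLS

# OLS with an intercept fits the sample mean exactly: intercept + mean(X) @ slopes = T(F_n)
implied_var = beta_var[0] + X.mean(axis=0) @ beta_var[1:]
print("variance RIF coefficients:", np.round(beta_var, 3))
print("mean RIF coefficients:    ", np.round(beta_mean, 3))
print(f"implied variance {implied_var:.4f} vs sample variance {var_hat:.4f}")
```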
### Unconditional Quantile Regression (UQR)
UQR is a special case of RIF regression, where the statistic of interest is an unconditional quantile $Q_\tau(Y)$. For each observation $y_i$, we compute:
$$
r_i = Q_\tau(Y) + \frac{\tau - \mathbf{1}\{y_i \le Q_\tau\}}{f_Y(Q_\tau)},
$$
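where $f_Y(Q_\tau)$ is the density of $Y$ evaluated at the $\tau$-th quantile. In practice both $Q_\tau$ and $f_Y(Q_\tau)$ are replaced by sample estimates, the density typically by a kernel density estimator. Regressing $r_i$ on $X$ then approximates how each covariate shifts the $\tau$-th quantile of the overall (unconditional) outcome distribution.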
This contrasts with conditional quantile regression (Koenker & Bassett, 1978), which models $Q_\tau(Y \mid X)$. That is a different object, and it is often less directly useful for describing population-wide policy effects.
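A minimal NumPy sketch of UQR follows. It assumes a Gaussian kernel density estimate with Silverman's rule-of-thumb bandwidth for $f_Y(Q_\tau)$ and NumPy's default (linear-interpolation) sample quantile. pyrifreg may use a different density estimator or bandwidth, and the helper names `rif_quantile` and `uqr` are hypothetical. The simulated design $Y = X + e$ is chosen so that shifting $X$ by $t$ shifts every quantile of $Y$ by $t$, so the estimated slope should be close to 1 at every $\tau$.

```python
import numpy as np

def gaussian_kde_at(point, y):
    """Gaussian kernel density estimate of f_Y at `point`, Silverman's rule-of-thumb bandwidth."""
    n = y.size
    iqr = np.subtract(*np.quantile(y, [0.75, 0.25]))
    h = 0.9 * min(y.std(ddof=1), iqr / 1.34) * n ** (-0.2)
    u = (point - y) / h
    return np.exp(-0.5 * u**2).sum() / (n * h * np.sqrt(2.0 * np.pi))

def rif_quantile(y, tau):
    """r_i = Q_tau + (tau - 1{y_i <= Q_tau}) / f_Y(Q_tau), with sample estimates plugged in."""
    q = np.quantile(y, tau)
    f_q = gaussian_kde_at(q, y)
    return q + (tau - (y <= q).astype(float)) / f_q

def uqr(X, y, tau):
    """Unconditional quantile regression: OLS of the quantile RIF on [1, X]."""
    design = np.column_stack([np.ones(y.size), X])
    beta, *_ = np.linalg.lstsq(design, rif_quantile(y, tau), rcond=None)
    return beta

rng = np.random.default_rng(4)
n = 20_000
x = rng.normal(size=n)
y = x + rng.normal(size=n)  # shifting x by t shifts every quantile of y by t

for tau in (0.1, 0.5, 0.9):
    print(f"tau = {tau}: slope on x = {uqr(x, y, tau)[1]:.3f}")
```

In this homoskedastic design the conditional quantile regression slope is also 1, so the two approaches coincide; they generally diverge when covariates affect the spread of $Y$ as well as its location.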
### Confidence Intervals
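Because the RIF is itself estimated in a first step (it depends on the estimated statistic and, for quantiles, on an estimated density), the usual OLS standard errors do not account for first-stage estimation error. The bootstrap handles this by repeating both stages on every resample of the data:
1. Estimate the statistic $T$, the influence function, and any needed density estimates.
2. Run the regression and record the coefficients; their spread across resamples gives corrected standard errors and confidence intervals.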
The package includes support for bootstrap inference out of the box.
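The two-stage bootstrap can be sketched as a pairs (nonparametric) bootstrap that resamples rows of $(X, y)$ and recomputes the quantile, the density estimate, the RIF, and the OLS fit on every resample. This is an illustrative NumPy implementation assuming a Gaussian kernel density estimate with Silverman's bandwidth, not pyrifreg's bootstrap routine; the helper names, `n_boot=200`, and the percentile interval are arbitrary choices.

```python
import numpy as np

def gaussian_kde_at(point, y):
    """Gaussian kernel density estimate of f_Y at `point`, Silverman's rule-of-thumb bandwidth."""
    n = y.size
    iqr = np.subtract(*np.quantile(y, [0.75, 0.25]))
    h = 0.9 * min(y.std(ddof=1), iqr / 1.34) * n ** (-0.2)
    u = (point - y) / h
    return np.exp(-0.5 * u**2).sum() / (n * h * np.sqrt(2.0 * np.pi))

def uqr_coefs(X, y, tau):
    """Both stages on one sample: Q_tau, density, RIF, then OLS of the RIF on [1, X]."""
    q = np.quantile(y, tau)
    r = q + (tau - (y <= q).astype(float)) / gaussian_kde_at(q, y)
    design = np.column_stack([np.ones(y.size), X])
    beta, *_ = np.linalg.lstsq(design, r, rcond=None)
    return beta

def bootstrap_uqr(X, y, tau, n_boot=200, seed=0):
    """Pairs bootstrap: resample rows with replacement and redo both stages each time."""
    rng = np.random.default_rng(seed)
    n = y.size
    draws = np.empty((n_boot, X.shape[1] + 1))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = uqr_coefs(X[idx], y[idx], tau)
    return draws

rng = np.random.default_rng(5)
n = 2_000
X = rng.normal(size=(n, 2))
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

tau = 0.75
beta_hat = uqr_coefs(X, y, tau)
draws = bootstrap_uqr(X, y, tau)
se = draws.std(axis=0, ddof=1)
ci_low, ci_high = np.percentile(draws, [2.5, 97.5], axis=0)
for j, name in enumerate(["const", "x1", "x2"]):
    print(f"{name}: {beta_hat[j]:.3f} (se {se[j]:.3f}), 95% CI [{ci_low[j]:.3f}, {ci_high[j]:.3f}]")
```

Because each replication re-estimates $Q_\tau$ and $f_Y(Q_\tau)$, the spread of the draws reflects first-stage uncertainty that a single OLS fit on a fixed RIF would ignore.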
## References
* Firpo, S., Fortin, N. M., & Lemieux, T. (2009). *Unconditional Quantile Regressions*. Econometrica, 77(3), 953–973.
* Koenker, R., & Bassett Jr., G. (1978). *Regression Quantiles*. Econometrica, 46(1), 33–50.
* Rios-Avila, F. (2020). *Recentered influence functions (RIFs) in Stata: RIF regression and RIF decomposition*. The Stata Journal, 20(1), 51–94.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Citation
To cite this package in publications, please use the following BibTeX entry:
```bibtex
@misc{yasenov2025pyrifreg,
author = {Vasco Yasenov},
title = {pyrifreg: Python Tools for Recentered Influence Function (RIF) Regression},
year = {2025},
howpublished = {\url{https://github.com/vyasenov/pyrifreg}},
note = {Version 0.1.0}
}
```