# pyrifreg

- Name: pyrifreg
- Version: 0.1.1
- Home page: https://github.com/vyasenov/pyrifreg
- Summary: A Python package for Recentered Influence Function (RIF) regression
- Upload time: 2025-07-10 23:22:40
- Author: Vasco Yasenov
- Requires Python: >=3.8
- Requirements: numpy (>=1.20.0), pandas (>=1.3.0), scipy (>=1.7.0), scikit-learn (>=1.0.0), statsmodels (>=0.13.0)

A Python package for Recentered Influence Function (RIF) regression analysis. Provides tools for analyzing distributional effects in econometrics and data science applications. 

## Installation

You can install the package using pip:

```bash
pip install pyrifreg
```

## Features

- Implementation of Recentered Influence Function (RIF) regression
- Support for various distributional statistics (mean, quantiles, variance, Gini, etc.)
- Easy-to-use API for regression analysis
- Integration with pandas and scikit-learn


## Quick Start

```python
import numpy as np
from pyrifreg import RIFRegression

# Create reproducible sample data: two covariates and one outcome
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2))
y = rng.standard_normal(1000)

# Initialize and fit RIF regression
median_rif = RIFRegression(statistic='quantile', q=0.5)
median_rif.fit(X, y)

# Get regression results
results = median_rif.summary()
print(results)
```

You can find more examples in [example.py](https://github.com/vyasenov/pyrifreg/blob/main/example.py).

## Examples

You can find detailed usage examples in the `examples/` directory.

## Background

### From Conditional to Unconditional Effects

Many regression models focus on *conditional* statistics like:

$$
\mathbb{E}[Y \mid X = x]
$$

or conditional quantiles

$$
Q_\tau(Y \mid X = x).
$$

But policy questions often require understanding how a variable like education or income influences the *entire* distribution of an outcome, not just its conditional mean or a conditional quantile. For example:

* How would expanding access to college change the 90th percentile of the wage distribution?
* What is the effect of a tax policy on income inequality or the Gini index?

Instead of looking at changes within subgroups (conditional on $X$), RIF regression helps us estimate how changes in covariates shift the *overall*, or *unconditional*, distribution of $Y$.

Let $F_Y$ be the original distribution of $Y$, and suppose an intervention shifts it to $G_Y$. For a statistic $\nu$ (like the mean, a quantile, or variance), we want to estimate:

$$
\Delta\nu = \nu(G_Y) - \nu(F_Y),
$$

i.e., how that statistic changes when the distribution shifts. RIF regression provides a way to estimate how different variables contribute to such shifts.
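To make $\Delta\nu$ concrete, here is a minimal sketch that computes it directly when samples from both $F_Y$ and $G_Y$ are available. The log-normal "wage" data, the 10% uniform raise, and the helper name `delta_stat` are illustrative assumptions, not part of pyrifreg.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical baseline outcome sample drawn from F_Y (log-normal "wages").
y_before = rng.lognormal(mean=3.0, sigma=0.5, size=50_000)
# Hypothetical counterfactual sample from G_Y: every outcome rises by 10%.
y_after = 1.10 * y_before


def delta_stat(stat, before, after):
    """Return nu(G_Y) - nu(F_Y) for a statistic given as a function of a sample."""
    return stat(after) - stat(before)


delta_mean = delta_stat(np.mean, y_before, y_after)
delta_p90 = delta_stat(lambda y: np.quantile(y, 0.9), y_before, y_after)
delta_var = delta_stat(np.var, y_before, y_after)

print(delta_mean, delta_p90, delta_var)
```

In practice the counterfactual sample is not observed, which is why RIF regression approximates such changes from the observed data alone.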

### Influence Functions (IF)

The influence function measures how sensitive a statistic is to a small change in the data. More precisely, it tells us how much an individual observation $y$ influences a statistic like the mean or a quantile.

Formally, imagine a slightly perturbed distribution:

$$
F_\varepsilon = (1 - \varepsilon) F + \varepsilon\, \delta_y,
$$

where $\delta_y$ is a point mass at $y$. Then the influence function is:

$$
\mathrm{IF}(y; T, F) = \lim_{\varepsilon \to 0} \frac{T(F_\varepsilon) - T(F)}{\varepsilon}.
$$

This gives us a first-order approximation of how $y$ affects the statistic $T$.
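The limit above can be checked numerically. The sketch below represents the empirical distribution as a weighted sample, mixes in a small point mass at $y$, and takes the finite difference. It uses the mean as $T$, for which the influence function is known to be $y - \mu$. The function names and the choice of `eps` are assumptions made for illustration.

```python
import numpy as np


def weighted_mean(values, weights):
    """T(F) for the mean, where F puts `weights` on `values`."""
    return np.sum(weights * values) / np.sum(weights)


def numerical_if(stat, sample, y, eps=1e-4):
    """Finite-difference influence of a point mass at y on `stat`."""
    n = sample.size
    base_w = np.full(n, 1.0 / n)
    # Mixture (1 - eps) * F + eps * delta_y, written as a weighted sample.
    values = np.append(sample, y)
    mixed_w = np.append((1.0 - eps) * base_w, eps)
    return (stat(values, mixed_w) - stat(sample, base_w)) / eps


rng = np.random.default_rng(1)
sample = rng.normal(loc=2.0, scale=1.0, size=5_000)
y = 4.5

if_numeric = numerical_if(weighted_mean, sample, y)
if_analytic = y - sample.mean()  # known closed form for the mean
print(if_numeric, if_analytic)
```

For the mean the finite difference matches the closed form for any `eps`, because the mean is linear in the distribution. For nonlinear statistics the match only holds as `eps` becomes small.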

### Recentered Influence Functions (RIF)

Because the average of the influence function is always zero, we can’t use it directly in a regression. To fix this, we “recenter” it by adding the original statistic back:

$$
\mathrm{RIF}(y; T, F) = T(F) + \mathrm{IF}(y; T, F).
$$

Now, the expected value of the RIF is equal to the statistic itself:

$$
\mathbb{E}[\mathrm{RIF}(Y)] = T(F).
$$

This makes it a useful outcome variable for regression, allowing us to relate changes in the statistic $T$ to changes in covariates.
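As a quick check of the recentering property, the sketch below builds the RIF for two statistics with simple closed-form influence functions (the mean and the population variance) and confirms that each RIF averages back to its statistic. The function names are illustrative and are not taken from the pyrifreg API.

```python
import numpy as np


def rif_mean(y):
    """RIF for the mean: mu + (y - mu), which simplifies to y itself."""
    mu = y.mean()
    return mu + (y - mu)


def rif_variance(y):
    """RIF for the variance: sigma^2 + ((y - mu)^2 - sigma^2)."""
    mu = y.mean()
    sigma2 = y.var()  # population variance (ddof=0)
    return sigma2 + ((y - mu) ** 2 - sigma2)


rng = np.random.default_rng(2)
y = rng.gamma(shape=2.0, scale=1.5, size=10_000)

# The sample average of each RIF reproduces the statistic it was built for.
print(rif_mean(y).mean(), y.mean())
print(rif_variance(y).mean(), y.var())
```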

### RIF Regression

RIF regression works in two main steps:

1. Estimate the target statistic $T(F)$ (e.g., the median or the Gini coefficient) and compute the influence value for each observation.
2. Construct the RIF pseudo-outcome for each data point and regress it on $X$ using linear regression:

   $$
   r_i = x_i^\top \beta + \varepsilon_i.
   $$

Each regression coefficient $\beta_j$ can then be interpreted as the approximate effect on the statistic of interest of a small shift in the distribution of $X_j$, holding the other covariates fixed.
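The two steps can be sketched with plain NumPy, using the variance as the target statistic because its RIF has a simple closed form. This is a hand-rolled illustration under assumed simulated data, not the code path pyrifreg uses internally.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
x = rng.normal(size=n)
# Hypothetical data: x shifts both the level and the spread of y.
y = 1.0 + 0.5 * x + np.exp(0.3 * x) * rng.normal(size=n)

# Step 1: estimate the statistic and the influence value of each observation.
mu, sigma2 = y.mean(), y.var()
influence = (y - mu) ** 2 - sigma2
rif = sigma2 + influence  # RIF pseudo-outcome r_i

# Step 2: regress the pseudo-outcome on X (with an intercept) by OLS.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, rif, rcond=None)

print("intercept, slope:", beta)
# With an intercept, the average fitted value equals the statistic itself.
print(X.mean(axis=0) @ beta, sigma2)
```

The last line reflects a general feature of this setup: because the RIF averages to $T(F)$ and OLS with an intercept reproduces the mean of its outcome, the fitted values average to the statistic.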

### Unconditional Quantile Regression (UQR)

UQR is a special case of RIF regression, where the statistic of interest is an unconditional quantile $Q_\tau(Y)$. For each observation $y_i$, we compute:

$$
r_i = Q_\tau(Y) + \frac{\tau - \mathbf{1}\{y_i \le Q_\tau\}}{f_Y(Q_\tau)},
$$

where $f_Y(Q_\tau)$ is the density at the $\tau$-th quantile. Regressing $r_i$ on $X$ tells us how each covariate shifts the $\tau$-th quantile of the overall outcome distribution.
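The formula can be sketched in a few lines of NumPy. The density at the quantile is estimated here with a Gaussian kernel and a Silverman-style rule-of-thumb bandwidth. That choice, the simulated data, and the function names are assumptions for illustration and may differ from what pyrifreg does.

```python
import numpy as np


def gaussian_kde_at(point, sample):
    """Gaussian kernel density estimate at one point (rule-of-thumb bandwidth)."""
    n = sample.size
    q75, q25 = np.quantile(sample, [0.75, 0.25])
    h = 0.9 * min(sample.std(ddof=1), (q75 - q25) / 1.34) * n ** (-0.2)
    z = (point - sample) / h
    return np.mean(np.exp(-0.5 * z**2)) / (h * np.sqrt(2.0 * np.pi))


def rif_quantile(y, tau):
    """RIF pseudo-outcome for the unconditional tau-th quantile."""
    q = np.quantile(y, tau)
    density = gaussian_kde_at(q, y)
    below = (y <= q).astype(float)  # indicator 1{y_i <= Q_tau}
    return q + (tau - below) / density


rng = np.random.default_rng(4)
n = 20_000
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)  # hypothetical linear Gaussian design

tau = 0.9
r = rif_quantile(y, tau)
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, r, rcond=None)
print("UQR intercept, slope at tau=0.9:", beta)
```

In this linear Gaussian design a location shift in `x` moves every quantile of `y` by the same amount, so the estimated slope should come out at roughly 0.5, up to sampling and density-estimation error.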

This is in contrast to conditional quantile regression (Koenker & Bassett, 1978), which examines changes in $Q_\tau(Y \mid X)$, a different and often less intuitive object for understanding broad policy effects.

### Confidence Intervals

Since RIFs are estimated in a first step before regression, the usual OLS standard errors ignore the sampling error from that first step and can be misleading. To correct this, inference proceeds in two stages:

1. Estimate the statistic $T$, the influence function, and any needed density estimates.
2. Run the regression and compute corrected standard errors using bootstrap.

The package includes support for bootstrap inference out of the box.
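To illustrate the idea (this is not the package's own implementation), the sketch below runs a nonparametric pairs bootstrap in which both stages are repeated on every resample, so the first-step estimation error is reflected in the standard errors. The variance is used as the statistic, and the function names, `n_boot`, and the simulated data are assumptions.

```python
import numpy as np


def rif_variance_fit(x, y):
    """Both stages: build the variance RIF, then regress it on [1, x]."""
    mu = y.mean()
    rif = (y - mu) ** 2  # sigma^2 + ((y - mu)^2 - sigma^2)
    X = np.column_stack([np.ones(x.size), x])
    beta, *_ = np.linalg.lstsq(X, rif, rcond=None)
    return beta


def bootstrap_se(x, y, fit, n_boot=200, seed=0):
    """Pairs bootstrap: resample (x_i, y_i) rows and refit both stages."""
    rng = np.random.default_rng(seed)
    n = y.size
    draws = np.empty((n_boot, 2))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        draws[b] = fit(x[idx], y[idx])
    return draws.std(axis=0, ddof=1)


rng = np.random.default_rng(5)
n = 5_000
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + np.exp(0.3 * x) * rng.normal(size=n)

beta_hat = rif_variance_fit(x, y)
se = bootstrap_se(x, y, rif_variance_fit)
print("coefficients:", beta_hat)
print("bootstrap standard errors:", se)
```

Recomputing the RIF inside each resample is the key design choice: bootstrapping only the second-stage regression on a fixed pseudo-outcome would treat the estimated statistic as known.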

## References

* Firpo, S., Fortin, N. M., & Lemieux, T. (2009). *Unconditional Quantile Regressions*. Econometrica, 77(3), 953–973.
* Koenker, R., & Bassett, G. (1978). *Regression Quantiles*. Econometrica, 46(1), 33–50.
* Rios-Avila, F. (2020). *Recentered influence functions (RIFs) in Stata: RIF regression and RIF decomposition*. The Stata Journal, 20(1), 51–94.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Citation

To cite this package in publications, please use the following BibTeX entry:

```bibtex
@misc{yasenov2025pyrifreg,
  author       = {Vasco Yasenov},
  title        = {pyrifreg: Python Tools for Recentered Influence Function (RIF) Regression},
  year         = {2025},
  howpublished = {\url{https://github.com/vyasenov/pyrifreg}},
  note         = {Version 0.1.1}
}
```
