# PyRegHDFE
> **Python implementation of Stata's `reghdfe` for high-dimensional fixed effects regression**
PyRegHDFE is a fast and efficient Python package that replicates the functionality of Stata's popular `reghdfe` command. It provides high-dimensional fixed effects estimation, cluster-robust standard errors, and seamless integration with pandas DataFrames.
## 🚀 Quick Installation
```bash
pip install pyreghdfe
```
## 📖 Quick Start
```python
import pandas as pd
import numpy as np
from pyreghdfe import reghdfe
# Create sample data
np.random.seed(42)
n = 1000
data = pd.DataFrame({
    'wage': np.random.normal(10, 2, n),
    'experience': np.random.normal(5, 2, n),
    'education': np.random.normal(12, 3, n),
    'firm_id': np.random.choice(range(100), n),
    'year': np.random.choice(range(2010, 2020), n)
})
# Run regression with firm fixed effects
result = reghdfe(
    data=data,
    y='wage',
    x=['experience', 'education'],
    fe=['firm_id']
)
# Display results
print(result.summary())
```
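To make "absorbing" a fixed effect concrete, here is a minimal NumPy sketch of the within transformation for a single fixed effect. It is an illustration of the idea only, not PyRegHDFE's internal code; the helper `demean_by_group` and the simulated data are assumptions made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_groups = 500, 20
group = rng.integers(0, n_groups, size=n)
alpha = rng.normal(0, 3, size=n_groups)        # group-specific intercepts
x = rng.normal(size=n) + 0.5 * alpha[group]    # regressor correlated with the group effect
y = 2.0 * x + alpha[group] + rng.normal(0, 0.1, size=n)

def demean_by_group(v, g, n_groups):
    """Subtract each group's mean from the observations in that group."""
    sums = np.bincount(g, weights=v, minlength=n_groups)
    counts = np.bincount(g, minlength=n_groups)
    means = sums / np.maximum(counts, 1)       # guard against empty groups
    return v - means[g]

# Within estimator: OLS on group-demeaned data
x_w = demean_by_group(x, group, n_groups)
y_w = demean_by_group(y, group, n_groups)
beta_within = (x_w @ y_w) / (x_w @ x_w)

# Same slope from the explicit dummy-variable regression
dummies = np.eye(n_groups)[group]
beta_dummy = np.linalg.lstsq(np.column_stack([x, dummies]), y, rcond=None)[0][0]

print(beta_within, beta_dummy)
```

Both estimates agree up to floating-point error, but the demeaned version never builds the dummy matrix, which is what makes the approach practical when there are many groups.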
## 📋 Key Features
- ✅ **High-dimensional fixed effects** - Efficiently absorb multiple fixed effect dimensions
- ✅ **Cluster-robust standard errors** - Support for one-way and multi-way clustering
- ✅ **Weighted regression** - Handle sampling weights and frequency weights
- ✅ **Singleton dropping** - Automatically handle singleton groups
- ✅ **Fast computation** - Optimized algorithms for large datasets
- ✅ **Stata compatibility** - Results match Stata's `reghdfe` command
- ✅ **Pandas integration** - Seamless DataFrame compatibility
- ✅ **Flexible output** - Rich statistical results and summary tables
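As an illustration of the singleton-dropping feature, the sketch below removes observations that are alone in their group, repeating until no singletons remain (dropping a row in one dimension can create a new singleton in another). The function `drop_singletons` here is a hypothetical stand-alone helper written for this example, not the package's implementation.

```python
import numpy as np

def drop_singletons(fe_columns):
    """Return a boolean mask marking the rows that survive singleton dropping."""
    keep = np.ones(len(fe_columns[0]), dtype=bool)
    changed = True
    while changed:
        changed = False
        for col in fe_columns:
            col = np.asarray(col)
            # Count group sizes among the rows that are still kept
            _, inverse, counts = np.unique(
                col[keep], return_inverse=True, return_counts=True
            )
            singleton = counts[inverse] == 1
            if singleton.any():
                kept_rows = np.flatnonzero(keep)
                keep[kept_rows[singleton]] = False
                changed = True
    return keep

firm = np.array([1, 1, 1, 2, 3, 3, 3])
year = np.array([2010, 2011, 2011, 2012, 2012, 2010, 2011])
mask = drop_singletons([firm, year])
print(mask)  # → [ True  True  True False False  True  True]
```

Row 3 is dropped because firm 2 appears once; that leaves year 2012 with a single observation, so row 4 is dropped on the same pass.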
## 🔧 Usage Examples
### 1. Multiple Fixed Effects
```python
# Regression with firm and year fixed effects
result = reghdfe(
    data=data,
    y='wage',
    x=['experience', 'education'],
    fe=['firm_id', 'year']  # Multiple dimensions
)
print(result.summary())
```
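With more than one fixed-effect dimension, a single pass of demeaning is no longer enough. One classical approach is alternating projections: demean by each dimension in turn until the variable stops changing. The sketch below assumes that technique for illustration; PyRegHDFE's actual absorption is delegated to its solver options, and the names `group_demean` and `absorb` are hypothetical.

```python
import numpy as np

def group_demean(v, codes):
    """Subtract the group mean from every observation in the group."""
    counts = np.bincount(codes)
    sums = np.bincount(codes, weights=v)
    return v - (sums / np.maximum(counts, 1))[codes]

def absorb(v, fe_codes, tol=1e-10, max_iter=10_000):
    """Alternately demean by each fixed effect until a sweep changes nothing."""
    v = np.asarray(v, dtype=float).copy()
    for _ in range(max_iter):
        previous = v
        for codes in fe_codes:
            v = group_demean(v, codes)
        if np.max(np.abs(v - previous)) < tol:
            break
    return v

rng = np.random.default_rng(1)
n = 1000
firm = rng.integers(0, 30, size=n)
year = rng.integers(0, 10, size=n)
firm_effect = rng.normal(size=30)
year_effect = rng.normal(size=10)
x = rng.normal(size=n) + 0.3 * firm_effect[firm]
y = 1.5 * x + firm_effect[firm] + year_effect[year] + rng.normal(0, 0.5, size=n)

x_t = absorb(x, [firm, year])
y_t = absorb(y, [firm, year])
beta = (x_t @ y_t) / (x_t @ x_t)

# Reference: regression on x plus both sets of dummies
X_full = np.column_stack([x, np.eye(30)[firm], np.eye(10)[year]])
beta_dummy = np.linalg.lstsq(X_full, y, rcond=None)[0][0]
print(beta, beta_dummy)
```

The two slopes should agree closely; the remaining gap is governed by the stopping tolerance.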
### 2. Cluster-Robust Standard Errors
```python
# One-way clustering
result = reghdfe(
    data=data,
    y='wage',
    x=['experience', 'education'],
    fe=['firm_id'],
    cluster=['firm_id']  # Cluster by firm
)
# Two-way clustering
result = reghdfe(
    data=data,
    y='wage',
    x=['experience', 'education'],
    fe=['firm_id'],
    cluster=['firm_id', 'year']  # Cluster by firm and year
)
```
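For readers who want to see what one-way clustering computes, here is a minimal sketch of the cluster-robust "sandwich" estimator for plain OLS. The small-sample adjustment used below, `G/(G-1) * (n-1)/(n-k)`, is an assumption borrowed from common practice; the degrees-of-freedom correction PyRegHDFE applies once fixed effects are absorbed may differ. `cluster_robust_se` is a hypothetical helper.

```python
import numpy as np

def cluster_robust_se(X, resid, clusters):
    """One-way cluster-robust standard errors for an OLS fit."""
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    labels, codes = np.unique(clusters, return_inverse=True)
    G = len(labels)
    # Sum the score contributions x_i * e_i within each cluster
    scores = np.zeros((G, k))
    np.add.at(scores, codes, X * resid[:, None])
    meat = scores.T @ scores
    adjustment = (G / (G - 1)) * ((n - 1) / (n - k))
    cov = adjustment * bread @ meat @ bread
    return np.sqrt(np.diag(cov))

rng = np.random.default_rng(3)
n = 300
clusters = rng.integers(0, 30, size=n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

se_cluster = cluster_robust_se(X, resid, clusters)
se_own = cluster_robust_se(X, resid, np.arange(n))  # every row its own cluster
print(se_cluster, se_own)
```

When every observation is its own cluster, the formula collapses to the heteroskedasticity-robust (HC1) estimator. Two-way clustering is commonly built from the same pieces by adding the two one-way covariance matrices and subtracting the one clustered on the intersection of both variables.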
### 3. Weighted Regression
```python
# Add weights to your data
data['weight'] = np.random.uniform(0.5, 2.0, len(data))
# Run weighted regression
result = reghdfe(
    data=data,
    y='wage',
    x=['experience', 'education'],
    fe=['firm_id'],
    weights='weight'
)
```
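The point estimates of a weighted regression can be reproduced by hand, which is a useful sanity check. The sketch below (plain NumPy, no fixed effects, with a hypothetical helper `wls`) shows that solving the weighted normal equations matches OLS on data scaled by the square root of the weights, and that integer frequency weights give the same coefficients as physically repeating rows.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares via the weighted normal equations."""
    return np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

rng = np.random.default_rng(4)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

# Sampling-style weights: equivalent to OLS on sqrt(w)-scaled data
w = rng.uniform(0.5, 2.0, size=n)
beta_wls = wls(X, y, w)
sw = np.sqrt(w)
beta_scaled = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]

# Frequency weights: equivalent to repeating each row `freq` times
freq = rng.integers(1, 4, size=n)
beta_freq = wls(X, y, freq.astype(float))
beta_repeated = np.linalg.lstsq(
    np.repeat(X, freq, axis=0), np.repeat(y, freq), rcond=None
)[0]

print(beta_wls, beta_scaled)
print(beta_freq, beta_repeated)
```

The equivalence holds for coefficients only; the two weight types generally imply different standard errors because they imply different effective sample sizes.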
### 4. OLS Regression (No Fixed Effects)
```python
# Simple OLS regression
result = reghdfe(
    data=data,
    y='wage',
    x=['experience', 'education'],
    fe=None  # No fixed effects
)
```
## 📊 Working with Results
### Accessing Coefficients and Statistics
```python
result = reghdfe(data=data, y='wage', x=['experience', 'education'], fe=['firm_id'])
# Get coefficients
coefficients = result.coef
print("Coefficients:", coefficients)
# Get standard errors
std_errors = result.se
print("Standard Errors:", std_errors)
# Get t-statistics and p-values
t_stats = result.tstat
p_values = result.pvalue
print("T-statistics:", t_stats)
print("P-values:", p_values)
# Get confidence intervals
conf_int = result.conf_int()
print("95% Confidence Intervals:", conf_int)
# Get R-squared
print(f"R-squared: {result.rsquared:.4f}")
print(f"Adjusted R-squared: {result.rsquared_adj:.4f}")
```
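The statistics above are linked by simple formulas. As a rough cross-check, the sketch below recomputes a test statistic, two-sided p-value, and confidence interval from a coefficient and its standard error using a normal approximation. This is an assumption for illustration: the package may well use a t distribution with residual degrees of freedom, so small differences are expected. `normal_inference` is a hypothetical helper.

```python
from statistics import NormalDist

def normal_inference(coef, se, level=0.95):
    """Return (z statistic, two-sided p-value, confidence interval)."""
    z = coef / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    critical = NormalDist().inv_cdf(0.5 + level / 2)
    return z, p_value, (coef - critical * se, coef + critical * se)

z, p, (lower, upper) = normal_inference(coef=0.5, se=0.2)
print(f"z = {z:.2f}, p = {p:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```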
### Summary Statistics
```python
# Full regression summary
print(result.summary())
# Detailed summary with additional statistics
print(result.summary(show_dof=True))
```
## ⚙️ Advanced Configuration
### Custom Absorption Options
```python
result = reghdfe(
    data=data,
    y='wage',
    x=['experience', 'education'],
    fe=['firm_id'],
    absorb_tolerance=1e-10,  # Tighter tolerance than the 1e-8 default
    drop_singletons=True,    # Drop singleton groups
    absorb_method='lsmr'     # Solver; 'lsmr' is the default, 'lsqr' is the alternative
)
```
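To give a feel for what a solver name and a tolerance mean here, the sketch below residualizes a variable on firm dummies with SciPy's `lsmr` at two tolerances and compares both with exact group demeaning. How PyRegHDFE maps `absorb_tolerance` onto solver settings is not documented in this README, so passing it as `atol`/`btol` is an assumption, as is the helper `residualize`.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsmr

rng = np.random.default_rng(2)
n = 400
firm = rng.integers(0, 25, size=n)
_, codes = np.unique(firm, return_inverse=True)   # contiguous group codes
n_groups = codes.max() + 1
v = rng.normal(size=n) + 0.1 * codes

# Sparse dummy matrix: one column per firm, one nonzero per row
D = sparse.csr_matrix((np.ones(n), (np.arange(n), codes)), shape=(n, n_groups))

def residualize(v, D, tol):
    """Project v off the columns of D with LSMR at the given tolerance."""
    coef = lsmr(D, v, atol=tol, btol=tol)[0]
    return v - D @ coef

loose = residualize(v, D, tol=1e-2)
tight = residualize(v, D, tol=1e-12)

# Exact answer for a single fixed effect: subtract group means
means = np.bincount(codes, weights=v) / np.bincount(codes)
exact = v - means[codes]

print(np.linalg.norm(loose - exact), np.linalg.norm(tight - exact))
```

A tighter tolerance costs more iterations and typically lands closer to the exact residual, which matters most when coefficients need to match another implementation to many decimal places.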
### Different Covariance Types
```python
# Robust standard errors (default)
result = reghdfe(
    data=data,
    y='wage',
    x=['experience'],
    fe=['firm_id'],
    cov_type='robust'
)
# Clustered standard errors
result = reghdfe(
    data=data,
    y='wage',
    x=['experience'],
    fe=['firm_id'],
    cov_type='cluster',
    cluster=['firm_id']
)
```
## 🔄 Comparison with Stata
This package aims to replicate Stata's `reghdfe` command. Here's how the syntax translates:
**Stata:**
```stata
reghdfe wage experience education, absorb(firm_id year) cluster(firm_id)
```
**Python (PyRegHDFE):**
```python
result = reghdfe(
    data=data,
    y='wage',
    x=['experience', 'education'],
    fe=['firm_id', 'year'],
    cluster=['firm_id']
)
```
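When porting many Stata commands, the mapping above can be mechanized. The helper below, `stata_to_kwargs`, is hypothetical and not part of PyRegHDFE; it is a small sketch that handles only the exact form shown above (a variable list followed by `absorb(...)` and `cluster(...)` options) and ignores abbreviations, `vce(...)`, weights, and `if`/`in` qualifiers.

```python
import re

def stata_to_kwargs(command):
    """Translate a simple `reghdfe` command string into keyword arguments."""
    varlist, _, options = command.partition(",")
    tokens = varlist.split()
    if len(tokens) < 2 or tokens[0] != "reghdfe":
        raise ValueError("expected: reghdfe depvar [indepvars], options")
    kwargs = {"y": tokens[1], "x": tokens[2:]}
    # Collect options written as name(arguments)
    opts = dict(re.findall(r"(\w+)\(([^)]*)\)", options))
    if "absorb" in opts:
        kwargs["fe"] = opts["absorb"].split()
    if "cluster" in opts:
        kwargs["cluster"] = opts["cluster"].split()
    return kwargs

kwargs = stata_to_kwargs(
    "reghdfe wage experience education, absorb(firm_id year) cluster(firm_id)"
)
print(kwargs)
# → {'y': 'wage', 'x': ['experience', 'education'], 'fe': ['firm_id', 'year'], 'cluster': ['firm_id']}
```

The resulting dictionary can then be passed along as `reghdfe(data=data, **kwargs)`.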
## 🌐 Integration Options
This package is **actively maintained** as a standalone library. For users who prefer a unified ecosystem with additional econometric and statistical tools, `reghdfe` functionality is also available through:
- **[StatsPAI](https://github.com/brycewang-stanford/StatsPAI/)** - Comprehensive Stats + Econometrics + ML + AI + LLMs toolkit
## 🔗 Related Projects
- **[StatsPAI](https://github.com/brycewang-stanford/StatsPAI/)** - StatsPAI = Stats + Econometrics + ML + AI + LLMs
- **[PyStataR](https://github.com/brycewang-stanford/PyStataR)** - Unified Stata-equivalent commands and R functions in Python
## 📚 API Reference
### Main Function: `reghdfe()`
```python
reghdfe(data, y, x, fe=None, cluster=None, weights=None,
        cov_type='robust', absorb_tolerance=1e-8,
        drop_singletons=True, absorb_method='lsmr')
```
**Parameters:**
- `data` (DataFrame): Input data
- `y` (str): Dependent variable name
- `x` (list): List of independent variable names
- `fe` (list, optional): List of fixed effect variable names
- `cluster` (list, optional): List of clustering variable names
- `weights` (str, optional): Weight variable name
- `cov_type` (str): Covariance type ('robust', 'cluster')
- `absorb_tolerance` (float): Tolerance for fixed effect absorption
- `drop_singletons` (bool): Whether to drop singleton groups
- `absorb_method` (str): Absorption method ('lsmr', 'lsqr')
**Returns:**
- `RegressionResults`: Object containing regression results
### Results Object
The `RegressionResults` object provides:
- `.coef`: Coefficients
- `.se`: Standard errors
- `.tstat`: T-statistics
- `.pvalue`: P-values
- `.rsquared`: R-squared
- `.rsquared_adj`: Adjusted R-squared
- `.conf_int()`: Confidence intervals
- `.summary()`: Formatted summary table
## 🛠️ Requirements
- Python ≥ 3.9
- NumPy ≥ 1.20.0
- SciPy ≥ 1.7.0
- Pandas ≥ 1.3.0
- PyHDFE ≥ 0.1.0
- Tabulate ≥ 0.8.0
## 🤝 Contributing
We welcome contributions! Please feel free to:
- **Report bugs** or request features via [GitHub Issues](https://github.com/brycewang-stanford/pyreghdfe/issues)
- **Submit pull requests** for improvements
- **Share your use cases** and examples
- **Improve documentation** and add examples
### Development Setup
```bash
git clone https://github.com/brycewang-stanford/pyreghdfe.git
cd pyreghdfe
pip install -e ".[dev]"
pytest tests/
```
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🙋‍♂️ Support
- **Documentation**: [GitHub Repository](https://github.com/brycewang-stanford/pyreghdfe)
- **Issues**: [GitHub Issues](https://github.com/brycewang-stanford/pyreghdfe/issues)
- **Discussions**: [GitHub Discussions](https://github.com/brycewang-stanford/pyreghdfe/discussions)
---
⭐ **This package is actively maintained.** If you find it useful, please consider giving it a star on GitHub!
**Questions, bug reports, or feature requests?** Please open an issue on [GitHub](https://github.com/brycewang-stanford/pyreghdfe/issues).