pyreghdfe


- **Name**: pyreghdfe
- **Version**: 0.2.1
- **Summary**: Python implementation of Stata's reghdfe for high-dimensional fixed effects regression
- **Upload time**: 2025-08-01 17:40:17
- **Maintainer**: PyRegHDFE Contributors
- **Author**: PyRegHDFE Contributors
- **Requires Python**: >=3.9
- **Keywords**: econometrics, fixed-effects, regression, hdfe, panel-data
- **Requirements**: numpy, scipy, pandas, pyhdfe, tabulate

# PyRegHDFE

[![Python Version](https://img.shields.io/pypi/pyversions/pyreghdfe)](https://pypi.org/project/pyreghdfe/)
[![PyPI Version](https://img.shields.io/pypi/v/pyreghdfe)](https://pypi.org/project/pyreghdfe/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Downloads](https://img.shields.io/pypi/dm/pyreghdfe)](https://pypi.org/project/pyreghdfe/)

> **Python implementation of Stata's `reghdfe` for high-dimensional fixed effects regression**

PyRegHDFE is a Python package that replicates the functionality of Stata's popular `reghdfe` command. It estimates linear regressions with high-dimensional fixed effects, computes cluster-robust standard errors, and takes pandas DataFrames as input.

## 🚀 Quick Installation

```bash
pip install pyreghdfe
```

## 📖 Quick Start

```python
import pandas as pd
import numpy as np
from pyreghdfe import reghdfe

# Create sample data
np.random.seed(42)
n = 1000
data = pd.DataFrame({
    'wage': np.random.normal(10, 2, n),
    'experience': np.random.normal(5, 2, n),
    'education': np.random.normal(12, 3, n),
    'firm_id': np.random.choice(range(100), n),
    'year': np.random.choice(range(2010, 2020), n)
})

# Run regression with firm fixed effects
result = reghdfe(
    data=data,
    y='wage',
    x=['experience', 'education'],
    fe=['firm_id']
)

# Display results
print(result.summary())
```
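
To build intuition for what absorbing a single fixed effect does, here is a minimal, dependency-free sketch of the within transformation: subtract group means from the outcome and the regressor, then run OLS on the demeaned values. It illustrates the textbook technique only and is not PyRegHDFE's implementation; the names `demean_by_group` and `within_slope` are hypothetical.

```python
from collections import defaultdict


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (the within transformation)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def within_slope(y, x, groups):
    """OLS slope of y on a single regressor x after absorbing one fixed effect."""
    y_t = demean_by_group(y, groups)
    x_t = demean_by_group(x, groups)
    sxx = sum(a * a for a in x_t)
    sxy = sum(a * b for a, b in zip(x_t, y_t))
    return sxy / sxx


# Toy data: y = 2 * x + firm effect (10 for firm "a", -5 for firm "b")
firm = ["a", "a", "a", "b", "b", "b"]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 4.0]
effect = {"a": 10.0, "b": -5.0}
y = [2.0 * xi + effect[f] for xi, f in zip(x, firm)]

slope = within_slope(y, x, firm)
print(round(slope, 6))
```

Because the firm effect is constant within each firm, demeaning removes it and the slope on `x` is recovered without estimating one dummy per firm.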

## 📋 Key Features

- ✅ **High-dimensional fixed effects** - Efficiently absorb multiple fixed effect dimensions
- ✅ **Cluster-robust standard errors** - Support for one-way and multi-way clustering  
- ✅ **Weighted regression** - Handle sampling weights and frequency weights
- ✅ **Singleton dropping** - Automatically handle singleton groups
- ✅ **Fast computation** - Optimized algorithms for large datasets
- ✅ **Stata compatibility** - Results match Stata's `reghdfe` command
- ✅ **Pandas integration** - Seamless DataFrame compatibility
- ✅ **Flexible output** - Rich statistical results and summary tables

## 🔧 Usage Examples

### 1. Multiple Fixed Effects

```python
# Regression with firm and year fixed effects
result = reghdfe(
    data=data,
    y='wage',
    x=['experience', 'education'],
    fe=['firm_id', 'year']  # Multiple dimensions
)
print(result.summary())
```
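
With two or more fixed effects, a single pass of demeaning is no longer enough in unbalanced data. One classical approach is alternating projections: demean by each dimension in turn until the values stop changing. The sketch below shows that idea with the standard library only. It is an illustration rather than the solver PyRegHDFE uses (the API reference lists `lsmr` and `lsqr`), and `absorb` with its `tol` and `max_iter` arguments is a hypothetical helper.

```python
from collections import defaultdict


def demean_once(values, groups):
    """Subtract group means for one fixed-effect dimension."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def absorb(values, fe_columns, tol=1e-10, max_iter=1000):
    """Alternately demean by each fixed effect until a full sweep changes little."""
    current = list(values)
    for _ in range(max_iter):
        previous = current
        for groups in fe_columns:
            current = demean_once(current, groups)
        change = max(abs(a - b) for a, b in zip(current, previous))
        if change < tol:
            break
    return current


# Unbalanced toy panel: y = 3 * x + firm effect + year effect
firm = ["a", "a", "b", "b", "c", "c", "c"]
year = [2010, 2011, 2010, 2012, 2011, 2012, 2010]
firm_fx = {"a": 1.0, "b": -2.0, "c": 0.5}
year_fx = {2010: 0.3, 2011: -0.7, 2012: 1.1}
x = [0.5, 1.5, 2.0, 0.1, 3.0, 1.0, 2.5]
y = [3.0 * xi + firm_fx[f] + year_fx[t] for xi, f, t in zip(x, firm, year)]

x_t = absorb(x, [firm, year])
y_t = absorb(y, [firm, year])
slope = sum(a * b for a, b in zip(x_t, y_t)) / sum(a * a for a in x_t)
print(round(slope, 6))
```

The stopping rule plays the same role as a convergence tolerance in any iterative absorption routine: a tighter tolerance costs more sweeps but leaves less of the fixed effects in the transformed data.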

### 2. Cluster-Robust Standard Errors

```python
# One-way clustering
result = reghdfe(
    data=data,
    y='wage',
    x=['experience', 'education'],
    fe=['firm_id'],
    cluster=['firm_id']  # Cluster by firm
)

# Two-way clustering
result = reghdfe(
    data=data,
    y='wage',
    x=['experience', 'education'],
    fe=['firm_id'],
    cluster=['firm_id', 'year']  # Cluster by firm and year
)
```
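
As a rough illustration of what one-way clustering changes, the sketch below computes the cluster-robust "sandwich" standard error for a single regressor with no intercept (for example, after the fixed effects have been absorbed). It sums the scores `x * u` within each cluster before squaring them. No finite-sample correction is applied, so the numbers will generally differ from what PyRegHDFE or Stata report; `cluster_se` is a hypothetical helper.

```python
from collections import defaultdict
from math import sqrt


def cluster_se(x, resid, clusters):
    """Cluster-robust SE of a single-regressor slope (no finite-sample correction)."""
    sxx = sum(xi * xi for xi in x)
    scores = defaultdict(float)
    for xi, ui, g in zip(x, resid, clusters):
        scores[g] += xi * ui  # sum of x * u within each cluster
    meat = sum(s * s for s in scores.values())
    return sqrt(meat) / sxx


x = [1.0, -1.0, 2.0, -2.0]
u = [0.5, 0.5, -1.0, 1.0]           # residuals from some fitted model
by_pair = ["g1", "g1", "g2", "g2"]  # two clusters of two observations
by_row = [0, 1, 2, 3]               # every observation is its own cluster

se_clustered = cluster_se(x, u, by_pair)
se_robust = cluster_se(x, u, by_row)
print(se_clustered, se_robust)
```

When each observation is its own cluster, the formula reduces to the heteroskedasticity-robust (HC0) standard error. A common approach to two-way clustering adds the two one-way variances and subtracts the variance clustered on the intersection of the two dimensions; whether PyRegHDFE follows exactly that recipe is not stated here.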

### 3. Weighted Regression

```python
# Add weights to your data
data['weight'] = np.random.uniform(0.5, 2.0, len(data))

# Run weighted regression
result = reghdfe(
    data=data,
    y='wage',
    x=['experience', 'education'],
    fe=['firm_id'],
    weights='weight'
)
```
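
To show what a weight does to the estimate, here is a small sketch of weighted least squares with one regressor and an intercept. It also checks the usual interpretation of frequency weights: a weight of 2 gives the same slope as listing the row twice. This is a generic illustration under those assumptions, not PyRegHDFE's code, and `weighted_slope` is a hypothetical name.

```python
def weighted_slope(y, x, w):
    """Weighted least squares slope of y on x, with an intercept."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


# Three rows, the middle one counted twice via its weight
slope_weighted = weighted_slope(y=[0.0, 1.0, 4.0], x=[0.0, 1.0, 2.0], w=[1.0, 2.0, 1.0])

# The same data with the middle row physically duplicated and unit weights
slope_expanded = weighted_slope(
    y=[0.0, 1.0, 1.0, 4.0], x=[0.0, 1.0, 1.0, 2.0], w=[1.0, 1.0, 1.0, 1.0]
)
print(slope_weighted, slope_expanded)
```

Point estimates agree across the two forms; standard errors are where sampling weights and frequency weights differ, which this sketch does not cover.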

### 4. OLS Regression (No Fixed Effects)

```python
# Simple OLS regression
result = reghdfe(
    data=data,
    y='wage',
    x=['experience', 'education'],
    fe=None  # No fixed effects
)
```

## 📊 Working with Results

### Accessing Coefficients and Statistics

```python
result = reghdfe(data=data, y='wage', x=['experience', 'education'], fe=['firm_id'])

# Get coefficients
coefficients = result.coef
print("Coefficients:", coefficients)

# Get standard errors
std_errors = result.se
print("Standard Errors:", std_errors)

# Get t-statistics and p-values
t_stats = result.tstat
p_values = result.pvalue
print("T-statistics:", t_stats)
print("P-values:", p_values)

# Get confidence intervals
conf_int = result.conf_int()
print("95% Confidence Intervals:", conf_int)

# Get R-squared
print(f"R-squared: {result.rsquared:.4f}")
print(f"Adjusted R-squared: {result.rsquared_adj:.4f}")
```
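
The statistics above are linked by simple formulas: the t-statistic is the coefficient divided by its standard error, and the confidence interval is the coefficient plus or minus a critical value times the standard error. The sketch below reproduces those relationships with a normal approximation from the standard library. The package may well use Student's t with the residual degrees of freedom instead, so treat small differences as expected; `inference` is a hypothetical helper.

```python
from statistics import NormalDist


def inference(coef, se, level=0.95):
    """Return (t-statistic, two-sided p-value, confidence interval), normal approximation."""
    z = NormalDist()
    t_stat = coef / se
    p_value = 2.0 * (1.0 - z.cdf(abs(t_stat)))
    crit = z.inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    return t_stat, p_value, (coef - crit * se, coef + crit * se)


t_stat, p_value, (ci_low, ci_high) = inference(coef=0.5, se=0.25)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, CI = [{ci_low:.3f}, {ci_high:.3f}]")
```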

### Summary Statistics

```python
# Full regression summary
print(result.summary())

# Detailed summary with additional statistics
print(result.summary(show_dof=True))
```

## ⚙️ Advanced Configuration

### Custom Absorption Options

```python
result = reghdfe(
    data=data,
    y='wage',
    x=['experience', 'education'],
    fe=['firm_id'],
    absorb_tolerance=1e-10,  # Higher precision for absorption
    drop_singletons=True,    # Drop singleton groups
    absorb_method='lsmr'     # Solver: 'lsmr' is the documented default, 'lsqr' the alternative
)
```
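
Singleton dropping is worth a closer look because it can cascade: removing a singleton in one fixed-effect dimension may create a new singleton in another. The sketch below shows a plain-Python version of that iterative rule on a list of row dictionaries. It is an assumption about the general technique, not a description of how `drop_singletons=True` is implemented, and the function here is a hypothetical stand-in.

```python
from collections import Counter


def drop_singletons(rows, fe_keys):
    """Repeatedly remove rows whose fixed-effect group has only one member."""
    kept = list(rows)
    while True:
        before = len(kept)
        for key in fe_keys:
            counts = Counter(row[key] for row in kept)
            kept = [row for row in kept if counts[row[key]] > 1]
        if len(kept) == before:  # a full pass removed nothing
            return kept


rows = [
    {"firm_id": 1, "year": 2010},
    {"firm_id": 1, "year": 2011},
    {"firm_id": 2, "year": 2010},
    {"firm_id": 2, "year": 2011},
    {"firm_id": 3, "year": 2011},  # firm 3 appears once
    {"firm_id": 4, "year": 2012},  # 2012 appears once
    {"firm_id": 4, "year": 2010},  # becomes a singleton once the 2012 row is gone
]

kept = drop_singletons(rows, ["firm_id", "year"])
print(len(kept), sorted({row["firm_id"] for row in kept}))
```

Singleton observations are fitted perfectly by their own fixed effect, so they carry no information about the slopes; keeping them mainly distorts the degrees of freedom behind clustered standard errors.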

### Different Covariance Types

```python
# Robust standard errors (default)
result = reghdfe(
    data=data, 
    y='wage', 
    x=['experience'], 
    fe=['firm_id'], 
    cov_type='robust'
)

# Clustered standard errors
result = reghdfe(
    data=data, 
    y='wage', 
    x=['experience'], 
    fe=['firm_id'], 
    cov_type='cluster', 
    cluster=['firm_id']
)
```

## 🔄 Comparison with Stata

This package aims to replicate Stata's `reghdfe` command. Here's how the syntax translates:

**Stata:**
```stata
reghdfe wage experience education, absorb(firm_id year) cluster(firm_id)
```

**Python (PyRegHDFE):**
```python
result = reghdfe(
    data=data,
    y='wage',
    x=['experience', 'education'],
    fe=['firm_id', 'year'],
    cluster=['firm_id']
)
```
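
The mapping between the two syntaxes is mechanical, which the following sketch makes explicit by building the Stata command string from the Python-style arguments. `to_stata_command` is a hypothetical helper written for illustration; it is not part of PyRegHDFE, and it covers only the options shown in the comparison above plus Stata's `noabsorb` for the no-fixed-effects case.

```python
def to_stata_command(y, x, fe=None, cluster=None):
    """Build the Stata reghdfe command that corresponds to the Python arguments."""
    options = []
    if fe:
        options.append(f"absorb({' '.join(fe)})")
    else:
        options.append("noabsorb")  # reghdfe's option for a model without fixed effects
    if cluster:
        options.append(f"cluster({' '.join(cluster)})")
    return f"reghdfe {y} {' '.join(x)}, {' '.join(options)}"


command = to_stata_command(
    y="wage",
    x=["experience", "education"],
    fe=["firm_id", "year"],
    cluster=["firm_id"],
)
print(command)
```

A helper like this can be handy when cross-checking results between the two tools, since the same specification can be logged in both forms.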

## 🌐 Integration Options

This package is **actively maintained** as a standalone library. For users who prefer a unified ecosystem with additional econometric and statistical tools, `reghdfe` functionality is also available through:

- **[StatsPAI](https://github.com/brycewang-stanford/StatsPAI/)** - Comprehensive Stats + Econometrics + ML + AI + LLMs toolkit

## 🔗 Related Projects

- **[StatsPAI](https://github.com/brycewang-stanford/StatsPAI/)** - StatsPAI = Stats + Econometrics + ML + AI + LLMs  
- **[PyStataR](https://github.com/brycewang-stanford/PyStataR)** - Unified Stata-equivalent commands and R functions in Python

## 📚 API Reference

### Main Function: `reghdfe()`

```python
reghdfe(data, y, x, fe=None, cluster=None, weights=None, 
        cov_type='robust', absorb_tolerance=1e-8, 
        drop_singletons=True, absorb_method='lsmr')
```

**Parameters:**
- `data` (DataFrame): Input data
- `y` (str): Dependent variable name
- `x` (list): List of independent variable names
- `fe` (list, optional): List of fixed effect variable names
- `cluster` (list, optional): List of clustering variable names
- `weights` (str, optional): Weight variable name
- `cov_type` (str): Covariance type ('robust', 'cluster')
- `absorb_tolerance` (float): Tolerance for fixed effect absorption
- `drop_singletons` (bool): Whether to drop singleton groups
- `absorb_method` (str): Absorption method ('lsmr', 'lsqr')

**Returns:**
- `RegressionResults`: Object containing regression results

### Results Object

The `RegressionResults` object provides:
- `.coef`: Coefficients
- `.se`: Standard errors
- `.tstat`: T-statistics
- `.pvalue`: P-values
- `.rsquared`: R-squared
- `.rsquared_adj`: Adjusted R-squared
- `.conf_int()`: Confidence intervals
- `.summary()`: Formatted summary table

## 🛠️ Requirements

- Python ≥ 3.9
- NumPy ≥ 1.20.0
- SciPy ≥ 1.7.0
- Pandas ≥ 1.3.0
- PyHDFE ≥ 0.1.0
- Tabulate ≥ 0.8.0
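
If you want to confirm that an environment meets these minimums before installing, a sketch along the following lines can help. It uses only the standard library's `importlib.metadata`; the version comparison is deliberately naive (leading numeric components only), and `MINIMUMS`, `as_tuple`, and `check_requirements` are hypothetical names.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the requirements list above
MINIMUMS = {
    "numpy": "1.20.0",
    "scipy": "1.7.0",
    "pandas": "1.3.0",
    "pyhdfe": "0.1.0",
    "tabulate": "0.8.0",
}


def as_tuple(text):
    """Turn '1.20.0' into (1, 20, 0), stopping at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_requirements(minimums=MINIMUMS):
    """Report 'ok', 'missing', or 'too old' for each required distribution."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = as_tuple(installed) >= as_tuple(minimum)
        report[name] = "ok" if ok else f"too old ({installed} < {minimum})"
    return report


for name, status in check_requirements().items():
    print(f"{name}: {status}")
```

Comparing tuples rather than strings matters here: as text, `"1.7.0"` sorts after `"1.20.0"`, while `(1, 7, 0)` correctly sorts before `(1, 20, 0)`.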

## 🤝 Contributing

We welcome contributions! Please feel free to:

- **Report bugs** or request features via [GitHub Issues](https://github.com/brycewang-stanford/pyreghdfe/issues)
- **Submit pull requests** for improvements
- **Share your use cases** and examples
- **Improve documentation** and add examples

### Development Setup

```bash
git clone https://github.com/brycewang-stanford/pyreghdfe.git
cd pyreghdfe
pip install -e ".[dev]"
pytest tests/
```

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙋‍♂️ Support

- **Documentation**: [GitHub Repository](https://github.com/brycewang-stanford/pyreghdfe)
- **Issues**: [GitHub Issues](https://github.com/brycewang-stanford/pyreghdfe/issues)
- **Discussions**: [GitHub Discussions](https://github.com/brycewang-stanford/pyreghdfe/discussions)

---

⭐ **This package is actively maintained.** If you find it useful, please consider giving it a star on GitHub!

**Questions, bug reports, or feature requests?** Please open an issue on [GitHub](https://github.com/brycewang-stanford/pyreghdfe/issues).

            
