# StatsPAI
**The AI-powered Statistics & Econometrics Toolkit for Python**
StatsPAI pairs an R/Stata-style formula syntax with OLS/WLS, 2SLS, panel, and Causal Forest estimators, making advanced econometric techniques accessible to researchers and practitioners.
## 🚀 Features
### Core Econometric Methods
- **Linear Regression**: OLS, WLS with robust standard errors
- **Instrumental Variables**: 2SLS estimation
- **Panel Data**: Fixed Effects, Random Effects models
- **Causal Inference**: Causal Forest implementation (inspired by EconML)
### User Experience
- **Formula Interface**: Intuitive R/Stata-style syntax `"y ~ x1 + x2"`
- **Excel Export**: Professional output tables via `outreg2` (Stata-inspired)
- **Flexible API**: Both formula and matrix interfaces supported
- **Rich Output**: Detailed summary statistics and diagnostic tests
### Technical Excellence
- **Robust Implementation**: Based on proven econometric theory
- **Performance Optimized**: Efficient algorithms for large datasets
- **Well Tested**: Comprehensive test suite ensuring reliability
- **Type Hints**: Full type annotation for better development experience
## 📦 Installation
```bash
# Latest stable version
pip install StatsPAI
# Development version
pip install git+https://github.com/brycewang-stanford/pyEconometrics.git
```
### Requirements
- Python 3.8+
- NumPy, SciPy, Pandas
- scikit-learn (for Causal Forest)
- openpyxl (for Excel export)
## 🏁 Quick Start
### Basic Regression Analysis
```python
import pandas as pd
from statspai import reg, outreg2
# Load your data
df = pd.read_csv('data.csv')
# Run OLS regression
result1 = reg('wage ~ education + experience', data=df)
print(result1.summary())
# Add control variables
result2 = reg('wage ~ education + experience + age + gender', data=df)
# Export results to Excel
outreg2([result1, result2], 'regression_results.xlsx',
        title='Wage Regression Analysis')
```
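For readers who want to see what an OLS fit computes, here is a minimal NumPy sketch of the estimator and its classical standard errors. It does not use StatsPAI; the helper name `ols_fit` and the simulated wage data are assumptions made for illustration only.

```python
import numpy as np


def ols_fit(X, y):
    """Return OLS coefficients and classical (homoskedastic) standard errors.

    X is an (n, k) design matrix that already contains an intercept column.
    `ols_fit` is a hypothetical helper, not part of StatsPAI.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    # Least-squares solution of X @ beta = y
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Unbiased estimate of the error variance
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))


# Simulated data (assumed): wage depends linearly on education and experience
rng = np.random.default_rng(0)
n = 500
education = rng.normal(12, 2, n)
experience = rng.normal(10, 5, n)
wage = 5.0 + 1.5 * education + 0.3 * experience + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), education, experience])
beta, se = ols_fit(X, wage)
print("coefficients:", beta)
print("std. errors: ", se)
```

With this design the estimated slopes should land close to the true values 1.5 and 0.3 used to generate the data.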
### Instrumental Variables
```python
# 2SLS estimation
iv_result = reg('wage ~ education | mother_education + father_education',
                data=df, method='2sls')
print(iv_result.summary())
```
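The two stages behind 2SLS can be sketched in plain NumPy. This illustrates the general technique rather than StatsPAI's internals; the function `two_stage_least_squares`, the unobserved `ability` confounder, and the simulated data are all assumptions.

```python
import numpy as np


def two_stage_least_squares(y, endog, instruments):
    """2SLS point estimates with an intercept (hypothetical helper).

    Returns [intercept, coefficient(s) on the endogenous regressor(s)].
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    const = np.ones((n, 1))
    X = np.column_stack([const, endog])        # intercept + endogenous regressor
    Z = np.column_stack([const, instruments])  # intercept + instruments
    # Stage 1: project the regressors onto the instrument space
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Stage 2: regress the outcome on the fitted regressors
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]


# Simulated data (assumed): ability confounds education and wage
rng = np.random.default_rng(1)
n = 5000
mother_education = rng.normal(size=n)
father_education = rng.normal(size=n)
ability = rng.normal(size=n)
education = (0.8 * mother_education + 0.5 * father_education
             + ability + rng.normal(size=n))
wage = 1.0 + 2.0 * education + 2.0 * ability + rng.normal(size=n)

instruments = np.column_stack([mother_education, father_education])
beta_iv = two_stage_least_squares(wage, education, instruments)

# Plain OLS for comparison; it absorbs the confounding through ability
X_ols = np.column_stack([np.ones(n), education])
beta_ols = np.linalg.lstsq(X_ols, wage, rcond=None)[0]
print("2SLS slope:", beta_iv[1])
print("OLS slope: ", beta_ols[1])
```

The sketch returns point estimates only. Valid 2SLS standard errors must use residuals computed with the original regressors, not the first-stage fitted values.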
### Panel Data Analysis
```python
# Fixed effects model
fe_result = reg('y ~ x1 + x2', data=df,
                entity_col='firm_id', time_col='year',
                method='fixed_effects')
```
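A fixed effects model is commonly estimated with the within transformation: subtract each entity's mean from the outcome and the regressors, then run OLS on the demeaned data. The NumPy sketch below shows that idea under stated assumptions. `within_estimator` and the simulated firm panel are hypothetical, and StatsPAI may implement the estimator differently.

```python
import numpy as np


def within_estimator(y, X, entity_ids):
    """Entity fixed-effects slopes via the within transformation.

    Hypothetical helper: demeans y and every column of X by entity, then
    runs OLS on the demeaned data (no intercept is needed afterwards).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    # Map arbitrary entity labels to 0..G-1
    _, idx = np.unique(entity_ids, return_inverse=True)
    counts = np.bincount(idx)
    y_dm = y - (np.bincount(idx, weights=y) / counts)[idx]
    X_dm = np.empty_like(X)
    for j in range(X.shape[1]):
        col_means = np.bincount(idx, weights=X[:, j]) / counts
        X_dm[:, j] = X[:, j] - col_means[idx]
    return np.linalg.lstsq(X_dm, y_dm, rcond=None)[0]


# Simulated panel (assumed): 50 firms observed for 10 years each
rng = np.random.default_rng(2)
n_firms, n_years = 50, 10
firm_id = np.repeat(np.arange(n_firms), n_years)
alpha = rng.normal(0, 2, n_firms)                 # firm-level effect
x1 = 0.8 * alpha[firm_id] + rng.normal(size=firm_id.size)  # correlated with alpha
x2 = rng.normal(size=firm_id.size)
y = alpha[firm_id] + 1.0 * x1 - 0.5 * x2 + rng.normal(0, 0.5, firm_id.size)

beta_fe = within_estimator(y, np.column_stack([x1, x2]), firm_id)
print("within estimates:", beta_fe)
```

Because `x1` is correlated with the firm effect, pooled OLS would be biased here, while the within estimates should stay near the true slopes 1.0 and -0.5.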
### Causal Forest for Heterogeneous Treatment Effects
```python
from statspai import CausalForest
# Initialize Causal Forest
cf = CausalForest(n_estimators=100, random_state=42)
# Fit model: outcome ~ treatment | features | controls
cf.fit('income ~ job_training | age + education + experience | region + year',
       data=df)
# Estimate individual treatment effects
individual_effects = cf.effect(df)
# Get confidence intervals
effects_ci = cf.effect_interval(df, alpha=0.05)
# Export results
cf_summary = cf.summary()
outreg2([cf_summary], 'causal_forest_results.xlsx')
```
## 📊 Advanced Usage
### Robust Standard Errors
```python
# Heteroskedasticity-robust standard errors
result = reg('y ~ x1 + x2', data=df, robust=True)
# Clustered standard errors
result = reg('y ~ x1 + x2', data=df, cluster='firm_id')
```
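To make the idea of heteroskedasticity-robust standard errors concrete, the sketch below computes the HC1 sandwich estimator next to the classical one in NumPy. Which HC variant StatsPAI applies for `robust=True` is not stated in this README, so HC1 is an assumption here, as are the helper name `ols_with_hc1` and the simulated data.

```python
import numpy as np


def ols_with_hc1(X, y):
    """OLS coefficients with classical and HC1 robust standard errors.

    Hypothetical helper; X must already include an intercept column.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # Classical covariance: sigma^2 * (X'X)^-1
    classical_cov = (resid @ resid / (n - k)) * xtx_inv
    # Sandwich "meat": X' diag(e_i^2) X
    meat = (X * resid[:, None] ** 2).T @ X
    # HC1 applies the n / (n - k) degrees-of-freedom correction
    hc1_cov = (n / (n - k)) * xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(classical_cov)), np.sqrt(np.diag(hc1_cov))


# Simulated data (assumed): error variance grows with x^2
rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + x * rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta, se_classical, se_hc1 = ols_with_hc1(X, y)
print("slope:", beta[1])
print("classical SE:", se_classical[1], "HC1 SE:", se_hc1[1])
```

In this design the classical formula understates the uncertainty of the slope, so the HC1 standard error should come out noticeably larger.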
### Model Comparison
```python
from statspai import compare_models
models = [
    reg('y ~ x1', data=df),
    reg('y ~ x1 + x2', data=df),
    reg('y ~ x1 + x2 + x3', data=df)
]
comparison = compare_models(models)
print(comparison.summary())
```
### Custom Output Formatting
```python
# `results` is a list of fitted models, e.g. [result1, result2] from Quick Start
outreg2(results, 'output.xlsx',
        title='Regression Results',
        add_stats={'Observations': lambda r: r.nobs,
                   'R-squared': lambda r: r.rsquared},
        decimal_places=4,
        star_levels=[0.01, 0.05, 0.1])
```
```
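The `star_levels` thresholds suggest the usual regression-table convention in which a p-value earns one star per threshold it falls below. The sketch below shows that mapping. The function `significance_stars` is hypothetical, and whether StatsPAI's `outreg2` uses exactly this rule is an assumption.

```python
def significance_stars(p_value, star_levels=(0.01, 0.05, 0.1)):
    """Return one '*' for each threshold that p_value falls strictly below.

    Hypothetical helper illustrating the common convention:
    *** p < 0.01, ** p < 0.05, * p < 0.1.
    """
    return "*" * sum(p_value < level for level in star_levels)


for p in (0.003, 0.03, 0.08, 0.5):
    print(f"p = {p:<5} -> '{significance_stars(p)}'")
```

Counting thresholds instead of chaining `if` statements keeps the helper correct for any number of levels, in any order.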
## 📚 Documentation
- **[User Guide](docs/user_guide.md)**: Comprehensive tutorials and examples
- **[API Reference](docs/api_reference.md)**: Detailed function documentation
- **[Theory Guide](docs/theory_guide.md)**: Mathematical foundations
- **[Examples](examples/)**: Jupyter notebooks with real-world applications
## 🤝 Contributing
We welcome contributions! See our [Contributing Guide](CONTRIBUTING.md) for details.
### Development Setup
```bash
# Clone repository
git clone https://github.com/brycewang-stanford/pyEconometrics.git
cd pyEconometrics
# Install in development mode
pip install -e ".[dev]"
# Install pre-commit hooks
pre-commit install
# Run tests
pytest
```
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🙏 Acknowledgments
- Inspired by Stata's `outreg2` command for output formatting
- Causal Forest implementation based on Wager & Athey (2018)
- Built on the shoulders of NumPy, SciPy, and scikit-learn
## 📞 Contact
- **Author**: Bryce Wang
- **Email**: brycewang2018@gmail.com
- **GitHub**: [brycewang-stanford](https://github.com/brycewang-stanford)
## 📈 Citation
If you use StatsPAI in your research, please cite:
```bibtex
@software{wang2024statspai,
title={StatsPAI: The AI-powered Statistics \& Econometrics Toolkit for Python},
author={Wang, Bryce},
year={2024},
url={https://github.com/brycewang-stanford/pyEconometrics},
version={0.1.0}
}
```