StatsPAI

Name	StatsPAI
Version	0.1.0
home_page	None
Summary	The AI-powered Statistics & Econometrics Toolkit for Python
upload_time	2025-07-26 09:49:50
maintainer	None
docs_url	None
author	None
requires_python	>=3.8
license	None
keywords	econometrics statistics regression causal-inference causal-forest panel-data instrumental-variables stata r machine-learning
requirements	No requirements were recorded.

# StatsPAI

[![PyPI version](https://badge.fury.io/py/StatsPAI.svg)](https://badge.fury.io/py/StatsPAI)
[![Python versions](https://img.shields.io/pypi/pyversions/StatsPAI.svg)](https://pypi.org/project/StatsPAI/)
[![License](https://img.shields.io/github/license/brycewang-stanford/pyEconometrics.svg)](https://github.com/brycewang-stanford/pyEconometrics/blob/main/LICENSE)
[![Build Status](https://github.com/brycewang-stanford/pyEconometrics/workflows/CI%2FCD%20Pipeline/badge.svg)](https://github.com/brycewang-stanford/pyEconometrics/actions)
[![codecov](https://codecov.io/gh/brycewang-stanford/pyEconometrics/branch/main/graph/badge.svg)](https://codecov.io/gh/brycewang-stanford/pyEconometrics)

**The AI-powered Statistics & Econometrics Toolkit for Python**

StatsPAI combines an R/Stata-style formula syntax with standard econometric estimators, so researchers and practitioners can run advanced analyses in a few lines of Python.

## 🚀 Features

### Core Econometric Methods
- **Linear Regression**: OLS and WLS, with robust standard errors
- **Instrumental Variables**: 2SLS estimation
- **Panel Data**: Fixed Effects, Random Effects models
- **Causal Inference**: Causal Forest implementation (inspired by EconML)

### User Experience
- **Formula Interface**: Intuitive R/Stata-style syntax `"y ~ x1 + x2"`
- **Excel Export**: Professional output tables via `outreg2` (Stata-inspired)
- **Flexible API**: Both formula and matrix interfaces supported
- **Rich Output**: Detailed summary statistics and diagnostic tests
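
To make the formula interface concrete, here is a minimal sketch of how a string such as `"y ~ x1 + x2"` can be turned into an outcome vector and a design matrix. It is not StatsPAI's parser: `parse_formula` and `design_matrices` are hypothetical helpers written for illustration, and they handle only additive terms joined by `+`.

```python
import numpy as np


def parse_formula(formula):
    """Split 'y ~ x1 + x2' into the outcome name and a list of regressor names."""
    lhs, rhs = formula.split("~")
    regressors = [term.strip() for term in rhs.split("+") if term.strip()]
    return lhs.strip(), regressors


def design_matrices(formula, data):
    """Build y and X (with an intercept column) from a dict of equal-length columns."""
    outcome, regressors = parse_formula(formula)
    y = np.asarray(data[outcome], dtype=float)
    columns = [np.ones_like(y)] + [np.asarray(data[name], dtype=float) for name in regressors]
    return y, np.column_stack(columns), ["Intercept"] + regressors


# Toy data, invented for this example
data = {"y": [1.0, 2.0, 3.0], "x1": [0.0, 1.0, 2.0], "x2": [5.0, 3.0, 4.0]}
y, X, names = design_matrices("y ~ x1 + x2", data)
print(names)
print(X)
```

A real formula parser also has to deal with interactions, categorical terms, and the `|` separator used in the IV and Causal Forest examples of this README; the sketch leaves all of that out.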

### Technical Excellence
- **Robust Implementation**: Based on proven econometric theory
- **Performance Optimized**: Efficient algorithms for large datasets
- **Well Tested**: Comprehensive test suite ensuring reliability
- **Type Hints**: Full type annotation for better development experience

## 📦 Installation

```bash
# Latest stable version
pip install StatsPAI

# Development version
pip install git+https://github.com/brycewang-stanford/pyEconometrics.git
```

### Requirements
- Python 3.8+
- NumPy, SciPy, Pandas
- scikit-learn (for Causal Forest)
- openpyxl (for Excel export)

## 🏁 Quick Start

### Basic Regression Analysis
```python
import pandas as pd
from statspai import reg, outreg2

# Load your data
df = pd.read_csv('data.csv')

# Run OLS regression
result1 = reg('wage ~ education + experience', data=df)
print(result1.summary())

# Add control variables
result2 = reg('wage ~ education + experience + age + gender', data=df)

# Export results to Excel
outreg2([result1, result2], 'regression_results.xlsx', 
        title='Wage Regression Analysis')
```
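
For readers who want to see what an OLS fit computes, the following is a minimal NumPy sketch on simulated data. It is independent of StatsPAI: the function `ols` and the simulated wage equation are assumptions made for this example, not the package's implementation.

```python
import numpy as np


def ols(y, X):
    """Return OLS coefficients, classical standard errors, and R-squared."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)            # unbiased error-variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)       # homoskedastic covariance matrix
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, np.sqrt(np.diag(cov)), r2


# Simulated data with known coefficients (1.5 on education, 0.8 on experience)
rng = np.random.default_rng(0)
n = 500
education = rng.normal(12, 2, n)
experience = rng.normal(10, 5, n)
wage = 5.0 + 1.5 * education + 0.8 * experience + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), education, experience])
beta, se, r2 = ols(wage, X)
print("coefficients:", beta)
print("std. errors: ", se)
print("R-squared:   ", r2)
```

The estimates should land close to the simulated coefficients, which is a quick sanity check to run against any regression routine.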

### Instrumental Variables
```python
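# 2SLS estimation (reuses `reg` and `df` from the Quick Start above);
# the terms after `|` are the instruments for education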
iv_result = reg('wage ~ education | mother_education + father_education', 
                data=df, method='2sls')
print(iv_result.summary())
```
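
The two stages behind 2SLS can be sketched in a few lines of NumPy. This is an illustration on simulated data, not StatsPAI's code: `two_stage_least_squares` is a hypothetical helper, and the data-generating process (an unobserved `ability` term that biases OLS) is invented for the example.

```python
import numpy as np


def two_stage_least_squares(y, x_endog, Z, X_exog):
    """2SLS point estimates; X_exog holds exogenous regressors, including the intercept."""
    instruments = np.column_stack([X_exog, Z])
    # Stage 1: project the endogenous regressor on all instruments
    gamma, *_ = np.linalg.lstsq(instruments, x_endog, rcond=None)
    x_hat = instruments @ gamma
    # Stage 2: regress the outcome on exogenous regressors and the fitted values
    beta, *_ = np.linalg.lstsq(np.column_stack([X_exog, x_hat]), y, rcond=None)
    return beta


rng = np.random.default_rng(1)
n = 5000
ability = rng.normal(size=n)                      # unobserved confounder
mother_education = rng.normal(size=n)
father_education = rng.normal(size=n)
education = 0.6 * mother_education + 0.6 * father_education + ability + rng.normal(size=n)
wage = 1.0 + 2.0 * education + 3.0 * ability + rng.normal(size=n)

const = np.ones((n, 1))
Z = np.column_stack([mother_education, father_education])
beta_iv = two_stage_least_squares(wage, education, Z, const)
beta_ols, *_ = np.linalg.lstsq(np.column_stack([const, education]), wage, rcond=None)
print("OLS slope: ", beta_ols[1])   # biased upward by the omitted ability term
print("2SLS slope:", beta_iv[1])    # close to the true value of 2.0
```

The sketch returns point estimates only. Standard errors taken naively from the second-stage regression are not valid, because they use residuals computed from the fitted rather than the actual regressor.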

### Panel Data Analysis
```python
# Fixed effects model
fe_result = reg('y ~ x1 + x2', data=df, 
                entity_col='firm_id', time_col='year', 
                method='fixed_effects')
```
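
A fixed effects estimate can be obtained with the within transformation: subtract each entity's mean from every variable, then run OLS on the demeaned data. The NumPy sketch below shows the idea on simulated firm-year data. The helpers `within_transform` and `fixed_effects` are hypothetical names for this example and say nothing about how StatsPAI implements the estimator.

```python
import numpy as np


def within_transform(values, groups):
    """Subtract each group's mean from its rows (works on 1-D or 2-D input)."""
    values = np.asarray(values, dtype=float)
    if values.ndim == 2:
        return np.column_stack(
            [within_transform(values[:, j], groups) for j in range(values.shape[1])]
        )
    _, idx = np.unique(groups, return_inverse=True)
    means = np.bincount(idx, weights=values) / np.bincount(idx)
    return values - means[idx]


def fixed_effects(y, X, groups):
    """Entity fixed-effects slopes via OLS on demeaned data (no intercept needed)."""
    beta, *_ = np.linalg.lstsq(
        within_transform(X, groups), within_transform(y, groups), rcond=None
    )
    return beta


rng = np.random.default_rng(2)
n_firms, n_years = 50, 10
firm_id = np.repeat(np.arange(n_firms), n_years)
alpha = rng.normal(0, 3, n_firms)[firm_id]        # firm effect, correlated with x1
x1 = alpha + rng.normal(size=firm_id.size)
x2 = rng.normal(size=firm_id.size)
y = 1.0 * x1 - 2.0 * x2 + alpha + rng.normal(size=firm_id.size)

beta_fe = fixed_effects(y, np.column_stack([x1, x2]), firm_id)
print("fixed-effects slopes:", beta_fe)           # close to [1.0, -2.0]
```

Because the firm effect is correlated with `x1`, pooled OLS would overstate the first slope; demeaning removes the firm effect before estimation.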

### Causal Forest for Heterogeneous Treatment Effects
```python
from statspai import CausalForest

# Initialize Causal Forest
cf = CausalForest(n_estimators=100, random_state=42)

# Fit model: outcome ~ treatment | features | controls
cf.fit('income ~ job_training | age + education + experience | region + year', 
       data=df)

# Estimate individual treatment effects
individual_effects = cf.effect(df)

# Get confidence intervals
effects_ci = cf.effect_interval(df, alpha=0.05)

# Export results
cf_summary = cf.summary()
outreg2([cf_summary], 'causal_forest_results.xlsx')
```
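
To illustrate what "heterogeneous treatment effects" means, the sketch below estimates individual-level effects with a much simpler stand-in technique: a T-learner that fits one linear regression per treatment arm and takes the difference of the predictions. This is deliberately not a causal forest and not StatsPAI code. All function names and the simulated data (a training effect that grows with age, with randomly assigned treatment) are assumptions for the example.

```python
import numpy as np


def fit_linear(X, y):
    """OLS with an intercept; returns the coefficient vector."""
    coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(X)), X]), y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def t_learner_effects(X, treatment, y):
    """Conditional effect estimates: treated-arm prediction minus control-arm prediction."""
    treated = treatment == 1
    coef_treated = fit_linear(X[treated], y[treated])
    coef_control = fit_linear(X[~treated], y[~treated])
    return predict_linear(coef_treated, X) - predict_linear(coef_control, X)


rng = np.random.default_rng(3)
n = 4000
age = rng.uniform(20, 60, n)
job_training = rng.integers(0, 2, n)              # randomly assigned treatment
true_effect = 0.1 * (age - 20)                    # effect grows with age
income = 10 + 0.2 * age + true_effect * job_training + rng.normal(size=n)

cate = t_learner_effects(age.reshape(-1, 1), job_training, income)
print("mean estimated effect:", cate.mean())
print("mean true effect:     ", true_effect.mean())
```

A causal forest replaces the two linear fits with an ensemble of trees that adapts to nonlinear effect patterns and supports confidence intervals, which this sketch does not attempt.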

## 📊 Advanced Usage

### Robust Standard Errors
```python
# Heteroskedasticity-robust standard errors
result = reg('y ~ x1 + x2', data=df, robust=True)

# Clustered standard errors
result = reg('y ~ x1 + x2', data=df, cluster='firm_id')
```
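
Both kinds of standard errors come from a sandwich covariance estimator. The NumPy sketch below computes HC1 (heteroskedasticity-robust) and one-way cluster-robust standard errors under common small-sample corrections. Which corrections StatsPAI applies is not stated in this README, so the formulas here are an assumption, and the function names are hypothetical.

```python
import numpy as np


def ols_fit(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


def hc1_se(X, resid):
    """Heteroskedasticity-robust (HC1) standard errors."""
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * resid[:, None] ** 2)        # sum of e_i^2 * x_i x_i'
    return np.sqrt(np.diag(bread @ meat @ bread * n / (n - k)))


def cluster_se(X, resid, clusters):
    """One-way cluster-robust standard errors."""
    n, k = X.shape
    _, idx = np.unique(clusters, return_inverse=True)
    g = idx.max() + 1
    scores = np.zeros((g, k))
    np.add.at(scores, idx, X * resid[:, None])    # sum x_i * e_i within each cluster
    bread = np.linalg.inv(X.T @ X)
    correction = g / (g - 1) * (n - 1) / (n - k)
    return np.sqrt(np.diag(bread @ (scores.T @ scores) @ bread * correction))


rng = np.random.default_rng(4)
n = 400
x1 = rng.normal(size=n)
firm_id = rng.integers(0, 20, n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n) * (1 + np.abs(x1))   # heteroskedastic errors

X = np.column_stack([np.ones(n), x1])
beta, resid = ols_fit(y, X)
print("HC1 SE:      ", hc1_se(X, resid))
print("clustered SE:", cluster_se(X, resid, firm_id))
```

A useful check: when every observation forms its own cluster, the clustered estimator with this correction reduces exactly to HC1.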

### Model Comparison
```python
from statspai import compare_models

models = [
    reg('y ~ x1', data=df),
    reg('y ~ x1 + x2', data=df),
    reg('y ~ x1 + x2 + x3', data=df)
]

comparison = compare_models(models)
print(comparison.summary())
```
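
What a model comparison typically tabulates can be sketched with plain NumPy: R-squared, adjusted R-squared, and AIC for a sequence of nested specifications. The statistics reported by `compare_models` are not documented in this README, so the choice below is an assumption, and `fit_stats` is a hypothetical helper.

```python
import numpy as np


def fit_stats(y, X):
    """Fit OLS and return fit statistics for one specification."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    r2 = 1 - rss / tss
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    aic = n * np.log(rss / n) + 2 * k             # Gaussian AIC up to an additive constant
    return {"k": k, "r2": r2, "adj_r2": adj_r2, "aic": aic}


rng = np.random.default_rng(5)
n = 300
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)   # x3 is irrelevant by construction

const = np.ones(n)
designs = {
    "y ~ x1": np.column_stack([const, x1]),
    "y ~ x1 + x2": np.column_stack([const, x1, x2]),
    "y ~ x1 + x2 + x3": np.column_stack([const, x1, x2, x3]),
}
table = [fit_stats(y, X) for X in designs.values()]
for name, row in zip(designs, table):
    print(f"{name:18s} R2={row['r2']:.3f}  adjR2={row['adj_r2']:.3f}  AIC={row['aic']:.1f}")
```

R-squared never decreases when a regressor is added to a nested model, which is why adjusted R-squared and AIC, both of which penalize extra parameters, are more informative for choosing between specifications.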

### Custom Output Formatting
```python
# `results` is a list of fitted models, e.g. [result1, result2] from the Quick Start
outreg2(results, 'output.xlsx',
        title='Regression Results',
        add_stats={'Observations': lambda r: r.nobs,
                  'R-squared': lambda r: r.rsquared},
        decimal_places=4,
        star_levels=[0.01, 0.05, 0.1])
```
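
The `decimal_places` and `star_levels` options suggest the usual regression-table convention of rounding coefficients and appending significance stars. The sketch below shows one common reading of that convention, with one star per threshold the p-value falls below. How `outreg2` actually maps p-values to stars is an assumption here, and both helper functions are hypothetical.

```python
def significance_stars(p_value, star_levels=(0.01, 0.05, 0.1)):
    """Return one star for each threshold the p-value falls strictly below."""
    return "*" * sum(p_value < level for level in star_levels)


def format_cell(coef, p_value, decimal_places=4, star_levels=(0.01, 0.05, 0.1)):
    """Format a coefficient with fixed decimals followed by its stars."""
    return f"{coef:.{decimal_places}f}{significance_stars(p_value, star_levels)}"


# Invented coefficient/p-value pairs for illustration
for coef, p in [(1.23456, 0.003), (0.5, 0.03), (-0.25, 0.07), (0.1, 0.4)]:
    print(format_cell(coef, p))
```

With the default thresholds this gives `***` for p < 0.01, `**` for p < 0.05, `*` for p < 0.1, and no stars otherwise.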

## 📚 Documentation

- **[User Guide](docs/user_guide.md)**: Comprehensive tutorials and examples
- **[API Reference](docs/api_reference.md)**: Detailed function documentation  
- **[Theory Guide](docs/theory_guide.md)**: Mathematical foundations
- **[Examples](examples/)**: Jupyter notebooks with real-world applications

## 🤝 Contributing

We welcome contributions! See our [Contributing Guide](CONTRIBUTING.md) for details.

### Development Setup
```bash
# Clone repository
git clone https://github.com/brycewang-stanford/pyEconometrics.git
cd pyEconometrics

# Install in development mode
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

# Run tests
pytest
```

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- Inspired by Stata's `outreg2` command for output formatting
- Causal Forest implementation based on Wager & Athey (2018)
- Built on the shoulders of NumPy, SciPy, and scikit-learn

## 📞 Contact

- **Author**: Bryce Wang
- **Email**: brycewang2018@gmail.com
- **GitHub**: [brycewang-stanford](https://github.com/brycewang-stanford)

## 📈 Citation

If you use StatsPAI in your research, please cite:

```bibtex
@software{wang2024statspai,
  title={StatsPAI: The AI-powered Statistics & Econometrics Toolkit for Python},
  author={Wang, Bryce},
  year={2024},
  url={https://github.com/brycewang-stanford/pyEconometrics},
  version={0.1.0}
}
```
