# pandas-tabulate
Python implementation of Stata's `tabulate` command for pandas DataFrames.

pandas-tabulate brings Stata's `tabulate` command to Python. It builds one-way frequency tables and two-way cross-tabulations from pandas Series, with optional percentages, cumulative statistics, and tests of independence.
## Key Features
- **Comprehensive tabulation**: One-way and two-way frequency tables
- **Statistical analysis**: Chi-square tests, Fisher exact tests, and other statistical measures
- **Flexible formatting**: Multiple output formats and customization options
- **Missing value handling**: Configurable treatment of missing data
- **Stata compatibility**: Familiar syntax and output format for Stata users
- **Performance optimized**: Efficient implementation using pandas and NumPy
## Installation
```bash
pip install pandas-tabulate
```
## Quick Start
```python
import pandas as pd
import pandas_tabulate as ptab

# Create sample data
df = pd.DataFrame({
    'gender': ['M', 'F', 'M', 'F', 'M', 'F', 'M', 'F'],
    'education': ['High', 'Low', 'High', 'High', 'Low', 'Low', 'High', 'Low'],
    'income': [50000, 30000, 60000, 45000, 35000, 25000, 55000, 28000]
})

# One-way tabulation
result = ptab.tabulate(df['gender'])
print(result)

# Two-way tabulation with statistics
result = ptab.tabulate(df['gender'], df['education'],
                       chi2=True, exact=True)
print(result)
```
## Available Functions
### Core Tabulation Functions
- **`tabulate(var1, var2=None, **kwargs)`** - Main tabulation function
- **`oneway(variable, **kwargs)`** - One-way frequency tables
- **`twoway(var1, var2, **kwargs)`** - Two-way cross-tabulation
### Statistical Tests
- **Chi-square test** - Test of independence for categorical variables
- **Fisher exact test** - Exact test for small sample sizes
- **Likelihood ratio test** - Alternative test of independence
- **Cramér's V** - Measure of association strength
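
For readers who want to see what these four quantities are, the sketch below computes them directly with pandas and SciPy on the Quick Start sample data. It is an illustration of the underlying statistics, not pandas-tabulate's own implementation, and the variable names are chosen here for the example.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Same gender/education sample as the Quick Start example
df = pd.DataFrame({
    'gender': ['M', 'F', 'M', 'F', 'M', 'F', 'M', 'F'],
    'education': ['High', 'Low', 'High', 'High', 'Low', 'Low', 'High', 'Low'],
})

# Observed contingency table as a plain NumPy array (rows: F, M; columns: High, Low)
table = pd.crosstab(df['gender'], df['education']).to_numpy()

# Pearson chi-square test of independence, without continuity correction
chi2_stat, chi2_p, dof, expected = stats.chi2_contingency(table, correction=False)

# Likelihood-ratio (G) test: same function, log-likelihood statistic
lr_stat, lr_p, _, _ = stats.chi2_contingency(
    table, correction=False, lambda_='log-likelihood'
)

# Fisher exact test on the 2x2 table (two-sided by default)
odds_ratio, fisher_p = stats.fisher_exact(table)

# Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
n = table.sum()
cramers_v = np.sqrt(chi2_stat / (n * (min(table.shape) - 1)))

print(f"chi2={chi2_stat:.3f} (p={chi2_p:.3f}), LR={lr_stat:.3f} (p={lr_p:.3f})")
print(f"Fisher exact p={fisher_p:.3f}, Cramér's V={cramers_v:.3f}")
```

With only eight observations every expected cell count is 2, which is the kind of small-sample case where the exact test is usually preferred over the chi-square approximation.
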
### Output Options
- **Frequencies** - Raw counts
- **Percentages** - Row, column, and total percentages
- **Cumulative** - Cumulative frequencies and percentages
- **Missing handling** - Include/exclude missing values
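
As a rough guide to what each output option means, the following sketch reproduces the same quantities with plain pandas. The column labels (`Freq.`, `Percent`, `Cum.`) mimic Stata's layout and are an assumption of this example, not necessarily the labels pandas-tabulate prints.

```python
import pandas as pd

df = pd.DataFrame({
    'gender': ['M', 'F', 'M', 'F', 'M', 'F', 'M', 'F'],
    'education': ['High', 'Low', 'High', 'High', 'Low', 'Low', 'High', 'Low'],
})

# One-way table: raw counts, percentages, and cumulative percentages
freq = df['education'].value_counts().sort_index()
oneway = pd.DataFrame({
    'Freq.': freq,
    'Percent': 100 * freq / freq.sum(),
})
oneway['Cum.'] = oneway['Percent'].cumsum()

# Two-way table: raw counts plus row, column, and total percentages
counts = pd.crosstab(df['gender'], df['education'])
row_pct = pd.crosstab(df['gender'], df['education'], normalize='index') * 100
col_pct = pd.crosstab(df['gender'], df['education'], normalize='columns') * 100
total_pct = pd.crosstab(df['gender'], df['education'], normalize='all') * 100

print(oneway)
print(row_pct)
```

Row percentages sum to 100 across each row, column percentages down each column, and total percentages over the whole table.
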
## Detailed Examples
### One-way Tabulation
```python
import pandas as pd
import pandas_tabulate as ptab

# Basic frequency table
df = pd.DataFrame({'status': ['A', 'B', 'A', 'C', 'B', 'A', 'C']})
result = ptab.oneway(df['status'])
print(result)

# With percentages and cumulative statistics
result = ptab.oneway(df['status'],
                     percent=True,
                     cumulative=True)
print(result)
```
### Two-way Cross-tabulation
```python
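import pandas as pd
import pandas_tabulate as ptab

# Sample data (same gender/education columns as the Quick Start example)
df = pd.DataFrame({
    'gender': ['M', 'F', 'M', 'F', 'M', 'F', 'M', 'F'],
    'education': ['High', 'Low', 'High', 'High', 'Low', 'Low', 'High', 'Low']
})

# Basic cross-tabulation
result = ptab.twoway(df['gender'], df['education'])
print(result)

# With row and column percentages
result = ptab.twoway(df['gender'], df['education'],
                     row_percent=True,
                     col_percent=True)
print(result)

# With statistical tests
result = ptab.twoway(df['gender'], df['education'],
                     chi2=True,
                     exact=True,
                     cramers_v=True)
print(result)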
```
### Missing Value Handling
```python
import numpy as np
import pandas as pd
import pandas_tabulate as ptab

# Data with missing values
df_missing = pd.DataFrame({
    'var1': ['A', 'B', np.nan, 'A', 'C'],
    'var2': ['X', np.nan, 'Y', 'X', 'Y']
})

# Exclude missing values (default)
result = ptab.twoway(df_missing['var1'], df_missing['var2'])
print(result)

# Include missing values
result = ptab.twoway(df_missing['var1'], df_missing['var2'],
                     missing=True)
print(result)
```
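
To make the difference between the two modes concrete, here is a minimal pandas-only sketch of the same idea. It is not the package's code: the `MISSING` label (`'.'`, Stata's symbol for a missing value) is a hypothetical choice made for this example.

```python
import numpy as np
import pandas as pd

df_missing = pd.DataFrame({
    'var1': ['A', 'B', np.nan, 'A', 'C'],
    'var2': ['X', np.nan, 'Y', 'X', 'Y'],
})

# Excluding missing values: pd.crosstab drops any row where either variable is NaN
excluded = pd.crosstab(df_missing['var1'], df_missing['var2'])

# Including missing values: give NaN an explicit label so it forms its own category
MISSING = '.'  # hypothetical label for this sketch
included = pd.crosstab(
    df_missing['var1'].fillna(MISSING),
    df_missing['var2'].fillna(MISSING),
)

print(excluded)  # built from the 3 complete pairs
print(included)  # built from all 5 rows, with a '.' row and a '.' column
```

Only three of the five rows have both values present, so the excluded table sums to 3 while the included table sums to 5.
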
## Stata to Python Translation Guide
| Stata Command | pandas-tabulate Equivalent |
|---------------|----------------------------|
| `tabulate var1` | `ptab.oneway(df['var1'])` |
| `tabulate var1, missing` | `ptab.oneway(df['var1'], missing=True)` |
| `tabulate var1 var2` | `ptab.twoway(df['var1'], df['var2'])` |
| `tabulate var1 var2, chi2` | `ptab.twoway(df['var1'], df['var2'], chi2=True)` |
| `tabulate var1 var2, exact` | `ptab.twoway(df['var1'], df['var2'], exact=True)` |
| `tabulate var1 var2, row col` | `ptab.twoway(df['var1'], df['var2'], row_percent=True, col_percent=True)` |
## Function Reference
### `tabulate(var1, var2=None, **kwargs)`

Main tabulation function that automatically determines whether to perform one-way or two-way tabulation.

**Parameters:**

- `var1`: pandas Series - First variable
- `var2`: pandas Series, optional - Second variable for cross-tabulation
- `percent`: bool, default False - Show percentages
- `cumulative`: bool, default False - Show cumulative statistics
- `chi2`: bool, default False - Perform chi-square test
- `exact`: bool, default False - Perform Fisher exact test
- `missing`: bool, default False - Include missing values

**Returns:**

- `TabulationResult` object with tables and statistics
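
The one-way versus two-way dispatch described above can be pictured with a short pandas sketch. `tabulate_sketch` is a hypothetical stand-in written for this README: it returns plain DataFrames rather than a `TabulationResult`, and it does not reflect the package's internals.

```python
from typing import Optional

import pandas as pd


def tabulate_sketch(var1: pd.Series, var2: Optional[pd.Series] = None) -> pd.DataFrame:
    """Hypothetical dispatcher: one-way counts if var2 is omitted, else a cross-tab."""
    if var2 is None:
        # One variable: frequency table sorted by category
        return var1.value_counts().sort_index().to_frame('Freq.')
    # Two variables: contingency table of counts
    return pd.crosstab(var1, var2)


df = pd.DataFrame({
    'gender': ['M', 'F', 'M', 'F', 'M', 'F', 'M', 'F'],
    'education': ['High', 'Low', 'High', 'High', 'Low', 'Low', 'High', 'Low'],
})

one = tabulate_sketch(df['gender'])                   # one-way table
two = tabulate_sketch(df['gender'], df['education'])  # two-way table
print(one)
print(two)
```
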
### Statistical Tests
All statistical tests return results with:
- Test statistic
- p-value
- Degrees of freedom (where applicable)
- Critical value
- Interpretation
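
As an illustration of how these fields relate to one another, the sketch below assembles them for a chi-square test using SciPy. The dictionary layout, the `alpha = 0.05` significance level, and the interpretation wording are assumptions of this example, not the structure pandas-tabulate returns.

```python
import numpy as np
from scipy import stats

# Gender by education counts from the Quick Start data (rows: F, M; columns: High, Low)
observed = np.array([[1, 3],
                     [3, 1]])

alpha = 0.05  # assumed significance level for this sketch

# Test statistic, p-value, and degrees of freedom (no continuity correction)
statistic, p_value, dof, _ = stats.chi2_contingency(observed, correction=False)

# Critical value: the (1 - alpha) quantile of the chi-square distribution
critical_value = stats.chi2.ppf(1 - alpha, dof)

# Comparing the statistic to the critical value is equivalent to p_value < alpha
reject = bool(statistic > critical_value)
interpretation = (
    "reject independence" if reject else "fail to reject independence"
)

summary = {
    'statistic': statistic,
    'p_value': p_value,
    'dof': dof,
    'critical_value': critical_value,
    'interpretation': f"{interpretation} at alpha={alpha}",
}
print(summary)
```
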
## Contributing
We welcome contributions! Please see our Contributing Guide for details.
### Development Setup
```bash
git clone https://github.com/brycewang-stanford/pandas-tabulate.git
cd pandas-tabulate
pip install -e ".[dev]"
python -m pytest tests/
```
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- Inspired by Stata's tabulate command
- Built on pandas, NumPy, and SciPy
- Thanks to the open-source community for feedback and contributions
## Support
- Bug Reports: [GitHub Issues](https://github.com/brycewang-stanford/pandas-tabulate/issues)
- Feature Requests: [GitHub Discussions](https://github.com/brycewang-stanford/pandas-tabulate/discussions)
- Email: brycew6m@stanford.edu
---
If this package helps your research, please consider starring the repository!