pandas-tabulate


- Name: pandas-tabulate
- Version: 0.1.0
- Summary: Python implementation of Stata's tabulate command for pandas DataFrames
- Upload time: 2025-07-25 17:35:43
- Requires Python: >=3.7
- License: MIT
- Keywords: stata, tabulate, pandas, statistics, cross-tabulation
- Requirements: none recorded in the package metadata

# pandas-tabulate

[![PyPI version](https://badge.fury.io/py/pandas-tabulate.svg)](https://badge.fury.io/py/pandas-tabulate)
[![Python 3.7+](https://img.shields.io/badge/python-3.7+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

Python implementation of Stata's tabulate command for pandas DataFrames.

pandas-tabulate brings the familiar behavior of Stata's `tabulate` command to Python. It produces one-way frequency tables and two-way cross-tabulations, with optional tests of independence, directly from pandas Series.

## Key Features

- **Tabulation**: One-way frequency tables and two-way cross-tabulations
- **Statistical analysis**: Chi-square, Fisher exact, and likelihood-ratio tests, plus Cramér's V
- **Flexible output**: Frequencies, row/column/total percentages, and cumulative statistics
- **Missing value handling**: Configurable treatment of missing data
- **Stata compatibility**: Familiar syntax and output format for Stata users
- **Scientific Python stack**: Implemented with pandas, NumPy, and SciPy

## Installation

```bash
pip install pandas-tabulate
```

## Quick Start

```python
import pandas as pd
import pandas_tabulate as ptab

# Create sample data
df = pd.DataFrame({
    'gender': ['M', 'F', 'M', 'F', 'M', 'F', 'M', 'F'],
    'education': ['High', 'Low', 'High', 'High', 'Low', 'Low', 'High', 'Low'],
    'income': [50000, 30000, 60000, 45000, 35000, 25000, 55000, 28000]
})

# One-way tabulation
result = ptab.tabulate(df['gender'])
print(result)

# Two-way tabulation with statistics
result = ptab.tabulate(df['gender'], df['education'],
                       chi2=True, exact=True)
print(result)
```

## Available Functions

### Core Tabulation Functions
- **`tabulate(var1, var2=None, **kwargs)`** - Main tabulation function
- **`oneway(variable, **kwargs)`** - One-way frequency tables
- **`twoway(var1, var2, **kwargs)`** - Two-way cross-tabulation

### Statistical Tests
- **Chi-square test** - Test of independence for categorical variables
- **Fisher exact test** - Exact test for small sample sizes
- **Likelihood ratio test** - Alternative test of independence
- **Cramér's V** - Measure of association strength
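To make these measures concrete, the sketch below computes them in plain Python (no pandas-tabulate calls) for the gender × education counts from the Quick Start data. It uses the textbook definitions: expected counts E = row total × column total / N, the uncorrected Pearson statistic Σ(O − E)²/E, the likelihood-ratio statistic G² = 2 Σ O ln(O/E), degrees of freedom (r − 1)(c − 1), and the unsigned Cramér's V = √(χ² / (N · (min(r, c) − 1))). It is an illustration of the formulas, not the package's implementation, and `independence_stats` is a hypothetical helper name.

```python
import math


def independence_stats(table):
    """Pearson chi-square, likelihood-ratio G^2, dof and Cramér's V for an r x c table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            if observed > 0:  # empty cells contribute 0 to G^2 (limit of x*log(x) as x -> 0)
                g2 += 2 * observed * math.log(observed / expected)
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    k = min(len(row_totals), len(col_totals))
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))
    return chi2, g2, dof, cramers_v


# gender x education counts from the Quick Start data
# rows: F, M; columns: High, Low
table = [[1, 3],
         [3, 1]]
chi2, g2, dof, v = independence_stats(table)
print(f"Pearson chi2({dof}) = {chi2:.4f}")  # 2.0000
print(f"LR chi2({dof})      = {g2:.4f}")    # 2.0930
print(f"Cramér's V         = {v:.4f}")      # 0.5000
```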

### Output Options
- **Frequencies** - Raw counts
- **Percentages** - Row, column, and total percentages
- **Cumulative** - Cumulative frequencies and percentages
- **Missing handling** - Include/exclude missing values
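The three kinds of percentages differ only in their denominator: the row total, the column total, or the grand total. A plain-Python sketch of that arithmetic on the Quick Start data follows; `two_way_percentages` is a hypothetical helper, not part of pandas-tabulate.

```python
from collections import Counter


def two_way_percentages(var1, var2):
    """Return {(row, col): (freq, row %, column %, cell %)} for two equal-length sequences."""
    counts = Counter(zip(var1, var2))
    rows, cols = sorted(set(var1)), sorted(set(var2))
    n = len(var1)
    row_tot = {r: sum(counts[(r, c)] for c in cols) for r in rows}
    col_tot = {c: sum(counts[(r, c)] for r in rows) for c in cols}
    return {
        (r, c): (
            counts[(r, c)],                     # frequency (Counter returns 0 for empty cells)
            100 * counts[(r, c)] / row_tot[r],  # share of the row total
            100 * counts[(r, c)] / col_tot[c],  # share of the column total
            100 * counts[(r, c)] / n,           # share of the grand total
        )
        for r in rows
        for c in cols
    }


gender = ['M', 'F', 'M', 'F', 'M', 'F', 'M', 'F']
education = ['High', 'Low', 'High', 'High', 'Low', 'Low', 'High', 'Low']
for (r, c), (freq, row_pct, col_pct, cell_pct) in two_way_percentages(gender, education).items():
    print(f"{r} {c:<4} freq={freq} row%={row_pct:.1f} col%={col_pct:.1f} cell%={cell_pct:.1f}")
```

For example, three of the four `M` observations are `High`, so that cell shows 75.0 as a row percentage and 37.5 as a share of all eight observations.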

## Detailed Examples

### One-way Tabulation

```python
import pandas as pd
import pandas_tabulate as ptab

# Basic frequency table
df = pd.DataFrame({'status': ['A', 'B', 'A', 'C', 'B', 'A', 'C']})
result = ptab.oneway(df['status'])
print(result)

# With percentages and cumulative statistics
result = ptab.oneway(df['status'],
                     percent=True,
                     cumulative=True)
print(result)
```
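The percent and cumulative columns can be reproduced by hand, which helps when checking output. The plain-Python sketch below mirrors the Freq./Percent/Cum. columns of Stata's one-way `tabulate` display for the `status` data; `oneway_table` is a hypothetical helper, and the exact layout pandas-tabulate prints may differ.

```python
from collections import Counter


def oneway_table(values):
    """Return ([(level, freq, percent, cumulative percent), ...], total) sorted by level."""
    counts = Counter(values)
    n = len(values)
    rows, cum = [], 0
    for level in sorted(counts):
        freq = counts[level]
        cum += freq
        rows.append((level, freq, 100 * freq / n, 100 * cum / n))
    return rows, n


status = ['A', 'B', 'A', 'C', 'B', 'A', 'C']
rows, n = oneway_table(status)
print(f"{'status':>8} | {'Freq.':>6} {'Percent':>8} {'Cum.':>8}")
for level, freq, pct, cum_pct in rows:
    print(f"{level:>8} | {freq:>6} {pct:>8.2f} {cum_pct:>8.2f}")
print(f"{'Total':>8} | {n:>6} {100:>8.2f}")
```

Here `A` appears 3 times out of 7 (42.86%), and the cumulative column reaches 100.00 at the last level.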

### Two-way Cross-tabulation

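```python
import pandas as pd
import pandas_tabulate as ptab

# Sample data (same as the Quick Start example; the one-way example above
# redefined df with only a 'status' column)
df = pd.DataFrame({
    'gender': ['M', 'F', 'M', 'F', 'M', 'F', 'M', 'F'],
    'education': ['High', 'Low', 'High', 'High', 'Low', 'Low', 'High', 'Low']
})

# Basic cross-tabulation
result = ptab.twoway(df['gender'], df['education'])
print(result)

# With row and column percentages
result = ptab.twoway(df['gender'], df['education'],
                     row_percent=True,
                     col_percent=True)
print(result)

# With statistical tests
result = ptab.twoway(df['gender'], df['education'],
                     chi2=True,
                     exact=True,
                     cramers_v=True)
print(result)
```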
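Fisher's exact test (the `exact=True` option, mirroring Stata's `exact`) conditions on the row and column totals and adds up hypergeometric probabilities of the tables consistent with them. The sketch below is plain Python for the 2×2 case only and is not the package's implementation. It defines the two-sided p-value as the total probability of tables no more likely than the observed one, the convention used by `scipy.stats.fisher_exact`. `math.comb` requires Python 3.8+, and `fisher_exact_2x2` is a hypothetical name.

```python
from math import comb


def fisher_exact_2x2(table):
    """Return (two-sided, less, greater) Fisher exact p-values for [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    r1, r2 = a + b, c + d  # row totals
    c1 = a + c             # first column total
    n = r1 + r2
    # Unnormalised hypergeometric weights for each possible top-left count k
    lo, hi = max(0, c1 - r2), min(r1, c1)
    weights = {k: comb(r1, k) * comb(r2, c1 - k) for k in range(lo, hi + 1)}
    total = comb(n, c1)  # equals sum(weights.values()) by Vandermonde's identity
    observed = weights[a]
    # Integer weights let us compare probabilities exactly, without a float tolerance
    two_sided = sum(w for w in weights.values() if w <= observed) / total
    less = sum(w for k, w in weights.items() if k <= a) / total
    greater = sum(w for k, w in weights.items() if k >= a) / total
    return two_sided, less, greater


# gender x education from the Quick Start data (rows: F, M; columns: High, Low)
two_sided, less, greater = fisher_exact_2x2([[1, 3], [3, 1]])
print(f"two-sided p = {two_sided:.4f}, one-sided (less) p = {less:.4f}")
```

With only eight observations the two-sided p-value is 34/70 ≈ 0.486, so this small sample gives no evidence against independence.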

### Missing Value Handling

```python
import numpy as np
import pandas as pd
import pandas_tabulate as ptab

# Data with missing values
df_missing = pd.DataFrame({
    'var1': ['A', 'B', np.nan, 'A', 'C'],
    'var2': ['X', np.nan, 'Y', 'X', 'Y']
})

# Exclude missing values (default)
result = ptab.twoway(df_missing['var1'], df_missing['var2'])
print(result)

# Include missing values
result = ptab.twoway(df_missing['var1'], df_missing['var2'],
                     missing=True)
print(result)
```
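The two calls differ in which observations are counted. In the plain-Python sketch below (not the package's code), the default drops an observation when either variable is missing, as Stata's `tabulate` does, while `missing=True` keeps it and treats missing as its own category. The `"."` label for that category is an assumption borrowed from how Stata displays missing values, and `two_way_counts` is a hypothetical helper. `np.nan` is an ordinary Python float, so `math.isnan` detects it.

```python
import math
from collections import Counter


def is_missing(value):
    # Treat None and float NaN (np.nan is a plain float) as missing
    return value is None or (isinstance(value, float) and math.isnan(value))


def two_way_counts(var1, var2, missing=False):
    pairs = list(zip(var1, var2))
    if missing:
        # Keep every observation; missing values get their own "." level
        pairs = [tuple("." if is_missing(v) else v for v in p) for p in pairs]
    else:
        # Listwise deletion: drop the observation if either variable is missing
        pairs = [p for p in pairs if not any(is_missing(v) for v in p)]
    return Counter(pairs)


nan = float("nan")
var1 = ['A', 'B', nan, 'A', 'C']
var2 = ['X', nan, 'Y', 'X', 'Y']
print(two_way_counts(var1, var2))                # 3 observations counted
print(two_way_counts(var1, var2, missing=True))  # all 5 observations counted
```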

## Stata to Python Translation Guide

| Stata Command | pandas-tabulate Equivalent |
|---------------|----------------------------|
| `tabulate var1` | `ptab.oneway(df['var1'])` |
| `tabulate var1, missing` | `ptab.oneway(df['var1'], missing=True)` |
| `tabulate var1 var2` | `ptab.twoway(df['var1'], df['var2'])` |
| `tabulate var1 var2, chi2` | `ptab.twoway(df['var1'], df['var2'], chi2=True)` |
| `tabulate var1 var2, exact` | `ptab.twoway(df['var1'], df['var2'], exact=True)` |
| `tabulate var1 var2, row col` | `ptab.twoway(df['var1'], df['var2'], row_percent=True, col_percent=True)` |

## Function Reference

### tabulate(var1, var2=None, **kwargs)

Main tabulation function. It performs a one-way tabulation when only `var1` is given and a two-way tabulation when `var2` is also supplied.

**Parameters:**
- `var1`: pandas Series - First variable
- `var2`: pandas Series, optional - Second variable for cross-tabulation
- `percent`: bool, default False - Show percentages
- `cumulative`: bool, default False - Show cumulative statistics
- `chi2`: bool, default False - Perform chi-square test
- `exact`: bool, default False - Perform Fisher exact test
- `missing`: bool, default False - Include missing values

**Returns:**
- TabulationResult object with tables and statistics
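The one-way/two-way dispatch can be pictured with the minimal plain-Python sketch below. `tabulate_sketch` is a hypothetical name; it returns a `collections.Counter` of cell counts, whereas the real function returns a TabulationResult with formatted tables and statistics.

```python
from collections import Counter


def tabulate_sketch(var1, var2=None):
    """Hypothetical dispatcher: one-way counts if var2 is None, otherwise two-way counts."""
    if var2 is None:
        return Counter(var1)  # one-way: count each level of var1
    if len(var1) != len(var2):
        raise ValueError("var1 and var2 must have the same length")
    return Counter(zip(var1, var2))  # two-way: count each (var1, var2) pair


print(tabulate_sketch(['M', 'F', 'M']))
print(tabulate_sketch(['M', 'F', 'M'], ['High', 'Low', 'High']))
```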

### Statistical Tests

All statistical tests return results with:
- Test statistic
- p-value
- Degrees of freedom (where applicable)
- Critical value
- Interpretation
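To show how these fields relate, here is a plain-Python sketch for a chi-square test with 1 degree of freedom (a 2×2 table). It relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal, so the p-value is erfc(√(χ²/2)) and the critical value is the squared normal quantile. `TestSummary` and `chi2_df1_result` are hypothetical names, not the package's result type, and `statistics.NormalDist` requires Python 3.8+. Other degrees of freedom need the general chi-square distribution (for example `scipy.stats.chi2`).

```python
import math
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class TestSummary:  # hypothetical container holding the fields listed above
    statistic: float
    p_value: float
    dof: int
    critical_value: float
    interpretation: str


def chi2_df1_result(chi2, alpha=0.05):
    # For 1 dof, chi2 = Z**2, so P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2  # about 3.841 for alpha = 0.05
    if p < alpha:
        text = f"reject independence at the {alpha:.0%} level"
    else:
        text = f"no evidence against independence at the {alpha:.0%} level"
    return TestSummary(chi2, p, 1, crit, text)


# Pearson chi2 = 2.0 for the Quick Start gender x education table
print(chi2_df1_result(2.0))
```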

## Contributing

We welcome contributions! Please see our Contributing Guide for details.

### Development Setup
```bash
git clone https://github.com/brycewang-stanford/pandas-tabulate.git
cd pandas-tabulate
pip install -e ".[dev]"
python -m pytest tests/
```

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

- Inspired by Stata's tabulate command
- Built on pandas, NumPy, and SciPy
- Thanks to the open-source community for feedback and contributions

## Support

- Bug Reports: [GitHub Issues](https://github.com/brycewang-stanford/pandas-tabulate/issues)
- Feature Requests: [GitHub Discussions](https://github.com/brycewang-stanford/pandas-tabulate/discussions)
- Email: brycew6m@stanford.edu

---

If this package helps your research, please consider starring the repository!

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "pandas-tabulate",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "Bryce Wang <brycew6m@stanford.edu>",
    "keywords": "stata, tabulate, pandas, statistics, cross-tabulation",
    "author": null,
    "author_email": "Bryce Wang <brycew6m@stanford.edu>",
    "download_url": "https://files.pythonhosted.org/packages/44/2d/f79636694d2cb8d261192e11d3f465e96c7c7c79b066c24dc7983f2749c9/pandas_tabulate-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Python implementation of Stata's tabulate command for pandas DataFrames",
    "version": "0.1.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/brycewang-stanford/pandas-tabulate/issues",
        "Documentation": "https://github.com/brycewang-stanford/pandas-tabulate#readme",
        "Homepage": "https://github.com/brycewang-stanford/pandas-tabulate",
        "Repository": "https://github.com/brycewang-stanford/pandas-tabulate"
    },
    "split_keywords": [
        "stata",
        " tabulate",
        " pandas",
        " statistics",
        " cross-tabulation"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0ff25ee6960abfacd83877b0737f88ed5ce1e401bf94fc9f2ec664e5d106ca16",
                "md5": "ffb42d3821eb13c76aca17a57c505947",
                "sha256": "fdd8000e6f11579b83ee75480c4934f517bb2c51ecdf625b0ec7e042251a59f1"
            },
            "downloads": -1,
            "filename": "pandas_tabulate-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ffb42d3821eb13c76aca17a57c505947",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 10869,
            "upload_time": "2025-07-25T17:35:42",
            "upload_time_iso_8601": "2025-07-25T17:35:42.095251Z",
            "url": "https://files.pythonhosted.org/packages/0f/f2/5ee6960abfacd83877b0737f88ed5ce1e401bf94fc9f2ec664e5d106ca16/pandas_tabulate-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "442df79636694d2cb8d261192e11d3f465e96c7c7c79b066c24dc7983f2749c9",
                "md5": "48ca26244549a4fecf15200a171e60c7",
                "sha256": "7bd608a848c02f949f543ed7d404b5b760e8f78c730e900214232e4ea567b451"
            },
            "downloads": -1,
            "filename": "pandas_tabulate-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "48ca26244549a4fecf15200a171e60c7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 9692,
            "upload_time": "2025-07-25T17:35:43",
            "upload_time_iso_8601": "2025-07-25T17:35:43.370738Z",
            "url": "https://files.pythonhosted.org/packages/44/2d/f79636694d2cb8d261192e11d3f465e96c7c7c79b066c24dc7983f2749c9/pandas_tabulate-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-25 17:35:43",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "brycewang-stanford",
    "github_project": "pandas-tabulate",
    "github_not_found": true,
    "lcname": "pandas-tabulate"
}
        