# pywinsor2

- **Version**: 0.1.1
- **Uploaded**: 2025-07-26 05:43:42
- **Requires Python**: >=3.7
- **Keywords**: stata, winsor, winsorize, trim, outliers, data-cleaning, pandas

[![PyPI version](https://badge.fury.io/py/pywinsor2.svg)](https://badge.fury.io/py/pywinsor2)
[![Downloads](https://static.pepy.tech/badge/pywinsor2)](https://pepy.tech/project/pywinsor2)
[![Downloads](https://static.pepy.tech/badge/pywinsor2/month)](https://pepy.tech/project/pywinsor2)
[![Downloads](https://static.pepy.tech/badge/pywinsor2/week)](https://pepy.tech/project/pywinsor2)
[![Python Versions](https://img.shields.io/pypi/pyversions/pywinsor2.svg)](https://pypi.org/project/pywinsor2/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![GitHub stars](https://img.shields.io/github/stars/brycewang-stanford/pywinsor2.svg?style=social&label=Star)](https://github.com/brycewang-stanford/pywinsor2)

Python implementation of Stata's `winsor2` command for winsorizing and trimming data.

## Installation

```bash
pip install pywinsor2
```

## Quick Start

```python
import pandas as pd
import pywinsor2 as pw2

# Create sample data
data = pd.DataFrame({
    'wage': [1.0, 1.5, 2.0, 3.0, 5.0, 8.0, 12.0, 20.0, 50.0, 100.0],
    'industry': ['A', 'A', 'B', 'B', 'A', 'A', 'B', 'B', 'A', 'B']
})

# Winsorize at 1st and 99th percentiles (default)
result = pw2.winsor2(data, ['wage'])

# Winsorize with custom cuts
result = pw2.winsor2(data, ['wage'], cuts=(5, 95))

# Trim instead of winsorize
result = pw2.winsor2(data, ['wage'], trim=True)

# Winsorize by group
result = pw2.winsor2(data, ['wage'], by='industry')

# Replace original variables
pw2.winsor2(data, ['wage'], replace=True)
```

## Features

- **Winsorizing**: Replace extreme values with percentile values
- **Trimming**: Remove extreme values (set to NaN)
- **Group-wise processing**: Process data within groups
- **Flexible percentiles**: Specify custom cut-off percentiles
- **Multiple variables**: Process multiple columns simultaneously
- **Stata compatibility**: API designed to match Stata's `winsor2` command
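The sketch below contrasts the two core operations in plain pandas. It is not pywinsor2's implementation. It assumes pandas' default `Series.quantile` (linear interpolation), which is not necessarily the percentile definition pywinsor2 or Stata uses. The helper names `winsorize_series` and `trim_series` are hypothetical.

```python
import pandas as pd


def winsorize_series(s: pd.Series, lower: float = 1, upper: float = 99) -> pd.Series:
    """Hypothetical helper: pull values outside the cut-offs back to the cut-offs."""
    # Assumption: pandas' default linear-interpolation quantiles
    lo, hi = s.quantile([lower / 100, upper / 100])
    return s.clip(lower=lo, upper=hi)


def trim_series(s: pd.Series, lower: float = 1, upper: float = 99) -> pd.Series:
    """Hypothetical helper: set values outside the cut-offs to NaN."""
    lo, hi = s.quantile([lower / 100, upper / 100])
    # Keep values inside [lo, hi]; everything else (and existing NaN) becomes NaN
    return s.where((s >= lo) & (s <= hi))


wage = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)
print(winsorize_series(wage, 10, 90))  # 1 -> ~1.9, 100 -> ~18.1, rest unchanged
print(trim_series(wage, 10, 90))       # 1 and 100 become NaN
```

Winsorizing keeps the number of non-missing observations. Trimming reduces it, which can matter for later regressions or group means.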

## Main Function

### `winsor2(data, varlist, cuts=(1, 99), suffix=None, replace=False, trim=False, by=None, label=False)`

**Parameters:**
- `data` (DataFrame): Input pandas DataFrame
- `varlist` (list): List of column names to process
- `cuts` (tuple): Lower and upper percentiles for winsorizing/trimming (default: (1, 99))
- `suffix` (str): Suffix for new variables (default: '_w' when winsorizing, '_tr' when trimming)
- `replace` (bool): Overwrite the original variables instead of creating new ones (default: False)
- `trim` (bool): Trim instead of winsorize (default: False)
- `by` (str or list): Column name(s) defining the groups for group-wise processing (default: None)
- `label` (bool): Add descriptive labels to new columns (default: False)

**Returns:**
- `DataFrame`: Processed DataFrame with winsorized/trimmed variables
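The documented defaults for `suffix` and `replace` determine which column receives the result. The hypothetical helper `output_column_name` below (not part of pywinsor2) just encodes those documented rules. How the library resolves conflicting options, such as passing both `replace=True` and a `suffix`, is not stated here, so the sketch lets `replace` take precedence as an assumption.

```python
from typing import Optional


def output_column_name(var: str, suffix: Optional[str] = None,
                       replace: bool = False, trim: bool = False) -> str:
    """Hypothetical helper: name of the column that receives the result."""
    if replace:
        # Assumption: replace wins over any suffix; results overwrite the original
        return var
    if suffix is None:
        # Documented defaults: '_tr' when trimming, '_w' when winsorizing
        suffix = "_tr" if trim else "_w"
    return var + suffix


print(output_column_name("wage"))                   # wage_w
print(output_column_name("wage", trim=True))        # wage_tr
print(output_column_name("wage", suffix="_clean"))  # wage_clean
print(output_column_name("wage", replace=True))     # wage
```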

## Examples

### Basic Usage

```python
import pandas as pd
import pywinsor2 as pw2

# Create sample data
data = pd.DataFrame({
    'wage': [1, 2, 3, 4, 5, 6, 7, 8, 9, 100],  # outlier: 100
    'age': [20, 25, 30, 35, 40, 45, 50, 55, 60, 25]
})

# Winsorize at default percentiles (1, 99)
result = pw2.winsor2(data, ['wage'])
print(result['wage_w'])  # New winsorized variable

# Winsorize multiple variables
result = pw2.winsor2(data, ['wage', 'age'], cuts=(5, 95))

# Trim outliers
result = pw2.winsor2(data, ['wage'], trim=True, cuts=(10, 90))
print(result['wage_tr'])  # Trimmed variable
```

### Group-wise Processing

```python
# Winsorize within groups
data = pd.DataFrame({
    'wage': [1, 2, 3, 10, 1, 2, 3, 15],
    'industry': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
})

result = pw2.winsor2(data, ['wage'], by='industry', cuts=(25, 75))
```
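Group-wise processing computes the cut-off percentiles separately within each group. Here is a minimal pandas sketch of the idea using `groupby(...).transform`, under the same assumption of pandas' default linear-interpolation quantiles. The helper `winsorize_by_group` is hypothetical, and pywinsor2's own numbers for this data may differ if it follows Stata's percentile definition.

```python
import pandas as pd


def winsorize_by_group(df: pd.DataFrame, var: str, by, lower: float = 1,
                       upper: float = 99) -> pd.Series:
    """Hypothetical helper: winsorize `var` within each group of `by`."""
    def clip_group(s: pd.Series) -> pd.Series:
        s = s.astype(float)  # avoid int/float dtype mixing when clipping
        # Assumption: pandas' default linear-interpolation quantiles
        lo, hi = s.quantile([lower / 100, upper / 100])
        return s.clip(lower=lo, upper=hi)

    # transform returns a Series aligned with df's original index
    return df.groupby(by)[var].transform(clip_group)


data = pd.DataFrame({
    'wage': [1, 2, 3, 10, 1, 2, 3, 15],
    'industry': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
})
data['wage_w'] = winsorize_by_group(data, 'wage', 'industry', 25, 75)
print(data)  # A: 1.75, 2, 3, 4.75   B: 1.75, 2, 3, 6
```

Because each industry gets its own cut-offs, the top value is capped at 4.75 in group A but at 6 in group B.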

### Advanced Options

```python
# Replace original variables
pw2.winsor2(data, ['wage'], replace=True, cuts=(2, 98))

# Custom suffix and labels
result = pw2.winsor2(data, ['wage'], suffix='_clean', label=True)
```

## Comparison with Stata

| Stata Command | Python Equivalent |
|---------------|-------------------|
| `winsor2 wage` | `pw2.winsor2(df, ['wage'])` |
| `winsor2 wage, cuts(5 95)` | `pw2.winsor2(df, ['wage'], cuts=(5, 95))` |
| `winsor2 wage, trim` | `pw2.winsor2(df, ['wage'], trim=True)` |
| `winsor2 wage, by(industry)` | `pw2.winsor2(df, ['wage'], by='industry')` |
| `winsor2 wage, replace` | `pw2.winsor2(df, ['wage'], replace=True)` |
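Stata and pandas do not compute percentiles the same way, which matters for small samples like the examples above. The sketch below implements Stata's default (non-`altdef`) unweighted percentile definition. To our understanding, this is the definition `winsor2` gets from `_pctile`. The function name `stata_pctile` is hypothetical, and this README does not state whether pywinsor2 reproduces this definition.

```python
import math
from typing import Sequence

import pandas as pd


def stata_pctile(values: Sequence[float], p: float) -> float:
    """Hypothetical helper: Stata's default unweighted percentile, 0 < p < 100."""
    if not 0 < p < 100:
        raise ValueError("p must lie strictly between 0 and 100")
    x = sorted(v for v in values if not math.isnan(v))  # drop missing values
    if not x:
        raise ValueError("no non-missing values")
    n = len(x)
    P = n * p / 100
    i = math.floor(P)
    if P > i:
        # P is not an integer: take x_(i+1) (1-based), i.e. x[i] here
        return x[i]
    # P is an integer: average x_(P) and x_(P+1) (1-based)
    return (x[i - 1] + x[i]) / 2


wage = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
for p in (10, 90):
    print(p, stata_pctile(wage, p), pd.Series(wage, dtype=float).quantile(p / 100))
```

For the ten `wage` values in Basic Usage, the 10th and 90th percentiles come out as 1.5 and 54.5 under this definition, versus roughly 1.9 and 18.1 with pandas' linear interpolation. With only ten observations and the default cuts (1, 99), Stata's definition returns the minimum and maximum, so winsorizing would leave the values unchanged.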

## License

MIT License

## Author

Bryce Wang - brycew6m@stanford.edu

## Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
