# pywinsor2
[PyPI](https://pypi.org/project/pywinsor2/) | [License: MIT](https://opensource.org/licenses/MIT) | [GitHub](https://github.com/brycewang-stanford/pywinsor2)

Python implementation of Stata's `winsor2` command for winsorizing and trimming data.
## Installation
```bash
pip install pywinsor2
```
## Quick Start
```python
import pandas as pd
import pywinsor2 as pw2
# Load sample data
data = pd.DataFrame({
    'wage': [1.0, 1.5, 2.0, 3.0, 5.0, 8.0, 12.0, 20.0, 50.0, 100.0],
    'industry': ['A', 'A', 'B', 'B', 'A', 'A', 'B', 'B', 'A', 'B']
})
# Winsorize at 1st and 99th percentiles (default)
result = pw2.winsor2(data, ['wage'])
# Winsorize with custom cuts
result = pw2.winsor2(data, ['wage'], cuts=(5, 95))
# Trim instead of winsorize
result = pw2.winsor2(data, ['wage'], trim=True)
# Winsorize by group
result = pw2.winsor2(data, ['wage'], by='industry')
# Replace original variables
pw2.winsor2(data, ['wage'], replace=True)
```
## Features
- **Winsorizing**: Replace values beyond the cut-off percentiles with the percentile values
- **Trimming**: Set values beyond the cut-off percentiles to NaN
- **Group-wise processing**: Process data within groups
- **Flexible percentiles**: Specify custom cut-off percentiles
- **Multiple variables**: Process multiple columns simultaneously
- **Stata compatibility**: API designed to match Stata's `winsor2` command
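
To make the difference between winsorizing and trimming concrete, here is a minimal sketch written with plain pandas. It is not pywinsor2's implementation: the helper names `winsorize_series` and `trim_series` are hypothetical, and the percentiles come from pandas' default linear interpolation, which may not match the values that pywinsor2 or Stata compute.

```python
import pandas as pd


def winsorize_series(s: pd.Series, cuts=(1, 99)) -> pd.Series:
    """Clamp values outside the cut-off percentiles to the percentile values."""
    lower = s.quantile(cuts[0] / 100)  # pandas default: linear interpolation
    upper = s.quantile(cuts[1] / 100)
    return s.clip(lower=lower, upper=upper)


def trim_series(s: pd.Series, cuts=(1, 99)) -> pd.Series:
    """Set values outside the cut-off percentiles to NaN (length is unchanged)."""
    lower = s.quantile(cuts[0] / 100)
    upper = s.quantile(cuts[1] / 100)
    return s.where(s.between(lower, upper))


wage = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)

# With pandas' interpolation the 10th/90th percentiles are about 1.9 and 18.1.
winsorized = winsorize_series(wage, cuts=(10, 90))  # 1 -> ~1.9, 100 -> ~18.1
trimmed = trim_series(wage, cuts=(10, 90))          # 1 and 100 -> NaN

print(pd.DataFrame({"wage": wage, "winsorized": winsorized, "trimmed": trimmed}))
```

Winsorizing keeps every observation and only pulls the tails inward, while trimming leaves missing values that later steps have to handle.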
## Main Function
### `winsor2(data, varlist, cuts=(1, 99), suffix=None, replace=False, trim=False, by=None, label=False)`
**Parameters:**
- `data` (DataFrame): Input pandas DataFrame
- `varlist` (list): List of column names to process
- `cuts` (tuple): Lower and upper percentiles, on a 0-100 scale, for winsorizing/trimming (default: (1, 99))
- `suffix` (str): Suffix for new variables (default: None, which resolves to '_w' when winsorizing and '_tr' when trimming)
- `replace` (bool): Replace original variables (default: False)
- `trim` (bool): Trim instead of winsorize (default: False)
- `by` (str or list): Column name(s) to group by for group-wise processing (default: None)
- `label` (bool): Add descriptive labels to new columns (default: False)
**Returns:**
- `DataFrame`: Processed DataFrame with winsorized/trimmed variables
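
The interaction between `suffix`, `replace`, and `trim` can be sketched as a small naming rule. The helper below, `output_columns`, is hypothetical and only mirrors the parameter descriptions above; in particular, letting `replace=True` take precedence over an explicit `suffix` is an assumption, not documented behavior.

```python
from typing import Dict, List, Optional


def output_columns(
    varlist: List[str],
    suffix: Optional[str] = None,
    replace: bool = False,
    trim: bool = False,
) -> Dict[str, str]:
    """Map each input column to the column that would hold the processed values."""
    if replace:
        # Assumption: replacing overwrites the original column, so no suffix applies.
        return {var: var for var in varlist}
    if suffix is None:
        # Documented defaults: '_w' for winsorizing, '_tr' for trimming.
        suffix = "_tr" if trim else "_w"
    return {var: var + suffix for var in varlist}


print(output_columns(["wage", "age"]))              # wage_w, age_w
print(output_columns(["wage"], trim=True))          # wage_tr
print(output_columns(["wage"], suffix="_clean"))    # wage_clean
print(output_columns(["wage"], replace=True))       # wage
```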
## Examples
### Basic Usage
```python
import pandas as pd
import pywinsor2 as pw2
# Create sample data
data = pd.DataFrame({
    'wage': [1, 2, 3, 4, 5, 6, 7, 8, 9, 100],  # outlier: 100
    'age': [20, 25, 30, 35, 40, 45, 50, 55, 60, 25]
})
# Winsorize at default percentiles (1, 99)
result = pw2.winsor2(data, ['wage'])
print(result['wage_w']) # New winsorized variable
# Winsorize multiple variables
result = pw2.winsor2(data, ['wage', 'age'], cuts=(5, 95))
# Trim outliers
result = pw2.winsor2(data, ['wage'], trim=True, cuts=(10, 90))
print(result['wage_tr']) # Trimmed variable
```
### Group-wise Processing
```python
import pandas as pd
import pywinsor2 as pw2

# Winsorize within groups
data = pd.DataFrame({
    'wage': [1, 2, 3, 10, 1, 2, 3, 15],
    'industry': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
})
result = pw2.winsor2(data, ['wage'], by='industry', cuts=(25, 75))
```
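
For intuition about what group-wise processing means, the same idea can be sketched with pandas' `groupby(...).transform(...)`: the cut-off percentiles are computed separately inside each group. This is an illustration under assumptions, not pywinsor2's code. The helper `clip_to_percentiles` is hypothetical, and pandas' default linear interpolation may give different cut-offs than pywinsor2.

```python
import pandas as pd

data = pd.DataFrame({
    "wage": [1.0, 2.0, 3.0, 10.0, 1.0, 2.0, 3.0, 15.0],
    "industry": ["A", "A", "A", "A", "B", "B", "B", "B"],
})


def clip_to_percentiles(s: pd.Series, cuts=(25, 75)) -> pd.Series:
    """Clamp one group's values to that group's own percentile bounds."""
    lower = s.quantile(cuts[0] / 100)
    upper = s.quantile(cuts[1] / 100)
    return s.clip(lower=lower, upper=upper)


# Each industry gets its own bounds, so the upper cap differs between A and B.
data["wage_w"] = data.groupby("industry")["wage"].transform(clip_to_percentiles)
print(data)
```

Because the bounds are per group, the largest value in industry A is capped lower than the largest value in industry B, even though both groups share the same `cuts`.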
### Advanced Options
```python
# Reuses `pw2` and a DataFrame `data` with a 'wage' column, as in the examples above
# Replace original variables
pw2.winsor2(data, ['wage'], replace=True, cuts=(2, 98))
# Custom suffix and labels
result = pw2.winsor2(data, ['wage'], suffix='_clean', label=True)
```
## Comparison with Stata
| Stata Command | Python Equivalent |
|---------------|-------------------|
| `winsor2 wage` | `pw2.winsor2(df, ['wage'])` |
| `winsor2 wage, cuts(5 95)` | `pw2.winsor2(df, ['wage'], cuts=(5, 95))` |
| `winsor2 wage, trim` | `pw2.winsor2(df, ['wage'], trim=True)` |
| `winsor2 wage, by(industry)` | `pw2.winsor2(df, ['wage'], by='industry')` |
| `winsor2 wage, replace` | `pw2.winsor2(df, ['wage'], replace=True)` |
## License
MIT License
## Author
Bryce Wang - brycew6m@stanford.edu
## Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.