# TidySPSS
A Python package for quickly processing, transforming, and managing SPSS (.sav) files, with support for Excel and CSV inputs. Built on top of pyreadstat and pandas, it gives you a flexible, production-ready template for processing data files and converting them to SPSS format with full metadata control.
## Philosophy
**"Make simple things simple, and complex things possible"**
## 🔄 Processing Flow
```
LOAD → TRANSFORM → CONFIGURE → SAVE
```
1. **LOAD**: Read file with metadata preservation
2. **TRANSFORM**: Apply any pandas operations directly
3. **CONFIGURE**: Set SPSS-specific options
4. **SAVE**: Output with all configurations applied
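The four stages can be pictured as a plain function pipeline. The sketch below uses only the standard library and hypothetical helper names (`load`, `transform`, `configure`, `save`); it is not TidySPSS's internal code, and its "save" stage simply returns the bundle that a real SPSS writer (such as pyreadstat) would receive.

```python
import csv
import io


# Hypothetical stand-ins for the four stages; not TidySPSS's internal API.
def load(text):
    """LOAD: parse CSV text into a list of row dicts (no SPSS metadata yet)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows, rename, drop):
    """TRANSFORM: drop columns, then rename the remaining ones."""
    return [
        {rename.get(k, k): v for k, v in row.items() if k not in drop}
        for row in rows
    ]


def configure(column_labels):
    """CONFIGURE: collect SPSS-specific options without touching the data."""
    return {"column_labels": column_labels}


def save(rows, options):
    """SAVE: bundle data and options; a real writer would emit a .sav file here."""
    return {"rows": rows, "options": options}


raw = "old_name,unwanted_col1,Q1\n1,x,5\n2,y,3\n"
result = save(
    transform(load(raw), rename={"old_name": "new_name"}, drop={"unwanted_col1"}),
    configure({"Q1": "Question 1"}),
)
print(result["rows"])  # [{'new_name': '1', 'Q1': '5'}, {'new_name': '2', 'Q1': '3'}]
```

Keeping CONFIGURE separate from TRANSFORM means SPSS options never alter the data itself; they are applied only at the SAVE step.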
## Features
- 📁 **Multi-format support**: Read from SPSS (.sav/.zsav), Excel (.xlsx/.xls), and CSV files
- 🔄 **Comprehensive transformations**: Reorder, rename, drop, and keep columns with ease
- 🏷️ **Metadata management**: Full support for SPSS labels, formats, measures, and display widths
- 🔧 **Value replacement**: Replace specific values across columns
- 📊 **Column positioning**: Advanced column reordering with range specifications
- 🌍 **Encoding support**: Automatic handling of multiple character encodings
- 🔧 **Production-ready**: Comprehensive logging and error handling
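To make "replace specific values across columns" concrete, here is a minimal sketch of per-column value replacement. The helper `replace_values` and the `{column: {old: new}}` shape are assumptions for illustration, not the documented format of TidySPSS's `user_value_replacement`.

```python
# Hypothetical helper; the {column: {old: new}} shape is an assumption,
# not the documented format of TidySPSS's `user_value_replacement`.
def replace_values(rows, replacements):
    """Return new rows with values swapped per column; unlisted values are kept."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        for col, mapping in replacements.items():
            if col in new_row and new_row[col] in mapping:
                new_row[col] = mapping[new_row[col]]
        out.append(new_row)
    return out


rows = [{"Q1": 99, "Q2": 1}, {"Q1": 2, "Q2": 99}]
print(replace_values(rows, {"Q1": {99: None}, "Q2": {99: None}}))
# [{'Q1': None, 'Q2': 1}, {'Q1': 2, 'Q2': None}]
```

In pandas, `DataFrame.replace` accepts the same nested-dict shape (`{column: {old: new}}`), so this is a natural format for such an option.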
## Installation
Install using pip:
```bash
pip install tidyspss
```
Or using uv:
```bash
uv add tidyspss
```
## Quick Start
### Basic Usage
```python
from tidyspss import read_input_file, process_and_save

# Read a file (automatically detects format)
df, meta = read_input_file("data.sav")  # or .xlsx, .csv

# Process and save with transformations
df, meta = process_and_save(
    df=df,
    meta=meta,
    output_path="output.sav",
    user_variable_rename={"old_name": "new_name"},
    user_variable_drop=["unwanted_col1", "unwanted_col2"],
    user_column_labels={"Q1": "Question 1", "Q2": "Question 2"},
)
```
## API Reference
### Main Functions
#### `read_input_file(file_path)`
Reads a file into a pandas DataFrame with metadata.
- Supports: .sav, .zsav, .xlsx, .xls, .csv
- Returns: `(DataFrame, metadata)` tuple
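One way format detection could work is a lookup on the file extension. The sketch below assumes extension-based dispatch and uses a hypothetical `pick_reader` helper that returns the name of a plausible reader rather than calling it; TidySPSS's actual logic may differ.

```python
from pathlib import Path

# Hypothetical extension-to-reader table; TidySPSS's real dispatch logic may differ.
READERS = {
    ".sav": "pyreadstat.read_sav",
    ".zsav": "pyreadstat.read_sav",
    ".xlsx": "pandas.read_excel",
    ".xls": "pandas.read_excel",
    ".csv": "pandas.read_csv",
}


def pick_reader(file_path):
    """Choose a reader from the file extension (case-insensitive)."""
    suffix = Path(file_path).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix!r}") from None


print(pick_reader("data.SAV"))     # pyreadstat.read_sav
print(pick_reader("survey.xlsx"))  # pandas.read_excel
```

If `.xls` files go through `pandas.read_excel`, note that pandas reads legacy `.xls` with the `xlrd` engine, while `openpyxl` handles `.xlsx`.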
#### `process_and_save(df, meta, output_path, **kwargs)`
Processes a DataFrame with the given configurations and saves it in SPSS format.
**Parameters:**
- `df`: Input DataFrame
- `meta`: Metadata from SPSS file (or None)
- `output_path`: Path for output .sav file
- `user_column_position`: Dict for column reordering
- `user_variable_drop`: List of columns to drop
- `user_variable_keep`: List of columns to keep (drops all others)
- `user_variable_rename`: Dict for renaming columns
- `user_value_replacement`: Dict for replacing values
- `user_column_labels`: Dict of column labels
- `user_variable_value_labels`: Dict of value labels
- `user_variable_format`: Dict of variable formats
- `user_variable_measure`: Dict of variable measures
- `user_variable_display_width`: Dict of display widths
- `user_missing_ranges`: Dict of missing value ranges
- `user_note`: File note string
- `user_file_label`: File label string
- `user_compress`: Boolean for file compression
- `user_row_compress`: Boolean for row compression
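The SPSS-specific options above line up with the keyword arguments of `pyreadstat.write_sav` once the `user_` prefix is dropped (`column_labels`, `variable_value_labels`, `missing_ranges`, `compress`, and so on). The sketch below assumes TidySPSS forwards them this way (not confirmed by this README) and shows example values in pyreadstat's documented shapes. It only builds the keyword dictionary, so it runs without pyreadstat installed.

```python
# Example values in pyreadstat's shapes; the assumption is that TidySPSS
# forwards each user_* option to the write_sav argument of the same name.
SPSS_OPTIONS = {
    "user_column_labels": {"Q1": "Question 1"},
    "user_variable_value_labels": {"Q1": {1: "Yes", 2: "No"}},
    "user_variable_format": {"Q1": "F8.0"},
    "user_variable_measure": {"Q1": "nominal"},  # nominal, ordinal, or scale
    "user_variable_display_width": {"Q1": 10},
    "user_missing_ranges": {"Q1": [99, {"lo": 97, "hi": 98}]},  # value + range
    "user_note": "Cleaned with TidySPSS",
    "user_file_label": "Wave 1 survey",
    "user_compress": False,
    "user_row_compress": False,
}


def to_write_sav_kwargs(options):
    """Drop the 'user_' prefix so keys match pyreadstat.write_sav arguments."""
    return {key.removeprefix("user_"): value for key, value in options.items()}


kwargs = to_write_sav_kwargs(SPSS_OPTIONS)
print(sorted(kwargs))
# With pyreadstat installed, the equivalent direct call would be:
#   pyreadstat.write_sav(df, "output.sav", **kwargs)
```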
## Requirements
- Python ≥ 3.12
- pandas ≥ 2.3.0
- pyreadstat ≥ 1.3.0
- openpyxl ≥ 3.0.0
## License
MIT License - see LICENSE file for details.
Raw data
{
"_id": null,
"home_page": null,
"name": "tidyspss",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.12",
"maintainer_email": null,
"keywords": "spss, sav, data, processing, pandas, pyreadstat",
"author": "Albert Li",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/c2/a4/98d8f76314f6ba8ef69f867e7d998c37fd22cb3fdb0565521b8722bfc236/tidyspss-0.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A Python package for quick processing and transforming SPSS files",
"version": "0.1.1",
"project_urls": null,
"split_keywords": [
"spss",
" sav",
" data",
" processing",
" pandas",
" pyreadstat"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "598ebde4eb2d43cada27fe8ad8d394b308c4f03b918851c6611f8f8049d095df",
"md5": "ef8931ff910e8ba78843fd47676701cf",
"sha256": "40e44a24c429a9e54633baa9bbbd5c4ced09af2cc231b62d312943c692bac078"
},
"downloads": -1,
"filename": "tidyspss-0.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ef8931ff910e8ba78843fd47676701cf",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.12",
"size": 9180,
"upload_time": "2025-08-27T16:19:27",
"upload_time_iso_8601": "2025-08-27T16:19:27.705387Z",
"url": "https://files.pythonhosted.org/packages/59/8e/bde4eb2d43cada27fe8ad8d394b308c4f03b918851c6611f8f8049d095df/tidyspss-0.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "c2a498d8f76314f6ba8ef69f867e7d998c37fd22cb3fdb0565521b8722bfc236",
"md5": "44392c4445439ff22a57e8e2cc482d97",
"sha256": "be9154db5e35ed87dbf54951fcaf5186db017b37dd44c50f4c8cae7707e1f4b1"
},
"downloads": -1,
"filename": "tidyspss-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "44392c4445439ff22a57e8e2cc482d97",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.12",
"size": 7895,
"upload_time": "2025-08-27T16:19:28",
"upload_time_iso_8601": "2025-08-27T16:19:28.416005Z",
"url": "https://files.pythonhosted.org/packages/c2/a4/98d8f76314f6ba8ef69f867e7d998c37fd22cb3fdb0565521b8722bfc236/tidyspss-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-27 16:19:28",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "tidyspss"
}