tidyspss


- Name: tidyspss
- Version: 0.1.1
- Summary: A Python package for quick processing and transforming SPSS files
- Upload time: 2025-08-27 16:19:28
- Author: Albert Li
- Requires Python: >=3.12
- Keywords: spss, sav, data, processing, pandas, pyreadstat

# TidySPSS

A Python package for quickly processing, transforming, and managing SPSS (.sav) files, with support for Excel and CSV inputs. It is built on top of pyreadstat and pandas and provides a flexible, production-ready template for converting data files into SPSS format with full metadata control.

## Philosophy

**"Make simple things simple, and complex things possible"**

## 🔄 Processing Flow

```
LOAD → TRANSFORM → CONFIGURE → SAVE
```

1. **LOAD**: Read file with metadata preservation
2. **TRANSFORM**: Apply any pandas operations directly
3. **CONFIGURE**: Set SPSS-specific options
4. **SAVE**: Output with all configurations applied
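
The four steps can be sketched as a single function. This is a minimal illustration that uses only the two functions and the `user_file_label` parameter documented in this README. The function name `convert_with_cleanup`, the label text, and the choice of `dropna` as the transform are assumptions made for the example.

```python
def convert_with_cleanup(src_path: str, dst_path: str):
    """Run LOAD -> TRANSFORM -> CONFIGURE -> SAVE on one file."""
    # Imported inside the function so the sketch can be defined
    # even where the package is not installed yet.
    from tidyspss import read_input_file, process_and_save

    # LOAD: returns the DataFrame together with its metadata.
    df, meta = read_input_file(src_path)

    # TRANSFORM: ordinary pandas; here, drop rows that are entirely empty.
    df = df.dropna(how="all")

    # CONFIGURE + SAVE: SPSS-specific options are passed as keyword arguments.
    return process_and_save(
        df=df,
        meta=meta,
        output_path=dst_path,
        user_file_label="Cleaned export",  # illustrative label text
    )
```

Any other pandas operation could stand in for the `dropna` call, since the TRANSFORM step works on a plain DataFrame.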

## Features

- 📁 **Multi-format support**: Read from SPSS (.sav/.zsav), Excel (.xlsx/.xls), and CSV files
- 🔄 **Comprehensive transformations**: Reorder, rename, drop, and keep columns with ease
- 🏷️ **Metadata management**: Full support for SPSS labels, formats, measures, and display widths
- 🔧 **Value replacement**: Replace specific values across columns
- 📊 **Column positioning**: Advanced column reordering with range specifications
- 🌍 **Encoding support**: Automatic handling of multiple character encodings
- 🔧 **Production-ready**: Comprehensive logging and error handling

## Installation

Install using pip:

```bash
pip install tidyspss
```

Or using uv:

```bash
uv add tidyspss
```

## Quick Start

### Basic Usage

```python
from tidyspss import read_input_file, process_and_save

# Read a file (automatically detects format)
df, meta = read_input_file("data.sav")  # or .xlsx, .csv

# Process and save with transformations
df, meta = process_and_save(
    df=df,
    meta=meta,
    output_path="output.sav",
    user_variable_rename={"old_name": "new_name"},
    user_variable_drop=["unwanted_col1", "unwanted_col2"],
    user_column_labels={"Q1": "Question 1", "Q2": "Question 2"}
)
```



## API Reference

### Main Functions

#### `read_input_file(file_path)`
Reads a file into a pandas DataFrame with metadata.
- Supports: .sav, .zsav, .xlsx, .xls, .csv
- Returns: `(DataFrame, metadata)` tuple
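
To illustrate what detecting the format from the file name can look like, here is a sketch of extension-based dispatch. It is not the package's actual implementation: `READERS` and `detect_format` are hypothetical names, and the grouping of extensions into three reader kinds is an assumption based on the supported list above.

```python
from pathlib import Path

# Extensions come from the "Supports" list; the grouping is assumed.
READERS = {
    ".sav": "spss",
    ".zsav": "spss",
    ".xlsx": "excel",
    ".xls": "excel",
    ".csv": "csv",
}


def detect_format(file_path: str) -> str:
    """Return the reader kind for a path, based on its extension."""
    suffix = Path(file_path).suffix.lower()  # case-insensitive match
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(
            f"Unsupported file type: {suffix or '(none)'}"
        ) from None
```

Lower-casing the suffix means a file named `Survey.SAV` is treated the same as `survey.sav`.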

#### `process_and_save(df, meta, output_path, **kwargs)`
Processes DataFrame with configurations and saves to SPSS format.

**Parameters:**
- `df`: Input DataFrame
- `meta`: Metadata from SPSS file (or None)
- `output_path`: Path for output .sav file
- `user_column_position`: Dict for column reordering
- `user_variable_drop`: List of columns to drop
- `user_variable_keep`: List of columns to keep (drops all others)
- `user_variable_rename`: Dict for renaming columns
- `user_value_replacement`: Dict for replacing values
- `user_column_labels`: Dict of column labels
- `user_variable_value_labels`: Dict of value labels
- `user_variable_format`: Dict of variable formats
- `user_variable_measure`: Dict of variable measures
- `user_variable_display_width`: Dict of display widths
- `user_missing_ranges`: Dict of missing value ranges
- `user_note`: File note string
- `user_file_label`: File label string
- `user_compress`: Boolean for file compression
- `user_row_compress`: Boolean for row compression
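
Because every option is a keyword argument, the options can be collected in a dict and checked for misspelled names before the call. The sketch below is an illustration only. `DOCUMENTED_OPTIONS` and `unknown_options` are hypothetical helpers, not part of the package. The value shapes for `user_variable_value_labels` and `user_variable_measure` follow pyreadstat's `write_sav` conventions, which is an assumption about how tidyspss forwards them.

```python
# Option names copied from the parameter list above.
DOCUMENTED_OPTIONS = {
    "user_column_position", "user_variable_drop", "user_variable_keep",
    "user_variable_rename", "user_value_replacement", "user_column_labels",
    "user_variable_value_labels", "user_variable_format",
    "user_variable_measure", "user_variable_display_width",
    "user_missing_ranges", "user_note", "user_file_label",
    "user_compress", "user_row_compress",
}


def unknown_options(options: dict) -> list[str]:
    """Return option names that are not in the documented list."""
    return sorted(set(options) - DOCUMENTED_OPTIONS)


options = {
    "user_variable_rename": {"old_name": "new_name"},
    "user_variable_drop": ["unwanted_col1"],
    "user_column_labels": {"Q1": "Question 1"},
    # Assumed shapes, following pyreadstat's write_sav:
    "user_variable_value_labels": {"Q1": {1: "Yes", 2: "No"}},
    "user_variable_measure": {"Q1": "nominal"},
    "user_compress": False,
}

typos = unknown_options(options)
if typos:
    raise ValueError(f"Unknown options: {typos}")

# With df and meta loaded, the call would then be:
# process_and_save(df=df, meta=meta, output_path="output.sav", **options)
```

In pyreadstat's `write_sav`, `compress` and `row_compress` cannot both be true. If tidyspss passes `user_compress` and `user_row_compress` through unchanged, the same restriction would apply.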


## Requirements

- Python ≥ 3.12
- pandas ≥ 2.3.0
- pyreadstat ≥ 1.3.0
- openpyxl ≥ 3.0.0

## License

MIT License - see LICENSE file for details.
            
