datacompose


| Field | Value |
| --- | --- |
| Name | datacompose |
| Version | 0.2.6.1 |
| Home page | None |
| Summary | Copy-pasteable data transformation primitives for PySpark. Inspired by shadcn-svelte. |
| Upload time | 2025-08-25 16:54:23 |
| Maintainer | Datacompose Contributors |
| Docs URL | None |
| Author | Datacompose Contributors |
| Requires Python | >=3.8 |
| License | MIT |
| Keywords | data-cleaning, data-quality, udf, spark, postgres, code-generation, data-pipeline, etl |
| Requirements | No requirements were recorded. |

# Datacompose

[![PyPI version](https://badge.fury.io/py/datacompose.svg)](https://pypi.org/project/datacompose/)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![Coverage](https://img.shields.io/badge/coverage-92%25-brightgreen.svg)](https://github.com/tc-cole/datacompose)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A data transformation framework for building reusable, composable data cleaning pipelines in PySpark.

## Installation

```bash
pip install datacompose
```

## What is Datacompose?

Datacompose provides production-ready PySpark data transformation primitives that become part of YOUR codebase. Inspired by [shadcn](https://ui.shadcn.com/)'s approach to components, we believe in giving you full ownership and control over your code.

### Key Features

- **No Runtime Dependencies**: Standalone PySpark code that runs without Datacompose
- **Composable Primitives**: Build complex transformations from simple, reusable functions
- **Smart Partial Application**: Pre-configure transformations with parameters for reuse
- **Optimized Operations**: Efficient Spark transformations with minimal overhead
- **Comprehensive Libraries**: Pre-built primitives for emails, addresses, and phone numbers
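
To make the composition and partial application features above more concrete, here is a minimal sketch of the pattern in plain Python. It is not Datacompose's API: `compose`, `strip_whitespace`, `lowercase`, `truncate`, and `clean_text` are hypothetical names invented for this illustration. In a PySpark codebase the same pattern would presumably apply to functions that take and return a `Column` rather than a `str`.

```python
from functools import partial, reduce
from typing import Callable

Transform = Callable[[str], str]


def compose(*steps: Transform) -> Transform:
    """Chain transformations left to right into a single callable."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


# Hypothetical primitives, written for this sketch only.
def strip_whitespace(value: str) -> str:
    return value.strip()


def lowercase(value: str) -> str:
    return value.lower()


def truncate(value: str, max_length: int) -> str:
    return value[:max_length]


# Partial application: fix max_length once, then reuse the configured step.
truncate_to_20 = partial(truncate, max_length=20)

# Composition: build one pipeline out of small, single-purpose steps.
clean_text = compose(strip_whitespace, lowercase, truncate_to_20)

print(clean_text("  Hello@Example.COM  "))  # hello@example.com
```

Because each step is an ordinary function, a pipeline built this way can be edited or extended in place, which fits the copy-and-own approach described in this README.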

### Available Transformers

- **Emails**: Validation, extraction, standardization, typo correction
- **Addresses**: Street parsing, state/zip validation, PO Box detection
- **Phone Numbers**: NANP/international validation, formatting, toll-free detection
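
As a rough idea of what a check such as NANP validation or toll-free detection involves, the sketch below implements both in plain Python. It is an assumption-laden illustration, not the code Datacompose generates: `nanp_digits` and `is_toll_free` are hypothetical helper names. In PySpark, the same logic would typically be expressed with column functions such as `regexp_replace` instead of Python's `re` module.

```python
import re
from typing import Optional

# Toll-free area codes assigned in the North American Numbering Plan.
TOLL_FREE_AREA_CODES = {"800", "833", "844", "855", "866", "877", "888"}


def nanp_digits(phone: str) -> Optional[str]:
    """Return the 10-digit NANP number, or None if the input does not fit."""
    digits = re.sub(r"\D", "", phone)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        return None
    # The area code and the exchange code must each start with 2-9.
    if digits[0] in "01" or digits[3] in "01":
        return None
    return digits


def is_toll_free(phone: str) -> bool:
    """True if the input is a valid NANP number with a toll-free area code."""
    digits = nanp_digits(phone)
    return digits is not None and digits[:3] in TOLL_FREE_AREA_CODES


print(is_toll_free("1-800-555-0199"))  # True
print(is_toll_free("(212) 555-0100"))  # False
```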

## Documentation

For detailed documentation, examples, and API reference, visit [datacompose.io](https://datacompose.io).

## Philosophy

This is NOT a traditional library - it gives you production-ready data transformation primitives that you can modify to fit your exact needs. You own the code, so there are no external dependencies to manage and no upstream breaking changes to worry about.

## License

MIT License. See the LICENSE file for details.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "datacompose",
    "maintainer": "Datacompose Contributors",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "data-cleaning, data-quality, udf, spark, postgres, code-generation, data-pipeline, etl",
    "author": "Datacompose Contributors",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/23/40/a9aaca73f06b8a5d310bfcdca4cec70add4241bf3cadb20b07dd4548ef8c/datacompose-0.2.6.1.tar.gz",
    "platform": null,
    "description": "# Datacompose\n\n[![PyPI version](https://badge.fury.io/py/datacompose.svg)](https://pypi.org/project/datacompose/)\n[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)\n[![Coverage](https://img.shields.io/badge/coverage-92%25-brightgreen.svg)](https://github.com/your-username/datacompose)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\nA powerful data transformation framework for building reusable, composable data cleaning pipelines in PySpark.\n\n## Installation\n\n```bash\npip install datacompose\n```\n\n## What is Datacompose?\n\nDatacompose provides production-ready PySpark data transformation primitives that become part of YOUR codebase. Inspired by [shadcn](https://ui.shadcn.com/)'s approach to components, we believe in giving you full ownership and control over your code.\n\n### Key Features\n\n- **No Runtime Dependencies**: Standalone PySpark code that runs without Datacompose\n- **Composable Primitives**: Build complex transformations from simple, reusable functions\n- **Smart Partial Application**: Pre-configure transformations with parameters for reuse\n- **Optimized Operations**: Efficient Spark transformations with minimal overhead\n- **Comprehensive Libraries**: Pre-built primitives for emails, addresses, and phone numbers\n\n### Available Transformers\n\n- **Emails**: Validation, extraction, standardization, typo correction\n- **Addresses**: Street parsing, state/zip validation, PO Box detection  \n- **Phone Numbers**: NANP/international validation, formatting, toll-free detection\n\n## Documentation\n\nFor detailed documentation, examples, and API reference, visit [datacompose.io](https://datacompose.io).\n\n## Philosophy\n\nThis is NOT a traditional library - it gives you production-ready data transformation primitives that you can modify to fit your exact needs. \nYou own the code, with no external dependencies to manage or worry about breaking changes.\n\n## License\n\nMIT License - see LICENSE file for details\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Copy-pasteable data transformation primitives for PySpark. Inspired by shadcn-svelte.",
    "version": "0.2.6.1",
    "project_urls": {
        "Changelog": "https://github.com/tc-cole/datacompose/blob/main/CHANGELOG.md",
        "Documentation": "https://github.com/tc-cole/datacompose/tree/main/docs",
        "Homepage": "https://github.com/tc-cole/datacompose",
        "Issues": "https://github.com/tc-cole/datacompose/issues",
        "Repository": "https://github.com/tc-cole/datacompose.git"
    },
    "split_keywords": [
        "data-cleaning",
        " data-quality",
        " udf",
        " spark",
        " postgres",
        " code-generation",
        " data-pipeline",
        " etl"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "cd5af305ceb4a8cf9bbed29ab94b3e718c5f9cac0a84ea49ab37f75114a1047b",
                "md5": "20e7691c79fa7d3d479cf6958c2b614c",
                "sha256": "961b92cf66762dd0528a682f9a741b89624809dcd4d9c5e367a5d125591c2640"
            },
            "downloads": -1,
            "filename": "datacompose-0.2.6.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "20e7691c79fa7d3d479cf6958c2b614c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 54055,
            "upload_time": "2025-08-25T16:54:22",
            "upload_time_iso_8601": "2025-08-25T16:54:22.570982Z",
            "url": "https://files.pythonhosted.org/packages/cd/5a/f305ceb4a8cf9bbed29ab94b3e718c5f9cac0a84ea49ab37f75114a1047b/datacompose-0.2.6.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "2340a9aaca73f06b8a5d310bfcdca4cec70add4241bf3cadb20b07dd4548ef8c",
                "md5": "c45e5908bf3b6c674a251f749899c004",
                "sha256": "8d4e27c023578a9d2a7498a88a8e5815904fded5c19e4ba472347e0edce1d819"
            },
            "downloads": -1,
            "filename": "datacompose-0.2.6.1.tar.gz",
            "has_sig": false,
            "md5_digest": "c45e5908bf3b6c674a251f749899c004",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 148950,
            "upload_time": "2025-08-25T16:54:23",
            "upload_time_iso_8601": "2025-08-25T16:54:23.835884Z",
            "url": "https://files.pythonhosted.org/packages/23/40/a9aaca73f06b8a5d310bfcdca4cec70add4241bf3cadb20b07dd4548ef8c/datacompose-0.2.6.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-25 16:54:23",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "tc-cole",
    "github_project": "datacompose",
    "github_not_found": true,
    "lcname": "datacompose"
}
        