# Datacompose
[PyPI](https://pypi.org/project/datacompose/) · [Python 3.8+](https://www.python.org/downloads/) · [GitHub](https://github.com/tc-cole/datacompose) · [License: MIT](https://opensource.org/licenses/MIT)

A data transformation framework for building reusable, composable data cleaning pipelines in PySpark.
## Installation
```bash
pip install datacompose
```
## What is Datacompose?
Datacompose provides production-ready PySpark data transformation primitives that become part of YOUR codebase. Inspired by [shadcn](https://ui.shadcn.com/)'s approach to components, we believe in giving you full ownership and control over your code.
### Key Features
- **No Datacompose Runtime Dependency**: The primitives are standalone PySpark code that runs without Datacompose installed
- **Composable Primitives**: Build complex transformations from simple, reusable functions
- **Smart Partial Application**: Pre-configure transformations with parameters for reuse
- **Optimized Operations**: Efficient Spark transformations with minimal overhead
- **Comprehensive Libraries**: Pre-built primitives for emails, addresses, and phone numbers
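The composition and partial-application ideas above can be illustrated with a minimal plain-Python sketch. It is not Datacompose's API: the helper names (`compose`, `strip_whitespace`, `lowercase`, `truncate`, `normalize_id`) are hypothetical, and these primitives act on plain strings, whereas Datacompose's primitives are PySpark transformations.

```python
from functools import partial, reduce
from typing import Callable

# Hypothetical stand-in for a primitive: a function from one value to a cleaned value.
Primitive = Callable[[str], str]


def compose(*steps: Primitive) -> Primitive:
    """Chain primitives left to right, so compose(f, g)(x) == g(f(x))."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


def strip_whitespace(value: str) -> str:
    # Remove leading and trailing whitespace.
    return value.strip()


def lowercase(value: str) -> str:
    # Normalize case.
    return value.lower()


def truncate(value: str, max_length: int) -> str:
    # A parameterized primitive: keep at most max_length characters.
    return value[:max_length]


# Partial application: configure the parameter once, then reuse the result
# as an ordinary one-argument primitive.
truncate_to_10 = partial(truncate, max_length=10)

# Build a larger transformation from small, reusable pieces.
normalize_id = compose(strip_whitespace, lowercase, truncate_to_10)

print(normalize_id("  Customer_Name_001 "))  # prints "customer_n"
```

Because each step is an ordinary function, a differently configured pipeline only needs another `partial` (for example `partial(truncate, max_length=32)`), not an edit to the step itself.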
### Available Transformers
- **Emails**: Validation, extraction, standardization, typo correction
- **Addresses**: Street parsing, state/zip validation, PO Box detection
- **Phone Numbers**: NANP (North American Numbering Plan) and international validation, formatting, toll-free detection
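To make the phone-number features above concrete, here is a hypothetical plain-Python sketch of NANP validation, formatting, and toll-free detection. The function names and rules are assumptions for illustration, not Datacompose's code: it checks only the basic NANP shape (10 digits, optional leading country code `1`, area code and exchange each starting with 2-9) and the toll-free area codes 800, 833, 844, 855, 866, 877 and 888. In PySpark the same rules would typically be written as Column expressions, for example with `pyspark.sql.functions.regexp_replace`.

```python
import re
from typing import Optional

# Toll-free area codes currently in use in the NANP.
TOLL_FREE_AREA_CODES = {"800", "833", "844", "855", "866", "877", "888"}


def to_nanp_digits(raw: str) -> Optional[str]:
    """Return the 10-digit NANP number, or None if raw is not a plausible NANP number."""
    digits = re.sub(r"\D", "", raw)  # drop spaces, dots, dashes, parentheses, "+"
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # strip the North American country code
    if len(digits) != 10:
        return None
    # Basic NANP rule: area code and exchange code must each start with 2-9.
    if digits[0] in "01" or digits[3] in "01":
        return None
    return digits


def format_nanp(raw: str) -> Optional[str]:
    """Format a valid NANP number as (XXX) XXX-XXXX; return None otherwise."""
    digits = to_nanp_digits(raw)
    if digits is None:
        return None
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


def is_toll_free(raw: str) -> bool:
    """True if raw is a valid NANP number with a toll-free area code."""
    digits = to_nanp_digits(raw)
    return digits is not None and digits[:3] in TOLL_FREE_AREA_CODES


print(format_nanp("+1 415.555.0123"))  # prints "(415) 555-0123"
print(is_toll_free("1-800-555-0199"))  # prints "True"
```

A production primitive would need further rules (for example reserved N11 codes) and a separate path for international numbers.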
## Documentation
For detailed documentation, examples, and API reference, visit [datacompose.io](https://datacompose.io).
## Philosophy
This is NOT a traditional library: it gives you production-ready data transformation primitives that you can modify to fit your exact needs. Because you own the code, there is no Datacompose package to keep in sync and no upstream breaking changes to worry about.
## License
MIT License. See the LICENSE file for details.

## Raw data
{
"_id": null,
"home_page": null,
"name": "datacompose",
"maintainer": "Datacompose Contributors",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "data-cleaning, data-quality, udf, spark, postgres, code-generation, data-pipeline, etl",
"author": "Datacompose Contributors",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/23/40/a9aaca73f06b8a5d310bfcdca4cec70add4241bf3cadb20b07dd4548ef8c/datacompose-0.2.6.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Copy-pasteable data transformation primitives for PySpark. Inspired by shadcn-svelte.",
"version": "0.2.6.1",
"project_urls": {
"Changelog": "https://github.com/tc-cole/datacompose/blob/main/CHANGELOG.md",
"Documentation": "https://github.com/tc-cole/datacompose/tree/main/docs",
"Homepage": "https://github.com/tc-cole/datacompose",
"Issues": "https://github.com/tc-cole/datacompose/issues",
"Repository": "https://github.com/tc-cole/datacompose.git"
},
"split_keywords": [
"data-cleaning",
" data-quality",
" udf",
" spark",
" postgres",
" code-generation",
" data-pipeline",
" etl"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "cd5af305ceb4a8cf9bbed29ab94b3e718c5f9cac0a84ea49ab37f75114a1047b",
"md5": "20e7691c79fa7d3d479cf6958c2b614c",
"sha256": "961b92cf66762dd0528a682f9a741b89624809dcd4d9c5e367a5d125591c2640"
},
"downloads": -1,
"filename": "datacompose-0.2.6.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "20e7691c79fa7d3d479cf6958c2b614c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 54055,
"upload_time": "2025-08-25T16:54:22",
"upload_time_iso_8601": "2025-08-25T16:54:22.570982Z",
"url": "https://files.pythonhosted.org/packages/cd/5a/f305ceb4a8cf9bbed29ab94b3e718c5f9cac0a84ea49ab37f75114a1047b/datacompose-0.2.6.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "2340a9aaca73f06b8a5d310bfcdca4cec70add4241bf3cadb20b07dd4548ef8c",
"md5": "c45e5908bf3b6c674a251f749899c004",
"sha256": "8d4e27c023578a9d2a7498a88a8e5815904fded5c19e4ba472347e0edce1d819"
},
"downloads": -1,
"filename": "datacompose-0.2.6.1.tar.gz",
"has_sig": false,
"md5_digest": "c45e5908bf3b6c674a251f749899c004",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 148950,
"upload_time": "2025-08-25T16:54:23",
"upload_time_iso_8601": "2025-08-25T16:54:23.835884Z",
"url": "https://files.pythonhosted.org/packages/23/40/a9aaca73f06b8a5d310bfcdca4cec70add4241bf3cadb20b07dd4548ef8c/datacompose-0.2.6.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-25 16:54:23",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "tc-cole",
"github_project": "datacompose",
"github_not_found": true,
"lcname": "datacompose"
}