# DPK Python Transforms
## Installation
The [transforms](https://github.com/IBM/data-prep-kit/blob/dev/transforms/README.md) are delivered as a standard Python library available on PyPI and can be installed with pip:
`python -m pip install data-prep-toolkit-transforms`
or
`python -m pip install data-prep-toolkit-transforms[ray]`
Installing the Python transforms also installs `data-prep-toolkit`.
Installing the Ray transforms also installs `data-prep-toolkit[ray]`.
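After installing, it can be useful to confirm that the interpreter and the installed distributions are what you expect. The following is a minimal sketch, not part of the package: the helper names `python_is_supported` and `installed_version` are hypothetical, and the version bounds are taken from the `requires_python` field (`<3.13,>=3.10`) in the package metadata for release 0.2.3.

```python
import sys
from importlib import metadata

# Bounds copied from the package's "requires_python" metadata: >=3.10,<3.13.
# These are an assumption for other releases; check the metadata of the one you install.
MIN_PYTHON = (3, 10)
MAX_PYTHON_EXCLUSIVE = (3, 13)


def python_is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version lies in the supported range."""
    return MIN_PYTHON <= tuple(version_info[:2]) < MAX_PYTHON_EXCLUSIVE


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    # The transforms package pulls in data-prep-toolkit, so both should be present.
    for name in ("data-prep-toolkit-transforms", "data-prep-toolkit"):
        print(name, "->", installed_version(name) or "not installed")
```

If either distribution is reported as not installed, re-running the pip command in the same environment (the same `python` executable) is usually the first thing to try.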
## List of Transforms in current package
Note: This list includes the transforms that have been part of the release since data-prep-toolkit-transforms:0.2.1. It may not always be up to date. Users are encouraged to open an issue on GitHub if they find that a component is missing, or that a package listed below is not in the release they get from PyPI.
* code
    * [code2parquet](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/code2parquet/python/README.md)
    * [header_cleanser (Not available on MacOS)](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/header_cleanser/python/README.md)
    * [code_quality](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/code_quality/python/README.md)
    * [proglang_select](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/proglang_select/python/README.md)
* language
    * [doc_chunk](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/doc_chunk/python/README.md)
    * [doc_quality](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/doc_quality/python/README.md)
    * [lang_id](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/lang_id/python/README.md)
    * [pdf2parquet](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/pdf2parquet/python/README.md)
    * [text_encoder](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/text_encoder/python/README.md)
    * [pii_redactor](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/pii_redactor/python/README.md)
* universal
    * [ededup](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/ededup/python/README.md)
    * [filter](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/filter/python/README.md)
    * [resize](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/resize/python/README.md)
    * [tokenization](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/tokenization/python/README.md)
    * [doc_id](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/doc_id/python/README.md)
    * [web2parquet](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/web2parquet/README.md)
## Release notes:
### 0.2.3.dev1
* code_profiler
### 0.2.3.dev0
* fdedup
### 0.2.2.dev3
* web2parquet
### 0.2.2.dev2
* pdf2parquet now supports HTML, DOCX, PPTX, ... in addition to PDF
Raw data
{
"_id": null,
"home_page": null,
"name": "data-prep-toolkit-transforms",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.13,>=3.10",
"maintainer_email": null,
"keywords": "transforms, data preprocessing, data preparation, llm, generative, ai, fine-tuning, llmapps",
"author": null,
"author_email": "Maroun Touma <touma@us.ibm.com>",
"download_url": null,
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "Data Preparation Toolkit Transforms using Ray",
"version": "0.2.3",
"project_urls": null,
"split_keywords": [
"transforms",
" data preprocessing",
" data preparation",
" llm",
" generative",
" ai",
" fine-tuning",
" llmapps"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "8e3b475a0329cd35f2a8fe2b055886608678a1fd3ee9848f804b306b3776a036",
"md5": "9a40bc95d2e8c1f43d6dc0807dd713ac",
"sha256": "6bde46e11bbf83b47ecee11d47e3c4f6d7ecbbeca5229c6c39ee7dc9fdb49b88"
},
"downloads": -1,
"filename": "data_prep_toolkit_transforms-0.2.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9a40bc95d2e8c1f43d6dc0807dd713ac",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.13,>=3.10",
"size": 631128,
"upload_time": "2024-12-17T14:11:05",
"upload_time_iso_8601": "2024-12-17T14:11:05.581071Z",
"url": "https://files.pythonhosted.org/packages/8e/3b/475a0329cd35f2a8fe2b055886608678a1fd3ee9848f804b306b3776a036/data_prep_toolkit_transforms-0.2.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-17 14:11:05",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "data-prep-toolkit-transforms"
}
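The `digests` entry above can be used to check a downloaded wheel before installing it. The following is a minimal sketch using only the standard library; the helper names `sha256_of` and `matches_published_digest` are hypothetical, and the expected value is the `sha256` digest listed above for `data_prep_toolkit_transforms-0.2.3-py3-none-any.whl`.

```python
import hashlib
from pathlib import Path

# SHA-256 digest listed in the metadata above for the 0.2.3 wheel.
EXPECTED_SHA256 = "6bde46e11bbf83b47ecee11d47e3c4f6d7ecbbeca5229c6c39ee7dc9fdb49b88"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA-256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_published_digest("data_prep_toolkit_transforms-0.2.3-py3-none-any.whl")` would be expected to return `True` for an intact copy of that wheel in the current directory.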