data-prep-toolkit-transforms


Name: data-prep-toolkit-transforms
Version: 1.1.2.post1
home_page: None
Summary: Data Preparation Toolkit Transforms using Ray
upload_time: 2025-07-16 14:23:35
maintainer: None
docs_url: None
author: None
requires_python: <3.13,>=3.10
license: Apache-2.0
keywords: transforms, data preprocessing, data preparation, llm, generative, ai, fine-tuning, llmapps
requirements: No requirements were recorded.
# DPK Python Transforms

## Installation

The [transforms](https://github.com/data-prep-kit/data-prep-kit/blob/dev/transforms/README.md) are delivered as a standard Python library, available on PyPI, and can be installed with pip:

`python -m pip install data-prep-toolkit-transforms[all]`
or
`python -m pip install data-prep-toolkit-transforms[ray,all]`
or
`python -m pip install data-prep-toolkit-transforms[language]`


Installing the Python transforms also installs `data-prep-toolkit`.

Installing the Ray transforms also installs `data-prep-toolkit[ray]`.
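
Before installing, it can help to confirm that the interpreter falls inside the `requires_python` range (`<3.13,>=3.10`) and to check afterwards what actually got installed. The following is a minimal sketch using only the standard library; the helper names (`python_supported`, `pip_install_command`, `installed_version`) are hypothetical and are not part of the toolkit.

```python
import sys
from importlib import metadata

# Distribution name as published on PyPI.
DIST_NAME = "data-prep-toolkit-transforms"


def python_supported(version_info=sys.version_info):
    """Return True when the interpreter satisfies requires_python '<3.13,>=3.10'."""
    return (3, 10) <= tuple(version_info[:2]) < (3, 13)


def pip_install_command(extras=("all",)):
    """Build the pip command as an argument list, so no shell quoting is needed."""
    requirement = f"{DIST_NAME}[{','.join(extras)}]" if extras else DIST_NAME
    return [sys.executable, "-m", "pip", "install", requirement]


def installed_version(dist_name=DIST_NAME):
    """Return the installed version string, or None when the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_supported())
    print("Install command:", " ".join(pip_install_command(("ray", "all"))))
    print("Installed version:", installed_version())
```

The argument list returned by `pip_install_command` can be passed to `subprocess.run` as-is. Writing the extras without spaces (`[ray,all]`) matters on the command line too, because an unquoted space would split the requirement into two arguments.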

## Release notes

### 1.1.1.dev1
	Include all code transforms as extra [code]

### 1.1.1.dev0
	Refactored code transforms (code_quality, code2parquet, header_cleanser, license_select, proglang_select)
	Added ml-filter and enrichment
	Renamed PDF2Parquet to Docling2Parquet

### 1.0.1.dev1
	Added GneissWeb transforms
	Fixed fdedup on Windows
### 1.0.1.dev0
	PR #979 (code_profiler)
### 1.0.0.a6
	Added Profiler
	Added Resize
### 1.0.0.a5
	Added PII Redactor
	Relaxed fasttext requirement to >= 0.9.2
### 1.0.0.a4
	Added missing Ray implementations for lang_id, doc_quality, tokenization, and filter
	Added Ray notebooks for lang_id, doc_quality, tokenization, and filter
### 1.0.0.a3
	Added code_profiler
### 1.0.0.a2
	Relaxed dependency on pandas (uses the latest version or whatever the application already has installed)
	Relaxed dependency on requests (uses the latest version or whatever the application already has installed)



 

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "data-prep-toolkit-transforms",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.13,>=3.10",
    "maintainer_email": null,
    "keywords": "transforms, data preprocessing, data preparation, llm, generative, ai, fine-tuning, llmapps",
    "author": null,
    "author_email": "Maroun Touma <touma@us.ibm.com>",
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "Data Preparation Toolkit Transforms using Ray",
    "version": "1.1.2.post1",
    "project_urls": null,
    "split_keywords": [
        "transforms",
        " data preprocessing",
        " data preparation",
        " llm",
        " generative",
        " ai",
        " fine-tuning",
        " llmapps"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9d9f5d97f9bd4b8ad40b2db4eaacd29c1c8546aef8c6140b0942cdad92d43fb5",
                "md5": "6db7a775fc47b7afc9b933aa626e6349",
                "sha256": "b613db75b90fff2225d347088d5dc66f1a3665cdfacb3e6dc4fbd62563d8bc43"
            },
            "downloads": -1,
            "filename": "data_prep_toolkit_transforms-1.1.2.post1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6db7a775fc47b7afc9b933aa626e6349",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.13,>=3.10",
            "size": 9075644,
            "upload_time": "2025-07-16T14:23:35",
            "upload_time_iso_8601": "2025-07-16T14:23:35.501404Z",
            "url": "https://files.pythonhosted.org/packages/9d/9f/5d97f9bd4b8ad40b2db4eaacd29c1c8546aef8c6140b0942cdad92d43fb5/data_prep_toolkit_transforms-1.1.2.post1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-16 14:23:35",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "data-prep-toolkit-transforms"
}
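
The `digests` entry in the record above can be used to check a downloaded wheel before installing it. The sketch below hashes a file in chunks with the standard-library `hashlib` module and compares the result with the recorded `sha256` value; the function names and the idea of verifying manually are illustrative assumptions, not something the package itself provides.

```python
import hashlib

# Digest copied from the "sha256" field of the record above.
EXPECTED_SHA256 = "b613db75b90fff2225d347088d5dc66f1a3665cdfacb3e6dc4fbd62563d8bc43"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops as soon as read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex=EXPECTED_SHA256):
    """Return True when the file's SHA-256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected_hex.lower()
```

A call such as `matches_expected("data_prep_toolkit_transforms-1.1.2.post1-py3-none-any.whl")` would return `True` only if the local file is byte-for-byte identical to the upload described in the record.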
        