# ChalkML
**Production-grade ML data processing framework with exceptional determinism**
## Overview
ChalkML is a data preprocessing framework for machine learning. It gives researchers and engineers advanced data processing operations that are missing from Pandas. Alongside its positional column notation, ChalkML ships a set of data processing design patterns. Together, the patterns address critical gaps in production ML systems: deterministic reproducibility, feature selection, privacy preservation, synthetic data generation, and regulatory compliance.
## Installation
```bash
pip install chalkml
```
## Position Notation
ChalkML addresses columns by position. A number followed by `N` counts from the left, and `N` followed by a number counts from the right:
| Notation | Meaning |
|----------|---------|
| `01N` | 1st from left |
| `02N` | 2nd from left |
| `N01` | Last (1st from right) |
| `N02` | 2nd last |
```bash
# Examples
chalkml -rm col 01N data.csv              # Remove 1st column
chalkml -mv col 02N N01 data.csv          # Move 2nd to last
chalkml -fillsmart col 03N mean data.csv  # Fill with mean
```
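To make the notation concrete, here is a minimal sketch of how a position token could be translated into a zero-based column index. The function name `resolve_position` is hypothetical and is not part of ChalkML's documented API; the handling of out-of-range or malformed tokens is an assumption.

```python
import re


def resolve_position(token: str, n_cols: int) -> int:
    """Translate a token such as '01N' or 'N02' into a zero-based index.

    Hypothetical helper for illustration only, not ChalkML's own code.
    """
    from_left = re.fullmatch(r"(\d+)N", token)
    from_right = re.fullmatch(r"N(\d+)", token)
    if from_left:
        # '01N' is the 1st column from the left, so index 0.
        index = int(from_left.group(1)) - 1
    elif from_right:
        # 'N01' is the last column, so index n_cols - 1.
        index = n_cols - int(from_right.group(1))
    else:
        raise ValueError(f"not a position token: {token!r}")
    if not 0 <= index < n_cols:
        raise IndexError(f"{token!r} is out of range for {n_cols} columns")
    return index


# Resolve each token from the table against a 5-column file.
for token in ("01N", "02N", "N01", "N02"):
    print(token, "->", resolve_position(token, n_cols=5))
```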
## Core Operations
```bash
# Data manipulation
chalkml -rm col 01N data.csv                    # Remove
chalkml -mv col 01N 03N data.csv                # Move
chalkml -rn col 02N "Age" data.csv              # Rename
# Imputation
chalkml -fillsmart col 03N mean data.csv        # Mean
chalkml -fillsmart col 04N knn data.csv         # KNN
# Feature engineering
chalkml -derive "BMI" "col:weight/col:height**2" data.csv
chalkml -onehot col 05N data.csv
chalkml -scale col 03N data.csv --method standard
```
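As a rough illustration of what mean imputation does, the sketch below fills missing entries in one column with the mean of the observed values. It assumes missing values are represented as `None`; how `-fillsmart` detects missing values, and the name `fill_with_mean`, are assumptions rather than documented behaviour.

```python
from statistics import fmean


def fill_with_mean(values):
    """Replace None entries with the mean of the non-missing entries.

    Illustrative sketch only; not ChalkML's implementation.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing to average, so leave the column unchanged.
        return list(values)
    mean = fmean(observed)
    return [mean if v is None else v for v in values]


ages = [30.0, None, 50.0, None]
print(fill_with_mean(ages))
```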
## 5 Design Patterns
### MAP - Transform each element
```bash
chalkml -map col 05N "x*2" data.csv
chalkml -map col 06N "x**2" data.csv
```
### REDUCE - Aggregate columns
```bash
chalkml -reduce col 01N,02N,03N sum Total data.csv
```
### STENCIL - Sliding windows
```bash
chalkml -stencil col 03N 5 rolling_mean data.csv
```
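A sliding-window mean over a window of 5 can be sketched in plain Python as follows. The edge handling (emitting `None` until the window is full) is an assumption; the source does not say how `-stencil` treats the first rows.

```python
from collections import deque


def rolling_mean(values, window):
    """Mean over a trailing window; None until the window is full.

    Illustrative sketch only; not ChalkML's implementation.
    """
    buf = deque(maxlen=window)
    total = 0.0
    out = []
    for v in values:
        if len(buf) == window:
            # The oldest value is about to be evicted by append().
            total -= buf[0]
        buf.append(v)
        total += v
        out.append(total / window if len(buf) == window else None)
    return out


print(rolling_mean([1, 2, 3, 4, 5, 6], window=5))
```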
### SCAN - Cumulative operations
```bash
chalkml -scan col 03N cumsum data.csv
```
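A scan keeps every intermediate result of a running aggregation. The cumulative sum named above can be sketched with the standard library's `itertools.accumulate`; the helper name `cumsum` is used here for illustration only.

```python
from itertools import accumulate
from operator import add


def cumsum(values):
    """Running total: element i is the sum of values[0..i]."""
    return list(accumulate(values, add))


print(cumsum([3, 1, 4, 1, 5]))
```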
### FARM - Parallel operations
```bash
chalkml -farm col range 01N:10N "x*2" data.csv
```
## 5 Advanced Patterns
### QUANTUM - Schema-based compression (70-90% reduction)
```bash
chalkml -quantum compress --schema medical patients.csv
chalkml -quantum decompress --schema medical compressed.csv
```
### RELEVANCE - Feature selection (95-98% reduction)
```bash
chalkml -relevance select --target 01N --threshold 0.1 data.csv
chalkml -relevance rank --target outcome --top 10 data.csv
```
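The source does not state how RELEVANCE scores features. One common filter approach, shown here purely as an assumption, keeps the features whose absolute Pearson correlation with the target meets a threshold. The names `pearson` and `select_features` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_features(columns, target, threshold):
    """Keep feature names whose |correlation with target| >= threshold."""
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, target)) >= threshold
    ]


features = {"a": [1, 2, 3, 4], "b": [5, 5, 5, 5], "c": [4, 3, 2, 1]}
print(select_features(features, target=[2, 4, 6, 8], threshold=0.1))
```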
### REDACT - Privacy preservation (HIPAA, GDPR)
```bash
chalkml -redact hipaa --identifiers all patients.csv
chalkml -redact differential --epsilon 1.0 sensitive.csv
```
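For context on the `--epsilon` parameter, the sketch below shows the textbook Laplace mechanism applied to a count query: noise with scale `sensitivity / epsilon` is added to the true answer, so a smaller epsilon means more noise and stronger privacy. Whether ChalkML uses this mechanism is not stated in the source; the function names and the default sensitivity of 1 are assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two i.i.d. exponentials with rate 1/scale
    follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count with epsilon-differential privacy (Laplace mechanism)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)  # seeded so repeated runs give the same noise
print(private_count(128, epsilon=1.0, rng=rng))
```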
### SCAFFOLD - Synthetic data generation
```bash
chalkml -scaffold sequence --type fibonacci --count 100
chalkml -scaffold distribution --dist normal --count 10000
```
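The two generators above can be approximated with the standard library as a point of reference. This is a sketch under assumptions: the source does not say whether the Fibonacci sequence starts at 0 or 1, which distribution parameters are used by default, or how seeding works. The function names are hypothetical.

```python
import random


def fibonacci_sequence(count):
    """First `count` Fibonacci numbers, starting from 0 (an assumption)."""
    seq = []
    a, b = 0, 1
    for _ in range(count):
        seq.append(a)
        a, b = b, a + b
    return seq


def normal_samples(count, mean=0.0, stddev=1.0, seed=None):
    """Draw `count` samples from a normal distribution.

    Passing a seed makes the output reproducible across runs.
    """
    rng = random.Random(seed)
    return [rng.gauss(mean, stddev) for _ in range(count)]


print(fibonacci_sequence(10))
print(len(normal_samples(10000, seed=1)))
```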
### BOW - Compliance standardization
```bash
chalkml -bow standard --standard GAAP financial.csv
chalkml -bow format --column date --type date --pattern "YYYY-MM-DD" data.csv
```
## Python API
```python
from chalkml import get_chalkml_engine
from chalkml.quantum_engine import get_quantum_engine
# Core operations
engine = get_chalkml_engine()
engine.remove_column("data.csv", "01N")
engine.fill_smart("data.csv", "03N", "mean")
# Advanced patterns
quantum = get_quantum_engine()
quantum.compress_file("data.csv", "schema", "out.csv")
```
## License
MIT License
## Authors
Hope Mogale & MY Pitsane  
Mankind Research Labs (mankindlabs@protonmail.com)
**Version 1.0.0** | Production-ready | 15+ validated use cases
            
         