ModularML

- **Name**: ModularML
- **Version**: 0.1.3 ([PyPI](https://pypi.org/project/modularml/))
- **Summary**: A library for modular, fast, and reproducible ML experimentation built for R&D.
- **Upload time**: 2025-09-02 17:38:09
- **Requires Python**: >=3.9
- **Keywords**: machine learning, deep learning, reproducible research, neural networks, scientific computing, scientific machine learning
            
<div align="center">

[![ModularML Banner](assets/modularml_logo_banner.png)](https://github.com/REIL-UConn/modular-ml)

**Modular, fast, and reproducible ML experimentation built for R\&D.**

[![Python](https://img.shields.io/badge/Python-3.9%2B-blue.svg)](https://www.python.org/)
[![PyPI](https://img.shields.io/pypi/v/modularml.svg)](https://pypi.org/project/modularml/)
[![Docs](https://app.readthedocs.org/projects/modular-ml/badge/?version=latest&style=flat)](https://modular-ml.readthedocs.io/en/latest/)
[![License](https://img.shields.io/badge/License-Apache%202.0-orange)](LICENSE)

</div>


ModularML is a flexible, backend-agnostic machine learning framework for designing, training, and evaluating modular ML pipelines, tailored specifically for research and scientific workflows.
It enables rapid experimentation with complex model architectures, supports domain-specific feature engineering, and provides full reproducibility through configuration-driven experiment declaration.

> ModularML provides a plug-and-play ecosystem of interoperable components for data preprocessing, sampling, modeling, training, and evaluation — all wrapped in a unified experiment container.


<p align="center">
  <img src="assets/modularml_overview_diagram.png" alt="ModularML Overview Diagram" width="600"/>
</p>
<p align="center"><em>Figure 1. Overview of the ModularML framework, highlighting the three core abstractions: feature set preprocessing and splitting, modular model graph construction, and staged training orchestration.</em></p>




## Features

ModularML includes a comprehensive set of components for scientific ML workflows:

### Data Handling
- **`FeatureSet` abstraction** for organizing structured datasets with features, targets, tags, and metadata.
- **`Data` class** with unified support for multiple backends (`torch.Tensor`, `tf.Tensor`, `np.ndarray`).
- **Built-in splitters**: Support sample-based and rule-based splitting, with condition-based filtering on feature, target, or tag values (see the sketch after this list).
- **Sample grouping** and multi-part splits for paired, triplet, or grouped training tasks.
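
The snippet below is purely illustrative: it uses plain NumPy arrays to show what a rule-based, tag-conditioned split means conceptually. It is not ModularML's `FeatureSet` or splitter API, and every name in it is hypothetical.

```python
# Illustrative only: a tag-conditioned split over a structured dataset,
# sketched with plain NumPy (not ModularML's FeatureSet/splitter API).
import numpy as np

rng = np.random.default_rng(seed=0)
n = 100
features = rng.normal(size=(n, 4))             # feature matrix
targets = rng.normal(size=(n, 1))              # regression targets
tags = {
    "cell_id": rng.integers(0, 5, size=n),     # per-sample metadata
    "temperature": rng.choice([25, 45], size=n),
}

# Rule-based split: hold out every sample whose tag matches a condition.
test_mask = tags["temperature"] == 45
train_idx = np.where(~test_mask)[0]
test_idx = np.where(test_mask)[0]
print(f"train: {train_idx.size} samples, test: {test_idx.size} samples")
```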

### Advanced Sampling
- **Flexible `FeatureSampler` interface** with support for advanced sampling during different stages of model training, including:
  - Triplet sampling (e.g., anchor/positive/negative; see the sketch after this list)
  - Paired samples
  - Class-balanced, cluster-based, or time-windowed sampling strategies.
- **Condition-aware sampling** using any tags or metadata fields.
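
As a rough sketch of what triplet sampling does (anchor, positive, and negative indices selected by class label), here is a standalone NumPy version. It only mirrors the idea and does not use ModularML's `FeatureSampler` interface; the function name and signature are made up for illustration.

```python
# Illustrative only: class-aware triplet sampling (anchor/positive/negative),
# sketched with plain NumPy rather than ModularML's FeatureSampler interface.
import numpy as np

def sample_triplets(labels: np.ndarray, n_triplets: int, seed: int = 0):
    """Return (anchor, positive, negative) index triplets based on class labels."""
    rng = np.random.default_rng(seed)
    triplets = []
    for _ in range(n_triplets):
        anchor = rng.integers(len(labels))
        same = np.where(labels == labels[anchor])[0]   # same class as anchor
        diff = np.where(labels != labels[anchor])[0]   # any other class
        positive = rng.choice(same[same != anchor])
        negative = rng.choice(diff)
        triplets.append((int(anchor), int(positive), int(negative)))
    return triplets

labels = np.array([0, 0, 0, 1, 1, 2, 2, 2])
print(sample_triplets(labels, n_triplets=3))
```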

### Model Architecture
- **`ModelGraph`**: A Directed Acyclic Graph (DAG)-based model builder where:
  - Each node is a `ModelStage` (e.g., encoder, head, discriminator).
  - Each stage can use a different backend (PyTorch, TensorFlow, scikit-learn, LightGBM, etc.).
  - Mixed-backend models are supported with seamless input/output routing (see the routing sketch after this list).
- **Stage-wise training**: Custom `TrainingPhase` configuration enables fine-tuning, freezing, and transfer learning across sub-models.
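
To make the DAG idea concrete, the toy graph below routes one encoder's output into two heads. Each stage is just a callable standing in for whatever backend model a `ModelStage` would wrap; none of these names or structures come from ModularML's actual API.

```python
# Illustrative only: a tiny DAG of "stages" with explicit input/output routing.
# Each stage is a plain callable standing in for a backend model (PyTorch,
# TensorFlow, scikit-learn, ...); names and structure are hypothetical.
import numpy as np

stages = {
    "encoder": lambda x: x @ np.ones((4, 2)),   # stand-in for a learned encoder
    "head_a":  lambda z: z.sum(axis=1),         # e.g., a regression head
    "head_b":  lambda z: z.mean(axis=1) > 0,    # e.g., a classification head
}
edges = {"encoder": ["head_a", "head_b"]}       # encoder output feeds both heads

def run_graph(x):
    outputs = {"encoder": stages["encoder"](x)}
    for parent, children in edges.items():
        for child in children:
            outputs[child] = stages[child](outputs[parent])
    return outputs

x = np.random.default_rng(0).normal(size=(8, 4))
print({name: out.shape for name, out in run_graph(x).items()})
```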

### Training & Experiments
- **`Experiment` class** encapsulates the full workflow: the `ModelGraph` and `FeatureSet` definitions, training logic (one or more `TrainingPhase` objects), and a `TrackingManager` that logs every configuration file along with training, validation, and evaluation metrics for rapid, reproducible experimentation.
- Each `TrainingPhase` defines training loop logic with early stopping, validation hooks, loss weighting, and optimizer configs.
- **Multi-objective loss support** with configurable stage-level targets, sample-based loss functions, and weighted combinations.
- **Config-driven experiments**: Every experiment is fully serializable and reproducible from a single configuration file (see the sketch after this list).
- **Built-in experiment tracking** via a `TrackingManager`, with optional integration into external managers like MLflow or other logging backends.
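
The round-trip below shows the underlying idea of a config-driven, serializable experiment: declare a phase as data, write it to disk, and rebuild an identical object from the file alone. The `PhaseConfig` dataclass is a made-up stand-in, not ModularML's `TrainingPhase` or `Experiment` API.

```python
# Illustrative only: a serializable phase configuration round-tripped through JSON.
# PhaseConfig is a hypothetical stand-in, not ModularML's TrainingPhase/Experiment API.
import json
from dataclasses import asdict, dataclass

@dataclass
class PhaseConfig:
    name: str
    trainable_stages: list      # which model-graph stages are updated in this phase
    loss_weights: dict          # multi-objective weighting per output
    max_epochs: int = 50
    early_stopping_patience: int = 5

phase = PhaseConfig(
    name="pretrain_encoder",
    trainable_stages=["encoder"],
    loss_weights={"reconstruction": 1.0, "triplet": 0.5},
)

# Write the declaration, then rebuild an identical phase from the file alone.
with open("phase_config.json", "w") as f:
    json.dump(asdict(phase), f, indent=2)
with open("phase_config.json") as f:
    restored = PhaseConfig(**json.load(f))

assert restored == phase
```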



## Getting Started

Requires Python >= 3.9

### Installation
Install from PyPI:
```bash
pip install modularml
```

To install the latest development version:
```bash
pip install git+https://github.com/REIL-UConn/modular-ml.git
```
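
To check that the install worked, you can query the installed distribution version with the standard library (this uses only `importlib.metadata` and does not rely on any ModularML-specific API):

```python
from importlib.metadata import version

print(version("modularml"))  # e.g., 0.1.3
```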


## Explore More
- **[Examples](examples/)** – Complete examples showing how to set up FeatureSets, apply feature preprocessing, construct model graphs, and run training configurations.
- **[Documentation](https://modular-ml.readthedocs.io/en/latest/)** – API reference, component explanations, configuration guides, and tutorials.
- **[Discussions](https://github.com/REIL-UConn/modular-ml/discussions)** – Join the community, ask questions, suggest features, or share use cases.

---


<!-- ## Cite ModularML

If you use ModularML in your research, please cite the following:

```bibtex
@misc{nowacki2025modularml,
  author       = {Ben Nowacki and contributors},
  title        = {ModularML: Modular, fast, and reproducible ML experimentation built for R\&D},
  year         = {2025},
  note         = {https://github.com/REIL-UConn/modular-ml},
}
``` -->
<!-- 
## The Team
ModularML was initiated in 2025 by Ben Nowacki as part of graduate research at the University of Connecticut. 
It is actively developed in collaboration with researchers and contributors across academia and industry, including partners from the Honda Research Institute, MathWorks, and the University of South Carolina.

The project is community-driven and welcomes contributors interested in building modular, reproducible ML workflows for science and engineering. -->

## License
**[Apache 2.0](https://github.com/REIL-UConn/modular-ml/license)**



            
