moml-ca


- **Name**: moml-ca
- **Version**: 0.1.1
- **Summary**: Molecular Machine Learning for Chemical Applications - A comprehensive Python package for molecular representation learning and property prediction using Graph Neural Networks
- **Upload time**: 2025-08-06 00:51:41
- **Requires Python**: >=3.8
- **License**: MIT
- **Keywords**: molecular, machine learning, graph neural networks, chemistry, PFAS
- **Requirements**: numpy, scipy, pandas, torch, torch-geometric, rdkit, openmm, mdtraj, MDAnalysis, pdbfixer, dask, distributed, pyyaml, python-json-logger, pytest, pytest-cov, matplotlib, seaborn, scikit-learn, networkx, plotly, h5py, luigi, tqdm, black, flake8, isort

# MoML-CA: Molecular Machine Learning for Chemical Applications

MoML-CA is a Python package for molecular representation learning and property prediction using Graph Neural Networks. The package provides a comprehensive set of tools for converting molecular structures to graph representations, training GNN models, and predicting molecular properties.

## Features

- **Molecular Graph Creation**: Convert SMILES and RDKit molecules to graph representations with extensive feature extraction
- **Hierarchical Graph Representations**: Create multi-level graph representations for improved model performance
- **Modular Model Architecture**: Flexible and extensible GNN architectures with easy configuration
- **Training Utilities**: Comprehensive training pipelines with callbacks and monitoring
- **Evaluation Tools**: Metrics calculation and visualization of predictions
- **Example Scripts**: Ready-to-use examples for common molecular machine learning tasks
- **Command-Line Tools**: Easy-to-use CLI for model training and prediction
- **Data Processing**: Efficient batch processing of molecular datasets
- **Visualization**: Tools for visualizing molecular graphs and model predictions
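
To make the idea of a multi-level graph representation concrete, here is a minimal, standard-library sketch of one coarsening step. It is not the algorithm in `moml/core/graph_coarsening.py`: the function name `coarsen_graph`, the atom numbering, and the grouping of each carbon with its fluorines are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def coarsen_graph(
    edges: Iterable[Edge], cluster_of: Dict[int, int]
) -> Tuple[Dict[int, List[int]], Set[Edge]]:
    """Collapse atoms into clusters and keep one edge per bonded cluster pair."""
    members: Dict[int, List[int]] = defaultdict(list)
    for atom, cluster in sorted(cluster_of.items()):
        members[cluster].append(atom)

    coarse_edges: Set[Edge] = set()
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv:  # bonds inside one cluster vanish at the coarse level
            coarse_edges.add((min(cu, cv), max(cu, cv)))
    return dict(members), coarse_edges


# CF3-CF2-CF3 skeleton: carbons are atoms 0-2, fluorines are atoms 3-10.
bonds = [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5),
         (1, 6), (1, 7), (2, 8), (2, 9), (2, 10)]
# Hypothetical grouping: each carbon and its fluorines form one cluster.
groups = {0: 0, 3: 0, 4: 0, 5: 0, 1: 1, 6: 1, 7: 1, 2: 2, 8: 2, 9: 2, 10: 2}

members, coarse_edges = coarsen_graph(bonds, groups)
print(members)
print(sorted(coarse_edges))
```

The fine graph has 11 nodes and 10 bonds; the coarse graph keeps 3 nodes joined in a chain, which is the kind of second level a hierarchical model could operate on.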

## Large Files Handling

Large data files (over 100 MB), such as training datasets and saved models, are not stored in the Git repository. They are excluded through the `.gitignore` file and should be shared by other means, such as cloud storage or direct transfer.

Large files in the `data/qm9/processed/` directory (particularly `*.pt` files) are automatically excluded from Git.
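
Before committing, it can help to check whether any file in the working tree exceeds the 100 MB threshold mentioned above. The following is a small sketch using only the standard library; the helper name `find_large_files` is hypothetical and is not part of MoML-CA.

```python
from pathlib import Path
from typing import List, Tuple

SIZE_LIMIT_BYTES = 100 * 1024 * 1024  # 100 MB threshold


def find_large_files(root: str, limit: int = SIZE_LIMIT_BYTES) -> List[Tuple[str, int]]:
    """Return sorted (path, size_in_bytes) pairs for files under root larger than limit."""
    base = Path(root)
    large = []
    for path in base.rglob("*"):
        # Skip Git's own object store and anything that is not a regular file.
        if ".git" in path.relative_to(base).parts or not path.is_file():
            continue
        size = path.stat().st_size
        if size > limit:
            large.append((str(path), size))
    return sorted(large)
```

Calling `find_large_files(".")` from the repository root lists candidates that should be covered by `.gitignore` or shared separately.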

## Installation

```bash
# Clone the repository (choose HTTPS or SSH)
git clone https://github.com/SAKETH11111/MoML-CA.git
# or, if you have SSH keys configured:
# git clone git@github.com:SAKETH11111/MoML-CA.git
cd MoML-CA

# Create a conda environment
conda env create -f environment.yml

# Activate the environment
conda activate moml-ca

# Install dependencies
pip install -r requirements.txt

# Install the package in development mode
pip install -e .
```
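
After installation, a quick way to confirm that the environment is usable is to check that the main modules can be located. This is a minimal sketch, not a script shipped with the package; `check_imports` is a hypothetical helper, `moml` is the import name used in the Quick Start below, and the other names are the import names of three listed requirements.

```python
from importlib.util import find_spec
from typing import Dict, Iterable


def check_imports(modules: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found in this environment."""
    status = {}
    for name in modules:
        try:
            status[name] = find_spec(name) is not None
        except (ImportError, ValueError):
            status[name] = False
    return status


for name, ok in check_imports(["moml", "torch", "torch_geometric", "rdkit"]).items():
    print(f"{name:16s} {'found' if ok else 'MISSING'}")
```

The check only locates the modules without importing them, so it runs quickly even when heavy dependencies such as `torch` are present.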

## Quick Start

```python
import torch
from rdkit import Chem
from moml.core import create_graph_processor
from moml.models.mgnn.training import initialize_model, MGNNConfig, create_trainer
from moml.models.mgnn.evaluation.predictor import create_predictor

# Create molecular graph
processor = create_graph_processor({'use_partial_charges': True})
smiles = "C(C(F)(F)F)(C(F)(F)F)(F)F"  # Perfluoropropane (C3F8): CF3-CF2-CF3
graph = processor.smiles_to_graph(smiles)

# Initialize model with configuration
config = MGNNConfig({
    'model_type': 'multi_task_djmgnn',
    'hidden_dim': 64,
    'n_blocks': 3
})
model = initialize_model(config, graph.x.shape[1], graph.edge_attr.shape[1])

# Train model with dataloaders.
# train_loader and val_loader are not defined in this snippet: they must be PyTorch
# DataLoader objects built from your own training and validation datasets, for example:
#   from torch.utils.data import DataLoader
#   train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
#   val_loader = DataLoader(val_dataset, batch_size=32)
# See the examples directory (examples/training_examples or examples/quickstart_examples) for how to create these dataloaders.
trainer = create_trainer(config=config, train_loader=train_loader, val_loader=val_loader)
history = trainer.train(epochs=50)

# Make predictions
predictor = create_predictor(model_path="path/to/saved_model.pt")  # Or pass model directly
predictions = predictor.predict_from_dataloader(val_loader)  # Or predictor.predict([graph])
```

See the [examples directory](examples) for more comprehensive examples.
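
The Quick Start assumes that separate training and validation datasets already exist. One common way to obtain them is a reproducible random split of the input SMILES strings before graphs and dataloaders are built. The sketch below uses only the standard library; `split_train_val` and the SMILES list are illustrative assumptions, not part of the MoML-CA API.

```python
import random
from typing import List, Sequence, Tuple


def split_train_val(
    items: Sequence[str], val_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle items reproducibly and split off a validation subset."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG, global state untouched
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Illustrative perfluorinated molecules (placeholders for a real dataset).
smiles_list = [
    "FC(F)(F)F",
    "FC(F)(F)C(F)(F)F",
    "FC(F)(F)C(F)(F)C(F)(F)F",
    "FC(F)(F)C(F)(F)C(F)(F)C(F)(F)F",
    "OC(=O)C(F)(F)C(F)(F)C(F)(F)F",
]
train_smiles, val_smiles = split_train_val(smiles_list, val_fraction=0.2, seed=42)
print(len(train_smiles), len(val_smiles))
```

Fixing the seed keeps the split identical across runs, which makes validation metrics comparable between experiments.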

### Generating force field labels

After running ORCA calculations, you can generate a JSON file containing atom
types, partial charges, and other force field parameters for each PFAS molecule:

```bash
python scripts/generate_force_field_labels.py
```

The output `force_field_labels.json` will be placed in
`orca_results_b3lyp_sto3g/`.
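
The layout of `force_field_labels.json` is not described here, so the sketch below deliberately avoids assuming any field names and only reports the top-level shape of the file. `summarize_labels` is a hypothetical helper, not part of the package.

```python
import json
from pathlib import Path
from typing import Any, Dict


def summarize_labels(path: str) -> Dict[str, Any]:
    """Report the top-level shape of a JSON file without assuming its schema."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    summary: Dict[str, Any] = {
        "type": type(data).__name__,
        "entries": len(data) if isinstance(data, (dict, list)) else 1,
    }
    if isinstance(data, dict):
        summary["first_keys"] = list(data)[:5]
    return summary


labels_path = Path("orca_results_b3lyp_sto3g") / "force_field_labels.json"
if labels_path.exists():
    print(summarize_labels(str(labels_path)))
```

Inspecting the shape first is a cheap way to confirm that the script produced one entry per molecule before the labels are used for training.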

## Project Structure

```
MoML-CA/
├── moml/                        # Main package directory
│   ├── core/                    # Core functionality
│   │   ├── graph_coarsening.py      # Graph coarsening algorithms
│   │   └── molecular_graph.py       # Molecular graph representation
│   ├── models/                  # Model implementations
│   │   ├── mgnn/                    # MGNN models
│   │   │   ├── djmgnn.py               # DJMGNN implementation
│   │   │   ├── training/               # Training utilities
│   │   │   └── evaluation/             # Evaluation utilities
│   │   └── lstm/                    # LSTM models
│   ├── data/                    # Data handling utilities
│   │   ├── dataset.py               # Dataset implementations
│   │   └── processors.py            # Data processors
│   ├── utils/                   # Utility functions
│   │   ├── visualization/           # Visualization tools
│   │   ├── molecular/               # Molecular utilities
│   │   └── graph/                   # Graph utilities
│   ├── pipeline/                # Pipeline orchestration
│   ├── simulation/              # Simulation utilities
│   └── __init__.py              # Package initialization
├── examples/                    # Example scripts
│   ├── quickstart/              # Quickstart examples
│   ├── training/                # Training examples
│   ├── prediction/              # Prediction examples
│   ├── molecular_graph/         # Molecular graph examples
│   └── preprocess/              # Preprocessing examples
└── tests/                       # Test directory
```

## Recent Improvements

- **Enhanced Model Architecture**: Improved hierarchical graph representations and attention mechanisms
- **Streamlined API**: Simplified interface with factory functions and better error handling
- **Advanced Training Features**: Added support for mixed precision training and gradient accumulation
- **Improved Data Processing**: Enhanced batch processing and memory efficiency
- **Better Visualization**: New tools for visualizing molecular graphs and model attention
- **Command-Line Interface**: Added CLI tools for common tasks
- **Documentation**: Comprehensive documentation with examples and tutorials

## Documentation

See the [docs](docs/) directory for comprehensive documentation.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For guidelines on contributing, see [CONTRIBUTING.md](CONTRIBUTING.md).

## License

This project is licensed under the terms of the MIT license.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "moml-ca",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "molecular, machine learning, graph neural networks, chemistry, PFAS",
    "author": null,
    "author_email": "SAKETH11111 <sakethbaddam10@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/27/67/a5021bd7b76cc04a4a37d200150813e5357ac663bd811688822c45e4a421/moml_ca-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Molecular Machine Learning for Chemical Applications - A comprehensive Python package for molecular representation learning and property prediction using Graph Neural Networks",
    "version": "0.1.1",
    "project_urls": {
        "Homepage": "https://github.com/SAKETH11111/MoML-CA",
        "Issues": "https://github.com/SAKETH11111/MoML-CA/issues",
        "Repository": "https://github.com/SAKETH11111/MoML-CA"
    },
    "split_keywords": [
        "molecular",
        " machine learning",
        " graph neural networks",
        " chemistry",
        " pfas"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0c21dfc54a353150f2b2d6ae113a2afe29d0e3cb0af7c69bf174eee16919a6d0",
                "md5": "158119adb4622fdc1a12bbb057ea7dc7",
                "sha256": "a86b57fee478e74e4ebb1590ef49d7d055980daee8cabc3c442b4c5cf9c130d1"
            },
            "downloads": -1,
            "filename": "moml_ca-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "158119adb4622fdc1a12bbb057ea7dc7",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 197261,
            "upload_time": "2025-08-06T00:51:40",
            "upload_time_iso_8601": "2025-08-06T00:51:40.090008Z",
            "url": "https://files.pythonhosted.org/packages/0c/21/dfc54a353150f2b2d6ae113a2afe29d0e3cb0af7c69bf174eee16919a6d0/moml_ca-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "2767a5021bd7b76cc04a4a37d200150813e5357ac663bd811688822c45e4a421",
                "md5": "2f369a55b8ce8c5cf3de9555aa161595",
                "sha256": "90e13a674b0d462b9d10c026585db64d7c0904bb62844e43051f090d5d3ee3bc"
            },
            "downloads": -1,
            "filename": "moml_ca-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "2f369a55b8ce8c5cf3de9555aa161595",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 261330,
            "upload_time": "2025-08-06T00:51:41",
            "upload_time_iso_8601": "2025-08-06T00:51:41.563778Z",
            "url": "https://files.pythonhosted.org/packages/27/67/a5021bd7b76cc04a4a37d200150813e5357ac663bd811688822c45e4a421/moml_ca-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-06 00:51:41",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "SAKETH11111",
    "github_project": "MoML-CA",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.21.0"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    ">=",
                    "1.7.0"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "1.3.0"
                ]
            ]
        },
        {
            "name": "torch",
            "specs": [
                [
                    ">=",
                    "1.12.0"
                ]
            ]
        },
        {
            "name": "torch-geometric",
            "specs": [
                [
                    ">=",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "rdkit",
            "specs": [
                [
                    ">=",
                    "2022.03.1"
                ]
            ]
        },
        {
            "name": "openmm",
            "specs": [
                [
                    ">=",
                    "7.5.0"
                ]
            ]
        },
        {
            "name": "mdtraj",
            "specs": [
                [
                    ">=",
                    "1.9.5"
                ]
            ]
        },
        {
            "name": "MDAnalysis",
            "specs": [
                [
                    ">=",
                    "2.4.0"
                ]
            ]
        },
        {
            "name": "pdbfixer",
            "specs": [
                [
                    ">=",
                    "1.11"
                ]
            ]
        },
        {
            "name": "dask",
            "specs": [
                [
                    ">=",
                    "2022.2.0"
                ]
            ]
        },
        {
            "name": "distributed",
            "specs": [
                [
                    ">=",
                    "2022.2.0"
                ]
            ]
        },
        {
            "name": "pyyaml",
            "specs": [
                [
                    ">=",
                    "6.0"
                ]
            ]
        },
        {
            "name": "python-json-logger",
            "specs": [
                [
                    ">=",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": [
                [
                    ">=",
                    "7.0.0"
                ]
            ]
        },
        {
            "name": "pytest-cov",
            "specs": [
                [
                    ">=",
                    "3.0.0"
                ]
            ]
        },
        {
            "name": "matplotlib",
            "specs": [
                [
                    ">=",
                    "3.5.0"
                ]
            ]
        },
        {
            "name": "seaborn",
            "specs": [
                [
                    ">=",
                    "0.11.0"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    ">=",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "networkx",
            "specs": [
                [
                    ">=",
                    "2.6.0"
                ]
            ]
        },
        {
            "name": "plotly",
            "specs": [
                [
                    ">=",
                    "5.3.0"
                ]
            ]
        },
        {
            "name": "h5py",
            "specs": [
                [
                    ">=",
                    "3.6.0"
                ]
            ]
        },
        {
            "name": "luigi",
            "specs": [
                [
                    ">=",
                    "3.0.0"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    ">=",
                    "4.62.0"
                ]
            ]
        },
        {
            "name": "black",
            "specs": [
                [
                    ">=",
                    "21.12b0"
                ]
            ]
        },
        {
            "name": "flake8",
            "specs": [
                [
                    ">=",
                    "4.0.0"
                ]
            ]
        },
        {
            "name": "isort",
            "specs": [
                [
                    ">=",
                    "5.10.0"
                ]
            ]
        }
    ],
    "lcname": "moml-ca"
}
        