mamba-safe 1.0.1

- **Homepage:** https://github.com/Anri-Lombard/DrugGPT
- **Summary:** A framework to generate molecules with the Mamba architecture
- **Author:** Anri Lombard
- **Requires Python:** >=3.6
- **Uploaded:** 2024-09-02 19:32:25
# Mamba-SAFE: Molecular Generation with Mamba and SAFE

Mamba-SAFE is a framework for generating molecules using the Mamba architecture and the SAFE (Sequential Attachment-based Fragment Embedding) representation, although other representations can be substituted if needed. The library combines Mamba's linear-time sequence modeling with the versatility of the SAFE molecular representation.

## Features

- Generate molecules using the Mamba architecture
- Utilize the SAFE representation for molecular encoding

## Installation

### From PyPI

To install the latest stable version from PyPI:

```bash
pip install mamba-safe
```

### From Source

To install the latest development version from source:

```bash
git clone https://github.com/Anri-Lombard/DrugGPT.git
cd DrugGPT/mamba_safe
pip install -e .
```

**Note:** Make sure you have CUDA installed, as `mamba_ssm` requires it (https://github.com/state-spaces/mamba).
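Before installing, a quick sanity check can confirm that PyTorch can see a CUDA device. This is a minimal sketch and assumes PyTorch is already installed:

```python
import torch

# mamba_ssm builds and runs CUDA kernels, so a visible GPU is required.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```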

## Usage

### Generating Molecules

Here's a simple example of how to generate molecules using a trained Mamba-SAFE model:

```python
import torch
from mamba_safe import MAMBAModel, SAFETokenizer, SAFEDesign

# Set up your model and parameters
model_dir = "path/to/your/model"
tokenizer_path = "path/to/your/tokenizer"
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load model and tokenizer
mamba_model = MAMBAModel.from_pretrained(model_dir, device=device)
safe_tokenizer = SAFETokenizer.from_pretrained(tokenizer_path)

# Create designer
designer = SAFEDesign(model=mamba_model, tokenizer=safe_tokenizer, verbose=True)

# Generate molecules
generated_smiles = designer.de_novo_generation(
    n_samples_per_trial=100,
    max_length=50,
    sanitize=True,
    top_k=15,
    top_p=0.9,
    temperature=0.7,
    n_trials=10,
    repetition_penalty=1.0
)

# Print the first 10 generated SMILES
for smi in generated_smiles[:10]:
    print(smi)
```
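Since generation is stochastic, it is often useful to check how many of the returned strings are valid, unique molecules. Below is a minimal post-processing sketch using RDKit, which is assumed to be installed separately (it is not a dependency of this package):

```python
from rdkit import Chem

# Parse each generated string; drop anything RDKit cannot sanitize,
# then deduplicate via canonical SMILES.
mols = [Chem.MolFromSmiles(smi) for smi in generated_smiles]
valid = [m for m in mols if m is not None]
unique = {Chem.MolToSmiles(m) for m in valid}

print(f"valid:  {len(valid)}/{len(generated_smiles)}")
print(f"unique: {len(unique)}")
```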

### Training a Model from Scratch

To train a Mamba-SAFE model from scratch, you can use the `safe-train` CLI. Here's an example script:

```bash
#!/bin/bash

# Set up environment variables
export WANDB_API_KEY="your_wandb_api_key"

# Set up paths
config_path="example_config.json"
tokenizer_path="tokenizer.json"
dataset_path="/path/to/safe_zinc_dataset"
output_dir="/path/to/output"

# Run the training script
safe-train \
    --config_path $config_path \
    --tokenizer_path $tokenizer_path \
    --dataset_path $dataset_path \
    --text_column "safe" \
    --optim "adamw_torch" \
    --report_to "wandb" \
    --load_best_model_at_end True \
    --metric_for_best_model "eval_loss" \
    --learning_rate 1e-4 \
    --per_device_train_batch_size 100 \
    --per_device_eval_batch_size 100 \
    --gradient_accumulation_steps 2 \
    --warmup_steps 10000 \
    --logging_first_step True \
    --save_steps 10000 \
    --eval_steps 10000 \
    --eval_accumulation_steps 1000 \
    --eval_strategy "steps" \
    --wandb_project "MAMBA_large" \
    --logging_steps 100 \
    --save_total_limit 1 \
    --output_dir $output_dir \
    --overwrite_output_dir True \
    --do_train True \
    --do_eval True \
    --save_safetensors True \
    --gradient_checkpointing True \
    --max_grad_norm 1.0 \
    --weight_decay 0.1 \
    --max_steps 250000
```

Make sure to adjust the paths and parameters according to your specific setup and requirements.
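The script above expects a dataset saved to disk with a `"safe"` text column. If you start from raw SMILES, one possible way to build such a dataset is sketched below using the `datasets` library and the `safe` library's `safe.encode` converter; note that per the first item under Important Notes below, the `safe-mol` encoding step should run in a separate environment from `mamba-safe`. The input file name is a placeholder:

```python
import safe  # from the `safe-mol` package; use a separate environment
from datasets import Dataset

# Hypothetical input: one SMILES string per line.
with open("zinc_smiles.txt") as f:
    smiles = [line.strip() for line in f if line.strip()]

# Encode each molecule into its SAFE string, skipping failures.
records = []
for smi in smiles:
    try:
        records.append({"safe": safe.encode(smi)})
    except Exception:
        continue  # some molecules cannot be fragmented/encoded

Dataset.from_list(records).save_to_disk("/path/to/safe_zinc_dataset")
```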

## Important Notes

1. To avoid conflicts, do not install both `safe-mol` and `mamba-safe` in the same environment: use `safe-mol` for transformer architectures and `mamba-safe` for Mamba-based models.

2. CUDA is required to run this package efficiently, as `mamba_ssm` relies on CUDA for optimal performance.

## Citation

If you use Mamba-SAFE in your research, please cite the following papers:

```bibtex
@article{noutahi2024gotta,
  title={Gotta be SAFE: a new framework for molecular design},
  author={Noutahi, Emmanuel and Gabellini, Cristian and Craig, Michael and Lim, Jonathan SC and Tossou, Prudencio},
  journal={Digital Discovery},
  volume={3},
  number={4},
  pages={796--804},
  year={2024},
  publisher={Royal Society of Chemistry}
}

@article{gu2023mamba,
  title={Mamba: Linear-time sequence modeling with selective state spaces},
  author={Gu, Albert and Dao, Tri},
  journal={arXiv preprint arXiv:2312.00752},
  year={2023}
}
```

## Contributing

We welcome contributions! Please see our [CONTRIBUTING.md](link-to-contributing-guide) for details on how to get started.

## License

This project is licensed under the MIT License - see the [LICENSE](link-to-license-file) file for details.

## Acknowledgments

We would like to express our sincere gratitude to:

- The [SAFE](https://github.com/datamol-io/safe) authors for their pivotal work on molecular representation and generation; their framework forms the backbone of our approach.
- The [Mamba](https://github.com/state-spaces/mamba) authors for their groundbreaking sequence-modeling architecture, which powers our models.

This library and the work it enables would not have been possible without their significant contributions to the field.

## Contact

For questions and support, please open an issue on our [GitHub repository](https://github.com/Anri-Lombard/DrugGPT) or contact Anri Lombard at anri.m.lombard@gmail.com.

            
