# SiPMai: A Simple Yet Effective Scanning Probe Microscope Auto Image Generator for Deep Learning
This project provides a streamlined pipeline for generating and handling molecular data, specifically for use in machine learning models. The toolkit involves generating SMILES representations, using ray tracing to create images, and preparing dataset indices for training, validation, and testing sets.
## Table of Contents
1. [Project Description](#project-description)
2. [Installation](#installation)
3. [Usage](#usage)
4. [Credits](#credits)
5. [License](#license)
## Project Description
This repository contains several Python scripts that together form a pipeline for the generation and management of molecular data. Specifically, it includes:
1. `gen_data/smile_generation.py`: A script for generating SMILES representations of molecules. It requires a CSV file containing molecule data as input and produces a JSON file containing the generated SMILES strings.
2. `gen_data/ray_generation.py`: A script that uses ray tracing to generate images of the molecules described by the SMILES strings. It offers several customization options, such as resolution, blur, motion blur, and Gaussian noise.
3. `gen_data/prepare_dataset.py`: A script that creates indices for the generated molecules and splits them into training, validation, and testing sets. It creates JSON files containing these indices.
The scripts are designed to be used in sequence, but can also be used independently if needed.
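The index-splitting step performed by `gen_data/prepare_dataset.py` can be sketched as follows. This is a minimal illustration, not the script's actual code. The helper names `split_indices` and `write_splits`, the 80/10/10 default ratios, the fixed seed, and the `<split>_idx.json` file names are all assumptions.

```python
import json
import random
from pathlib import Path


def split_indices(num_molecules, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle molecule indices and split them into train/val/test lists.

    Hypothetical helper: the ratios and seed are illustrative defaults.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    indices = list(range(num_molecules))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_train = round(ratios[0] * num_molecules)
    n_val = round(ratios[1] * num_molecules)
    return {
        "train": indices[:n_train],
        "val": indices[n_train:n_train + n_val],
        "test": indices[n_train + n_val:],  # the remainder goes to test
    }


def write_splits(splits, out_dir):
    """Write one JSON list per split, e.g. train_idx.json (hypothetical names)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, idx in splits.items():
        (out_dir / f"{name}_idx.json").write_text(json.dumps(idx))


if __name__ == "__main__":
    splits = split_indices(100)
    print({name: len(idx) for name, idx in splits.items()})
```

Shuffling once and slicing guarantees that the three sets are disjoint and together cover every molecule.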
## Installation
This project is written in Python. The package metadata declares `requires_python >=3.6, <3.10`, so Python 3.10 and newer are not supported (due to Ray). It depends on the following Python libraries:
```
numpy>=1.16,<1.24
torch>=1.4.0
packaging
tqdm
scikit-learn
matplotlib
scipy
pandas
opencv-python
numba
rdkit
ray
```
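Because the published metadata restricts the interpreter to `>=3.6, <3.10`, checking your Python version before installing can prevent a failed install. The following is a minimal sketch. The helper name `is_supported_python` is hypothetical and is not part of SiPMai.

```python
import sys

# Supported range taken from the package metadata: >=3.6, <3.10
MIN_VERSION = (3, 6)
MAX_VERSION_EXCLUSIVE = (3, 10)


def is_supported_python(version_info=sys.version_info):
    """Return True if the (major, minor) version lies in [3.6, 3.10)."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    if is_supported_python():
        print(f"Python {current} is within the supported range.")
    else:
        print(f"Python {current} is not supported; use Python 3.6 to 3.9.")
```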
You can install SiPMai and these libraries with pip:
```bash
pip install SiPMai
```
Alternatively, build from source by running the following commands in your terminal:
```sh
git clone https://github.com/GilesLuo/SiPMai.git
cd SiPMai
pip install .
```
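After installing, you can confirm that every dependency is importable. Several pip names differ from their import names: `scikit-learn` is imported as `sklearn` and `opencv-python` as `cv2`. The following is a minimal sketch built on `importlib.util.find_spec`. The mapping and the helper `missing_modules` are illustrative and are not shipped with SiPMai.

```python
import importlib.util

# pip distribution name -> importable top-level module name
DEPENDENCIES = {
    "numpy": "numpy",
    "torch": "torch",
    "packaging": "packaging",
    "tqdm": "tqdm",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "scipy": "scipy",
    "pandas": "pandas",
    "opencv-python": "cv2",
    "numba": "numba",
    "rdkit": "rdkit",
    "ray": "ray",
    "SiPMai": "SiPMai",
}


def missing_modules(deps=DEPENDENCIES):
    """Return the pip names whose module the import system cannot find."""
    return [pip_name for pip_name, module in deps.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("All dependencies found." if not missing else f"Missing: {', '.join(missing)}")
```

`find_spec` only locates a module and does not import it, so the check stays fast even for heavy packages such as `torch`.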
## Usage
### Data generation
In a terminal, run:
```sh
generate_pubchem
```
This generates a dataset of 100k molecules with 39 <= `num_atom` <= 200 in the **command execution directory** (the directory from which you run the command).
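To illustrate the atom-count restriction, the following stdlib-only filter runs over a CSV of candidate molecules and emits the selected SMILES strings as JSON. This is not the package's implementation. The column names `smiles` and `num_atom` and the helper `filter_by_atom_count` are assumptions. The default bounds mirror the 39 to 200 range above.

```python
import csv
import io
import json


def filter_by_atom_count(csv_file, min_atoms=39, max_atoms=200):
    """Return SMILES strings whose atom count lies in [min_atoms, max_atoms].

    Assumes (hypothetically) a CSV with 'smiles' and 'num_atom' columns.
    """
    reader = csv.DictReader(csv_file)
    return [row["smiles"] for row in reader
            if min_atoms <= int(row["num_atom"]) <= max_atoms]


if __name__ == "__main__":
    # Sample data; atom counts here include hydrogens.
    sample_csv = io.StringIO(
        "smiles,num_atom\n"
        "CCO,9\n"  # ethanol, C2H6O: below the range
        f"{'C' * 40},122\n"  # tetracontane, C40H82: inside the range
    )
    selected = filter_by_atom_count(sample_csv)
    print(json.dumps(selected, indent=2))
```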
To change the generation configuration, pass arguments on the command line:
```sh
generate_pubchem --your_args
```
Please refer to `SiPMai/gen_data/gen_all_data_pipeline.py` for a complete list of arguments.
Equivalently, you can run the pipeline from a Python script:
```python
from SiPMai.gen_data.gen_all_data_pipeline import gen_all_data, main

# Option 1: generate with the preset arguments (same as running `generate_pubchem`)
main()

# Option 2 (instead of option 1): generate with user-defined arguments.
# `many_args` is a placeholder; see gen_all_data_pipeline.py for the accepted arguments.
# gen_all_data(many_args)
```
### I get `MemoryError: Unable to allocate internal buffer.`
This error typically occurs because, by default, the code uses all of your CPU cores for generation. With that many parallel workers, the data being serialized may not fit into memory, especially if your system is already running low on available memory.
A simple fix is to limit the number of workers with `num_cpus`, for example:
```sh
generate_pubchem --num_cpus 4
```
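One way to choose `num_cpus` is to cap it by how many workers your free memory can hold. The following sketch uses a hypothetical heuristic and a hypothetical helper, `suggest_num_cpus`. SiPMai does not provide either, and you would need to estimate the per-worker memory yourself, for example by watching a small run.

```python
import os


def suggest_num_cpus(available_mem_gb, mem_per_worker_gb, cpu_count=None):
    """Pick a worker count that fits in memory (hypothetical heuristic).

    Uses no more workers than there are CPU cores, no more than the
    available memory can hold, and always at least one.
    """
    if mem_per_worker_gb <= 0:
        raise ValueError("mem_per_worker_gb must be positive")
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    fits_in_memory = int(available_mem_gb // mem_per_worker_gb)
    return max(1, min(cores, fits_in_memory))


if __name__ == "__main__":
    # e.g. 16 GB free and roughly 3 GB per worker -> at most 5 workers
    n = suggest_num_cpus(available_mem_gb=16, mem_per_worker_gb=3)
    print(f"generate_pubchem --num_cpus {n}")
```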
### Loading Data
We also provide a PyTorch DataLoader template for loading the generated datasets; see `SiPMai/utils/dataloader` for details.
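The bundled template in `SiPMai/utils/dataloader` is the reference. The sketch below only illustrates the general shape. A map-style dataset needs just `__len__` and `__getitem__`, which is all that `torch.utils.data.DataLoader` requires. The class `MoleculeIndexDataset`, the split file holding a JSON list of molecule indices, and the `load_sample` callback are all assumptions and do not come from SiPMai.

```python
import json
from pathlib import Path


class MoleculeIndexDataset:
    """Map-style dataset over one data split (hypothetical layout).

    Implements __len__ and __getitem__, so it can be passed directly to
    torch.utils.data.DataLoader.
    """

    def __init__(self, indices, load_sample):
        self.indices = list(indices)
        # load_sample: callable mapping a molecule index to a training sample
        self.load_sample = load_sample

    @classmethod
    def from_split_file(cls, split_file, load_sample):
        """Build the dataset from a JSON list of indices (assumed file format)."""
        indices = json.loads(Path(split_file).read_text())
        return cls(indices, load_sample)

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        return self.load_sample(self.indices[i])
```

With PyTorch installed, you could wrap it as `DataLoader(MoleculeIndexDataset.from_split_file("train_idx.json", my_loader), batch_size=32, shuffle=True)`, where the file name and `my_loader` are placeholders.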
More features are under development. Feel free to open issues and contribute to this tool.
## Credits
This project was made possible thanks to the contributions of the team members and the use of multiple open-source libraries.
## License
This project is licensed under the MIT License - see the LICENSE file for details.
Raw data
{
"_id": null,
"home_page": "http://github.com/GilesLuo/SiPMai",
"name": "SiPMai",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.6, <3.10",
"maintainer_email": "",
"keywords": "Chemistry Simulation,STM Image Synthesis",
"author": "Zhiyao Luo, Yaotian Yang, Jiali Li",
"author_email": "zhiyao.luo@eng.ox.ac.uk",
"download_url": "https://files.pythonhosted.org/packages/6b/19/80fd3c7bcee8a620fd02b9a97395ae78cb9fd8c9d53315789342d6d8eb3f/SiPMai-0.0.20.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A Simple Yet Effective Scanning Tunnel Microscope Image Simulator",
"version": "0.0.20",
"project_urls": {
"Homepage": "http://github.com/GilesLuo/SiPMai",
"Source Code": "https://github.com/GilesLuo/SiPMai"
},
"split_keywords": [
"chemistry simulation",
"stm image synthesis"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "993a23600deab2b118863e4bd508371dd07c44c3619bb7dcc93bc7271737ef0f",
"md5": "54d8b05023dac541065db810e86c7a48",
"sha256": "dbf27627d1139464790eb68b6e1881bd17376cffa146df5a8c1e877d80bfcb7c"
},
"downloads": -1,
"filename": "SiPMai-0.0.20-py3-none-any.whl",
"has_sig": false,
"md5_digest": "54d8b05023dac541065db810e86c7a48",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6, <3.10",
"size": 7122978,
"upload_time": "2023-11-21T14:54:56",
"upload_time_iso_8601": "2023-11-21T14:54:56.720521Z",
"url": "https://files.pythonhosted.org/packages/99/3a/23600deab2b118863e4bd508371dd07c44c3619bb7dcc93bc7271737ef0f/SiPMai-0.0.20-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "6b1980fd3c7bcee8a620fd02b9a97395ae78cb9fd8c9d53315789342d6d8eb3f",
"md5": "815604bb1561db21050cc8946091e535",
"sha256": "95f755adb22b7c515fe5d2ed3a4231126dbd13a08632a95ecd50ecf2624459cf"
},
"downloads": -1,
"filename": "SiPMai-0.0.20.tar.gz",
"has_sig": false,
"md5_digest": "815604bb1561db21050cc8946091e535",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6, <3.10",
"size": 6699398,
"upload_time": "2023-11-21T14:54:59",
"upload_time_iso_8601": "2023-11-21T14:54:59.465732Z",
"url": "https://files.pythonhosted.org/packages/6b/19/80fd3c7bcee8a620fd02b9a97395ae78cb9fd8c9d53315789342d6d8eb3f/SiPMai-0.0.20.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-11-21 14:54:59",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "GilesLuo",
"github_project": "SiPMai",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "sipmai"
}