# RXN package for OpenNMT-based models
[GitHub Actions](https://github.com/rxn4chemistry/rxn-onmt-models/actions)
This repository contains a Python package and associated scripts for training reaction models based on the OpenNMT library.
The repository is built on top of other RXN packages; see our other repositories [`rxn-utilities`](https://github.com/rxn4chemistry/rxn-utilities), [`rxn-chemutils`](https://github.com/rxn4chemistry/rxn-chemutils), and [`rxn-onmt-utils`](https://github.com/rxn4chemistry/rxn-onmt-utils).
For the evaluation of trained models, see the [`rxn-metrics`](https://github.com/rxn4chemistry/rxn-metrics) repository.
Links:
* [GitHub repository](https://github.com/rxn4chemistry/rxn-onmt-models)
* [Documentation](https://rxn4chemistry.github.io/rxn-onmt-models/)
* [PyPI package](https://pypi.org/project/rxn-onmt-models/)
This repository was produced through a collaborative project involving IBM Research Europe and Syngenta.
## System Requirements
This package is supported on all operating systems.
It has been tested on the following systems:
+ macOS: Big Sur (11.1)
+ Linux: Ubuntu 18.04.4
A Python version of 3.6, 3.7, or 3.8 is recommended.
Python versions 3.9 and above are not expected to work due to compatibility with the selected version of OpenNMT.
## Installation guide
The package can be installed from PyPI:
```bash
pip install rxn-onmt-models[rdkit]
```
You can leave out `[rdkit]` if RDKit is already available in your environment.
For local development, the package can be installed with:
```bash
pip install -e ".[dev,rdkit]"
```
## Training models
Example usage for training RXN models.
### The easy way
Simply execute the interactive program `rxn-plan-training` in your terminal and follow the instructions.
### The complicated way
0. Optional: set shell variables, to be used in the commands later on.
```shell
MODEL_TASK="forward"
# Existing TXT files
DATA_1="/path/to/data_1.txt"
DATA_2="/path/to/data_2.txt"
DATA_3="/path/to/data_3.txt"
# Where to put the processed data
DATA_DIR_1="/path/to/processed_data_1"
DATA_DIR_2="/path/to/processed_data_2"
DATA_DIR_3="/path/to/processed_data_3"
# Where to save the ONMT-preprocessed data
PREPROCESSED="/path/to/onmt-preprocessed"
# Where to save the models
MODELS="/path/to/models"
MODELS_FINETUNED="/path/to/models_finetuned"
```
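As a rough illustration of what the existing TXT files might look like, the sketch below creates a small dummy file with one reaction SMILES per line. The file path and its content are hypothetical; consult the `rxn-prepare-data` documentation for the exact expected format.

```shell
# Create a small dummy data file with one reaction SMILES per line
# (illustrative content only; the exact format is defined by rxn-prepare-data).
cat > /tmp/data_1.txt << 'EOF'
CC(=O)O.OCC>>CC(=O)OCC
OB(O)c1ccccc1.Brc1ccncc1>>c1ccc(-c2ccncc2)cc1
EOF
wc -l < /tmp/data_1.txt
```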
1. Prepare the data (standardization, filtering, etc.)
```shell
rxn-prepare-data --input_data $DATA_1 --output_dir $DATA_DIR_1
```
2. Preprocess the data with OpenNMT
```shell
rxn-onmt-preprocess --input_dir $DATA_DIR_1 --output_dir $PREPROCESSED --model_task $MODEL_TASK
```
3. Train the model (here with small parameter values, to make it fast on CPU for testing).
```shell
rxn-onmt-train --model_output_dir $MODELS --preprocess_dir $PREPROCESSED --train_num_steps 10 --batch_size 4 --heads 2 --layers 2 --transformer_ff 512 --no_gpu
```
### Multi-task training
For multi-task training, the process is similar.
The additional data sets must be prepared as well, and the OpenNMT preprocessing and training commands take extra arguments.
To sum up:
```shell
rxn-prepare-data --input_data $DATA_1 --output_dir $DATA_DIR_1
rxn-prepare-data --input_data $DATA_2 --output_dir $DATA_DIR_2
rxn-prepare-data --input_data $DATA_3 --output_dir $DATA_DIR_3
rxn-onmt-preprocess --input_dir $DATA_DIR_1 --output_dir $PREPROCESSED --model_task $MODEL_TASK \
--additional_data $DATA_DIR_2 --additional_data $DATA_DIR_3
rxn-onmt-train --model_output_dir $MODELS --preprocess_dir $PREPROCESSED --train_num_steps 30 --batch_size 4 --heads 2 --layers 2 --transformer_ff 256 --no_gpu \
--data_weights 1 --data_weights 3 --data_weights 4
```
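To our understanding, the `--data_weights` values set the relative sampling frequency of the data sets during training, so weights of 1, 3, and 4 mean training examples are drawn from the three sets in roughly a 1:3:4 ratio. A quick sketch of that normalization (for illustration only):

```shell
# Relative data weights 1:3:4 correspond to these approximate sampling fractions.
weights="1 3 4"
total=0
for w in $weights; do total=$((total + w)); done   # total = 8
for w in $weights; do
  awk -v w="$w" -v t="$total" \
    'BEGIN { printf "weight %s -> %.1f%% of training examples\n", w, 100 * w / t }'
done
```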
### Continuing the training
Continuing training is possible (for both single-task and multi-task models) and requires fewer parameters:
```shell
rxn-onmt-continue-training --model_output_dir $MODELS --preprocess_dir $PREPROCESSED --train_num_steps 30 --batch_size 4 --no_gpu \
--data_weights 1 --data_weights 3 --data_weights 4
```
### Fine-tuning
Fine-tuning is in principle similar to continuing the training.
The main differences are the potential presence of new tokens in the fine-tuning data, and the fact that the optimizer state is reset.
There is a dedicated command for fine-tuning. For example:
```shell
rxn-onmt-finetune --model_output_dir $MODELS_FINETUNED --preprocess_dir $PREPROCESSED --train_num_steps 20 --batch_size 4 --no_gpu \
--train_from $MODELS/model_step_30.pt
```
The syntax is very similar to `rxn-onmt-train` and `rxn-onmt-continue-training`, and it is compatible with both single-task and multi-task training.
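To choose the checkpoint passed to `--train_from`, one option is to list the saved checkpoints and take the one with the highest step number. The sketch below assumes the `model_step_<N>.pt` naming shown above and GNU `sort` with version-sort support; the directory and files are created here purely for illustration.

```shell
# Pick the checkpoint with the highest step number for --train_from.
# Assumes model_step_<N>.pt naming and GNU sort (-V, version sort).
MODELS="/tmp/models_demo"   # illustrative directory, not a real training output
mkdir -p "$MODELS"
touch "$MODELS"/model_step_9.pt "$MODELS"/model_step_10.pt "$MODELS"/model_step_30.pt
latest=$(ls "$MODELS"/model_step_*.pt | sort -V | tail -n 1)
echo "$latest"   # the step-30 checkpoint sorts last
```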