# Nonadditivity analysis
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
## Synopsis
A program to find key complex patterns in SAR data.
## Installation
The program requires Python >= 3.10.
In your already configured environment, install NonadditivityAnalysis with
```shell
pip install nonadditivity
```
### Create python environment with Conda
If you don't have an environment yet, you can use the provided (very minimal) conda environment file to create a valid environment.
```shell
conda env create -n <env_name> -f environment.yaml
conda activate <env_name>
```
or simply
```shell
conda create -n <env_name> python=3.*
```
with `*` being 10, 11, or 12.
Then run `pip install nonadditivity` to install the program.
### Dev Mode
If you want to install the package for development, some extra steps are required.
This package is managed by [Poetry](https://python-poetry.org/), so you first need to install poetry as described [here](https://python-poetry.org/docs/#installing-with-pipx).
After that, just clone the repository and install the code with poetry.
```shell
git clone https://github.com/KramerChristian/NonadditivityAnalysis.git
cd NonadditivityAnalysis
poetry install
```
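Once the package is installed this way, you can exercise the installation from within the Poetry-managed environment. The commands below are a minimal sketch; they assume the unit tests in [`tests`](tests/) are run with pytest, which this README does not state explicitly.
```shell
poetry run nonadditivity --help   # verify the CLI entry point works
poetry run pytest                 # run the unit tests (assumes pytest)
```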
## How to run the program and get help
The code runs as a simple command-line tool. Command line options are printed via
```shell
nonadditivity --help
```
## Example usage
Using the test files supplied, an example run could look like this:
```shell
nonadditivity -i <input_file> -d <delimiter> --series-column <series_column_name> -p <property1> -p <property2> ... -u <unit1> -u <unit2>
```
or, with the classification of double-transformation cycles enabled:
```shell
nonadditivity -i <input_file> -d <delimiter> --series-column <series_column_name> -p <property1> -p <property2> ... -u <unit1> -u <unit2> --classify
```
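For concreteness, a hypothetical invocation with two measured properties might look like the sketch below. The file name, column names, units, and the delimiter keyword are placeholders, not values shipped with the package; the accepted delimiters are the ones listed under "Input file format" below.
```shell
# hypothetical file, columns, and units -- adapt to your own data
nonadditivity -i my_compounds.csv -d comma --series-column SERIES \
    -p IC50 -p SOLUBILITY -u nM -u noconv --classify
```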
### Input file format
```text
IDENTIFIER [sep] SMILES [sep] property1 ... [sep] series_column (optional)
...
```
where [sep] is the separator and can be chosen from tab, space, comma, and semicolon.
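As an illustration, a comma-separated input with one property column and an optional series column could look like this (hypothetical identifiers, structures, and values; the first line is assumed to be the header row whose names are passed via `-p` and `--series-column`):
```text
IDENTIFIER,SMILES,IC50,SERIES
CPD-001,CCOc1ccccc1,250,series_A
CPD-002,CCOc1ccccc1Cl,125,series_A
```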
------------------
## Repo Structure
- [`examples`](example/): Contains some example input files.
- [`nonadditivity/`](nonadditivity/): Contains the source code for the package. See the [README](nonadditivity/README.md) in the folder for more info.
- [`tests`](tests/): Unit tests for the package.
- [`environment.yaml`](environment.yaml): Environment file for the conda environment.
- [`poetry.lock`](poetry.lock): File with the specification of libraries used (version and origin).
- [`pyproject.toml`](pyproject.toml): File containing build instructions for poetry as well as the metadata of the project.
## Publication
If you use this code for a publication, please cite
Kramer, C. Nonadditivity Analysis. J. Chem. Inf. Model. 2019, 59, 9, 4034–4042.
<https://pubs.acs.org/doi/10.1021/acs.jcim.9b00631>
If you are using the classification module, please also cite Guasch et al. (reference to be completed once the publication is accepted).
------------------
## Background
The overall process is:
1) Parse input:
   - read structures
   - clean and transform activity data
   - remove salts
2) Compute MMPs
3) Find double-transformation cycles
4) Classify double-transformation cycles (only with `--classify`)
5) Write to output and calculate statistics
### 1) Parse input
Ideally, the compounds are already standardized when input into nonadditivity
analysis. The code does not correct tautomers or charge states, but it will
attempt to desalt the input.
Since Nonadditivity analysis only makes sense on normally distributed data, the
input activity data can be transformed depending on the input units. You can choose
from "M", "mM", "uM", "nM", "pM", and "noconv". The 'xM' units will be transformed
to pActivity with the corresponding factors. 'noconv' keeps the input as is and does
not do any transformation.
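For example, an activity of 100 nM corresponds to 1e-7 mol/L and therefore to a pActivity of about 7. A quick check of this conversion (illustrative only, not the package's internal code):
```shell
# pActivity = -log10(concentration in mol/L); 100 nM = 1e-7 M -> about 7
python -c "import math; print(-math.log10(100e-9))"
```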
For duplicate structures, only the first occurrence is kept.
### 2) Compute MMPs
Matched molecular pairs are computed from the cleaned structures via a
subprocess call to the external mmpdb program. By default, 20 parallel jobs are
used for the fragmentation; this can be changed on line 681.
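The mmpdb steps are handled internally, so you do not need to run them yourself. Purely for orientation, a manual fragment-and-index run with mmpdb's own CLI looks roughly like the sketch below (file names are placeholders; check `mmpdb --help` for the exact options of your mmpdb version).
```shell
# fragment the cleaned structures, then index the fragments into an MMP database
mmpdb fragment cleaned_structures.smi -o cleaned_structures.fragments
mmpdb index cleaned_structures.fragments -o cleaned_structures.mmpdb
```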
### 3) Find double-transformation cycles
This is the heart of the Nonadditivity algorithm. Here, sets of four compounds that are
linked by two transformations are identified. For more details on the
interpretation, see the publication cited above.
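As a sketch of such a cycle (following the definition in the publication above): two transformations connect four compounds, and the nonadditivity is the difference between the effect of one transformation in its two different contexts (up to the sign convention used in the output).
```text
        T1: R1>>R2
     A ------------> B
     |               |
  T2 |               | T2: R3>>R4
     v      T1       v
     C ------------> D

nonadditivity = (pActivity_D - pActivity_C) - (pActivity_B - pActivity_A)
```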
### 4) Classify double-transformation cycles
Runs a set of classification functions that compute topological as well as physico-chemical
properties of a double-transformation cycle, to help you filter out uninteresting cases when
analysing the created data. This step only runs if `--classify` is provided on the command line.
### 5) Write to output and calculate statistics
Information about the compounds making up the cycles and the distribution of
nonadditivity is written to output files. [...] denotes the input file name.
- `NAA_output.csv` contains information about the cycles and the probability distribution.
- `perCompound.csv` contains the nonadditivity aggregated per compound across all cycles in which a given compound occurs.
- `c2c.csv` links the two files above and can be used, for example, for visualizations in Spotfire.
If you provide the `--classify` flag on the command line, `NAA_output.csv` and `perCompound.csv` will contain additional columns with the implemented descriptors.
If you provide the `--canonicalize` flag on the command line, two more files are generated:
- `canonical_na_output.csv` is like `NAA_output.csv`, but the transformations are canonicalized, i.e. every transformation occurs in only one direction (e.g. only "Cl>>F", never both "Cl>>F" and "F>>Cl").
- `canonical_transformations.csv` contains the transformations included there, so you can build a quasi-MMP analysis from this output.
------------------
## Copyright
The NonadditivityAnalysis code is copyright 2015-2024 by F. Hoffmann-La
Roche Ltd and distributed under the Apache 2.0 license (see LICENSE.txt).