ccmap


* Name: ccmap
* Version: 4.0.2
* Home page: https://github.com/MMSB-MOBI/ccmap
* Summary: A C implementation of a mesh-based atomic pairwise distance computation engine, with docking pose generation capabilities and fast solvent-accessible surface estimation
* Upload time: 2023-01-12 18:42:59
* Author: G.Launay
* Keywords: protein docking bioinformatics structure
* Requirements: none recorded

# A Python package and C Library for fast molecular contact map computation

[Current Version 4.0.2](https://pypi.org/project/ccmap/)

This package was designed to quickly compute thousands of sets of atomic or residue-level molecular contacts. The contacts can be evaluated within a single body or across two bodies. The library scales well with native Python multithreading.
The module also evaluates docking poses by applying triplets of Euler angles and translation vectors to initial unbound conformations.

## Installing and using the python module

### Installation

Installation should be as simple as `pip install ccmap`. Alternatively, you can clone this repo and run `python setup.py install` from the root folder.
The current release was successfully installed through pip on the following interpreter/platform combinations.

* python3.8/OSX.10.14.6
* python3.8/Ubuntu LTS

### Usage

From there you can load the package and display its help.

```python
import ccmap
help(ccmap)
```

#### Functions

Four functions are available:

* cmap: computes the contacts of one single/two body molecule
* lcmap: computes the contacts of a list of single/two body molecules
* zmap: computes the contacts between a receptor and a ligand molecule after applying transformations to the ligand coordinates
* lzmap: computes many sets of contacts between a receptor and a ligand molecule, one for each applied ligand transformation

#### Parameters

All module functions take molecular object coordinates as dictionaries, where keys are atom descriptors and values are lists.

* 'x' : list of float x coordinates
* 'y' : list of float y coordinates
* 'z' : list of float z coordinates
* 'seqRes' : list of strings
* 'chainID' : list of one-letter strings
* 'resName' : list of strings
* 'name' : list of strings
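
A minimal sketch of such a dictionary, built by hand for a three-atom fragment. The coordinate values and the `check_coordinate_dict` helper are illustrative assumptions, not part of ccmap; the helper only verifies that every expected key is present and that all lists have the same length (one entry per atom).

```python
EXPECTED_KEYS = ("x", "y", "z", "seqRes", "chainID", "resName", "name")

# Hypothetical three-atom fragment; each list holds one entry per atom.
molecule = {
    "x": [11.0, 12.5, 13.1],
    "y": [4.2, 4.9, 6.3],
    "z": [-1.0, -0.4, 0.2],
    "seqRes": ["1", "1", "2"],
    "chainID": ["A", "A", "A"],
    "resName": ["ALA", "ALA", "GLY"],
    "name": ["N", "CA", "N"],
}

def check_coordinate_dict(mol):
    """Return the atom count; raise ValueError on missing keys or ragged lists."""
    missing = [key for key in EXPECTED_KEYS if key not in mol]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    lengths = {len(mol[key]) for key in EXPECTED_KEYS}
    if len(lengths) != 1:
        raise ValueError("all lists must have the same length")
    return lengths.pop()

n_atoms = check_coordinate_dict(molecule)
```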

#### Additional arguments

##### Contact threshold distance

The threshold is expressed in Ångströms and defaults to 4.5. It can be redefined with the named parameter `d`.
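
Because only pairs closer than the threshold matter, a contact search can be restricted to neighbouring cells of a mesh whose cell edge equals the threshold. The sketch below illustrates that general idea in pure Python. It is not the library's C implementation; the function name `contacts_with_mesh` and the inclusive `<=` comparison are assumptions made for the example.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor

def contacts_with_mesh(points, d=4.5):
    """Return sorted index pairs (i, j), i < j, separated by at most d."""
    # Bin every point into a cubic cell of edge d.
    cells = defaultdict(list)
    for idx, point in enumerate(points):
        cells[tuple(floor(c / d) for c in point)].append(idx)

    pairs = set()
    for (cx, cy, cz), members in cells.items():
        # Two points within d of each other lie in the same or adjacent cells.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            neighbours = cells.get((cx + dx, cy + dy, cz + dz), ())
            for i in members:
                for j in neighbours:
                    if i < j and dist(points[i], points[j]) <= d:
                        pairs.add((i, j))
    return sorted(pairs)

points = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
pairs = contacts_with_mesh(points, d=4.5)
```

Only the 27 cells surrounding each occupied cell are inspected, so the cost grows roughly with the number of atoms rather than with the number of atom pairs.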

##### encode : Boolean

If True, contacts are returned as integers. Each integer encodes one pair of atom/residue positions in contact and can be decoded with this simple formula:

```python
def K2IJ(k, sizeBody1, sizeBody2):
    # The column count is the size of the second body,
    # or the size of the first body for a single-body map.
    nCol = sizeBody2 if sizeBody2 else sizeBody1
    return k // nCol, k % nCol
```

If False, contacts are returned as JSON object strings.
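
As a worked example of the formula above, the sketch below encodes a pair of positions and decodes it back. The forward encoding `k = i * nCol + j` is inferred from the decoding formula and is therefore an assumption; the helper names are hypothetical.

```python
def ij_to_k(i, j, n_col):
    """Encode a (row, column) pair as one integer, row-major."""
    return i * n_col + j

def k_to_ij(k, n_col):
    """Decode the integer back into its (row, column) pair."""
    return divmod(k, n_col)

# Two-body case: 5 positions in the first body, 3 in the second,
# so the number of columns is the size of the second body.
size_body1, size_body2 = 5, 3
k = ij_to_k(2, 1, size_body2)   # 2 * 3 + 1 = 7
pair = k_to_ij(k, size_body2)   # (2, 1)
```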

##### atomic : Boolean

If True, contacts are computed at the atomic level. By default, this is False and contacts are computed at the residue level.
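
To make the two levels concrete, here is a brute-force reference sketch. It assumes that two residues are in contact as soon as any pair of their atoms lies within the threshold, and that pairs inside one residue are ignored at the residue level; both are assumptions about the definition, and the function names are hypothetical.

```python
from itertools import combinations
from math import dist

def atomic_contacts(mol, d=4.5):
    """Index pairs (i, j), i < j, of atoms separated by at most d."""
    coords = list(zip(mol["x"], mol["y"], mol["z"]))
    return [(i, j) for i, j in combinations(range(len(coords)), 2)
            if dist(coords[i], coords[j]) <= d]

def residue_contacts(mol, d=4.5):
    """Collapse atomic contacts into pairs of (chainID, seqRes) residues."""
    residue_of = list(zip(mol["chainID"], mol["seqRes"]))
    pairs = set()
    for i, j in atomic_contacts(mol, d):
        if residue_of[i] != residue_of[j]:
            pairs.add(tuple(sorted((residue_of[i], residue_of[j]))))
    return sorted(pairs)

# Hypothetical four-atom, two-residue fragment.
mol = {
    "x": [0.0, 1.0, 4.0, 9.0], "y": [0.0] * 4, "z": [0.0] * 4,
    "seqRes": ["1", "1", "2", "2"], "chainID": ["A"] * 4,
    "resName": ["ALA", "ALA", "GLY", "GLY"], "name": ["N", "CA", "N", "CA"],
}
atom_pairs = atomic_contacts(mol)
residue_pairs = residue_contacts(mol)
```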

##### apply : Boolean

If True, the passed dictionaries of coordinates will be modified according to the Euler/translation parameters.
This is useful to generate a single docking conformation.
This argument is only available for the **zmap** function.

##### offsetRec and offsetLig

When working with protein docking data, unbound conformations are often centered on the origin of the coordinate system. Specify the translation vector for each body with the `offsetRec` and `offsetLig` named arguments. These are only available for the **zmap** and **lzmap** functions.
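
One plausible way to obtain such a vector is to negate the centroid of the body, as sketched below. Whether ccmap expects exactly this sign convention is an assumption to check against `help(ccmap)`; the helper names are hypothetical.

```python
def centroid(mol):
    """Mean position of all atoms in a coordinate dictionary."""
    n_atoms = len(mol["x"])
    return [sum(mol[axis]) / n_atoms for axis in ("x", "y", "z")]

def centering_offset(mol):
    """Translation vector that moves the centroid onto the origin."""
    return [-c for c in centroid(mol)]

# Hypothetical two-atom body; only the coordinate keys are needed here.
body = {"x": [1.0, 3.0], "y": [0.0, 2.0], "z": [-1.0, 1.0]}
offset = centering_offset(body)
```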

### Working with PDB coordinate files

#### Parsing coordinate data

We usually work with molecules in the PDB format. We can use the [pyproteinsExt](https://pypi.org/search/?q=pyproteinsExt) package to handle the boilerplate. 

```python
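# Assumed import path for the PDB module; adjust it to your pyproteinsExt version
import pyproteinsExt.structure.coordinates as PDB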
parser = PDB.Parser()
pdbREC = parser.load(file="dummy_A.pdb")
pdbDictREC = pdbREC.atomDictorize
pdbDictREC.keys()
# dict_keys(['x', 'y', 'z', 'seqRes', 'chainID', 'resName', 'name'])
```

By convention, the following examples use two molecules named REC(eptor) and LIG(and).

```python
pdbLIG = parser.load(file="dummy_B.pdb")
pdbDictLIG = pdbLIG.atomDictorize
pdbDictLIG.keys()
# dict_keys(['x', 'y', 'z', 'seqRes', 'chainID', 'resName', 'name'])
```

## Examples

### Computing single body contact map

#### Computing one map

Set the contact distance to 6.0 and recover residue-residue contacts as a list of integers.

```python
ccmap.cmap(pdbDictLIG, d=6.0, encode=True)
```

#### Computing many maps

Use the default contact distance and recover atomic contact maps as JSON object strings. The first positional argument specifies a list of bodies to process independently.

```python
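import json
# json.loads parses a string (json.load expects a file object);
# this assumes lcmap returns one JSON string per body
maps = ccmap.lcmap([pdbDictLIG, pdbDictREC], atomic=True)
[json.loads(m) for m in maps]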
```

### Computing two-body contact map

#### Straight computation of one map

The second positional argument of **cmap** is optional and defines the second body.

```python
ccmap.cmap(pdbDictREC, pdbDictLIG, d=6.0, encode=True)
```

#### Straight computation of many maps

The second positional argument of **lcmap** is an optional list of second bodies. The first two arguments must be of the same size, as the *i*-th element of the first will be processed with the *i*-th element of the second.

```python
ccmap.lcmap([pdbDictREC_1, ..., pdbDictREC_n], [pdbDictLIG_1, ..., pdbDictLIG_n], d=6.0, encode=True)
```

#### Computation of one map after conformational change

Use the **zmap** function, whose third and fourth positional arguments respectively specify the:

* Euler angles triplet
* translation vector

```python
ccmap.zmap(pdbDictREC, pdbDictLIG , (e1, e2, e3), (t1, t2, t3) )
```

Transformations are always applied to the coordinates provided as the second argument, here `pdbDictLIG`.
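
The sketch below shows what applying an Euler triplet and a translation vector to a coordinate dictionary can look like. The convention used here (angles in radians, rotations about x, then y, then z, followed by the translation) is an assumption chosen for illustration and may differ from the one implemented in ccmap; `rotation_matrix` and `transform` are hypothetical helpers.

```python
from math import cos, sin, pi

def rotation_matrix(e1, e2, e3):
    """R = Rz(e3) * Ry(e2) * Rx(e1), angles in radians (assumed convention)."""
    cx, sx = cos(e1), sin(e1)
    cy, sy = cos(e2), sin(e2)
    cz, sz = cos(e3), sin(e3)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def transform(mol, euler, translation):
    """Return a copy of mol with every atom rotated, then translated."""
    r = rotation_matrix(*euler)
    moved = [
        [sum(r[a][b] * p[b] for b in range(3)) + translation[a] for a in range(3)]
        for p in zip(mol["x"], mol["y"], mol["z"])
    ]
    out = dict(mol)  # descriptor lists are shared, coordinate lists are replaced
    out["x"] = [p[0] for p in moved]
    out["y"] = [p[1] for p in moved]
    out["z"] = [p[2] for p in moved]
    return out

# One atom on the x axis, rotated a quarter turn about z, then shifted.
ligand = {"x": [1.0], "y": [0.0], "z": [0.0]}
moved = transform(ligand, (0.0, 0.0, pi / 2), (1.0, 2.0, 3.0))
```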

#### Computation of many maps after conformational changes

Use the **lzmap** function. Its arguments are similar, except that the Euler angles and translation vectors must be supplied as lists.

```python
ccmap.lzmap(pdbDictREC, pdbDictLIG , [(e1, e2, e3),], [(t1, t2, t3),] )
```

### Generating docking conformations

The conformations obtained by coordinate transformation can be mapped back to PDB files.
Here, the offset vectors `[u1, u2, u3]` and `[v1, v2, v3]` respectively center `pdbDictREC` and `pdbDictLIG`, and one transformation defined by the `[e1, e2, e3]` Euler angles and the `[t1, t2, t3]` translation vector is applied to `pdbDictLIG`. The resulting two-body conformation is finally **applied** to the provided `pdbDictREC` and `pdbDictLIG`. The updated coordinates are then written back into the original PDB objects for later output to file.

```python
# Perform computation & alter provided dictionaries
ccmap.zmap(pdbDictREC, pdbDictLIG,
           [e1, e2, e3], [t1, t2, t3],
           offsetRec=[u1, u2, u3],
           offsetLig=[v1, v2, v3],
           apply=True)
# Update PDB containers from previous examples
pdbREC.setCoordinateFromDictorize(pdbDictREC)
pdbLIG.setCoordinateFromDictorize(pdbDictLIG)
# Dump to coordinate files
with open("new_receptor.pdb", "w") as fp:
    fp.write( str(pdbREC) )
with open("new_ligand.pdb", "w") as fp:
    fp.write( str(pdbLIG) )
```

## Multithreading

The C implementation makes it possible for the ccmap functions to release the Python Global Interpreter Lock (GIL). Hence, "actual" multithreading can be achieved and performance scales decently with the number of workers. For this benchmark, up to 50000 docking poses were generated and processed for three coordinate sets with an increasing number of atoms: 1974 ([1GL1](https://www.rcsb.org/structure/1GL1)), 3424 ([1F34](https://www.rcsb.org/structure/1F34)) and 10677 ([2VIS](https://www.rcsb.org/structure/2VIS)).

<figure> <img src="notebook/img/LZMAP_benchmark_1.png" alt="benchmark" /> </figure>

A simple example of a multithreaded implementation can be found in the provided [script](tests/scripts/threadsTest.py). The `tests` folder allows for the reproduction of the above benchmark.
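
Independently of that script, the general pattern can be sketched with the standard library: split the list of poses into chunks and hand each chunk to a worker thread. The helpers `chunked` and `run_threaded` are hypothetical, and a stand-in worker is used so the sketch runs without ccmap; in real use the worker would wrap a call such as `ccmap.lzmap(pdbDictREC, pdbDictLIG, eulers, translations)`, assuming it returns one result per pose.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(seq, n_chunks):
    """Split seq into at most n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(seq), n_chunks)
    start = 0
    for k in range(n_chunks):
        stop = start + size + (1 if k < extra else 0)
        if stop > start:
            yield seq[start:stop]
        start = stop

def run_threaded(worker, eulers, translations, n_workers=4):
    """Call worker(euler_chunk, translation_chunk) in threads, keeping pose order."""
    jobs = zip(chunked(eulers, n_workers), chunked(translations, n_workers))
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(worker, e, t) for e, t in jobs]
        results = []
        for future in futures:  # submission order, so results stay aligned
            results.extend(future.result())
    return results

def stand_in_worker(euler_chunk, translation_chunk):
    """Placeholder for a ccmap.lzmap call: one dummy value per pose."""
    return [sum(e) + sum(t) for e, t in zip(euler_chunk, translation_chunk)]

eulers = [(float(i), 0.0, 0.0) for i in range(10)]
translations = [(0.0, 0.0, float(i)) for i in range(10)]
scores = run_threaded(stand_in_worker, eulers, translations, n_workers=3)
```

Threads only bring a speed-up here because the heavy work happens in C code that releases the GIL; a pure-Python worker such as the stand-in would not scale the same way.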

## Installing and using the C Library

A C executable can be generated with the provided makefile. The low-level functions are the same, but the following limitations exist:

* One computation per executable call
* No multithreading.

# Finding the optimal molecular path connecting two atoms
Using a finer mesh, it is possible to obtain the shortest path connecting two atoms.
The atoms to connect must be solvent accessible; the path search operates over the surface and solvent-accessible cells. The solvent-excluded volume of each atom is a sphere whose radius is the sum of its van der Waals radius and the radius of a water molecule.
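
To illustrate what a shortest-path search over accessible cells involves, here is a breadth-first search on a small 3D grid. It is a generic sketch, not a description of the algorithm used by the executable: the 6-connected moves, the `blocked` set standing in for solvent-excluded cells, and the function name are all assumptions.

```python
from collections import deque

def shortest_walk(blocked, shape, start, goal):
    """Breadth-first search with 6-connected moves on a 3D grid.

    Returns the list of cells from start to goal, or None if goal is unreachable.
    """
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    previous = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk back to the start cell
                path.append(cell)
                cell = previous[cell]
            return path[::-1]
        for step in steps:
            nxt = tuple(c + s for c, s in zip(cell, step))
            inside = all(0 <= c < n for c, n in zip(nxt, shape))
            if inside and nxt not in blocked and nxt not in previous:
                previous[nxt] = cell
                queue.append(nxt)
    return None

# A 3x3x1 slab with a partial wall forces a detour around the blocked cells.
wall = {(1, 0, 0), (1, 1, 0)}
path = shortest_walk(wall, (3, 3, 1), (0, 0, 0), (2, 0, 0))
n_moves = len(path) - 1
```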
## install
```sh
 git clone -b fibonacci git@github.com:MMSB-MOBI/ccmap.git
 cd ccmap/ccmap
 make pathfinder
```
## Usage

<!--
`./bin/linky -x 'ILE:A:1:CA' -y 'LYS:A:13:N' -i ../tests/structures/small_peptide_noH_1model.pdb`
-->

`./bin/linky -i ../tests/structures/gil_input.pdb  -x 'LI1:B:502:CA' -y 'LI2:C:502:CA' -s 1`

will display:

```shell
Applying H20 probe radius of 1.4 A. to atomic solvant volume exclusion
User atom selection:
	Start atom:	" CA  LI1  502 B 14.024000 15.909000 -5.891000"
	End atom :	" CA  LI2  502 C 8.770000 25.230000 -5.380000"
Mesh [459x408x316] created: 3959 cells contain atoms
Building surfaces w/ mesh unit of 0.2 A. ...
	Total of 12749100 voxels constructed
Searching for start/stop cells at start/stop atoms surfaces...
	start/stop surfaces contain 391/1058 voxels, picking closest to the other volume center cell ...
Starting from cell (240,281, 134) (b=0)
Trying to reachcell (230,307, 139) (b=0)
	---Best walk is made of 47 moves---
Theoritical distance from 1st vox_path to start atom
	voxel [240 281 134] 11.018,16.813,-6.889
	atom:  CA  LI1  502 B 14.024000 15.909000 -5.891000
	d=3.29382 A.
Start/Stop atoms exclusion radius: 3.2 / 3.2 A.
Best pathway -- aprox. polyline lengths 15.8 A
Trailing space equals 0.2A
Threading of 7 atoms w/ 1A spacing completed
Approximate linker length 12.6 A.
```

The last line is a fair approximation of the length of the shortest curve linking the desired atoms.

## Effect of parameters on search
The path is guaranteed to be optimal, but the search may take some time depending on the structure topology and the command-line parameters.

### cell size
Too large a value may lead to an unrealistic path. Smaller values are better but enlarge the search space and therefore increase the computation time.

### water probe
The water probe radius affects the solvent-excluded volume and therefore the path.

### bead spacing
Bead spacing does not affect the optimal path, but it does affect the estimate of the linker length.
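
A small sketch of why spacing matters for a length estimate: beads dropped at a fixed spacing along a polyline leave a trailing remainder shorter than one spacing, so a bead-based length differs from the exact polyline length. This is an illustration only and does not reproduce the executable's threading procedure; `polyline_length` and `place_beads` are hypothetical helpers.

```python
from math import dist

def polyline_length(points):
    """Sum of the segment lengths of a polyline."""
    return sum(dist(a, b) for a, b in zip(points, points[1:]))

def place_beads(points, spacing=1.0):
    """Drop a bead every `spacing` units along the polyline, starting at points[0]."""
    beads = [tuple(points[0])]
    to_next = spacing  # distance still to travel before the next bead
    for a, b in zip(points, points[1:]):
        segment = dist(a, b)
        travelled = 0.0  # distance already covered on this segment
        while segment - travelled >= to_next:
            travelled += to_next
            t = travelled / segment
            beads.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
            to_next = spacing
        to_next -= segment - travelled
    return beads

path = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
exact = polyline_length(path)                # 3 + 4 = 7
beads = place_beads(path, spacing=2.0)
bead_estimate = (len(beads) - 1) * 2.0       # omits the trailing remainder
```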

            
