protein-cluster-conformers

Name	protein-cluster-conformers
Version	1.2.1
home_page	https://github.com/PDBeurope/protein-cluster-conformers
Summary	Clusters conformations of monomeric protein
upload_time	2024-10-31 16:27:39
maintainer	None
docs_url	None
author	Joseph I. J. Ellaway
requires_python	>=3.10
license	None
keywords	cluster monomeric superpose linear algebra distance
VCS
bugtrack_url
requirements	argparse gemmi matplotlib numpy pandas pillow requests seaborn scipy scikit-learn

# Monomeric protein conformational state clustering

These scripts cluster a set of monomeric protein chains using a global conformational change metric based on CA distances. Once the polypeptide chains to be clustered have been specified, a pairwise CA distance matrix is produced for each chain. Distance difference matrices are then generated pairwise between the CA distance matrices. Therefore, for `N` unique peptide chains, `N` CA distance matrices and `N*(N-1)/2` distance difference matrices are generated. _NB: the score between A->B is the same as B->A_.
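As a rough illustration of the two matrix types described above, the sketch below builds a CA distance matrix for each chain and a distance difference matrix for each unordered pair of chains. It is plain Python with made-up coordinates and chain labels. The function names are hypothetical and are not the package's API, the element-wise absolute difference is an assumption about how the difference is taken, and all chains are assumed to have the same number of aligned residues.

```python
from itertools import combinations
from math import dist


def ca_distance_matrix(ca_coords):
    """Return the pairwise distance matrix between the CA atoms of one chain."""
    return [[dist(a, b) for b in ca_coords] for a in ca_coords]


def distance_difference_matrix(matrix_a, matrix_b):
    """Return the element-wise absolute difference of two CA distance matrices."""
    return [
        [abs(x - y) for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(matrix_a, matrix_b)
    ]


# Hypothetical CA coordinates (in angstroms) for three chains of three residues.
chains = {
    "1abc_A": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)],
    "1abc_B": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)],
    "2xyz_A": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)],
}

# One CA distance matrix per chain.
ca_matrices = {name: ca_distance_matrix(xyz) for name, xyz in chains.items()}

# One distance difference matrix per unordered pair of chains.
dd_matrices = {
    (a, b): distance_difference_matrix(ca_matrices[a], ca_matrices[b])
    for a, b in combinations(ca_matrices, 2)
}

print(len(ca_matrices), "CA distance matrices")
print(len(dd_matrices), "distance difference matrices")
```

Because the absolute difference is symmetric, comparing A to B gives the same matrix as comparing B to A, which is why only unordered pairs are generated.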

Additional scripts are provided to cluster the chains based on distance-based scores calculated from all pairwise distance difference matrices, as well as scripts to produce dendrograms of the clustering results and heatmaps for each distance difference matrix.
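The clustering step can be pictured with the following sketch. It takes a symmetric matrix of pairwise scores and merges chains by average linkage (UPGMA) until the two closest clusters are further apart than a cutoff. The scores, the cutoff and the function name are illustrative assumptions; the package depends on SciPy and scikit-learn and may well rely on those libraries rather than on code like this.

```python
from itertools import combinations


def upgma_clusters(labels, scores, cutoff):
    """Group labels by average-linkage (UPGMA) agglomerative clustering.

    scores[i][j] is the symmetric score between labels[i] and labels[j].
    Merging stops once the closest pair of clusters exceeds the cutoff.
    """
    clusters = [[i] for i in range(len(labels))]

    def average_score(a, b):
        # Mean score over every cross-cluster pair of members.
        return sum(scores[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        score, x, y = min(
            (average_score(clusters[x], clusters[y]), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        )
        if score > cutoff:
            break
        clusters[x] = clusters[x] + clusters[y]  # merge y into x (x < y)
        del clusters[y]

    return [[labels[i] for i in members] for members in clusters]


# Made-up scores: the first two chains are similar, the third differs from both.
labels = ["1abc_A", "1abc_B", "2xyz_A"]
scores = [
    [0.0, 0.5, 9.0],
    [0.5, 0.0, 8.0],
    [9.0, 8.0, 0.0],
]
print(upgma_clusters(labels, scores, cutoff=2.0))
```

Recomputing the average from the original scores at every step keeps the sketch short; it is fine for a handful of chains but would be slow for large inputs.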

Example input data is provided in the `benchmark_data/examples` folder, including scripts to download and save data from the [PDBe-KB's benchmark conformational state dataset](http://ftp.ebi.ac.uk/pub/databases/pdbe-kb/benchmarking/distinct-monomer-conformers/). Example scripts are included in `examples`, which run complete executions of the entire pipeline for a selection of structures from several different UniProt accessions.

For instructions on importing `protein-cluster-conformers` into your own Python code, refer to `/tutorials/instructions.ipynb`.

**Dependencies**:

`protein-cluster-conformers` requires Python >= 3.10 to run. Initialise a virtual environment and install the dependencies with:

```shell
cd protein-cluster-conformers
python3.10 -m venv cluster_venv
source cluster_venv/bin/activate
pip install -r requirements.txt
```

_____

## CLI: Clustering structures

To cluster a set of protein structures, run the `find_conformers.py` script:

```shell
python3 find_conformers.py [-h] [-v] -u UNIPROT -m MMCIF [MMCIF ...]
                           [-s PATH_CLUSTERS] -c PATH_CA [-d PATH_DD]
                           [-g PATH_DENDROGRAM [PATH_DENDROGRAM ...]]
                           [-w PATH_SWARM [PATH_SWARM ...]] [-o PATH_HISTOGRAM]
                           [-a PATH_ALPHA_FOLD]
                           [-0 FIRST_RESIDUE_POSITION] [-1 LAST_RESIDUE_POSITION]
```

The following parameters can be passed:

```shell
required arguments:
  -u UNIPROT, --uniprot UNIPROT
                        UniProt accession
  -m MMCIF [MMCIF ...], --mmcif MMCIF [MMCIF ...]
                        Enter list of paths to mmCIFs that overlap a given UniProt segment
optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Increase verbosity
  -s PATH_CLUSTERS, --path_clusters PATH_CLUSTERS
                        Path to save clustering results
  -c PATH_CA, --path_ca PATH_CA
                        Path to save CA distance matrices
  -d PATH_DD, --path_dd PATH_DD
                        Path to save distance difference matrices
  -g PATH_DENDROGRAM [PATH_DENDROGRAM ...], --path_dendrogram PATH_DENDROGRAM [PATH_DENDROGRAM ...]
                        Path to save dendrogram of clustering results
  -w PATH_SWARM [PATH_SWARM ...], --path_swarm PATH_SWARM [PATH_SWARM ...]
                        Path to save swarm plot of scores
  -o PATH_HISTOGRAM, --path_histogram PATH_HISTOGRAM
                        Path to save histograms of distance difference maps
  -a PATH_ALPHA_FOLD, --path_alpha_fold PATH_ALPHA_FOLD
                        Path to save AlphaFold Database structure
  -0 FIRST_RESIDUE_POSITION, --first_residue_position FIRST_RESIDUE_POSITION
                        First residue position in (UniProt) sequence
  -1 LAST_RESIDUE_POSITION, --last_residue_position LAST_RESIDUE_POSITION
                        Last residue position in (UniProt) sequence

```

---

### **Run instructions**

#### Option 1) Cluster and save matrices

To only cluster a set of monomeric protein structures that share part or all of the same UniProt sequence, run:

``` shell
python3 find_conformers.py -u "A12345" \
    -m /path/to/structure_1.cif [chains] \
    -m /path/to/structure_2.cif [chains] \
    ... \
    -m /path/to/structure_N.cif [chains] \
    -s /path/to/save/clustering/results/
```

The path to each structure is passed using the `-m` flag.

Chain IDs (only `struct_asym_id` is currently recognised) should be given as space-delimited arguments after the path. Pass in multiple structures using consecutive `-m` flags. The UniProt accession must be passed using the `-u` flag.
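To make the shape of the `-u` and `-m` arguments concrete, here is a minimal `argparse` sketch that accepts a repeated `-m` flag, each followed by a path and its chain IDs. It is an assumption about how such a command line could be handled, not a copy of the parser in `find_conformers.py`.

```python
import argparse

parser = argparse.ArgumentParser(description="Sketch of the -u and -m options")
parser.add_argument("-u", "--uniprot", required=True, help="UniProt accession")
parser.add_argument(
    "-m",
    "--mmcif",
    nargs="+",        # a path followed by one or more chain IDs
    action="append",  # each -m occurrence adds one list
    required=True,
    help="Path to an mmCIF file followed by chain IDs",
)

# Parse an explicit argument list instead of sys.argv so the sketch is runnable.
args = parser.parse_args(
    [
        "-u", "A12345",
        "-m", "/path/to/structure_1.cif", "A", "B",
        "-m", "/path/to/structure_2.cif", "A",
    ]
)

# Split each -m group into the file path and its chain IDs.
structures = {group[0]: group[1:] for group in args.mmcif}
print(args.uniprot)
print(structures)
```

Combining `nargs="+"` with `action="append"` yields one list per `-m` flag, so the chains stay attached to the file they belong to.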

**Example**: O34926

```shell
python3 find_conformers.py -u "O34926" \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc3_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc5_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc6_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc7_updated.cif A B \
    -c benchmark_data/examples/O34926/O34926_ca_distances \
    -d benchmark_data/examples/O34926/O34926_distance_differences/ \
    -s benchmark_data/examples/O34926/O34926_cluster_results/
```

By default, the pipeline only clusters the supplied mmCIFs (and specified chains), saving the clustering results to a CSV file in the directory specified by `-s`.

----

#### Option 2) Save CA matrices only

To save the matrices produced in the pipeline, simply specify the path in which to save them using the `-c` flag for CA distance matrices and the `-d` flag for CA distance difference matrices:

```shell
$ python find_conformers.py -u "A12345" \
    -m /path/to/structure_1.cif [chains] \
    -m ... \
    -s /path/to/save/cluster_results.csv \
    -c /path/to/save/CA/distance/matrices \
    -d /path/to/save/distance/difference/matrices/
```

**Example**: O34926

```shell
python3 find_conformers.py -u "O34926" \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc3_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc5_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc6_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc7_updated.cif A B \
    -c benchmark_data/examples/O34926/O34926_ca_distances
```

---

#### Option 3) Render distance difference maps only

2D histograms (heatmaps) can be rendered and saved for each CA distance difference matrix by specifying the save directory using the `-o` flag:

```shell
$ python find_conformers.py -u "A12345" \
    -m /path/to/structure_1.cif [chains] \
    -m ... \
    -o /path/to/save/distance/difference/2D/histograms/
```

The resulting plots are saved in PNG format (to save render time). E.g:

<img src="./benchmark_data/figures/A6UVT1_6hac_A_to_6hae_K.png" alt="Distance difference map of 6hac chain A to 6hae chain K" height="350"/>

<br>

**Example**: O34926

```shell
python3 find_conformers.py -u "O34926" \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc3_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc5_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc6_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc7_updated.cif A B \
    -c benchmark_data/examples/O34926/O34926_ca_distances \
    -d benchmark_data/examples/O34926/O34926_distance_differences/ \
    -o benchmark_data/examples/O34926/O34926_distance_difference_maps/
```

---

#### Option 4) Render dendrogram only

From the clustering results, a dendrogram can be rendered to show the relationships between all clustered chains. To save a dendrogram of the hierarchical clustering results, run:

```shell
$ python find_conformers.py -u "A12345" \
    -m /path/to/structure_1.cif [chains] \
    -m ... \
    -g /path/to/save/dendrogram/ [png svg]
```

where the dendrogram is saved as a `png` file, an `svg` file or both. E.g.

<img src="./benchmark_data/figures/O34926_1_405_agglomerative_dendrogram.png" alt="Dendrogram of clustered UniProt:O34926 chains, via UPGMA agglomerative clustering" width="400"/>

<img src="./benchmark_data/figures/P15291_122_398_agglomerative_dendrogram.png" alt="Dendrogram of clustered UniProt:P15291 chains, via UPGMA agglomerative clustering" width="400"/>

<br>

**Example**: O34926

```shell
python3 find_conformers.py -u "O34926" \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc3_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc5_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc6_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc7_updated.cif A B \
    -c benchmark_data/examples/O34926/O34926_ca_distances \
    -d benchmark_data/examples/O34926/O34926_distance_differences/ \
    -g benchmark_data/examples/O34926/O34926_cluster_results/ png svg
```

---

<!-- #### Option 5) Render swarm plot

The scores generated between pairwise structure comparisons can be plotted as a swarm plot by parsing the `-w` flag:

```shell
$ python find_conformers.py -u "A12345" \
    -m /path/to/structure_1.cif [chains] \
    -m ... \
    -w /path/to/save/swarm_plot/ [png svg]
```

Like rendering the dendrogram, the swarm plot can either be saved as `png`,  `svg` or both. E.g.

<img src="./benchmark_data/figures/P15291_swarm_plot.png" alt="Swarm plot of distance-based scores for chains in the UniProt:P15291 clusters" height="350"/>

<br>

**Example**: O34926

```shell
python3 find_conformers.py -u "O34926" \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc3_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc5_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc6_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc7_updated.cif A B \
    -c benchmark_data/examples/O34926/O34926_ca_distances \
    -d benchmark_data/examples/O34926/O34926_distance_differences/ \
    -w benchmark_data/examples/O34926/O34926_cluster_results/ png svg
```

------ -->

#### Option 6) Include AlphaFold Database structure when generating CA and distance difference matrices

When the `-a` flag is given, the script will attempt to download the pre-generated AlphaFold structure from the [AlphaFold Database](https://alphafold.ebi.ac.uk/) and include it in the clustering. You do not need to have downloaded the predicted AlphaFold structure already, but you must be connected to the internet. The structure will be saved to the path given after `-a`.

```shell
$ python find_conformers.py -u "A12345" \
    -m /path/to/structure_1.cif [chains] \
    -m ... \
    -a /path/to/save/alphafold/structure/
```

**Example**: O34926

```shell
python3 find_conformers.py -u "O34926" \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc3_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc5_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc6_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc7_updated.cif A B \
    -c benchmark_data/examples/O34926/O34926_ca_distances \
    -d benchmark_data/examples/O34926/O34926_distance_differences/ \
    -a benchmark_data/examples/O34926/O34926_path_alphafold/
```
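For orientation, the download that `-a` triggers could look roughly like the sketch below, which uses only the standard library. The use of the AlphaFold Database prediction endpoint, the `cifUrl` field name and both function names are assumptions made for illustration; the package may fetch the file differently.

```python
import json
from pathlib import Path
from urllib.request import urlopen

# Assumed public metadata endpoint of the AlphaFold Database.
API_ROOT = "https://alphafold.ebi.ac.uk/api/prediction"


def prediction_url(uniprot_accession):
    """Build the metadata URL for one UniProt accession."""
    return f"{API_ROOT}/{uniprot_accession}"


def download_alphafold_cif(uniprot_accession, save_dir):
    """Download the predicted structure as mmCIF and return the saved path."""
    with urlopen(prediction_url(uniprot_accession)) as response:
        entries = json.load(response)
    cif_url = entries[0]["cifUrl"]  # assumed field name in the API response

    save_dir = Path(save_dir)
    save_dir.mkdir(parents=True, exist_ok=True)
    target = save_dir / cif_url.rsplit("/", 1)[-1]
    with urlopen(cif_url) as response:
        target.write_bytes(response.read())
    return target


# Example call (needs an internet connection):
# download_alphafold_cif("O34926", "benchmark_data/examples/O34926/O34926_path_alphafold/")
```

Looking the file location up through the metadata endpoint avoids hard-coding a model version in the file name.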

------

#### **Option 7) Run all**

**Example #1:** O34926

```shell
python3 find_conformers.py -u "O34926" \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc3_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc5_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc6_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc7_updated.cif A B \
    -c benchmark_data/examples/O34926/O34926_ca_distances/ \
    -d benchmark_data/examples/O34926/O34926_distance_differences/ \
    -s benchmark_data/examples/O34926/O34926_cluster_results/ \
    -o benchmark_data/examples/O34926/O34926_distance_difference_maps/ \
    -g benchmark_data/examples/O34926/O34926_cluster_results/ png svg \
    -a benchmark_data/examples/O34926/O34926_path_alphafold/
```

or use the `examples/run_O34926.sh` script.

``` shell
$ ./examples/run_O34926.sh
```

**Example #2:** P15291

``` shell
python3 find_conformers.py -u "P15291" \
    -m benchmark_data/examples/P15291/P15291_updated_mmcif/2fy7_updated.cif A \
    -m benchmark_data/examples/P15291/P15291_updated_mmcif/2fya_updated.cif A \
    -m benchmark_data/examples/P15291/P15291_updated_mmcif/2fyb_updated.cif A \
    -m benchmark_data/examples/P15291/P15291_updated_mmcif/6fwu_updated.cif A B \
    -m benchmark_data/examples/P15291/P15291_updated_mmcif/2fyc_updated.cif A B \
    -m benchmark_data/examples/P15291/P15291_updated_mmcif/2fyd_updated.cif A B \
    -c benchmark_data/examples/P15291/P15291_ca_distances/ \
    -d benchmark_data/examples/P15291/P15291_distance_differences/ \
    -s benchmark_data/examples/P15291/P15291_cluster_results/ \
    -o benchmark_data/examples/P15291/P15291_distance_difference_maps/ \
    -g benchmark_data/examples/P15291/P15291_cluster_results/ png svg \
    -a benchmark_data/examples/P15291/P15291_path_alphafold/
```

or execute the `examples/run_P15291.sh` script.

```shell
$ ./examples/run_P15291.sh
```

When imported into Orc, the arguments required to run the clustering process will be passed to the class instance and its methods as lists, generated by the preceding functions called by the existing `protein-superpose` pipeline.

#### Optional arguments

The start and end residue positions can be passed to the script using the `-0` and `-1` flags, respectively. These will not restrict the residue ranges during clustering but will be used to label the axes of the distance difference maps and dendrograms.

*Example*: O34926:

```shell
python3 find_conformers.py -u "O34926" \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc3_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc5_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc6_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc7_updated.cif A B \
    -c benchmark_data/examples/O34926/O34926_ca_distances/ \
    -d benchmark_data/examples/O34926/O34926_distance_differences/ \
    -s benchmark_data/examples/O34926/O34926_cluster_results/ \
    -g benchmark_data/examples/O34926/O34926_cluster_results/ png svg \
    -0 1 \
    -1 405
```
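The effect of `-0` and `-1` on axis labelling can be sketched as a mapping from matrix indices to UniProt residue numbers. The helper below is hypothetical and assumes that row 0 of a distance difference matrix corresponds to the first residue position; it only illustrates the idea of relabelling without changing the data.

```python
def residue_axis_labels(first_residue, last_residue, step=100):
    """Return (matrix index, residue number) pairs for axis ticks."""
    return [
        (residue - first_residue, residue)
        for residue in range(first_residue, last_residue + 1, step)
    ]


# Tick positions and labels for the O34926 example (-0 1 -1 405).
print(residue_axis_labels(1, 405))

# A segment that does not start at residue 1 keeps its UniProt numbering.
print(residue_axis_labels(122, 398))
```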

The pipeline will avoid re-processing existing files where it finds them. To update a single PDB entry, specify the PDB accession using the `-i` flag, e.g. `-i 3nc3`. To force all entries to be re-processed, use the `-f` flag, which will overwrite existing files indiscriminately.

*Example*: O34926:

```shell
python3 find_conformers.py -u "O34926" \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc3_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc5_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc6_updated.cif A B \
    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc7_updated.cif A B \
    -c benchmark_data/examples/O34926/O34926_ca_distances/ \
    -d benchmark_data/examples/O34926/O34926_distance_differences/ \
    -s benchmark_data/examples/O34926/O34926_cluster_results/ \
    -g benchmark_data/examples/O34926/O34926_cluster_results/ png svg \
    -f
```
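The interplay between the default behaviour, `-i` and `-f` can be summarised as a small decision function. This is a sketch of the logic as described above, with a hypothetical function name and file name, not the package's implementation.

```python
import tempfile
from pathlib import Path


def needs_processing(output_file, pdb_id, update_ids=(), force=False):
    """Decide whether an output file should be (re)generated."""
    if force:  # -f: overwrite everything
        return True
    if pdb_id in update_ids:  # -i: refresh the selected entries only
        return True
    # Default: only process entries whose output is missing.
    return not Path(output_file).exists()


with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "3nc3_A_ca_distances.txt"  # hypothetical file name
    existing.touch()
    missing = Path(tmp) / "3nc5_A_ca_distances.txt"  # hypothetical file name

    print(needs_processing(existing, "3nc3"))                       # False
    print(needs_processing(missing, "3nc5"))                        # True
    print(needs_processing(existing, "3nc3", update_ids=["3nc3"]))  # True
    print(needs_processing(existing, "3nc3", force=True))           # True
```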

---

### Run on benchmark dataset

The scripts above are called by the `run_benchmark.py` wrapper. To generate conformational clustering results for the included benchmark dataset, run:

``` shell
python3 cluster_benchmark.py
```

This will call the `run_benchmark(...)` functions included in `ca_distance.py`, `distance_difference.py`, `cluster.py`, `plot_distance_difference.py`, `plot_dendrogram.py` and `plot_swarm_plot.py`. The script takes no arguments.

Results will be saved in the `./benchmark_data/` folder.

------

## Contributing

Install developer dependencies:

```shell
pip install -r dev-requirements.txt
```

To run unit tests on the package, the Pytest framework is recommended. Run the tests with:

```shell
pytest --cov=cluster_conformers --cov-report=html -v
```

The following dependencies will be required:

- `pytest-cov`
- `pytest-forked`
- `pytest-xdist` (optional)

They are installed along with the main package dependencies in `requirements.txt`.

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/PDBeurope/protein-cluster-conformers",
    "name": "protein-cluster-conformers",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "cluster, monomeric, superpose, linear algebra, distance",
    "author": "Joseph I. J. Ellaway",
    "author_email": "jellaway@ebi.ac.uk",
    "download_url": "https://files.pythonhosted.org/packages/40/24/6c6c535ec1e18e2d9dd016c75c3ce4222419b99d30596ad0156ed8cb65db/protein-cluster-conformers-1.2.1.tar.gz",
    "platform": null,
    "description": "# Monomeric protein conformational state clustering\n\nThese scripts can be used to cluster a parsed set of monomeric protein chains via a global conformational change metric based on CA distances. Once the polypeptide chains destined for clustering have been specified, a pairwise CA distance matrix for each chain is produced. Distance difference matrices are then generated, again, pairwise but between CA distance matrices. Therefore, for `N` unique peptide chains, `N` CA distance matrices and `N*(N-1)/1` distance difference matrices are generated. _NB: the score between A->B is the same as B->A_.\n\nAdditional scripts are provided to cluster the chains based on distance-based scores calculated from all pairwise distance difference matricies, as well as scripts to produce dendrograms of the clustering results, and heatmaps for each distance difference matrix.\n\nExample input data is provided in the `benchmark_data/examples` folder, including scripts to download and save data from the [PDBe-KB's benchmark conformational state dataset](http://ftp.ebi.ac.uk/pub/databases/pdbe-kb/benchmarking/distinct-monomer-conformers/). Example scripts are included in `examples`, which run complete executions of the entire pipeline for a selection of structures from several difference UniProt accessions.\n\nFor intructions on importing `protein-cluster-conformers` into your own Python code, refer to `/tutorials/instructions.ipynb`.\n\n**Dependencies**:\n\n`protein-cluster-conformers` requires >=Python3.10 to run. 
Initialise virtual environment and install dependencies with:\n\n```shell\ncd protein-cluster-conformers\npython3.10 -m venv cluster_venv\nsource cluster_venv/bin/activate\npip install -r requirements.txt\n```\n\n_____\n\n## CLI: Clustering structures\n\nTo cluster a set of protein structures, run the `find_clusters.py` script:\n\n```shell\npython3 find_conformers.py [-h] [-v] -u UNIPROT -m MMCIF [MMCIF ...]\n\t\t\t\t\t\t\t[-s PATH_CLUSTERS] -c PATH_CA [-d PATH_DD]\n                          \t[-g PATH_DENDROGRAM [PATH_DENDROGRAM ...]]\n                          \t[-w PATH_SWARM [PATH_SWARM ...]] [-o PATH_HISTOGRAM]\n                          \t[-a PATH_ALPHA_FOLD]\n                            [-0 FIRST_RESIDUE_POSITION] [-1 LAST_RESIDUE_POSITION]\n```\n\nThe following parameters can be parsed:\n\n```shell\nrequired arguments:\n  -u UNIPROT, --uniprot UNIPROT\n                        UniProt accession\n  -m MMCIF [MMCIF ...], --mmcif MMCIF [MMCIF ...]\n                        Enter list of paths to mmCIFs that overlap a given UniProt segment\noptional arguments:\n  -h, --help            show this help message and exit\n  -v, --verbose         Increase verbosity\n  -s PATH_CLUSTERS, --path_clusters PATH_CLUSTERS\n                        Path to save clustering results\n  -c PATH_CA, --path_ca PATH_CA\n                        Path to save CA distance matrices\n  -d PATH_DD, --path_dd PATH_DD\n                        Path to save distance difference matrices\n  -g PATH_DENDROGRAM [PATH_DENDROGRAM ...], --path_dendrogram PATH_DENDROGRAM [PATH_DENDROGRAM ...]\n                        Path to save dendrogram of clustering results\n  -w PATH_SWARM [PATH_SWARM ...], --path_swarm PATH_SWARM [PATH_SWARM ...]\n                        Path to save swarm plot of scores\n  -o PATH_HISTOGRAM, --path_histogram PATH_HISTOGRAM\n                        Path to save histograms of distance difference maps\n  -a PATH_ALPHA_FOLD, --path_alpha_fold PATH_ALPHA_FOLD\n                        
Path to save AlphaFold Database structure\n  -0 FIRST_RESIDUE_POSITION, --first_residue_position FIRST_RESIDUE_POSITION\n                        First residue position in (UniProt) sequence\n  -1 LAST_RESIDUE_POSITION, --last_residue_position LAST_RESIDUE_POSITION\n                        Last residue position in (UniProt) sequence\n\n```\n\n---\n\n### **Run instructions**\n\n#### Option 1) Cluster and save matrices\n\nTo only cluster a set of monomeric protein structures that share part or all of the same UniProt sequence, run:\n\n``` shell\npython3 find_clusters.py -u \"A12345\" \\\n    -m /path/to/structure_1.cif [chains] \\\n    -m /path/to/structure_2.cif [chains] \\\n    ... \\\n    -m /path/to/structure_N.cif [chains] \\\n    -s /path/to/save/clustering/results/\n```\n\nThe paths to each structure are parsed using the `-m` flag.\n\nChain IDs (only `struct_asym_id` is currently recognised at the moment) should be given as space-delimited arguments after the path. Parse in multiple structures using consecutive  `-m` flags. 
The UniProt accession must be parsed using the `-u` flag.\n\n**Example**: O34926\n\n```shell\npython3 find_conformers.py -u \"O34926\" \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc3_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc5_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc6_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc7_updated.cif A B \\\n    -c benchmark_data/examples/O34926/O34926_ca_distances \\\n    -d benchmark_data/examples/O34926/O34926_distance_differences/ \\\n    -s benchmark_data/examples/O34926/O34926_cluster_results/\n```\n\nBy default, the pipeline only clusters the parsed mmCIFs (and specified chains), saving clustering results to a CSV file in `-s` specified directory.\n\n----\n\n#### Option 2) Save CA matrices only\n\nTo save the matrices produced in the pipeline, simply specify the path in which to save them using the `-c` flag for CA distance matrices and the `-d` flag for CA distance difference matrices:\n\n```shell\n$ python find_conformers.py -u \"A12345\" \\\n    -m /path/to/structure_1.cif [chains] \\\n    -m ... 
\\\n    -s /path/to/save/cluster_results.csv \\\n    -c /path/to/save/CA/distance/matices \\\n    -d /path/to/save/distance/difference/matrices/\n```\n\n**Example**: O34926\n\n```shell\npython3 find_conformers.py -u \"O34926\" \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc3_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc5_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc6_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc7_updated.cif A B \\\n    -c benchmark_data/examples/O34926/O34926_ca_distances \\\n```\n\n---\n\n#### Option 3) Render distance difference maps only\n\n2D histograms (heatmaps) can be rendered and saved for each CA distance difference matrix by specifying the save directory using the `-o` flag:\n\n```shell\n$ python find_conformers.py -u \"A12345\" \\\n    -m /path/to/structure_1.cif [chains] \\\n    -m ... \\\n    -o /path/to/save/distance/difference/2D/histograms/\n```\n\nThe resulting plots are saved in PNG format (to save render time). 
E.g:\n\n<img src=\"./benchmark_data/figures/A6UVT1_6hac_A_to_6hae_K.png\" alt=\"Distance difference map of 6hac chain A to 6hae chain K\" height=\"350\"/>\n\n<br>\n\n**Example**: O34926\n\n```shell\npython3 find_conformers.py -u \"O34926\" \\\n\t-m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc3_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc5_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc6_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc7_updated.cif A B \\\n    -c benchmark_data/examples/O34926/O34926_ca_distances \\\n    -d benchmark_data/examples/O34926/O34926_distance_differences/ \\\n    -o benchmark_data/examples/O34926/O34926_distance_difference_maps/\n```\n\n---\n\n#### Option 4) Render dendrogram only\n\nFrom the clustering results, a dendrogram can be rendered to show the relationships between all clustered chains. To save a dendrogram of the hierarchical clustering results, run:\n\n```shell\n$ python find_conformers.py -u \"A12345\" \\\n    -m /path/to/structure_1.cif [chains] \\\n    -m ... \\\n    -g /path/to/save/dendrogram/ [png svg]\n```\n\nwhere either a `png` or `svg` file type is saved. 
E.g.\n\n<img src=\"./benchmark_data/figures/O34926_1_405_agglomerative_dendrogram.png\" alt=\"Dendrogram of clustered UniProt:P14902 chains, via UPGMA agglomerative clustering\" width=\"400\"/>\n\n<img src=\"./benchmark_data/figures/P15291_122_398_agglomerative_dendrogram.png\" alt=\"Dendrogram of clustered UniProt:P14902 chains, via UPGMA agglomerative clustering\" width=\"400\"/>\n\n<br>\n\n**Example**: O34926\n\n```shell\npython3 find_conformers.py -u \"O34926\" \\\n\t-m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc3_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc5_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc6_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc7_updated.cif A B \\\n    -c benchmark_data/examples/O34926/O34926_ca_distances \\\n    -d benchmark_data/examples/O34926/O34926_distance_differences/ \\\n    -g benchmark_data/examples/O34926/O34926_cluster_results/ png svg\n```\n\n---\n\n<!-- #### Option 5) Render swarm plot\n\nThe scores generated between pairwise structure comparisons can be plotted as a swarm plot by parsing the `-w` flag:\n\n```shell\n$ python find_conformers.py -u \"A12345\" \\\n    -m /path/to/structure_1.cif [chains] \\\n    -m ... \\\n    -w /path/to/save/swarm_plot/ [png svg]\n```\n\nLike rendering the dendrogram, the swarm plot can either be saved as `png`,  `svg` or both. 
E.g.\n\n<img src=\"./benchmark_data/figures/P15291_swarm_plot.png\" alt=\"Swarm plot of distance-based scores for chains in the UniProt:P15291 clusters\" height=\"350\"/>\n\n<br>\n\n**Example**: O34926\n\n```shell\npython3 find_conformers.py -u \"O34926\" \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc3_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc5_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc6_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc7_updated.cif A B \\\n    -c benchmark_data/examples/O34926/O34926_ca_distances \\\n    -d benchmark_data/examples/O34926/O34926_distance_differences/ \\\n    -w benchmark_data/examples/O34926/O34926_cluster_results/ png svg\n```\n\n------ -->\n\n#### Option 6) Include AlphaFold Database structure when generating CA and distance difference matrices\n\nBy parsing in the `-a` flag,  the script will attempt to download and cluster the pre-generated AlphaFold structure, stored on the [AlphaFold Database](https://alphafold.ebi.ac.uk/). You do not need to have downloaded the predicted AlphaFold structure already but must be connected to the internet. The structure will be saved\n\n```shell\n$ python find_conformers.py -u \"A12345\" \\\n\t\t-m /path/to/structure_1.cif [chains] \\\n    -m ... 
\\\n    -a\n```\n\n**Example**: O34926\n\n```shell\npython3 find_conformers.py -u \"O34926\" \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc3_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc5_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc6_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc7_updated.cif A B \\\n    -c benchmark_data/examples/O34926/O34926_ca_distances \\\n    -d benchmark_data/examples/O34926/O34926_distance_differences/ \\\n    -a benchmark_data/examples/O34926/O34926_path_alphafold/\n```\n\n------\n\n#### **Option 7) Run all**\n\n**Example #1:** O34926\n\n```shell\npython3 find_conformers.py -u \"O34926\" \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc3_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc5_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc6_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc7_updated.cif A B \\\n    -c benchmark_data/examples/O34926/O34926_ca_distances/ \\\n    -d benchmark_data/examples/O34926/O34926_distance_differences/ \\\n    -s benchmark_data/examples/O34926/O34926_cluster_results/ \\\n    -o benchmark_data/examples/O34926/O34926_distance_difference_maps/ \\\n    -g benchmark_data/examples/O34926/O34926_cluster_results/ png svg \\\n    -a benchmark_data/examples/O34926/O34926_path_alphafold/\n```\n\nor use the `examples/run_O34926.sh` script.\n\n``` shell\n$ ./examples/run_O34926.sh\n```\n\n**Example #2:** P15291\n\n``` shell\npython3 find_conformers.py -u \"P15291\" \\\n    -m benchmark_data/examples/P15291/P15291_updated_mmcif/2fy7_updated.cif A \\\n    -m benchmark_data/examples/P15291/P15291_updated_mmcif/2fya_updated.cif A \\\n    -m benchmark_data/examples/P15291/P15291_updated_mmcif/2fyb_updated.cif A \\\n    -m 
benchmark_data/examples/P15291/P15291_updated_mmcif/6fwu_updated.cif A B \\\n    -m benchmark_data/examples/P15291/P15291_updated_mmcif/2fyc_updated.cif A B \\\n    -m benchmark_data/examples/P15291/P15291_updated_mmcif/2fyd_updated.cif A B \\\n    -c benchmark_data/examples/P15291/P15291_ca_distances/ \\\n    -d benchmark_data/examples/P15291/P15291_distance_differences/ \\\n    -s benchmark_data/examples/P15291/P15291_cluster_results/ \\\n    -o benchmark_data/examples/P15291/P15291_distance_difference_maps/ \\\n    -g benchmark_data/examples/P15291/P15291_cluster_results/ png svg \\\n    -a benchmark_data/examples/P15291/P15291_path_alphafold/\n```\n\nor execute the `examples/run_P15291.sh` script.\n\n```shell\n$ ./examples/run_P15291.sh\n```\n\nWhen imported into Orc, the arguments required to execute the clustering process correctly will be parsed into the class instance and related methods as lists generated from the preceding functions called by the existing `protein-superpose` pipeline.\n\n#### Optional arguments\n\nThe start and end residue positions can be parsed into the script using the `-0` and `-1` flags, respectively. 
This will not restrict the residue ranges during clustering but will be used to label the axes of the distance difference maps and dendrograms.\n\n*Example*: O34926:\n\n```shell\npython3 find_conformers.py -u \"O34926\" \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc3_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc5_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc6_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc7_updated.cif A B \\\n    -c benchmark_data/examples/O34926/O34926_ca_distances/ \\\n    -d benchmark_data/examples/O34926/O34926_distance_differences/ \\\n    -s benchmark_data/examples/O34926/O34926_cluster_results/ \\\n    -g benchmark_data/examples/O34926/O34926_cluster_results/ png svg \\\n    -0 1 \\\n    -1 405\n```\n\nThe pipeline will avoid re-processing existing files where it files them. To update a single PDB entry, specify the PDB accession using the `-i` flag, e.g. `-i 3nc3`. To force all entries to be re-processed, use the `-f` flag, which will overwrite existing files indescriminately.\n\n*Example*: O34926:\n\n```shell\npython3 find_conformers.py -u \"O34926\" \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc3_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc5_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc6_updated.cif A B \\\n    -m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc7_updated.cif A B \\\n    -c benchmark_data/examples/O34926/O34926_ca_distances/ \\\n    -d benchmark_data/examples/O34926/O34926_distance_differences/ \\\n    -s benchmark_data/examples/O34926/O34926_cluster_results/ \\\n    -g benchmark_data/examples/O34926/O34926_cluster_results/ png svg \\\n    -f\n```\n\n---\n\n### Run on benchmark dataset\n\nThe scripts above are called by the `run_benchmark.py` wrapper. 
To generate conformational clustering results for the included benchmark dataset, run:\n\n```shell\npython3 cluster_benchmark.py\n```\n\nThis will call the `run_benchmark(...)` functions included in `ca_distance.py`, `distance_difference.py`, `cluster.py`, `plot_distance_difference.py`, `plot_dendrogram.py` and `plot_swarm_plot.py`. No arguments need to be passed to the script.\n\nResults will be saved in the `./benchmark_data/` folder.\n\n------\n\n## Contributing\n\nInstall developer dependencies:\n\n```shell\npip install -r dev-requirements.txt\n```\n\nRunning the package's unit tests with the pytest framework is recommended:\n\n```shell\npytest --cov=cluster_conformers --cov-report=html -v\n```\n\nThe following dependencies will be required:\n\n- `pytest-cov`\n- `pytest-forked`\n- `pytest-xdist` (optional)\n\nThey are installed along with the main package dependencies in `requirements.txt`.\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Clusters conformations of monomeric protein",
    "version": "1.2.1",
    "project_urls": {
        "Homepage": "https://github.com/PDBeurope/protein-cluster-conformers"
    },
    "split_keywords": [
        "cluster",
        " monomeric",
        " superpose",
        " linear algebra",
        " distance"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ff70e738a0338f7d725f8ef93932e98702cf6f995b6460a0c0a71ab09d1aac57",
                "md5": "985ee3c4448acff02d46cb8c160093e3",
                "sha256": "a94d185612551c078c0eceab88c2a56af8f0bce34f06da54cae0220373bd4bc4"
            },
            "downloads": -1,
            "filename": "protein_cluster_conformers-1.2.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "985ee3c4448acff02d46cb8c160093e3",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 55023,
            "upload_time": "2024-10-31T16:27:38",
            "upload_time_iso_8601": "2024-10-31T16:27:38.002244Z",
            "url": "https://files.pythonhosted.org/packages/ff/70/e738a0338f7d725f8ef93932e98702cf6f995b6460a0c0a71ab09d1aac57/protein_cluster_conformers-1.2.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "40246c6c535ec1e18e2d9dd016c75c3ce4222419b99d30596ad0156ed8cb65db",
                "md5": "93dc564c482145f817420e0d04990849",
                "sha256": "e014a89407fbdf03f691bcb9d84115fe870f3c67579faf66a779cf03e9ba0460"
            },
            "downloads": -1,
            "filename": "protein-cluster-conformers-1.2.1.tar.gz",
            "has_sig": false,
            "md5_digest": "93dc564c482145f817420e0d04990849",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 47351,
            "upload_time": "2024-10-31T16:27:39",
            "upload_time_iso_8601": "2024-10-31T16:27:39.023593Z",
            "url": "https://files.pythonhosted.org/packages/40/24/6c6c535ec1e18e2d9dd016c75c3ce4222419b99d30596ad0156ed8cb65db/protein-cluster-conformers-1.2.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-31 16:27:39",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "PDBeurope",
    "github_project": "protein-cluster-conformers",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "argparse",
            "specs": [
                [
                    "==",
                    "1.4.0"
                ]
            ]
        },
        {
            "name": "gemmi",
            "specs": [
                [
                    "==",
                    "0.5.6"
                ]
            ]
        },
        {
            "name": "matplotlib",
            "specs": [
                [
                    "==",
                    "3.7.1"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "==",
                    "1.21.6"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    "==",
                    "1.3.5"
                ]
            ]
        },
        {
            "name": "pillow",
            "specs": [
                [
                    "==",
                    "9.2.0"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    "==",
                    "2.28.1"
                ]
            ]
        },
        {
            "name": "seaborn",
            "specs": [
                [
                    "==",
                    "0.12.0"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    "==",
                    "1.7.3"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    "==",
                    "1.2.2"
                ]
            ]
        }
    ],
    "lcname": "protein-cluster-conformers"
}

Joseph I. J. Ellaway