SBILib


NameSBILib JSON
Version 0.3.8 PyPI version JSON
download
home_pagehttps://github.com/structuralbioinformatics/SBI
Summary
upload_time2024-02-08 10:27:22
maintainer
docs_urlNone
authorPatrick Gohl
requires_python
licenseMIT
keywords structural bioinformatics loops
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # Welcome to the StructuralBioInformatics pylib

The **SBILib** python library is a continuous work in progress that clusters functionalities to work with data derived from [PDB][1], from creating a local PDB repository to do _real_ Blast searches over PDB sequences (i.e. sequences of the really crystallized protein sections) to perform a simply split chains (separate the sections of a PDB file).


## Requirements:

The library **is not exempt of bugs** (probably), and can not be considered a final version. This means that one must use it at its own risk. And, while new functionalities are bound to appear, others might disappear at any given moment. Any comments, complains or bug reports can be addressed in the corresponding sections.

[1]: http://www.pdb.org/


# SBILib Library Manual 

This manual contains a short turorial split into 6 common Tasks.<br/>
Information on how to access more detailed help pages within python is provided at the end.<br/>
We have made available a list of slightly more complex scenarios on which the SBILib library may be applied. In Scenarios.md (https://github.com/structuralbioinformatics/SBILib)

### Dependencies
python3 <br/>
scipy <br/>
numpy <br/>
DSSP <br/>
BLAST <br/>
CD-Hit <br/>


## Installation:
SBILib has been distributed on PyPi and can be installed with pip. <br/>

```console
pip install SBILib
```

This command will install the package locally.<br/>


It can also be directly cloned from our github repo:<br/>

```console
git clone https://github.com/structuralbioinformatics/SBILib
```

Alternatively it is also possible to navigate directly to the repository https://github.com/structuralbioinformatics/SBILib and copy the folder manually.<br/>


## Testing:
We have implemented unit testing. After installation these scripts can be run to ensure that the package is functioning as it should.
These scripts are located in the directory "tests". Due to size limits we werent able to provide the database file needed for BLAST through PyPi. This file, will be automatically downloaded from https://www.ebi.ac.uk/uniprot/download-center once the unit tests are conducted. Later users may define their own Blast database file as outlined in the BLAST section of this tutorial. IMPORTANT before running ensure that external dependencies are properly configured as per TASK 2.2. 
```console
pip show SBILib | grep Location

# Location: /location/of/SBILib/locally

cd /location/of/SBILib/locally/SBILib/tests

python Test.py

#........('TEST_PDB1A3Q_A.66.pdb', 'TEST_PDB1A3Q_A.66.dssp')
#truncated chain!TEST_PDB1A3Q_A.66.dssp
#.('TEST_PDB2RAM_A.38.pdb', 'TEST_PDB2RAM_A.38.dssp')
#('TEST_PDB2RAM_B.26.pdb', 'TEST_PDB2RAM_B.26.dssp')
#('TEST_PDB1A3Q_A.30.pdb', 'TEST_PDB1A3Q_A.30.dssp')
#truncated chain!TEST_PDB1A3Q_A.30.dssp
#('TEST_PDB1A3Q_B.75.pdb', 'TEST_PDB1A3Q_B.75.dssp')
#truncated chain!TEST_PDB1A3Q_B.75.dssp
#..
#----------------------------------------------------------------------
#Ran 11 tests in 67.249s
#
#OK
```


## TASK 1

### 1) loading in the appropriate modules

The following code demonstrates how a PDB file can be loaded: <br/>

The first step is to load in the necessary modules. <br/>
PDB link will allow us to grab a protein from the protein data bank. While the PDB module will allow us to do some basic tasks on the protein. <br/>
```{python}

from SBILib.databases import PDBlink
from SBILib.structure import PDB

```
### 2) Fetching PDBs

In the next step we create an instance of the PDBlink class and use it to get a pdb from the databank with the coresponding accession code. <br/>
Here we have selected protein "1a3q". <br/>
Note that the 'path/to/pdb' will be saved in the variable 'path'. This can then be read in as a new PDB object. <br/>
If the pdb file is stored locally the address can be provided directly without the need to fetch it from the databank. <br/>

```{python}

new = PDBlink()
path = new.get_PDB("1a3q")
protein1 = PDB(path)

```

Additionally predicted PDBs can also be fetched from the AlphaFold2 model database.

```{python}

from SBILib.databases import AlphaFoldlink

alphaPath = AlphaFoldlink.download_pdb("P31946")

AlphModel = PDB(alphaPath)

```

### 3) Reading out chains in a protein

First I will load in a second protein this time providing the address directly, and then have a look at the chains contained within the protein. <br/>
We can look at exacty what chains are contained within the protein. <br/>
Here our protein contains 2 chains identified as A and B <br/>

```{python}

protein2 = PDB(path/to/pdbfile.pdb)
protein2.chain_identifiers
# set(['A', 'B'])

```

### 4) Selecting other elements of PDB 

Now that we have seen how to select Chains of a protein we can also select the amino acids.<br/>
We will renumber the atoms and residues so that they start at 1.<br/>
```{python}

ChainA = protein.get_chain_by_id("A")

ChainA.renumber_residues(1)
ChainA.renumber_atoms(1)

ChainA.aminoacids

```

This lists all of the amino acids in the Chain and its atoms.<br/>
The selection can be more specific by selecting one amino acid from the list.<br/>
We can access the number (the position of the amino acid on the chain)<br/>
The type of amino acid (three letter code and single letter).<br/>
The coordinates of the O, C, CA and CB (if present)<br/>
The coordinates of the backbone atoms.<br/>


```{python}

ChainA.aminoacids[0].number
#1
ChainA.aminoacids[0].type
#'GLY'
ChainA.aminoacids[0].single_letter
#G
ChainA.aminoacids[0].n 
#<AtomOfAminoAcid: [N, 1]:(-5.833, 72.589, 45.643)>
ChainA.aminoacids[0].o 
#<AtomOfAminoAcid: [O, 4]:(-5.436, 72.093, 42.983)>
ChainA.aminoacids[0].c
#<AtomOfAminoAcid: [C, 3]:(-5.850, 73.226, 43.203)>
ChainA.aminoacids[0].ca
#<AtomOfAminoAcid: [CA, 2]:(-5.902, 73.707, 44.647)>
ChainA.aminoacids[0].cb
#
ChainA.aminoacids[0].backbone_atoms
#[<AtomOfAminoAcid: [N, 1]:(-5.833, 72.589, 45.643)>, <AtomOfAminoAcid: [CA, 2]:(-5.902, 73.707, 44.647)>, <AtomOfAminoAcid: [C, 3]:(-5.850, 73.226, 43.203)>]



```
There are other features such as secondary structure and accessibility information which is only available once the dssp of the chain has been calculated (see Task 2).<br/>

Alternatively we can also accesss the atoms of the amino acid by their index.<br/>
We see that the first amino acid has 4 atoms<br/>
We see the name and number of the atom<br/>
We can then access the x,y,and z coordinate of the atom<br/>
We can also see the occupancy <br/>

```{python}

len( ChainA.aminoacids[0].atoms)
#4
ChainA.aminoacids[0].atoms[0].name
#N
ChainA.aminoacids[0].atoms[0].number
#1
ChainA.aminoacids[0].atoms[0].x
#-5.833
ChainA.aminoacids[0].atoms[0].y
#72.589
ChainA.aminoacids[0].atoms[0].z
#45.643


```





### 5) Duplicating protein and fusing chains

We can duplicate the protein as a new object with the duplicate command. <br/>
Then if we want to we could fuse chains "A" and "B" into a new chain "A" <br/>
Note that you will not be able to fuse two different kinds of structural chains. <br/>

```{python}

protein3 = protein2.duplicate()
protein3 = protein3.fuse_chains(["A","B"])
protein3.chain_identifiers
# set(["A"])

```


### 6) Adding a chain to a protein

Now let us imagine a scenario where we wanted to take chain A from protein2 and add it to protein1. <br/>
First we have to pull out the chain that we want (A) from protein2. <br/>
Then by checking the chain Identifiers in protein1 we can see that it already has a chain "A". <br/>
This means that we will have to change the identifier of the chain we wish to add. Lets make it "E". <br/>
First we will have to duplicate the chain, creating a new object (ChainE), so that we dont alter the chain Ids of protein2. <br/>
Now we can add the chain to protein1 <br/>
If we check the identifiers again we will see that a new chain "E" has been added to the protein. <br/>


```{python}

ChainA = protein2.get_chain_by_id("A")
protein1.chain_identifiers
# set(['A', 'C', 'B', 'D'])
ChainE = ChainA.duplicate()
ChainE.chain = "E"
protein1.add_chain(ChainE)
protein1.chain_identifiers
# set(['A', 'C', 'B', 'D', 'E'])

```

### 7) Removing chain(s) from a protein

Now we would like to delete this chain again: <br/>
First we set up a temporary empty PDB object and call it protein3. <br/>
Then we will define a list of the chain identifiers we wish to remove (in this case only "E") <br/>
First we loop through all of the chains ids in protein1. <br/>
If that protein is not in the deletion list we provided... <br/>
We add it to the newly created protein3. <br/>
We can then redifine protein1 to leave it without the unwanted chains. <br/>

```{python}
 
protein3 = PDB()
deletion = ["E"]
for chain in protein1.chain_identifiers:
     if chain not in deletion:
             protein3.add_chain(protein1.get_chain_by_id(chain))
             if (protein1.get_chain_by_id(chain).chaintype == "P"):
             	protein3._has_prot = "TRUE"
             elif (protein1.get_chain_by_id(chain).chaintype == "N"):
             	protein3._has_nucl = "TRUE"

protein1 = protein3

```

### 8) Adding multiple chains to a protein

Addition of multiple chains can also be accomplished. <br/>
First we have to grab the second chain from protein2 and change its id like we did to ChainA. <br/>
Then we change the identifiers of both chains as we have done previously <br/>
Then we create a list of the chains that we wish to add <br/>
We then add this list of chains to protein1 <br/>

```{python}

ChainB = protein2.get_chain_by_id("B")
ChainF = ChainB.duplicate()
ChainF.chain = "F"
chainlist = [ChainE,ChainF]
protein1.add_chains(chainlist)
protein1.chain_identifiers
# set(['A','B','C','D','E','F'])

```

You can revert back to the original protein by repeating step 6 and adding "F" to the deletion list. <br/>

### 9) Reading out protein sequence (gapped and exact)

We can now have a look at the protein sequence <br/>
First we can have a look at the exact sequence of the protein without gaps indicated (not containing gaps of non-crystalized positions).  <br/>
Then we look at the gapped version (default). Xs will be placed where the crystal sequence contains gaps. <br/>
We can also read out the sequence of each chain individually as a gapped sequence (as shown with ChainA) <br/>
We can also read out the non gapped sequence of each chain as shown <br/>


```{python}
ChainA = protein1.get_chain_by_id("A")
protein1.FASTA_format(gapped=False)
# "protein sequence in FASTA format"
protein1.FASTA_format()
# "gapped protein sequence in FASTA format"
ChainA.gapped_protein_sequence
# "gapped protein sequence for Chain A only"
ChainA.protein_sequence
# "The exact sequence in the crysal without non crystalized positions indicated"

```

### 10) Getting DNA sequence

In addition to protein sequences the PDB may contain DNA <br/>
We can check if this is the case for protein1 <br/>
In the case where this is true we can read out the sequences in Fasta format <br/>


```{python}
protein1.has_nucleotide
# True
for chain in protein1.nucleotides:
	print(">" + chain.globalID + "\n" + chain.nucleotide_sequence())
# >"globalID"
# "Nucleotide Sequence"

``` 

### 11) Cleaning protein and writing PDBfile

We can clean the protein with the following command <br/>
We can then write the PDB of the protein out to a file name that we provide <br/>

```{python}

protein1.clean()
protein1.write("protein1.pdb")

```

## TASK 2

### 1) Dependencies

The following application of this SBILib library requires DSSP to be run <br/>

### 2) Configuration

In the configSBI.txt file are stored the paths to the executable necessary. <br/> 
This file is located in SBILib/external and looks something like this. <br/>
Here only dssp is shown but the format holds for other dependencies (BLAST, HMMER etc.) <br/>
```text

[dssp]
executable     = mkdssp
path           = /path/to/the/executable/folder

```

If a config file is already available it can be used as well. <br/>
This file can be added to the scripts with the following commands <br/>

```{python}

import os 
os.environ['SBI_CONFIG_FILE'] = "path/to/new/configSBI.txt"

```


### 3) Calculating a chains secondary structure

First we load in a new protein and grab one of the chains as in the previous chapter <br/> 
We can then calculate the secondary structure of the chain <br/>
After this is done the secondary structure can be printed out. <br/>


```{python}

from SBILib.structure import PDB
protein = PDB("pdb1a3q.ent.gz")
ChainA = protein.get_chain_by_id("A")
ChainA.calculate_dssp()
ChainA.gapped_protein_secondary_structure
# "gapped secondary structure"

```

The secondary structure annotation follows the DSSP classification:<br/>

G = 3-turn helix (310 helix). Min length 3 residues.<br/>
H = 4-turn helix (α helix). Minimum length 4 residues.<br/>
I = 5-turn helix (π helix). Minimum length 5 residues.<br/>
T = hydrogen bonded turn (3, 4 or 5 turn)<br/>
E = extended strand in parallel and/or anti-parallel β-sheet conformation. Min length 2 residues.<br/>
B = residue in isolated β-bridge (single pair β-sheet hydrogen bond formation)<br/>
S = bend (the only non-hydrogen-bond based assignment).<br/>
C = coil (residues which are not in any of the above conformations).<br/>



### 4) printing out the entire proteins amino acid sequence and secondary structure

```{python}

for chain in protein.proteins:
	chain.calculate_dssp()
	print(">" + chain.globalID + "\n" + chain.gapped_protein_sequence + "\n" + chain.gapped_protein_secondary_structure)

# >"globalID"
# "gapped amino acid sequence"
# "gapped secondary structure"


```
This Gives us an overview of the protein. We see the Amino Acid sequence and the DSSP predicted Secondary structure all together.


## TASK 3

### 1) Set up a blast  


In order to blast you must have a database <br/>
This is a multifasta file containing protein or DNA sequences depending on the blast type. <br/>
Reformating of the multifasta file will be done automatically once it is provided. <br/>
The blast object will now know that the blast is to be conducted on this database. <br/>
If the unittest was run after installation a database file will be located in the tests folder (see Testing section)<br/>

```{python}

from SBILib.external.blast import BlastExe
blast = BlastExe(database = "path/to/database/file.fa")

```

### 2) execute blast
Once the blast is set up we need a query sequence to pass. <br/>
For this example we will blast chainA of protein1 against our database. <br/>
With these variables we can now execute the blast and save the to "blastresults". <br/>
The result is an instance of type "BlastResult". <br/>

```{python}

querySequence = ChainA.protein_sequence
queryID = ChainA.globalID
blastresults = blast.execute_query_seq(sequenceID = queryID, sequence = querySequence)
type(blastresults)
# <class 'SBILib.external.blast.BlastResult.BlastResult'>

```


### 3) View results
There are several ways that we can view the results of the blast. <br/>
Firstly we can print out a compacted version of all of the results. <br/>
We could also focus to select points of interest (eg. hit_id and evalue). <br/>

```{python}

blastresults.print_compacted_blast()
# query and target info, e-value, coverage etc. 

for hit in blastresults.get_hits():
	print(hit.sequenceID, " ", hit.e_value)
# list of all hits and their associated e-values.

```

The compacted blast output is divided into the following collumns:<br/>
Query ID, Query length, Target ID, Accession length, Identities, Positives, Gaps, e_value , Query sequence, Target sequence, Format Positions<br/>
The varying outputs here have the following meaning:<br/>

Query ID = ID of provided sequence<br/>
Query length = length of provided sequence<br/>
Target ID = id of the hit sequence<br/>
Accession length = length of the hit sequence<br/>
Identities = number of identical amino acids between the query and target<br/>
Positives = The number of amino acids that are either identical between the query and the subject sequence or have similar chemical properties<br/>
Gaps = number of gaps in the alignment<br/>
e_value = the expected number of hits expected to be seen by chance given the size of the blast database.<br/>
Query sequence = provided sequence<br/>
Target sequence = hit sequence<br/>
Format Positions = a single string in which fragment definitions will be represented as follows: <br/>
Query segment 1 start : Hit segment 1start, Query segment 1 end : Hit segment 1 end,Query segment 2 start : Hit segment 2 start, Query segment 2 end : Hit segment 2 end  etc.<br/>



### 4) Filter results
If we want to evaluate whether the proteins fall within the Rost Curve twilight zone, <br/>
we can parse through the list and print it off similarly to how it was done above. <br/>
If we want to only consider hits passing the rost evaluation we can pass that as an argument to the "get_hits" command <br/>



```{python}

for hit in blastresults.get_hits():
	print(hit.sequenceID, " ", hit.evaluate_Rost_twilight_zone())
# indication of wich proteins would pass the test

for hit in blastresults.get_hits(tz_type = "ID"):
	print(hit.sequenceID, " ", hit.evaluate_Rost_twilight_zone())
# only hits that pass the test will be shown


```


### 4) PIR 

PIR formatted alignments are available for every hit. <br/>
We can select hit number 13 with the result parameter <br/>
We could add in the filters to grab the PIR of alll of the best results <br/>

```{python}

blastresults.str_PIR(result = 13)
# PIR alignment
for hit in  blastresults.get_hits( evalue = 0, tz_type = "ID"):
	blastresults.str_PIR(result = hit._num_seq)
# All PIRs with hits that match the criteria given 

```

### 5) Find Results with matching secondary structure

The secondary structure of a protein plays a vital role in its function.<br/>
The SBILib package allows for the quick viewing of how secondary structure is represented in an alignment.<br/>
Are vital secondary structure elements located in a gap resulting in a homolog with unequal function?<br/>
Here we demonstrate how this can be analyzed.<br/>

```{python}

blastresults.str_PIR(result = 13)


#>P1;1A3Q_A
#sequence:1A3Q_A:2:.:285:.:.:.:.:.
#PYLVIVEQPKQRGFRFRYGCEGPSHGGLPGASSEKGRKTYPTVKICNYEGPAKIEVDLVT
#HSDPPRAHAHSLVGKQCSELGICAVSVGPKDMTAQFNNLGVLHVTKKNMMGTMIQKLQRQ
#RLRSRPQGLTEAEQRELEQEAKELKKVMDLSIVRLRFSAFL------RSLPLKPVISQPI
#HDSKSPGASNLKISRMDKTAGSVRGGDEVYLLCDKVQKDDIEVRFYEDDENGWQAFGDFS
#PTDVHKQYAIVFRTPPYHKMKIERPVTVFLQLKRKRGGDVSDSKQFTYYP*
#>P1;REL
#structureX:REL:16:AVIRE:289:AVIRE:.:.:.:.
#PYIEIFEQPRQRGTRFRYKCEGRSAGSIPGEHSTDNNKTFPSIQILNYFGKVKIRTTLVT
#KNEPYKPHPHDLVGKGCRD-GYYEAEFGPERQVLSFQNLGIQCVKKKDLKESISLRISK-
#--KINPFNVPEEQLHNIDE--------YDLNVVRLCFQAFLPDEHGNYTLALPPLISNPI
#YDNRAPNTAELRICRVNKNCGSVKGGDEIFLLCDKVQKDDIEVRFVLGN---WEAKGSFS
#QADVHRQVAIVFRTPPFLG-DITEPITVKMQLRRPSDQAVSEPVDFRYLP*


```
This will output a PIR format alignment of the two proteins. Dashes in the target protein (second sequence of the PIR) indicate gaps in the alignment. We are interested in seeing what secondary structure elements (if any) may have gone missing as a result of these gaps.  
```{python}

string = blastresults.str_PIR(result = 13)
indices = [i for i, c in enumerate("".join(string.split(">")[2].split("\n")[2:]))if c == "-"]
i = 0    
string= ""
for char in ChainA.gapped_protein_secondary_structure:
	if i in indices:
		string = string + char
	else:
		string = string + "-"
	i+=1


lines = textwrap.wrap(string,60)
print("\n".join(lines))

#------------------------------------------------------------
#-------------------T---------------------------------------
#HHH-----------------HHHHHHHT--------------------------------
#--------------------------------------------------SSS-------
#----------------------------------------------------

```

Here we can see what secondary structures go missing as a result of the alignment.<br/>
To remind ourselves of the query proteins secondary structure lets look at it again:

```{python}
lines = textwrap.wrap(ChainA.gapped_protein_secondary_structure, 60)
print("\n".join(lines))


#--EEEEEE-B-SSSB--EETTT-S-S----BS---TT-----EEEEET--
#SSEEEEEEEE-SSSS--B-SSEEEETTB-TTS-EEEEE-SS--EEE---EEEEE--
#TTTHHHHHHHHHHHHHTTTS-S---HHHHHHHHHHHHHHHTT--TTEEEEEEEEEE-
#xxxxxx-EE---EE---EEBTTSTTTS---EEEES-SEEETT---EEEEEESS--
#TTTEEEEEEE-SSS-EEEE-B--GGGEETTTEEEEE----S-TT-SS-EEEEEEEEETTT
#--B---EEEEEE-



```
We can see that for this Homolog according to our alignment half of a 4-turn helix is missing in our target protein.<br/> 
Should this element be of functional significance we would have to continue our search for a more suitable homolog.<br/> 


## Task 4

### 1) Complex
For this exercise we will use the protein DNA complex that we used in the previous tasks. <br/>
In order to extract the interfaces from a protein complex we will first have to create an object of the complex class. <br/> 
In createing a new complex class we need to provide the protein complex. <br/>
We can grab the individual chains of the complex straight from this class using the .pdb method if we wanted. <br/>

```{python}

from SBILib.structure import Complex
from SBILib.structure import PDB
protein = PDB("pdb1a3q.ent.gz")
complex = Complex(protein)
complex.pdb
# PDB object


```
Here we have used the default definitions for contacts within the complex which are as follows.

```{python}
Complex(pdb, biomolecule=False, PPI=True, PPI_type='cb', PPI_distance=12, PNI=True, PNI_type='min', PNI_distance=8, PHI=True, PHI_type='min', PHI_distance=6)
```
Protein Protein interaction:<br/>
	type: beta carbons<br/>
	distance: 12 Angstrom<br/>
	
Protein Nucleotide interaction:<br/>
	type: minimum<br/>
	distance: 8 Angstrom<br/>

Protein Heteroatom interaction:<br/>
	type: minimum<br/>
	distance: 6 Angstrom<br/>


However these can be altered:
```{python}

complex = Complex(protein, PPI_type='ca', PPI_distance=14)

```

The available contact types are:<br/>
Protein Protein interaction:<br/>
	minimum (the distance between the closest pairs of atoms of each residue)<br/>
	alpha carbons (distance between the alpha carbons of the two residues)<br/>
	beta carbons (distance between the beta carbons of the two residues)<br/>
	geometric (distance between the geometric centers of the two residues)<br/>
	backbone (distance between the center of the backbone atoms between the two residues)<br/>
Protein Nucleotide interaction:<br/>
	minimum<br/>
	geometric<br/>
	backbone<br/>
Protein Heteroatom interaction:<br/>
	minimum<br/>
	geometric<br/>



### 2) PPI
The interfaces are already calculated and present within the Complex object. <br/>
The results of the Protein Protein Interfaces are stored in a list and accessible with the PPInterfaces method <br/>
We can count the number of interfaces produced by looking at the length of the list (here just 1) <br/>
To access the interfaces we can simply index the list <br/>
The results can quickly be summarized by converting to a string. <br/>
Alternatively we can enter the interface for more detailed info. <br/>
The interface contains information on all of the contacts in the form of objects of the contact class. We can find the amino acid identifiers of all of the contact residues by iterating through the interface object (here we simply selected the first element of the list because there are no more interfaces). <br/>
We can then print each amino acids position within its chain and the single letter code identifier as well as the coordinates for the alpha carbon of that amino acid. <br/>
(aminoacid1 is the first chain provided and aminoacid2 is the second chain/interactor) <br/>
We can then print the geometric distance between the two amino acids. <br/> 


```{python}
print(complex.PPInterfaces[0].toString())
#Chain1, Chain2, threshold_type(default cb), threshold distance. 

len(complex.PPInterfaces)
# 1

complex.PPInterfaces[0]
# Interface object

for contact in complex.PPInterfaces[0].contacts:
	print(contact.aminoacid1.identifier, contact.aminoacid1.single_letter, contact.aminoacid1.ca.coordinates)
	print(contact.aminoacid2.identifier, contact.aminoacid2.single_letter, contact.aminoacid2.ca.coordinates)
	print(contact.geometric_distance)
# complete list of the contacts, coordinates and distances in this interface

```



### 2) PNI

We can calculate Protein Nucleotide interfaces as well. <br/>
The interfaces are accessible with the PNInterfaces method. <br/> 
When we take a look at the interface list we see that this time we have multiple results. <br/>
These can be accessed by indexing or iterated through. <br/>
We can then gather the position and identifiers for the amino acid and nucleotides. As well as the alpha carbon coordinate in the case of the amino acid and the phosphate coordinate in the case of the nucleotide. <br/>


```{python}
len(complex.PNInterfaces)
# 4 

for contact in complex.PNInterfaces[0].contacts:
	print(contact.nucleotide.identifier, contact.nucleotide.single_letter, contact.nucleotide.p.coordinates)
	print(contact.aminoacid.identifier, contact.aminoacid.single_letter, contact.aminoacid.ca.coordinates)
	print(contact.geometric_distance)
# complete list of the contacts, coordinates and distances in this interface

```

### 3) PHI

In order to find the Protein Heteroatoms interface you must first check whether your chains contains heteroatoms. <br/>
This can be checked with the has_heteroatoms method from the chain class. <br/>
I will load in a pdb file that I know has heteroatoms <br/>
The PHInterface will mix the residue types between the two chains. <br/>
For this reason we will find one that is of type aminoacid so that we can check the type within the contact object. <br/>
Now that we know which is which we can control for the classes in the iterations. <br/>
In the case of an amino acid residue we can print the same information from the previous examples. <br/>
In the case that the residue is a heteroatom we will print the identifier, all of the residues atoms (which contain the coordinats as well), we can also print the coordinates seperately with the all_coordinates method. <br/>



```{python}
for chain in protein.chains:
	print(chain.has_heteroatoms)
# False
# False
# False
# False


protein = PDB("1a2w.pdb")
for chain in protein.chains:
	print(chain.has_heteroatoms)
# True
# True

complex = Complex(protein)
len(complex.PHInterfaces)
# 1

complex.PHInterfaces[0].contacts
# list of contacts

type(list(complex.PHInterfaces[0].contacts)[0].aminoacid)
# type of the first residue

type(list(complex.PHInterfaces[0].contacts)[0].heteroatom)
# type of the second Residue which will be opposite of the first (heteroatom residue vs aminoacid)


for contact in complex.PHInterfaces[0].contacts:
	if (contact.aminoacid.mode == "ATOM"):
		print(contact.aminoacid.identifier, contact.aminoacid.single_letter, contact.aminoacid.ca.coordinates)
		print(contact.heteroatom.identifier, contact.heteroatom.atoms, contact.heteroatom.all_coordinates)
		print(contact.geometric_distance)
	else:
		print(contact.aminoacid.identifier, contact.aminoacid.atoms, contact.aminoacid.all_coordinates)
		print(contact.heteroatom.identifier, contact.heteroatom.single_letter, contact.heteroatom.ca.coordinates)
		print(contact.geometric_distance)
# complete list of the contacts, coordinates and distances in this interface


```


## Task 5

### Protein loop geometry

We will load in the pdb we have worked with before and have a look at the loop geometry in its chain "A".

```{python}

protein = PDB("pdb1a3q.ent.gz")
ChainA = protein.get_chain_by_id("A")

```

In order to visualize the loop geometry we need to first have an idea of the proteins secondary structure. <br/>
Like before this is done with DSSP and can be called within the SBILib library. <br/>
Once the secondary structures are resolved we can calculate the loop geometry. <br/>
In the SBILib library loops are refered to as archs. <br/>

```{python}

ChainA.calculate_dssp()
ChainA.calculate_archs()
ChainA.archs

#[**Secondary Structure Relation:
#STR1:	( E )  39  <--  6 -->  44 
#	f11:  43  cmf11:              [-3.84483333 62.7425     28.42483333] eigf11:              [ 0.34916202 -0.59272239 -0.72578651]
#STR2:	( E )  54  <--  2 -->  55 
#	f44:  55  cmf44:              [ 7.85366667 71.586       7.988     ] eigf44:              [ 0.04348626  0.87377397 -0.48438414]
#ARCH TYPE: BK
#INTERNAL SS: 0
#DISTANCE: 22.978424484459143
#THETA: 98.69428 RHO: 74.28570 DELTA: 58.26752
#**



```
The output of the archs command is a list of the loop geometries calculated for the given chain (ChainA)<br/>
We get the first secondary structure that flanks the loop. In parentheses the structure type (see Task2.3) it's start, length and end.<br/>
An axis for flanking secondary structures 1 defined based on the shortest of the principal moments of inertia for that structure. <br/>
The same information is printed out for the second flanking structure.<br/>
The arch Type is printed:<br/>
BK: beta-link<br/>
BN: beta-hairpin<br/>
EG: beta-helix310<br/>
EH: beta-alpha helix<br/>
GE: helix310-beta<br/>
GG: helix310-helix310<br/>
GH: helix310-alpha helix<br/>
HE: alpha helix-beta<br/>
HG: alpha helix-helix310<br/>
HH: alpha helix-alpha helix<br/>

INTERNAL SS = the number of secondary structures contained in the loop.<br/>
Distance = The euclidean distance between the beginning and end of the loop.<br/>
Theta: the angle between the axes of secondary structure 1 and 2.<br/>
Delta: the angle between the axis of secondary structure 1 and the vector of distance between beginning and end of loop.<br/>
Rho: the angle between the secondary structure 2 axis and the plane that contains the axis of secondary structure 1.<br/>


To get larger loops which may contain secondary structures:

```{python}

ChainA.superarchs

```

The output follows the same format as the previous.



## Task 6

### Loop Grafting

We will show how SBILib may be used to graft the loop of one protein into the structure of another protein with similar loop geometry. 

```{python}

new = PDBlink()
path = new.get_PDB("2ram")
protein2 = PDB(path)
report = protein.compare_loops(protein2)
report[0]

# 'QueryProt:A,176,182:TargetProt:A,160,166:VISQPIH,VLSHPIF'

```
These Commands took the protein previously defined (1a3q here QueryProt) and for each loop within that protein searched for loops in the TargetProt (2ram) with matching geometries. These pairs are saved in a list (here defined by the variable report). Each element (loop matches) of that list is composed of a string with the format seen above. QueryProt:A refers to the first loop (from the query protein) being located on chain A. 176,182 the start and end positions of that loop respectively. TargetProt:A the second loop is found on chain A of the target protein. 160,166 our start and end positions. VISQPIH,VLSHPIF are the sequences for the Query protein and Target protein loops respectively. 
We define loop geometries to be similar if all of the following conditions are met:<br/>

```{python}
Rho = math.sqrt((loop1.rho - loop2.rho)**2) <= 10 
Delta = math.sqrt((loop1.delta - loop2.delta)**2) <= 5
Theta = math.sqrt((loop1.theta - loop2.theta)**2) <= 5
Distance = math.sqrt((loop1.cartesian_distance - loop2.cartesian_distance)**2) <= 0.5 

```
That is to say if the difference between each type of angle and distance is less than the thresholds of (10 Rho angle, 5 Delta angle, 5 Theta angle, 0.5 Cartesian distance)<br/>



Now we tell the protein to graft the target protein loop into the query protein using the first match found in the previous output list. Note that if the loops are of unequal length this will not work. 
```{python}

graft = self.protein.graft(self.protein2, report[0])

```




## Details on classes
If you wish to check the functions available for any class, they are available with pytons dir method. <br/>
first create an instance of the class you wish to inspect. <br/>
then call the dir method on that instance. What follows is a list of all of the functions of the PDB class <br/>
to get more detailed information call the help methdod. <br/>

```{python}

protein = PDB("pdb1a3q.ent.gz")
dir(protein)

#['FASTA_IDX', 'FASTA_format', 'IDX_format', 'PDB_format', '_COMPND', '_NMR', '_NMR_chains', '__abstractmethods__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_abc_impl', '_apply_matrix', '_biomol_id', '_chain_id', '_chains', '_cif_file', '_get_chain_position_by_id', '_has_nucl', '_has_prot', '_header', '_pdb_file', '_read_PDB_file', '_read_mmCIF_file', '_write_PDB_file', 'add_chain', 'add_chains', 'all_models', 'apply_biomolecule_matrices', 'apply_symmetry_matrices', 'biomolecule_identifier', 'chain_exists', 'chain_identifiers', 'chains', 'clean', 'dehydrate', 'dump', 'duplicate', 'fuse_chains', 'get_chain_by_id', 'has_nucleotide', 'has_protein', 'header', 'id', 'is_NMR', 'is_all_ca', 'load', 'non_standard_chains', 'nucleotides', 'pdb_file', 'proteins', 'read', 'repeated_chain_ids', 'reposition', 'rotate', 'tmpclean', 'translate', 'write']

help(protein)

```

This is the result:

### PDB
class PDB(SBILib.beans.StorableObject.StorableObject) <br/>
	A {PDB} is a collection of {Chain} <br/>

Methods defined here: <br/>

 |  FASTA_IDX(self, protein=True, nucleotide=False) <br/>
 |  <br/>
 |  FASTA_format(self, gapped=True, protein=True, nucleotide=False) <br/>
 |  <br/>
 |  IDX_format(self, protein=True, nucleotide=False) <br/>
 |  <br/>
 |  PDB_format(self, clean=False, terminal=True) <br/>
 |      Strings a {PDB} in PDB format <br/>
 |      @rtype: String <br/>
 |  <br/>
 |  __init__(self, pdb_file=None, dehydrate=False, header=False, onlyheader=False, biomolecule=False) <br/>
 |      @type  pdb_file: String <br/>
 |      @param pdb_file: PDB formated file to read <br/>
 |      <br/>
 |      @raise IOError if pdb_file does not exist and it is not an empty object <br/>
 |  __len__(self) <br/>
 |      # OVERRIDE DEFAULT METHODS <br/>
 |  <br/>
 |  add_chain(self, chain, NMR=False) <br/>
 |      Adds a new chain to the PDB <br/>
 |  <br/>
 |  add_chains(self, chains, NMR=False) <br/>
 |      Adds a new chains to the PDB <br/>
 |  <br/>
 |  apply_biomolecule_matrices(self, keepchains=False, water=True) <br/>
 |      Only works if the PDB file is an original PDB file or <br/>
 |      the matrices have been added in the correct PDB format <br/>
 |      @rtype: {PDB} <br/>
 |  <br/>
 |  apply_symmetry_matrices(self) <br/>
 |      Only works if the PDB file is an original PDB file <br/>
 |      or the matrices have been added in the correct PDB format <br/>
 |      @rtype: {PDB} <br/>
 |  <br/>
 |  chain_exists(self, chain) <br/>
 |      Confirms if a given chain exists in the PDB <br/>
 |      @rtype: Boolean <br/>
 |  <br/>
 |  clean(self) <br/>
 |  <br/>
 |  dehydrate(self) <br/>
 |      # METHODS <br/>
 |  <br/>
 |  duplicate(self, hetero=True, water=False, NMR=False) <br/>
 |      Returns a {PDB} identical to the original but as a new object <br/>
 |      @rtype: {PDB} <br/>
 |  fuse_chains(self, chains_ids) <br/>
 |      Fuses several chains into the first one. <br/>
 |      It will not allow to fuse different structural chains. <br/>
 |      It does not alter the {PDB}, but provides a new one <br/>
 |      @rtype: {Chain} <br/>
 |      <br/>
 |      @raise AttributeError if: <br/>
 |          a) A given chain ID is not present <br/>
 |          b) Try to fuse different structural chains <br/>
 |  <br/>
 |  get_chain_by_id(self, id) <br/>
 |      Returns a chain according to its id or None if no chain with that id is found <br/>
 |      @rtype: {Chain} <br/>
 |  <br/>
 |  reposition(self, matrix=None, vector=None) <br/>
 |      Rotates and Translates each {Chain} according to a matrix and a translational vector <br/>
 |      <br/>
 |      @type matrix: numpy.matrix <br/>
 |      <br/>
 |      @type vector: numpy.array <br/>
 |  <br/>
 |  rotate(self, matrix=None) <br/>
 |      Rotates each {Chain} according to a given matrix <br/>
 |      <br/>
 |      @type matrix: numpy.matrix <br/>
 |  <br/>
 |  tmpclean(self, cluster_by_alternative_id=False) <br/>
 |      Makes a clean version of the PDB, rechaining in order and renumbering atoms. <br/>
 |      Renumbering residues is optional <br/>
 |  <br/>
 |  translate(self, vector=None) <br/>
 |      Translates each {Chain} according to a translational vector <br/>
 |      <br/>
 |      @type vector: numpy.array <br/>
 |  <br/>
 |  write(self, output_file=None, format='PDB', force=False, clean=False) <br/>
 |      Writes the object in a specific format <br/>
 |      <br/>
 |      @type  output_file: String <br/>
 |      @param output_file: File to write <br/>
 |      <br/>
 |      @type  format: String <br/>
 |      @param format: Format of the file to print <br/>
 |  <br/>
 |  ---------------------------------------------------------------------- <br/>
 |  Static methods defined here: <br/>
 |  <br/>
 |  read(input_file, format='PDB') <br/>
 |      Reads a file of data in a specific format and returns the object <br/>
 |      <br/>
 |      @type  input_file: String <br/>
 |      @param input_file: File to read <br/>
 |      <br/>
 |      @type  format: String <br/>
 |      @param format: Format of the file to read <br/>
 |  <br/>
 |  ---------------------------------------------------------------------- <br/>










            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/structuralbioinformatics/SBI",
    "name": "SBILib",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "Structural Bioinformatics,Loops",
    "author": "Patrick Gohl",
    "author_email": "patrick.gohl@upf.edu",
    "download_url": "",
    "platform": null,
    "description": "# Welcome to the StructuralBioInformatics pylib\n\nThe **SBILib** python library is a continuous work in progress that clusters functionalities to work with data derived from [PDB][1], from creating a local PDB repository to do _real_ Blast searches over PDB sequences (i.e. sequences of the really crystallized protein sections) to perform a simply split chains (separate the sections of a PDB file).\n\n\n## Requirements:\n\nThe library **is not exempt of bugs** (probably), and can not be considered a final version. This means that one must use it at its own risk. And, while new functionalities are bound to appear, others might disappear at any given moment. Any comments, complains or bug reports can be addressed in the corresponding sections.\n\n[1]: http://www.pdb.org/\n\n\n# SBILib Library Manual \n\nThis manual contains a short turorial split into 6 common Tasks.<br/>\nInformation on how to access more detailed help pages within python is provided at the end.<br/>\nWe have made available a list of slightly more complex scenarios on which the SBILib library may be applied. In Scenarios.md (https://github.com/structuralbioinformatics/SBILib)\n\n### Dependencies\npython3 <br/>\nscipy <br/>\nnumpy <br/>\nDSSP <br/>\nBLAST <br/>\nCD-Hit <br/>\n\n\n## Installation:\nSBILib has been distributed on PyPi and can be installed with pip. <br/>\n\n```console\npip install SBILib\n```\n\nThis command will install the package locally.<br/>\n\n\nIt can also be directly cloned from our github repo:<br/>\n\n```console\ngit clone https://github.com/structuralbioinformatics/SBILib\n```\n\nAlternatively it is also possible to navigate directly to the repository https://github.com/structuralbioinformatics/SBILib and copy the folder manually.<br/>\n\n\n## Testing:\nWe have implemented unit testing. After installation these scripts can be run to ensure that the package is functioning as it should.\nThese scripts are located in the directory \"tests\". Due to size limits we werent able to provide the database file needed for BLAST through PyPi. This file, will be automatically downloaded from https://www.ebi.ac.uk/uniprot/download-center once the unit tests are conducted. Later users may define their own Blast database file as outlined in the BLAST section of this tutorial. IMPORTANT before running ensure that external dependencies are properly configured as per TASK 2.2. \n```console\npip show SBILib | grep Location\n\n# Location: /location/of/SBILib/locally\n\ncd /location/of/SBILib/locally/SBILib/tests\n\npython Test.py\n\n#........('TEST_PDB1A3Q_A.66.pdb', 'TEST_PDB1A3Q_A.66.dssp')\n#truncated chain!TEST_PDB1A3Q_A.66.dssp\n#.('TEST_PDB2RAM_A.38.pdb', 'TEST_PDB2RAM_A.38.dssp')\n#('TEST_PDB2RAM_B.26.pdb', 'TEST_PDB2RAM_B.26.dssp')\n#('TEST_PDB1A3Q_A.30.pdb', 'TEST_PDB1A3Q_A.30.dssp')\n#truncated chain!TEST_PDB1A3Q_A.30.dssp\n#('TEST_PDB1A3Q_B.75.pdb', 'TEST_PDB1A3Q_B.75.dssp')\n#truncated chain!TEST_PDB1A3Q_B.75.dssp\n#..\n#----------------------------------------------------------------------\n#Ran 11 tests in 67.249s\n#\n#OK\n```\n\n\n## TASK 1\n\n### 1) loading in the appropriate modules\n\nThe following code demonstrates how a PDB file can be loaded: <br/>\n\nThe first step is to load in the necessary modules. <br/>\nPDB link will allow us to grab a protein from the protein data bank. While the PDB module will allow us to do some basic tasks on the protein. <br/>\n```{python}\n\nfrom SBILib.databases import PDBlink\nfrom SBILib.structure import PDB\n\n```\n### 2) Fetching PDBs\n\nIn the next step we create an instance of the PDBlink class and use it to get a pdb from the databank with the coresponding accession code. <br/>\nHere we have selected protein \"1a3q\". <br/>\nNote that the 'path/to/pdb' will be saved in the variable 'path'. This can then be read in as a new PDB object. <br/>\nIf the pdb file is stored locally the address can be provided directly without the need to fetch it from the databank. <br/>\n\n```{python}\n\nnew = PDBlink()\npath = new.get_PDB(\"1a3q\")\nprotein1 = PDB(path)\n\n```\n\nAdditionally predicted PDBs can also be fetched from the AlphaFold2 model database.\n\n```{python}\n\nfrom SBILib.databases import AlphaFoldlink\n\nalphaPath = AlphaFoldlink.download_pdb(\"P31946\")\n\nAlphModel = PDB(alphaPath)\n\n```\n\n### 3) Reading out chains in a protein\n\nFirst I will load in a second protein this time providing the address directly, and then have a look at the chains contained within the protein. <br/>\nWe can look at exacty what chains are contained within the protein. <br/>\nHere our protein contains 2 chains identified as A and B <br/>\n\n```{python}\n\nprotein2 = PDB(path/to/pdbfile.pdb)\nprotein2.chain_identifiers\n# set(['A', 'B'])\n\n```\n\n### 4) Selecting other elements of PDB \n\nNow that we have seen how to select Chains of a protein we can also select the amino acids.<br/>\nWe will renumber the atoms and residues so that they start at 1.<br/>\n```{python}\n\nChainA = protein.get_chain_by_id(\"A\")\n\nChainA.renumber_residues(1)\nChainA.renumber_atoms(1)\n\nChainA.aminoacids\n\n```\n\nThis lists all of the amino acids in the Chain and its atoms.<br/>\nThe selection can be more specific by selecting one amino acid from the list.<br/>\nWe can access the number (the position of the amino acid on the chain)<br/>\nThe type of amino acid (three letter code and single letter).<br/>\nThe coordinates of the O, C, CA and CB (if present)<br/>\nThe coordinates of the backbone atoms.<br/>\n\n\n```{python}\n\nChainA.aminoacids[0].number\n#1\nChainA.aminoacids[0].type\n#'GLY'\nChainA.aminoacids[0].single_letter\n#G\nChainA.aminoacids[0].n \n#<AtomOfAminoAcid: [N, 1]:(-5.833, 72.589, 45.643)>\nChainA.aminoacids[0].o \n#<AtomOfAminoAcid: [O, 4]:(-5.436, 72.093, 42.983)>\nChainA.aminoacids[0].c\n#<AtomOfAminoAcid: [C, 3]:(-5.850, 73.226, 43.203)>\nChainA.aminoacids[0].ca\n#<AtomOfAminoAcid: [CA, 2]:(-5.902, 73.707, 44.647)>\nChainA.aminoacids[0].cb\n#\nChainA.aminoacids[0].backbone_atoms\n#[<AtomOfAminoAcid: [N, 1]:(-5.833, 72.589, 45.643)>, <AtomOfAminoAcid: [CA, 2]:(-5.902, 73.707, 44.647)>, <AtomOfAminoAcid: [C, 3]:(-5.850, 73.226, 43.203)>]\n\n\n\n```\nThere are other features such as secondary structure and accessibility information which is only available once the dssp of the chain has been calculated (see Task 2).<br/>\n\nAlternatively we can also accesss the atoms of the amino acid by their index.<br/>\nWe see that the first amino acid has 4 atoms<br/>\nWe see the name and number of the atom<br/>\nWe can then access the x,y,and z coordinate of the atom<br/>\nWe can also see the occupancy <br/>\n\n```{python}\n\nlen( ChainA.aminoacids[0].atoms)\n#4\nChainA.aminoacids[0].atoms[0].name\n#N\nChainA.aminoacids[0].atoms[0].number\n#1\nChainA.aminoacids[0].atoms[0].x\n#-5.833\nChainA.aminoacids[0].atoms[0].y\n#72.589\nChainA.aminoacids[0].atoms[0].z\n#45.643\n\n\n```\n\n\n\n\n\n### 5) Duplicating protein and fusing chains\n\nWe can duplicate the protein as a new object with the duplicate command. <br/>\nThen if we want to we could fuse chains \"A\" and \"B\" into a new chain \"A\" <br/>\nNote that you will not be able to fuse two different kinds of structural chains. <br/>\n\n```{python}\n\nprotein3 = protein2.duplicate()\nprotein3 = protein3.fuse_chains([\"A\",\"B\"])\nprotein3.chain_identifiers\n# set([\"A\"])\n\n```\n\n\n### 6) Adding a chain to a protein\n\nNow let us imagine a scenario where we wanted to take chain A from protein2 and add it to protein1. <br/>\nFirst we have to pull out the chain that we want (A) from protein2. <br/>\nThen by checking the chain Identifiers in protein1 we can see that it already has a chain \"A\". <br/>\nThis means that we will have to change the identifier of the chain we wish to add. Lets make it \"E\". <br/>\nFirst we will have to duplicate the chain, creating a new object (ChainE), so that we dont alter the chain Ids of protein2. <br/>\nNow we can add the chain to protein1 <br/>\nIf we check the identifiers again we will see that a new chain \"E\" has been added to the protein. <br/>\n\n\n```{python}\n\nChainA = protein2.get_chain_by_id(\"A\")\nprotein1.chain_identifiers\n# set(['A', 'C', 'B', 'D'])\nChainE = ChainA.duplicate()\nChainE.chain = \"E\"\nprotein1.add_chain(ChainE)\nprotein1.chain_identifiers\n# set(['A', 'C', 'B', 'D', 'E'])\n\n```\n\n### 7) Removing chain(s) from a protein\n\nNow we would like to delete this chain again: <br/>\nFirst we set up a temporary empty PDB object and call it protein3. <br/>\nThen we will define a list of the chain identifiers we wish to remove (in this case only \"E\") <br/>\nFirst we loop through all of the chains ids in protein1. <br/>\nIf that protein is not in the deletion list we provided... <br/>\nWe add it to the newly created protein3. <br/>\nWe can then redifine protein1 to leave it without the unwanted chains. <br/>\n\n```{python}\n \nprotein3 = PDB()\ndeletion = [\"E\"]\nfor chain in protein1.chain_identifiers:\n     if chain not in deletion:\n             protein3.add_chain(protein1.get_chain_by_id(chain))\n             if (protein1.get_chain_by_id(chain).chaintype == \"P\"):\n             \tprotein3._has_prot = \"TRUE\"\n             elif (protein1.get_chain_by_id(chain).chaintype == \"N\"):\n             \tprotein3._has_nucl = \"TRUE\"\n\nprotein1 = protein3\n\n```\n\n### 8) Adding multiple chains to a protein\n\nAddition of multiple chains can also be accomplished. <br/>\nFirst we have to grab the second chain from protein2 and change its id like we did to ChainA. <br/>\nThen we change the identifiers of both chains as we have done previously <br/>\nThen we create a list of the chains that we wish to add <br/>\nWe then add this list of chains to protein1 <br/>\n\n```{python}\n\nChainB = protein2.get_chain_by_id(\"B\")\nChainF = ChainB.duplicate()\nChainF.chain = \"F\"\nchainlist = [ChainE,ChainF]\nprotein1.add_chains(chainlist)\nprotein1.chain_identifiers\n# set(['A','B','C','D','E','F'])\n\n```\n\nYou can revert back to the original protein by repeating step 6 and adding \"F\" to the deletion list. <br/>\n\n### 9) Reading out protein sequence (gapped and exact)\n\nWe can now have a look at the protein sequence <br/>\nFirst we can have a look at the exact sequence of the protein without gaps indicated (not containing gaps of non-crystalized positions).  <br/>\nThen we look at the gapped version (default). Xs will be placed where the crystal sequence contains gaps. <br/>\nWe can also read out the sequence of each chain individually as a gapped sequence (as shown with ChainA) <br/>\nWe can also read out the non gapped sequence of each chain as shown <br/>\n\n\n```{python}\nChainA = protein1.get_chain_by_id(\"A\")\nprotein1.FASTA_format(gapped=False)\n# \"protein sequence in FASTA format\"\nprotein1.FASTA_format()\n# \"gapped protein sequence in FASTA format\"\nChainA.gapped_protein_sequence\n# \"gapped protein sequence for Chain A only\"\nChainA.protein_sequence\n# \"The exact sequence in the crysal without non crystalized positions indicated\"\n\n```\n\n### 10) Getting DNA sequence\n\nIn addition to protein sequences the PDB may contain DNA <br/>\nWe can check if this is the case for protein1 <br/>\nIn the case where this is true we can read out the sequences in Fasta format <br/>\n\n\n```{python}\nprotein1.has_nucleotide\n# True\nfor chain in protein1.nucleotides:\n\tprint(\">\" + chain.globalID + \"\\n\" + chain.nucleotide_sequence())\n# >\"globalID\"\n# \"Nucleotide Sequence\"\n\n``` \n\n### 11) Cleaning protein and writing PDBfile\n\nWe can clean the protein with the following command <br/>\nWe can then write the PDB of the protein out to a file name that we provide <br/>\n\n```{python}\n\nprotein1.clean()\nprotein1.write(\"protein1.pdb\")\n\n```\n\n## TASK 2\n\n### 1) Dependencies\n\nThe following application of this SBILib library requires DSSP to be run <br/>\n\n### 2) Configuration\n\nIn the configSBI.txt file are stored the paths to the executable necessary. <br/> \nThis file is located in SBILib/external and looks something like this. <br/>\nHere only dssp is shown but the format holds for other dependencies (BLAST, HMMER etc.) <br/>\n```text\n\n[dssp]\nexecutable     = mkdssp\npath           = /path/to/the/executable/folder\n\n```\n\nIf a config file is already available it can be used as well. <br/>\nThis file can be added to the scripts with the following commands <br/>\n\n```{python}\n\nimport os \nos.environ['SBI_CONFIG_FILE'] = \"path/to/new/configSBI.txt\"\n\n```\n\n\n### 3) Calculating a chains secondary structure\n\nFirst we load in a new protein and grab one of the chains as in the previous chapter <br/> \nWe can then calculate the secondary structure of the chain <br/>\nAfter this is done the secondary structure can be printed out. <br/>\n\n\n```{python}\n\nfrom SBILib.structure import PDB\nprotein = PDB(\"pdb1a3q.ent.gz\")\nChainA = protein.get_chain_by_id(\"A\")\nChainA.calculate_dssp()\nChainA.gapped_protein_secondary_structure\n# \"gapped secondary structure\"\n\n```\n\nThe secondary structure annotation follows the DSSP classification:<br/>\n\nG = 3-turn helix (310 helix). Min length 3 residues.<br/>\nH = 4-turn helix (\u03b1 helix). Minimum length 4 residues.<br/>\nI = 5-turn helix (\u03c0 helix). Minimum length 5 residues.<br/>\nT = hydrogen bonded turn (3, 4 or 5 turn)<br/>\nE = extended strand in parallel and/or anti-parallel \u03b2-sheet conformation. Min length 2 residues.<br/>\nB = residue in isolated \u03b2-bridge (single pair \u03b2-sheet hydrogen bond formation)<br/>\nS = bend (the only non-hydrogen-bond based assignment).<br/>\nC = coil (residues which are not in any of the above conformations).<br/>\n\n\n\n### 4) printing out the entire proteins amino acid sequence and secondary structure\n\n```{python}\n\nfor chain in protein.proteins:\n\tchain.calculate_dssp()\n\tprint(\">\" + chain.globalID + \"\\n\" + chain.gapped_protein_sequence + \"\\n\" + chain.gapped_protein_secondary_structure)\n\n# >\"globalID\"\n# \"gapped amino acid sequence\"\n# \"gapped secondary structure\"\n\n\n```\nThis Gives us an overview of the protein. We see the Amino Acid sequence and the DSSP predicted Secondary structure all together.\n\n\n## TASK 3\n\n### 1) Set up a blast  \n\n\nIn order to blast you must have a database <br/>\nThis is a multifasta file containing protein or DNA sequences depending on the blast type. <br/>\nReformating of the multifasta file will be done automatically once it is provided. <br/>\nThe blast object will now know that the blast is to be conducted on this database. <br/>\nIf the unittest was run after installation a database file will be located in the tests folder (see Testing section)<br/>\n\n```{python}\n\nfrom SBILib.external.blast import BlastExe\nblast = BlastExe(database = \"path/to/database/file.fa\")\n\n```\n\n### 2) execute blast\nOnce the blast is set up we need a query sequence to pass. <br/>\nFor this example we will blast chainA of protein1 against our database. <br/>\nWith these variables we can now execute the blast and save the to \"blastresults\". <br/>\nThe result is an instance of type \"BlastResult\". <br/>\n\n```{python}\n\nquerySequence = ChainA.protein_sequence\nqueryID = ChainA.globalID\nblastresults = blast.execute_query_seq(sequenceID = queryID, sequence = querySequence)\ntype(blastresults)\n# <class 'SBILib.external.blast.BlastResult.BlastResult'>\n\n```\n\n\n### 3) View results\nThere are several ways that we can view the results of the blast. <br/>\nFirstly we can print out a compacted version of all of the results. <br/>\nWe could also focus to select points of interest (eg. hit_id and evalue). <br/>\n\n```{python}\n\nblastresults.print_compacted_blast()\n# query and target info, e-value, coverage etc. \n\nfor hit in blastresults.get_hits():\n\tprint(hit.sequenceID, \" \", hit.e_value)\n# list of all hits and their associated e-values.\n\n```\n\nThe compacted blast output is divided into the following collumns:<br/>\nQuery ID, Query length, Target ID, Accession length, Identities, Positives, Gaps, e_value , Query sequence, Target sequence, Format Positions<br/>\nThe varying outputs here have the following meaning:<br/>\n\nQuery ID = ID of provided sequence<br/>\nQuery length = length of provided sequence<br/>\nTarget ID = id of the hit sequence<br/>\nAccession length = length of the hit sequence<br/>\nIdentities = number of identical amino acids between the query and target<br/>\nPositives = The number of amino acids that are either identical between the query and the subject sequence or have similar chemical properties<br/>\nGaps = number of gaps in the alignment<br/>\ne_value = the expected number of hits expected to be seen by chance given the size of the blast database.<br/>\nQuery sequence = provided sequence<br/>\nTarget sequence = hit sequence<br/>\nFormat Positions = a single string in which fragment definitions will be represented as follows: <br/>\nQuery segment 1 start : Hit segment 1start, Query segment 1 end : Hit segment 1 end,Query segment 2 start : Hit segment 2 start, Query segment 2 end : Hit segment 2 end  etc.<br/>\n\n\n\n### 4) Filter results\nIf we want to evaluate whether the proteins fall within the Rost Curve twilight zone, <br/>\nwe can parse through the list and print it off similarly to how it was done above. <br/>\nIf we want to only consider hits passing the rost evaluation we can pass that as an argument to the \"get_hits\" command <br/>\n\n\n\n```{python}\n\nfor hit in blastresults.get_hits():\n\tprint(hit.sequenceID, \" \", hit.evaluate_Rost_twilight_zone())\n# indication of wich proteins would pass the test\n\nfor hit in blastresults.get_hits(tz_type = \"ID\"):\n\tprint(hit.sequenceID, \" \", hit.evaluate_Rost_twilight_zone())\n# only hits that pass the test will be shown\n\n\n```\n\n\n### 4) PIR \n\nPIR formatted alignments are available for every hit. <br/>\nWe can select hit number 13 with the result parameter <br/>\nWe could add in the filters to grab the PIR of alll of the best results <br/>\n\n```{python}\n\nblastresults.str_PIR(result = 13)\n# PIR alignment\nfor hit in  blastresults.get_hits( evalue = 0, tz_type = \"ID\"):\n\tblastresults.str_PIR(result = hit._num_seq)\n# All PIRs with hits that match the criteria given \n\n```\n\n### 5) Find Results with matching secondary structure\n\nThe secondary structure of a protein plays a vital role in its function.<br/>\nThe SBILib package allows for the quick viewing of how secondary structure is represented in an alignment.<br/>\nAre vital secondary structure elements located in a gap resulting in a homolog with unequal function?<br/>\nHere we demonstrate how this can be analyzed.<br/>\n\n```{python}\n\nblastresults.str_PIR(result = 13)\n\n\n#>P1;1A3Q_A\n#sequence:1A3Q_A:2:.:285:.:.:.:.:.\n#PYLVIVEQPKQRGFRFRYGCEGPSHGGLPGASSEKGRKTYPTVKICNYEGPAKIEVDLVT\n#HSDPPRAHAHSLVGKQCSELGICAVSVGPKDMTAQFNNLGVLHVTKKNMMGTMIQKLQRQ\n#RLRSRPQGLTEAEQRELEQEAKELKKVMDLSIVRLRFSAFL------RSLPLKPVISQPI\n#HDSKSPGASNLKISRMDKTAGSVRGGDEVYLLCDKVQKDDIEVRFYEDDENGWQAFGDFS\n#PTDVHKQYAIVFRTPPYHKMKIERPVTVFLQLKRKRGGDVSDSKQFTYYP*\n#>P1;REL\n#structureX:REL:16:AVIRE:289:AVIRE:.:.:.:.\n#PYIEIFEQPRQRGTRFRYKCEGRSAGSIPGEHSTDNNKTFPSIQILNYFGKVKIRTTLVT\n#KNEPYKPHPHDLVGKGCRD-GYYEAEFGPERQVLSFQNLGIQCVKKKDLKESISLRISK-\n#--KINPFNVPEEQLHNIDE--------YDLNVVRLCFQAFLPDEHGNYTLALPPLISNPI\n#YDNRAPNTAELRICRVNKNCGSVKGGDEIFLLCDKVQKDDIEVRFVLGN---WEAKGSFS\n#QADVHRQVAIVFRTPPFLG-DITEPITVKMQLRRPSDQAVSEPVDFRYLP*\n\n\n```\nThis will output a PIR format alignment of the two proteins. Dashes in the target protein (second sequence of the PIR) indicate gaps in the alignment. We are interested in seeing what secondary structure elements (if any) may have gone missing as a result of these gaps.  \n```{python}\n\nstring = blastresults.str_PIR(result = 13)\nindices = [i for i, c in enumerate(\"\".join(string.split(\">\")[2].split(\"\\n\")[2:]))if c == \"-\"]\ni = 0    \nstring= \"\"\nfor char in ChainA.gapped_protein_secondary_structure:\n\tif i in indices:\n\t\tstring = string + char\n\telse:\n\t\tstring = string + \"-\"\n\ti+=1\n\n\nlines = textwrap.wrap(string,60)\nprint(\"\\n\".join(lines))\n\n#------------------------------------------------------------\n#-------------------T---------------------------------------\n#HHH-----------------HHHHHHHT--------------------------------\n#--------------------------------------------------SSS-------\n#----------------------------------------------------\n\n```\n\nHere we can see what secondary structures go missing as a result of the alignment.<br/>\nTo remind ourselves of the query proteins secondary structure lets look at it again:\n\n```{python}\nlines = textwrap.wrap(ChainA.gapped_protein_secondary_structure, 60)\nprint(\"\\n\".join(lines))\n\n\n#--EEEEEE-B-SSSB--EETTT-S-S----BS---TT-----EEEEET--\n#SSEEEEEEEE-SSSS--B-SSEEEETTB-TTS-EEEEE-SS--EEE---EEEEE--\n#TTTHHHHHHHHHHHHHTTTS-S---HHHHHHHHHHHHHHHTT--TTEEEEEEEEEE-\n#xxxxxx-EE---EE---EEBTTSTTTS---EEEES-SEEETT---EEEEEESS--\n#TTTEEEEEEE-SSS-EEEE-B--GGGEETTTEEEEE----S-TT-SS-EEEEEEEEETTT\n#--B---EEEEEE-\n\n\n\n```\nWe can see that for this Homolog according to our alignment half of a 4-turn helix is missing in our target protein.<br/> \nShould this element be of functional significance we would have to continue our search for a more suitable homolog.<br/> \n\n\n## Task 4\n\n### 1) Complex\nFor this exercise we will use the protein DNA complex that we used in the previous tasks. <br/>\nIn order to extract the interfaces from a protein complex we will first have to create an object of the complex class. <br/> \nIn createing a new complex class we need to provide the protein complex. <br/>\nWe can grab the individual chains of the complex straight from this class using the .pdb method if we wanted. <br/>\n\n```{python}\n\nfrom SBILib.structure import Complex\nfrom SBILib.structure import PDB\nprotein = PDB(\"pdb1a3q.ent.gz\")\ncomplex = Complex(protein)\ncomplex.pdb\n# PDB object\n\n\n```\nHere we have used the default definitions for contacts within the complex which are as follows.\n\n```{python}\nComplex(pdb, biomolecule=False, PPI=True, PPI_type='cb', PPI_distance=12, PNI=True, PNI_type='min', PNI_distance=8, PHI=True, PHI_type='min', PHI_distance=6)\n```\nProtein Protein interaction:<br/>\n\ttype: beta carbons<br/>\n\tdistance: 12 Angstrom<br/>\n\t\nProtein Nucleotide interaction:<br/>\n\ttype: minimum<br/>\n\tdistance: 8 Angstrom<br/>\n\nProtein Heteroatom interaction:<br/>\n\ttype: minimum<br/>\n\tdistance: 6 Angstrom<br/>\n\n\nHowever these can be altered:\n```{python}\n\ncomplex = Complex(protein, PPI_type='ca', PPI_distance=14)\n\n```\n\nThe available contact types are:<br/>\nProtein Protein interaction:<br/>\n\tminimum (the distance between the closest pairs of atoms of each residue)<br/>\n\talpha carbons (distance between the alpha carbons of the two residues)<br/>\n\tbeta carbons (distance between the beta carbons of the two residues)<br/>\n\tgeometric (distance between the geometric centers of the two residues)<br/>\n\tbackbone (distance between the center of the backbone atoms between the two residues)<br/>\nProtein Nucleotide interaction:<br/>\n\tminimum<br/>\n\tgeometric<br/>\n\tbackbone<br/>\nProtein Heteroatom interaction:<br/>\n\tminimum<br/>\n\tgeometric<br/>\n\n\n\n### 2) PPI\nThe interfaces are already calculated and present within the Complex object. <br/>\nThe results of the Protein Protein Interfaces are stored in a list and accessible with the PPInterfaces method <br/>\nWe can count the number of interfaces produced by looking at the length of the list (here just 1) <br/>\nTo access the interfaces we can simply index the list <br/>\nThe results can quickly be summarized by converting to a string. <br/>\nAlternatively we can enter the interface for more detailed info. <br/>\nThe interface contains information on all of the contacts in the form of objects of the contact class. We can find the amino acid identifiers of all of the contact residues by iterating through the interface object (here we simply selected the first element of the list because there are no more interfaces). <br/>\nWe can then print each amino acids position within its chain and the single letter code identifier as well as the coordinates for the alpha carbon of that amino acid. <br/>\n(aminoacid1 is the first chain provided and aminoacid2 is the second chain/interactor) <br/>\nWe can then print the geometric distance between the two amino acids. <br/> \n\n\n```{python}\nprint(complex.PPInterfaces[0].toString())\n#Chain1, Chain2, threshold_type(default cb), threshold distance. \n\nlen(complex.PPInterfaces)\n# 1\n\ncomplex.PPInterfaces[0]\n# Interface object\n\nfor contact in complex.PPInterfaces[0].contacts:\n\tprint(contact.aminoacid1.identifier, contact.aminoacid1.single_letter, contact.aminoacid1.ca.coordinates)\n\tprint(contact.aminoacid2.identifier, contact.aminoacid2.single_letter, contact.aminoacid2.ca.coordinates)\n\tprint(contact.geometric_distance)\n# complete list of the contacts, coordinates and distances in this interface\n\n```\n\n\n\n### 2) PNI\n\nWe can calculate Protein Nucleotide interfaces as well. <br/>\nThe interfaces are accessible with the PNInterfaces method. <br/> \nWhen we take a look at the interface list we see that this time we have multiple results. <br/>\nThese can be accessed by indexing or iterated through. <br/>\nWe can then gather the position and identifiers for the amino acid and nucleotides. As well as the alpha carbon coordinate in the case of the amino acid and the phosphate coordinate in the case of the nucleotide. <br/>\n\n\n```{python}\nlen(complex.PNInterfaces)\n# 4 \n\nfor contact in complex.PNInterfaces[0].contacts:\n\tprint(contact.nucleotide.identifier, contact.nucleotide.single_letter, contact.nucleotide.p.coordinates)\n\tprint(contact.aminoacid.identifier, contact.aminoacid.single_letter, contact.aminoacid.ca.coordinates)\n\tprint(contact.geometric_distance)\n# complete list of the contacts, coordinates and distances in this interface\n\n```\n\n### 3) PHI\n\nIn order to find the Protein Heteroatoms interface you must first check whether your chains contains heteroatoms. <br/>\nThis can be checked with the has_heteroatoms method from the chain class. <br/>\nI will load in a pdb file that I know has heteroatoms <br/>\nThe PHInterface will mix the residue types between the two chains. <br/>\nFor this reason we will find one that is of type aminoacid so that we can check the type within the contact object. <br/>\nNow that we know which is which we can control for the classes in the iterations. <br/>\nIn the case of an amino acid residue we can print the same information from the previous examples. <br/>\nIn the case that the residue is a heteroatom we will print the identifier, all of the residues atoms (which contain the coordinats as well), we can also print the coordinates seperately with the all_coordinates method. <br/>\n\n\n\n```{python}\nfor chain in protein.chains:\n\tprint(chain.has_heteroatoms)\n# False\n# False\n# False\n# False\n\n\nprotein = PDB(\"1a2w.pdb\")\nfor chain in protein.chains:\n\tprint(chain.has_heteroatoms)\n# True\n# True\n\ncomplex = Complex(protein)\nlen(complex.PHInterfaces)\n# 1\n\ncomplex.PHInterfaces[0].contacts\n# list of contacts\n\ntype(list(complex.PHInterfaces[0].contacts)[0].aminoacid)\n# type of the first residue\n\ntype(list(complex.PHInterfaces[0].contacts)[0].heteroatom)\n# type of the second Residue which will be opposite of the first (heteroatom residue vs aminoacid)\n\n\nfor contact in complex.PHInterfaces[0].contacts:\n\tif (contact.aminoacid.mode == \"ATOM\"):\n\t\tprint(contact.aminoacid.identifier, contact.aminoacid.single_letter, contact.aminoacid.ca.coordinates)\n\t\tprint(contact.heteroatom.identifier, contact.heteroatom.atoms, contact.heteroatom.all_coordinates)\n\t\tprint(contact.geometric_distance)\n\telse:\n\t\tprint(contact.aminoacid.identifier, contact.aminoacid.atoms, contact.aminoacid.all_coordinates)\n\t\tprint(contact.heteroatom.identifier, contact.heteroatom.single_letter, contact.heteroatom.ca.coordinates)\n\t\tprint(contact.geometric_distance)\n# complete list of the contacts, coordinates and distances in this interface\n\n\n```\n\n\n## Task 5\n\n### Protein loop geometry\n\nWe will load in the pdb we have worked with before and have a look at the loop geometry in its chain \"A\".\n\n```{python}\n\nprotein = PDB(\"pdb1a3q.ent.gz\")\nChainA = protein.get_chain_by_id(\"A\")\n\n```\n\nIn order to visualize the loop geometry we need to first have an idea of the proteins secondary structure. <br/>\nLike before this is done with DSSP and can be called within the SBILib library. <br/>\nOnce the secondary structures are resolved we can calculate the loop geometry. <br/>\nIn the SBILib library loops are refered to as archs. <br/>\n\n```{python}\n\nChainA.calculate_dssp()\nChainA.calculate_archs()\nChainA.archs\n\n#[**Secondary Structure Relation:\n#STR1:\t( E )  39  <--  6 -->  44 \n#\tf11:  43  cmf11:              [-3.84483333 62.7425     28.42483333] eigf11:              [ 0.34916202 -0.59272239 -0.72578651]\n#STR2:\t( E )  54  <--  2 -->  55 \n#\tf44:  55  cmf44:              [ 7.85366667 71.586       7.988     ] eigf44:              [ 0.04348626  0.87377397 -0.48438414]\n#ARCH TYPE: BK\n#INTERNAL SS: 0\n#DISTANCE: 22.978424484459143\n#THETA: 98.69428 RHO: 74.28570 DELTA: 58.26752\n#**\n\n\n\n```\nThe output of the archs command is a list of the loop geometries calculated for the given chain (ChainA)<br/>\nWe get the first secondary structure that flanks the loop. In parentheses the structure type (see Task2.3) it's start, length and end.<br/>\nAn axis for flanking secondary structures 1 defined based on the shortest of the principal moments of inertia for that structure. <br/>\nThe same information is printed out for the second flanking structure.<br/>\nThe arch Type is printed:<br/>\nBK: beta-link<br/>\nBN: beta-hairpin<br/>\nEG: beta-helix310<br/>\nEH: beta-alpha helix<br/>\nGE: helix310-beta<br/>\nGG: helix310-helix310<br/>\nGH: helix310-alpha helix<br/>\nHE: alpha helix-beta<br/>\nHG: alpha helix-helix310<br/>\nHH: alpha helix-alpha helix<br/>\n\nINTERNAL SS = the number of secondary structures contained in the loop.<br/>\nDistance = The euclidean distance between the beginning and end of the loop.<br/>\nTheta: the angle between the axes of secondary structure 1 and 2.<br/>\nDelta: the angle between the axis of secondary structure 1 and the vector of distance between beginning and end of loop.<br/>\nRho: the angle between the secondary structure 2 axis and the plane that contains the axis of secondary structure 1.<br/>\n\n\nTo get larger loops which may contain secondary structures:\n\n```{python}\n\nChainA.superarchs\n\n```\n\nThe output follows the same format as the previous.\n\n\n\n## Task 6\n\n### Loop Grafting\n\nWe will show how SBILib may be used to graft the loop of one protein into the structure of another protein with similar loop geometry. \n\n```{python}\n\nnew = PDBlink()\npath = new.get_PDB(\"2ram\")\nprotein2 = PDB(path)\nreport = protein.compare_loops(protein2)\nreport[0]\n\n# 'QueryProt:A,176,182:TargetProt:A,160,166:VISQPIH,VLSHPIF'\n\n```\nThese Commands took the protein previously defined (1a3q here QueryProt) and for each loop within that protein searched for loops in the TargetProt (2ram) with matching geometries. These pairs are saved in a list (here defined by the variable report). Each element (loop matches) of that list is composed of a string with the format seen above. QueryProt:A refers to the first loop (from the query protein) being located on chain A. 176,182 the start and end positions of that loop respectively. TargetProt:A the second loop is found on chain A of the target protein. 160,166 our start and end positions. VISQPIH,VLSHPIF are the sequences for the Query protein and Target protein loops respectively. \nWe define loop geometries to be similar if all of the following conditions are met:<br/>\n\n```{python}\nRho = math.sqrt((loop1.rho - loop2.rho)**2) <= 10 \nDelta = math.sqrt((loop1.delta - loop2.delta)**2) <= 5\nTheta = math.sqrt((loop1.theta - loop2.theta)**2) <= 5\nDistance = math.sqrt((loop1.cartesian_distance - loop2.cartesian_distance)**2) <= 0.5 \n\n```\nThat is to say if the difference between each type of angle and distance is less than the thresholds of (10 Rho angle, 5 Delta angle, 5 Theta angle, 0.5 Cartesian distance)<br/>\n\n\n\nNow we tell the protein to graft the target protein loop into the query protein using the first match found in the previous output list. Note that if the loops are of unequal length this will not work. \n```{python}\n\ngraft = self.protein.graft(self.protein2, report[0])\n\n```\n\n\n\n\n## Details on classes\nIf you wish to check the functions available for any class, they are available with pytons dir method. <br/>\nfirst create an instance of the class you wish to inspect. <br/>\nthen call the dir method on that instance. What follows is a list of all of the functions of the PDB class <br/>\nto get more detailed information call the help methdod. <br/>\n\n```{python}\n\nprotein = PDB(\"pdb1a3q.ent.gz\")\ndir(protein)\n\n#['FASTA_IDX', 'FASTA_format', 'IDX_format', 'PDB_format', '_COMPND', '_NMR', '_NMR_chains', '__abstractmethods__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_abc_impl', '_apply_matrix', '_biomol_id', '_chain_id', '_chains', '_cif_file', '_get_chain_position_by_id', '_has_nucl', '_has_prot', '_header', '_pdb_file', '_read_PDB_file', '_read_mmCIF_file', '_write_PDB_file', 'add_chain', 'add_chains', 'all_models', 'apply_biomolecule_matrices', 'apply_symmetry_matrices', 'biomolecule_identifier', 'chain_exists', 'chain_identifiers', 'chains', 'clean', 'dehydrate', 'dump', 'duplicate', 'fuse_chains', 'get_chain_by_id', 'has_nucleotide', 'has_protein', 'header', 'id', 'is_NMR', 'is_all_ca', 'load', 'non_standard_chains', 'nucleotides', 'pdb_file', 'proteins', 'read', 'repeated_chain_ids', 'reposition', 'rotate', 'tmpclean', 'translate', 'write']\n\nhelp(protein)\n\n```\n\nThis is the result:\n\n### PDB\nclass PDB(SBILib.beans.StorableObject.StorableObject) <br/>\n\tA {PDB} is a collection of {Chain} <br/>\n\nMethods defined here: <br/>\n\n |  FASTA_IDX(self, protein=True, nucleotide=False) <br/>\n |  <br/>\n |  FASTA_format(self, gapped=True, protein=True, nucleotide=False) <br/>\n |  <br/>\n |  IDX_format(self, protein=True, nucleotide=False) <br/>\n |  <br/>\n |  PDB_format(self, clean=False, terminal=True) <br/>\n |      Strings a {PDB} in PDB format <br/>\n |      @rtype: String <br/>\n |  <br/>\n |  __init__(self, pdb_file=None, dehydrate=False, header=False, onlyheader=False, biomolecule=False) <br/>\n |      @type  pdb_file: String <br/>\n |      @param pdb_file: PDB formated file to read <br/>\n |      <br/>\n |      @raise IOError if pdb_file does not exist and it is not an empty object <br/>\n |  __len__(self) <br/>\n |      # OVERRIDE DEFAULT METHODS <br/>\n |  <br/>\n |  add_chain(self, chain, NMR=False) <br/>\n |      Adds a new chain to the PDB <br/>\n |  <br/>\n |  add_chains(self, chains, NMR=False) <br/>\n |      Adds a new chains to the PDB <br/>\n |  <br/>\n |  apply_biomolecule_matrices(self, keepchains=False, water=True) <br/>\n |      Only works if the PDB file is an original PDB file or <br/>\n |      the matrices have been added in the correct PDB format <br/>\n |      @rtype: {PDB} <br/>\n |  <br/>\n |  apply_symmetry_matrices(self) <br/>\n |      Only works if the PDB file is an original PDB file <br/>\n |      or the matrices have been added in the correct PDB format <br/>\n |      @rtype: {PDB} <br/>\n |  <br/>\n |  chain_exists(self, chain) <br/>\n |      Confirms if a given chain exists in the PDB <br/>\n |      @rtype: Boolean <br/>\n |  <br/>\n |  clean(self) <br/>\n |  <br/>\n |  dehydrate(self) <br/>\n |      # METHODS <br/>\n |  <br/>\n |  duplicate(self, hetero=True, water=False, NMR=False) <br/>\n |      Returns a {PDB} identical to the original but as a new object <br/>\n |      @rtype: {PDB} <br/>\n |  fuse_chains(self, chains_ids) <br/>\n |      Fuses several chains into the first one. <br/>\n |      It will not allow to fuse different structural chains. <br/>\n |      It does not alter the {PDB}, but provides a new one <br/>\n |      @rtype: {Chain} <br/>\n |      <br/>\n |      @raise AttributeError if: <br/>\n |          a) A given chain ID is not present <br/>\n |          b) Try to fuse different structural chains <br/>\n |  <br/>\n |  get_chain_by_id(self, id) <br/>\n |      Returns a chain according to its id or None if no chain with that id is found <br/>\n |      @rtype: {Chain} <br/>\n |  <br/>\n |  reposition(self, matrix=None, vector=None) <br/>\n |      Rotates and Translates each {Chain} according to a matrix and a translational vector <br/>\n |      <br/>\n |      @type matrix: numpy.matrix <br/>\n |      <br/>\n |      @type vector: numpy.array <br/>\n |  <br/>\n |  rotate(self, matrix=None) <br/>\n |      Rotates each {Chain} according to a given matrix <br/>\n |      <br/>\n |      @type matrix: numpy.matrix <br/>\n |  <br/>\n |  tmpclean(self, cluster_by_alternative_id=False) <br/>\n |      Makes a clean version of the PDB, rechaining in order and renumbering atoms. <br/>\n |      Renumbering residues is optional <br/>\n |  <br/>\n |  translate(self, vector=None) <br/>\n |      Translates each {Chain} according to a translational vector <br/>\n |      <br/>\n |      @type vector: numpy.array <br/>\n |  <br/>\n |  write(self, output_file=None, format='PDB', force=False, clean=False) <br/>\n |      Writes the object in a specific format <br/>\n |      <br/>\n |      @type  output_file: String <br/>\n |      @param output_file: File to write <br/>\n |      <br/>\n |      @type  format: String <br/>\n |      @param format: Format of the file to print <br/>\n |  <br/>\n |  ---------------------------------------------------------------------- <br/>\n |  Static methods defined here: <br/>\n |  <br/>\n |  read(input_file, format='PDB') <br/>\n |      Reads a file of data in a specific format and returns the object <br/>\n |      <br/>\n |      @type  input_file: String <br/>\n |      @param input_file: File to read <br/>\n |      <br/>\n |      @type  format: String <br/>\n |      @param format: Format of the file to read <br/>\n |  <br/>\n |  ---------------------------------------------------------------------- <br/>\n\n\n\n\n\n\n\n\n\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "",
    "version": "0.3.8",
    "project_urls": {
        "Homepage": "https://github.com/structuralbioinformatics/SBI"
    },
    "split_keywords": [
        "structural bioinformatics",
        "loops"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9038e18bb5920c6152d02ec04681d0b489ffa48b8c0e059aef2b79db15c1e343",
                "md5": "397e8a75c211e699a9a8f27db795850b",
                "sha256": "3ce6c1a124ccd4e69bc893976204f3675429e1cbe08d4421409ddd3396622b84"
            },
            "downloads": -1,
            "filename": "SBILib-0.3.8-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "397e8a75c211e699a9a8f27db795850b",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 93544871,
            "upload_time": "2024-02-08T10:27:22",
            "upload_time_iso_8601": "2024-02-08T10:27:22.202958Z",
            "url": "https://files.pythonhosted.org/packages/90/38/e18bb5920c6152d02ec04681d0b489ffa48b8c0e059aef2b79db15c1e343/SBILib-0.3.8-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-08 10:27:22",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "structuralbioinformatics",
    "github_project": "SBI",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "sbilib"
}
        
Elapsed time: 0.17995s