# pyabpoa: abPOA Python interface
## Introduction
pyabpoa provides an easy-to-use Python interface to [abPOA](https://github.com/yangao07/abPOA). It contains all the APIs needed to perform multiple sequence alignment (MSA) of a set of sequences and to call consensus sequences from the final alignment graph.
## Installation
### Install pyabpoa with pip
pyabpoa can be installed with pip:
```
pip install pyabpoa
```
### Install pyabpoa from source
Alternatively, you can install pyabpoa from source (Cython is required):
```
git clone --recursive https://github.com/yangao07/abPOA.git
cd abPOA
make install_py
```
## Examples
The following code illustrates how to use pyabpoa.
```
import pyabpoa as pa

a = pa.msa_aligner()  # aligner with default parameters
seqs = [
    'CCGAAGA',
    'CCGAACTCGA',
    'CCCGGAAGA',
    'CCGAAGA'
]
res = a.msa(seqs, out_cons=True, out_msa=True)  # perform multiple sequence alignment

for seq in res.cons_seq:
    print(seq)  # print consensus sequence(s)

res.print_msa()  # print row-column multiple sequence alignment in PIR format
```
You can also try the example script provided in the source folder:
```
python ./python/example.py
```
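
`msa()` takes a plain Python list of sequence strings, so reads stored in a FASTA file have to be loaded into a list first. A minimal, standard-library-only sketch; `parse_fasta` and `read_fasta` are hypothetical helpers, not part of pyabpoa:

```python
from typing import Iterable, List


def parse_fasta(lines: Iterable[str]) -> List[str]:
    """Collect the sequences of FASTA-formatted lines into a list of strings.

    Hypothetical helper (not part of pyabpoa): header ('>') lines are dropped,
    multi-line records are joined, and records without sequence lines are skipped.
    """
    seqs: List[str] = []
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if chunks:  # a new header closes the previous record
                seqs.append("".join(chunks))
                chunks = []
        else:
            chunks.append(line)
    if chunks:  # close the last record
        seqs.append("".join(chunks))
    return seqs


def read_fasta(path: str) -> List[str]:
    """Read a FASTA file into a list that can be passed to msa() as `seqs`."""
    with open(path) as fh:
        return parse_fasta(fh)
```

With a file such as the hypothetical `reads.fa`, the call would look like `res = a.msa(read_fasta('reads.fa'), out_cons=True, out_msa=True)`.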
## APIs
### Class pyabpoa.msa_aligner
```
pyabpoa.msa_aligner(aln_mode='g', ...)
```
This constructs a pyabpoa multiple sequence alignment handler. It accepts the following arguments:
* **aln_mode**: alignment mode. 'g': global, 'l': local, 'e': extension; default: **'g'**
* **is_aa**: whether the input sequences are amino acid sequences; default: **False**
* **match**: match score; default: **2**
* **mismatch**: mismatch penalty; default: **4**
* **score_matrix**: scoring matrix file; when it is set, **match** and **mismatch** are not used; default: **''**
* **gap_open1**: first gap opening penalty; default: **4**
* **gap_ext1**: first gap extension penalty; default: **2**
* **gap_open2**: second gap opening penalty; default: **24**
* **gap_ext2**: second gap extension penalty; default: **1**
* **extra_b**: first adaptive banding parameter; set to a value < 0 to disable adaptive banded DP; default: **10**
* **extra_f**: second adaptive banding parameter; the number of extra bases added on both sides of the band is *b+f\*L*, where *b* is **extra_b**, *f* is **extra_f**, and *L* is the length of the sequence being aligned; default: **0.01**
* **cons_algrm**: consensus calling algorithm. 'HB': heaviest bundling, 'MF': most frequent bases; default: **'HB'**
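
The gap and banding parameters above can be made concrete with a little arithmetic. abPOA's documentation describes its default gap model as convex: a gap of length *g* costs `min(O1 + g*E1, O2 + g*E2)`, with (O1, E1) = (**gap_open1**, **gap_ext1**) and (O2, E2) = (**gap_open2**, **gap_ext2**). The plain-Python sketch below evaluates that cost and the *b+f\*L* band formula with the default values. It does not call pyabpoa, and the function names are hypothetical.

```python
from typing import Optional


def convex_gap_cost(g: int, gap_open1: int = 4, gap_ext1: int = 2,
                    gap_open2: int = 24, gap_ext2: int = 1) -> int:
    """Cost of a g-long gap under the convex (two-piece affine) model.

    Formula as described in abPOA's documentation; hypothetical helper.
    """
    if g <= 0:
        return 0  # no gap, no penalty
    return min(gap_open1 + g * gap_ext1, gap_open2 + g * gap_ext2)


def extra_band(L: int, extra_b: float = 10, extra_f: float = 0.01) -> Optional[float]:
    """Extra bases added on both sides of the band: b + f*L.

    Returns None when extra_b < 0, i.e. adaptive banded DP is disabled.
    """
    if extra_b < 0:
        return None
    return extra_b + extra_f * L
```

With the defaults, the two pieces meet at *g* = 20 (both cost 44): shorter gaps are priced by the first pair and longer gaps by the second, so long indels are not penalized as steeply. For a 1000-base sequence, the default parameters add 10 + 0.01 \* 1000 = 20 extra bases on both sides of the band.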

The `msa_aligner` handler provides one method, `msa`, which performs multiple sequence alignment. It takes three required arguments (**seqs**, **out_cons** and **out_msa**) plus several optional keyword arguments:
```
pyabpoa.msa_aligner.msa(seqs, out_cons, out_msa, out_pog='', incr_fn='')
```
* **seqs**: a list of input sequences (strings); **positional**
* **out_cons**: a bool; whether pyabpoa generates consensus sequence(s); **positional**
* **out_msa**: a bool; whether pyabpoa generates the row-column MSA (RC-MSA); **positional**
* **max_n_cons**: maximum number of consensus sequences to generate; default: **1**
* **min_freq**: minimum frequency of each consensus to output (effective only when **max_n_cons** > 1); default: **0.3**
* **out_pog**: name of a file (`.png` or `.pdf`) to store a plot of the final alignment graph; default: **''**
* **incr_fn**: name of an existing graph (GFA) or MSA (FASTA) file; if set, the input sequences are incrementally aligned to this graph/MSA; default: **''**
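
When **max_n_cons** > 1, the input sequences are presumably grouped into clusters (see **clu_n_seq** below), and **min_freq** decides which clusters get a consensus in the output. A minimal sketch of one plausible reading, *assuming* that "frequency" means the fraction of input sequences that fall into a cluster; this is not taken from pyabpoa's source, and `passing_clusters` is a hypothetical helper whose inputs mirror the **clu_n_seq** and **n_seq** result fields:

```python
from typing import List


def passing_clusters(clu_n_seq: List[int], n_seq: int, min_freq: float = 0.3) -> List[int]:
    """Indices of clusters holding at least min_freq * n_seq sequences.

    Assumption (illustration only): min_freq is compared with the fraction
    of input sequences in each cluster.
    """
    return [i for i, size in enumerate(clu_n_seq) if size >= min_freq * n_seq]
```

Under this reading, with 10 input sequences split into clusters of 7, 2 and 1, only the first cluster would pass the default threshold of 0.3.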
### Class pyabpoa.msa_result
```
pyabpoa.msa_result(seq_n, cons_n, cons_len, ...)
```
This class holds the generated consensus sequence(s) and the RC-MSA. `pyabpoa.msa_aligner.msa()` returns an object of this class, which has the following properties:
* **n_seq**: number of aligned input sequences
* **n_cons**: number of generated consensus sequences (generally 1; can be 2 or more if **max_n_cons** is set to a value > 1)
* **clu_n_seq**: an array of sequence cluster sizes
* **cons_len**: an array of consensus sequence length(s)
* **cons_seq**: an array of consensus sequence(s)
* **cons_cov**: an array of per-base coverage values for each consensus sequence
* **msa_len**: length of each row in the RC-MSA
* **msa_seq**: an array of `n_seq`+`n_cons` strings representing the RC-MSA; each string is one aligned sequence (an input or a consensus sequence), with `-` characters marking alignment gaps.

`pyabpoa.msa_result` provides a method, `print_msa`, which prints the RC-MSA to the screen.
```
res.print_msa()  # res: the msa_result object returned by msa_aligner.msa()
```
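
Because **msa_seq** is a plain list of equal-length strings, it is easy to post-process. The sketch below computes a simple column-wise majority vote over RC-MSA rows: in each column the most frequent symbol is kept, and the column is dropped if that symbol is a gap. It is only an illustration; it is not necessarily identical to pyabpoa's 'MF' algorithm and differs from the default 'HB' (heaviest bundling) algorithm. `column_majority` is a hypothetical helper, and the example rows in the test are made up.

```python
from collections import Counter
from typing import List


def column_majority(rows: List[str]) -> str:
    """Column-wise majority vote over RC-MSA rows (illustrative only).

    Not pyabpoa's MF/HB consensus. Ties go to the symbol seen first in the
    column; columns whose most frequent symbol is '-' are omitted.
    """
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all RC-MSA rows must have the same length (msa_len)")
    out = []
    for column in zip(*rows):
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != "-":
            out.append(symbol)
    return "".join(out)
```

Assuming the input sequences occupy the first **n_seq** rows of **msa_seq** (with the consensus rows after them), it could be applied as `column_majority(res.msa_seq[:res.n_seq])`.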