.. list-table::
   :widths: 20 40

   * - Name
     - pywfa
   * - Version
     - 0.5.1
   * - Summary
     - Align sequences using WFA2-lib
   * - Upload time
     - 2023-06-15 10:37:33
   * - Requires Python
     - >=3.7


=====
pyWFA
=====

A Python wrapper for wavefront alignment using `WFA2-lib
<https://github.com/smarco/WFA2-lib/>`_.


Installation
------------

To install from PyPI::

    pip install pywfa

From conda::

    conda install -c bioconda pywfa

Build from source::

    git clone https://github.com/kcleal/pywfa
    cd pywfa
    pip install .


Overview
--------

Alignment of pattern and text strings can be performed by calling WFA2-lib functions directly:

.. code-block:: python

    from pywfa import WavefrontAligner

    pattern = "TCTTTACTCGCGCGTTGGAGAAATACAATAGT"
    text = "TCTATACTGCGCGTTTGGAGAAATAAAATAGT"
    a = WavefrontAligner(pattern)
    score = a.wavefront_align(text)
    assert a.status == 0  # alignment was successful
    assert a.cigarstring == "3M1X4M1D7M1I9M1X6M"
    assert a.score == -24
    a.cigartuples
    # [(0, 3), (8, 1), (0, 4), (2, 1), (0, 7), (1, 1), (0, 9), (8, 1), (0, 6)]
    a.cigar_print_pretty()


.. code-block:: text

    3M1X4M1D7M1I9M1X6M ALIGNMENT
    1X1D1I1X ALIGNMENT.COMPACT
    PATTERN TCTTTACTCGCGCGTT-GGAGAAATACAATAGT
            ||| |||| ||||||| ||||||||| ||||||
    TEXT    TCTATACT-GCGCGTTTGGAGAAATAAAATAGT


The output of ``cigar_print_pretty`` can be written to a file instead of stdout by passing a filename:

.. code-block:: python

    a.cigar_print_pretty("file.txt")

To obtain this printout as a Python ``str`` instead, use the ``pretty`` attribute shown in the results example below.


Cigartuples follow the convention below (the same integer codes used for SAM/BAM CIGAR operations, e.g. by pysam):

.. list-table::
   :widths: 15 15
   :header-rows: 1

   * - Operation
     - Code
   * - M
     - 0
   * - I
     - 1
   * - D
     - 2
   * - N
     - 3
   * - S
     - 4
   * - H
     - 5
   * - =
     - 7
   * - X
     - 8
   * - B
     - 9

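
As a sketch of the convention in the table above, the helpers below convert between cigartuples and a CIGAR string. ``ops_to_str`` and ``str_to_ops`` are hypothetical names used only for illustration, not part of pywfa (pywfa's own ``cigartuples_to_str`` covers the first direction):

.. code-block:: python

    import re
    from typing import List, Tuple

    # Operation/code mapping copied from the table above
    CODE_TO_OP = {0: "M", 1: "I", 2: "D", 3: "N", 4: "S", 5: "H", 7: "=", 8: "X", 9: "B"}
    OP_TO_CODE = {op: code for code, op in CODE_TO_OP.items()}


    def ops_to_str(cigartuples: List[Tuple[int, int]]) -> str:
        """Render (code, length) pairs as a CIGAR string, e.g. [(0, 3)] -> '3M'."""
        return "".join(f"{length}{CODE_TO_OP[code]}" for code, length in cigartuples)


    def str_to_ops(cigar: str) -> List[Tuple[int, int]]:
        """Parse a CIGAR string back into (code, length) pairs."""
        return [(OP_TO_CODE[op], int(length))
                for length, op in re.findall(r"(\d+)([MIDNSH=XB])", cigar)]


    tuples = [(0, 3), (8, 1), (0, 4), (2, 1), (0, 7), (1, 1), (0, 9), (8, 1), (0, 6)]
    print(ops_to_str(tuples))
    assert str_to_ops(ops_to_str(tuples)) == tuples

Applied to the Overview example, ``ops_to_str`` yields the same string as ``a.cigarstring``.
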
For convenience, a results object can be obtained by calling a ``WavefrontAligner`` instance on the text (optionally passing a new pattern):

.. code-block:: python

    pattern = "TCTTTACTCGCGCGTTGGAGAAATACAATAGT"
    text = "TCTATACTGCGCGTTTGGAGAAATAAAATAGT"
    a = WavefrontAligner(pattern)
    result = a(text)  # alignment result
    result.__dict__
    # {'pattern_length': 32, 'text_length': 32, 'pattern_start': 0, 'pattern_end': 32,
    #  'text_start': 0, 'text_end': 32,
    #  'cigartuples': [(0, 3), (8, 1), (0, 4), (2, 1), (0, 7), (1, 1), (0, 9), (8, 1), (0, 6)],
    #  'score': -24, 'pattern': 'TCTTTACTCGCGCGTTGGAGAAATACAATAGT',
    #  'text': 'TCTATACTGCGCGTTTGGAGAAATAAAATAGT', 'status': 0}

    # The pattern can also be passed at call time (text first, then pattern):
    a(text, pattern)

    # Obtain a str in the same format as cigar_print_pretty
    a.pretty
    # 3M1X4M1D7M1I9M1X6M ALIGNMENT
    # 1X1D1I1X ALIGNMENT.COMPACT
    # PATTERN TCTTTACTCGCGCGTT-GGAGAAATACAATAGT
    #         |||*|||| ||||||| |||||||||*||||||
    # TEXT    TCTATACT-GCGCGTTTGGAGAAATAAAATAGT

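
To show how the rows in the ``pretty`` output relate to the cigar, the hypothetical helper below rebuilds them from the ``pattern``, ``text`` and ``cigartuples`` fields of a result. ``render_alignment`` is an illustrative name, not pywfa API, and the sketch assumes an end-to-end cigar without clipping:

.. code-block:: python

    from typing import List, Tuple


    def render_alignment(pattern: str, text: str,
                         cigartuples: List[Tuple[int, int]]) -> Tuple[str, str, str]:
        """Return (pattern_row, match_row, text_row) for an end-to-end cigar.

        '|' marks identical bases, '*' a mismatch and ' ' an indel. Only the
        codes M (0), I (1), D (2), = (7) and X (8) are handled.
        """
        p_row, m_row, t_row = [], [], []
        p = t = 0  # current offsets into pattern and text
        for code, length in cigartuples:
            for _ in range(length):
                if code in (0, 7, 8):  # M, =, X consume one base from each sequence
                    p_row.append(pattern[p])
                    t_row.append(text[t])
                    m_row.append("|" if pattern[p] == text[t] else "*")
                    p += 1
                    t += 1
                elif code == 2:  # D: base in the pattern, gap in the text
                    p_row.append(pattern[p])
                    t_row.append("-")
                    m_row.append(" ")
                    p += 1
                elif code == 1:  # I: base in the text, gap in the pattern
                    p_row.append("-")
                    t_row.append(text[t])
                    m_row.append(" ")
                    t += 1
                else:
                    raise ValueError(f"unsupported operation code {code}")
        return "".join(p_row), "".join(m_row), "".join(t_row)


    pattern = "TCTTTACTCGCGCGTTGGAGAAATACAATAGT"
    text = "TCTATACTGCGCGTTTGGAGAAATAAAATAGT"
    cigar = [(0, 3), (8, 1), (0, 4), (2, 1), (0, 7), (1, 1), (0, 9), (8, 1), (0, 6)]
    for label, row in zip(("PATTERN", "", "TEXT"), render_alignment(pattern, text, cigar)):
        print(f"{label:<8}{row}")

For this example the three printed rows match the PATTERN, match and TEXT rows of ``a.pretty``.
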

Configure
---------

To configure the ``WavefrontAligner``, options can be provided during initialization:

.. code-block:: python

    from pywfa import WavefrontAligner

    a = WavefrontAligner(scope="score",
                         distance="affine2p",
                         span="end-to-end",
                         heuristic="adaptive")

Supported distance metrics are ``"affine"`` (default) and ``"affine2p"``. Scope can be ``"full"`` (default)
or ``"score"``. Span can be ``"ends-free"`` (default) or ``"end-to-end"``. Heuristic can be ``None`` (default),
``"adaptive"`` or ``"X-drop"``.

When using heuristic functions, it is recommended to check the ``status`` attribute:

.. code-block:: python

    pattern = "AAAAACCTTTTTAAAAAA"
    text = "GGCCAAAAACCAAAAAA"
    a = WavefrontAligner(heuristic="adaptive")
    a(text, pattern)
    a.status
    # 0 means the alignment succeeded; -1 indicates it was stopped by the heuristic

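
The ``"affine2p"`` metric corresponds to WFA2-lib's dual-cost gap-affine model, in which a gap of length l is charged the cheaper of two affine functions, min(o1 + e1 × l, o2 + e2 × l). The sketch below assumes that ``gap_opening``/``gap_extension`` act as (o1, e1) and ``gap_opening2``/``gap_extension2`` as (o2, e2), with the defaults listed under Default options; the function names are illustrative, not pywfa API:

.. code-block:: python

    def affine_gap(length: int, opening: int = 6, extension: int = 2) -> int:
        """Gap-affine penalty o + e * l for a gap of `length` bases."""
        return opening + extension * length


    def affine2p_gap(length: int, opening1: int = 6, extension1: int = 2,
                     opening2: int = 24, extension2: int = 1) -> int:
        """Dual-cost gap-affine penalty: the cheaper of two affine functions."""
        return min(affine_gap(length, opening1, extension1),
                   affine_gap(length, opening2, extension2))


    for n in (1, 10, 18, 30):
        print(n, affine_gap(n), affine2p_gap(n))

With these defaults the two functions are equal at l = 18 (6 + 2 × 18 = 24 + 18 = 42), so gaps longer than 18 bases are penalized less under ``"affine2p"`` than under ``"affine"``.
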

Default options
---------------

The ``WavefrontAligner`` is initialized with the following default options:

.. list-table::
   :widths: 15 10
   :header-rows: 1

   * - Parameter
     - Default value
   * - pattern
     - None
   * - distance
     - "affine"
   * - match
     - 0
   * - gap_opening
     - 6
   * - gap_extension
     - 2
   * - gap_opening2
     - 24
   * - gap_extension2
     - 1
   * - scope
     - "full"
   * - span
     - "ends-free"
   * - pattern_begin_free
     - 0
   * - pattern_end_free
     - 0
   * - text_begin_free
     - 0
   * - text_end_free
     - 0
   * - heuristic
     - None
   * - min_wavefront_length
     - 10
   * - max_distance_threshold
     - 50
   * - steps_between_cutoffs
     - 1
   * - xdrop
     - 20

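
The score of -24 in the Overview example can be checked by hand. Assuming gap-affine penalties where a gap of length l costs gap_opening + gap_extension × l (the WFA2-lib convention), the 1D and 1I operations cost 2 × (6 + 2) = 16, leaving 8 for the two X operations, i.e. a mismatch penalty of 4. That value is not listed in the table above: it is inferred from the score and is consistent with WFA2-lib's default gap-affine mismatch penalty. The hypothetical ``affine_score`` function below performs the same arithmetic:

.. code-block:: python

    from typing import List, Tuple


    def affine_score(cigartuples: List[Tuple[int, int]], mismatch: int = 4,
                     gap_opening: int = 6, gap_extension: int = 2) -> int:
        """Score a cigar under gap-affine penalties, returned as a negative number.

        Assumes every mismatch is reported as X (8) and that M/= blocks are
        matches with penalty 0, as in the pywfa cigars shown above.
        """
        penalty = 0
        for code, length in cigartuples:
            if code == 8:  # X: one mismatch per base
                penalty += mismatch * length
            elif code in (1, 2):  # I or D: one gap of `length` bases
                penalty += gap_opening + gap_extension * length
        return -penalty


    cigar = [(0, 3), (8, 1), (0, 4), (2, 1), (0, 7), (1, 1), (0, 9), (8, 1), (0, 6)]
    print(affine_score(cigar))

This only re-derives the reported score; it is not how pywfa or WFA2-lib compute it.
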

Modifying the cigar
-------------------

If desired, the cigar can be modified so that the end operations are either soft-clips or matches. This makes the
alignment cigar resemble those produced by bwa, for example:

.. code-block:: python

    from pywfa import WavefrontAligner, cigartuples_to_str

    pattern = "AAAAACCTTTTTAAAAAA"
    text = "GGCCAAAAACCAAAAAA"
    a = WavefrontAligner(pattern)

    res = a(text, clip_cigar=False)
    print(cigartuples_to_str(res.cigartuples))
    # 4I7M5D6M

    res = a(text, clip_cigar=True)
    print(cigartuples_to_str(res.cigartuples))
    # 4S7M5D6M

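
The clipping step can be pictured with the simplified function below. ``clip_terminal_insertions`` is a hypothetical name and this is a sketch, not pywfa's implementation: a leading or trailing insertion becomes a soft-clip, and a leading or trailing deletion, which consumes no text bases, is dropped:

.. code-block:: python

    from typing import List, Tuple

    INS, DEL, SOFT_CLIP = 1, 2, 4  # codes from the operation table above


    def clip_terminal_insertions(cigartuples: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
        """Return a copy whose ends are soft-clips or aligned blocks (bwa-style)."""
        ops = list(cigartuples)
        # Deletions at either end consume no text bases, so drop them
        while ops and ops[0][0] == DEL:
            ops.pop(0)
        while ops and ops[-1][0] == DEL:
            ops.pop()
        # Insertions at either end become soft-clips of the same length
        if ops and ops[0][0] == INS:
            ops[0] = (SOFT_CLIP, ops[0][1])
        if ops and ops[-1][0] == INS:
            ops[-1] = (SOFT_CLIP, ops[-1][1])
        return ops


    print(clip_terminal_insertions([(1, 4), (0, 7), (2, 5), (0, 6)]))

On the unclipped cigar above (4I7M5D6M) this gives the tuples for 4S7M5D6M.
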
An experimental feature trims short matches at the ends of alignments, producing alignments that approximate local alignments:

.. code-block:: python

    from pywfa import WavefrontAligner

    pattern = "AAAAAAAAAAAACCTTTTAAAAAAGAAAAAAA"
    text = "ACCCCCCCCCCCAAAAACCAAAAAAAAAAAAA"
    a = WavefrontAligner(pattern)

    # The unmodified cigar may have short matches at the ends:
    res = a(text, clip_cigar=False)
    res.cigartuples
    # [(0, 1), (1, 5), (8, 6), (0, 7), (2, 5), (0, 5), (8, 1), (0, 7)]
    res.aligned_text
    # ACCCCCCCCCCCAAAAACCAAAAAAAAAAAAA
    res.text_start, res.text_end
    # (0, 32)

    # Setting the minimum block of matches to e.g. 5 bp trims off shorter matches at the ends
    res = a(text, clip_cigar=True, min_aligned_bases_left=5, min_aligned_bases_right=5)
    res.cigartuples
    # [(4, 12), (0, 7), (2, 5), (0, 5), (8, 1), (0, 7)]
    res.aligned_text
    # AAAAACCAAAAAAAAAAAAA
    res.text_start, res.text_end
    # (12, 32)

    # Mismatch operations (X) can also be elided; note this happens after the clip_cigar stage
    res = a(text, clip_cigar=True, min_aligned_bases_left=5, min_aligned_bases_right=5,
            elide_mismatches=True)
    res.cigartuples
    # [(4, 12), (0, 7), (2, 5), (0, 13)]
    res.aligned_text
    # AAAAACCAAAAAAAAAAAAA

Notes: trimming the cigar does not currently modify the alignment score; however, ``pattern_start``, ``pattern_end``,
``text_start`` and ``text_end`` are updated when the cigar is modified.

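
The trimming and eliding steps above can be approximated in pure Python. The functions below (``trim_short_ends``, ``elide_x`` and ``text_span`` are hypothetical names) are an illustrative re-implementation that reproduces the cigars and ``text_start``/``text_end`` shown in the example; they are a sketch under the assumption that trimming anchors on the outermost M/= blocks long enough to satisfy the minimums, not pywfa's actual code:

.. code-block:: python

    from typing import List, Tuple

    Cigar = List[Tuple[int, int]]
    MATCH, INS, SOFT_CLIP, EQUAL, DIFF = 0, 1, 4, 7, 8
    TEXT_CONSUMING = {MATCH, INS, SOFT_CLIP, EQUAL, DIFF}


    def trim_short_ends(ops: Cigar, min_left: int, min_right: int) -> Cigar:
        """Soft-clip everything outside the first/last M or = block that is
        at least min_left/min_right bases long."""
        anchors = [i for i, (code, _) in enumerate(ops) if code in (MATCH, EQUAL)]
        left = next((i for i in anchors if ops[i][1] >= min_left), None)
        right = next((i for i in reversed(anchors) if ops[i][1] >= min_right), None)
        if left is None or right is None or left > right:
            return list(ops)  # no block long enough to anchor on: leave unchanged

        def clip(segment: Cigar) -> Cigar:
            # Only text bases survive as a soft-clip; pattern-only ops are dropped
            n = sum(length for code, length in segment if code in TEXT_CONSUMING)
            return [(SOFT_CLIP, n)] if n else []

        return clip(ops[:left]) + list(ops[left:right + 1]) + clip(ops[right + 1:])


    def elide_x(ops: Cigar) -> Cigar:
        """Relabel X as M and merge neighbouring M blocks."""
        merged: Cigar = []
        for code, length in ops:
            code = MATCH if code == DIFF else code
            if merged and merged[-1][0] == code == MATCH:
                merged[-1] = (MATCH, merged[-1][1] + length)
            else:
                merged.append((code, length))
        return merged


    def text_span(ops: Cigar, text_length: int) -> Tuple[int, int]:
        """text_start/text_end implied by leading and trailing soft-clips."""
        start = ops[0][1] if ops and ops[0][0] == SOFT_CLIP else 0
        end = text_length - (ops[-1][1] if ops and ops[-1][0] == SOFT_CLIP else 0)
        return start, end


    text = "ACCCCCCCCCCCAAAAACCAAAAAAAAAAAAA"
    raw = [(0, 1), (1, 5), (8, 6), (0, 7), (2, 5), (0, 5), (8, 1), (0, 7)]
    trimmed = trim_short_ends(raw, 5, 5)
    start, end = text_span(trimmed, len(text))
    print(trimmed, elide_x(trimmed), text[start:end])

As in the note above, this sketch leaves the score untouched and only recomputes the text coordinates.
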
Raw data
{
"_id": null,
"home_page": "",
"name": "pywfa",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "",
"keywords": "",
"author": "",
"author_email": "Kez Cleal <clealk@cardiff.ac.uk>",
"download_url": "https://files.pythonhosted.org/packages/93/40/ec1c77237515eb618ba7407d4ac954b9dd127808a303cd85b0928f8dc12a/pywfa-0.5.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "Align sequences using WFA2-lib",
"version": "0.5.1",
"project_urls": {
"Repository": "https://github.com/kcleal/pywfa"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "9340ec1c77237515eb618ba7407d4ac954b9dd127808a303cd85b0928f8dc12a",
"md5": "9a09de286de428aea98b3f0fc3ca9b35",
"sha256": "e972bf53f9e6d8957e9105ecc22cf704ac4bfad4d882d79c82f11fc260381483"
},
"downloads": -1,
"filename": "pywfa-0.5.1.tar.gz",
"has_sig": false,
"md5_digest": "9a09de286de428aea98b3f0fc3ca9b35",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 3496881,
"upload_time": "2023-06-15T10:37:33",
"upload_time_iso_8601": "2023-06-15T10:37:33.414047Z",
"url": "https://files.pythonhosted.org/packages/93/40/ec1c77237515eb618ba7407d4ac954b9dd127808a303cd85b0928f8dc12a/pywfa-0.5.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-06-15 10:37:33",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "kcleal",
"github_project": "pywfa",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "pywfa"
}