=====
pyWFA
=====

A Python wrapper for wavefront alignment using `WFA2-lib
<https://github.com/smarco/WFA2-lib/>`_.

Installation
------------

To install from PyPI::

    pip install pywfa

From conda::

    conda install -c bioconda pywfa

Build from source::

    git clone https://github.com/kcleal/pywfa
    cd pywfa
    pip install .

Overview
--------

Alignment of pattern and text strings can be performed by accessing WFA2-lib functions directly:

.. code-block:: python

    from pywfa import WavefrontAligner

    pattern = "TCTTTACTCGCGCGTTGGAGAAATACAATAGT"
    text =    "TCTATACTGCGCGTTTGGAGAAATAAAATAGT"
    a = WavefrontAligner(pattern)
    score = a.wavefront_align(text)
    assert a.status == 0  # alignment was successful
    assert a.cigarstring == "3M1X4M1D7M1I9M1X6M"
    assert a.score == -24
    a.cigartuples
    >>> [(0, 3), (8, 1), (0, 4), (2, 1), (0, 7), (1, 1), (0, 9), (8, 1), (0, 6)]
    a.cigar_print_pretty()

.. code-block:: text

    >>> 3M1X4M1D7M1I9M1X6M      ALIGNMENT
        1X1D1I1X      ALIGNMENT.COMPACT
        PATTERN    TCTTTACTCGCGCGTT-GGAGAAATACAATAGT
                   ||| |||| ||||||| ||||||||| ||||||
        TEXT       TCTATACT-GCGCGTTTGGAGAAATAAAATAGT
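
The relationship between the cigar and the printed rows can be sketched in plain Python.
This is a minimal illustration, not pywfa's implementation: `render_alignment` is a
hypothetical helper, and it assumes that D places a pattern base opposite a gap in the text
and that I places a text base opposite a gap in the pattern, as the printout above suggests.
Mismatches are marked with ``*`` here:

.. code-block:: python

    def render_alignment(pattern, text, cigartuples):
        """Return the gapped pattern row, the marker row and the gapped text row."""
        p_row, m_row, t_row = [], [], []
        i = j = 0  # current positions in pattern and text
        for code, length in cigartuples:
            for _ in range(length):
                if code in (0, 7, 8):  # M, = or X: one base from each sequence
                    p_row.append(pattern[i])
                    t_row.append(text[j])
                    m_row.append("|" if pattern[i] == text[j] else "*")
                    i += 1
                    j += 1
                elif code == 2:  # D: pattern base opposite a gap in the text
                    p_row.append(pattern[i])
                    t_row.append("-")
                    m_row.append(" ")
                    i += 1
                elif code == 1:  # I: text base opposite a gap in the pattern
                    p_row.append("-")
                    t_row.append(text[j])
                    m_row.append(" ")
                    j += 1
                else:
                    raise ValueError(f"operation code {code} is not handled in this sketch")
        return "".join(p_row), "".join(m_row), "".join(t_row)

    pattern = "TCTTTACTCGCGCGTTGGAGAAATACAATAGT"
    text = "TCTATACTGCGCGTTTGGAGAAATAAAATAGT"
    tuples = [(0, 3), (8, 1), (0, 4), (2, 1), (0, 7), (1, 1), (0, 9), (8, 1), (0, 6)]
    rows = render_alignment(pattern, text, tuples)
    print("\n".join(rows))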

The output of `cigar_print_pretty` can be directed to a file rather than stdout using:

.. code-block:: python

    a.cigar_print_pretty("file.txt")

To obtain this printout as a Python str, access the results object (see below).

Cigartuples follow the convention:

.. list-table::
   :widths: 15 15
   :header-rows: 1

   * - Operation
     - Code
   * - M
     - 0
   * - I
     - 1
   * - D
     - 2
   * - N
     - 3
   * - S
     - 4
   * - H
     - 5
   * - =
     - 7
   * - X
     - 8
   * - B
     - 9
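
As a small illustration of the table above, cigartuples can be turned back into a cigar
string with a lookup. `CIGAR_OPS` and `tuples_to_cigar` are hypothetical names used only
for this sketch and are not part of pywfa:

.. code-block:: python

    # Operation letters keyed by the codes listed in the table above
    CIGAR_OPS = {0: "M", 1: "I", 2: "D", 3: "N", 4: "S", 5: "H", 7: "=", 8: "X", 9: "B"}

    def tuples_to_cigar(cigartuples):
        """Join (code, length) pairs into a cigar string such as '3M1X'."""
        return "".join(f"{length}{CIGAR_OPS[code]}" for code, length in cigartuples)

    tuples = [(0, 3), (8, 1), (0, 4), (2, 1), (0, 7), (1, 1), (0, 9), (8, 1), (0, 6)]
    print(tuples_to_cigar(tuples))  # 3M1X4M1D7M1I9M1X6M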

For convenience, a results object can be obtained by calling the `WavefrontAligner` with a pattern and text:

.. code-block:: python

    pattern = "TCTTTACTCGCGCGTTGGAGAAATACAATAGT"
    text =    "TCTATACTGCGCGTTTGGAGAAATAAAATAGT"
    a = WavefrontAligner(pattern)
    result = a(text)  # alignment result
    result.__dict__
    >>> {'pattern_length': 32, 'text_length': 32, 'pattern_start': 0, 'pattern_end': 32, 'text_start': 0, 'text_end': 32, 'cigartuples': [(0, 3), (8, 1), (0, 4), (2, 1), (0, 7), (1, 1), (0, 9), (8, 1), (0, 6)], 'score': -24, 'pattern': 'TCTTTACTCGCGCGTTGGAGAAATACAATAGT', 'text': 'TCTATACTGCGCGTTTGGAGAAATAAAATAGT', 'status': 0}

    # Alignment can also be called with a pattern like this:
    a(text, pattern)

    # obtain a string in the same format as cigar_print_pretty
    a.pretty
    >>> 3M1X4M1D7M1I9M1X6M      ALIGNMENT
        1X1D1I1X      ALIGNMENT.COMPACT
              PATTERN    TCTTTACTCGCGCGTT-GGAGAAATACAATAGT
                         |||*|||| ||||||| |||||||||*||||||
              TEXT       TCTATACT-GCGCGTTTGGAGAAATAAAATAGT


Configure
---------
To configure the `WavefrontAligner`, options can be provided during initialization:


.. code-block:: python

    from pywfa import WavefrontAligner

    a = WavefrontAligner(scope="score",
                         distance="affine2p",
                         span="end-to-end",
                         heuristic="adaptive")

Supported distance metrics are "affine" (default) and "affine2p". Scope can be "full" (default)
or "score". Span can be "ends-free" (default) or "end-to-end". Heuristic can be None (default),
"adaptive" or "X-drop".

When using heuristic functions it is recommended to check the status attribute:


.. code-block:: python

    pattern = "AAAAACCTTTTTAAAAAA"
    text = "GGCCAAAAACCAAAAAA"
    a = WavefrontAligner(heuristic="adaptive")
    a(pattern, text)
    a.status
    >>> 0   # successful alignment, -1 indicates the alignment was stopped due to the heuristic


Default options
---------------

The `WavefrontAligner` will be initialized with the following default options:

.. list-table::
   :widths: 15 10
   :header-rows: 1

   * - Parameter
     - Default value
   * - pattern
     - None
   * - distance
     - "affine"
   * - match
     - 0
   * - gap_opening
     - 6
   * - gap_extension
     - 2
   * - gap_opening2
     - 24
   * - gap_extension2
     - 1
   * - scope
     - "full"
   * - span
     - "ends-free"
   * - pattern_begin_free
     - 0
   * - pattern_end_free
     - 0
   * - text_begin_free
     - 0
   * - text_end_free
     - 0
   * - heuristic
     - None
   * - min_wavefront_length
     - 10
   * - max_distance_threshold
     - 50
   * - steps_between_cutoffs
     - 1
   * - xdrop
     - 20
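
The score of -24 reported in the Overview can be reproduced from its cigar under the
default "affine" penalties listed above. The sketch below is an illustration only:
`gap_affine_penalty` is a hypothetical helper, a gap of length n is assumed to cost
gap_opening + n * gap_extension, and the mismatch penalty of 4 is an assumption (it is not
listed in the table; it is the value that reproduces the documented score):

.. code-block:: python

    def gap_affine_penalty(cigartuples, mismatch=4, gap_opening=6, gap_extension=2):
        """Sum the penalties of a cigar; matches cost 0 under the default match=0."""
        penalty = 0
        for code, length in cigartuples:
            if code == 8:  # X: each mismatched base is penalised (assumed value)
                penalty += mismatch * length
            elif code in (1, 2):  # I or D: one gap opening plus one extension per base
                penalty += gap_opening + gap_extension * length
        return penalty

    tuples = [(0, 3), (8, 1), (0, 4), (2, 1), (0, 7), (1, 1), (0, 9), (8, 1), (0, 6)]
    score = -gap_affine_penalty(tuples)
    print(score)  # -24

Here the two mismatches contribute 4 each and the two single-base gaps contribute 6 + 2
each, giving a total penalty of 24.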


Modifying the cigar
-------------------

If desired, the cigar can be modified so that the end operation is either a soft-clip or a match. This makes the
alignment cigar resemble those produced by bwa, for example:

.. code-block:: python

    from pywfa import WavefrontAligner, cigartuples_to_str

    pattern = "AAAAACCTTTTTAAAAAA"
    text = "GGCCAAAAACCAAAAAA"
    a = WavefrontAligner(pattern)

    res = a(text, clip_cigar=False)
    print(cigartuples_to_str(res.cigartuples))
    >>> 4I7M5D6M

    res = a(text, clip_cigar=True)
    print(cigartuples_to_str(res.cigartuples))
    >>> 4S7M5D6M


An experimental feature is to trim short matches at the end of alignments. This results in alignments that approximate local alignments:

.. code-block:: python

    pattern = "AAAAAAAAAAAACCTTTTAAAAAAGAAAAAAA"
    text = "ACCCCCCCCCCCAAAAACCAAAAAAAAAAAAA"
    a = WavefrontAligner(pattern)

    # The unmodified cigar may have short matches at the end:
    res = a(text, clip_cigar=False)
    res.cigartuples
    >>> [(0, 1), (1, 5), (8, 6), (0, 7), (2, 5), (0, 5), (8, 1), (0, 7)]
    res.aligned_text
    >>> ACCCCCCCCCCCAAAAACCAAAAAAAAAAAAA
    res.text_start, res.text_end
    >>> 0, 32

    # The minimum allowed block of matches can be set at e.g. 5 bp, which will trim off short matches
    res = a(text, clip_cigar=True, min_aligned_bases_left=5, min_aligned_bases_right=5)
    res.cigartuples
    >>> [(4, 12), (0, 7), (2, 5), (0, 5), (8, 1), (0, 7)]
    res.aligned_text
    >>> AAAAACCAAAAAAAAAAAAA
    res.text_start, res.text_end
    >>> 12, 32

    # Mismatch operations X can also be elided, note this occurs after the clip_cigar stage
    res = a(text, clip_cigar=True, min_aligned_bases_left=5, min_aligned_bases_right=5, elide_mismatches=True)
    res.cigartuples
    >>> [(4, 12), (0, 7), (2, 5), (0, 13)]
    res.aligned_text
    >>> AAAAACCAAAAAAAAAAAAA

Note: the alignment score is currently not modified by trimming the cigar. However, pattern_start, pattern_end,
text_start and text_end are updated when the cigar is modified.
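
The effect of eliding mismatches and the reported text coordinates can be checked against
the example above with a short plain-Python sketch. `elide_mismatches` and `text_span` are
hypothetical helpers written for illustration, not pywfa functions, and they assume that a
leading soft-clip counts text bases and that M, I, = and X each consume text bases:

.. code-block:: python

    def elide_mismatches(cigartuples):
        """Rewrite X (8) as M (0) and merge adjacent M blocks."""
        merged = []
        for code, length in cigartuples:
            if code == 8:
                code = 0
            if merged and merged[-1][0] == 0 and code == 0:
                merged[-1] = (0, merged[-1][1] + length)
            else:
                merged.append((code, length))
        return merged

    def text_span(cigartuples):
        """Return (text_start, text_end) implied by a cigar with optional soft-clips."""
        start = cigartuples[0][1] if cigartuples[0][0] == 4 else 0
        consumed = sum(length for code, length in cigartuples if code in (0, 1, 7, 8))
        return start, start + consumed

    text = "ACCCCCCCCCCCAAAAACCAAAAAAAAAAAAA"
    clipped = [(4, 12), (0, 7), (2, 5), (0, 5), (8, 1), (0, 7)]

    print(elide_mismatches(clipped))  # [(4, 12), (0, 7), (2, 5), (0, 13)]
    start, end = text_span(clipped)
    print(start, end)                 # 12 32
    print(text[start:end])            # AAAAACCAAAAAAAAAAAAA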

            

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "pywfa",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "",
    "author": "",
    "author_email": "Kez Cleal <clealk@cardiff.ac.uk>",
    "download_url": "https://files.pythonhosted.org/packages/93/40/ec1c77237515eb618ba7407d4ac954b9dd127808a303cd85b0928f8dc12a/pywfa-0.5.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Align sequences using WFA2-lib",
    "version": "0.5.1",
    "project_urls": {
        "Repository": "https://github.com/kcleal/pywfa"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9340ec1c77237515eb618ba7407d4ac954b9dd127808a303cd85b0928f8dc12a",
                "md5": "9a09de286de428aea98b3f0fc3ca9b35",
                "sha256": "e972bf53f9e6d8957e9105ecc22cf704ac4bfad4d882d79c82f11fc260381483"
            },
            "downloads": -1,
            "filename": "pywfa-0.5.1.tar.gz",
            "has_sig": false,
            "md5_digest": "9a09de286de428aea98b3f0fc3ca9b35",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 3496881,
            "upload_time": "2023-06-15T10:37:33",
            "upload_time_iso_8601": "2023-06-15T10:37:33.414047Z",
            "url": "https://files.pythonhosted.org/packages/93/40/ec1c77237515eb618ba7407d4ac954b9dd127808a303cd85b0928f8dc12a/pywfa-0.5.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-06-15 10:37:33",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "kcleal",
    "github_project": "pywfa",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "pywfa"
}
        