mappy


Namemappy JSON
Version 2.28 PyPI version JSON
download
home_pagehttps://github.com/lh3/minimap2
SummaryMinimap2 python binding
upload_time2024-03-27 15:13:20
maintainerNone
docs_urlNone
authorHeng Li
requires_pythonNone
licenseMIT
keywords sequence-alignment
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            ==============================
Mappy: Minimap2 Python Binding
==============================

Mappy provides a convenient interface to `minimap2
<https://github.com/lh3/minimap2>`_, a fast and accurate C program to align
genomic and transcribe nucleotide sequences.

Installation
------------

Mappy depends on `zlib <http://zlib.net>`_. It can be installed with `pip
<https://en.wikipedia.org/wiki/Pip_(package_manager)>`_:

.. code:: shell

	pip install --user mappy

or from the minimap2 github repo (`Cython <http://cython.org>`_ required):

.. code:: shell

	git clone https://github.com/lh3/minimap2
	cd minimap2
	python setup.py install

Usage
-----

The following Python script demonstrates the key functionality of mappy:

.. code:: python

	import mappy as mp
	a = mp.Aligner("test/MT-human.fa")  # load or build index
	if not a: raise Exception("ERROR: failed to load/build index")
	s = a.seq("MT_human", 100, 200)     # retrieve a subsequence from the index
	print(mp.revcomp(s))                # reverse complement
	for name, seq, qual in mp.fastx_read("test/MT-orang.fa"): # read a fasta/q sequence
		for hit in a.map(seq): # traverse alignments
			print("{}\t{}\t{}\t{}".format(hit.ctg, hit.r_st, hit.r_en, hit.cigar_str))

APIs
----

Mappy implements two classes and two global function.

Class mappy.Aligner
~~~~~~~~~~~~~~~~~~~

.. code:: python

	mappy.Aligner(fn_idx_in=None, preset=None, ...)

This constructor accepts the following arguments:

* **fn_idx_in**: index or sequence file name. Minimap2 automatically tests the
  file type. If a sequence file is provided, minimap2 builds an index. The
  sequence file can be optionally gzip'd. This option has no effect if **seq**
  is set.

* **seq**: a single sequence to index. The sequence name will be set to
  :code:`N/A`.

* **preset**: minimap2 preset. Currently, minimap2 supports the following
  presets: **sr** for single-end short reads; **map-pb** for PacBio
  read-to-reference mapping; **map-ont** for Oxford Nanopore read mapping;
  **splice** for long-read spliced alignment; **asm5** for assembly-to-assembly
  alignment; **asm10** for full genome alignment of closely related species. Note
  that the Python module does not support all-vs-all read overlapping.

* **k**: k-mer length, no larger than 28

* **w**: minimizer window size, no larger than 255

* **min_cnt**: mininum number of minimizers on a chain

* **min_chain_score**: minimum chaing score

* **bw**: chaining and alignment band width (initial chaining and extension)

* **bw_long**: chaining and alignment band width (RMQ-based rechaining and closing gaps)

* **best_n**: max number of alignments to return

* **n_threads**: number of indexing threads; 3 by default

* **extra_flags**: additional flags defined in minimap.h

* **fn_idx_out**: name of file to which the index is written. This parameter
  has no effect if **seq** is set.

* **scoring**: scoring system. It is a tuple/list consisting of 4, 6 or 7
  positive integers. The first 4 elements specify match scoring, mismatch
  penalty, gap open and gap extension penalty. The 5th and 6th elements, if
  present, set long-gap open and long-gap extension penalty. The 7th sets a
  mismatch penalty involving ambiguous bases.

.. code:: python

	mappy.Aligner.map(seq, seq2=None, cs=False, MD=False)

This method aligns :code:`seq` against the index. It is a generator, *yielding*
a series of :code:`mappy.Alignment` objects. If :code:`seq2` is present, mappy
performs paired-end alignment, assuming the two ends are in the FR orientation.
Alignments of the two ends can be distinguished by the :code:`read_num` field
(see Class mappy.Alignment below). Argument :code:`cs` asks mappy to generate
the :code:`cs` tag; :code:`MD` is similar. These two arguments might slightly
degrade performance and are not enabled by default.

.. code:: python

	mappy.Aligner.seq(name, start=0, end=0x7fffffff)

This method retrieves a (sub)sequence from the index and returns it as a Python
string. :code:`None` is returned if :code:`name` is not present in the index or
the start/end coordinates are invalid.

.. code:: python

	mappy.Aligner.seq_names

This property gives the array of sequence names in the index.

Class mappy.Alignment
~~~~~~~~~~~~~~~~~~~~~

This class describes an alignment. An object of this class has the following
properties:

* **ctg**: name of the reference sequence the query is mapped to

* **ctg_len**: total length of the reference sequence

* **r_st** and **r_en**: start and end positions on the reference

* **q_st** and **q_en**: start and end positions on the query

* **strand**: +1 if on the forward strand; -1 if on the reverse strand

* **mapq**: mapping quality

* **blen**: length of the alignment, including both alignment matches and gaps
  but excluding ambiguous bases.

* **mlen**: length of the matching bases in the alignment, excluding ambiguous
  base matches.

* **NM**: number of mismatches, gaps and ambiguous positions in the alignment

* **trans_strand**: transcript strand. +1 if on the forward strand; -1 if on the
  reverse strand; 0 if unknown

* **is_primary**: if the alignment is primary (typically the best and the first
  to generate)

* **read_num**: read number that the alignment corresponds to; 1 for the first
  read and 2 for the second read

* **cigar_str**: CIGAR string

* **cigar**: CIGAR returned as an array of shape :code:`(n_cigar,2)`. The two
  numbers give the length and the operator of each CIGAR operation.

* **MD**: the :code:`MD` tag as in the SAM format. It is an empty string unless
  the :code:`MD` argument is applied when calling :code:`mappy.Aligner.map()`.

* **cs**: the :code:`cs` tag.

An :code:`Alignment` object can be converted to a string with :code:`str()` in
the following format:

::

	q_st  q_en  strand  ctg  ctg_len  r_st  r_en  mlen  blen  mapq  cg:Z:cigar_str

It is effectively the PAF format without the QueryName and QueryLength columns
(the first two columns in PAF).

Miscellaneous Functions
~~~~~~~~~~~~~~~~~~~~~~~

.. code:: python

	mappy.fastx_read(fn, read_comment=False)

This generator function opens a FASTA/FASTQ file and *yields* a
:code:`(name,seq,qual)` tuple for each sequence entry. The input file may be
optionally gzip'd. If :code:`read_comment` is True, this generator yields
a :code:`(name,seq,qual,comment)` tuple instead.

.. code:: python

	mappy.revcomp(seq)

Return the reverse complement of DNA string :code:`seq`. This function
recognizes IUB code and preserves the letter cases. Uracil :code:`U` is
complemented to :code:`A`.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/lh3/minimap2",
    "name": "mappy",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "sequence-alignment",
    "author": "Heng Li",
    "author_email": "lh3@me.com",
    "download_url": "https://files.pythonhosted.org/packages/90/48/9818341371f20a2644267a1769aaecb7642f74839192afad278a38e0fe14/mappy-2.28.tar.gz",
    "platform": null,
    "description": "==============================\nMappy: Minimap2 Python Binding\n==============================\n\nMappy provides a convenient interface to `minimap2\n<https://github.com/lh3/minimap2>`_, a fast and accurate C program to align\ngenomic and transcribe nucleotide sequences.\n\nInstallation\n------------\n\nMappy depends on `zlib <http://zlib.net>`_. It can be installed with `pip\n<https://en.wikipedia.org/wiki/Pip_(package_manager)>`_:\n\n.. code:: shell\n\n\tpip install --user mappy\n\nor from the minimap2 github repo (`Cython <http://cython.org>`_ required):\n\n.. code:: shell\n\n\tgit clone https://github.com/lh3/minimap2\n\tcd minimap2\n\tpython setup.py install\n\nUsage\n-----\n\nThe following Python script demonstrates the key functionality of mappy:\n\n.. code:: python\n\n\timport mappy as mp\n\ta = mp.Aligner(\"test/MT-human.fa\")  # load or build index\n\tif not a: raise Exception(\"ERROR: failed to load/build index\")\n\ts = a.seq(\"MT_human\", 100, 200)     # retrieve a subsequence from the index\n\tprint(mp.revcomp(s))                # reverse complement\n\tfor name, seq, qual in mp.fastx_read(\"test/MT-orang.fa\"): # read a fasta/q sequence\n\t\tfor hit in a.map(seq): # traverse alignments\n\t\t\tprint(\"{}\\t{}\\t{}\\t{}\".format(hit.ctg, hit.r_st, hit.r_en, hit.cigar_str))\n\nAPIs\n----\n\nMappy implements two classes and two global function.\n\nClass mappy.Aligner\n~~~~~~~~~~~~~~~~~~~\n\n.. code:: python\n\n\tmappy.Aligner(fn_idx_in=None, preset=None, ...)\n\nThis constructor accepts the following arguments:\n\n* **fn_idx_in**: index or sequence file name. Minimap2 automatically tests the\n  file type. If a sequence file is provided, minimap2 builds an index. The\n  sequence file can be optionally gzip'd. This option has no effect if **seq**\n  is set.\n\n* **seq**: a single sequence to index. The sequence name will be set to\n  :code:`N/A`.\n\n* **preset**: minimap2 preset. Currently, minimap2 supports the following\n  presets: **sr** for single-end short reads; **map-pb** for PacBio\n  read-to-reference mapping; **map-ont** for Oxford Nanopore read mapping;\n  **splice** for long-read spliced alignment; **asm5** for assembly-to-assembly\n  alignment; **asm10** for full genome alignment of closely related species. Note\n  that the Python module does not support all-vs-all read overlapping.\n\n* **k**: k-mer length, no larger than 28\n\n* **w**: minimizer window size, no larger than 255\n\n* **min_cnt**: mininum number of minimizers on a chain\n\n* **min_chain_score**: minimum chaing score\n\n* **bw**: chaining and alignment band width (initial chaining and extension)\n\n* **bw_long**: chaining and alignment band width (RMQ-based rechaining and closing gaps)\n\n* **best_n**: max number of alignments to return\n\n* **n_threads**: number of indexing threads; 3 by default\n\n* **extra_flags**: additional flags defined in minimap.h\n\n* **fn_idx_out**: name of file to which the index is written. This parameter\n  has no effect if **seq** is set.\n\n* **scoring**: scoring system. It is a tuple/list consisting of 4, 6 or 7\n  positive integers. The first 4 elements specify match scoring, mismatch\n  penalty, gap open and gap extension penalty. The 5th and 6th elements, if\n  present, set long-gap open and long-gap extension penalty. The 7th sets a\n  mismatch penalty involving ambiguous bases.\n\n.. code:: python\n\n\tmappy.Aligner.map(seq, seq2=None, cs=False, MD=False)\n\nThis method aligns :code:`seq` against the index. It is a generator, *yielding*\na series of :code:`mappy.Alignment` objects. If :code:`seq2` is present, mappy\nperforms paired-end alignment, assuming the two ends are in the FR orientation.\nAlignments of the two ends can be distinguished by the :code:`read_num` field\n(see Class mappy.Alignment below). Argument :code:`cs` asks mappy to generate\nthe :code:`cs` tag; :code:`MD` is similar. These two arguments might slightly\ndegrade performance and are not enabled by default.\n\n.. code:: python\n\n\tmappy.Aligner.seq(name, start=0, end=0x7fffffff)\n\nThis method retrieves a (sub)sequence from the index and returns it as a Python\nstring. :code:`None` is returned if :code:`name` is not present in the index or\nthe start/end coordinates are invalid.\n\n.. code:: python\n\n\tmappy.Aligner.seq_names\n\nThis property gives the array of sequence names in the index.\n\nClass mappy.Alignment\n~~~~~~~~~~~~~~~~~~~~~\n\nThis class describes an alignment. An object of this class has the following\nproperties:\n\n* **ctg**: name of the reference sequence the query is mapped to\n\n* **ctg_len**: total length of the reference sequence\n\n* **r_st** and **r_en**: start and end positions on the reference\n\n* **q_st** and **q_en**: start and end positions on the query\n\n* **strand**: +1 if on the forward strand; -1 if on the reverse strand\n\n* **mapq**: mapping quality\n\n* **blen**: length of the alignment, including both alignment matches and gaps\n  but excluding ambiguous bases.\n\n* **mlen**: length of the matching bases in the alignment, excluding ambiguous\n  base matches.\n\n* **NM**: number of mismatches, gaps and ambiguous positions in the alignment\n\n* **trans_strand**: transcript strand. +1 if on the forward strand; -1 if on the\n  reverse strand; 0 if unknown\n\n* **is_primary**: if the alignment is primary (typically the best and the first\n  to generate)\n\n* **read_num**: read number that the alignment corresponds to; 1 for the first\n  read and 2 for the second read\n\n* **cigar_str**: CIGAR string\n\n* **cigar**: CIGAR returned as an array of shape :code:`(n_cigar,2)`. The two\n  numbers give the length and the operator of each CIGAR operation.\n\n* **MD**: the :code:`MD` tag as in the SAM format. It is an empty string unless\n  the :code:`MD` argument is applied when calling :code:`mappy.Aligner.map()`.\n\n* **cs**: the :code:`cs` tag.\n\nAn :code:`Alignment` object can be converted to a string with :code:`str()` in\nthe following format:\n\n::\n\n\tq_st  q_en  strand  ctg  ctg_len  r_st  r_en  mlen  blen  mapq  cg:Z:cigar_str\n\nIt is effectively the PAF format without the QueryName and QueryLength columns\n(the first two columns in PAF).\n\nMiscellaneous Functions\n~~~~~~~~~~~~~~~~~~~~~~~\n\n.. code:: python\n\n\tmappy.fastx_read(fn, read_comment=False)\n\nThis generator function opens a FASTA/FASTQ file and *yields* a\n:code:`(name,seq,qual)` tuple for each sequence entry. The input file may be\noptionally gzip'd. If :code:`read_comment` is True, this generator yields\na :code:`(name,seq,qual,comment)` tuple instead.\n\n.. code:: python\n\n\tmappy.revcomp(seq)\n\nReturn the reverse complement of DNA string :code:`seq`. This function\nrecognizes IUB code and preserves the letter cases. Uracil :code:`U` is\ncomplemented to :code:`A`.\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Minimap2 python binding",
    "version": "2.28",
    "project_urls": {
        "Homepage": "https://github.com/lh3/minimap2"
    },
    "split_keywords": [
        "sequence-alignment"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "90489818341371f20a2644267a1769aaecb7642f74839192afad278a38e0fe14",
                "md5": "cf1c5b88cfdcb1b3cad75f67725c1851",
                "sha256": "0ebf7a5d62bd668f5456028215e26176e180ca68161ac18d4f7b48045484cebb"
            },
            "downloads": -1,
            "filename": "mappy-2.28.tar.gz",
            "has_sig": false,
            "md5_digest": "cf1c5b88cfdcb1b3cad75f67725c1851",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 136481,
            "upload_time": "2024-03-27T15:13:20",
            "upload_time_iso_8601": "2024-03-27T15:13:20.206877Z",
            "url": "https://files.pythonhosted.org/packages/90/48/9818341371f20a2644267a1769aaecb7642f74839192afad278a38e0fe14/mappy-2.28.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-27 15:13:20",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "lh3",
    "github_project": "minimap2",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "mappy"
}
        
Elapsed time: 0.52131s