:Name: massgenotyping
:Version: 0.2.2
:Summary: Python package for microsatellite genotyping from amplicon sequencing data
:Upload time: 2024-01-12 03:39:47
:Author: Tetsuo Kohyama
:Requires Python: >=3.8
:License: MIT
:Keywords: ngs, amplicon sequencing, genotyping, microsatellite

==============
massgenotyping
==============

.. image:: https://badge.fury.io/py/massgenotyping.svg
    :target: https://badge.fury.io/py/massgenotyping
    :alt: PyPI version

.. image:: https://img.shields.io/pypi/pyversions/massgenotyping.svg
    :target: https://pypi.org/project/massgenotyping
    :alt: Python versions

.. image:: https://img.shields.io/pypi/l/massgenotyping.svg
    :target: https://pypi.org/project/massgenotyping
    :alt: License


Python package for microsatellite genotyping from highly multiplexed amplicon sequencing data


Features
--------

* Semi-automatic genotyping optimized for amplicon sequencing data of microsatellite loci

* Visual genotyping with interactive plots

* Fast SSR search in sequences

* Automatic grouping and naming of alleles based on polymorphisms in both SSR and non-SSR regions

* Support for multi-core processing


Requirements
------------

* Python 3.8 or higher

* `NGmerge <https://github.com/jsh58/NGmerge>`_

* `MAFFT <https://mafft.cbrc.jp/alignment/software/>`_

* BLASTn (included in `BLAST+ <https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download>`_ command line applications provided by NCBI)

* Optional: `ripgrep <https://github.com/BurntSushi/ripgrep>`_
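
As a quick pre-flight check, the sketch below reports which of these external tools are visible on :code:`PATH`.
The executable names (:code:`NGmerge`, :code:`mafft`, :code:`blastn`, :code:`rg`) are the usual defaults and are an assumption here; the helper :code:`check_tools` is hypothetical and not part of massgenotyping.

.. code:: bash

    #!/usr/bin/env bash
    # Report which of the given executables can be found on PATH.
    # Returns 0 if all were found, 1 if at least one is missing.
    check_tools() {
        local tool missing=0
        for tool in "$@"; do
            if command -v "$tool" >/dev/null 2>&1; then
                echo "found:   $tool"
            else
                echo "missing: $tool"
                missing=1
            fi
        done
        return "$missing"
    }

    # Required tools (assumed executable names).
    check_tools NGmerge mafft blastn || echo "Install the missing required tools before running mgt."
    # Optional tool: rg is the ripgrep executable.
    check_tools rg || echo "ripgrep is optional; continuing without it."

If a tool is installed under a different name or outside :code:`PATH`, the check reports it as missing even though it may be usable.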


Installation
------------

Install from PyPI

.. code:: bash

    pip install massgenotyping

Install the latest version from the GitHub repository

.. code:: bash

    pip install git+https://github.com/kohyamat/massgenotyping


Usage
-----

.. code:: bash

    mgt [-h] SUBCOMMAND [OPTIONS]

**Subcommand list:**

* :code:`demultiplex`: demultiplex raw amplicon sequences based on primer sequences

* :code:`merge-pairs`: merge paired-end reads

* :code:`denoise`: reduce any noise that may have been generated during sequencing and PCR

* :code:`filter`: filter out erroneous sequence variants and screen for putative alleles

* :code:`allele-check`: check allele candidates and create an allele database

* :code:`allele-call`: assign alleles to raw amplicon sequences

* :code:`show-alignment`: show a sequence alignment

To see the options available for a subcommand, run :code:`mgt SUBCOMMAND -h`.
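
To browse the help text of every subcommand in one go, a loop over the names listed above can be used.
This is a minimal sketch; the array name :code:`subcommands` is only a convention of this example, and the guard assumes that :code:`mgt` is installed on :code:`PATH`.

.. code:: bash

    #!/usr/bin/env bash
    # Subcommand names as listed in this README.
    subcommands=(demultiplex merge-pairs denoise filter allele-check allele-call show-alignment)

    if command -v mgt >/dev/null 2>&1; then
        for sub in "${subcommands[@]}"; do
            echo "===== mgt $sub ====="
            mgt "$sub" -h
        done
    else
        # Nothing to show if the package is not installed yet.
        echo "mgt not found on PATH; install massgenotyping first." >&2
    fi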


Tutorials with example data
---------------------------

Here's a step-by-step tutorial using the `example data <https://github.com/kohyamat/massgenotyping/tree/master/examples>`_.

**1. Demultiplex raw amplicon sequences based on primer sequences**

As a first step, the sequence data is split by marker based on the primer sequences.
The input can be one or two sequence files in the FASTQ format, or a directory containing multiple sequence files.
Primer sequences can be read from CSV or FASTA files.
Please check the example data for the format of the input data.

.. code:: bash

    mgt demultiplex examples/sequence_data -g "*_R[12]_*" -m examples/marker_data.csv

The result files are written in subdirectories within the output directory (:code:`./project` by default) for each marker.
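
The pattern given to :code:`-g` is quoted so that the shell passes it to :code:`mgt` unexpanded.
Assuming it is matched against file names as a shell-style glob (the tutorial does not spell this out), the sketch below shows which names such a pattern would select.
The file names and the helper :code:`matches_pattern` are hypothetical.

.. code:: bash

    #!/usr/bin/env bash
    # Return 0 if the file name ($1) matches the glob pattern ($2).
    matches_pattern() {
        case "$1" in
            $2) return 0 ;;   # unquoted on purpose: $2 is used as a pattern
            *)  return 1 ;;
        esac
    }

    pattern='*_R[12]_*'
    for name in sample1_R1_001.fastq.gz sample1_R2_001.fastq.gz sample1_I1_001.fastq.gz; do
        if matches_pattern "$name" "$pattern"; then
            echo "match:    $name"
        else
            echo "no match: $name"
        fi
    done

Here the two read files (:code:`_R1_` and :code:`_R2_`) match, while the index file (:code:`_I1_`) does not.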

**2. Merge paired-end reads and trim primer sequences**

For paired-end sequencing data, the read pairs are merged using the NGmerge program.
The following command merges the read pairs and then removes the primer sequences.

.. code:: bash

    mgt merge-pairs ./project -m examples/marker_data.csv --trim-primer

For single-end data, this step can be skipped. Primer sequences can also be removed in step 1.

**3. Reduce noise (optional but recommended)**

This step corrects any noise (very low-frequency point mutations) that may have been generated during sequencing or PCR.
This step is not necessarily required, but it will make the following step easier.

.. code:: bash

    mgt denoise ./project/*/*_merged.fastq.gz

**4. Filter out erroneous sequence variants**

In this step, the sequences of putative alleles are extracted for each marker in each sample,
while removing any erroneous sequence variants, such as 'stutter' sequences.
After some rough filtering, an interactive plot allows you to choose which sequence variants to keep.
You can skip this visual-checking procedure with the :code:`--force-no-visual-check` option.

.. code:: bash

    mgt filter ./project -m examples/marker_data.csv
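
For unattended runs, the option mentioned above can be appended to the same command.
The sketch below only prints the command in both variants; the function :code:`filter_command` and its :code:`batch` switch are conventions of this example, not part of :code:`mgt`.

.. code:: bash

    #!/usr/bin/env bash
    # Print the step-4 command; add --force-no-visual-check when called with "batch".
    filter_command() {
        local cmd=(mgt filter ./project -m examples/marker_data.csv)
        if [ "${1:-}" = "batch" ]; then
            cmd+=(--force-no-visual-check)
        fi
        echo "${cmd[*]}"
    }

    filter_command          # interactive run, as in the tutorial
    filter_command batch    # non-interactive run

Skipping the visual check means that only the automatic filtering decides which sequence variants are kept.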

**5. Check a multiple sequence alignment and make an allele database**

The database is created after checking the alignment of the putative allele sequences.
If necessary, you can further filter out the erroneous sequence variants.

.. code:: bash

    mgt allele-check ./project


**6. Assign alleles to raw amplicon sequences**

Finally, the following command performs a BLASTn search against the database created for each marker and assigns alleles to the raw sequence data.
The genotype tables are created within the output directory.

.. code:: bash

    mgt allele-call ./project -m examples/marker_data.csv
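
The six steps above can be collected into one script.
This is a sketch that reuses the tutorial commands unchanged; the :code:`run` wrapper and the :code:`DRY_RUN` variable are conventions of this example (not features of :code:`mgt`), and as written the script only prints the commands.

.. code:: bash

    #!/usr/bin/env bash
    set -euo pipefail

    # Paths taken from the tutorial; adjust them for your own data.
    SEQ_DIR="examples/sequence_data"
    MARKERS="examples/marker_data.csv"
    PROJECT="./project"

    # Run a command, or only print it when DRY_RUN=1.
    run() {
        if [ "${DRY_RUN:-0}" = "1" ]; then
            printf '%s\n' "$*"
        else
            "$@"
        fi
    }

    run_pipeline() {
        run mgt demultiplex "$SEQ_DIR" -g "*_R[12]_*" -m "$MARKERS"   # step 1
        run mgt merge-pairs "$PROJECT" -m "$MARKERS" --trim-primer    # step 2
        run mgt denoise "$PROJECT"/*/*_merged.fastq.gz                # step 3
        run mgt filter "$PROJECT" -m "$MARKERS"                       # step 4
        run mgt allele-check "$PROJECT"                               # step 5
        run mgt allele-call "$PROJECT" -m "$MARKERS"                  # step 6
    }

    # Print the commands without executing them.
    DRY_RUN=1 run_pipeline

Calling :code:`run_pipeline` without :code:`DRY_RUN=1` executes the steps in order and stops at the first failure.
Steps 4 and 5 involve visual checks, so the script is not fully unattended.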

Screenshots
-----------

.. image:: https://user-images.githubusercontent.com/6261781/78501753-205e3280-7798-11ea-98ce-32a4f631bb05.png
   :scale: 50%
   :alt: Figure 1

**Figure 1.** Checking the multiple sequence alignment across the samples (*STEP 5*).

.. image:: https://user-images.githubusercontent.com/6261781/78501825-877be700-7798-11ea-8382-3b991a42502f.png
   :scale: 50%
   :alt: Figure 2

**Figure 2.** Visual genotyping (*STEP 6*).


Contributing to massgenotyping
------------------------------

Contributions of any kind are welcome!


License
-------

`MIT <LICENSE>`_

            
