mercat2

- Name: mercat2
- Version: 1.4.1
- Home page: https://github.com/raw-lab/mercat2
- Summary: versatile k-mer counter and diversity estimator for database independent property analysis (DIPA) for multi-omic analysis
- Upload time: 2024-03-05 17:41:58
- Authors: Jose L. Figueroa III, Richard A. White III
- Requires Python: >=3.9
- License: BSD License

# MerCat2: Python code for versatile k-mer counter for database independent property analysis (DIPA) for omic analysis

[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/mercat2/README.html)

![GitHub Logo](https://github.com/raw-lab/mercat2/blob/master/MerCat2.jpg)

## Installing MerCat2

### Option 1: Bioconda Installer

#### Install mamba using conda

```bash
conda activate base
conda install mamba
```

- NOTE: Make sure you install mamba in your base conda environment
- We have found that mamba is faster than conda for installing packages and creating environments. Using conda might fail to resolve dependencies.

#### Install MerCat2

- Available via Bioconda:

```bash
mamba create -n mercat2 -c conda-forge -c bioconda mercat2
conda activate mercat2
```

### Option 2: Source Installer

- Clone mercat2 from GitHub
- Run install_mercat2.sh to install all required dependencies
- This script creates a conda environment for you

```bash
git clone https://github.com/raw-lab/mercat2.git
cd mercat2
bash install_mercat2.sh
conda activate mercat2
```

## Dependencies

MerCat2 runs on Python version 3.9 and higher.

### External dependencies

Depending on the options used, MerCat2 can run without any external dependencies.  

Required dependencies:

- When a raw read .fastq file is given
  - [Fastqc](<https://www.bioinformatics.babraham.ac.uk/projects/fastqc/>)
  - [Fastp](<https://github.com/OpenGene/fastp>)

- For bacteria/archaea rich samples (-prod option)
  - [Prodigal](<https://github.com/hyattpd/Prodigal>)

- For eukaryote rich samples or general applications (-fgs option)
  - [FragGeneScanRs](<https://github.com/unipept/FragGeneScanRs>)

These are available through Bioconda, except FragGeneScanRs, which is included in the MerCat2 distribution.  

```bash
conda install -c bioconda fastqc fastp prodigal
```

## Usage

```bash
usage: mercat2.py [-h] [-i I [I ...]] [-f F] -k K [-n N] [-c C] [-prod] [-fgs] [-s S] [-o O] [-replace] [-lowmem LOWMEM] [-skipclean] [-toupper] [-pca] [--version]

options:
  -h, --help      show this help message and exit
  -i I            path to input file(s)
  -f F            path to folder containing input files
  -k K            kmer length
  -n N            no of cores [auto detect]
  -c C            minimum kmer count [10]
  -prod           run Prodigal on fasta files
  -fgs            run FragGeneScanRS on fasta files
  -s S            Split into x MB files. [100]
  -o O            Output folder, default = 'mercat_results' in current directory
  -replace        Replace existing output directory [False]
  -lowmem LOWMEM  Flag to use incremental PCA when low memory is available. [auto]
  -skipclean      skip trimming of fastq files
  -toupper        convert all input sequences to uppercase
  -pca            create interactive PCA plot of the samples (minimum of 4 fasta files required)
  --version, -v   show the version number and exit
```

MerCat2 infers the input file format from the file extension:

- raw fastq file: ['.fastq', '.fq']
- nucleotide fasta: ['.fa', '.fna', '.ffn', '.fasta']
- amino acid fasta: ['.faa']

- Gzipped versions of these file types are also accepted, with an added '.gz' suffix
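
The extension rules above can be sketched in a few lines of Python. This is only an illustration of the mapping described in this README, not MerCat2's actual code: the function name `guess_file_type` and the returned labels are hypothetical.

```python
from pathlib import Path

# Extension groups copied from the list above.
FASTQ_EXT = {".fastq", ".fq"}
NUCLEOTIDE_EXT = {".fa", ".fna", ".ffn", ".fasta"}
PROTEIN_EXT = {".faa"}


def guess_file_type(filename: str) -> str:
    """Return 'fastq', 'nucleotide', 'protein', or 'unknown' (hypothetical labels)."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    # Ignore a trailing '.gz' so gzipped files are classified by the inner extension.
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]
    ext = suffixes[-1] if suffixes else ""
    if ext in FASTQ_EXT:
        return "fastq"
    if ext in NUCLEOTIDE_EXT:
        return "nucleotide"
    if ext in PROTEIN_EXT:
        return "protein"
    return "unknown"


print(guess_file_type("reads.fq.gz"))
print(guess_file_type("file-name.faa"))
```

Because the format is decided by the extension alone, a protein fasta saved as `.fasta` would be treated as nucleotide input under this scheme, so naming files correctly matters.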

## Usage examples

### Run mercat2 on a protein file (protein fasta - '.faa')

```bash
mercat2.py -i file-name.faa -k 3 -c 10
```

### Run mercat2 on a nucleotide file (nucleotide fasta - '.fa', '.fna', '.ffn', '.fasta')

```bash
mercat2.py -i file-name.fna -k 3 -n 8 -c 10
```
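
To make the `-k` and `-c` options concrete, here is a minimal Python sketch of sliding-window k-mer counting with a minimum-count filter. It is an assumption-laden illustration, not MerCat2's implementation: the function `count_kmers` is hypothetical, and details such as handling of ambiguous bases or reverse complements are ignored.

```python
from collections import Counter


def count_kmers(sequences, k, min_count=1):
    """Count every length-k substring and keep those seen at least min_count times."""
    counts = Counter()
    for seq in sequences:
        # Slide a window of width k across the sequence, one position at a time.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


# "ATGATG" contains the 3-mers ATG, TGA, GAT, ATG.
print(count_kmers(["ATGATG"], k=3))
print(count_kmers(["ATGATG"], k=3, min_count=2))
```

With `min_count=2` only `ATG` survives, which mirrors how a threshold like `-c 10` drops rare k-mers from the count tables.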

### Run mercat2 on raw nucleotide read data (nucleotide fastq - '.fastq', '.fq')

```bash
mercat2.py -i file-name.fastq -k 3 -n 8 -c 10
```

### Run on many samples within a folder

```bash
mercat2.py -f /path/to/input-folder -k 3 -n 8 -c 10
```

### Run on a sample with the Prodigal or FragGeneScanRs option (raw reads or nucleotide contigs - '.fa', '.fna', '.ffn', '.fasta', '.fastq')

```bash
mercat2.py -i /path/to/input-file -k 3 -n 8 -c 10 -prod
```

```bash
mercat2.py -i /path/to/input-file -k 3 -n 8 -c 10 -fgs
```

- The -prod and -fgs options run the k-mer counter on both the contigs and the amino acid sequences produced from them

## Outputs

- Results are stored in the output folder (default: 'mercat_results' in the current working directory).
  - The 'report' folder contains an HTML report with interactive Plotly figures.
    - If at least 4 samples are provided, a PCA plot is included in the HTML report.
  - The 'tsv' folder contains count tables in tab-separated format.
    - If protein files are given, or the -prod option is used, a .tsv file is created for each sample containing k-mer count, pI, molecular weight, and hydrophobicity metrics.
    - If nucleotide files are given, a .tsv file is created for each sample containing k-mer count and GC content.
  - If .fastq raw read files are used, a 'clean' folder is created with the cleaned fasta file.
  - If the -prod option is used, a 'prodigal' folder is created with the amino acid .faa and .gff files.
  - If the -fgs option is used, a 'fgs' folder is created with the amino acid .faa file.
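
As an illustration of the GC content metric reported for nucleotide input, the following Python sketch computes it for a single sequence. The function name `gc_content` is hypothetical, and two choices are assumptions: the value is expressed as a percentage, and ambiguous bases such as N count toward the length. MerCat2 may handle either differently.

```python
def gc_content(sequence: str) -> float:
    """Percentage of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


print(gc_content("ATGC"))
print(gc_content("ggcc"))
```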

![PCA plot](https://github.com/raw-lab/mercat2/raw/master/doc/PCA.png)

## Diversity estimation

Alpha and Beta diversity metrics provided by MerCat2 are experimental. We are currently working on the robustness of these measures.  

Alpha diversity metrics provided:

- shannon
- simpson
- simpson_e
- goods_coverage
- fisher_alpha
- dominance
- chao1
- chao1_ci
- ace
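
For orientation, three of the alpha diversity metrics listed above can be sketched in plain Python from a list of k-mer counts. These are textbook definitions under stated assumptions, not necessarily what MerCat2 computes: Shannon entropy is taken in base 2 here, `dominance` is the sum of squared proportions, and `simpson` is taken as 1 minus dominance. The function names simply reuse the metric names.

```python
import math


def _proportions(counts):
    """Convert raw counts to proportions, skipping zero counts."""
    total = sum(counts)
    return [c / total for c in counts if c > 0]


def shannon(counts, base=2):
    """Shannon entropy: -sum(p * log(p)) over k-mer proportions."""
    return -sum(p * math.log(p, base) for p in _proportions(counts))


def dominance(counts):
    """Sum of squared proportions; 1.0 means a single k-mer dominates."""
    return sum(p * p for p in _proportions(counts))


def simpson(counts):
    """Simpson's index, taken here as 1 - dominance."""
    return 1.0 - dominance(counts)


even = [5, 5, 5, 5]      # four equally common k-mers
skewed = [17, 1, 1, 1]   # one k-mer dominates
print(shannon(even), simpson(even))
print(shannon(skewed), simpson(skewed))
```

An even distribution scores higher on both Shannon and Simpson than a skewed one with the same number of distinct k-mers.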

Beta diversity metrics provided:

- euclidean
- cityblock
- braycurtis
- canberra
- chebyshev
- correlation
- cosine
- dice
- hamming
- jaccard
- mahalanobis
- manhattan (same as City Block in this case)
- matching
- minkowski
- rogerstanimoto
- russellrao
- seuclidean
- sokalmichener
- sokalsneath
- sqeuclidean
- yule
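
Beta diversity compares two samples. As a rough illustration, the sketch below computes two of the listed distances, `euclidean` and `braycurtis`, between two k-mer count tables held as Python dictionaries. The standard formulas are used; the dictionary representation, the helper `_aligned`, and the treatment of missing k-mers as zero counts are assumptions made for this example.

```python
import math


def _aligned(a, b):
    """Align two k-mer count dicts on the union of their keys (missing = 0)."""
    keys = sorted(set(a) | set(b))
    return [a.get(k, 0) for k in keys], [b.get(k, 0) for k in keys]


def euclidean(a, b):
    """Square root of the summed squared count differences."""
    x, y = _aligned(a, b)
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def braycurtis(a, b):
    """Sum of absolute differences divided by the sum of all counts."""
    x, y = _aligned(a, b)
    denom = sum(xi + yi for xi, yi in zip(x, y))
    if denom == 0:
        return 0.0
    return sum(abs(xi - yi) for xi, yi in zip(x, y)) / denom


sample_a = {"ATG": 2, "TGA": 1}
sample_b = {"ATG": 1, "GAT": 1}
print(euclidean(sample_a, sample_b))
print(braycurtis(sample_a, sample_b))
```

Bray-Curtis ranges from 0 (identical count tables) to 1 (no shared k-mers), which makes it easier to compare across samples of different depth than the raw Euclidean distance.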

## Notes on memory usage and speed

MerCat2 uses a substantial amount of memory when the k-mer length is large.  
Running MerCat2 on a personal computer with a k-mer length of about 4 should be fine. Total memory usage can be reduced with the chunker feature (-s option), but in testing, when the chunk size was too small (1 MB), some of the least significant k-mers were lost. This does not seem to affect the overall results, but it is worth keeping in mind. Using the chunker and reducing the number of CPUs available (-n option) can both help reduce memory requirements.  
  
MerCat2 runs faster when more memory or compute nodes are available on a cluster, using a chunk size of about 100 MB.
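
Two points from these notes can be illustrated with a short Python sketch. First, the number of possible k-mers grows exponentially with k (4^k for nucleotides, 20^k for amino acids), which is why memory use rises quickly. Second, small chunks can drop rare k-mers. This README does not describe how the chunker merges counts, so the second function assumes, purely for illustration, that the minimum-count threshold is applied to each chunk before merging. Both function names are hypothetical.

```python
from collections import Counter


def possible_kmers(k, alphabet_size=4):
    """Upper bound on distinct k-mers: 4**k for DNA, 20**k for amino acids."""
    return alphabet_size ** k


def count_in_chunks(sequences, k, min_count, chunk_size):
    """Count k-mers chunk by chunk, filtering each chunk before merging.

    ASSUMPTION: the per-chunk threshold is illustrative only; it is not
    taken from MerCat2's source.
    """
    total = Counter()
    for start in range(0, len(sequences), chunk_size):
        chunk = Counter()
        for seq in sequences[start:start + chunk_size]:
            for i in range(len(seq) - k + 1):
                chunk[seq[i:i + k]] += 1
        # K-mers below the threshold within this chunk never reach the total.
        total.update({kmer: n for kmer, n in chunk.items() if n >= min_count})
    return dict(total)


print(possible_kmers(4), possible_kmers(4, alphabet_size=20))

reads = ["ATGA", "ATGC", "ATGG", "CCCA"]
print(count_in_chunks(reads, k=3, min_count=2, chunk_size=4))  # one large chunk
print(count_in_chunks(reads, k=3, min_count=2, chunk_size=2))  # smaller chunks
print(count_in_chunks(reads, k=3, min_count=2, chunk_size=1))  # tiny chunks
```

In this toy case `ATG` occurs three times overall, but with one read per chunk it never reaches the threshold inside any single chunk and disappears from the result. Under the stated assumption, this is the kind of loss that very small chunk sizes could cause.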

## Citing MerCat2

If you are publishing results obtained using MerCat2, please cite: <br />
Figueroa JL, Panyala A, Colby S, Friesen M, Tiemann L, White III RA. 2022.  <br />
MerCat2: a versatile k-mer counter and diversity estimator for database-independent property analysis obtained from omics data. bioRxiv.  <br />
[paper](https://www.biorxiv.org/content/10.1101/2022.11.22.517562v1)   <br />

### CONTACT

Please send all queries to [Jose Luis Figueroa III](mailto:jlfiguer@charlotte.edu) <br />
[Dr. Richard Allen White III](mailto:rwhit101@charlotte.edu)<br />
Or [open an issue](https://github.com/raw-lab/mercat2/issues)

            
