metav

Name	metav
Version	1.0.6
home_page	https://github.com/ZhijianZhou01/nextvirus
Summary	rapid detection and classification of viruses in metagenomics sequencing.
upload_time	2024-08-26 10:33:53
maintainer	None
docs_url	None
author	Zhi-Jian Zhou
requires_python	None
license	None
keywords	virus detection sequencing metagenomics

# Metagenomics virus detection (MetaV): rapid detection and classification of viruses in metagenomics sequencing


![](https://img.shields.io/badge/System-Linux-green.svg)
![](https://img.shields.io/pypi/wheel/metav)
![](https://img.shields.io/pypi/dm/metav)

## 1. Introduction
### 1.1. workflow of metav
metav is a command-line program that rapidly identifies and classifies viral sequences in metagenomic sequencing data. It is written in `Python 3`, runs on Linux systems, and can be deployed to the cloud. 

The workflow of metav is simple but efficient, as shown below:

<div  align="left">    
<kbd><img src="https://github.com/ZhijianZhou01/metav/blob/main/figure/metav.jpg" width = "532" height = "552" alt="work" align=left /></kbd>
</div>


### 1.2. Functional expansion
metav was originally designed to detect and count the viral composition of metagenomic sequencing data, but it is flexible and not limited to viruses.

In fact, the viral nr database can be replaced by protein databases of other pathogens, for example bacteria or pathogenic fungi. These nr databases can be downloaded from the [NCBI RefSeq release database](https://ftp.ncbi.nlm.nih.gov/refseq/release/). In short, metav can detect and count other pathogens in metagenomic sequencing data when it is given the corresponding nr database and taxonomy information file.

## 2. Download and install

metav is published on PyPI (https://pypi.org/project/metav/) and can be installed with `pip`:
```
pip install metav
metav -h
```
<b>Note: installing metav with `pip` does not install the software dependencies. They must be installed manually; please see section 3.</b>
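
To confirm from Python that the package is present in the active environment, a check along the following lines may help. This is only a sketch: `installed_version` is a helper name introduced here for illustration and is not part of metav.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("metav")
    print("metav version:", found if found else "not installed")
```

The check only looks at package metadata, so it says nothing about whether the external tools from section 3 are available.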


## 3. Software dependencies

`metav` relies on the following software:

+  [Trimmomatic](http://www.usadellab.org/cms/?page=trimmomatic) (version >=0.39), which removes adapter and primer contamination.

+  [Bowtie2](https://github.com/BenLangmead/bowtie2/releases) (version >=2.3.0), which removes host-genome contamination.
  
+  [Trinity](https://github.com/trinityrnaseq/trinityrnaseq) (version >=2.15.1), which assembles reads into contigs in the second sub-pipeline of `metav`. <b>Note</b>: Trinity itself relies on [Bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml), [jellyfish](https://github.com/gmarcais/Jellyfish/releases), [samtools](https://github.com/samtools/samtools/releases) and [salmon](https://github.com/COMBINE-lab/salmon/releases/), which can be installed as follows:
```
# (1) bowtie2 
sudo apt-get install bowtie2

# (2) jellyfish
sudo apt-get install jellyfish

# (3) salmon
sudo apt install salmon

# (4) samtools
wget https://github.com/samtools/samtools/releases/download/1.20/samtools-1.20.tar.bz2
tar -xjvf samtools-1.20.tar.bz2
cd samtools-1.20
./configure
make
make install
```

+  [diamond](https://github.com/bbuchfink/diamond) (version >=2.0.9), which aligns reads (or contigs) against protein sequences.

<b>Note: if metav is installed with `pip`, the four dependencies (Trimmomatic, Bowtie2, Trinity and diamond) must be installed manually in advance and added to `PATH` (system or user)</b>. 
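
A quick way to see whether the dependencies are visible on `PATH` is sketched below. The executable names in `REQUIRED_TOOLS` are assumptions (Trimmomatic in particular is often shipped as a `.jar` or a wrapper script), so adjust them to your installation; `missing_tools` is a helper introduced here, not part of metav.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation
# (for example, Trimmomatic is often shipped as a .jar or a wrapper script).
REQUIRED_TOOLS = ["trimmomatic", "bowtie2", "Trinity", "diamond"]


def missing_tools(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All dependencies were found on PATH.")
```

The sketch only checks that an executable exists; it does not verify the minimum versions listed above.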

##  4. Database dependencies
### 4.1. prepare host database

The host database is used to remove host-genome contamination. <b>How to prepare a host database?</b>

(1) Download the host genome in FASTA format (`*.fasta`).

(2) Create the host database with [Bowtie2](https://github.com/BenLangmead/bowtie2/releases), for example,
 `bowtie2-build /home/zzj/host_db/host_genome.fna /home/zzj/host_db/host_genome`. This generates six files whose names start with "host_genome" and end with the suffixes '.1.bt2', '.2.bt2', '.3.bt2', '.4.bt2', '.rev.1.bt2', and '.rev.2.bt2'.

<b>Next</b>, enter the path `/home/zzj/host_db/host_genome` in the file `profiles.xml`. <b>Note</b>: `/home/zzj/host_db/host_genome` is the common prefix of the index files, not a directory!


(3) metav also supports multiple host databases. Separate their paths with `,` in the file `profiles.xml`, for example, `/home/zzj/host_db/host_genome1, /home/zzj/host_db/host_genome2`.

<b>Tip</b>: different samples may come from different hosts, so update the host database paths in `profiles.xml` whenever the host changes.
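
Because the value stored in `profiles.xml` is an index prefix rather than a directory, it can be useful to check that the six index files exist for every prefix before starting a run. The following is a minimal sketch under stated assumptions: `split_host_paths` and `missing_index_files` are hypothetical helpers, and stripping the whitespace around `,` is a choice made here, not documented metav behaviour.

```python
import os

# Suffixes listed above for a Bowtie2 index. Large genomes may produce
# ".bt2l" files instead; that case is not handled in this sketch.
BT2_SUFFIXES = [".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2"]


def split_host_paths(value):
    """Split a comma-separated list of index prefixes and strip whitespace."""
    return [part.strip() for part in value.split(",") if part.strip()]


def missing_index_files(prefix):
    """Return the index files expected for `prefix` that do not exist."""
    return [prefix + s for s in BT2_SUFFIXES if not os.path.isfile(prefix + s)]


if __name__ == "__main__":
    hosts = "/home/zzj/host_db/host_genome1, /home/zzj/host_db/host_genome2"
    for prefix in split_host_paths(hosts):
        if not os.path.isabs(prefix):
            print(prefix, "is not an absolute path")
        missing = missing_index_files(prefix)
        print(prefix, "OK" if not missing else "missing: %s" % missing)
```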

### 4.2. prepare viral nr database

The viral nr database is used to identify viral components in the sequenced reads. <b>How to prepare a viral nr database?</b>

(1) First, download the RefSeq viral protein sequences (amino acid, `*.1.protein.faa.gz`) from the [viral section of NCBI RefSeq](https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/). Also download `*.protein.gpff.gz`, which contains the taxonomic information for these sequences. <b>Note</b>: the file `*.1.protein.faa.gz` is a gzip-compressed file in `fasta` format.

(2) Next, decompress `*.1.protein.faa.gz` and rename it to `ViralProtein.fasta`, then create the viral nr database with [diamond](https://github.com/bbuchfink/diamond), for example, 
`diamond makedb -p 10 --in /home/zzj/nr/ViralProtein.fasta --db /home/zzj/nr/ViralProtein.dmnd`. Then enter the path `/home/zzj/nr/ViralProtein.dmnd` in the file `profiles.xml`.

(3) Then, extract the viral taxonomy information from the file `*.protein.gpff.gz`; it is used to classify viral reads. This repository provides a [taxonomy_information](https://github.com/ZhijianZhou01/metav/releases/tag/data) file prepared by the authors, in which the accessions are consistent with the file `ViralProtein.fasta`. If you want to add entries, please keep the same format (four columns; do not change the column names). Finally, enter the path of the taxonomy information file in the file `profiles.xml`.

<b>Tip</b>: the viral nr database generally does not need to be replaced in the short term.
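
The decompress-and-rename part of step (2) can also be done from Python. The sketch below uses only the standard library; `gunzip_to` and `count_fasta_records` are helper names introduced here, and the input file name in the usage comment is a placeholder for whichever `*.1.protein.faa.gz` file was downloaded.

```python
import gzip
import shutil


def gunzip_to(src_gz, dest):
    """Decompress the gzip file `src_gz` into the plain file `dest`."""
    with gzip.open(src_gz, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def count_fasta_records(path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(path, "r") as handle:
        return sum(1 for line in handle if line.startswith(">"))


# Example usage (the input name is a placeholder for the downloaded file):
#   gunzip_to("viral.1.protein.faa.gz", "ViralProtein.fasta")
#   print(count_fasta_records("ViralProtein.fasta"), "protein sequences")
```

Counting the records afterwards is a cheap sanity check that the download was complete before running `diamond makedb`.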



## 5. Configuration of dependencies
To manage the parameters of the dependent software and databases conveniently, their configuration is recorded in the file `profiles.xml`. 

A template of `profiles.xml` is provided in the GitHub repository. Please note:

+ <b>the current version of metav only supports data from `second-generation sequencing`</b>.
  
+ the database paths in the file `profiles.xml` must be adjusted to match your computer; the database paths in the provided `profiles.xml` are only examples. <b>Note: they must be absolute paths, not relative paths.</b>
  
+ the software parameters in `profiles.xml` generally do not need to be modified because they are suitable in most cases.
<b>However, the path of the adapter file must be modified</b>; see the field `ILLUMINACLIP:/home/zzj/anaconda3/envs/metav_env/share/trimmomatic-0.39-2/adapters/merge_adapter.fas` in the Trimmomatic settings in `profiles.xml`. The path `/home/zzj/anaconda3/envs/metav_env/share/trimmomatic-0.39-2/adapters/merge_adapter.fas` is only an example. The adapter file is generally located in the `adapters` folder of the Trimmomatic installation directory, or you can create this file yourself; either way, enter its absolute path here.

<b>Tip</b>: in general, these parameters only need to be configured once, before the first run; the exception is the host database used to filter host-genome contamination.


## 6. Getting help
Users can view the help documentation by entering `metav -h` or `metav --help`.

| Parameter | Description |
| --- | --- |
|-h, --help | show this help message and exit|
|-pe | paired-end sequencing.|
|-se | single-end sequencing.|
|-i1 FORWARD | forward reads (*.fq) using paired-end sequencing.|
|-i2 REVERSE | reverse reads (*.fq) using paired-end sequencing.|
|-u UNPAIRED | reads file using single-end sequencing (unpaired reads).|
|-q QUALITIES | the qualities (phred33 or phred64) of sequenced reads, default: phred33.|
|-xml PROFILES | the *.xml file with parameters of dependent software and databases.|
|-len LENGTH | threshold of length of aa alignment in diamond, default: 10.|
|-s IDENTITY | threshold of identity(%) of alignment aa in diamond, default: 20.|
|-e E_VALUE | specify three e-values threshold used to filter the reads (or contigs) hit nr database, default: 1e-6,1e-3,1e-1.|
|-r1 | run the sub-pipeline 1 (reads → nr database).|
|-r2 | run the sub-pipeline 2 (reads → contigs → nr database).|
|-t THREAD | number of used threads, default: 1.|
|-o OUTDIR | output directory to store all results.|


## 7. Example of usage

+ <b>if reads are from paired-end sequencing:</b>
  
```
metav -pe -i1 reads_R1.fq -i2 reads_R2.fq -xml profiles.xml -r1 -r2 -t 8 -o outdir
```

+ <b>if reads are from single-end sequencing:</b>
 
 ```
metav -se -u reads.fq -xml profiles.xml -r1 -r2 -t 8 -o outdir
```

<b> Tip </b>
+ metav can also run only one of `-r1` and `-r2`.

+ if `-r2` is used, the output directory given after `-o` must be an <b>absolute path</b>.

+ if an error is displayed, please check the input parameters and XML file.
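
When metav is launched from a script, the absolute-path requirement for `-o` can be handled automatically. The sketch below assembles the argument list from the flags in the help table; `build_metav_command` is a hypothetical helper, and expanding the `-xml` path as well is a precaution chosen here, not something the documentation requires.

```python
import os
import shlex


def build_metav_command(profiles_xml, outdir, forward=None, reverse=None,
                        unpaired=None, threads=1, run_r1=True, run_r2=True):
    """Assemble a metav argument list from the flags in the help table."""
    if forward and reverse:
        cmd = ["metav", "-pe", "-i1", forward, "-i2", reverse]
    elif unpaired:
        cmd = ["metav", "-se", "-u", unpaired]
    else:
        raise ValueError("give forward and reverse reads, or unpaired reads")
    cmd += ["-xml", os.path.abspath(profiles_xml)]
    if run_r1:
        cmd.append("-r1")
    if run_r2:
        cmd.append("-r2")
    # -r2 needs an absolute output directory, so always expand it here.
    cmd += ["-t", str(threads), "-o", os.path.abspath(outdir)]
    return cmd


if __name__ == "__main__":
    cmd = build_metav_command("profiles.xml", "outdir", forward="reads_R1.fq",
                              reverse="reads_R2.fq", threads=8)
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to start the run.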


## 8. Output results

### 8.1. input-parameter.txt

The output file `input-parameter.txt` records the parameters that were given on the command line.

```
the used parameters of metav in command-line interface.

pair_end:	True
single_end:	False
sub-pipeline 1:	True
sub-pipeline 2:	True
forward_reads:	/home/zzj/datas/test/reads_1.fq
reverse_reads:	/home/zzj/datas/test/reads_2.fq
unpaired:	None
qualities:	phred33
set_file:	/home/zzj/datas/test/profiles.xml
length_threshold:	10.0
identity_threshold:	20.0
e-value:	['1e-6', '1e-3', '1e-1']
thread:	8
outdir:	/home/zzj/datas/test/out6
```
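
If the recorded parameters need to be read back, for example to compare runs, the file can be parsed with a few lines of Python. This sketch assumes the layout shown above, where each key ends in `:` and is separated from its value by a tab; `parse_parameter_file` is a helper introduced here.

```python
import ast


def parse_parameter_file(text):
    """Parse 'key:<TAB>value' lines into a dict; other lines are skipped."""
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":\t")
        if sep:
            params[key.strip()] = value.strip()
    return params


if __name__ == "__main__":
    sample = (
        "the used parameters of metav in command-line interface.\n\n"
        "pair_end:\tTrue\n"
        "e-value:\t['1e-6', '1e-3', '1e-1']\n"
        "thread:\t8\n"
    )
    params = parse_parameter_file(sample)
    # The e-value entry is written as a Python list literal in the listing.
    thresholds = [float(v) for v in ast.literal_eval(params["e-value"])]
    print(params["thread"], thresholds)
```

All values are returned as strings; converting them (as done for the e-values) is left to the caller.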

### 8.2. directory pipeline1
The directory `pipeline1` contains the intermediate results and the `finally_result` directory from `sub-pipeline 1` (reads → nr database). 

In the example, three e-value thresholds (`1e-6`, `1e-3` and `1e-1`) are used to filter the output of the diamond program. Thus, three corresponding sub-directories are created to store the results. 
![https://github.com/ZhijianZhou01/metav/blob/main/figure/e-value.png](https://github.com/ZhijianZhou01/metav/blob/main/figure/e-value.png)

The sub-directory names in `pipeline1` encode e-value ranges as follows:

| sub-directories | description |
| --- | --- |
|lower_1e-6 | e-value of hit reads < `1e-6` |
|lower_0.001 | `1e-6` < e-value of hit reads < `1e-3` |
|lower_0.1 | `1e-3` < e-value of hit reads < `1e-1` |
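
The mapping in this table can be written as a small function, which may help when post-processing diamond output yourself. It is a sketch under one stated assumption: the table does not say where a value exactly equal to a threshold belongs, so here it falls into the next, less strict bin. `evalue_bin` is a hypothetical helper, not metav code.

```python
# Directory labels copied from the table above, paired with their upper bounds.
BINS = [("lower_1e-6", 1e-6), ("lower_0.001", 1e-3), ("lower_0.1", 1e-1)]


def evalue_bin(evalue):
    """Return the sub-directory label for `evalue`, or None if it is too high.

    Assumption: a value equal to a threshold falls into the next, less
    strict bin; the table does not say how ties are handled.
    """
    for label, upper in BINS:
        if evalue < upper:
            return label
    return None


if __name__ == "__main__":
    for value in (1e-10, 1e-4, 0.05, 0.5):
        print(value, "->", evalue_bin(value))
```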

The hierarchy is the same in all three e-value sub-directories. For example, in the directory `hit_summary` of the directory `lower_1e-6`, metav provides a summary file (`hit_reads_taxonomy_information.txt`) with taxonomy information. 

In addition, metav counts these hit reads by `order`, `family` and `strain` (organism) and provides three `*.csv` summary files.

![https://github.com/ZhijianZhou01/metav/blob/main/figure/reads_summary.png](https://github.com/ZhijianZhou01/metav/blob/main/figure/reads_summary.png)

<b>In particular, metav extracts all hit read sequences (in FASTA format), organized according to the hierarchy of `order`, `family` and `strain` (organism)</b>. These hit read sequences are stored in the directory `hit_reads_seq`.
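
The per-rank counting described above amounts to a frequency count over the taxonomy assigned to each hit. The sketch below illustrates the idea with placeholder records; the real column layout of the summary files may differ, and `count_by_rank` is a helper introduced here rather than metav's own code.

```python
from collections import Counter

# Placeholder records; the real summary file may use different columns.
HITS = [
    {"order": "order_A", "family": "family_A1", "strain": "strain_x"},
    {"order": "order_A", "family": "family_A1", "strain": "strain_y"},
    {"order": "order_A", "family": "family_A2", "strain": "strain_z"},
    {"order": "order_B", "family": "family_B1", "strain": "strain_w"},
]


def count_by_rank(records, rank):
    """Count how many hit records fall under each value of `rank`."""
    return Counter(record[rank] for record in records)


if __name__ == "__main__":
    for rank in ("order", "family", "strain"):
        print(rank, count_by_rank(HITS, rank).most_common())
```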


### 8.3. directory pipeline2
The directory `pipeline2` contains the intermediate results and the `finally_result` directory from `sub-pipeline 2` (reads → contigs → nr database). The hierarchy of the output directory is the same as in `pipeline1`.

However, the output in the directory `finally_result` of `pipeline2` consists of hit contigs, not reads.

The sub-directory names in `pipeline2` encode e-value ranges as follows:

| sub-directories | description |
| --- | --- |
|lower_1e-6 | e-value of hit contigs < `1e-6` |
|lower_0.001 | `1e-6` < e-value of hit contigs < `1e-3` |
|lower_0.1 | `1e-3` < e-value of hit contigs < `1e-1` |

In the directory `hit_summary` of each e-value sub-directory, the sequences and summary information of the hit contigs are provided; the hit contig sequences are stored in the directory `hit_contigs_seq`.

![https://github.com/ZhijianZhou01/metav/blob/main/figure/contigs_symmary.png](https://github.com/ZhijianZhou01/metav/blob/main/figure/contigs_symmary.png)



## 9. Cite

If you use metav, you can cite it as follows, for example: `the viral (or other pathogens) reads/components were identified by metav pipeline (https://github.com/ZhijianZhou01/metav)`.

## 10. Bug report
metav was tested on Ubuntu 16.04 and Ubuntu 20.04 and works well on both. If you run into a problem or find a bug, please contact us.

Please report it via [Github issues](https://github.com/ZhijianZhou01/BioAider/issues) or send an email to zjzhou@hnu.edu.cn.
