| Field | Value |
|-------|-------|
| Name | MetaPont |
| Version | 0.0.3 |
| home_page | https://github.com/TheHuwsLab/MetaPont |
| Summary | MetaPont - A tool to bridge the gap between the output of metagenomic tools and the analysis of the data |
| upload_time | 2024-11-28 20:39:21 |
| maintainer | None |
| docs_url | None |
| author | Nicholas Dimonaco |
| requires_python | >=3.6 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# MetaPont
**MetaPont** - A tool to bridge the gap between the output of metagenomic tools and the analysis of the data
## Features
These are the current aims of this project, which is still under development:
- **Targeted Functional Analysis:** Search for specific functional IDs (e.g., GO terms) within the `_Final_Contig.tsv` files produced by the HuwsLab Metagenome Workflow (https://github.com/TheHuwsLab/Metagenome_Workflow).
- **Taxonomic Breakdown:** Extract genus-level taxonomy information and calculate the proportion of each genus in the dataset.
- **Batch Processing:** Analyse all `_Final_Contig.tsv` files in a specified directory.
- **Customisable Output:** Save results in a format suitable for downstream analysis.
---
## Installation
### Prerequisites
Ensure you have the following installed:
- Python 3.6 or later
### Installation via pip
MetaPont is distributed on PyPI and can be installed with pip:
```bash
pip install MetaPont
```
---
## Usage
---
### Extract-By-Function Command-line Arguments
Running `Extract-By-Function -h` prints:
```
usage: Extract_By_Function.py [-h] -d DIRECTORY -f FUNCTION_ID -o OUTPUT
                              [-m MIN_PROPORTION] [-top TOP_TAXA]

MetaPont v0.0.3: Extract-By-Function - Identify taxa contributing to a
specific function.

options:
  -h, --help            show this help message and exit
  -d DIRECTORY, --directory DIRECTORY
                        Directory containing TSV files to analyse.
  -f FUNCTION_ID, --function_id FUNCTION_ID
                        Specific function ID to search for (e.g.,
                        'GO:0016597').
  -o OUTPUT, --output OUTPUT
                        Output file to save results.
  -m MIN_PROPORTION, --min_proportion MIN_PROPORTION
                        Minimum proportion threshold for taxa to be included
                        in the output.
  -top TOP_TAXA, --top_taxa TOP_TAXA
                        Top n taxa to be included in the output.
```
The `Extract-By-Function` tool accepts the following command-line options. Either `-m` or `-top` must be supplied.

| Option                   | Description                                                | Required              | Default |
|--------------------------|------------------------------------------------------------|-----------------------|---------|
| `-d`, `--directory`      | Directory containing `_Final_Contig.tsv` files to analyse. | Yes                   | None    |
| `-f`, `--function_id`    | Functional ID to search for (e.g., `GO:0016597`).          | Yes                   | None    |
| `-m`, `--min_proportion` | Minimum proportion a taxon must reach to be reported.      | Either `-m` or `-top` | None    |
| `-top`, `--top_taxa`     | Number of top-ranked taxa to report.                       | Either `-m` or `-top` | None    |
| `-o`, `--output`         | Output file name to save results.                          | Yes                   | None    |
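The help text above follows Python's `argparse` format. Below is a minimal sketch of how the documented flags and the "either `-m` or `-top`" rule could be expressed with `argparse`. It is not MetaPont's source: the `float`/`int` types and the `build_parser`/`parse_args` helper names are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the Extract-By-Function flags documented above."""
    parser = argparse.ArgumentParser(
        prog="Extract-By-Function",
        description="MetaPont v0.0.3: Extract-By-Function - Identify taxa "
                    "contributing to a specific function.",
    )
    parser.add_argument("-d", "--directory", required=True,
                        help="Directory containing TSV files to analyse.")
    parser.add_argument("-f", "--function_id", required=True,
                        help="Specific function ID to search for (e.g., 'GO:0016597').")
    parser.add_argument("-o", "--output", required=True,
                        help="Output file to save results.")
    # Assumed types: a fractional threshold and an integer count.
    parser.add_argument("-m", "--min_proportion", type=float,
                        help="Minimum proportion threshold for taxa to be included in the output.")
    parser.add_argument("-top", "--top_taxa", type=int,
                        help="Top n taxa to be included in the output.")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # The README requires at least one of -m / -top; argparse has no
    # built-in "at least one of" rule, so check it after parsing.
    if args.min_proportion is None and args.top_taxa is None:
        parser.error("either -m/--min_proportion or -top/--top_taxa is required")
    return args
```

The check runs after parsing because a required mutually exclusive group would demand exactly one flag and so forbid passing `-m` and `-top` together.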
### Example
To search for the functional ID `GO:0016597` in all `_Final_Contig.tsv` files within the `test_data/` directory:
```bash
Extract-By-Function -d .../test_data/Final_Contig/ -f GO:0016597 -top 3 -o .../test_data/Final_Contig/Extract_By_Function_Out/results.tsv
```
---
### Extract-By-Function Output
The tool generates a tab-delimited output file with the following columns:
1. **Sample:** Name of the processed sample.
2. **Taxa:** Genus-level taxonomic assignment extracted from the `Lineage` column.
3. **Reads Assigned (Function):** Number of reads assigned to contigs of the stated taxon that carry the given functional ID.
4. **Proportion (Function):** Share of the sample's reads for the given functional ID that come from contigs of the stated taxon.
5. **Proportion (Total Reads):** Reads Assigned (Function) as a fraction of the total reads in the sample.
Example output:
```
Function ID: GO:0016597
Sample	Taxa	Reads Assigned (Function)	Proportion (Function)	Proportion (Total Reads)
PN0536_0001_S1_Final_Contig.tsv	Lactobacillus	111963	0.602	0.004
PN0536_0003_S83_Final_Contig.tsv	Lactobacillus	20072	0.457	0.001
PN0536_0002_S2_Final_Contig.tsv	Acutalibacter	145222	0.795	0.005
PN0536_0004_S3_Final_Contig.tsv	Lactobacillus	40076	0.404	0.002
```
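For downstream analysis, the results file can be loaded with Python's standard `csv` module. A minimal sketch, assuming the layout shown in the example output (a `Function ID:` line followed by a tab-delimited header row and data rows); `read_function_results` is a hypothetical helper, not part of MetaPont:

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def read_function_results(handle: TextIO) -> Tuple[str, List[Dict[str, object]]]:
    """Parse an Extract-By-Function results file into (function_id, rows)."""
    # Assumed layout: "Function ID: <id>" on the first line, then a TSV table.
    first_line = handle.readline().strip()
    prefix = "Function ID:"
    if not first_line.startswith(prefix):
        raise ValueError("unexpected first line: {!r}".format(first_line))
    function_id = first_line[len(prefix):].strip()

    rows: List[Dict[str, object]] = []
    for record in csv.DictReader(handle, delimiter="\t"):
        # Convert the numeric columns so they can be sorted and summed.
        rows.append({
            "Sample": record["Sample"],
            "Taxa": record["Taxa"],
            "Reads Assigned (Function)": int(record["Reads Assigned (Function)"]),
            "Proportion (Function)": float(record["Proportion (Function)"]),
            "Proportion (Total Reads)": float(record["Proportion (Total Reads)"]),
        })
    return function_id, rows


# Usage with the first data row from the example output above.
example = io.StringIO(
    "Function ID: GO:0016597\n"
    "Sample\tTaxa\tReads Assigned (Function)\tProportion (Function)\tProportion (Total Reads)\n"
    "PN0536_0001_S1_Final_Contig.tsv\tLactobacillus\t111963\t0.602\t0.004\n"
)
function_id, rows = read_function_results(example)
```

For a file on disk, pass `open(path, newline="")` as the handle.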
---
### Workflow (description incomplete)
1. The script reads `_Final_Contig.tsv` files from the specified directory.
2. For each file, it searches for occurrences of the given functional ID within specific columns.
3. Matches are associated with genus-level taxonomic information extracted from the `Lineage` column.
4. Per-taxon read counts and proportions are calculated, filtered according to `-m` or `-top`, and saved to the output file.
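The workflow steps above can be sketched in Python as follows. This is an illustrative sketch, not MetaPont's implementation: only the `Lineage` column and the `_Final_Contig.tsv` suffix come from this README, while the `Reads`, `GO`, `EC` and `KEGG` column names, the `g__`-prefixed, semicolon-separated lineage format and the ID separators are hypothetical. It computes only Proportion (Function); Proportion (Total Reads) would additionally need each sample's total read count.

```python
import csv
import glob
import os
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Only "Lineage" is named in the README; the other column names are hypothetical.
LINEAGE_COLUMN = "Lineage"
READS_COLUMN = "Reads"                    # hypothetical read-count column
FUNCTION_COLUMNS = ("GO", "EC", "KEGG")   # hypothetical functional columns


def genus_from_lineage(lineage: str) -> Optional[str]:
    """Return the genus from an assumed 'd__...;g__Genus;s__...' lineage string."""
    for rank in lineage.split(";"):
        rank = rank.strip()
        if rank.startswith("g__") and len(rank) > 3:
            return rank[3:]
    return None


def reads_per_genus(rows: Iterable[Dict[str, str]], function_id: str) -> Dict[str, int]:
    """Steps 2-3: sum reads per genus over contigs annotated with function_id."""
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        # Assumes IDs within a cell are separated by ';', ',', '|' or whitespace.
        ids = set()
        for column in FUNCTION_COLUMNS:
            ids.update(re.split(r"[;,|\s]+", row.get(column) or ""))
        if function_id not in ids:
            continue
        genus = genus_from_lineage(row.get(LINEAGE_COLUMN) or "")
        if genus is not None:
            totals[genus] += int(row[READS_COLUMN])
    return dict(totals)


def select_taxa(totals: Dict[str, int], min_proportion: Optional[float] = None,
                top_taxa: Optional[int] = None) -> List[Tuple[str, int, float]]:
    """Step 4: rank genera by reads, compute each one's share, then filter."""
    function_reads = sum(totals.values())
    if function_reads == 0:
        return []
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    results = [(genus, reads, reads / function_reads) for genus, reads in ranked]
    if min_proportion is not None:
        results = [r for r in results if r[2] >= min_proportion]
    if top_taxa is not None:
        results = results[:top_taxa]
    return results


def analyse_directory(directory: str, function_id: str,
                      **filters) -> Dict[str, List[Tuple[str, int, float]]]:
    """Step 1: process every *_Final_Contig.tsv file in the directory."""
    results = {}
    for path in sorted(glob.glob(os.path.join(directory, "*_Final_Contig.tsv"))):
        with open(path, newline="") as handle:
            rows = csv.DictReader(handle, delimiter="\t")
            totals = reads_per_genus(rows, function_id)
        results[os.path.basename(path)] = select_taxa(totals, **filters)
    return results
```

Matching whole tokens rather than substrings avoids counting an ID that merely contains the requested one as a prefix.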
---
### Extract-By-Taxa Command-line Arguments
Running `Extract-By-Taxa -h` prints:
```
usage: Extract_By_Taxa.py [-h] -d DIRECTORY -t TAXON -o OUTPUT -func
                          FUNCTIONAL_CLASSES [-top TOP_FUNCTIONS]

MetaPont: Extract Top Functions by Taxon

options:
  -h, --help            show this help message and exit
  -d DIRECTORY, --directory DIRECTORY
                        Directory containing TSV files to analyse.
  -t TAXON, --taxon TAXON
                        Target taxon to search for (e.g., 'g__Bacillus').
  -o OUTPUT, --output OUTPUT
                        Output file to save results.
  -func FUNCTIONAL_CLASSES, --functional_classes FUNCTIONAL_CLASSES
                        Which functional classes to report (e.g. GO,EC,KEGG
                        etc).
  -top TOP_FUNCTIONS, --top_functions TOP_FUNCTIONS
                        Top n functions to include in the output for each
                        sample (default: 3).
```
The `Extract-By-Taxa` tool accepts the following command-line options:

| Option                          | Description                                                | Required | Default |
|---------------------------------|------------------------------------------------------------|----------|---------|
| `-d`, `--directory`             | Directory containing `_Final_Contig.tsv` files to analyse. | Yes      | None    |
| `-t`, `--taxon`                 | Taxon to search for (e.g., `g__Bacillus`).                 | Yes      | None    |
| `-func`, `--functional_classes` | Functional classes to report (e.g., GO, EC, KEGG).         | Yes      | None    |
| `-top`, `--top_functions`       | Number of top functions to report for each sample.         | No       | 3       |
| `-o`, `--output`                | Output file name to save results.                          | Yes      | None    |
### Example
To search for the top reported functions for taxon `g__Bacillus` in all `_Final_Contig.tsv` files within the `test_data/` directory:
```bash
Extract-By-Taxa -d .../test_data/Final_Contig -t g__Bacillus -o .../test_data/Final_Contig/Extract_By_Taxa/results.tsv -func GO
```
---
### Extract-By-Taxa Output
The tool generates a tab-delimited output file with the following columns:
1. **Sample:** Name of the processed Sample.
2. **Function:** One of the most frequently assigned functions for the chosen taxon in that sample.
3. **Num of Assignments:** Number of times the function is assigned across all contigs classified as the chosen taxon.
Example output:
```
Selected Taxon: g__Bacillus
Sample	Function	Num of Assignments
PN0536_0001_S1	GO:0008150	296
PN0536_0001_S1	GO:0003674	285
PN0536_0001_S1	GO:0005575	254
PN0536_0003_S83	GO:0005575	45
PN0536_0003_S83	GO:0008150	44
PN0536_0003_S83	GO:0003674	43
PN0536_0002_S2	GO:0005575	5
PN0536_0002_S2	GO:0008150	5
PN0536_0002_S2	GO:0005623	4
PN0536_0004_S3	GO:0008150	4
PN0536_0004_S3	GO:0003674	3
PN0536_0004_S3	GO:0005488	3
```
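The per-taxon counting described above amounts to tallying function IDs on matching contigs. A minimal sketch with `collections.Counter`, assuming (hypothetically) that each functional class passed to `-func` names a column holding separator-delimited IDs, that `-func` takes a comma-separated list, and that the taxon appears as one `;`-separated rank of the `Lineage` column; `top_functions` and `parse_classes` are illustrative names, not MetaPont's API:

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

LINEAGE_COLUMN = "Lineage"


def parse_classes(value: str) -> List[str]:
    """Split an assumed comma-separated -func value such as 'GO,EC,KEGG'."""
    return [c.strip() for c in value.split(",") if c.strip()]


def top_functions(rows: Iterable[Dict[str, str]], taxon: str,
                  functional_classes: Sequence[str], top: int = 3) -> List[Tuple[str, int]]:
    """Count function IDs on contigs whose lineage includes `taxon`; return the top n."""
    counts: Counter = Counter()
    for row in rows:
        ranks = [r.strip() for r in (row.get(LINEAGE_COLUMN) or "").split(";")]
        if taxon not in ranks:
            continue
        for column in functional_classes:
            # Hypothetical: one column per functional class, IDs separated by ; , | or spaces.
            for function_id in re.split(r"[;,|\s]+", row.get(column) or ""):
                if function_id:
                    counts[function_id] += 1
    return counts.most_common(top)
```

`Counter.most_common` orders tied counts by first occurrence; how MetaPont breaks ties is not documented.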
---
### Large File Handling (potential failure point)
The script uses `csv.field_size_limit` to read `.tsv` files that contain exceptionally large fields.
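By default, Python's `csv` module rejects any field longer than 131,072 characters with `csv.Error`. A common pattern for lifting that limit is sketched below. MetaPont's exact call is not shown in this README, and `raise_csv_field_limit` is a hypothetical helper name. Passing `sys.maxsize` directly can raise `OverflowError` on platforms where a C `long` is 32 bits (for example, Windows), so the sketch steps the value down until it is accepted.

```python
import csv
import sys


def raise_csv_field_limit() -> int:
    """Raise csv's per-field size limit as far as the platform allows."""
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            # The limit must fit in a C long; shrink it and try again.
            limit //= 10
```

Call it once before opening the `_Final_Contig.tsv` files; the new limit applies to all subsequent `csv` readers in the process.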
---
Raw data
{
"_id": null,
"home_page": "https://github.com/TheHuwsLab/MetaPont",
"name": "MetaPont",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": null,
"keywords": null,
"author": "Nicholas Dimonaco",
"author_email": "nicholas@dimonaco.co.uk",
"download_url": "https://files.pythonhosted.org/packages/86/f1/d25b638405f4f3a5c74a4d0abb4134e4e7bb5fbf27b9c0e7755775ce5ad1/metapont-0.0.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "MetaPont - A tool to bridge the gap between the output of metagenomic tools and the analysis of the data",
"version": "0.0.3",
"project_urls": {
"Bug Tracker": "https://github.com/TheHuwsLab/MetaPont/issues",
"Homepage": "https://github.com/TheHuwsLab/MetaPont"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "27b7068bf892927c59d0fc89c56a5f186067c7188958d9b4ac1e1c2608dc562a",
"md5": "d325d62d2fd5e257e4ea56833642c678",
"sha256": "4009b1adcaad19362943504eddf871812458a8c2830a342f5c5e6cbf23dea09e"
},
"downloads": -1,
"filename": "MetaPont-0.0.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d325d62d2fd5e257e4ea56833642c678",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 31387,
"upload_time": "2024-11-28T20:39:20",
"upload_time_iso_8601": "2024-11-28T20:39:20.244104Z",
"url": "https://files.pythonhosted.org/packages/27/b7/068bf892927c59d0fc89c56a5f186067c7188958d9b4ac1e1c2608dc562a/MetaPont-0.0.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "86f1d25b638405f4f3a5c74a4d0abb4134e4e7bb5fbf27b9c0e7755775ce5ad1",
"md5": "5214cdfaf7f6275e8e7872f715e14e3f",
"sha256": "6c46a0bdc16923ac0129a715058b943acb764a6d7062d5298278609612cd832d"
},
"downloads": -1,
"filename": "metapont-0.0.3.tar.gz",
"has_sig": false,
"md5_digest": "5214cdfaf7f6275e8e7872f715e14e3f",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 27764,
"upload_time": "2024-11-28T20:39:21",
"upload_time_iso_8601": "2024-11-28T20:39:21.214732Z",
"url": "https://files.pythonhosted.org/packages/86/f1/d25b638405f4f3a5c74a4d0abb4134e4e7bb5fbf27b9c0e7755775ce5ad1/metapont-0.0.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-28 20:39:21",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "TheHuwsLab",
"github_project": "MetaPont",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "metapont"
}