MetaPont 0.0.3 (PyPI)

- Home page: https://github.com/TheHuwsLab/MetaPont
- Author: Nicholas Dimonaco
- Uploaded: 2024-11-28 20:39:21
- Requires Python: >=3.6

# MetaPont

**MetaPont** is a tool to bridge the gap between the output of metagenomic tools and the analysis of the data.

## Features

These are the current aims of the project, which is still under development:

- **Targeted Functional Analysis:** Search for specific functional IDs (e.g., GO terms) within the `_Final_Contig.tsv` files produced by the HuwsLab Metagenome Workflow (https://github.com/TheHuwsLab/Metagenome_Workflow).
- **Taxonomic Breakdown:** Extract genus-level taxonomic assignments and calculate the proportion of each genus in the dataset.
- **Batch Processing:** Analyse all `_Final_Contig.tsv` files in a specified directory.
- **Customisable Output:** Save results in a format suitable for downstream analysis.

---

## Installation

### Prerequisites

Ensure you have the following installed:

- Python 3.6 or later

### Installation via pip

MetaPont is distributed on PyPI and can be installed with pip:

```bash
pip install MetaPont 
```

---

## Usage

--- 
### Extract-By-Function Command-line Arguments
Running `Extract-By-Function -h` prints the following help text:

```text
usage: Extract_By_Function.py [-h] -d DIRECTORY -f FUNCTION_ID -o OUTPUT
                              [-m MIN_PROPORTION] [-top TOP_TAXA]

MetaPont v0.0.3: Extract-By-Function - Identify taxa contributing to a
specific function.

options:
  -h, --help            show this help message and exit
  -d DIRECTORY, --directory DIRECTORY
                        Directory containing TSV files to analyse.
  -f FUNCTION_ID, --function_id FUNCTION_ID
                        Specific function ID to search for (e.g.,
                        'GO:0016597').
  -o OUTPUT, --output OUTPUT
                        Output file to save results.
  -m MIN_PROPORTION, --min_proportion MIN_PROPORTION
                        Minimum proportion threshold for taxa to be included
                        in the output.
  -top TOP_TAXA, --top_taxa TOP_TAXA
                        Top n taxa to be included in the output.

```

The `Extract-By-Function` options are summarised below. Either `-m` or `-top` must be supplied.

| Option                   | Description                                                | Required | Default |
|--------------------------|------------------------------------------------------------|----------|---------|
| `-d`, `--directory`      | Directory containing `_Final_Contig.tsv` files to analyse. | Yes      | None    |
| `-f`, `--function_id`    | Functional ID to search for (e.g., `GO:0016597`).          | Yes      | None    |
| `-m`, `--min_proportion` | Minimum proportion a taxon must reach to be reported.      | One of `-m`/`-top` | None    |
| `-top`, `--top_taxa`     | Number of top taxa to report.                              | One of `-m`/`-top` | None    |
| `-o`, `--output`         | Output file name to save results.                          | Yes      | None    |
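The option rules above can be mirrored with Python's standard `argparse` module. The sketch below is illustrative only and is not MetaPont's own source: the function names `build_parser` and `parse_args`, the `float`/`int` types for `-m`/`-top`, and reading "either" as "at least one of" are all assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented Extract-By-Function options."""
    parser = argparse.ArgumentParser(
        description="Extract-By-Function - Identify taxa contributing to a specific function.")
    parser.add_argument("-d", "--directory", required=True,
                        help="Directory containing TSV files to analyse.")
    parser.add_argument("-f", "--function_id", required=True,
                        help="Specific function ID to search for (e.g., 'GO:0016597').")
    parser.add_argument("-o", "--output", required=True,
                        help="Output file to save results.")
    # Assumed types: a fractional threshold and an integer count.
    parser.add_argument("-m", "--min_proportion", type=float,
                        help="Minimum proportion threshold for taxa to be included in the output.")
    parser.add_argument("-top", "--top_taxa", type=int,
                        help="Top n taxa to be included in the output.")
    return parser


def parse_args(argv=None):
    """Parse argv and enforce that at least one of -m or -top is given."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.min_proportion is None and args.top_taxa is None:
        # parser.error prints the usage message and exits with status 2.
        parser.error("either -m/--min_proportion or -top/--top_taxa is required")
    return args
```

Omitting both `-m` and `-top` makes `parse_args` exit with status 2, the same way `argparse` reports a missing required option.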

### Example

To search for the functional ID `GO:0016597` in all `_Final_Contig.tsv` files within the `test_data/` directory and report the top 3 taxa:

```bash
Extract-By-Function -d .../test_data/Final_Contig/ -f GO:0016597 -top 3 -o .../test_data/Final_Contig/Extract_By_Function_Out/results.tsv
```

---

### Extract-By-Function Output

The tool generates a tab-delimited output file with the following columns:

1. **Sample:** Name of the processed sample file.
2. **Taxa:** Genus-level taxonomic assignment extracted from the `Lineage` column.
3. **Reads Assigned (Function):** Number of reads assigned to contigs with the given functional ID.
4. **Proportion (Function):** Proportion of the sample's reads for the given functional ID that are assigned to contigs of the stated taxon.
5. **Proportion (Total Reads):** Proportion of all reads in the sample that are assigned to contigs of the stated taxon carrying the given functional ID.
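The two proportion columns share a numerator and differ only in their denominator. A minimal sketch of that arithmetic, assuming the taxon's read count for the function, the sample's read count for the function and the sample's total read count are already known; the function name `taxon_proportions` and the rounding to three decimals (matching the example output) are assumptions:

```python
def taxon_proportions(taxon_reads, function_reads, total_reads, ndigits=3):
    """Return (Proportion (Function), Proportion (Total Reads)) for one taxon.

    taxon_reads:    reads on contigs of this taxon carrying the function ID
    function_reads: reads on all contigs in the sample carrying the function ID
    total_reads:    all reads in the sample
    """
    if function_reads <= 0 or total_reads <= 0:
        raise ValueError("denominators must be positive")
    return (round(taxon_reads / function_reads, ndigits),
            round(taxon_reads / total_reads, ndigits))
```

For instance, a taxon holding 60 of the 100 function reads in a 10,000-read sample would get proportions of 0.6 and 0.006.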

Example output:

```
Function ID: GO:0016597
Sample	Taxa	Reads Assigned (Function)	Proportion (Function)	Proportion (Total Reads)
PN0536_0001_S1_Final_Contig.tsv	Lactobacillus	111963	0.602	0.004
PN0536_0003_S83_Final_Contig.tsv	Lactobacillus	20072	0.457	0.001
PN0536_0002_S2_Final_Contig.tsv	Acutalibacter	145222	0.795	0.005
PN0536_0004_S3_Final_Contig.tsv	Lactobacillus	40076	0.404	0.002
```
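Because the output starts with a `Function ID:` line ahead of the tab-separated header, downstream code needs to skip that line before parsing. A minimal sketch using the standard `csv` module; the function name `read_function_results` is hypothetical, and the column names are taken from the example above:

```python
import csv
import io

# Columns converted from text; Sample and Taxa stay as strings.
NUMERIC_COLUMNS = {
    "Reads Assigned (Function)": int,
    "Proportion (Function)": float,
    "Proportion (Total Reads)": float,
}


def read_function_results(handle):
    """Return (function_id, rows) from an Extract-By-Function results file."""
    first_line = handle.readline().strip()
    prefix = "Function ID:"
    if not first_line.startswith(prefix):
        raise ValueError("unexpected first line: %r" % first_line)
    function_id = first_line[len(prefix):].strip()
    rows = []
    # DictReader continues from the handle's current position (the header row).
    for row in csv.DictReader(handle, delimiter="\t"):
        for column, convert in NUMERIC_COLUMNS.items():
            row[column] = convert(row[column])
        rows.append(row)
    return function_id, rows


if __name__ == "__main__":
    # First data row of the example output above.
    example = io.StringIO(
        "Function ID: GO:0016597\n"
        "Sample\tTaxa\tReads Assigned (Function)\tProportion (Function)\t"
        "Proportion (Total Reads)\n"
        "PN0536_0001_S1_Final_Contig.tsv\tLactobacillus\t111963\t0.602\t0.004\n")
    function_id, rows = read_function_results(example)
    print(function_id, rows[0]["Taxa"], rows[0]["Reads Assigned (Function)"])
```

To read a real results file, open it with `newline=""` and pass the handle in, e.g. `with open("results.tsv", newline="") as fh: function_id, rows = read_function_results(fh)`.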

---



### Workflow (unfinished)

1. The script reads `_Final_Contig.tsv` files from the specified directory.
2. For each file, it searches for occurrences of the given functional ID within specific columns.
3. Matches are associated with genus-level taxonomic information extracted from the `Lineage` column.
4. Taxa proportions are calculated, filtered using the `-m` or `-top` threshold, and saved to the output file.
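The four workflow steps can be sketched end to end in plain Python. This is a hypothetical illustration, not MetaPont's implementation: the `Reads`, `GO`, `EC` and `KEGG` column names, the `;`-separated lineage with a `g__` genus prefix, the delimiters used to split function IDs, and applying `-top` before `-m` are all assumptions. The `demo` helper writes a small made-up file to a temporary directory only to exercise the code.

```python
import csv
import glob
import os
import re
import tempfile
from collections import defaultdict

# Hypothetical column names; the real _Final_Contig.tsv layout may differ.
LINEAGE_COLUMN = "Lineage"
READS_COLUMN = "Reads"                   # assumed per-contig read count
FUNCTION_COLUMNS = ("GO", "EC", "KEGG")  # assumed annotation columns


def genus_from_lineage(lineage):
    """Return 'Genus' from a 'd__...;g__Genus;s__...' string, or None."""
    for rank in lineage.split(";"):
        rank = rank.strip()
        if rank.startswith("g__") and len(rank) > 3:
            return rank[3:]
    return None


def has_function(row, function_id):
    """True if function_id is a whole token in any annotation column."""
    for column in FUNCTION_COLUMNS:
        tokens = re.split(r"[;,|\s]+", row.get(column) or "")
        if function_id in tokens:
            return True
    return False


def taxa_for_function(directory, function_id, min_proportion=None, top_taxa=None):
    """Yield (sample, genus, reads, prop_function, prop_total) tuples."""
    pattern = os.path.join(directory, "*_Final_Contig.tsv")
    for path in sorted(glob.glob(pattern)):                  # step 1
        genus_reads = defaultdict(int)
        function_reads = total_reads = 0
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                reads = int(row[READS_COLUMN])
                total_reads += reads
                if not has_function(row, function_id):       # step 2
                    continue
                function_reads += reads
                genus = genus_from_lineage(row.get(LINEAGE_COLUMN) or "")
                if genus is not None:                        # step 3
                    genus_reads[genus] += reads
        if function_reads == 0:
            continue
        ranked = sorted(genus_reads.items(), key=lambda kv: kv[1], reverse=True)
        if top_taxa is not None:
            ranked = ranked[:top_taxa]
        for genus, reads in ranked:                          # step 4
            prop_function = reads / function_reads
            if min_proportion is not None and prop_function < min_proportion:
                continue
            yield (os.path.basename(path), genus, reads,
                   round(prop_function, 3), round(reads / total_reads, 3))


def demo(**options):
    """Run the sketch on a made-up three-contig file in a temporary folder."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "S1_Final_Contig.tsv"), "w", newline="") as fh:
            fh.write("Contig\tReads\tLineage\tGO\tEC\tKEGG\n")
            fh.write("c1\t60\td__Bacteria;g__Lactobacillus\tGO:0016597,GO:0008150\t\t\n")
            fh.write("c2\t40\td__Bacteria;g__Bacillus\tGO:0016597\t\t\n")
            fh.write("c3\t900\td__Bacteria;g__Bacillus\tGO:0008150\t\t\n")
        # list() consumes the generator before the temporary folder is deleted.
        return list(taxa_for_function(tmp, "GO:0016597", **options))
```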

---
### Extract-By-Taxa Command-line Arguments

Running `Extract-By-Taxa -h` prints the following help text:

```text
usage: Extract_By_Taxa.py [-h] -d DIRECTORY -t TAXON -o OUTPUT -func
                          FUNCTIONAL_CLASSES [-top TOP_FUNCTIONS]

MetaPont: Extract Top Functions by Taxon

options:
  -h, --help            show this help message and exit
  -d DIRECTORY, --directory DIRECTORY
                        Directory containing TSV files to analyse.
  -t TAXON, --taxon TAXON
                        Target taxon to search for (e.g., 'g__Bacillus').
  -o OUTPUT, --output OUTPUT
                        Output file to save results.
  -func FUNCTIONAL_CLASSES, --functional_classes FUNCTIONAL_CLASSES
                        Which functional classes to report (e.g. GO,EC,KEGG
                        etc).
  -top TOP_FUNCTIONS, --top_functions TOP_FUNCTIONS
                        Top n functions to include in the output for each
                        sample (default: 3).

```

The `Extract-By-Taxa` options are summarised below:


| Option                            | Description                                                 | Required | Default |
|-----------------------------------|-------------------------------------------------------------|----------|---------|
| `-d`, `--directory`               | Directory containing `_Final_Contig.tsv` files to analyse.  | Yes      | None    |
| `-t`, `--taxon`                   | Taxon to search for (e.g., `g__Bacillus`).                  | Yes      | None    |
| `-func`, `--functional_classes`   | Functional classes to report (e.g. `GO,EC,KEGG`).           | Yes      | None    |
| `-top`, `--top_functions`         | Number of top functions to report for each sample.          | No       | 3       |
| `-o`, `--output`                  | Output file name to save results.                           | Yes      | None    |
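Because `-func` takes a comma-separated list (e.g. `GO,EC,KEGG`), a parser has to split it. The sketch below declares the documented Extract-By-Taxa options with `argparse` and uses a small `type=` callable to split on commas; it is illustrative only, not MetaPont's source, and the names `comma_list` and `build_taxa_parser` are hypothetical.

```python
import argparse


def comma_list(value):
    """Split 'GO,EC,KEGG' into ['GO', 'EC', 'KEGG'], dropping empty items."""
    items = [item.strip() for item in value.split(",") if item.strip()]
    if not items:
        raise argparse.ArgumentTypeError("expected at least one functional class")
    return items


def build_taxa_parser():
    """Hypothetical parser mirroring the documented Extract-By-Taxa options."""
    parser = argparse.ArgumentParser(description="Extract Top Functions by Taxon")
    parser.add_argument("-d", "--directory", required=True,
                        help="Directory containing TSV files to analyse.")
    parser.add_argument("-t", "--taxon", required=True,
                        help="Target taxon to search for (e.g., 'g__Bacillus').")
    parser.add_argument("-o", "--output", required=True,
                        help="Output file to save results.")
    parser.add_argument("-func", "--functional_classes", required=True, type=comma_list,
                        help="Which functional classes to report (e.g. GO,EC,KEGG).")
    # The help text states a default of 3 functions per sample.
    parser.add_argument("-top", "--top_functions", type=int, default=3,
                        help="Top n functions per sample (default: 3).")
    return parser
```

`argparse` matches `-top` exactly before considering `-t`, so the two options do not clash.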

### Example

To report the top GO functions (3 per sample by default) for the taxon `g__Bacillus` across all `_Final_Contig.tsv` files within the `test_data/` directory:

```bash
Extract-By-Taxa -d .../test_data/Final_Contig -t g__Bacillus -o .../test_data/Final_Contig/Extract_By_Taxa/results.tsv  -func GO
```

---
### Extract-By-Taxa Output

The tool generates a tab-delimited output file with the following columns:

1. **Sample:** Name of the processed sample.
2. **Function:** One of the top functions reported for the chosen taxon in that sample.
3. **Num of Assignments:** Number of times the function has been assigned across all contigs classified as the chosen taxon.

Example output:

```
Selected Taxon: g__Bacillus
Sample	Function	Num of Assignments
PN0536_0001_S1	GO:0008150	296
PN0536_0001_S1	GO:0003674	285
PN0536_0001_S1	GO:0005575	254
PN0536_0003_S83	GO:0005575	45
PN0536_0003_S83	GO:0008150	44
PN0536_0003_S83	GO:0003674	43
PN0536_0002_S2	GO:0005575	5
PN0536_0002_S2	GO:0008150	5
PN0536_0002_S2	GO:0005623	4
PN0536_0004_S3	GO:0008150	4
PN0536_0004_S3	GO:0003674	3
PN0536_0004_S3	GO:0005488	3

```
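Selecting the top functions for one sample amounts to counting function assignments on contigs of the chosen taxon and keeping the `n` most common. A minimal sketch with `collections.Counter`; the input shape (pairs of a `;`-separated lineage string and a list of function IDs), exact matching of the taxon token, and the name `top_functions` are assumptions, not MetaPont's implementation:

```python
from collections import Counter


def top_functions(contigs, taxon, n=3):
    """Return the n most frequently assigned functions for one sample.

    contigs: iterable of (lineage, function_ids) pairs, where lineage is a
    ';'-separated string such as 'd__Bacteria;g__Bacillus' and function_ids
    is a list such as ['GO:0008150', 'GO:0003674'].
    """
    counts = Counter()
    for lineage, function_ids in contigs:
        ranks = [rank.strip() for rank in lineage.split(";")]
        if taxon in ranks:
            counts.update(function_ids)
    # most_common orders by count; equal counts keep first-encountered order.
    return counts.most_common(n)
```

In this sketch, ties between functions with equal counts are resolved by the order in which contigs are read.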

---

### Large File Handling (potential failure point)

The script raises the `csv` module's field size limit with `csv.field_size_limit` so that `.tsv` files containing exceptionally large fields can be parsed.
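Python's `csv` module rejects any single field longer than its limit (131,072 characters by default) with `csv.Error`. One common way to lift that limit is shown below; it is a sketch of the general idiom, not necessarily what MetaPont does, and the helper name `raise_csv_field_limit` is hypothetical. Passing `sys.maxsize` directly is a known failure point on platforms where a C `long` is 32-bit, which the loop guards against.

```python
import csv
import sys


def raise_csv_field_limit():
    """Raise the csv module's per-field limit as high as this platform allows.

    csv.field_size_limit() stores the limit as a C long, so passing
    sys.maxsize raises OverflowError where a C long is 32-bit (e.g. Windows);
    in that case keep dividing by 10 until the value fits.
    """
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            limit //= 10
```

Calling `csv.field_size_limit()` with no argument afterwards returns the limit now in force.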

---



            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/TheHuwsLab/MetaPont",
    "name": "MetaPont",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": null,
    "author": "Nicholas Dimonaco",
    "author_email": "nicholas@dimonaco.co.uk",
    "download_url": "https://files.pythonhosted.org/packages/86/f1/d25b638405f4f3a5c74a4d0abb4134e4e7bb5fbf27b9c0e7755775ce5ad1/metapont-0.0.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "MetaPont - A tool to bridge the gap between the output of metagenomic tools and the analysis of the data",
    "version": "0.0.3",
    "project_urls": {
        "Bug Tracker": "https://github.com/TheHuwsLab/MetaPont/issues",
        "Homepage": "https://github.com/TheHuwsLab/MetaPont"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "27b7068bf892927c59d0fc89c56a5f186067c7188958d9b4ac1e1c2608dc562a",
                "md5": "d325d62d2fd5e257e4ea56833642c678",
                "sha256": "4009b1adcaad19362943504eddf871812458a8c2830a342f5c5e6cbf23dea09e"
            },
            "downloads": -1,
            "filename": "MetaPont-0.0.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d325d62d2fd5e257e4ea56833642c678",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 31387,
            "upload_time": "2024-11-28T20:39:20",
            "upload_time_iso_8601": "2024-11-28T20:39:20.244104Z",
            "url": "https://files.pythonhosted.org/packages/27/b7/068bf892927c59d0fc89c56a5f186067c7188958d9b4ac1e1c2608dc562a/MetaPont-0.0.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "86f1d25b638405f4f3a5c74a4d0abb4134e4e7bb5fbf27b9c0e7755775ce5ad1",
                "md5": "5214cdfaf7f6275e8e7872f715e14e3f",
                "sha256": "6c46a0bdc16923ac0129a715058b943acb764a6d7062d5298278609612cd832d"
            },
            "downloads": -1,
            "filename": "metapont-0.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "5214cdfaf7f6275e8e7872f715e14e3f",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 27764,
            "upload_time": "2024-11-28T20:39:21",
            "upload_time_iso_8601": "2024-11-28T20:39:21.214732Z",
            "url": "https://files.pythonhosted.org/packages/86/f1/d25b638405f4f3a5c74a4d0abb4134e4e7bb5fbf27b9c0e7755775ce5ad1/metapont-0.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-28 20:39:21",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "TheHuwsLab",
    "github_project": "MetaPont",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "metapont"
}
        