tnseeker

Name	tnseeker
Version	1.1.0.5
home_page	https://github.com/afombravo/tnseeker
Summary	TnSeeker
upload_time	2025-02-20 20:37:05
maintainer	None
docs_url	None
author	Afonso M Bravo
requires_python	None
license	None
keywords	tn-seq essentiality rb-seq
requirements	No requirements were recorded.

<div align="center">

[![Active Development](https://img.shields.io/badge/Maintenance%20Level-Actively%20Developed-brightgreen.svg)](https://gist.github.com/cheerfulstoic/d107229326a01ff0f333a1d3476e068d)
![PyPI](https://img.shields.io/pypi/v/tnseeker.svg)
![Docker Image Version (latest by date)](https://img.shields.io/docker/v/afombravo/tnseeker/latest)
![Docker Pulls](https://img.shields.io/docker/pulls/afombravo/tnseeker)

</div>

# Tnseeker
Tnseeker is an advanced pipeline tailored for transposon insertion sequencing (Tn-Seq) analysis. 

It performs the following tasks: 
1. Trims reads based on the presence of the transposon sequence and extracts any linked barcodes (if present)
2. Aligns the trimmed reads to the reference genome (bowtie2)
3. Links transposon insertion locations to annotated genomic features (using .gff or .gb files as input)
4. Infers essential genes based on global and local (contig-wise) transposon insertion densities

What distinguishes Tnseeker from other tools is that it automatically infers and adjusts threshold/cutoff parameters from the data. This removes the need for intricate user input and allows a more hands-off determination of gene essentiality. 

Tnseeker is also compatible with any transposon disruption experiment, be it Tn5, HIMAR, or anything else, and can therefore handle any Tn-Seq dataset.

Tnseeker is under active development and is available as is. Contact me if you are interested in adapting the program or have any questions. Bugs can be expected. Please report any weird or unintended behaviour. 

## Installation
There are three ways of installing tnseeker:


### 1. Recommended installation (using singularity)
1.  Install singularity on your system (here via conda)

```
conda create -n singularity -c conda-forge singularity -y
```

2.  Activate the conda environment
```
conda activate singularity
```

3.  Download the docker image from Docker Hub. This will write a singularity container named `tnseeker_latest.sif` into your current working directory.
```bash
singularity pull docker://afombravo/tnseeker:latest
```

4. Start an interactive session of the container. Importantly, you need to `--bind` the folder that holds the input files. It is easiest to put all input files into a single folder, as this is how `tnseeker` expects its input. The results will be written into the same folder.
Note: the `:rw` at the end of the path is crucial for singularity to obtain read/write permission on the folder and hence be able to write results.
```
singularity shell --bind /path/to/folder/containing/all/input/files:/input_files:rw \
                  tnseeker_latest.sif
```

5. Start a `tnseeker` run with, for example, the following command:

```
cd /input_files
# Replace the upper-case placeholders with your own values before running, e.g.
# TN_SEQUENCE=AGATTA, UPSTREAM_BARCODE_SEQUENCE=AGAGA, DOWNSTREAM_BARCODE_SEQUENCE=ATATAT.
tnseeker \
  --cpu 4 \
  -s ORGANISM_FASTA/GB/GFF_NAME \
  -sd . \
  -ad . \
  -at gff \
  -st SE \
  --tn TN_SEQUENCE \
  --m 6 \
  --b \
  --b1 UPSTREAM_BARCODE_SEQUENCE \
  --b2 DOWNSTREAM_BARCODE_SEQUENCE \
  --ph 10 \
  --mq 20 \
  --b1m 3 \
  --b2m 3 \
  --ig 100
```
When using HPC systems it is advisable to include the `--cpu` flag and specify the number of threads.


---
### 2. Recommended installation #2 (using docker)
1.  Install docker on your system
2.  Download the docker image from Docker Hub
```bash
docker pull afombravo/tnseeker:latest
```

3.  Tag the image as just `tnseeker`
```bash
docker tag afombravo/tnseeker:latest tnseeker
```

Alternatively, download the Dockerfile from this repo and build the image yourself.
```bash
docker build --no-cache -t tnseeker .
```

4.  Start the tnseeker docker image with the command:
```bash
docker run -it -v "<local_path/to/all/your/data>:/data" tnseeker
```

5.  Start tnseeker with:
```bash
tnseeker -sd ./ -ad ./ <ALL OTHER TNSEEKER ARGUMENTS HERE>
```

NOTE: all files required by tnseeker, such as .fasta, .fastq, .gb, or .gff, need to be in the local folder indicated in step 4. You can then use the -sd and -ad flags as shown in step 5.


---
### 3. Alternative installation
The tnseeker pipeline requires Python3, Bowtie2, and BLAST to be installed and callable from the terminal (that is, available on your PATH). 

#### For local BLAST
```bash
apt update
apt install ncbi-blast+
```

#### For bowtie2
```bash
apt install bowtie2
```

#### PyPI module
tnseeker can be installed as a PyPI module with the following:
```bash
pip install tnseeker
```

tnseeker is executable from the command line by typing:

```bash
tnseeker
```
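
Before a first run with this installation route, it can help to confirm that the external requirements are actually reachable from the terminal. The snippet below is a minimal sketch, not part of tnseeker: `missing_tools` is a hypothetical helper, and the executable names `python3`, `bowtie2`, and `blastn` (from ncbi-blast+) are assumptions about what needs to be on PATH.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the name of every given executable that is NOT on PATH.
# Prints nothing when all of them are found.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed executable names for Python3, Bowtie2 and BLAST.
missing=$(missing_tools python3 bowtie2 blastn)
if [ -n "$missing" ]; then
  printf 'Not found on PATH:\n%s\n' "$missing"
else
  echo "All required tools were found on PATH."
fi
```

The built-in test mode (`tnseeker --tst`) remains the authoritative check; this only catches a missing executable before tnseeker is invoked.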

---
## Running Tnseeker

Tnseeker has a test mode in which the BLAST and Bowtie2 installations are checked and a small run on a test dataset is performed.

```bash
tnseeker --tst
```

A quick-to-use example is the following. The meaning of the input arguments is explained further down:

```bash
tnseeker -s BW25113 -sd ./ -ad ./ -at gb -st SE --tn AGATGTGTATAAGAGACAG --ph 10 --mq 40
```
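
The `--ph 10` in the example is a Phred quality cutoff. By the standard Phred definition, a score Q corresponds to a base-call error probability of 10^(-Q/10), so the arithmetic can be sketched as below. `phred_to_error` is a hypothetical helper written for illustration; how tnseeker applies the cutoff to a read internally is not shown here.

```bash
# Hypothetical helper: convert a Phred score Q into the base-call error
# probability 10^(-Q/10), printed with four decimal places.
phred_to_error() {
  awk -v q="$1" 'BEGIN { printf "%.4f\n", 10 ^ (-q / 10) }'
}

phred_to_error 10   # Q10 -> 1 error in 10 base calls
phred_to_error 20   # Q20 -> 1 error in 100 base calls
```

A cutoff of 10 is therefore permissive; raising `--ph` keeps only reads with more reliable base calls, at the cost of discarding more data.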


### File requirements

tnseeker requires several input files:

 1. A '.fastq.gz' file with the sequencing reads (it must be gzip-compressed)
 
 2. An annotation file in GenBank format (.gb), or a .gff (there is an example gff format file in this repo)
 
 3. A FASTA file with the genome under analysis.
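
A quick pre-flight check of the input folder can save a failed run. The sketch below is not part of tnseeker. It assumes the genome and annotation files are named after the strain (`STRAIN.fasta` plus `STRAIN.gb` or `STRAIN.gff`), which is an assumption based on the `-s` description; `check_inputs` is a hypothetical helper.

```bash
# Hypothetical helper: check_inputs DIR STRAIN ANNOTATION_TYPE
# Prints one "missing: ..." line per absent input and returns non-zero if any is absent.
check_inputs() {
  local dir="$1" strain="$2" at="$3" status=0 f
  # At least one gzipped FASTQ file must be present in the folder.
  if ! ls "$dir"/*.fastq.gz >/dev/null 2>&1; then
    echo "missing: *.fastq.gz"
    status=1
  fi
  # Assumed naming: STRAIN.fasta and STRAIN.<gb|gff>.
  for f in "$strain.fasta" "$strain.$at"; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"
      status=1
    fi
  done
  return "$status"
}
```

For the quick example above, this would be called as `check_inputs ./ BW25113 gb` before starting tnseeker.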

---


## Optional Arguments and their meaning:

  -h, --help   show this help message and exit

  -s [S]        Strain name. Must match the annotation (FASTA/GB) file names

  -sd [SD]      The full path to the sequencing files FOLDER

  --sd_2 [SD_2] The full path to the FOLDER with the second read end of paired-end sequencing files
                (needs to be different from the first folder; keep each read end in its own folder.)

  -ad [AD]      The full path to the directory with the .gb and .fasta files

  -at [AT]      Annotation type (GenBank: -at gb | gff: -at gff)

  -st [ST]      Sequencing type (paired-end (PE) / single-end (SE))

  --tst [TST]  Test mode to confirm everything works as expected.

  --tn [TN]    Transposon border sequence (tn5: GATGTGTATAAGAGACAG). Required for trimming and better/faster mapping.

  --m [M]      Mismatches in the transposon border sequence (default is 0)

  --k [K]      Remove intermediate files. Default is yes, remove.

  --e [E]      Run only the essentiality determination script. Requires the
               all_insertions_STRAIN.csv file to have been generated first.

  --t [T]      Trims to the indicated nucleotide length AFTER finding the
               transposon sequence. For example, 100 would mean to keep the first
               100bp after the transposon (this trimmed read is then used for
               alignment)

  --b [B]      Run with barcode extraction.

  --b1 [B1]    upstream barcode search sequence (example: ATC)

  --b2 [B2]    downstream barcode search sequence (example: CTA)

  --b1m [B1M]  upstream barcode search sequence mismatches (default is 0)

  --b2m [B2M]  downstream barcode search sequence mismatches (default is 0)

  --b1p [B1P]  upstream barcode sequence Phred-score filtering. (default is 1)

  --b2p [B2P]  downstream barcode sequence Phred-score filtering. (default is 1)

  --rt [RT]    Read threshold in absolute number of reads (default is 0). All insertions with a read count <= this threshold will be removed.

  --ne [NE]    Run without essential finding (outputs a .csv file with all annotated transposon insertions)

  --ph [PH]    Phred Score (removes reads where nucleotides have lower phred
               scores) (default is 1)

  --mq [MQ]    Bowtie2 MAPQ threshold (default is 42)

  --ig [IG]    The number of bp upstream and downstream of any gene that will be ignored before an unannotated DNA stretch is considered an intergenic region.

  --pv [PV]    Starting p-value threshold for essentiality determination. Default is 0.05. The p-value will be lowered iteratively based on the optimization of the gold set essential genes. If no gold set genes are present, the value will be the default one (or the one indicated)

  --dut [DUT]  The correct essentiality calling of features that have both 'too small domains' and 'non-essential' sub-gene divisions needs a special case. 
               When both are present, if 'too small domains' > ('too small domains' + 'non-essential') * dut-value (the default is 0.75), then the feature will be deemed 'too small domains' in its entirety. Otherwise, 'non-essential'.

  --sl5 [SL5]  Fraction of the gene's 5' end to be ignored for essentiality determination (number
               between 0 and 1)

  --sl3 [SL3]  Fraction of the gene's 3' end to be ignored for essentiality determination (number
               between 0 and 1)

  --cpu [CPU]  Define the number of threads (must be an integer). Advisable when using HPC systems.
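
The `--dut` rule above is easiest to follow with numbers. The sketch below reproduces only the inequality as stated in the help text. It assumes the two quantities are counts of sub-gene divisions (the unit is not stated in the source), and `classify_feature` is a hypothetical helper, not tnseeker's own code.

```bash
# Hypothetical helper: classify_feature TOO_SMALL NON_ESSENTIAL [DUT]
# Applies: too_small > (too_small + non_essential) * dut  ->  'too small domains'
# DUT defaults to 0.75, as in the --dut description.
classify_feature() {
  awk -v small="$1" -v non="$2" -v dut="${3:-0.75}" 'BEGIN {
    if (small + 0 > (small + non) * dut)
      print "too small domains"
    else
      print "non-essential"
  }'
}

classify_feature 4 1   # 4 > 5 * 0.75 = 3.75, so the rule fires
classify_feature 3 1   # 3 > 4 * 0.75 = 3.00 is false, so the rule does not fire
```

Because the comparison is strict, a feature sitting exactly on the threshold is called 'non-essential'; lowering the dut-value makes the 'too small domains' call more likely.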

---
## Python Dependencies

Tnseeker requires several dependencies, all automatically installable.
A notable exception is the poibin module, which is bundled in the tnseeker folder (you as the user don't need to do anything else). The original can be found here: https://github.com/tsakim/poibin


---
### Working modes

tnseeker is composed of 2 submodules: 

1. The initial sequencing processing: Handles the read trimming and alignment, creating a compiled .csv with all found transposon insertions. When barcodes associated with individual transposon reads are present, these are also extracted. Running only this subpipeline can be achieved by including the `--ne` flag

2. The Essential_finder: Infers gene essentiality from the insertion information found in the previous .csv file. tnseeker can thus be run in standalone mode if the appropriate .csv and annotation files are indicated. Running only this subpipeline can be achieved by including the `--e` flag
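
The two modes can be chained by hand, for instance to inspect the insertion table before calling essentiality. The following is a sketch under assumptions: the argument values are borrowed from the quick example, `--ne` and `--e` are passed as bare flags in the same way `--b` is used earlier, and it is assumed that the second stage needs the same strain and folder arguments to locate `all_insertions_STRAIN.csv`. The commands are only printed, not executed.

```bash
# Arguments assumed to be shared by both stages (values from the quick example).
common_args="-s BW25113 -sd ./ -ad ./ -at gb -st SE"

# Stage 1: trimming, alignment and insertion table only (no essentiality call).
stage1="tnseeker $common_args --tn AGATGTGTATAAGAGACAG --ph 10 --mq 40 --ne"

# Stage 2: essentiality inference only, reusing the .csv written by stage 1.
stage2="tnseeker $common_args --e"

# Print both commands for review before running them.
printf '%s\n' "$stage1" "$stage2"
```

Once the printed commands look right, they can be pasted into the terminal one after the other.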


Afonso M Bravo