# Tnseeker
Tnseeker is a pipeline for transposon insertion sequencing (Tn-Seq) analysis. It trims and aligns reads, associates transposon insertions with genomic locations, and infers essential genes from transposon insertion densities. It can also extract barcodes from raw fastq files and link them to the corresponding transposon genomic locations for subsequent analysis. What distinguishes Tnseeker from other tools is that it automatically infers and adjusts threshold/cutoff parameters from the data. This removes the need for intricate user input and allows a more precise determination of gene essentiality. Tnseeker is compatible with any transposon disruption experiment and mitigates transposon-specific biases, including those seen with HIMAR, so it can handle all Tn-Seq datasets.
Tnseeker is under active development and is available as is. Contact me if you are interested in using the program or have any questions. Bugs can be expected. Please report any weird or unintended behaviour.
## Requirements
The tnseeker pipeline requires Python 3, Bowtie2, and BLAST to be callable from the terminal (that is, added to the PATH).
### For local BLAST
`sudo apt install ncbi-blast+`
### For bowtie2
`sudo apt update`
`sudo apt install bowtie2`
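Before running the pipeline, it can help to confirm that the external tools are actually reachable from the PATH. The following is a minimal sketch, not part of tnseeker itself. The executable names `bowtie2` and `blastn` are assumptions based on what the packages above usually install, and `missing_tools` is a hypothetical helper.

```python
import shutil

# Assumed executable names: `bowtie2` and `blastn` are the commands usually
# provided by the bowtie2 and ncbi-blast+ packages. Adjust if yours differ.
REQUIRED_TOOLS = ("python3", "bowtie2", "blastn")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools are callable.")
```

The built-in test mode (`python -m tnseeker --tst`) remains the authoritative check. This sketch only reports whether the commands can be located.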
## Installation
tnseeker can be installed as a PyPI module with the following:
`pip install tnseeker`
## Executing
tnseeker is executable from the command line by typing:
`python -m tnseeker`
Tnseeker also has a test mode, in which the BLAST and Bowtie2 installations are tested and a small run on a test dataset is performed.
`python -m tnseeker --tst`
An example use case is the following. The meaning of the input arguments is given below:
`python -m tnseeker -s BW25113 -sd '/your/data/directory/folder_with_fastq.gz_files' -ad /your/annotations/directory/ -at gb -st SE --tn AGATGTGTATAAGAGACAG --ph 10 --mq 40`
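When tnseeker is launched from a script rather than typed by hand, the same command can be assembled as an argument list. This is a sketch under assumptions: `build_tnseeker_command` is a hypothetical helper, not part of tnseeker, and it only uses flags that appear in the example above.

```python
import subprocess
import sys


def build_tnseeker_command(strain, seq_dir, annotation_dir, tn_border,
                           seq_type="SE", phred=None, mapq=None):
    """Assemble the argument list for `python -m tnseeker` (hypothetical helper)."""
    cmd = [
        sys.executable, "-m", "tnseeker",
        "-s", strain,
        "-sd", seq_dir,
        "-ad", annotation_dir,
        "-at", "gb",
        "-st", seq_type,
        "--tn", tn_border,
    ]
    # Optional filters are only added when the caller sets them.
    if phred is not None:
        cmd += ["--ph", str(phred)]
    if mapq is not None:
        cmd += ["--mq", str(mapq)]
    return cmd


cmd = build_tnseeker_command(
    "BW25113",
    "/your/data/directory/folder_with_fastq.gz_files",
    "/your/annotations/directory/",
    "AGATGTGTATAAGAGACAG",
    phred=10,
    mapq=40,
)
print(" ".join(cmd))
# To actually launch the run, uncomment the next line:
# subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.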
## Optional Arguments:
    -h, --help     Show this help message and exit
    -s S           Strain name. Must match the annotation (FASTA/GB) file names
    -sd SD         The full path to the sequencing files FOLDER
    --sd_2 SD_2    The full path to the paired-end sequencing files FOLDER (needs to be different from the first folder)
    -ad AD         The full path to the directory with the .gb and .fasta files
    -at AT         Annotation type (Genbank)
    -st ST         Sequencing type: paired-end (PE) or single-end (SE)
    --tst [TST]    Test mode to confirm everything works as expected
    --tn [TN]      Transposon border sequence (tn5: GATGTGTATAAGAGACAG). Required for trimming and proper mapping
    --m [M]        Mismatches in the transposon border sequence (default is 0)
    --k [K]        Remove intermediate files. Default is yes, remove
    --e [E]        Run only the essentiality determining script. Requires the all_insertions_STRAIN.csv file to have been generated first
    --t [T]        Trims to the indicated nucleotide length AFTER finding the transposon sequence. For example, 100 would mean to keep the 100 bp after the transposon (this trimmed read is then used for alignment)
    --b [B]        Run with barcode extraction
    --b1 [B1]      Upstream barcode sequence (example: ATC)
    --b2 [B2]      Downstream barcode sequence (example: CTA)
    --b1m [B1M]    Upstream barcode sequence mismatches
    --b2m [B2M]    Downstream barcode sequence mismatches
    --b1p [B1P]    Upstream barcode sequence Phred-score filtering. Default is no filtering
    --b2p [B2P]    Downstream barcode sequence Phred-score filtering. Default is no filtering
    --rt [RT]      Read threshold number
    --ne [NE]      Run without essential finding
    --ph [PH]      Phred score (removes reads where nucleotides have lower Phred scores)
    --mq [MQ]      Bowtie2 MAPQ threshold
    --ig [IG]      The number of bp upstream and downstream of any gene to be considered an intergenic region
    --pv [PV]      Essential Finder p-value threshold for essentiality determination
    --dut [DUT]    Fraction of the minimal amount of 'too small domains' in a gene before the entire gene is deemed uncertain for essentiality inference
    --sl5 [SL5]    5' gene trimming percent for essentiality determination (expressed as a number between 0 and 1)
    --sl3 [SL3]    3' gene trimming percent for essentiality determination (expressed as a number between 0 and 1)
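To illustrate how `--tn`, `--m`, and `--t` interact, here is a minimal sketch of locating a transposon border in a read while tolerating mismatches, then keeping a fixed number of bases after it. It assumes a simple Hamming-distance scan. This is an illustrative stand-in, not tnseeker's confirmed implementation, and `find_border` and `trim_read` are hypothetical names.

```python
def find_border(read, border, max_mismatches=0):
    """Return the index just past the first border match in `read`, or None.

    A window counts as a match when it differs from `border` at no more
    than `max_mismatches` positions (Hamming distance; an assumption).
    """
    width = len(border)
    for start in range(len(read) - width + 1):
        window = read[start:start + width]
        mismatches = sum(a != b for a, b in zip(window, border))
        if mismatches <= max_mismatches:
            return start + width
    return None


def trim_read(read, border, max_mismatches=0, keep=None):
    """Return the sequence after the border, optionally cut to `keep` bases."""
    end = find_border(read, border, max_mismatches)
    if end is None:
        return None  # no border found: the read carries no usable insertion
    flank = read[end:]
    return flank if keep is None else flank[:keep]


border = "GATGTGTATAAGAGACAG"
read = "AAAC" + border + "TTGACCA"
print(trim_read(read, border))          # sequence following the border
print(trim_read(read, border, keep=3))  # same, limited to 3 bases
```

In this picture, `max_mismatches` plays the role of `--m` and `keep` the role of `--t`. The trimmed flank is what would be passed on to alignment.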
## Dependencies
tnseeker requires several dependencies, all automatically installable.
A notable exception is the poibin module, which is bundled in the tnseeker folder (you as the user don't need to do anything else). Its original source can be found here: https://github.com/tsakim/poibin
### File requirements
tnseeker requires several input files:
1. A '.fastq.gz' file (needs to be .gz)
2. An annotation file in genbank format (.gb)
3. A FASTA file with the genome under analysis (needs to be .fasta).
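The file requirements above can be checked before starting a run. The sketch below assumes the annotation files are named exactly `<strain>.gb` and `<strain>.fasta`. That is one reading of "must match the annotation file names" and is not confirmed by the source. `check_inputs` is a hypothetical helper.

```python
from pathlib import Path


def check_inputs(strain, seq_dir, annotation_dir):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    ann = Path(annotation_dir)
    # Assumption: annotation files are named <strain>.gb and <strain>.fasta.
    for ext in (".gb", ".fasta"):
        if not (ann / f"{strain}{ext}").is_file():
            problems.append(f"missing {strain}{ext} in {ann}")
    seq = Path(seq_dir)
    # The reads must be gzip-compressed FASTQ files (.fastq.gz).
    if not seq.is_dir() or not any(seq.glob("*.fastq.gz")):
        problems.append(f"no .fastq.gz files found in {seq}")
    return problems


if __name__ == "__main__":
    for problem in check_inputs("BW25113",
                                "/your/data/directory/folder_with_fastq.gz_files",
                                "/your/annotations/directory/"):
        print(problem)
```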
### Working modes
tnseeker is composed of 2 submodules:
1. The initial sequencing processing: handles the read trimming and alignment, creating a compiled .csv with all found transposon insertions. When barcodes associated with individual transposon reads are present, these are also extracted.
2. The Essential_finder: infers gene essentiality from the insertion information in the .csv file produced by the previous step. tnseeker can thus be run in standalone mode if the appropriate .csv and annotation files are provided.
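As an illustration of the `--sl5` and `--sl3` options used during essentiality determination, the sketch below trims a fraction off each end of a gene and keeps only the insertions that fall inside the remaining window. The coordinate convention (0-based, half-open) and the way strand is handled are assumptions made for the example. Neither function is part of tnseeker.

```python
def trimmed_gene_window(start, end, strand, sl5=0.0, sl3=0.0):
    """Return (start, end) after trimming fractions off the 5' and 3' ends.

    Coordinates are assumed 0-based and half-open, with start < end on the
    forward reference. On the '-' strand the 5' end is at `end`.
    """
    length = end - start
    cut5 = int(length * sl5)
    cut3 = int(length * sl3)
    if strand == "+":
        return start + cut5, end - cut3
    return start + cut3, end - cut5


def insertions_in_window(positions, window):
    """Keep only insertion positions that fall inside the trimmed window."""
    low, high = window
    return [p for p in positions if low <= p < high]


window = trimmed_gene_window(1000, 2000, "+", sl5=0.25, sl3=0.5)
print(window)
print(insertions_in_window([1100, 1300, 1499, 1500], window))
```

Trimming the gene ends is a common way to ignore insertions in the extremities of a gene, which may not disrupt its function.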
### Project information
Author: Afonso M Bravo (afonsombravo@hotmail.com). Homepage: https://github.com/afombravo/tnseeker. Version 1.0.7.2, uploaded to PyPI on 2024-11-16. Keywords: tn-seq, essentiality, rb-seq.