# TranSigner: assigning long RNA-seq reads to transcripts
TranSigner is a Python program that assigns long RNA-seq reads (ONT or PacBio) to transcripts. It takes a set of reads and an assembled transcriptome as input. The transcriptome must be available in both FASTA and GTF/GFF formats. If you only have it in GTF/GFF format, you can generate the FASTA file with [gffread](https://github.com/gpertea/gffread) as follows:
```
gffread -w transcripts.fa -g genome.fa transcripts.gtf
```
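Since both files describe the same transcriptome, it can be worth confirming that they agree before running anything else. The following is a minimal sketch, not part of TranSigner: it assumes a GTF file (GFF3 attributes such as `ID=` are not handled) and assumes that the first token of each FASTA header is the transcript ID. All function names are hypothetical.

```python
import re

# Matches the transcript_id attribute in the ninth GTF column.
_TID = re.compile(r'transcript_id "([^"]+)"')


def fasta_ids(lines):
    """Return the set of record names (first token after '>') in FASTA lines."""
    return {
        ln[1:].split()[0]
        for ln in lines
        if ln.startswith(">") and len(ln.strip()) > 1
    }


def gtf_transcript_ids(lines):
    """Return the set of transcript_id values found in GTF lines."""
    ids = set()
    for ln in lines:
        if ln.startswith("#"):
            continue  # skip comment lines
        match = _TID.search(ln)
        if match:
            ids.add(match.group(1))
    return ids


def missing_from_fasta(gtf_lines, fasta_lines):
    """Return transcript IDs annotated in the GTF but absent from the FASTA."""
    return sorted(gtf_transcript_ids(gtf_lines) - fasta_ids(fasta_lines))
```

Both arguments can be any iterable of lines, so open file handles for `transcripts.gtf` and `transcripts.fa` work directly. An empty result means every annotated transcript has a sequence.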
### Installation ###
You can install TranSigner with pip by running:
```
pip install transigner
```
or from source:
```
git clone https://github.com/haydenji0731/transigner transigner
cd transigner
python setup.py install
```
You also need [minimap2](https://github.com/lh3/minimap2) and [samtools](http://www.htslib.org/) installed. Precompiled binaries work as long as they are available in your PATH. If you run into trouble during installation, please open a GitHub issue.
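A quick way to confirm that both external tools are visible on your PATH is sketched below. It only uses the standard library; the function name `missing_tools` is hypothetical and not something TranSigner provides.

```python
import shutil


def missing_tools(tools=("minimap2", "samtools")):
    """Return the names of required executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling `missing_tools()` returns an empty list when both executables can be found.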
### Usage ###
TranSigner consists of three modules: align, prefilter, and em (short for expectation-maximization). You can check the arguments for each module by running:
```
transigner [module] -h
```
The `align` module takes a set of reads in a FASTQ file and aligns them to a transcriptome provided as a FASTA file (replace `threads` with the number of threads to use). See below:
```
transigner align -q reads.fastq -t transcripts.fa -d output_dir -o alignment.bam -p threads
```
Next, the `prefilter` module processes the alignment results to compute compatibility scores between reads and transcripts and to filter out alignments with questionable 5' and/or 3' end positions. The recommended filter thresholds differ by read type:
```
transigner prefilter -a alignment.bam -t transcripts.fa -o output_dir --filter -tp -1 # noisy ONT direct RNA reads
transigner prefilter -a alignment.bam -t transcripts.fa -o output_dir --filter -tp -500 -fp -600 # ONT cDNA or PacBio IsoSeq
```
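If you script the pipeline, the two recommended settings can be kept in one place. The sketch below only builds the argument list. The flags and threshold values are copied from the commands above, while the preset labels (`ont_drna`, `cdna_or_isoseq`) and the function name are hypothetical and not TranSigner options.

```python
# Threshold flags copied from the README commands; the keys are made-up labels.
PREFILTER_PRESETS = {
    "ont_drna": ["-tp", "-1"],                         # noisy ONT direct RNA reads
    "cdna_or_isoseq": ["-tp", "-500", "-fp", "-600"],  # ONT cDNA or PacBio IsoSeq
}


def prefilter_command(read_type, bam, transcripts, out_dir):
    """Build the `transigner prefilter` argument list for a given read type."""
    if read_type not in PREFILTER_PRESETS:
        raise ValueError(f"unknown read type: {read_type!r}")
    return [
        "transigner", "prefilter",
        "-a", bam,
        "-t", transcripts,
        "-o", out_dir,
        "--filter",
        *PREFILTER_PRESETS[read_type],
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to execute the step.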
Finally, an expectation-maximization algorithm is run as follows:
```
transigner em -s output_dir/scores.tsv -i output_dir/ti.pkl -o output_dir --drop --use-score
```
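To give a feel for what an expectation-maximization step does in this setting, here is a minimal, generic sketch. It is not TranSigner's implementation and does not model the `--drop` or `--use-score` options. It assumes each read comes with positive compatibility scores for one or more transcripts, and it alternates between splitting each read across transcripts (E-step) and re-estimating relative abundances from those fractions (M-step). The function and variable names are hypothetical.

```python
def em_abundances(scores, n_iter=100, tol=1e-9):
    """Generic EM for read-to-transcript assignment.

    scores: dict mapping read_id -> {transcript_id: positive score}.
    Returns (abundances, fractions), where abundances sum to 1 and
    fractions[read_id] sums to 1 for every read.
    """
    if not scores:
        return {}, {}
    transcripts = sorted({t for s in scores.values() for t in s})
    # Start from uniform relative abundances.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    fractions = {}
    for _ in range(n_iter):
        counts = dict.fromkeys(transcripts, 0.0)
        # E-step: split each read in proportion to abundance * score.
        for read, read_scores in scores.items():
            weights = {t: theta[t] * w for t, w in read_scores.items()}
            total = sum(weights.values())
            fractions[read] = {t: w / total for t, w in weights.items()}
            for t, frac in fractions[read].items():
                counts[t] += frac
        # M-step: abundance is the expected share of reads per transcript.
        new_theta = {t: counts[t] / len(scores) for t in transcripts}
        delta = max(abs(new_theta[t] - theta[t]) for t in transcripts)
        theta = new_theta
        if delta < tol:
            break  # abundances have stopped changing
    return theta, fractions
```

With two reads that only match transcript `A` and one read that matches `A` and `B` equally well, the shared read is pulled toward `A` over the iterations, because `A` already has unambiguous support.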
Running all three modules produces the `abundances.tsv` and `assignments.tsv` files. The header and an example row of each file are shown below:
```
# abundances.tsv
transcript_id read_count relative_abundance
NR_024540.1 1.7149614376942495e-28 1.6757869165652874e-21
# assignments.tsv
read_id (transcript_id_1, read_fraction_1) (transcript_id_2, read_fraction_2) (transcript_id_3, read_fraction_3) ...
57ebcba2-cde0-4096-b9b9-9bdc4306cb6c (ENST00000559163, 6.640795584395568e-07) (ENST00000559884, 0.9507564850356304) (ENST00000354296, 0.04924285088481117)
```
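Both files are easy to load for downstream analysis. The sketch below is based only on the layout shown above and makes a few assumptions: fields are whitespace-separated, a header row may or may not be present, and each assignment is written as a `(transcript_id, read_fraction)` pair. The function names are hypothetical.

```python
import re

# One "(transcript_id, read_fraction)" pair as shown in the example row.
_PAIR = re.compile(r"\(\s*([^,()]+?)\s*,\s*([^()]+?)\s*\)")


def parse_abundances(lines):
    """Map transcript_id -> (read_count, relative_abundance)."""
    out = {}
    for ln in lines:
        fields = ln.split()
        # Skip blank lines, comments, and the header row if there is one.
        if len(fields) != 3 or fields[0].startswith("#") or fields[0] == "transcript_id":
            continue
        out[fields[0]] = (float(fields[1]), float(fields[2]))
    return out


def parse_assignments(lines):
    """Map read_id -> {transcript_id: read_fraction}."""
    out = {}
    for ln in lines:
        fields = ln.split()
        if not fields or fields[0].startswith("#") or fields[0] == "read_id":
            continue
        out[fields[0]] = {tid: float(frac) for tid, frac in _PAIR.findall(ln)}
    return out
```

Both functions accept any iterable of lines, for example an open file handle on `output_dir/abundances.tsv`.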
You can also add the `--push` flag to obtain hard, 1-to-1 assignments between reads and transcripts.
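To illustrate what a hard, 1-to-1 assignment looks like, the sketch below picks the transcript with the largest fraction for each read from already-parsed soft assignments. Whether `--push` uses exactly this rule is an assumption, so treat it as an illustration of the idea rather than a description of the flag. The function name is hypothetical.

```python
def hard_assignments(soft):
    """Pick the transcript with the largest fraction for every read.

    soft: dict mapping read_id -> {transcript_id: read_fraction}.
    Reads without any candidate transcript are left out. Ties go to the
    transcript that appears first in the inner dictionary.
    """
    return {
        read: max(fracs, key=fracs.get)
        for read, fracs in soft.items()
        if fracs
    }
```

For the example row in `assignments.tsv`, this keeps `ENST00000559884`, which carries about 95% of the read.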
### References ###
> Ji, H. J. and M. Pertea (2024). "Enhancing transcriptome expression quantification through accurate assignment of long RNA sequencing reads with TranSigner." bioRxiv: 2024.04.13.589356. [doi:10.1101/2024.04.13.589356]
> Pertea, G., & Pertea, M. (2020). GFF utilities: GffRead and GffCompare [version 2; peer review: 3 approved].
> *F1000Research*, **9**:304. [doi:10.12688/f1000research.23297.2]
> Danecek, P., Bonfield, J. K., Liddle, J., Marshall, J., Ohan, V., Pollard, M. O., ... & Li, H. (2021). Twelve years of SAMtools and BCFtools. *GigaScience*, **10(2)**, giab008. [doi:10.1093/gigascience/giab008]
> Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences.
> *Bioinformatics*, **34**:3094-3100. [doi:10.1093/bioinformatics/bty191]
> Li, H. (2021). New strategies to improve minimap2 alignment accuracy.
> *Bioinformatics*, **37**:4572-4574. [doi:10.1093/bioinformatics/btab705]
> Du, L., Liu, Q., Fan, Z., Tang, J., Zhang, X., Price, M., Yue, B., & Zhao, K. (2021). Pyfastx: a robust Python package for fast random access to sequences from plain and gzipped FASTA/Q files. *Briefings in Bioinformatics*, **22(4)**:bbaa368.
Raw data
{
"_id": null,
"home_page": "https://github.com/haydenji0731/transigner",
"name": "transigner",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": null,
"author": "Hyun Joo Ji",
"author_email": "hji20@jh.edu",
"download_url": "https://files.pythonhosted.org/packages/d0/95/f89ee64d7473124757edf7df4133fa21ade44adde905fc28c7401ab34018/transigner-1.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Long read-to-transcript assignment creator",
"version": "1.1.1",
"project_urls": {
"Homepage": "https://github.com/haydenji0731/transigner"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "c707a707b7cdf3fd0bb070a8aff41c960b1eca9976537425172a590dc878e9ba",
"md5": "85f9cc7b279086d254f7e5f48a97ab47",
"sha256": "a6ce13bed12c53c9f7f8660c1f9262e99122f8cf9a2eb2dcb768fa01c1903850"
},
"downloads": -1,
"filename": "transigner-1.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "85f9cc7b279086d254f7e5f48a97ab47",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 11019,
"upload_time": "2024-08-16T22:53:13",
"upload_time_iso_8601": "2024-08-16T22:53:13.002110Z",
"url": "https://files.pythonhosted.org/packages/c7/07/a707b7cdf3fd0bb070a8aff41c960b1eca9976537425172a590dc878e9ba/transigner-1.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d095f89ee64d7473124757edf7df4133fa21ade44adde905fc28c7401ab34018",
"md5": "cb793dd94051f3da63ee7ae0d42e1a93",
"sha256": "afadaed2e0ea37491134d93cafffe7a12846e169acc40a6134adb3fd8dd04ec0"
},
"downloads": -1,
"filename": "transigner-1.1.1.tar.gz",
"has_sig": false,
"md5_digest": "cb793dd94051f3da63ee7ae0d42e1a93",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 11262,
"upload_time": "2024-08-16T22:53:14",
"upload_time_iso_8601": "2024-08-16T22:53:14.287465Z",
"url": "https://files.pythonhosted.org/packages/d0/95/f89ee64d7473124757edf7df4133fa21ade44adde905fc28c7401ab34018/transigner-1.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-16 22:53:14",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "haydenji0731",
"github_project": "transigner",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "transigner"
}