transigner


- Name: transigner
- Version: 1.1.1
- Home page: https://github.com/haydenji0731/transigner
- Summary: Long read-to-transcript assignment creator
- Upload time: 2024-08-16 22:53:14
- Author: Hyun Joo Ji
- Requires Python: >=3.8

# TranSigner: assigning long RNA-seq reads to transcripts

TranSigner is a Python program that assigns long RNA-seq reads (from either ONT or PacBio platforms) to transcripts. It takes a set of reads and an assembled transcriptome as input. The transcriptome must be available in both FASTA and GTF/GFF formats. If you only have it in GTF/GFF format, you can extract the transcript sequences with [gffread](https://github.com/gpertea/gffread) as follows:
```
gffread -w transcripts.fa -g genome.fa transcripts.gtf
```
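Since both files describe the same transcriptome, it can be worth confirming that they agree before aligning. The sketch below is not part of TranSigner; it assumes a GTF file whose ninth column carries `transcript_id "..."` attributes (GFF3 uses a different attribute syntax) and a FASTA file whose record names are the transcript IDs, which is how gffread normally names its `-w` output. The function names are hypothetical.

```python
import re

# Matches the GTF attribute: transcript_id "SOME_ID"
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def fasta_ids(lines):
    """Collect record names (first word after '>') from FASTA lines."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and len(line.strip()) > 1
    }


def gtf_transcript_ids(lines):
    """Collect transcript_id values from the attribute column of GTF lines."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue  # comment or directive line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def compare_ids(fasta_lines, gtf_lines):
    """Return (IDs only in the FASTA, IDs only in the GTF)."""
    in_fasta = fasta_ids(fasta_lines)
    in_gtf = gtf_transcript_ids(gtf_lines)
    return in_fasta - in_gtf, in_gtf - in_fasta
```

To use it on real files, pass open file handles, for example `compare_ids(open("transcripts.fa"), open("transcripts.gtf"))`; two empty sets mean the IDs match.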
### Installation ###

You can install TranSigner with pip by running:
```
pip install transigner
```
or from source:
```
git clone https://github.com/haydenji0731/transigner transigner
cd transigner
python setup.py install
```
You also need [minimap2](https://github.com/lh3/minimap2) and [samtools](http://www.htslib.org/) installed. Precompiled binaries work too, as long as they are on your PATH. If you run into trouble during installation, please open a GitHub issue.
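
A quick way to confirm that both external tools are visible on your PATH is sketched below. It uses only the Python standard library; the `missing_tools` helper is a hypothetical name and is not part of TranSigner.

```python
import shutil


def missing_tools(tools=("minimap2", "samtools")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("minimap2 and samtools are both available.")
```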

### Usage ###

TranSigner consists of three modules: align, prefilter, and em (short for expectation-maximization). You can check the arguments for each module by running:
```
transigner [module] -h
```
The `align` module takes a set of reads in a FASTQ file and aligns them to a transcriptome provided as a FASTA file:
```
transigner align -q reads.fastq -t transcripts.fa -d output_dir -o alignment.bam -p threads
```
Next, the `prefilter` module processes the alignment results to compute compatibility scores between reads and transcripts and filters out alignments with questionable 5' and/or 3' end positions. The recommended filter thresholds differ by read type:
```
transigner prefilter -a alignment.bam -t transcripts.fa -o output_dir --filter -tp -1 # noisy ONT direct RNA reads
transigner prefilter -a alignment.bam -t transcripts.fa -o output_dir --filter -tp -500 -fp -600 # ONT cDNA or PacBio IsoSeq
```
Finally, the `em` module runs an expectation-maximization algorithm as follows:
```
transigner em -s output_dir/scores.tsv -i output_dir/ti.pkl -o output_dir --drop --use-score
```
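If you run the three modules regularly, a small wrapper can keep the arguments consistent between steps. The sketch below only assembles the commands shown above and runs them in order; the function names and the read-type keys (`ont_drna`, `ont_cdna_or_pacbio`) are hypothetical, and file paths are passed through exactly as in the examples, without any assumption about where TranSigner writes its outputs.

```python
import subprocess

# Prefilter options per read type, copied from the commands above.
# The dictionary keys are labels made up for this sketch.
PREFILTER_OPTS = {
    "ont_drna": ["--filter", "-tp", "-1"],
    "ont_cdna_or_pacbio": ["--filter", "-tp", "-500", "-fp", "-600"],
}


def build_commands(reads, transcripts, out_dir, read_type,
                   bam="alignment.bam", threads=1):
    """Return the align, prefilter and em commands as argument lists."""
    return [
        ["transigner", "align", "-q", reads, "-t", transcripts,
         "-d", out_dir, "-o", bam, "-p", str(threads)],
        ["transigner", "prefilter", "-a", bam, "-t", transcripts,
         "-o", out_dir, *PREFILTER_OPTS[read_type]],
        ["transigner", "em", "-s", f"{out_dir}/scores.tsv",
         "-i", f"{out_dir}/ti.pkl", "-o", out_dir, "--drop", "--use-score"],
    ]


def run_pipeline(commands):
    """Run each command in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

For example, `run_pipeline(build_commands("reads.fastq", "transcripts.fa", "output_dir", "ont_drna", threads=8))` would issue the same three calls as the commands above with the direct RNA thresholds.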
After running all three modules, you'll obtain the `abundances.tsv` and `assignments.tsv` files. Their columns, each with an example row, are shown below:
```
# abundances.tsv
transcript_id  read_count  relative_abundance
NR_024540.1	1.7149614376942495e-28	1.6757869165652874e-21

# assignments.tsv
read_id  (transcript_id_1, read_fraction_1)  (transcript_id_2, read_fraction_2)  (transcript_id_3, read_fraction_3) ...
57ebcba2-cde0-4096-b9b9-9bdc4306cb6c    (ENST00000559163, 6.640795584395568e-07)        (ENST00000559884, 0.9507564850356304)   (ENST00000354296, 0.04924285088481117)
```
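A minimal sketch for loading `abundances.tsv` in Python follows. It assumes the three columns are tab-separated, as in the example row above, and it skips any line that does not parse as `id, number, number`, so it works whether or not the file starts with a header line. The `read_abundances` name is hypothetical.

```python
import csv
import io


def read_abundances(handle):
    """Map transcript_id -> (read_count, relative_abundance)."""
    abundances = {}
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) != 3:
            continue  # blank line, comment, or unexpected layout
        transcript_id, read_count, relative_abundance = fields
        try:
            abundances[transcript_id] = (float(read_count),
                                         float(relative_abundance))
        except ValueError:
            continue  # header row or non-numeric values
    return abundances


# Example with the row shown above, read from an in-memory file.
example = io.StringIO(
    "NR_024540.1\t1.7149614376942495e-28\t1.6757869165652874e-21\n"
)
print(read_abundances(example))
```

With a real file, pass an open handle instead: `read_abundances(open("output_dir/abundances.tsv", newline=""))`.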
You can also add the `--push` flag to obtain hard, 1-to-1 assignments between reads and transcripts.
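
To work with the soft assignments directly, each line of `assignments.tsv` can be split into a read ID and a list of `(transcript_id, read_fraction)` pairs. The sketch below assumes the layout of the example row above and that transcript IDs contain no commas or parentheses; the function names are hypothetical. Picking the transcript with the largest fraction is shown only as an illustration of a hard assignment and is not necessarily the rule that `--push` applies.

```python
import re

# One "(transcript_id, read_fraction)" pair.
PAIR = re.compile(r"\(([^,()]+),\s*([^()]+)\)")


def parse_assignment(line):
    """Split one assignments.tsv line into (read_id, [(transcript_id, fraction), ...])."""
    read_id = line.split(None, 1)[0]
    pairs = [(tid.strip(), float(frac)) for tid, frac in PAIR.findall(line)]
    return read_id, pairs


def best_transcript(pairs):
    """Return the transcript ID with the largest read fraction."""
    return max(pairs, key=lambda pair: pair[1])[0]


line = (
    "57ebcba2-cde0-4096-b9b9-9bdc4306cb6c    "
    "(ENST00000559163, 6.640795584395568e-07)        "
    "(ENST00000559884, 0.9507564850356304)   "
    "(ENST00000354296, 0.04924285088481117)"
)
read_id, pairs = parse_assignment(line)
print(read_id, best_transcript(pairs))
```
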
### References ###
> Ji, H. J., & Pertea, M. (2024). Enhancing transcriptome expression quantification through accurate assignment of long RNA sequencing reads with TranSigner. *bioRxiv*, 2024.04.13.589356. [doi:10.1101/2024.04.13.589356]

> Pertea, G., & Pertea, M. (2020). GFF utilities: GffRead and GffCompare [version 2; peer review: 3 approved].
> *F1000Research*, **9**:304. [doi:10.12688/f1000research.23297.2]

> Danecek, P., Bonfield, J. K., Liddle, J., Marshall, J., Ohan, V., Pollard, M. O., ... & Li, H. (2021). Twelve years of SAMtools and BCFtools. *GigaScience*, **10(2)**, giab008. [doi:10.1093/gigascience/giab008]

> Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences.
> *Bioinformatics*, **34**:3094-3100. [doi:10.1093/bioinformatics/bty191]

> Li, H. (2021). New strategies to improve minimap2 alignment accuracy.
> *Bioinformatics*, **37**:4572-4574. [doi:10.1093/bioinformatics/btab705]

> Du, L., Liu, Q., Fan, Z., Tang, J., Zhang, X., Price, M., Yue, B., & Zhao, K. (2021). Pyfastx: a robust Python package for fast random access to sequences from plain and gzipped FASTA/Q files. *Briefings in Bioinformatics*, **22(4)**, bbaa368.

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/haydenji0731/transigner",
    "name": "transigner",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": null,
    "author": "Hyun Joo Ji",
    "author_email": "hji20@jh.edu",
    "download_url": "https://files.pythonhosted.org/packages/d0/95/f89ee64d7473124757edf7df4133fa21ade44adde905fc28c7401ab34018/transigner-1.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Long read-to-transcript assignment creator",
    "version": "1.1.1",
    "project_urls": {
        "Homepage": "https://github.com/haydenji0731/transigner"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c707a707b7cdf3fd0bb070a8aff41c960b1eca9976537425172a590dc878e9ba",
                "md5": "85f9cc7b279086d254f7e5f48a97ab47",
                "sha256": "a6ce13bed12c53c9f7f8660c1f9262e99122f8cf9a2eb2dcb768fa01c1903850"
            },
            "downloads": -1,
            "filename": "transigner-1.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "85f9cc7b279086d254f7e5f48a97ab47",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 11019,
            "upload_time": "2024-08-16T22:53:13",
            "upload_time_iso_8601": "2024-08-16T22:53:13.002110Z",
            "url": "https://files.pythonhosted.org/packages/c7/07/a707b7cdf3fd0bb070a8aff41c960b1eca9976537425172a590dc878e9ba/transigner-1.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d095f89ee64d7473124757edf7df4133fa21ade44adde905fc28c7401ab34018",
                "md5": "cb793dd94051f3da63ee7ae0d42e1a93",
                "sha256": "afadaed2e0ea37491134d93cafffe7a12846e169acc40a6134adb3fd8dd04ec0"
            },
            "downloads": -1,
            "filename": "transigner-1.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "cb793dd94051f3da63ee7ae0d42e1a93",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 11262,
            "upload_time": "2024-08-16T22:53:14",
            "upload_time_iso_8601": "2024-08-16T22:53:14.287465Z",
            "url": "https://files.pythonhosted.org/packages/d0/95/f89ee64d7473124757edf7df4133fa21ade44adde905fc28c7401ab34018/transigner-1.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-16 22:53:14",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "haydenji0731",
    "github_project": "transigner",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "transigner"
}
        