uLTRA
===========
[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/ultra_bioinformatics/README.html) [![Build Status](https://travis-ci.org/ksahlin/uLTRA.svg?branch=master)](https://travis-ci.org/ksahlin/uLTRA)
uLTRA is a tool for splice alignment of long transcriptomic reads to a genome, guided by a database of exon annotations. uLTRA is particularly accurate when aligning to small exons ([see some examples](https://github.com/ksahlin/ultra/tree/master/data/images)).
uLTRA is distributed as a Python package supported on Linux and macOS with Python 3.4 or above.
Table of Contents
=================
* [INSTALLATION](#INSTALLATION)
* [USAGE](#USAGE)
* [Indexing](#Indexing)
* [Aligning](#Aligning)
* [Output](#Output)
* [Parameters](#Parameters)
* [CREDITS](#CREDITS)
* [LICENCE](#LICENCE)
INSTALLATION
=================
## Conda recipe
There is a [bioconda recipe](https://bioconda.github.io/recipes/ultra_bioinformatics/README.html), a [Docker image](https://quay.io/repository/biocontainers/ultra_bioinformatics?tab=tags), and a [Singularity container](https://depot.galaxyproject.org/singularity/ultra_bioinformatics%3A0.0.4--pyh5e36f6f_1) of uLTRA v0.0.4 created by [sguizard](https://github.com/sguizard). The bioconda recipe, for example, gives an easy automated installation.
If a newer version of uLTRA is not available through bioconda, or you want more control over your installation, alternative installation methods are provided below. The current version of uLTRA is 0.1 (see changelog at end of this readme).
## Using the INSTALL.sh script
Clone this repository and run the script `INSTALL.sh` as follows:
```
git clone https://github.com/ksahlin/uLTRA.git --depth 1
cd uLTRA
./INSTALL.sh <install_directory>
```
The install script is tested in a bash environment.
To run uLTRA, you need to activate the conda environment "ultra":
```
conda activate ultra
```
## Without the INSTALL.sh script
You can also perform the steps below manually for more control.
#### 1. Create conda environment
Create a conda environment called ultra and activate it:
```
conda create -n ultra python=3 pip
conda activate ultra
```
#### 2. Install uLTRA
```
pip install ultra-bioinformatics
```
#### 3. Install third party tools
Install [namfinder](https://github.com/ksahlin/namfinder) and [minimap2](https://github.com/lh3/minimap2), and
place the generated binaries `namfinder` and `minimap2` on your `PATH`.
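A quick way to confirm that both binaries are visible to your shell is sketched below. The helper name `check_tools` is hypothetical and not part of uLTRA; it only relies on the standard `command -v` builtin.

```shell
# Report which of the given executables can be found on PATH.
# Returns 0 when all are found, 1 if at least one is missing.
# (check_tools is an illustrative helper, not shipped with uLTRA.)
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# The two third-party binaries named in this step.
check_tools namfinder minimap2 || echo "Add the missing binaries to your PATH before running uLTRA."
```

If a tool is reported as missing, add the directory that contains it to `PATH` in your shell startup file and open a new shell.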
#### 4. Verify installation
You should now have `uLTRA` installed. Try it:
```
uLTRA --help
```
Each time you start a new session on your server or computer, activate the conda environment "ultra" before running uLTRA:
```
conda activate ultra
```
You can also download and use test data available in this repository [here](https://github.com/ksahlin/ultra/tree/master/test) and run:
```
uLTRA pipeline /your/full/path/to/test/SIRV_genes.fasta \
/your/full/path/to/test/SIRV_genes_C_170612a.gtf \
/your/full/path/to/test/reads.fa outfolder/ [optional parameters]
```
## Entirely from source
Make sure the dependencies listed below are installed (each entry links to its installation instructions). All of them except `namfinder` can be installed with `pip install X` or through conda.
* [parasail](https://github.com/jeffdaily/parasail-python)
* [edlib](https://github.com/Martinsos/edlib)
* [pysam](http://pysam.readthedocs.io/en/latest/installation.html) (>= v0.11)
* [dill](https://pypi.org/project/dill/)
* [intervaltree](https://github.com/chaimleib/intervaltree/tree/master/intervaltree)
* [gffutils](https://pythonhosted.org/gffutils/)
* [namfinder](https://github.com/ksahlin/namfinder)
With these dependencies installed, run:
```sh
git clone https://github.com/ksahlin/uLTRA.git
cd uLTRA
./uLTRA
```
USAGE
----------------
uLTRA can be used with either PacBio Iso-Seq or ONT cDNA/dRNA reads.
### Indexing
```
uLTRA index genome.fasta /full/path/to/annotation.gtf outfolder/ [parameters]
```
Important parameters:
1. `--disable_infer` can speed up indexing considerably, but it only works if your GTF file contains `gene` and `transcript` features.
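To see whether an annotation is a candidate for `--disable_infer`, you can look at the feature column (column 3) of the GTF. The following is a minimal sketch: the function name `gtf_has_gene_and_transcript` and the path `annotation.gtf` are placeholders, and the check only confirms that at least one `gene` and one `transcript` record exist, not that every gene has them.

```shell
# Succeed only if column 3 of the GTF contains both "gene" and "transcript" records.
# (Illustrative helper; not part of uLTRA.)
gtf_has_gene_and_transcript() {
    awk -F '\t' '
        /^#/ { next }                      # skip comment lines
        $3 == "gene"       { gene = 1 }
        $3 == "transcript" { transcript = 1 }
        END { exit !(gene && transcript) } # exit 0 only when both were seen
    ' "$1"
}

gtf=annotation.gtf   # placeholder path, replace with your own file
if [ -f "$gtf" ]; then
    if gtf_has_gene_and_transcript "$gtf"; then
        echo "gene and transcript features found; --disable_infer may be usable"
    else
        echo "gene or transcript features missing; index without --disable_infer"
    fi
fi
```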
### Aligning
For example:
```
uLTRA align genome.fasta reads.[fa/fq] outfolder/ --ont --t 8 # ONT cDNA reads using 8 cores
uLTRA align genome.fasta reads.[fa/fq] outfolder/ --isoseq --t 8 # PacBio isoseq reads
```
Important parameters:
1. `--index [PATH]`: Sets a custom location to read the index from. By default, uLTRA tries to read the index from `outfolder/`.
2. `--prefix [PREFIX OF FILE]`: The aligned reads will be written to `outfolder/reads.sam` unless `--prefix` is set. For example, `--prefix sample_X` will output the reads in `outfolder/sample_X.sam`.
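When several read files share one `outfolder/`, giving each run its own `--prefix` keeps the SAM files from overwriting each other. The sketch below only prints the commands (a dry run) so you can inspect them first. The function name `print_align_commands` and the sample file names are hypothetical, and `--ont --t 8` is simply copied from the example above.

```shell
# Print (do not run) one `uLTRA align` command per read file, using the
# file's base name without extension as --prefix.
# (Illustrative helper; not part of uLTRA.)
print_align_commands() {
    local genome="$1"
    local outfolder="$2"
    shift 2
    local reads prefix
    for reads in "$@"; do
        prefix=$(basename "$reads")
        prefix=${prefix%.*}    # strip the final extension (.fa, .fq, ...)
        echo "uLTRA align $genome $reads $outfolder --ont --t 8 --prefix $prefix # writes ${outfolder%/}/$prefix.sam"
    done
}

print_align_commands genome.fasta outfolder/ sample_A.fq sample_B.fq
```

Once the printed commands look right, they can be run one by one or piped to `sh`.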
### Pipeline
Perform all the steps in one command:
```
uLTRA pipeline genome.fasta /full/path/to/annotation.gtf reads.fa outfolder/ [parameters]
```
### Common errors
Not having a properly formatted GTF file. uLTRA requires a _properly formatted GTF file_. If you have a GFF file or another annotation format, it is advised to use [AGAT](https://github.com/NBISweden/AGAT) to convert it to GTF, as many other conversion tools do not respect the GTF format. For example, you can run AGAT as:
```
agat_convert_sp_gff2gtf.pl --gff annot.gff3 --gtf annot.gtf
```
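As a first sanity check before indexing, you can verify that every record has the nine tab-separated fields that GTF requires. This is only a rough sketch: the function name `gtf_check_columns` is hypothetical, and passing this check does not guarantee that the file is fully valid GTF (attribute syntax, coordinates, and feature hierarchy are not inspected).

```shell
# Print the line numbers of non-comment records that do not have exactly
# 9 tab-separated fields; return non-zero if any are found.
# (Illustrative helper; not part of uLTRA or AGAT.)
gtf_check_columns() {
    awk -F '\t' '
        /^#/ || /^$/ { next }    # ignore comments and blank lines
        NF != 9 { printf "line %d: %d fields\n", NR, NF; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

gtf=annot.gtf   # output file name from the AGAT example above
if [ -f "$gtf" ]; then
    if gtf_check_columns "$gtf"; then
        echo "$gtf: all records have 9 tab-separated fields"
    else
        echo "$gtf: malformed records found (see lines listed above)"
    fi
fi
```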
CREDITS
----------------
Please cite
1. Kristoffer Sahlin, Veli Mäkinen, Accurate spliced alignment of long RNA sequencing reads, Bioinformatics, Volume 37, Issue 24, 15 December 2021, Pages 4643–4651, https://doi.org/10.1093/bioinformatics/btab540
when using uLTRA. **Please also cite** [minimap2](https://github.com/lh3/minimap2), as uLTRA incorporates minimap2 for alignment of some genomic reads outside indexed regions. For example: "We aligned reads to the genome using uLTRA [1], which incorporates minimap2 [CIT]."
LICENCE
----------------
GPL v3.0, see [LICENSE.txt](https://github.com/ksahlin/uLTRA/blob/master/LICENCE.txt).
Raw data
{
"_id": null,
"home_page": "https://github.com/ksahlin/uLTRA",
"name": "ultra-bioinformatics",
"maintainer": "",
"docs_url": null,
"requires_python": "!=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, <4",
"maintainer_email": "",
"keywords": "Oxford Nanopore transcript long read error correction",
"author": "Kristoffer Sahlin",
"author_email": "ksahlin@math.su.se",
"download_url": "https://files.pythonhosted.org/packages/e2/3a/fe1e24c89ea48d45c10418466527979d68bd6383c749ea4dc33e0fab18dd/ultra_bioinformatics-0.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "Splice aligner of long transcriptomic reads to genome.",
"version": "0.1",
"project_urls": {
"Homepage": "https://github.com/ksahlin/uLTRA"
},
"split_keywords": [
"oxford",
"nanopore",
"transcript",
"long",
"read",
"error",
"correction"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "e23afe1e24c89ea48d45c10418466527979d68bd6383c749ea4dc33e0fab18dd",
"md5": "ae0b4f5508701c965f48b6c78804607c",
"sha256": "c477b526cba52fcaf369efb67f0d0096ea1702d7adc8457746a91ae5f750a0ef"
},
"downloads": -1,
"filename": "ultra_bioinformatics-0.1.tar.gz",
"has_sig": false,
"md5_digest": "ae0b4f5508701c965f48b6c78804607c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "!=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, <4",
"size": 54693,
"upload_time": "2023-05-17T12:26:48",
"upload_time_iso_8601": "2023-05-17T12:26:48.877852Z",
"url": "https://files.pythonhosted.org/packages/e2/3a/fe1e24c89ea48d45c10418466527979d68bd6383c749ea4dc33e0fab18dd/ultra_bioinformatics-0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-05-17 12:26:48",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "ksahlin",
"github_project": "uLTRA",
"travis_ci": true,
"coveralls": false,
"github_actions": false,
"lcname": "ultra-bioinformatics"
}