ultra-bioinformatics

* Name: ultra-bioinformatics
* Version: 0.1
* Home page: https://github.com/ksahlin/uLTRA
* Summary: Splice aligner of long transcriptomic reads to genome.
* Upload time: 2023-05-17 12:26:48
* Author: Kristoffer Sahlin
* Requires Python: !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, <4
* Keywords: oxford nanopore transcript long read error correction

uLTRA
===========
[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/ultra_bioinformatics/README.html) [![Build Status](https://travis-ci.org/ksahlin/uLTRA.svg?branch=master)](https://travis-ci.org/ksahlin/uLTRA)


uLTRA is a tool for splice alignment of long transcriptomic reads to a genome, guided by a database of exon annotations. uLTRA is particularly accurate when aligning to small exons ([see some examples](https://github.com/ksahlin/ultra/tree/master/data/images)).

uLTRA is distributed as a Python package and is supported on Linux and OSX with Python 3.4 or above.


Table of Contents
=================

  * [INSTALLATION](#INSTALLATION)
  * [USAGE](#USAGE)
    * [Indexing](#Indexing)
    * [Aligning](#Aligning)
    * [Pipeline](#Pipeline)
    * [Common errors](#Common-errors)
  * [CREDITS](#CREDITS)
  * [LICENCE](#LICENCE)



INSTALLATION
=================

## Conda recipe

There is a [bioconda recipe](https://bioconda.github.io/recipes/ultra_bioinformatics/README.html), [docker image](https://quay.io/repository/biocontainers/ultra_bioinformatics?tab=tags), and a [singularity container](https://depot.galaxyproject.org/singularity/ultra_bioinformatics%3A0.0.4--pyh5e36f6f_1) of uLTRA v0.0.4 created by [sguizard](https://github.com/sguizard). You can use, e.g., the bioconda recipe for an easy automated installation. 

If a newer version of uLTRA is not available through bioconda (or you simply want more control over your installation), alternative installation methods are provided below. The current version of uLTRA is 0.1 (see changelog at end of this readme).

## Using the INSTALL.sh script

You can clone this repository and run the script `INSTALL.sh` as follows:

```
git clone https://github.com/ksahlin/uLTRA.git --depth 1
cd uLTRA
./INSTALL.sh <install_directory>
```

The install script has been tested in a bash environment.

To run uLTRA, you need to activate the conda environment "ultra":

```
conda activate ultra
```

## Without the INSTALL.sh script

You can also perform the steps below manually for more control.

#### 1. Create conda environment

Create a conda environment called `ultra` and activate it:

```
conda create -n ultra python=3 pip 
conda activate ultra
```

#### 2. Install uLTRA 

```
pip install ultra-bioinformatics
```

#### 3. Install third party tools 

Install [namfinder](https://github.com/ksahlin/namfinder) and [minimap2](https://github.com/lh3/minimap2), and
place the generated binaries `namfinder` and `minimap2` in a directory on your `PATH`.
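
A quick way to confirm that both binaries are visible to your shell is a small check like the one below. This is only a sketch and not part of uLTRA; the function name `check_tools` is made up for illustration.

```sh
# Hypothetical helper (not part of uLTRA): report which tools are missing from PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Check the two third-party binaries named above.
check_tools namfinder minimap2 || echo "Place the missing binaries on your PATH before running uLTRA."
```

The function returns a non-zero status if any tool is missing, so it can also be used as a guard at the top of a wrapper script.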

#### 4. Verify installation

You should now have `uLTRA` installed. Try it:

```
uLTRA --help
```

Each time you start or log in to your server or computer, you need to activate the conda environment "ultra" before running uLTRA:
```
conda activate ultra
```

You can also download and use test data available in this repository [here](https://github.com/ksahlin/ultra/tree/master/test) and run: 

```
uLTRA pipeline /your/full/path/to/test/SIRV_genes.fasta  \
               /your/full/path/to/test/SIRV_genes_C_170612a.gtf  \
               /your/full/path/to/test/reads.fa outfolder/  [optional parameters]
```



## Entirely from source


Make sure the dependencies listed below are installed (follow the links for installation instructions). All of them except `namfinder` can be installed with `pip install X` or through conda.
* [parasail](https://github.com/jeffdaily/parasail-python)
* [edlib](https://github.com/Martinsos/edlib)
* [pysam](http://pysam.readthedocs.io/en/latest/installation.html) (>= v0.11)
* [dill](https://pypi.org/project/dill/)
* [intervaltree](https://github.com/chaimleib/intervaltree/tree/master/intervaltree)
* [gffutils](https://pythonhosted.org/gffutils/)
* [namfinder](https://github.com/ksahlin/namfinder)

With these dependencies installed, run:

```sh
git clone https://github.com/ksahlin/uLTRA.git
cd uLTRA
./uLTRA
```


USAGE
----------------

uLTRA can be used with either PacBio Iso-Seq or ONT cDNA/dRNA reads. 

### Indexing

```
uLTRA index genome.fasta  /full/path/to/annotation.gtf  outfolder/  [parameters]
```

Important parameters: 

1. `--disable_infer` can speed up indexing considerably, but it only works if your GTF file contains `gene` and `transcript` features.
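
To see whether an annotation contains these features before choosing `--disable_infer`, you can inspect the third column of the GTF file. The sketch below is not part of uLTRA; the function names and the `annotation.gtf` path are placeholders chosen for illustration.

```sh
# Hypothetical helper (not part of uLTRA): count how often each feature type
# (column 3 of a GTF file) occurs, skipping comment lines.
gtf_feature_counts() {
    awk -F '\t' '!/^#/ && NF >= 3 { n[$3]++ } END { for (f in n) print f, n[f] }' "$1" | sort
}

# Succeeds only if the file has at least one "gene" and one "transcript" feature.
has_gene_and_transcript() {
    awk -F '\t' '!/^#/ && $3 == "gene" { g = 1 }
                 !/^#/ && $3 == "transcript" { t = 1 }
                 END { exit !(g && t) }' "$1"
}

# Example with a placeholder path; replace annotation.gtf with your own file.
if [ -f annotation.gtf ]; then
    gtf_feature_counts annotation.gtf
    if has_gene_and_transcript annotation.gtf; then
        echo "gene and transcript features present"
    else
        echo "gene or transcript features missing"
    fi
fi
```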

### Aligning

For example:

```
uLTRA align genome.fasta reads.[fa/fq] outfolder/  --ont --t 8   # ONT cDNA reads using 8 cores
uLTRA align genome.fasta reads.[fa/fq] outfolder/  --isoseq --t 8 # PacBio isoseq reads
```

Important parameters:

1. `--index [PATH]`: Sets a custom location to read the index from. If it is not set, uLTRA will try to read the index from `outfolder/`.
2. `--prefix [PREFIX OF FILE]`: The aligned reads will be written to `outfolder/reads.sam` unless `--prefix` is set. For example, `--prefix sample_X` will output the reads in `outfolder/sample_X.sam`.
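
When several read files share one index, `--index` and `--prefix` can be combined so that each sample gets its own SAM file in the same output folder. The sketch below only prints the commands instead of running them. The function name `print_align_cmds` and all paths are hypothetical, and only the flags shown above are used.

```sh
# Hypothetical helper (not part of uLTRA): print one align command per read file.
# Arguments: genome, index location, output folder, then any number of read files.
print_align_cmds() {
    genome=$1; index=$2; out=$3
    shift 3
    for reads in "$@"; do
        # Derive the prefix from the file name, e.g. data/sample_X.fq -> sample_X
        prefix=$(basename "$reads")
        prefix=${prefix%.*}
        echo "uLTRA align $genome $reads $out --ont --t 8 --index $index --prefix $prefix"
    done
}

# Placeholder paths; inspect the printed commands before running them.
print_align_cmds genome.fasta index_folder/ outfolder/ sample_A.fq sample_B.fq
```

With these placeholder inputs, the two printed commands use `--prefix sample_A` and `--prefix sample_B`, so the alignments would end up in `outfolder/sample_A.sam` and `outfolder/sample_B.sam`.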

### Pipeline

Perform all the steps in one command:

```
uLTRA pipeline genome.fasta /full/path/to/annotation.gtf reads.fa outfolder/  [parameters]
```
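
After a run, a quick look at the output SAM file can confirm that alignments were written. The sketch below counts alignment records, that is, all lines that are not SAM header lines. It is not part of uLTRA, the function name is made up, and it assumes the pipeline writes to `outfolder/reads.sam`, the default name described for `uLTRA align`.

```sh
# Hypothetical helper (not part of uLTRA): count alignment records in a SAM file,
# i.e. all lines that are not header lines (header lines start with "@").
count_sam_records() {
    awk '!/^@/ { n++ } END { print n + 0 }' "$1"
}

# Assumed default output name; adjust the path if you used --prefix.
sam=outfolder/reads.sam
if [ -f "$sam" ]; then
    echo "$(count_sam_records "$sam") alignment records in $sam"
else
    echo "$sam not found"
fi
```

Note that this counts every record in the file, whether or not the read is mapped.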

### Common errors

A common error is not having a properly formatted GTF file. Note that uLTRA requires a _properly formatted GTF file_. If you have a GFF file or another annotation format, it is advised to use [AGAT](https://github.com/NBISweden/AGAT) to convert it to GTF, as many other conversion tools do not respect the GTF format. For example, you can run AGAT as:

```
agat_convert_sp_gff2gtf.pl --gff annot.gff3 --gtf annot.gtf
```
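
A rough sanity check can catch obvious format problems in an annotation file before indexing. The sketch below flags lines that do not have nine tab-separated fields or that lack a `gene_id "..."` attribute in the last field, as GFF3-style `key=value` attributes would. It is a heuristic only, is not part of uLTRA or AGAT, and passing it does not guarantee that uLTRA accepts the file. The function name is hypothetical.

```sh
# Hypothetical helper (not part of uLTRA or AGAT): print up to five lines that do
# not look like GTF, i.e. lack 9 tab-separated fields or a gene_id "..." attribute.
gtf_suspicious_lines() {
    awk -F '\t' '
        /^#/ || /^$/ { next }                      # skip comments and blank lines
        NF != 9 || $9 !~ /gene_id "[^"]+"/ {       # wrong field count or non-GTF attributes
            print FILENAME ":" NR ": " $0
            if (++bad >= 5) exit
        }' "$1"
}

# Example with the converted file from the command above, if it exists.
if [ -f annot.gtf ]; then
    gtf_suspicious_lines annot.gtf
fi
```

No output means that none of the checked lines looked suspicious under these two criteria.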



CREDITS
----------------

Please cite 

1. Kristoffer Sahlin, Veli Mäkinen, Accurate spliced alignment of long RNA sequencing reads, Bioinformatics, Volume 37, Issue 24, 15 December 2021, Pages 4643–4651, https://doi.org/10.1093/bioinformatics/btab540

when using uLTRA. **Please also cite** [minimap2](https://github.com/lh3/minimap2), as uLTRA incorporates minimap2 for alignment of some genomic reads outside indexed regions. For example: "We aligned reads to the genome using uLTRA [1], which incorporates minimap2 [CIT]."




LICENCE
----------------

GPL v3.0, see [LICENSE.txt](https://github.com/ksahlin/uLTRA/blob/master/LICENCE.txt).




