# isONform - Reference-free isoform reconstruction from long read sequencing data
# Table of contents
1. [Installation](#installation)
2. [Introduction](#introduction)
3. [Input data](#Input_data)
4. [Running isONform](#Running)
5. [Outputs](#Outputs)
6. [Contact](#Contact)
7. [Credits](#credits)
## Installation <a name="installation"></a>
### Via pip
```
pip install isONform
```
This command installs isONform's dependencies:
1. `networkx`
2. `ordered-set`
3. `matplotlib`
4. `parasail`
5. `edlib`
6. `pyinstrument`
7. `namedtuple`
8. `recordclass`
### From GitHub source
1. Create a new environment for isONform (Python 3.7 or later is required):<br />
`conda create -n isonform python=3.10 pip` <br />
`conda activate isonform` <br />
2. Install isONcorrect and SPOA <br />
`pip install isONcorrect` <br />
`conda install -c bioconda spoa` <br />
3. Install other dependencies of isONform:<br />
`conda install networkx`<br />
`pip install parasail`<br />
4. Clone this repository:<br />
`git clone https://github.com/aljpetri/isONform`
## Introduction <a name="introduction"></a>
isONform reconstructs isoforms from clustered and corrected long reads.
To do this, it builds a graph using the networkx API and simplifies it with strategies such as bubble popping and node merging.
The final isoforms are generated with spoa.<br />
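
To give a feel for what graph simplification means here, the following is a toy sketch of one common form of node merging: collapsing unbranched chains of nodes into a single node. It is not isONform's implementation. It uses plain Python instead of networkx, and the function name `merge_linear_nodes`, the node labels, and the `+` naming scheme are all assumptions made for illustration.

```python
from collections import Counter


def merge_linear_nodes(edges):
    """Collapse unbranched chains in a directed graph (illustrative only).

    `edges` is an iterable of (source, target) pairs. An edge (u, v) is
    merged into a single node "u+v" when u has exactly one outgoing edge
    and v has exactly one incoming edge. Returns the simplified edge set.
    """
    edges = set(edges)
    while True:
        out_deg = Counter(u for u, _ in edges)
        in_deg = Counter(v for _, v in edges)
        # Pick the first mergeable edge in sorted order (deterministic).
        candidate = next(
            ((u, v) for u, v in sorted(edges)
             if u != v and out_deg[u] == 1 and in_deg[v] == 1),
            None,
        )
        if candidate is None:
            return edges
        u, v = candidate
        merged = f"{u}+{v}"
        rewired = set()
        for a, b in edges:
            if (a, b) == (u, v):
                continue  # this edge disappears inside the merged node
            a = merged if a in (u, v) else a
            b = merged if b in (u, v) else b
            rewired.add((a, b))
        edges = rewired


# Hypothetical toy graph: two alternative paths from "s" to "t".
toy_edges = [("s", "a"), ("a", "b"), ("b", "c"), ("c", "t"),
             ("s", "x"), ("x", "t")]
print(sorted(merge_linear_nodes(toy_edges)))
# [('a+b+c', 't'), ('s', 'a+b+c'), ('s', 'x'), ('x', 't')]
```

The chain a, b, c collapses into one node, while the two alternative paths between `s` and `t` remain. Deciding whether such alternative paths represent distinct isoforms or sequencing noise is the job of a separate step such as bubble popping, which this sketch does not attempt.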
## Input data <a name="Input_data"></a>
The isON pipeline takes as input .fastq files produced by long-read sequencing (ONT or PacBio) from which the barcodes have been removed.
Please make sure that the data have been processed with [LIMA](https://lima.how/) (PacBio data) or [Pychopper](https://github.com/epi2me-labs/pychopper) (ONT data) before running the pipeline, so that all barcodes are removed from the reads.
## Running isONform <a name="Running"></a>
To run only the isONform algorithm:<br />
```
isONform_parallel --fastq_folder /path/to/input/files --t <nr_cores> --outfolder /path/to/outfolder --split_wrt_batches
```
Note: Always use absolute paths to files and folders.
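
When launching isONform from a script, the absolute-path requirement can be enforced programmatically. The sketch below builds the argument list for the command shown above, using only the flags that appear in this README. The helper name `build_isonform_command` and the example folder names are assumptions for illustration.

```python
import os


def build_isonform_command(fastq_folder, outfolder, cores):
    """Return the isONform_parallel argument list with absolute paths."""
    return [
        "isONform_parallel",
        "--fastq_folder", os.path.abspath(fastq_folder),
        "--t", str(cores),
        "--outfolder", os.path.abspath(outfolder),
        "--split_wrt_batches",
    ]


# Hypothetical folder names; relative paths are resolved against the
# current working directory before being handed to isONform.
cmd = build_isonform_command("corrected_reads", "isonform_out", cores=4)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once isONform is installed and on the `PATH`.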
The full isON pipeline (isONclust, isONcorrect, isONform) is available [here](https://github.com/aljpetri/isONform/blob/master/isON_pipeline.sh) and is run via:
```
./isON_pipeline.sh --raw_reads </absolute/path/to/raw_reads.fq> --outfolder <outfolder> --num_cores <num_cores> --isONform_folder <isONform_folder> --iso_abundance <iso_abundance> --mode <mode>
```
Note that this requires [isONclust](https://github.com/ksahlin/isONclust) and [isONcorrect](https://github.com/ksahlin/isONcorrect) to be installed in addition to isONform.
For more information about the arguments of the isON_pipeline script, run:
```
./isON_pipeline.sh --help
```
## Outputs <a name="Outputs"></a>
isONform outputs three main files: transcriptome.fasta, mapping.txt, and support.txt.
Each isoform that isONform reconstructs has an id of the form x_y_z.
'x' denotes the isONclust cluster that the isoform stems from.
As in isONcorrect, reads are clustered in batches of 1000 reads, so 'y' denotes the batch from which the isoform was reconstructed.
'z' is an identifier that makes the id of each reconstructed isoform unique.
mapping.txt indicates from which original reads each isoform was reconstructed.
support.txt gives the support of each isoform (i.e. how many original reads make up the isoform).
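
For downstream analysis it can be handy to split an isoform id into its three parts. The following is a minimal sketch that assumes only the x_y_z form described above. The names `IsoformId` and `parse_isoform_id` and the example ids are hypothetical, and the parts are kept as strings because this README does not state their exact type.

```python
from collections import defaultdict
from typing import NamedTuple


class IsoformId(NamedTuple):
    cluster: str  # 'x': isONclust cluster of origin
    batch: str    # 'y': batch the isoform was reconstructed from
    unique: str   # 'z': identifier that makes the id unique


def parse_isoform_id(isoform_id):
    """Split an id of the form x_y_z into its three components."""
    parts = isoform_id.strip().split("_")
    if len(parts) != 3:
        raise ValueError(f"expected an id of the form x_y_z, got {isoform_id!r}")
    return IsoformId(*parts)


# Hypothetical ids, grouped by the cluster they stem from.
ids = ["12_0_3", "12_1_7", "40_0_1"]
by_cluster = defaultdict(list)
for raw in ids:
    parsed = parse_isoform_id(raw)
    by_cluster[parsed.cluster].append(raw)
print(dict(by_cluster))
```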
## Contact <a name="Contact"></a>
If you encounter any problems, please raise an issue on the issues page. You can also contact the developer of this repository via:
alexander.petri[at]math.su.se
## Credits <a name="credits"></a>
Please cite [1] when using isONform.
1. Petri, A. J., & Sahlin, K. (2023). isONform: reference-free transcriptome reconstruction from Oxford Nanopore data. Bioinformatics, 39(Supplement_1), i222-i231. [Link](https://academic.oup.com/bioinformatics/article/39/Supplement_1/i222/7210488).
Please additionally cite [2] and [3] when running the full pipeline.
2. Sahlin, K., & Medvedev, P. (2020). De novo clustering of long-read transcriptome data using a greedy, quality-value based algorithm. Journal of Computational Biology, 27(4), 472-484. [Link](https://www.liebertpub.com/doi/abs/10.1089/cmb.2019.0299).
3. Sahlin, K., & Medvedev, P. (2021). Error correction enables use of Oxford Nanopore technology for reference-free transcriptome analysis. Nature Communications, 12, 2. https://doi.org/10.1038/s41467-020-20340-8 [Link](https://www.nature.com/articles/s41467-020-20340-8).