xron

Name	xron
Version	1.0.7
home_page	https://github.com/haotianteng/Xron
Summary	A deep neural network basecaller for nanopore sequencing.
upload_time	2023-10-29 04:52:38
maintainer
docs_url	None
author	Haotian Teng
requires_python
license	GPL 3.0
keywords	basecaller nanopore sequencing neural network rna methylation
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

![xron_logo](https://github.com/haotianteng/Xron/blob/master/docs/images/xron_logo.png)
Xron (ˈkairɑn) is a methylation-aware basecaller that identifies m6A modifications from ONT direct RNA sequencing data.  
It uses a deep learning CNN+RNN+CTC architecture for end-to-end basecalling of nanopore sequencer signal.  
The name is inherited from [Chiron](https://github.com/haotianteng/Chiron).  
Built with **PyTorch** and Python 3.8+.
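
To give a feel for the CTC part of the architecture, here is a minimal sketch of greedy CTC decoding: repeated per-frame labels are collapsed, then blanks are dropped. This is not Xron's decoder. The alphabet, the blank index and the use of `M` for m6A are assumptions made purely for illustration.

```python
from itertools import groupby

BLANK = 0  # index of the CTC blank symbol (assumed to be 0 here)
ALPHABET = "-ACGUM"  # hypothetical alphabet: blank, canonical RNA bases, M for m6A


def ctc_greedy_decode(frame_labels, alphabet=ALPHABET, blank=BLANK):
    """Collapse runs of repeated labels, then drop blanks."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return "".join(alphabet[i] for i in collapsed if i != blank)


# Per-frame argmax labels, as a network might emit them.
frames = [0, 1, 1, 0, 0, 5, 5, 5, 0, 2, 0, 2, 4]
print(ctc_greedy_decode(frames))  # AMCCU
```

The blank between the two `C` frames is what lets CTC emit the same base twice in a row. The `--beam` option suggests Xron can also use beam search, which this sketch does not cover.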

<!--
%If you found Xron useful, please consider to cite:  
%Cite paper need to be released
-->


m6A-aware RNA basecall one-liner:
```
xron call -i <input_fast5_folder> -o <output_folder> -m models/ENEYFT --boostnano
```

---
## Table of contents

- [Table of contents](#table-of-contents)
- [Install](#install)
  - [Install from Source](#install-from-source)
  - [Install from PyPI](#install-from-pypi)
- [Basecall](#basecall)
- [Segmentation using NHMM](#segmentation-using-nhmm)
  - [Prepare chunk dataset](#prepare-chunk-dataset)
  - [Realign the signal using NHMM.](#realign-the-signal-using-nhmm)
- [Training](#training)
  
## Install
For either installation method, we recommend creating a virtual environment first with conda or venv. With conda, for example:
```bash
conda create --name YOUR_VIRTUAL_ENVIRONMENT python=3.8
conda activate YOUR_VIRTUAL_ENVIRONMENT
```
Then install Xron from PyPI, or install the newest version from the GitHub repository.

### Install from PyPI
```bash
pip install xron
```
Xron requires PyTorch 1.11.0 or later. If you have not yet installed PyTorch, follow the [official installation guide](https://pytorch.org/get-started/locally/).
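
If you are unsure which PyTorch version is in your environment, the following sketch checks it against the 1.11.0 minimum using only the standard library. The helper names `parse_version` and `torch_is_recent_enough` are made up for this example and are not part of Xron or PyTorch.

```python
import re
from importlib import metadata

MIN_TORCH = (1, 11, 0)  # minimum version stated in this README


def parse_version(text):
    """Return the leading numeric release as a tuple, e.g. '1.11.0+cu113' -> (1, 11, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def torch_is_recent_enough(version_text, minimum=MIN_TORCH):
    """Compare a version string against the minimum release tuple."""
    return parse_version(version_text) >= minimum


try:
    installed = metadata.version("torch")
except metadata.PackageNotFoundError:
    print("PyTorch is not installed")
else:
    status = "OK" if torch_is_recent_enough(installed) else "too old"
    print(f"PyTorch {installed}: {status}")
```

Comparing tuples rather than raw strings avoids the usual pitfall where `"1.9.0"` sorts after `"1.11.0"` lexicographically.
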
## Basecall
Before basecalling with Xron, download the models from our AWS S3 bucket by running **xron init**:
```bash
xron init
```
This automatically downloads the models and puts them into the *models* folder.
Sample code for m6A-aware basecalling and m6A site identification is provided in the xron-samples folder.
To run xron on raw fast5 files:
```
xron call -i ${INPUT_FAST5} -o ${OUTPUT} -m models/ENEYFT --fast5 --beam 50 --chunk_len 2000
```
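
When several runs need basecalling, the command above can be assembled from Python. The sketch below only reuses the flags shown in this README. The function names and the example folder layout are hypothetical, and `dry_run=True` prints the command instead of executing it.

```python
import shlex
import subprocess


def build_call_command(fast5_dir, out_dir, model="models/ENEYFT", beam=50, chunk_len=2000):
    """Assemble the `xron call` invocation shown above as an argument list."""
    return [
        "xron", "call",
        "-i", str(fast5_dir),
        "-o", str(out_dir),
        "-m", str(model),
        "--fast5",
        "--beam", str(beam),
        "--chunk_len", str(chunk_len),
    ]


def run_call(fast5_dir, out_dir, dry_run=True, **options):
    """Print the command (dry run) or execute it, and return the argument list."""
    command = build_call_command(fast5_dir, out_dir, **options)
    if dry_run:
        print(shlex.join(command))  # show the command without executing it
    else:
        subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return command


# Hypothetical run folders; replace with your own paths.
for run in ["run1", "run2"]:
    run_call(f"data/{run}/fast5", f"basecalls/{run}")
```

Passing an argument list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.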

## Segmentation using NHMM
### Prepare chunk dataset
Xron also includes a non-homogeneous HMM (NHMM) for signal re-squiggling. To use it, first extract the chunks and basecalled sequences with the **prepare** module:
```bash
xron prepare -i ${FAST5_FOLDER} -o ${CHUNK_FOLDER} --extract_seq --basecaller guppy --reference ${REFERENCE} --mode rna_meth --extract_kmer -k 5 --chunk_len 4000 --write_correction
```
Replace FAST5_FOLDER, CHUNK_FOLDER and REFERENCE with your basecalled fast5 folder, your output folder and the path to the reference genome FASTA file, respectively.
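
For readers new to the term, `-k 5` refers to 5-mers: overlapping windows of five bases along a sequence. The sketch below only illustrates that idea. How Xron builds and indexes k-mers internally is not described here, and `extract_kmers` is a hypothetical helper.

```python
def extract_kmers(sequence, k=5):
    """Return every overlapping k-mer of `sequence`, in order."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


read = "GGACUAC"
print(extract_kmers(read))  # ['GGACU', 'GACUA', 'ACUAC']
```

A sequence of length n yields n - k + 1 k-mers, and a sequence shorter than k yields none.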

### Realign the signal using NHMM.
Then run the NHMM to realign ("resquiggle") the signal.
```bash
xron relabel -i ${CHUNK_FOLDER} -m ${MODEL} --device $DEVICE
```
This generates a paths.py file under CHUNK_FOLDER, which gives the k-mer segmentation of the chunks.
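
Conceptually, a segmentation assigns each signal sample to a k-mer state, and consecutive samples with the same state form one segment. The sketch below run-length encodes such a path. The on-disk format of the paths file is not described in this README, so the input here is a made-up list of state indices, not the real file layout.

```python
from itertools import groupby


def path_to_segments(path):
    """Run-length encode a per-sample state path into (start, end, state) triples.

    `end` is exclusive, so a segment covers signal[start:end].
    """
    segments = []
    start = 0
    for state, run in groupby(path):
        length = sum(1 for _ in run)
        segments.append((start, start + length, state))
        start += length
    return segments


# Hypothetical path: each entry is the k-mer index assigned to one signal sample.
path = [7, 7, 7, 12, 12, 3, 3, 3, 3]
print(path_to_segments(path))  # [(0, 3, 7), (3, 5, 12), (5, 9, 3)]
```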

## Training
To train a new Xron model on your own data, first prepare a training dataset. It should include a signal file (chunks.npy), the labelled sequences (seqs.npy) and the sequence length for each read (seq_lens.npy). Then run the Xron supervised training module:
```bash
xron train -i chunks.npy --seq seqs.npy --seq_len seq_lens.npy --model_folder OUTPUT_MODEL_FOLDER
```
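
The relationship between the labelled sequences and their lengths can be sketched as follows: sequences of different lengths are padded to a common width, and the true lengths are kept separately. The integer encoding, the padding value and the `M` symbol for m6A are assumptions for illustration only. Check the arrays produced by `xron prepare` for the layout Xron actually expects.

```python
BASE_TO_INT = {"A": 1, "C": 2, "G": 3, "U": 4, "M": 5}  # hypothetical encoding; 0 is padding


def encode_and_pad(sequences, mapping=BASE_TO_INT, pad=0):
    """Encode each sequence as integers and right-pad to the longest one."""
    seq_lens = [len(seq) for seq in sequences]
    width = max(seq_lens, default=0)
    seqs = [
        [mapping[base] for base in seq] + [pad] * (width - len(seq))
        for seq in sequences
    ]
    return seqs, seq_lens


seqs, seq_lens = encode_and_pad(["ACGU", "AM"])
print(seqs)      # [[1, 2, 3, 4], [1, 5, 0, 0]]
print(seq_lens)  # [4, 2]
```

If NumPy is installed, rectangular lists like these can be written out with `numpy.save("seqs.npy", numpy.array(seqs))`.
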
Training an Xron model from scratch is hard, so I recommend fine-tuning one of our models by specifying the --load flag. For example, to fine-tune the provided ENEYFT model (trained on the cross-linked ENE dataset and fine-tuned on the Yeast dataset):
```bash
xron train -i chunks.npy --seq seqs.npy --seq_len seq_lens.npy --model_folder models/ENEYFT --load
```

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/haotianteng/Xron",
    "name": "xron",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "basecaller,nanopore,sequencing,neural network,RNA methylation",
    "author": "Haotian Teng",
    "author_email": "havens.teng@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/aa/50/182c84813b05518fc040b3fab2067a3254cf90d49ff087e6d024ef65cef2/xron-1.0.7.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GPL 3.0",
    "summary": "A deep neural network basecaller for nanopore sequencing.",
    "version": "1.0.7",
    "project_urls": {
        "Download": "https://github.com/haotianteng/Xron/archive/1.0.0.tar.gz",
        "Homepage": "https://github.com/haotianteng/Xron"
    },
    "split_keywords": [
        "basecaller",
        "nanopore",
        "sequencing",
        "neural network",
        "rna methylation"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "aa50182c84813b05518fc040b3fab2067a3254cf90d49ff087e6d024ef65cef2",
                "md5": "d37203d2a3c35835f701c07b8bbafa23",
                "sha256": "75980d86776433214edf49b814200a126a2c60d6ecfab895212467f492dd0ec8"
            },
            "downloads": -1,
            "filename": "xron-1.0.7.tar.gz",
            "has_sig": false,
            "md5_digest": "d37203d2a3c35835f701c07b8bbafa23",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 134877,
            "upload_time": "2023-10-29T04:52:38",
            "upload_time_iso_8601": "2023-10-29T04:52:38.769983Z",
            "url": "https://files.pythonhosted.org/packages/aa/50/182c84813b05518fc040b3fab2067a3254cf90d49ff087e6d024ef65cef2/xron-1.0.7.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-10-29 04:52:38",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "haotianteng",
    "github_project": "Xron",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "xron"
}
