# LAPA
[![pypi](https://img.shields.io/pypi/v/lapa.svg)](https://pypi.python.org/pypi/lapa)
[![tests](https://github.com/mortazavilab/lapa/actions/workflows/python-app.yml/badge.svg)](https://github.com/mortazavilab/lapa/actions)
[![codecov](https://codecov.io/gh/mortazavilab/lapa/branch/master/graph/badge.svg?token=MJQ88T8JWK)](https://codecov.io/gh/mortazavilab/lapa)
[![Documentation Status](https://readthedocs.org/projects/lapa/badge/?version=latest)](https://lapa.readthedocs.io/en/latest/?badge=latest)
Alternative polyadenylation detection from diverse data sources such as 3'-seq, long-read, and short-read RNA-seq.
![method](docs/method.png)
## Installation
```
pip install lapa
```
## Poly(A) site calling from long-read RNA-seq or 3'-seq
```
lapa --alignment {rep1.bam},{rep2.bam},{rep3.bam} \
--fasta {fasta} \
--annotation {gtf} \
--chrom_sizes {chrom_sizes} \
--output_dir {output}
```
Argument details ([doc](https://lapa.readthedocs.io/en/tss_clustering/cli.html#lapa)):
```
$ lapa --help

Usage: lapa [OPTIONS]

  CLI interface for lapa polyA cluster calling.

Options:
  --alignment TEXT    Single or multiple bam file paths are
                      separated with a comma.Alternatively, CSV
                      file with columns of sample, dataset, path
                      where the sample columns contains the name
                      of the sample, the dataset is the group of
                      samples replicates of each other, and path
                      is the path of bam file.  [required]
  --fasta TEXT        Genome reference (GENCODE or ENSEMBL fasta)
                      [required]
  --annotation TEXT   Standart genome annotation (GENCODE or
                      ENSEMBL gtf). GENCODE gtf file do not
                      contains annotation for `five_prime_utr` and
                      `three_prime_utr` so need to be corrected
                      with `gencode_utr_fix` (see https://github.c
                      om/MuhammedHasan/gencode_utr_fix.git).
                      [required]
  --chrom_sizes TEXT  Chrom sizes files (can be generated with
                      `faidx fasta -i chromsizes > chrom_sizes`)
                      [required]
  --output_dir TEXT   Output directory of LAPA. See
                      lapa.readthedocs.io/en/latest/output.html)
                      for the details of the directory structure
                      and file format.  [required]
  ...
```
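The `--chrom_sizes` file can also be produced without extra tools. The following is a minimal Python sketch, not part of LAPA: it assumes the conventional two-column, tab-separated layout (sequence name, length) and takes the sequence name to be the first whitespace-delimited token of each FASTA header. The function names `chrom_sizes` and `format_chrom_sizes` are hypothetical.

```python
def chrom_sizes(lines):
    """Return (sequence_name, length) pairs, in file order, from FASTA lines."""
    sizes = []
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                sizes.append((name, length))
            tokens = line[1:].split()
            # Assumption: the name is the first token of the header line.
            name, length = (tokens[0] if tokens else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        sizes.append((name, length))
    return sizes


def format_chrom_sizes(sizes):
    """Render the pairs as tab-separated 'name<TAB>length' lines."""
    return "".join(f"{name}\t{length}\n" for name, length in sizes)


# In-memory demonstration; for a real genome pass an open file handle instead:
#     with open("genome.fa") as handle:   # hypothetical path
#         sizes = chrom_sizes(handle)
example = [">chr1 example header", "ACGTAC", "GT", ">chr2", "GGCC"]
print(format_chrom_sizes(chrom_sizes(example)), end="")
```

Reading line by line keeps memory use independent of chromosome length, which matters for whole-genome FASTA files.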
The recommended setup is to include all samples together with their biosample/experimental replicates (tissue, cell line), listed in a CSV file:
samples.csv
```
sample,dataset,path
ENCFF772LYG,myoblast,ENCFF772LYG.bam
ENCFF421MIL,myoblast,ENCFF421MIL.bam
ENCFF699KOR,myotube,ENCFF699KOR.bam
ENCFF731HHB,myotube,ENCFF731HHB.bam
```
LAPA then takes `samples.csv` as input:
```
lapa --alignment samples.csv \
--fasta {fasta} \
--annotation {gtf} \
--chrom_sizes {chrom_sizes} \
--output_dir {output}
...
```
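Before starting a long run, it can help to confirm that the sample sheet has the three documented columns and groups replicates as intended. The sketch below is an illustrative pre-flight check, not LAPA's own parser: the function name `group_samples` is hypothetical, and the duplicate-sample check is an added sanity test rather than documented LAPA behavior.

```python
import csv
import io
from collections import defaultdict

REQUIRED_COLUMNS = ("sample", "dataset", "path")


def group_samples(handle):
    """Parse a sample sheet and return {dataset: [(sample, path), ...]}."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")

    groups = defaultdict(list)
    seen = set()
    for row in reader:
        # Assumption: sample names should be unique within the sheet.
        if row["sample"] in seen:
            raise ValueError(f"duplicate sample name: {row['sample']}")
        seen.add(row["sample"])
        groups[row["dataset"]].append((row["sample"], row["path"]))
    return dict(groups)


# The sheet from the example above, held in memory for the demonstration.
sheet = io.StringIO(
    "sample,dataset,path\n"
    "ENCFF772LYG,myoblast,ENCFF772LYG.bam\n"
    "ENCFF421MIL,myoblast,ENCFF421MIL.bam\n"
    "ENCFF699KOR,myotube,ENCFF699KOR.bam\n"
    "ENCFF731HHB,myotube,ENCFF731HHB.bam\n"
)
groups = group_samples(sheet)
for dataset, members in groups.items():
    print(dataset, [sample for sample, _ in members])
```

For a file on disk, pass `open("samples.csv", newline="")` in place of the in-memory sheet.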
## TSS calling from long-read RNA-seq
```
lapa_tss --alignment samples.csv \
--fasta {fasta} \
--annotation {gtf} \
--chrom_sizes {chrom_sizes} \
--output_dir {output}
```
Argument details ([doc](https://lapa.readthedocs.io/en/latest/cli.html#lapa-tss)):
```
$ lapa_tss --help

Usage: lapa_tss [OPTIONS]

  CLI interface for lapa tss cluster calling.

Options:
  --alignment TEXT    Single or multiple bam file paths are
                      separated with a comma.Alternatively, CSV
                      file with columns of sample, dataset, path
                      where the sample columns contains the name
                      of the sample, the dataset is the group of
                      samples replicates of each other, and path
                      is the path of bam file.  [required]
  --fasta TEXT        Genome reference (GENCODE or ENSEMBL fasta)
                      [required]
  --annotation TEXT   Standart genome annotation (GENCODE or
                      ENSEMBL gtf). GENCODE gtf file do not
                      contains annotation for `five_prime_utr` and
                      `three_prime_utr` so need to be corrected
                      with `gencode_utr_fix` (see https://github.c
                      om/MuhammedHasan/gencode_utr_fix.git)
                      [required]
  --chrom_sizes TEXT  Chrom sizes files (can be generated
                      with)`faidx fasta -i chromsizes >
                      chrom_sizes`)  [required]
  --output_dir TEXT   Output directory of LAPA. See
                      lapa.readthedocs.io/en/latest/output.html)
                      for the details of the directory structure
                      and file format.  [required]
```
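Because `lapa` and `lapa_tss` share the same five required options, both can be driven from a script with one helper. This is a hedged sketch that only assembles the argument list from the flags shown in the help text above. The helper name `build_command` and the input file names are placeholders, and the final `subprocess.run` call is left commented out so the snippet runs without LAPA installed.

```python
import shlex


def build_command(tool, alignment, fasta, annotation, chrom_sizes, output_dir):
    """Assemble the argument list for `lapa` or `lapa_tss`."""
    if tool not in ("lapa", "lapa_tss"):
        raise ValueError(f"unknown tool: {tool}")
    return [
        tool,
        "--alignment", alignment,
        "--fasta", fasta,
        "--annotation", annotation,
        "--chrom_sizes", chrom_sizes,
        "--output_dir", output_dir,
    ]


# Placeholder file names; substitute your own inputs.
cmd = build_command(
    "lapa_tss",
    alignment="samples.csv",
    fasta="genome.fa",
    annotation="annotation.gtf",
    chrom_sizes="chrom_sizes",
    output_dir="tss_output",
)
print(shlex.join(cmd))

# To actually run the tool:
#     import subprocess
#     subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.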
## Documentation
See the following links for other features, LAPA parameters, the Python API, and statistical testing:
Readthedocs: https://lapa.readthedocs.io/en/latest/index.html
API reference: https://lapa.readthedocs.io/en/latest/autoapi/index.html
Colab tutorials (analysis of myoblast myotube cell differentiation): https://colab.research.google.com/drive/1QzMxCRjCk3i5_MuHzjozSRWMaJgdEdSI?usp=sharing
## Cite
If you use LAPA in academic work, please cite the following paper:
```
@article{celik2022analysis,
title={Analysis of alternative polyadenylation from long-read or short-read RNA-seq with LAPA},
author={Celik, Muhammed Hasan and Mortazavi, Ali},
journal={bioRxiv},
year={2022},
publisher={Cold Spring Harbor Laboratory}
}
```
Raw data
{
"_id": null,
"home_page": "https://github.com/mortazavilab/lapa",
"name": "lapa",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "genomics,long read RNA-seq,APA",
"author": "M. Hasan \u00c7elik",
"author_email": "muhammedhasancelik@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/64/cc/e9059f9e9108f5a9a22e08164e29be3a937bab6c7636a8c2e51bf5b85ceb/lapa-0.0.5.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT license",
"summary": "Tools for alternative polyadenylation detection and analysis from diverse data sources (3'seq and long-reads) and transcript start site detection and analysis from long-read RNA-seq.",
"version": "0.0.5",
"split_keywords": [
"genomics",
"long read rna-seq",
"apa"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "5a165e29b4ebbadd96dad3de71e12ab64bcf7dc0a5b825a34bc2676ec224ee8a",
"md5": "21274dd4b535324b98077ba685f420e3",
"sha256": "e2d6378d8318ce1dd442d27072cda849a3cfec991062edb7d221a6e6031586c0"
},
"downloads": -1,
"filename": "lapa-0.0.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "21274dd4b535324b98077ba685f420e3",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 36268,
"upload_time": "2023-01-11T02:19:31",
"upload_time_iso_8601": "2023-01-11T02:19:31.601524Z",
"url": "https://files.pythonhosted.org/packages/5a/16/5e29b4ebbadd96dad3de71e12ab64bcf7dc0a5b825a34bc2676ec224ee8a/lapa-0.0.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "64cce9059f9e9108f5a9a22e08164e29be3a937bab6c7636a8c2e51bf5b85ceb",
"md5": "39f233bac19500ae1dafeb07fe8e5b8d",
"sha256": "3240405cbdd0e68c344676bc780e62368359024726f0ef2c04ef487d49793669"
},
"downloads": -1,
"filename": "lapa-0.0.5.tar.gz",
"has_sig": false,
"md5_digest": "39f233bac19500ae1dafeb07fe8e5b8d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 29558,
"upload_time": "2023-01-11T02:19:33",
"upload_time_iso_8601": "2023-01-11T02:19:33.202539Z",
"url": "https://files.pythonhosted.org/packages/64/cc/e9059f9e9108f5a9a22e08164e29be3a937bab6c7636a8c2e51bf5b85ceb/lapa-0.0.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-01-11 02:19:33",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "mortazavilab",
"github_project": "lapa",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "lapa"
}