lapa

Name	lapa
Version	0.0.5
home_page	https://github.com/mortazavilab/lapa
Summary	Tools for alternative polyadenylation detection and analysis from diverse data sources (3'seq and long-reads) and transcript start site detection and analysis from long-read RNA-seq.
upload_time	2023-01-11 02:19:33
maintainer
docs_url	None
author	M. Hasan Çelik
requires_python
license	MIT license
keywords	genomics long read rna-seq apa
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# LAPA

[![pypi](https://img.shields.io/pypi/v/lapa.svg)](https://pypi.python.org/pypi/lapa)
[![tests](https://github.com/mortazavilab/lapa/actions/workflows/python-app.yml/badge.svg)](https://github.com/mortazavilab/lapa/actions)
[![codecov](https://codecov.io/gh/mortazavilab/lapa/branch/master/graph/badge.svg?token=MJQ88T8JWK)](https://codecov.io/gh/mortazavilab/lapa)
[![Documentation Status](https://readthedocs.org/projects/lapa/badge/?version=latest)](https://lapa.readthedocs.io/en/latest/?badge=latest)

Alternative polyadenylation (APA) detection from diverse data sources such as 3'-seq, long-read, and short-read RNA-seq, plus transcription start site (TSS) detection from long-read RNA-seq.

![method](docs/method.png)

## Installation

```
pip install lapa
```

## Poly(A) site calling from long-read RNA-seq or 3'-seq

```
lapa --alignment {rep1.bam},{rep2.bam},{rep3.bam} \
	--fasta {fasta} \
	--annotation {gtf} \
	--chrom_sizes {chrom_sizes} \
	--output_dir {output}
```
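When LAPA is called from a Python pipeline, the same command can be assembled programmatically. The following is a minimal sketch under that assumption: `build_lapa_command` is a hypothetical helper, not part of LAPA's Python API, and it uses only the options listed in this README. Replicate BAMs are joined with commas, as `--alignment` expects; the file names are placeholders.

```python
import shlex
from typing import List, Sequence


def build_lapa_command(
    alignments: Sequence[str],
    fasta: str,
    annotation: str,
    chrom_sizes: str,
    output_dir: str,
    tool: str = "lapa",
) -> List[str]:
    """Assemble an argument list for `lapa` (or `lapa_tss`).

    Several BAM paths are joined with commas, as --alignment expects;
    a single samples CSV path is passed through unchanged.
    """
    if not alignments:
        raise ValueError("at least one BAM file or a samples CSV is required")
    return [
        tool,
        "--alignment", ",".join(alignments),
        "--fasta", fasta,
        "--annotation", annotation,
        "--chrom_sizes", chrom_sizes,
        "--output_dir", output_dir,
    ]


if __name__ == "__main__":
    # Placeholder file names; replace them with real inputs.
    cmd = build_lapa_command(
        ["rep1.bam", "rep2.bam", "rep3.bam"],
        fasta="genome.fa",
        annotation="annotation.gtf",
        chrom_sizes="genome.chrom_sizes",
        output_dir="lapa_output",
    )
    print(shlex.join(cmd))
    # To run it (requires `pip install lapa` and real input files):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Passing an argument list rather than a single shell string to `subprocess.run` avoids quoting problems with paths that contain spaces. Because `lapa` and `lapa_tss` take the same required options, the helper can build either command through `tool`.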

Argument details ([doc](https://lapa.readthedocs.io/en/tss_clustering/cli.html#lapa)):

```
$ lapa --help

Usage: lapa [OPTIONS]

  CLI interface for lapa polyA cluster calling.

Options:
  --alignment TEXT                Single or multiple bam file paths are
                                  separated with a comma.Alternatively, CSV
                                  file with columns of sample, dataset, path
                                  where the sample columns contains the name
                                  of the sample, the dataset is the group of
                                  samples replicates of each other, and path
                                  is the path of bam file.  [required]
  --fasta TEXT                    Genome reference (GENCODE or ENSEMBL fasta)
                                  [required]
  --annotation TEXT               Standart genome annotation (GENCODE or
                                  ENSEMBL gtf). GENCODE gtf file do not
                                  contains annotation for `five_prime_utr` and
                                  `three_prime_utr` so need to be corrected
                                  with `gencode_utr_fix` (see https://github.c
                                  om/MuhammedHasan/gencode_utr_fix.git).
                                  [required]
  --chrom_sizes TEXT              Chrom sizes files (can be generated with
                                  `faidx fasta -i chromsizes > chrom_sizes`)
                                  [required]
  --output_dir TEXT               Output directory of LAPA. See
                                  lapa.readthedocs.io/en/latest/output.html)
                                  for the details of the directory structure
                                  and file format.  [required]
  ...
```
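A chrom sizes file is conventionally a plain two-column, tab-separated table of sequence names and lengths. If the `faidx` command is not available, the dependency-free sketch below produces such a file from an uncompressed FASTA. The helper names are hypothetical, and using the first whitespace-delimited token of each header as the sequence name is an assumption; check that it matches the chromosome names in your BAM headers.

```python
from typing import Iterable, Iterator, Tuple


def chrom_sizes_from_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (sequence name, length) for each record in FASTA text."""
    name = None
    length = 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Keep only the first whitespace-delimited token of the header.
            header = line[1:].split()
            name = header[0] if header else ""
            length = 0
        elif name is not None:
            # Sequence lines: count bases, ignoring line breaks.
            length += len(line)
    if name is not None:
        yield name, length


def write_chrom_sizes(fasta_path: str, out_path: str) -> None:
    """Write `name<TAB>length` lines, one per FASTA record."""
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        for name, length in chrom_sizes_from_fasta(fasta):
            out.write(f"{name}\t{length}\n")
```

For example, `write_chrom_sizes("genome.fa", "genome.chrom_sizes")`. For large genomes, `samtools faidx genome.fa` followed by `cut -f1,2 genome.fa.fai` yields the same two columns without scanning the sequence in Python.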

The recommended setting is to include all samples together with their biological or experimental replicates (e.g., tissue or cell line) in a CSV file with the columns `sample`, `dataset`, and `path`:

samples.csv
```
sample,dataset,path
ENCFF772LYG,myoblast,ENCFF772LYG.bam
ENCFF421MIL,myoblast,ENCFF421MIL.bam
ENCFF699KOR,myotube,ENCFF699KOR.bam
ENCFF731HHB,myotube,ENCFF731HHB.bam
```
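The CSV can also be generated and checked programmatically. The sketch below assumes, as in the example above, that each sample name is the BAM file stem; `write_samples_csv` and `replicates_by_dataset` are hypothetical helpers, not LAPA functions. Reading the file back and grouping by `dataset` is a quick way to confirm that each group lists the intended replicates.

```python
import csv
import io
from collections import defaultdict
from pathlib import PurePath
from typing import Dict, List, TextIO


def write_samples_csv(datasets: Dict[str, List[str]], handle: TextIO) -> None:
    """Write a samples CSV with the columns sample,dataset,path.

    `datasets` maps a dataset name (e.g. a tissue or cell line) to the BAM
    files of its replicates; the sample name is the BAM file stem
    (ENCFF772LYG.bam -> ENCFF772LYG).
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["sample", "dataset", "path"])
    for dataset, bams in datasets.items():
        for bam in bams:
            writer.writerow([PurePath(bam).stem, dataset, bam])


def replicates_by_dataset(handle: TextIO) -> Dict[str, List[str]]:
    """Read a samples CSV back and group sample names by dataset."""
    reader = csv.DictReader(handle)
    missing = {"sample", "dataset", "path"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"samples CSV is missing columns: {sorted(missing)}")
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in reader:
        groups[row["dataset"]].append(row["sample"])
    return dict(groups)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_samples_csv(
        {
            "myoblast": ["ENCFF772LYG.bam", "ENCFF421MIL.bam"],
            "myotube": ["ENCFF699KOR.bam", "ENCFF731HHB.bam"],
        },
        buffer,
    )
    print(buffer.getvalue(), end="")  # same rows as samples.csv above
    buffer.seek(0)
    print(replicates_by_dataset(buffer))
    # {'myoblast': ['ENCFF772LYG', 'ENCFF421MIL'], 'myotube': ['ENCFF699KOR', 'ENCFF731HHB']}
```

Note that `PurePath.stem` only strips the last suffix, so a file such as `x.sorted.bam` would get the sample name `x.sorted`.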

Then pass `samples.csv` to `--alignment`:

```
lapa --alignment samples.csv \
	--fasta {fasta} \
	--annotation {gtf} \
	--chrom_sizes {chrom_sizes} \
	--output_dir {output} \
	...
```


## TSS calling from long-read RNA-seq

```
lapa_tss --alignment samples.csv \
	--fasta {fasta} \
	--annotation {gtf} \
	--chrom_sizes {chrom_sizes} \
	--output_dir {output}
```

Argument details ([doc](https://lapa.readthedocs.io/en/latest/cli.html#lapa-tss)):

```
$ lapa_tss --help

Usage: lapa_tss [OPTIONS]

  CLI interface for lapa tss cluster calling.

Options:
  --alignment TEXT                Single or multiple bam file paths are
                                  separated with a comma.Alternatively, CSV
                                  file with columns of sample, dataset, path
                                  where the sample columns contains the name
                                  of the sample, the dataset is the group of
                                  samples replicates of each other, and path
                                  is the path of bam file.  [required]
  --fasta TEXT                    Genome reference (GENCODE or ENSEMBL fasta)
                                  [required]
  --annotation TEXT               Standart genome annotation (GENCODE or
                                  ENSEMBL gtf). GENCODE gtf file do not
                                  contains annotation for `five_prime_utr` and
                                  `three_prime_utr` so need to be corrected
                                  with `gencode_utr_fix` (see https://github.c
                                  om/MuhammedHasan/gencode_utr_fix.git)
                                  [required]
  --chrom_sizes TEXT              Chrom sizes files (can be generated
                                  with)`faidx fasta -i chromsizes >
                                  chrom_sizes`)  [required]
  --output_dir TEXT               Output directory of LAPA. See
                                  lapa.readthedocs.io/en/latest/output.html)
                                  for the details of the directory structure
                                  and file format.  [required]
```
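Both help messages note that GENCODE GTF files lack `five_prime_utr` and `three_prime_utr` features (GENCODE uses a generic `UTR` feature type instead). A small sketch like the one below, which only counts the feature-type column (column 3) of a GTF, can indicate whether an annotation still needs `gencode_utr_fix` before it is passed to `--annotation`. The function names are hypothetical, and the check is a heuristic; it does not replace the fix itself.

```python
from collections import Counter
from typing import Iterable

UTR_FEATURES = ("five_prime_utr", "three_prime_utr")


def feature_counts(gtf_lines: Iterable[str]) -> Counter:
    """Count GTF feature types (column 3), skipping blank and comment lines."""
    counts: Counter = Counter()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:  # a GTF record has nine tab-separated columns
            counts[fields[2]] += 1
    return counts


def needs_utr_fix(gtf_lines: Iterable[str]) -> bool:
    """Return True if either five_prime_utr or three_prime_utr is absent."""
    counts = feature_counts(gtf_lines)
    return not all(counts[feature] for feature in UTR_FEATURES)
```

For a gzip-compressed annotation, pass `gzip.open("annotation.gtf.gz", "rt")` as the line iterable.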


## Documentation

See the following links for other features, the full list of LAPA parameters, the Python API, and statistical testing:

Readthedocs: https://lapa.readthedocs.io/en/latest/index.html

API reference: https://lapa.readthedocs.io/en/latest/autoapi/index.html

Colab tutorials (analysis of myoblast-to-myotube differentiation): https://colab.research.google.com/drive/1QzMxCRjCk3i5_MuHzjozSRWMaJgdEdSI?usp=sharing


## Cite

If you use LAPA in academic work, please cite the following paper:

```
@article{celik2022analysis,
  title={Analysis of alternative polyadenylation from long-read or short-read RNA-seq with LAPA},
  author={Celik, Muhammed Hasan and Mortazavi, Ali},
  journal={bioRxiv},
  year={2022},
  publisher={Cold Spring Harbor Laboratory}
}
```

Raw data

{
    "_id": null,
    "home_page": "https://github.com/mortazavilab/lapa",
    "name": "lapa",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "genomics,long read RNA-seq,APA",
    "author": "M. Hasan \u00c7elik",
    "author_email": "muhammedhasancelik@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/64/cc/e9059f9e9108f5a9a22e08164e29be3a937bab6c7636a8c2e51bf5b85ceb/lapa-0.0.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT license",
    "summary": "Tools for alternative polyadenylation detection and analysis from diverse data sources (3'seq and long-reads) and transcript start site detection and analysis from long-read RNA-seq.",
    "version": "0.0.5",
    "split_keywords": [
        "genomics",
        "long read rna-seq",
        "apa"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5a165e29b4ebbadd96dad3de71e12ab64bcf7dc0a5b825a34bc2676ec224ee8a",
                "md5": "21274dd4b535324b98077ba685f420e3",
                "sha256": "e2d6378d8318ce1dd442d27072cda849a3cfec991062edb7d221a6e6031586c0"
            },
            "downloads": -1,
            "filename": "lapa-0.0.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "21274dd4b535324b98077ba685f420e3",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 36268,
            "upload_time": "2023-01-11T02:19:31",
            "upload_time_iso_8601": "2023-01-11T02:19:31.601524Z",
            "url": "https://files.pythonhosted.org/packages/5a/16/5e29b4ebbadd96dad3de71e12ab64bcf7dc0a5b825a34bc2676ec224ee8a/lapa-0.0.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "64cce9059f9e9108f5a9a22e08164e29be3a937bab6c7636a8c2e51bf5b85ceb",
                "md5": "39f233bac19500ae1dafeb07fe8e5b8d",
                "sha256": "3240405cbdd0e68c344676bc780e62368359024726f0ef2c04ef487d49793669"
            },
            "downloads": -1,
            "filename": "lapa-0.0.5.tar.gz",
            "has_sig": false,
            "md5_digest": "39f233bac19500ae1dafeb07fe8e5b8d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 29558,
            "upload_time": "2023-01-11T02:19:33",
            "upload_time_iso_8601": "2023-01-11T02:19:33.202539Z",
            "url": "https://files.pythonhosted.org/packages/64/cc/e9059f9e9108f5a9a22e08164e29be3a937bab6c7636a8c2e51bf5b85ceb/lapa-0.0.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-01-11 02:19:33",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "mortazavilab",
    "github_project": "lapa",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "lapa"
}
