babachi

Name	babachi
Version	2.0.24
home_page	https://github.com/autosome-ru/BABACHI
Summary
upload_time	2023-02-11 18:27:36
maintainer
docs_url	None
author	Sergey Abramov, Alexandr Boytsov
requires_python	>=3.6
license
keywords
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# BABACHI: Background Allelic Dosage Bayesian Checkpoint Identification
[![DOI](https://zenodo.org/badge/255952669.svg)](https://zenodo.org/badge/latestdoi/255952669) <br>
BABACHI is a tool for estimation of relative Background Allelic Dosage (BAD) from
non-phased heterozygous SNVs. It estimates BAD directly from enriched sequencing data, where
the precise estimation of allelic copy numbers is not possible.

BAD is the ratio of the major allele copy number to the minor allele copy number. More details and a description of the algorithm are available [here](https://www.nature.com/articles/s41467-021-23007-0).
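
To make the definition concrete, here is a minimal sketch that computes the ratio from two known allelic copy numbers. It only illustrates the definition: BABACHI itself estimates BAD from read counts, precisely because copy numbers are not available. The function name `bad_from_copy_numbers` is hypothetical and not part of the package.

```python
from fractions import Fraction


def bad_from_copy_numbers(copy_a: int, copy_b: int) -> Fraction:
    """Return the major/minor allele copy-number ratio (always >= 1).

    Hypothetical helper for illustration only; not a BABACHI function.
    """
    if copy_a <= 0 or copy_b <= 0:
        raise ValueError("both alleles must be present at a heterozygous site")
    major, minor = max(copy_a, copy_b), min(copy_a, copy_b)
    return Fraction(major, minor)


print(bad_from_copy_numbers(1, 1))         # 1   (balanced, e.g. diploid)
print(bad_from_copy_numbers(2, 1))         # 2   (e.g. trisomy with one duplicated allele)
print(bad_from_copy_numbers(2, 3))         # 3/2
print(float(bad_from_copy_numbers(3, 4)))  # 1.3333333333333333
```

The order of the two alleles does not matter, since the larger copy number is always placed in the numerator.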

## File formats
BABACHI accepts either a BED file with heterozygous SNVs or a standard VCF file sorted by genomic position (ascending).
- a BED file should begin with the following columns, in this order; additional columns are permitted and ignored:
```chromosome, start, end, ID, reference base, alternative base, reference read count, alternative read count, sample_id```.<br>
Lines starting with # are ignored.

- a VCF file should have GT and AD fields. 
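
The BED layout described above can be checked before running the tool with a few lines of Python. This is a sketch under stated assumptions, not BABACHI's own reader: it assumes tab-separated fields, takes the first eight fields (through alternative read count), and ignores anything after them. The names `SnvRecord` and `read_snvs` are hypothetical, and the example row is made-up data.

```python
from typing import Iterable, Iterator, NamedTuple


class SnvRecord(NamedTuple):
    """First eight BED fields; hypothetical container, not a BABACHI type."""
    chrom: str
    start: int
    end: int
    snv_id: str
    ref: str
    alt: str
    ref_count: int
    alt_count: int


def read_snvs(lines: Iterable[str]) -> Iterator[SnvRecord]:
    """Yield one record per data line, skipping blank and '#' lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")  # assumption: tab-separated
        if len(fields) < 8:
            raise ValueError(f"expected at least 8 columns, got {len(fields)}")
        yield SnvRecord(
            fields[0], int(fields[1]), int(fields[2]), fields[3],
            fields[4], fields[5], int(fields[6]), int(fields[7]),
        )


# Made-up example input: one comment line and one SNV.
example = [
    "#chr\tstart\tend\tID\tref\talt\tref_count\talt_count\tsample_id\n",
    "chr1\t1000\t1001\trs0000001\tA\tG\t12\t9\tsample_1\n",
]
for record in read_snvs(example):
    print(record.chrom, record.end, record.ref_count, record.alt_count)
```

For a real file, pass an open file handle instead of the `example` list, for instance `read_snvs(open("snvs.bed"))`, where `snvs.bed` is a placeholder path.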

(!) We recommend using only <b>common SNPs</b> for BAD estimation. Users are expected to filter for common variants before running BABACHI. See [here](https://www.nature.com/articles/s41467-021-23007-0) for more details.

The output is a BED file with BAD annotations. The file format is described in the [Demo](#demo) section.
## System Requirements
### Hardware requirements
The `BABACHI` package requires only a standard computer with enough RAM to support in-memory operations.

### Software requirements
#### OS Requirements
The package can be installed on GNU/Linux and macOS from the Python Package Index (PyPI) or from GitHub.
The package has been tested on the following systems:
+ macOS Monterey v12.5
+ Linux: Ubuntu 18.04
#### Python Dependencies
`BABACHI` mainly depends on the following Python 3 packages:
```
docopt>=0.6.2
numpy>=1.18.0
schema>=0.7.2
contextlib2>=0.5.5
pandas>=1.0.4
matplotlib>=3.2.1
seaborn>=0.10.1
numba>=0.53.1
```
## Installation
### Install from PyPI
```
pip3 install babachi 
```
### Install from GitHub
```
git clone https://github.com/autosome-ru/BABACHI
cd BABACHI
python3 setup.py install
```
or
```
pip3 install 'babachi @ git+https://github.com/autosome-ru/BABACHI.git'
```
Prepend `sudo` to the commands above if elevated permissions are required.

The package should take less than 1 minute to install.

## Requirements
```
python >= 3.6
```

## Usage
```
babachi <options>...
```
To see the full usage description, run:
```
babachi --help
```
This will produce the following message:
```
Usage:
    babachi (<file> | --test) [options]
    babachi visualize <file> (-b <badmap>| --badmap <badmap>) [options]
    babachi filter <file> [options]

Arguments:
    <file>            Path to input VCF file. Expected to be sorted by (chr, pos)

    <path>            Path to the file
    <int>             Non negative integer
    <float>           Non negative number
    <states-string>   String of states separated with "," (to provide fraction use "/", e.g. 4/3).
                      Each state must be >= 1
    <samples-string>  Comma-separated sample names or indices
    <prior-string>    Either "uniform" or "geometric"
    <file-or-link>    Path to existing file or link


Arguments:
    -h, --help                              Show help

    -O <path>, --output <path>              Output directory or file path. [default: ./]
    --test                                  Run segmentation on test file

    -v, --verbose                           Write debug messages
    --sample-list <samples-string>          Comma-separated sample names or integer indices to use from input VCF
    --snp-strategy <snp-strategy>           Strategy for the SNPs on the same position (from different samples).
                                            Either add read counts 'ADD' or treat as a separate events 'SEP'. [default: SEP]

    -n, --no-filter                         Skip filtering of input file
    -f, --force-sort                        Chromosomes in output file will be sorted in numerical order
    -j <int>, --jobs <int>                  Number of jobs to use, parallel by chromosomes [default: 1]
    --chrom-sizes <file-or-link>            File with chromosome sizes (can be a link), default is hg38
    -a <int>, --allele-reads-tr <int>       Allelic reads threshold. Input SNPs will be filtered by ref_read_count >= x and
                                            alt_read_count >= x. [default: 5]
    -p <string>, --prior <prior-string>     Prior to use. Can be either uniform or geometric [default: uniform]
    -g <float>, --geometric-prior <float>   Coefficient for geometric prior [default: 0.98]
    -s <string>, --states <states-string>   States string [default: 1,2,3,4,5,6]

    -B <float>, --boundary-penalty <float>  Boundary penalty coefficient [default: 4]
    -Z <int>, --min-seg-snps <int>          Only allow segments containing Z or more unique SNPs (IDs/positions) [default: 3]
    -R <int>, --min-seg-bp <int>            Only allow segments containing R or more base pairs [default: 1000]
    -P <int>, --post-segment-filter <int>   Remove segments with less than P unique SNPs (IDs/positions) from output [default: 0]
    -A <int>, --atomic-region-size <int>    Atomic region size in SNPs [default: 600]
    -C <int>, --chr-min-snps <int>          Minimum number of SNPs on a chromosome to start segmentation [default: 100]
    -S <int>, --subchr-filter <int>         Exclude subchromosomes with less than C unique SNPs  [default: 3]

Visualization:
    -b <path>, --badmap <path>              BADmap file created with BABACHI
    --visualize                             Perform visualization of SNP-wise AD and BAD for each chromosome.
                                            Will create a directory in output path for the <ext> visualizations
    -z, --zip                               Zip visualizations directory
    -e <ext>, --ext <ext>                   Extension to save visualizations with [default: svg]
```
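
The `<states-string>` format in the help message (comma-separated values, fractions written with "/", each state at least 1) can be illustrated with a short parser. This is a sketch of the format as documented, not the parser BABACHI uses internally; the function name `parse_states` is hypothetical, and sorting and de-duplicating the result is a choice made here for readability.

```python
from fractions import Fraction
from typing import List


def parse_states(states_string: str) -> List[float]:
    """Turn a string such as "1,2,3,4/3" into sorted, unique floats.

    Hypothetical helper; raises ValueError for a state below 1.
    """
    states = set()
    for token in states_string.split(","):
        value = Fraction(token.strip())  # Fraction accepts "3" as well as "4/3"
        if value < 1:
            raise ValueError(f"state {token.strip()!r} must be >= 1")
        states.add(float(value))
    return sorted(states)


print(parse_states("1,2,3,4,5,6"))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(parse_states("1,2,3,4/3"))    # [1.0, 1.3333333333333333, 2.0, 3.0]
```

A string like `1,4/3,3/2,2` would therefore describe the states 1, 1.33, 1.5 and 2.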

## Demo
To perform a test run:
```
babachi --test
```
The test run takes approximately 2 minutes on a standard computer.
<br>
The result is a file named `test.bed`, created in the working directory unless the `-O` option is provided.
The contents of the `test.bed` file should have the following format:
```
#chr	start	end	BAD	Q1.00	Q1.33	Q1.50	Q2.00	Q2.50	Q3.00	Q4.00	Q5.00	Q6.00	SNP_count	sum_cover
chr1	1	125183196	2	-63.47825919524621	-24.598710473939718	-8.145646624117944	-2.000888343900442e-11	-30.773041699645546	-78.80480783186977	-189.88685134708248	-299.82657588596703	-401.6012195141575	1325	17280
```
Each row represents a single segment with a constant estimated BAD. The most important columns are:
- \#chr:  chromosome
- start: segment start position
- end: segment end position
- BAD: estimated BAD score

Additional columns:
- SNP_count: number of SNPs in the segment
- SNP_ID_count: number of unique SNPs in the segment
- sum_cover: the total read coverage of all SNPs of the segment
- Q<b>X</b>: the log-likelihood that the segment has BAD = <b>X</b>
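
The columns above can be read with the standard `csv` module. The sketch below parses the sample row shown earlier in this section and picks the state whose `Q` column is largest. It assumes a tab-separated file with the header line starting with `#chr`; the helper name `best_state` is hypothetical. In the sample row the highest-scoring state coincides with the reported BAD, but the sketch does not verify that this holds for every segment or every prior.

```python
import csv
import io

# The header and data row from the sample output above, joined with tabs.
HEADER = ["#chr", "start", "end", "BAD", "Q1.00", "Q1.33", "Q1.50", "Q2.00",
          "Q2.50", "Q3.00", "Q4.00", "Q5.00", "Q6.00", "SNP_count", "sum_cover"]
VALUES = ["chr1", "1", "125183196", "2", "-63.47825919524621",
          "-24.598710473939718", "-8.145646624117944", "-2.000888343900442e-11",
          "-30.773041699645546", "-78.80480783186977", "-189.88685134708248",
          "-299.82657588596703", "-401.6012195141575", "1325", "17280"]
SAMPLE = "\t".join(HEADER) + "\n" + "\t".join(VALUES) + "\n"


def best_state(row: dict) -> float:
    """Return the BAD value X whose QX log-likelihood is the largest."""
    scores = {float(name[1:]): float(value)
              for name, value in row.items() if name.startswith("Q")}
    return max(scores, key=scores.get)


rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
for row in rows:
    length = int(row["end"]) - int(row["start"])
    print(row["#chr"], length, row["BAD"], best_state(row))
# chr1 125183195 2 2.0
```

To read a real output file, replace `io.StringIO(SAMPLE)` with an open file handle.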

The BABACHI tool is maintained by Sergey Abramov and Alexandr Boytsov.

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/autosome-ru/BABACHI",
    "name": "babachi",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "",
    "author": "Sergey Abramov, Alexandr Boytsov",
    "author_email": "aswq22013@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/42/d3/989c6ea772cd0e13dfecbbb101fd51a2a1566bfbf4fdc945c3cded462791/babachi-2.0.24.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "",
    "version": "2.0.24",
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a3c00e6ad55a88f5e632ae78b0d492aa44ef539f48c8012f9490e9078a651c5e",
                "md5": "95a732abed85688da46bf8b171ff1224",
                "sha256": "1c9952607a08e95f9623ad11360f61f74c66cf3daf13239bcf96b22c54ef73c3"
            },
            "downloads": -1,
            "filename": "babachi-2.0.24-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "95a732abed85688da46bf8b171ff1224",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 506577,
            "upload_time": "2023-02-11T18:27:34",
            "upload_time_iso_8601": "2023-02-11T18:27:34.190466Z",
            "url": "https://files.pythonhosted.org/packages/a3/c0/0e6ad55a88f5e632ae78b0d492aa44ef539f48c8012f9490e9078a651c5e/babachi-2.0.24-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "42d3989c6ea772cd0e13dfecbbb101fd51a2a1566bfbf4fdc945c3cded462791",
                "md5": "9fb69cbe7e011149dff18b58ba6dfed0",
                "sha256": "ffbd34ed0e2c7f1032063a3ef6d7180c5648a0878f4e32c5a5d87f5309d7a538"
            },
            "downloads": -1,
            "filename": "babachi-2.0.24.tar.gz",
            "has_sig": false,
            "md5_digest": "9fb69cbe7e011149dff18b58ba6dfed0",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 506319,
            "upload_time": "2023-02-11T18:27:36",
            "upload_time_iso_8601": "2023-02-11T18:27:36.260941Z",
            "url": "https://files.pythonhosted.org/packages/42/d3/989c6ea772cd0e13dfecbbb101fd51a2a1566bfbf4fdc945c3cded462791/babachi-2.0.24.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-02-11 18:27:36",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "autosome-ru",
    "github_project": "BABACHI",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "babachi"
}
