fseq2

Field | Value
------|------
Name | fseq2
Version | 2.0.4
home_page | https://github.com/Boyle-Lab/F-Seq2
Summary | Improving the feature density based peak caller with dynamic statistics.
upload_time | 2024-10-23 20:00:18
author | Nanxiang Zhao (Samuel)
requires_python | >=3.9
license | GNU General Public License v3
keywords | fseq2

[![PyPI version](https://badge.fury.io/py/fseq2.svg)](https://badge.fury.io/py/fseq2)
[![Conda](https://img.shields.io/conda/v/samzhao/fseq2)](https://anaconda.org/samzhao/fseq2)
[![GitHub](https://img.shields.io/github/license/Boyle-Lab/F-Seq2)](https://github.com/Boyle-Lab/F-Seq2/blob/master/LICENSE)
<br/>

 |Host | Downloads |
 |-----|-----------|
 |PyPI |[![Downloads](https://pepy.tech/badge/fseq2)](https://pepy.tech/project/fseq2)|
 |conda|[![Conda](https://img.shields.io/conda/dn/samzhao/fseq2)](https://anaconda.org/samzhao/fseq2)|



# F-Seq2
## Improving the feature density based peak caller with dynamic statistics

Tag sequencing with high-throughput sequencing technologies is used to identify specific sequence features in assays such as 
DNase-seq, ATAC-seq, ChIP-seq, and FAIRE-seq. To summarize and display individual sequence data as an 
accurate and interpretable signal, we developed the original [F-Seq](http://fureylab.web.unc.edu/software/fseq/) 
([GitHub](https://github.com/aboyle/F-seq)), a software package that generates a continuous tag sequence density 
estimation. This allows identification of biologically meaningful sites, and the output can be displayed directly in the UCSC 
Genome Browser. 

F-Seq2 is a complete rewrite of the original version in Python. We designed a new statistical framework and introduced 
new features to further improve performance in this second version. F-Seq2 implements a dynamic 
parameter to conduct local statistical analysis with an underlying “continuous” Poisson distribution. By combining the 
power of the local test and the kernel density estimate (KDE), which models the read probability distribution with statistical rigor, we robustly 
account for local biases and resolve ties that occur when ranking candidate summits, making results suitable for 
irreproducible discovery rate (IDR) analysis.
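
To make the idea of a continuous tag density concrete, here is a minimal sketch of a Gaussian kernel density estimate over tag positions, written in plain Python. It is an illustration only, not F-Seq2's implementation: the function name `gaussian_kde`, the tag positions, the grid spacing, and the bandwidth are all hypothetical, and the local Poisson test is not shown.

```python
import math

def gaussian_kde(tag_positions, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid position."""
    n = len(tag_positions)
    # Normalisation so the density integrates to 1 over the genome axis.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    density = []
    for x in grid:
        total = sum(math.exp(-0.5 * ((x - t) / bandwidth) ** 2) for t in tag_positions)
        density.append(norm * total)
    return density

# Hypothetical tag positions: a cluster around 1000 plus two background tags.
tags = [980, 990, 1000, 1005, 1010, 1020, 1500, 2100]
grid = list(range(0, 2501, 50))
density = gaussian_kde(tags, grid, bandwidth=50.0)

# The grid position with the highest density is a candidate summit.
summit = grid[density.index(max(density))]
print(summit)
```

The cluster of tags produces a single smooth bump, and the isolated tags contribute only a low background, which is why a density threshold can separate candidate summits from noise.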

## Citation:
Zhao, N. & Boyle, A.P. F-Seq2: improving the feature density based peak caller with dynamic statistics. NAR Genom Bioinform. 2021 Feb 23;3(1):lqab012. https://doi.org/10.1093/nargab/lqab012

## Table of contents

1. [Installation](./INSTALL.md)
2. [Usage](#usage)
    - [`callpeak`](#callpeak)
    - [`callpeak_idr`](#callpeak_idr)
    - [`idr`](#idr)
3. [Output files and formats](#output-files-and-formats)
4. [Examples](#examples)
5. [Reference](#reference)
6. [Troubleshooting](#troubleshooting)



## Installation
Prerequisite: [BEDTools](https://bedtools.readthedocs.io/en/latest/content/installation.html).  
See [here](./INSTALL.md) for more details to install F-Seq2.



## Usage

```
fseq2 [-h] [--version]
    {callpeak, callpeak_idr, idr}
```

Available subcommands

Subcommand | Description
-----------|----------
`callpeak` | F-Seq2 main function to call peaks from alignment results.
`callpeak_idr` | Call peaks, then run the IDR framework with recommended settings.
`idr` | A wrapper around the [IDR package](https://github.com/nboley/idr) for customized IDR analysis.



## `callpeak`
#### Command line input:
##### `-treatment_file`
REQUIRED argument for fseq2. Treatment file(s) in bam or bed format. If multiple files are specified (separated by spaces), 
they are treated as one treatment experiment. See [here](./INPUT_FORMAT.md) for more details about input format.

##### `-control_file`
Control file(s) corresponding to treatment file(s).

##### `-pe`
Paired-end mode. If this flag is on, treatment (and control) file(s) are treated as paired-end data, in either BAMPE or BEDPE format. 
Default is False, which treats all data as single-end. See [here](./INPUT_FORMAT.md) for more details about paired-end mode.

##### `-chrom_size_file` 
A file specifying chrom sizes, where each line has one chrom and its size. This is required if the output signal format is bigwig. 
Note that if this file is specified, fseq2 only processes the chroms in this file. Default is False, which processes all chroms but cannot output bigwig.

##### `-o`
Output directory path. Default is current directory.

##### `-name`
Prefix for all output files. Existing files with the same prefix are overwritten. Default is `fseq2_result`.

##### `-sig_format`
Signal format for reconstructed signal. Available formats: `wig`, `bigwig`, `np_array`. Note that if `np_array` is chosen, arrays 
for each chrom are stored in [`NAME_sig.h5`](#name_sigh5) with `chrom` as key, and no Gaussian smoothing is applied. Default is False, which outputs no signal.

##### `-sort_by`
Sort peaks and summits by `pValue` or `chromAndStart`. Default is `chromAndStart`.

##### `-standard_narrowpeak`
If this flag is on, `NAME_peaks.narrowPeak` is in standard `.narrowPeak` format, which contains the max pValue summit rather than all summits for each peak region.
This is compatible with visualization in the UCSC Genome Browser and convenient for other downstream software. 

##### `-v`
Verbose output. Default is False.  

##### `-f`
Fragment size of treatment data. Default is to estimate from data. This determines shift size where `offset = fragment_size/2`. 
For DNase-seq and ATAC-seq data, set `-f 0`. 
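
The relation `offset = fragment_size/2` can be sketched as follows. This is an assumption-laden illustration, not F-Seq2's code: the helper name `shifted_position` is hypothetical, integer division is assumed, and the shift direction (downstream for `+` reads, upstream for `-` reads) follows the common convention of moving each read toward the fragment center.

```python
def shifted_position(five_prime, strand, fragment_size):
    """Shift a read's 5' end by half the fragment size toward the fragment center."""
    offset = fragment_size // 2  # assumed integer division
    if strand == "+":
        return five_prime + offset
    return five_prime - offset

# With -f 200, reads on opposite strands converge on the fragment center.
print(shifted_position(1000, "+", 200))
print(shifted_position(1200, "-", 200))
# With -f 0 (DNase-seq, ATAC-seq), positions are left unchanged.
print(shifted_position(1000, "+", 0))
```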

##### `-l`
Feature length for treatment data. Default is 600. Recommend 50 for TF ChIP-seq, 600 for DNase-seq and ATAC-seq, 
1000 for histone ChIP-seq.

##### `-fc`
Fragment size of control data.

##### `-t`
Threshold (standard deviations) to call candidate summits. Default is 4.0. Recommend 4.0 for broad peaks, 
8.0 for sharp peaks.

##### `-p_thr`
P value threshold. Default is 0.01. Consider relaxing it to 0.05 when there is no control data or when calling broad peaks. 
To resemble F-Seq1 results, specify `-p_thr False`, then filter out peaks whose signalValue 
(7th column in `.narrowPeak`) is below the estimated threshold.
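
The post-filtering step described above could look like the following sketch. It assumes single-value columns (the summit file, or a peak file written with `-standard_narrowpeak`); the function name `filter_by_signal`, the example lines, and the threshold value are hypothetical, and keeping values equal to the threshold is an assumption.

```python
def filter_by_signal(lines, min_signal):
    """Keep narrowPeak lines whose signalValue (column 7) is at least min_signal."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if float(fields[6]) >= min_signal:
            kept.append(line)
    return kept

# Hypothetical lines from a run with -p_thr False (pValue and qValue are -1).
lines = [
    "chr1\t1000\t1001\tfseq2_result_summit_1\t0\t.\t12.5\t-1\t-1\t0\n",
    "chr1\t5000\t5001\tfseq2_result_summit_2\t0\t.\t3.1\t-1\t-1\t0\n",
]
# Replace 4.0 with the threshold estimated for your own run.
kept = filter_by_signal(lines, min_signal=4.0)
print(len(kept))
```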

##### `-q_thr`
Q value (FDR) threshold. Default is not set, in which case `p_thr` is used. If set, only `q_thr` is used.

##### `-cpus`
Number of cores to use. Default is 1.

##### `-tp`
Threshold (standard deviations) to call peak regions. Default is 4.0.

##### `-sparse_data`
If this flag is on, the statistical test includes a 1k region for more accurate background estimation. This can be useful for single-cell data.

##### `-nfr_upper_limit`
Nucleosome free region upper limit. Default is 150. Used as window_size and min_distance when `-f 0`.

##### `-pe_fragment_size_range`
Effective only if `-pe` is on. Only PE fragments whose size is within the range are kept to call peaks. Default is False, 
without any selection. Useful for ATAC-seq data:  
(1) to call peaks on nucleosome free regions, specify: `0 150`  
(2) to call peaks on nucleosome centers, specify: `150 inf`  
(3) to call peaks on open chromatin regions, specify: `auto`  
> `auto` is a filter designed for ATAC-seq open chromatin peak calling, where we filter out fragments whose size is related to 
mono-, di-, tri-, and multi-nucleosomes. Size information is taken from the original ATAC-seq paper (Buenrostro et al.). 
You can design your own auto filter based on specific experiment data by specifying the `-nucleosome_size` parameter.

##### `-nucleosome_size`
Effective only if `-pe` is on and `-pe_fragment_size_range auto` is specified. Default is `180, 247, 315, 473, 558, 615`. These 
are the ATAC-seq PE fragment sizes related to mono-, di-, and tri-nucleosomes. Fragments whose size is within these ranges 
or above the largest bound (i.e. 615) are filtered out when calling peaks. Change these numbers to design your own auto filter.
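
The `auto` filter described above can be sketched as follows. The six default numbers are read as three (lower, upper) pairs, which is how the text describes them; the function name `is_filtered_out` is hypothetical, and treating the bounds as inclusive is an assumption rather than confirmed F-Seq2 behavior.

```python
DEFAULT_NUCLEOSOME_SIZE = [180, 247, 315, 473, 558, 615]

def is_filtered_out(fragment_size, bounds=DEFAULT_NUCLEOSOME_SIZE):
    """Return True if a PE fragment would be dropped by the auto filter."""
    # Pair consecutive bounds: (180, 247), (315, 473), (558, 615).
    ranges = list(zip(bounds[0::2], bounds[1::2]))
    # Anything above the largest bound is treated as multi-nucleosome.
    if fragment_size > bounds[-1]:
        return True
    # Assumed inclusive on both ends.
    return any(lo <= fragment_size <= hi for lo, hi in ranges)

for size in (100, 200, 280, 400, 500, 600, 700):
    print(size, "dropped" if is_filtered_out(size) else "kept")
```

Passing a different `bounds` list mirrors what `-nucleosome_size` does on the command line: it redefines which fragment sizes are considered nucleosome-associated.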

##### `-prior_pad_summit`
Prior knowledge about peak length which only padded into `NAME_summits.narrowPeak`. Default is 0. 
Useful for IDR analysis: in `callpeak_idr`, we set it to the minimum distance between summits. 

##### `-num_peaks`
Maximum number of peaks called. Default is not set. If set, overrides `p_thr` and `q_thr`.
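
The precedence between `-num_peaks`, `-q_thr`, and `-p_thr` described in this section can be summarized in a short sketch. This is not F-Seq2's code: the function name `select_summits` and the dictionary layout are hypothetical, ranking by p-value when `num_peaks` is set is an assumption, and so is the inclusive comparison against the thresholds.

```python
def select_summits(summits, p_thr=0.01, q_thr=None, num_peaks=None):
    """Apply the documented precedence: num_peaks, then q_thr, then p_thr."""
    if num_peaks is not None:
        # Overrides both thresholds; ranking by p-value is an assumption.
        return sorted(summits, key=lambda s: s["p"])[:num_peaks]
    if q_thr is not None:
        # When q_thr is set, p_thr is ignored.
        return [s for s in summits if s["q"] <= q_thr]
    return [s for s in summits if s["p"] <= p_thr]

# Hypothetical summits with raw (not -log10) p- and q-values.
summits = [
    {"name": "a", "p": 0.001, "q": 0.02},
    {"name": "b", "p": 0.008, "q": 0.09},
    {"name": "c", "p": 0.03, "q": 0.2},
]
print([s["name"] for s in select_summits(summits)])
print([s["name"] for s in select_summits(summits, q_thr=0.05)])
print([s["name"] for s in select_summits(summits, num_peaks=3)])
```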



## `callpeak_idr`
#### Command line input:
Most arguments are shared between `callpeak` and `callpeak_idr`. Here are the unique ones.  
> Note whether an argument is prefixed with `-` or `--`: `--` arguments come from the IDR package, `-` arguments from fseq2.

##### `-treatment_file_1`
Treatment file in bam or bed format as replicate 1.

##### `-treatment_file_2`
Treatment file in bam or bed format as replicate 2.

##### `-control_file_1`
Control file in bam or bed format, paired with the replicate 1 treatment file.

##### `-control_file_2`
Control file in bam or bed format, paired with the replicate 2 treatment file.

##### `-name_1`
Prefix for output files for replicate 1 (default=`fseq2_result_1`).

##### `-name_2`
Prefix for output files for replicate 2 (default=`fseq2_result_2`).

##### `-prior_pad_summit`
Prior knowledge about peak length which only padded into `NAME_summits.narrowPeak`. Default is min distance between summits.

##### `--idr_threshold`
Only return peaks with a global idr threshold below this value. Default: report all peaks.

##### `--soft_idr_threshold`
Report statistics for peaks with a global idr below this value but return all peaks with an idr below `--idr`. Default: 0.05.

##### `--plot`
Plot IDR results. Specify False to disable plotting. Default is to plot to `NAME_1_NAME_2.png`; another file name can be specified here. 
Note that this differs from the original IDR package, where it is only a flag.



## `idr`
#### Command line input and output:
See original [IDR documentation](https://github.com/nboley/idr#usage).  
> Note that all single-letter arguments are removed to avoid conflicts with fseq2, e.g. there is no `-s`; use `--samples` instead.



## Output files and formats
#### `NAME_summits.narrowPeak` 
BED6+4 format
1. chrom
2. chromStart 
3. chromEnd 
4. name - `NAME_summit_num`, num is sorted by either `pValue` or `chromAndStart`.
5. score - `int(10*-log10(pValue))`.
6. strand - `.`
7. signalValue - Average treatment signal value given window size.
8. pValue - Measurement of statistical significance (-log10). Use -1 if no pValue is assigned.
9. qValue - Measurement of statistical significance using false discovery rate (-log10). Use -1 if no qValue is assigned.
10. peak - 0 if no specification of `-prior_pad_summit`.
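
As a reading aid for the ten columns above, here is a minimal parser sketch for one line of the summit file. The `Summit` type, the function name `parse_summit_line`, and the example line are hypothetical; the only facts taken from this document are the column order, the `-log10` encoding of pValue and qValue, the `-1` placeholder, and the score formula.

```python
from typing import NamedTuple

class Summit(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    score: int
    strand: str
    signal_value: float
    neg_log10_p: float  # -1 if no pValue is assigned
    neg_log10_q: float  # -1 if no qValue is assigned
    peak: int

def parse_summit_line(line):
    """Parse one tab-separated line of NAME_summits.narrowPeak."""
    f = line.rstrip("\n").split("\t")
    return Summit(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
                  float(f[6]), float(f[7]), float(f[8]), int(f[9]))

# Hypothetical line; real values come from your own run.
line = "chr1\t1000\t1001\tfseq2_result_summit_1\t123\t.\t8.5\t12.34\t9.87\t0\n"
summit = parse_summit_line(line)

# Column 8 is already -log10(pValue), so the raw p-value is 10 ** -value.
p_value = 10 ** -summit.neg_log10_p if summit.neg_log10_p >= 0 else None
print(summit.name, summit.score, p_value)
```

Because column 8 already stores `-log10(pValue)`, the score in column 5 is simply `int(10 * column8)` for the same summit.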
  
#### `NAME_peaks.narrowPeak` 
Similar to the summit file, except that it can contain information for multiple summits. 
For columns 7-10, if a peak has multiple summits, each column is a comma-separated list. This behavior can be 
turned off with `-standard_narrowpeak` to output single-value columns.
1. chrom
2. chromStart 
3. chromEnd 
4. name - `NAME_peak_num`, num is sorted by either `pValue` or `chromAndStart`.
5. score - Max `int(10*-log10(pValue))` of all summits.
6. strand - `.`
7. signalValue
8. pValue
9. qValue
10. peak - Relative summit position(s) to peak start.
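
When a peak carries several summits, columns 7-10 hold parallel comma-separated lists. The sketch below picks the most significant summit from such a line, which is roughly the reduction that `-standard_narrowpeak` is described as performing. The function name `best_summit` and the example line are hypothetical, and choosing by the largest `-log10(pValue)` is an assumption based on the "max pValue" wording.

```python
def best_summit(peak_line):
    """Return the most significant summit of a multi-summit peak line."""
    f = peak_line.rstrip("\n").split("\t")
    signals = [float(x) for x in f[6].split(",")]
    p_values = [float(x) for x in f[7].split(",")]  # -log10 scale
    q_values = [float(x) for x in f[8].split(",")]  # -log10 scale
    offsets = [int(x) for x in f[9].split(",")]     # relative to peak start
    best = max(range(len(p_values)), key=p_values.__getitem__)
    return {
        "chrom": f[0],
        "summit": int(f[1]) + offsets[best],  # absolute summit position
        "signal_value": signals[best],
        "neg_log10_p": p_values[best],
        "neg_log10_q": q_values[best],
    }

# Hypothetical peak with two summits.
line = "chr1\t1000\t1600\tfseq2_result_peak_1\t150\t.\t5.0,9.0\t8.2,15.0\t6.0,12.1\t120,430\n"
top = best_summit(line)
print(top["summit"], top["neg_log10_p"])
```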

#### `NAME.bw` and `NAME.wig`
Reconstructed signal files which can be displayed directly in the UCSC Genome Browser. 
Recommend `bw` for efficient indexing in the browser.  

#### `NAME_sig.h5`
Reconstructed signal file without any smoothing. Signal is stored for each chrom in `np.array` and accessed by key `chrom`.   
For example:
```
>>> import h5py
>>> with h5py.File('NAME_sig.h5', mode='r') as sig_file:  # replace NAME with your -name prefix
...     signal = sig_file['chr1'][:]  # read in all signal on chr1
```

#### `NAME_1_NAME_2_conservative_IDR_thresholded_peaks.narrowPeak` and `NAME_1_NAME_2.png`
Generated by `fseq2 callpeak_idr`. Detailed format information is [here](https://github.com/nboley/idr#output).



## Examples

#### DNase-seq data
```
$ fseq2 callpeak treatment_file.bam -f 0 -l 600 -t 4.0 -v -cpus 10
```

#### ATAC-seq data
Call peaks on open chromatin regions of paired-end ATAC-seq data, without calling on nucleosomes:
```
$ fseq2 callpeak treatment_file.bam -f 0 -l 600 -t 4.0 -pe -nfr_upper_limit 150 -pe_fragment_size_range auto
```

#### ChIP-seq data
TF ChIP-seq data
```
$ fseq2 callpeak treatment_file.bed -control_file control_file.bed -l 50 -t 8.0 -sig_format bigwig -chrom_size_file /path/to/hg19.chrom.sizes -v -cpus 5 -o /path/to/fseq2_output_dir -name CTCF_results
```
IDR pipeline for TF ChIP-seq data
```
$ fseq2 callpeak_idr treatment_file_rep1.bam treatment_file_rep2.bam -control_file_1 control_file_rep1.bam -control_file_2 control_file_rep2.bam -l 50 -t 8.0 -chrom_size_file /path/to/hg19.chrom.sizes -v -cpus 3 -o /path/to/fseq2_output_dir
```
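
If you drive F-Seq2 from a Python pipeline, the command lines above can be assembled as argument lists. The sketch below only builds and prints the command; it uses flags that appear in this document, while the helper name `build_callpeak_command` and its keyword names are hypothetical. Running the result requires fseq2 and BEDTools to be installed.

```python
import shlex

def build_callpeak_command(treatment, control=None, feature_length=600,
                           threshold=4.0, cpus=1, name=None):
    """Assemble an `fseq2 callpeak` argument list from documented flags."""
    cmd = ["fseq2", "callpeak", treatment]
    if control is not None:
        cmd += ["-control_file", control]
    cmd += ["-l", str(feature_length), "-t", str(threshold), "-cpus", str(cpus)]
    if name is not None:
        cmd += ["-name", name]
    return cmd

cmd = build_callpeak_command("treatment_file.bed", control="control_file.bed",
                             feature_length=50, threshold=8.0, cpus=5,
                             name="CTCF_results")
print(shlex.join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems when file paths contain spaces.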



## Troubleshooting

##### 1. Install error on macOS Mojave:
```
fatal error: 'ios' file not found 
#include "ios"
```
Solution:  
add `CFLAGS='-stdlib=libc++'` in front of `pip install`
```
$ CFLAGS='-stdlib=libc++' pip install fseq2
```

##### 2. Memory error

Solution:  
try with fewer CPUs (a lower `-cpus` value)


##### 3. `NotImplementedError: "xx" does not appear to be installed or on the path, so this method is disabled.  Please install a more recent version of BEDTools and re-import to use this method.`

Solution:  
update or install bedtools >= 2.29.0  
Or  
copy the binaries in `bedtools2/bin/` to `/usr/local/bin/` or another directory on your `PATH` 
where commonly used UNIX tools are kept.


##### 4. Warnings when `-pe`
Most likely the bam file is not sorted by name.  
Solution:  
see [here](./INPUT_FORMAT.md)

##### 5. Too few peaks after multiple-testing correction
This may indicate poor data quality.  
Solution:  
use `-p_thr` instead of `-q_thr`
