fastQpick


- Name: fastQpick
- Version: 0.1.0
- home_page: None
- Summary: Fast and memory-efficient sampling of DNA-Seq or RNA-seq fastq data with or without replacement.
- upload_time: 2025-01-22 23:02:48
- maintainer: None
- docs_url: None
- author: None
- requires_python: >=3.7
- license: BSD 2-Clause License. Copyright (c) 2024, Pachter Lab. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- keywords: fastqpick, bioinformatics, statistics, rna-seq, dna-seq
- requirements: No requirements were recorded.

# fastQpick

Fast and memory-efficient sampling of DNA-seq or RNA-seq FASTQ data with or without replacement.

---

## Installation

### Install via PyPI
```bash
pip install fastQpick
```

### Install from Source Code

Using pip:
```bash
pip install git+https://github.com/pachterlab/fastQpick.git
```

Or clone the repository and build manually:
```bash
git clone https://github.com/pachterlab/fastQpick.git
cd fastQpick
python -m build
python -m pip install dist/fastQpick-x.x.x-py3-none-any.whl
```

---

## Usage

### Command-line Interface

Run `fastQpick` with a specified fraction and options:
```bash
fastQpick --fraction FRACTION [OPTIONS] FASTQ_FILE1 FASTQ_FILE2 ...
```

### Python API

Use `fastQpick` in your Python code:
```python
from fastQpick import fastQpick

fastQpick(
    input_files=['FASTQ_FILE1', 'FASTQ_FILE2'],  # one or more FASTQ paths
    fraction=FRACTION,  # placeholder: a number greater than 0
    # ...any further options (see Options below)
)
```

---

## Documentation

- **Command-line Help**: Use the following command to see all available options:
  ```bash
  fastQpick --help
  ```

- **Python API Help**: Use the `help` function to explore the API:
  ```python
  from fastQpick import fastQpick
  help(fastQpick)
  ```

### Options
- `input_files` (str, list, or tuple): Input FASTQ files, or directories containing FASTQ files. Required. Positional argument on the command line.
- `fraction` (int or float): Fraction of reads to sample; must be greater than 0. Any value equal to or greater than 1 automatically turns on the `-r` flag (sampling with replacement).
- `seed` (int or str): Random seed(s). Multiple seeds can be given, separated by commas. Default: 42
- `output_dir` (str): Output directory. Default: ./fastQpick_output
- `gzip_output` (bool): Whether to gzip the output. Default: False (uncompressed)
- `group_size` (int): Number of files in each group. List the files of each group sequentially, separated by spaces. E.g., I1, R1, R2 would have group_size=3. Default: 1 (unpaired)
- `replacement` (bool): Sample with replacement. Default: False (without replacement)
- `overwrite` (bool): Overwrite existing output files. Default: False
- `low_memory` (bool): Whether to use low-memory mode (uses ~5.5x less memory than the default, but adds marginal time to the preprocessing step that generates the data structure). Default: False
- `verbose` (bool): Whether to print progress information. Default: True
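
To make the `group_size` ordering concrete, here is a minimal sketch of how a flat, ordered file list could be split into consecutive groups. The helper `group_files` is hypothetical and not part of fastQpick; in particular, rejecting a file count that is not a multiple of `group_size` is an assumption made for this illustration.

```python
from typing import List, Sequence


def group_files(files: Sequence[str], group_size: int = 1) -> List[List[str]]:
    """Split a flat, ordered file list into consecutive groups (hypothetical helper)."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    # Assumption for this sketch: every group must be complete.
    if len(files) % group_size != 0:
        raise ValueError("number of files must be a multiple of group_size")
    return [list(files[i:i + group_size]) for i in range(0, len(files), group_size)]


# Two samples, each with an index file and two read files, so group_size=3.
files = [
    "s1_I1.fastq", "s1_R1.fastq", "s1_R2.fastq",
    "s2_I1.fastq", "s2_R1.fastq", "s2_R2.fastq",
]
groups = group_files(files, group_size=3)
for group in groups:
    print(group)
```

Files within one group are expected to describe the same reads in the same order, which is why they need to be listed next to each other.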

---

## Features

- Efficient sampling of large FASTQ files.
- Works with both single and paired-end sequencing data.
- Supports sampling with or without replacement.
- Command-line interface and Python API for seamless integration.
- Memory efficient: in low-memory mode, each input file needs only about as much memory as a list of small integers with one entry per read.
- Time efficient: passes through each FASTQ file only once and writes output in batches. Can process 600M reads in 10-15 minutes.
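
The memory and time points above can be illustrated with a small sketch. This is not fastQpick's actual implementation; it only shows the general idea of keeping one small counter per read and then streaming the file once. The function names `draw_counts` and `write_sample` are hypothetical, and the sketch assumes uncompressed four-line FASTQ records and a read count that is already known.

```python
import random
from array import array


def draw_counts(n_reads: int, fraction: float, replacement: bool = False,
                seed: int = 42) -> array:
    """Return one small integer per read: how many times that read is emitted."""
    if fraction <= 0:
        raise ValueError("fraction must be greater than 0")
    # Per the option description, fraction >= 1 implies sampling with replacement.
    replacement = replacement or fraction >= 1
    n_sample = round(n_reads * fraction)
    rng = random.Random(seed)
    counts = array("H", [0]) * n_reads  # unsigned 16-bit counters, one per read
    if replacement:
        for _ in range(n_sample):
            counts[rng.randrange(n_reads)] += 1
    else:
        for i in rng.sample(range(n_reads), n_sample):
            counts[i] = 1
    return counts


def write_sample(in_path: str, out_path: str, counts: array,
                 batch_size: int = 10_000) -> int:
    """Stream the FASTQ once, writing each 4-line record counts[i] times."""
    written = 0
    batch = []
    with open(in_path) as src, open(out_path, "w") as dst:
        for i in range(len(counts)):
            record = [src.readline() for _ in range(4)]
            if counts[i]:
                batch.extend(record * counts[i])
                written += counts[i]
            # Flush in batches so the buffer never grows with the file size.
            if len(batch) >= 4 * batch_size:
                dst.writelines(batch)
                batch.clear()
        dst.writelines(batch)
    return written
```

In a design like this, reusing the same counts for every file in a group would keep paired reads in sync, since record `i` is emitted the same number of times in each file.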

## Low memory mode vs. standard
With fraction=1 (i.e., the number of reads to sample equals the number of reads in the FASTQ), low-memory mode compared with standard mode:
- Adds ~1-3 seconds per million reads per group_size (e.g., 500M reads would take 30 minutes instead of 20-25 minutes).
- Saves ~40MB of RAM per million reads (e.g., 500M reads would take 3.75GB of RAM instead of 20.6GB).
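
As a rough planning aid, the 500M-read figures quoted above can be turned into per-million-read estimates. The sketch below assumes memory scales linearly with read count and that 1 GB = 1000 MB; both are assumptions of this example, not statements about fastQpick, and the function names are hypothetical.

```python
# Figures quoted above for 500M reads at fraction=1.
READS_MILLIONS = 500
STANDARD_GB = 20.6
LOW_MEMORY_GB = 3.75


def per_million_mb(total_gb: float, reads_millions: float) -> float:
    """Convert a total in GB into MB per million reads (1 GB = 1000 MB)."""
    return total_gb * 1000 / reads_millions


standard_mb = per_million_mb(STANDARD_GB, READS_MILLIONS)  # about 41.2 MB
low_mb = per_million_mb(LOW_MEMORY_GB, READS_MILLIONS)     # about 7.5 MB
ratio = STANDARD_GB / LOW_MEMORY_GB                        # about 5.5x


def estimate_gb(reads_millions: float, low_memory: bool = False) -> float:
    """Linearly extrapolate RAM use in GB for a given number of reads (in millions)."""
    mb = low_mb if low_memory else standard_mb
    return reads_millions * mb / 1000


print(f"standard: {standard_mb:.1f} MB per million reads")
print(f"low-memory: {low_mb:.1f} MB per million reads")
print(f"ratio: {ratio:.2f}x")
print(f"100M reads, low-memory: {estimate_gb(100, low_memory=True):.2f} GB")
```

The ratio of the two quoted totals matches the ~5.5x reduction listed for `low_memory` in the options.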

---

## Examples

### 1. Sample 10% of reads with replacement from a FASTQ file:

**Command-line**
```bash
fastQpick --fraction 0.1 -r input.fastq
```

**Python**
```python
from fastQpick import fastQpick

fastQpick(
    input_files='input.fastq',
    fraction=0.1,
    replacement=True
)
```

### 2. Sample 100% of reads with replacement from multiple paired FASTQ files (R1, R2) across three seeds (i.e., bootstrapping):

**Command-line**
```bash
fastQpick --fraction 1 -s 42,43,44 -r -g 2 input1_R1.fastq input1_R2.fastq
```

**Python**
```python
from fastQpick import fastQpick

fastQpick(
    input_files=['input1_R1.fastq', 'input1_R2.fastq'],
    fraction=1,
    seed="42,43,44",
    replacement=True,
    group_size=2,
)
```
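
For intuition about what the three seeds in this example produce, here is a standalone sketch of bootstrapping: each seed yields its own resample of read indices, drawn with replacement and equal in size to the original. The helpers `parse_seeds` and `bootstrap_indices` are hypothetical and only mirror the comma-separated seed format described under Options; they do not call fastQpick.

```python
import random
from typing import List, Union


def parse_seeds(seed: Union[int, str]) -> List[int]:
    """Accept a single int or a comma-separated string such as "42,43,44"."""
    if isinstance(seed, int):
        return [seed]
    return [int(part) for part in seed.split(",") if part.strip()]


def bootstrap_indices(n_reads: int, seed: int) -> List[int]:
    """Draw n_reads read indices with replacement, reproducibly for a given seed."""
    rng = random.Random(seed)
    return sorted(rng.randrange(n_reads) for _ in range(n_reads))


# One bootstrap replicate per seed, here for a toy file of 8 reads.
replicates = {s: bootstrap_indices(8, s) for s in parse_seeds("42,43,44")}
for s, indices in replicates.items():
    print(s, indices)
```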
---

## License

fastQpick is licensed under the 2-clause BSD license. See the [LICENSE](LICENSE) file for details.

---

## Contributing

We welcome contributions! Please see the [CONTRIBUTING.md](CONTRIBUTING.md) file for guidelines on how to get involved.


            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "fastQpick",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "Joseph Rich <josephrich98@gmail.com>",
    "keywords": "fastQpick, bioinformatics, statistics, RNA-seq, DNA-seq",
    "author": null,
    "author_email": "Joseph Rich <josephrich98@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/39/b0/5089ef5b978dcecf29f9994b3907d4b24a2efc2f93d549832bf67bf9938e/fastqpick-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "BSD 2-Clause License\n        \n        Copyright (c) 2024, Pachter Lab\n        \n        Redistribution and use in source and binary forms, with or without\n        modification, are permitted provided that the following conditions are met:\n        \n        1. Redistributions of source code must retain the above copyright notice, this\n           list of conditions and the following disclaimer.\n        \n        2. Redistributions in binary form must reproduce the above copyright notice,\n           this list of conditions and the following disclaimer in the documentation\n           and/or other materials provided with the distribution.\n        \n        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\"\n        AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE\n        IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE\n        DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE\n        FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL\n        DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR\n        SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER\n        CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,\n        OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE\n        OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.\n        ",
    "summary": "Fast and memory-efficient sampling of DNA-Seq or RNA-seq fastq data with or without replacement.",
    "version": "0.1.0",
    "project_urls": {
        "Homepage": "https://github.com/pachterlab/fastQpick"
    },
    "split_keywords": [
        "fastqpick",
        " bioinformatics",
        " statistics",
        " rna-seq",
        " dna-seq"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "8d60f06a8ffa90c880a0d148e4b34b26f0068bcd7a16e283eabe5170a23843c8",
                "md5": "ee55d79034e4d68e052c53f93224e38a",
                "sha256": "554ae856cb2df4229741946f47cee15cae27ab167473535820936d9ac8dced1c"
            },
            "downloads": -1,
            "filename": "fastQpick-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ee55d79034e4d68e052c53f93224e38a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 10384,
            "upload_time": "2025-01-22T23:02:46",
            "upload_time_iso_8601": "2025-01-22T23:02:46.440350Z",
            "url": "https://files.pythonhosted.org/packages/8d/60/f06a8ffa90c880a0d148e4b34b26f0068bcd7a16e283eabe5170a23843c8/fastQpick-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "39b05089ef5b978dcecf29f9994b3907d4b24a2efc2f93d549832bf67bf9938e",
                "md5": "e44f6fa04bc4d45ff42736fdc26847eb",
                "sha256": "7d0d82c61952ef0a15f2cc3770b1b525ca6879551bfea17e8a7f725cb3d28156"
            },
            "downloads": -1,
            "filename": "fastqpick-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "e44f6fa04bc4d45ff42736fdc26847eb",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 12720,
            "upload_time": "2025-01-22T23:02:48",
            "upload_time_iso_8601": "2025-01-22T23:02:48.386236Z",
            "url": "https://files.pythonhosted.org/packages/39/b0/5089ef5b978dcecf29f9994b3907d4b24a2efc2f93d549832bf67bf9938e/fastqpick-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-22 23:02:48",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "pachterlab",
    "github_project": "fastQpick",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "fastqpick"
}
        