| Field | Value |
|---|---|
| Name | parasplit |
| Version | 1.1.3 |
| home_page | None |
| Summary | An Hi-C tool for cutting sequences using specified enzymes |
| upload_time | 2025-10-27 10:46:52 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | None |
| license | AGPLv3 |
| keywords | hi-c, hic, bioinformatics, cutsite |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
<!--
SPDX-FileCopyrightText: 2024 Samir Bertache
SPDX-FileCopyrightText: 2025 2024 Samir Bertache
SPDX-License-Identifier: AGPL-3.0-or-later
SPDX-License-Identifier: CC0-1.0
-->
# PARASPLIT
## Overview
Parasplit is a Python tool that processes paired-end Hi-C FASTQ files by fragmenting DNA sequences at the sites of the specified restriction enzymes. It handles large datasets by using pigz for multi-threaded decompression and compression.
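As a rough illustration of the pigz-based input side, here is a minimal sketch (not Parasplit's actual `Read` module) of streaming a gzipped FASTQ pair through `pigz -dc` and reading R1 and R2 in lockstep. The helper names `open_fastq_gz`, `fastq_records` and `read_pairs` are hypothetical; the sketch assumes `pigz` is on `PATH` and falls back to Python's `gzip` module otherwise.

```python
import gzip
import shutil
import subprocess
from contextlib import contextmanager
from typing import IO, Iterator, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, "+" line, qualities


@contextmanager
def open_fastq_gz(path: str, threads: int = 4) -> Iterator[IO[str]]:
    """Yield a text stream of a gzipped FASTQ, decompressed by pigz if available."""
    if shutil.which("pigz"):
        # pigz -d (decompress) -c (write to stdout) -p N (number of threads)
        proc = subprocess.Popen(
            ["pigz", "-dc", "-p", str(threads), path],
            stdout=subprocess.PIPE,
            text=True,
        )
        try:
            yield proc.stdout
        finally:
            proc.stdout.close()
            proc.wait()
    else:
        # Fallback: single-threaded decompression from the standard library.
        with gzip.open(path, "rt") as handle:
            yield handle


def fastq_records(handle: IO[str]) -> Iterator[Record]:
    """Yield FASTQ records four lines at a time."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq, plus, qual = (handle.readline().rstrip("\n") for _ in range(3))
        yield header.rstrip("\n"), seq, plus, qual


def read_pairs(r1: str, r2: str, threads: int = 4) -> Iterator[Tuple[Record, Record]]:
    """Read R1 and R2 in lockstep so each forward read meets its mate."""
    with open_fastq_gz(r1, threads) as f1, open_fastq_gz(r2, threads) as f2:
        yield from zip(fastq_records(f1), fastq_records(f2))
```

Reading both files in lockstep relies on R1 and R2 listing mates in the same order, which is the usual convention for paired-end FASTQ.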
## Features
- **Find and Utilize Restriction Enzyme Sites:** Automatically identifies ligation sites from provided enzyme names and generates regex patterns to locate these sites in sequences.
- **Fragmentation:** Splits sequences at restriction enzyme sites and keeps only fragments at least `--seed_size` bases long.
- **Multi-threading:** Processes large datasets efficiently by using multiple threads for decompression and compression.
- **Custom Modes:** Supports two modes for pairing fragments, `fr` and `all`, selected with `--mode`.
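The ligation-site step can be sketched as follows. In a Hi-C library, a cut site with a 5' overhang is filled in and blunt-ligated, so the junction contains the overhang twice (DpnII `^GATC` gives `GATCGATC`). This is a minimal sketch under assumptions: Parasplit reads enzyme data from Biopython, whereas here the hypothetical `ENZYMES` table hardcodes each recognition site, top-strand cut offset and 5' overhang length, and only `N` is translated as an ambiguous base.

```python
import re
from typing import List

# Hypothetical lookup table: (recognition site, top-strand cut offset, 5' overhang length).
# Parasplit itself obtains this information from Biopython's restriction database.
ENZYMES = {
    "DpnII": ("GATC", 0, 4),    # ^GATC
    "EcoRI": ("GAATTC", 1, 4),  # G^AATTC
    "HinfI": ("GANTC", 1, 3),   # G^ANTC
}

# Minimal IUPAC translation; a full version would cover R, Y, W, S, etc.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def ligation_site(enzyme: str) -> str:
    """Junction produced by fill-in and blunt ligation of a 5' overhang."""
    site, cut, overhang = ENZYMES[enzyme]
    # Left fragment ends after the filled-in overhang; right fragment starts at the cut.
    return site[: cut + overhang] + site[cut:]


def ligation_regex(enzymes: List[str]) -> re.Pattern:
    """Compile one pattern that matches the ligation site of any listed enzyme."""
    alternatives = [
        "".join(IUPAC[base] for base in ligation_site(name)) for name in enzymes
    ]
    return re.compile("|".join(alternatives))
```

For example, `ligation_site("HinfI")` returns `"GANTANTC"`, and `ligation_regex(["EcoRI", "HinfI"]).pattern` is `"GAATTAATTC|GA[ACGT]TA[ACGT]TC"`.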
## Installation
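Parasplit requires Python 3 and the `pigz` executable. Install both as follows (the `apt-get` command applies to Debian and Ubuntu; use your system's package manager elsewhere):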
```bash
sudo apt-get install pigz
pip install parasplit
```
## Usage
Parasplit is run from the command line; the arguments below customize its behavior.
### Command-Line Arguments
- `--source_forward` (str): Input file path for forward reads. Default is `../data/R1.fq.gz`.
- `--source_reverse` (str): Input file path for reverse reads. Default is `../data/R2.fq.gz`.
- `--output_forward` (str): Output file path for processed forward reads. Default is `../data/output_forward.fq.gz`.
- `--output_reverse` (str): Output file path for processed reverse reads. Default is `../data/output_reverse.fq.gz`.
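- `--list_enzyme` (str): Comma-separated list of restriction enzyme names (e.g. `EcoRI,HinfI`). The default is the placeholder string "No restriction enzyme found.", so in practice this argument should always be supplied.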
- `--mode` (str): Mode of pairing fragments. Options are `all` or `fr`. Default is `fr`.
- `--seed_size` (int): Minimum length of fragments to keep. Default is 20.
- `--num_threads` (int): Number of threads to use for processing. Default is 8.
- `--borderless` (flag): Do not conserve the ligation sites in the output fragments.
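To make `--seed_size` and `--borderless` concrete, here is a minimal sketch of splitting a single read. `fragment` is a hypothetical helper, and its boundary rule is an assumption rather than Parasplit's confirmed behavior: by default each fragment keeps the ligation site on the side where it was cut, and with `borderless=True` the site is removed. Check `Frag.py` for the exact rule Parasplit applies.

```python
import re
from typing import List


def fragment(seq: str, pattern: re.Pattern, seed_size: int = 20,
             borderless: bool = False) -> List[str]:
    """Split a read at every ligation-site match and drop short fragments.

    Assumption: by default the ligation site stays attached to both
    neighbouring fragments; with borderless=True it is dropped.
    """
    fragments = []
    start = 0
    for match in pattern.finditer(seq):
        # Left fragment ends after the site (default) or before it (borderless).
        end = match.start() if borderless else match.end()
        fragments.append(seq[start:end])
        # Next fragment starts at the site (default) or just after it (borderless).
        start = match.end() if borderless else match.start()
    fragments.append(seq[start:])
    # seed_size: keep only fragments at least this many bases long.
    return [f for f in fragments if len(f) >= seed_size]
```

With the pattern `GATCGATC` and the default `seed_size` of 20, a 25-base flank shorter than the threshold on one side would be discarded while the other flank is kept, which is how short chimeric pieces are filtered out.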
### Example Command
```bash
parasplit \
  --source_forward="../data/R1.fq.gz" \
  --source_reverse="../data/R2.fq.gz" \
  --output_forward="../data/output_forward.fq.gz" \
  --output_reverse="../data/output_reverse.fq.gz" \
  --list_enzyme=EcoRI,HinfI \
  --mode=all \
  --seed_size=20 \
  --num_threads=8
```
## Main Script
- **Pretreatment:** Retrieves the restriction sites from the Biopython restriction enzyme database and allocates resources to the different processes.
- **Read:** Decompresses and reads the two FASTQ files simultaneously, sending reads to a multiprocessing queue.
- **Frag:** Takes sequences from that queue, splits them into fragments at the restriction enzyme sites, builds fragment pairs, and sends them to an output multiprocessing queue.
- **WriteAndControl:** Streams data from the output queue to the output files, compressing it in parallel.
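The README does not define the two pairing modes used by **Frag**. Reading the names literally, one plausible interpretation (an assumption, not confirmed by the source) is that `fr` pairs every fragment of the forward read with every fragment of the reverse read, while `all` pairs every two fragments drawn from both reads. `make_pairs` is a hypothetical helper illustrating that reading:

```python
from itertools import combinations, product
from typing import List, Tuple


def make_pairs(fwd: List[str], rev: List[str], mode: str = "fr") -> List[Tuple[str, str]]:
    """Pair fragments from one read pair under an assumed reading of --mode."""
    if mode == "fr":
        # Forward-versus-reverse: one fragment from each read.
        return list(product(fwd, rev))
    if mode == "all":
        # Every 2-combination of all fragments, including same-read pairs.
        return list(combinations(fwd + rev, 2))
    raise ValueError(f"unknown mode: {mode!r}")
```

With forward fragments `["a", "b"]` and reverse fragments `["c"]`, `fr` yields `[("a", "c"), ("b", "c")]` and `all` yields `[("a", "b"), ("a", "c"), ("b", "c")]`, so `all` can produce considerably more output pairs per read.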
## Project architecture

*Architecture diagram. License: [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)*
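The Read, Frag and WriteAndControl stages form a producer/consumer pipeline connected by queues. The sketch below shows that shape with hypothetical names (`run_pipeline`, `process`). For brevity it uses threads and `queue.Queue`; Parasplit itself uses multiprocessing queues, and because of CPython's global interpreter lock, CPU-bound fragmentation only runs truly in parallel with `multiprocessing.Process` and `multiprocessing.Queue`, onto which the same structure maps.

```python
import queue
import threading
from typing import Callable, Iterable, List

SENTINEL = None  # stop signal; `process` must therefore never emit None


def run_pipeline(records: Iterable, process: Callable[[object], Iterable],
                 num_workers: int = 2) -> List:
    """Read -> Frag workers -> writer, connected by bounded queues."""
    in_q: queue.Queue = queue.Queue(maxsize=1000)
    out_q: queue.Queue = queue.Queue(maxsize=1000)
    written: List = []

    def reader() -> None:
        for record in records:
            in_q.put(record)
        for _ in range(num_workers):
            in_q.put(SENTINEL)  # one stop signal per worker

    def worker() -> None:
        while (record := in_q.get()) is not SENTINEL:
            for item in process(record):
                out_q.put(item)
        out_q.put(SENTINEL)  # tell the writer this worker is done

    def writer() -> None:
        finished = 0
        while finished < num_workers:
            item = out_q.get()
            if item is SENTINEL:
                finished += 1
            else:
                # A real writer would stream to a pigz-compressed output file here.
                written.append(item)

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    threads += [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return written
```

The bounded queues keep memory use flat when one stage is slower than another, and the writer counts one sentinel per worker so it only stops after every worker has finished. With several workers the output order is not guaranteed to match the input order.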
## Dependencies
- pigz (system executable)
- Biopython (restriction enzyme database used by Pretreatment)
## Project structure
    ├── myproject/
    │ ├── __init__.py
    │ ├── main.py
    │ ├── Frag.py
    │ ├── Read.py
    │ ├── Pretreatment.py
    │ └── WriteAndControl.py
    ├── pyproject.toml
    ├── requirements-dev.txt
    ├── docs/
    │ └── requirements.txt
    ├── test/
    │ ├── __init__.py
    │ ├── test_main.py
    │ ├── input_data/
    │ │ ├── R1.fq.gz
    │ │ └── R2.fq.gz
    │ └── output_data/
    │ ├── output_ref_R1.fq.gz
    │ ├── output_ref_R2.fq.gz
    │ ├── output_ref_all_R1.fq.gz
    │ └── output_ref_all_R2.fq.gz
    └── README.md
## Contact
For questions or issues, please contact [samir.bertache.djenadi@gmail.com](mailto:samir.bertache.djenadi@gmail.com).
---
This README provides an overview of Parasplit's functionality, usage instructions, and implementation details. For more detailed information, refer to the source code and docstrings.
Raw data
{
"_id": null,
"home_page": null,
"name": "parasplit",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "Hi-C, HiC, bioinformatics, cutsite",
"author": null,
"author_email": "Bertache Djenadi <samir.bertache.djenadi@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/47/83/fe066d3ba7a07bdedaba4ea41a4dc908691b603f4bcbb567155b39985571/parasplit-1.1.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "AGPLv3",
"summary": "An Hi-C tool for cutting sequences using specified enzymes",
"version": "1.1.3",
"project_urls": null,
"split_keywords": [
"hi-c",
" hic",
" bioinformatics",
" cutsite"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "5f8e7f80b6676d89003dab572316244f1e417b78beb6b598ae2ab223b3e1a976",
"md5": "1d63c884fb48052bc358a2c5be4d5279",
"sha256": "cb18a51771ac06b7c73fd1ea27be62c19e10134c56e88491ca293e173cc20cfd"
},
"downloads": -1,
"filename": "parasplit-1.1.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1d63c884fb48052bc358a2c5be4d5279",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 16654,
"upload_time": "2025-10-27T10:46:50",
"upload_time_iso_8601": "2025-10-27T10:46:50.575352Z",
"url": "https://files.pythonhosted.org/packages/5f/8e/7f80b6676d89003dab572316244f1e417b78beb6b598ae2ab223b3e1a976/parasplit-1.1.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "4783fe066d3ba7a07bdedaba4ea41a4dc908691b603f4bcbb567155b39985571",
"md5": "10757c232779f5b1d35e6a101e46ec2c",
"sha256": "542e517f7bd3041c7a6027d041632b166a6b7b76a09d6530981f3a4667840d7d"
},
"downloads": -1,
"filename": "parasplit-1.1.3.tar.gz",
"has_sig": false,
"md5_digest": "10757c232779f5b1d35e6a101e46ec2c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 14509,
"upload_time": "2025-10-27T10:46:52",
"upload_time_iso_8601": "2025-10-27T10:46:52.410599Z",
"url": "https://files.pythonhosted.org/packages/47/83/fe066d3ba7a07bdedaba4ea41a4dc908691b603f4bcbb567155b39985571/parasplit-1.1.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-27 10:46:52",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "parasplit"
}