ArraySplitter

Name	ArraySplitter
Version	1.2.3
home_page	https://github.com/aglabx/ArraySplitter
Summary	De Novo Decomposition of Satellite DNA Arrays into Monomers within Telomere-to-Telomere Assemblies
upload_time	2024-03-06 18:53:17
author	Aleksey Komissarov
requires_python	>=3.6
license	BSD

# ArraySplitter: De Novo Decomposition of Satellite DNA Arrays

Decomposes satellite DNA arrays into monomers within telomere-to-telomere (T2T) assemblies. Ideal for analyzing centromeric and pericentromeric regions at the monomer level.

**Status:** In development. Optimized for arrays at the 100 Kb scale; longer arrays will work but take significantly longer to process.

**Update:** As of version 1.1.6, ArraySplitter successfully decomposes arrays at the megabase scale. The largest arrays take around 5 minutes to process. Fortunately, there are only 41 arrays larger than 1 Mb in the CHM13v2.0 assembly. Processing is currently single-threaded; I plan to add parallel processing to speed it up significantly.

**Update:** Monomer borders still need some polishing; I am working on it.

**Update:** To test ArraySplitter, I used the CHM13v2.0 assembly. Decomposing all arrays longer than 1 Kb (13K arrays) takes around 3 hours.

## Installation

**Prerequisites**

* Python 3.6 or later

**Installation with pip:**

```bash
pip install arraysplitter
```

## Usage

**Basic Example**

```bash
time arraysplitter -i chr1.arrays.fa -o chr1.arrays
```

This creates a FASTA file in which the monomers of each array are separated by spaces. The leading `time` is optional; it only reports how long the run took.

**Explanation**

* **`-i chr1.arrays.fa`:** FASTA file of satDNA arrays.
* **`-o chr1.arrays`:** Prefix for the output FASTA containing decomposed monomers (separated by spaces).
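
As an illustration of how the space-separated output could be consumed downstream, here is a minimal Python sketch. It assumes the output is ordinary FASTA (a `>` header line followed by sequence lines) with monomers separated by spaces, as described above. The function name `read_monomer_fasta` and the example records are hypothetical and not part of ArraySplitter.

```python
from typing import Dict, Iterable, List


def read_monomer_fasta(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each FASTA header to its list of space-separated monomers."""
    arrays: Dict[str, List[str]] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            arrays[header] = []
        elif header is not None:
            # str.split() with no argument splits on any run of whitespace,
            # so sequences wrapped over several lines are handled too.
            arrays[header].extend(line.split())
    return arrays


# Hypothetical records, made up for illustration only.
example = [
    ">array_1",
    "ATCCCATTCC GATTGGAGTG TCCTTT",
    ">array_2",
    "TGCTG ATTGAATGGA",
]
arrays = read_monomer_fasta(example)
for name, monomers in arrays.items():
    print(name, len(monomers), [len(m) for m in monomers])
```

To read a real result, pass an open file handle for whichever file ArraySplitter wrote for your prefix instead of the `example` list.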

**All Options**

```bash
arraysplitter --help
```

## Rotating monomers to start with the same sequence

We found that different arrays of the same repeat family can be decomposed slightly differently. To make them comparable, ArraySplitter can rotate monomers so that they start with the same sequence.

```bash
arraysplitter_rotate -i arrays.fa -o arrays.norm.fa
```

You can also specify the sequence to start with:

```bash
arraysplitter_rotate -i arrays.fa -o arrays.norm.fa -s TTTC
```

**Explanation**

* **`-i arrays.fa`:** FASTA file of monomers.
* **`-o arrays.norm.fa`:** Output FASTA file with rotated monomers.
* **`-s TTTC`:** Optional sequence that the rotated monomers should start with.
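
To make the idea of rotation concrete, here is a small Python sketch of a cyclic rotation that brings a chosen start sequence to the front of a monomer. It is a conceptual illustration only: how `arraysplitter_rotate` picks the rotation point internally is not documented here, and the function `rotate_to_start` and the example monomers are hypothetical.

```python
def rotate_to_start(monomer: str, start: str) -> str:
    """Cyclically rotate `monomer` so that it begins with `start`.

    The monomer is returned unchanged if `start` does not occur in it,
    even across the wrap-around point.
    """
    if not monomer or not start or len(start) > len(monomer):
        return monomer
    # Doubling the string exposes matches that span the end/start junction.
    pos = (monomer + monomer).find(start)
    if pos == -1:
        return monomer
    return monomer[pos:] + monomer[:pos]


# Two made-up monomers that are rotations of one another.
print(rotate_to_start("CCATTTCGGA", "TTTC"))
print(rotate_to_start("TCGGACCATT", "TTTC"))
```

Both calls print the same string, which is the point of the normalization: monomers that differ only in where the array was cut become directly comparable.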

## Extracting and counting monomers

Finally, you can extract and count monomers from the arrays:

```bash
arraysplitter_extract -i arrays.norm.fa -o arrays.norm
```

It will create a file listing each monomer's frequency, length, and sequence, ordered by frequency. For example, for the `arrays.norm.fa` file above, the output will look like this:

```text
514 10 ATCCCATTCC
514 10 GATTGGAGTG
514 6 TCCTTT
514 5 TGCTG
514 10 ATTGAATGGA
514 10 ATGCAATGGA
514 5 TCCTA
```
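
The counting step can be pictured with a short Python sketch. It assumes, as the sample rows suggest, that the first column is the monomer frequency and the second its length (the second column matches the length of each sequence shown). The function `count_monomers` and the input string are hypothetical; this is not the implementation used by `arraysplitter_extract`.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_monomers(monomers: Iterable[str]) -> List[Tuple[int, int, str]]:
    """Return (frequency, length, sequence) rows, most frequent first."""
    counts = Counter(monomers)
    # most_common() sorts by descending count; ties keep first-seen order.
    return [(freq, len(seq), seq) for seq, freq in counts.most_common()]


# Made-up monomers standing in for one decomposed, rotated array.
monomers = "ATCCCATTCC GATTGGAGTG TCCTTT ATCCCATTCC TCCTTT ATCCCATTCC".split()
rows = count_monomers(monomers)
for freq, length, seq in rows:
    print(f"{freq}\t{length}\t{seq}")
```

Counting exact strings like this only groups identical monomers, which is why rotating them to a common start beforehand matters.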

## Contact

For questions or support: ad3002@gmail.com

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/aglabx/ArraySplitter",
    "name": "ArraySplitter",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "",
    "author": "Aleksey Komissarov",
    "author_email": "ad3002@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/e6/78/3f734a9f91765c6d5e02f2eb6a4f83e3096438de9b8f8a2c2eeb2f157159/ArraySplitter-1.2.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "BSD",
    "summary": "De Novo Decomposition of Satellite DNA Arrays into Monomers within Telomere-to-Telomere Assemblies",
    "version": "1.2.3",
    "project_urls": {
        "Homepage": "https://github.com/aglabx/ArraySplitter"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e6783f734a9f91765c6d5e02f2eb6a4f83e3096438de9b8f8a2c2eeb2f157159",
                "md5": "72dc54b28b29bd173106b7ccac2e0b56",
                "sha256": "1887ba979c8800e0d8b842e0ad425aeafd520084e5b22a3e715d90bd451cd36f"
            },
            "downloads": -1,
            "filename": "ArraySplitter-1.2.3.tar.gz",
            "has_sig": false,
            "md5_digest": "72dc54b28b29bd173106b7ccac2e0b56",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 18286,
            "upload_time": "2024-03-06T18:53:17",
            "upload_time_iso_8601": "2024-03-06T18:53:17.239462Z",
            "url": "https://files.pythonhosted.org/packages/e6/78/3f734a9f91765c6d5e02f2eb6a4f83e3096438de9b8f8a2c2eeb2f157159/ArraySplitter-1.2.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-06 18:53:17",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "aglabx",
    "github_project": "ArraySplitter",
    "github_not_found": true,
    "lcname": "arraysplitter"
}
