cogent3-h5seqs


Name: cogent3-h5seqs
Version: 0.5.0
Summary: HDF5 storage driver for cogent3 sequence collections
Upload time: 2025-07-10 06:30:06
Requires Python: >=3.10
License: BSD 3-Clause License

Copyright (c) 2025, cogent3

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Keywords: bioinformatics, biology, evolution, genomics, phylogeny, statistics
[![CI](https://github.com/cogent3/cogent3-h5seqs/actions/workflows/ci.yml/badge.svg)](https://github.com/cogent3/cogent3-h5seqs/actions/workflows/ci.yml)
[![Coverage Status](https://coveralls.io/repos/github/cogent3/cogent3-h5seqs/badge.svg?branch=develop)](https://coveralls.io/github/cogent3/cogent3-h5seqs?branch=develop)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)

# cogent3-h5seqs: an HDF5 storage driver for cogent3 sequence collections

`cogent3-h5seqs` is a sequence storage plug-in for [cogent3](https://cogent3.org). It uses HDF5 as the storage format for biological sequences and supports both unaligned sequence collections and alignments. Storage can be in memory (the default) or on disk, and sequences are compressed using the BLOSC2 compression engine.

The advantage of HDF5 is that, once sequences have been converted from text-based primary formats into NumPy arrays, loading and manipulating sequence data is fast and very memory efficient.
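
To illustrate what converting sequence text into an array of small integers means, here is a minimal sketch that uses only the Python standard library rather than NumPy. The alphabet order, `ALPHABET`, `encode()` and `decode()` are assumptions made for illustration; they are not the mapping or API that `cogent3` or `cogent3-h5seqs` actually use.

```python
# Hypothetical alphabet order, chosen for illustration only.
ALPHABET = "TCAG-"
_ALLOWED = set(ALPHABET.encode("ascii"))
# Translation table mapping each alphabet character to its index.
_ENCODE = bytes.maketrans(ALPHABET.encode("ascii"), bytes(range(len(ALPHABET))))


def encode(seq: str) -> bytes:
    """Map each character to its index in ALPHABET, one byte per position."""
    raw = seq.upper().encode("ascii")
    unknown = set(raw) - _ALLOWED
    if unknown:
        raise ValueError(f"characters not in alphabet: {bytes(sorted(unknown))!r}")
    return raw.translate(_ENCODE)


def decode(indices: bytes) -> str:
    """Invert encode(): turn indices back into sequence text."""
    return "".join(ALPHABET[i] for i in indices)
```

Each position takes a single byte and holds a value below `len(ALPHABET)`, which is the kind of compact, regular data that array storage and compression handle well.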

> **Note**
> This storage works only with the new-type `cogent3` `Alignment` and `SequenceCollection` classes.

## Installation

```
pip install cogent3-h5seqs
```

## Usage

### Making `cogent3-h5seqs` the default storage

Using `cogent3.set_storage_defaults()`, you can set `cogent3-h5seqs` as the default storage. This means whenever a sequence collection is loaded from disk or created in memory, it will use the storage within this package.

The following statement makes `cogent3-h5seqs` the default for both unaligned and aligned sequence collections.

```python
import cogent3

cogent3.set_storage_defaults(unaligned_seqs="h5seqs_unaligned",
                             aligned_seqs="h5seqs_aligned")
```

You can undo this setting with:

```python
cogent3.set_storage_defaults(unaligned_seqs=None, aligned_seqs=None)
```
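
If you only want `cogent3-h5seqs` as the default for part of a script, the two calls above can be paired in a context manager. This is a sketch under stated assumptions: `h5seqs_defaults()` is a hypothetical helper, not part of `cogent3` or `cogent3-h5seqs`, and it takes the setter as an argument so that it can be exercised without either package installed. On exit it resets both defaults to `None`; it does not restore any other default that was set beforehand.

```python
from contextlib import contextmanager


@contextmanager
def h5seqs_defaults(set_defaults):
    """Temporarily make cogent3-h5seqs the default storage.

    set_defaults is expected to be cogent3.set_storage_defaults.
    """
    set_defaults(unaligned_seqs="h5seqs_unaligned", aligned_seqs="h5seqs_aligned")
    try:
        yield
    finally:
        # Same call as the "undo" statement above.
        set_defaults(unaligned_seqs=None, aligned_seqs=None)
```

In real use, pass `cogent3.set_storage_defaults` as `set_defaults` and load or create your sequence collections inside the `with` block.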

### Using `cogent3-h5seqs` as storage per object

You don't have to make `cogent3-h5seqs` the default for all instances; you can select it on a per-object basis instead.

```python
# some_path is a placeholder for the path to your sequence file
coll = cogent3.load_unaligned_seqs(some_path,
                                   moltype="dna",
                                   storage_backend="h5seqs_unaligned")
```

Or, for alignments:

```python
aln = cogent3.load_aligned_seqs(some_path,
                                moltype="dna",
                                storage_backend="h5seqs_aligned")
```

The same values can also be passed to the `make_unaligned_seqs()` and `make_aligned_seqs()` functions in `cogent3`.
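
As a minimal sketch of that, assuming `cogent3` and `cogent3-h5seqs` are both installed, the helper below builds either kind of object from a dict mapping names to sequences. `backend_name()` and `make_h5seqs()` are hypothetical names introduced for illustration; they are not part of either package.

```python
def backend_name(aligned: bool) -> str:
    """Return the storage_backend value used in this README."""
    return "h5seqs_aligned" if aligned else "h5seqs_unaligned"


def make_h5seqs(data: dict, moltype: str = "dna", aligned: bool = False):
    """Build a collection or alignment from a name -> sequence dict."""
    import cogent3  # imported lazily; needs cogent3 and cogent3-h5seqs installed

    make = cogent3.make_aligned_seqs if aligned else cogent3.make_unaligned_seqs
    return make(data, moltype=moltype, storage_backend=backend_name(aligned))
```

For example, `make_h5seqs({"s1": "ACGT", "s2": "AC-T"}, aligned=True)` would request an alignment backed by `h5seqs_aligned` storage.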

### Saving storage to disk

`cogent3-h5seqs` supports writing to disk, using the filename suffix `.c3h5u` for unaligned sequences and `.c3h5a` for aligned sequences. This works whether or not your current object uses `cogent3-h5seqs` for storage. For example:

```python
import cogent3

sample_aln = cogent3.get_dataset("brca1")  # using the cogent3 builtin storage
outpath = "~/Desktop/alignment_output.c3h5a"
sample_aln.write(outpath)  # writes out as cogent3-h5seqs HDF5 storage
```

For a sequence collection, do the following.

```python
sample_coll = cogent3.get_dataset("brca1").degap()
# Note the different suffix
outpath = "~/Desktop/alignment_output.c3h5u"
sample_coll.write(outpath)  # writes out as cogent3-h5seqs HDF5 storage
```

### Loading storage from disk

`cogent3` dispatches loading to `cogent3-h5seqs` based on the filename suffix.

```python
inpath = "~/Desktop/alignment_output.c3h5u"
sample_coll = cogent3.load_unaligned_seqs(inpath, moltype="dna")
```
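
Because the suffix determines which storage type is used, it can help to check a path before reading or writing. The following is a sketch only: `SUFFIX_TO_BACKEND`, `backend_for_path()` and `check_suffix()` are hypothetical helpers built from the suffixes and backend names given in this README, and they do not reflect how `cogent3` performs its own dispatch internally.

```python
from pathlib import Path

# Suffixes and backend names as given in this README.
SUFFIX_TO_BACKEND = {
    ".c3h5u": "h5seqs_unaligned",
    ".c3h5a": "h5seqs_aligned",
}


def backend_for_path(path):
    """Return the backend name implied by the filename suffix, or None."""
    return SUFFIX_TO_BACKEND.get(Path(path).suffix)


def check_suffix(path, aligned):
    """Raise ValueError if the suffix does not match the intended type."""
    expected = "h5seqs_aligned" if aligned else "h5seqs_unaligned"
    found = backend_for_path(path)
    if found != expected:
        raise ValueError(f"{path}: suffix implies {found!r}, expected {expected!r}")
```

Calling `check_suffix(outpath, aligned=True)` before writing an alignment catches a `.c3h5u` path early with a clear message.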

> **Note**
> You cannot write an alignment instance to the unaligned storage type, or vice versa. Nor can you load a file of one storage type as the other type.

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "cogent3-h5seqs",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "bioinformatics, biology, evolution, genomics, phylogeny, statistics",
    "author": null,
    "author_email": "Gavin Huttley <Gavin.Huttley@anu.edu.au>",
    "download_url": "https://files.pythonhosted.org/packages/14/80/cbe902ef7ec1edbd9bed1aab08a204f165e203c5f3ae4094946253ecbdb8/cogent3_h5seqs-0.5.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "BSD 3-Clause License\n        \n        Copyright (c) 2025, cogent3\n        \n        Redistribution and use in source and binary forms, with or without\n        modification, are permitted provided that the following conditions are met:\n        \n        1. Redistributions of source code must retain the above copyright notice, this\n           list of conditions and the following disclaimer.\n        \n        2. Redistributions in binary form must reproduce the above copyright notice,\n           this list of conditions and the following disclaimer in the documentation\n           and/or other materials provided with the distribution.\n        \n        3. Neither the name of the copyright holder nor the names of its\n           contributors may be used to endorse or promote products derived from\n           this software without specific prior written permission.\n        \n        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\"\n        AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE\n        IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE\n        DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE\n        FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL\n        DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR\n        SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER\n        CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,\n        OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE\n        OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.",
    "summary": "HDF5 storage driver for cogent3 sequence collections",
    "version": "0.5.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/cogent3/cogent3-h5seqs/issues",
        "Source Code": "https://github.com/cogent3/cogent3-h5seqs"
    },
    "split_keywords": [
        "bioinformatics",
        " biology",
        " evolution",
        " genomics",
        " phylogeny",
        " statistics"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0b82386bb1a9ada830934b87e73c20ce06062f8dc03cd35254d7cf3de5350cbc",
                "md5": "a251f92b4e313ecf4ce030ff61d1b380",
                "sha256": "10788188f7008710f5c1812ed01872363d3eab3adce8c45e13afc763e13f0e50"
            },
            "downloads": -1,
            "filename": "cogent3_h5seqs-0.5.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a251f92b4e313ecf4ce030ff61d1b380",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 13244,
            "upload_time": "2025-07-10T06:30:04",
            "upload_time_iso_8601": "2025-07-10T06:30:04.963666Z",
            "url": "https://files.pythonhosted.org/packages/0b/82/386bb1a9ada830934b87e73c20ce06062f8dc03cd35254d7cf3de5350cbc/cogent3_h5seqs-0.5.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "1480cbe902ef7ec1edbd9bed1aab08a204f165e203c5f3ae4094946253ecbdb8",
                "md5": "be5a6aab98c1195989fefb0407bde67e",
                "sha256": "ab27df78a9d921554c6ed7233f75423af7b7ce370451e0af24cea1c6722acf9b"
            },
            "downloads": -1,
            "filename": "cogent3_h5seqs-0.5.0.tar.gz",
            "has_sig": false,
            "md5_digest": "be5a6aab98c1195989fefb0407bde67e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 77793,
            "upload_time": "2025-07-10T06:30:06",
            "upload_time_iso_8601": "2025-07-10T06:30:06.839531Z",
            "url": "https://files.pythonhosted.org/packages/14/80/cbe902ef7ec1edbd9bed1aab08a204f165e203c5f3ae4094946253ecbdb8/cogent3_h5seqs-0.5.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-10 06:30:06",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "cogent3",
    "github_project": "cogent3-h5seqs",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "cogent3-h5seqs"
}
        