Name | cogent3-h5seqs |
Version | 0.5.0 |
home_page | None |
Summary | HDF5 storage driver for cogent3 sequence collections |
upload_time | 2025-07-10 06:30:06 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.10 |
license | BSD 3-Clause License
Copyright (c) 2025, cogent3
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. |
keywords | bioinformatics, biology, evolution, genomics, phylogeny, statistics |
# cogent3-h5seqs: an HDF5 storage driver for cogent3 sequence collections
`cogent3-h5seqs` is a sequence storage plug-in for [cogent3](https://cogent3.org). It uses HDF5 as the storage format for biological sequences, supporting both unaligned sequence collections and alignments. Storage can be in memory (the default) or on disk, and sequences are compressed using the BLOSC2 compression engine.
The advantage of HDF5 is that once primary sequence formats have been converted from text into NumPy arrays, loading and manipulating sequence data is fast and memory-efficient.
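To make the "text into arrays" step concrete, here is a minimal sketch of encoding a DNA string as small integer codes. It uses the standard library's `bytes` as a stand-in for a NumPy `uint8` array, and the `TCAG` ordering and the `encode`/`decode` helper names are assumptions made for illustration; this is not necessarily the encoding `cogent3` or `cogent3-h5seqs` uses.

```python
# Illustrative only: the alphabet order is an assumption for this sketch.
ALPHABET = b"TCAG"
# Translation table mapping each base's ASCII byte to its index in ALPHABET.
_TO_INDEX = bytes.maketrans(ALPHABET, bytes(range(len(ALPHABET))))


def encode(seq: str) -> bytes:
    """Return one byte per base holding that base's index in ALPHABET.

    Characters outside ALPHABET (e.g. "N" or "-") are left as their
    ASCII codes; a real encoder would handle them explicitly.
    """
    return seq.upper().encode("ascii").translate(_TO_INDEX)


def decode(indices: bytes) -> str:
    """Invert encode() for sequences made only of ALPHABET characters."""
    return "".join(chr(ALPHABET[i]) for i in indices)


encoded = encode("ACGTTGCA")
print(list(encoded))    # [2, 1, 3, 0, 0, 3, 1, 2]
print(decode(encoded))  # ACGTTGCA
```

Once sequences are in this fixed-width numeric form, they can be stored and sliced as array data without re-parsing a text format.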
> **Note**
> The storage works only with the new-type `cogent3` `Alignment` and `SequenceCollection` classes.
## Installation
```
pip install cogent3-h5seqs
```
## Usage
### Making `cogent3-h5seqs` the default storage
Using `cogent3.set_storage_defaults()`, you can set `cogent3-h5seqs` as the default storage. This means whenever a sequence collection is loaded from disk or created in memory, it will use the storage within this package.
The following statement makes `cogent3-h5seqs` the default for both unaligned and aligned sequence collections.
```python
import cogent3
cogent3.set_storage_defaults(
    unaligned_seqs="h5seqs_unaligned",
    aligned_seqs="h5seqs_aligned",
)
```
You can undo this setting with:
```python
cogent3.set_storage_defaults(unaligned_seqs=None, aligned_seqs=None)
```
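If you only want the default changed for part of a script, the set-then-reset pattern above can be wrapped in a context manager. The following is a sketch, not part of `cogent3` or `cogent3-h5seqs`: `temporary_storage_defaults` is a hypothetical helper, and it assumes the function passed in accepts the same keyword arguments as `cogent3.set_storage_defaults()` shown above.

```python
from contextlib import contextmanager


@contextmanager
def temporary_storage_defaults(set_defaults, **backends):
    """Apply storage defaults for the duration of a with-block.

    set_defaults is expected to behave like cogent3.set_storage_defaults.
    On exit both defaults are reset to None (the builtin storage); any
    defaults that were configured before entering are NOT restored.
    """
    set_defaults(**backends)
    try:
        yield
    finally:
        set_defaults(unaligned_seqs=None, aligned_seqs=None)


# Intended use (requires cogent3 and cogent3-h5seqs to be installed):
#
# import cogent3
# with temporary_storage_defaults(
#     cogent3.set_storage_defaults,
#     unaligned_seqs="h5seqs_unaligned",
#     aligned_seqs="h5seqs_aligned",
# ):
#     ...  # load or make sequence collections here
```

Passing the setter in as an argument keeps the helper independent of `cogent3` itself, which also makes it easy to exercise with a stand-in function.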
### Using `cogent3-h5seqs` as storage per object
You don't have to make `cogent3-h5seqs` the default for all instances; you can select it on a per-object basis.
```python
coll = cogent3.load_unaligned_seqs(
    some_path,  # path to your sequence file
    moltype="dna",
    storage_backend="h5seqs_unaligned",
)
```
Or, for alignments:
```python
aln = cogent3.load_aligned_seqs(
    some_path,  # path to your alignment file
    moltype="dna",
    storage_backend="h5seqs_aligned",
)
```
The same values can also be provided to the `make_unaligned_seqs()` and `make_aligned_seqs()` functions in `cogent3`.
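As a sketch of what that might look like, the function below builds a small collection and alignment from in-memory data. The sequence data and the `make_h5_backed_examples` name are made up for illustration, and the example assumes `storage_backend` is passed as a keyword argument exactly as in the `load_*` calls above. The import sits inside the function so the snippet can be read and defined without `cogent3` installed; calling it requires both `cogent3` and `cogent3-h5seqs`.

```python
def make_h5_backed_examples():
    """Build a tiny collection and alignment backed by cogent3-h5seqs."""
    import cogent3  # requires cogent3 and cogent3-h5seqs to be installed

    # Made-up, equal-length sequences so the same data suits both calls.
    raw = {"seq1": "ACGGT", "seq2": "ACGTT"}

    coll = cogent3.make_unaligned_seqs(
        raw,
        moltype="dna",
        storage_backend="h5seqs_unaligned",
    )
    aln = cogent3.make_aligned_seqs(
        raw,
        moltype="dna",
        storage_backend="h5seqs_aligned",
    )
    return coll, aln
```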
### Saving storage to disk
`cogent3-h5seqs` supports writing to disk, using the filename suffix `.c3h5u` for unaligned sequences and `.c3h5a` for aligned sequences. Writing works whether or not your current object uses `cogent3-h5seqs` for storage. For example:
```python
import cogent3
sample_aln = cogent3.get_dataset("brca1") # using the cogent3 builtin storage
outpath = "~/Desktop/alignment_output.c3h5a"
sample_aln.write(outpath) # writes out as cogent3-h5seqs HDF5 storage
```
For a sequence collection, do the following.
```python
sample_coll = cogent3.get_dataset("brca1").degap()
# Note the different suffix
outpath = "~/Desktop/alignment_output.c3h5u"
sample_coll.write(outpath) # writes out as cogent3-h5seqs HDF5 storage
```
### Loading storage from disk
`cogent3` selects `cogent3-h5seqs` for loading based on the filename suffix.
```python
inpath = "~/Desktop/alignment_output.c3h5u"
sample_coll = cogent3.load_unaligned_seqs(inpath, moltype="dna")
```
> **Note**
> You cannot write an alignment instance to an unaligned storage type, or vice versa. Nor can you load a file of one storage type as the other kind of object.
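
The suffix rules above can be checked before calling `write()` or a `load_*` function. The sketch below is a hypothetical pre-flight check, not how `cogent3` implements its dispatch: `storage_kind` and `check_suffix` are names invented here, and only the two suffixes come from this document.

```python
from pathlib import Path

# The two suffixes documented above, mapped to the kind of object they hold.
SUFFIX_TO_KIND = {".c3h5u": "unaligned", ".c3h5a": "aligned"}


def storage_kind(path):
    """Return "unaligned" or "aligned" for a cogent3-h5seqs path, else None."""
    return SUFFIX_TO_KIND.get(Path(path).suffix)


def check_suffix(path, is_alignment):
    """Raise ValueError if the suffix does not match the kind of object."""
    kind = storage_kind(path)
    if kind is None:
        raise ValueError(f"{str(path)!r} does not have a cogent3-h5seqs suffix")
    expected = "aligned" if is_alignment else "unaligned"
    if kind != expected:
        raise ValueError(
            f"{str(path)!r} is {kind} storage but the object is {expected}"
        )
    return kind


print(storage_kind("alignment_output.c3h5a"))  # aligned
print(storage_kind("alignment_output.c3h5u"))  # unaligned
print(storage_kind("alignment_output.fasta"))  # None
```

Failing early with a clear message can be easier to debug than an error raised part-way through a write.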
Raw data
{
"_id": null,
"home_page": null,
"name": "cogent3-h5seqs",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "bioinformatics, biology, evolution, genomics, phylogeny, statistics",
"author": null,
"author_email": "Gavin Huttley <Gavin.Huttley@anu.edu.au>",
"download_url": "https://files.pythonhosted.org/packages/14/80/cbe902ef7ec1edbd9bed1aab08a204f165e203c5f3ae4094946253ecbdb8/cogent3_h5seqs-0.5.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"summary": "HDF5 storage driver for cogent3 sequence collections",
"version": "0.5.0",
"project_urls": {
"Bug Tracker": "https://github.com/cogent3/cogent3-h5seqs/issues",
"Source Code": "https://github.com/cogent3/cogent3-h5seqs"
},
"split_keywords": [
"bioinformatics",
" biology",
" evolution",
" genomics",
" phylogeny",
" statistics"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "0b82386bb1a9ada830934b87e73c20ce06062f8dc03cd35254d7cf3de5350cbc",
"md5": "a251f92b4e313ecf4ce030ff61d1b380",
"sha256": "10788188f7008710f5c1812ed01872363d3eab3adce8c45e13afc763e13f0e50"
},
"downloads": -1,
"filename": "cogent3_h5seqs-0.5.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "a251f92b4e313ecf4ce030ff61d1b380",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 13244,
"upload_time": "2025-07-10T06:30:04",
"upload_time_iso_8601": "2025-07-10T06:30:04.963666Z",
"url": "https://files.pythonhosted.org/packages/0b/82/386bb1a9ada830934b87e73c20ce06062f8dc03cd35254d7cf3de5350cbc/cogent3_h5seqs-0.5.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "1480cbe902ef7ec1edbd9bed1aab08a204f165e203c5f3ae4094946253ecbdb8",
"md5": "be5a6aab98c1195989fefb0407bde67e",
"sha256": "ab27df78a9d921554c6ed7233f75423af7b7ce370451e0af24cea1c6722acf9b"
},
"downloads": -1,
"filename": "cogent3_h5seqs-0.5.0.tar.gz",
"has_sig": false,
"md5_digest": "be5a6aab98c1195989fefb0407bde67e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 77793,
"upload_time": "2025-07-10T06:30:06",
"upload_time_iso_8601": "2025-07-10T06:30:06.839531Z",
"url": "https://files.pythonhosted.org/packages/14/80/cbe902ef7ec1edbd9bed1aab08a204f165e203c5f3ae4094946253ecbdb8/cogent3_h5seqs-0.5.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-10 06:30:06",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "cogent3",
"github_project": "cogent3-h5seqs",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "cogent3-h5seqs"
}