# Dogma Data
**Dogma Data** is a Python package for fast parsing of FASTA files in high-performance computing settings. It parses in parallel, using multi-threading across all available system threads, and it can export the parsed data to the HDF5 file format for storage and later access.
## Installation
Install Dogma Data with **pip**:
```bash
pip install dogma-data
```
## Usage
```python
import dogma_data

# Map each FASTA character to an integer token ID.
vocab = {
    'a': 0,
    'g': 1,
    'c': 2,
    't': 3,
    # ... (remaining entries elided)
}

mapping = dogma_data.FastaMapping(vocab, vocab['a'])

# Parse the FASTA file; here the headers are read as taxon IDs.
(tokens, sequences, (taxons)) = dogma_data.parse_fasta('input_path.fa', dogma_data.HeaderType.TaxonId, mapping)

header_info = {"taxons": taxons}

# Export to HDF5 with a 95% / 2.5% / 2.5% train/validation/test split.
dogma_data.export_hdf5(
    'output_path.h5',
    dogma_data.Splitter(
        train_prop=0.95,
        val_prop=0.025,
        test_prop=0.025,
        length=len(sequences) - 1,
    ),
    tokens,
    sequences,
    header_info,
    mapping,
)
```
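The `length=len(sequences) - 1` argument in the example above suggests that `sequences` holds boundary offsets into one flat `tokens` array, with one more entry than there are records. That layout is an assumption, not something this README confirms. The pure-Python sketch below illustrates the idea under that assumption. The helpers `encode_records` and `split_sizes` are hypothetical and are not part of `dogma_data`, and treating the second `FastaMapping` argument as a fallback token for unknown characters is likewise a guess.

```python
def encode_records(records, vocab, default_token):
    """Encode sequences into one flat token list plus boundary offsets.

    Hypothetical helper for illustration only; not the dogma_data implementation.
    """
    tokens = []
    offsets = [0]  # tokens[offsets[i]:offsets[i + 1]] holds record i
    for seq in records:
        # Characters missing from the vocabulary fall back to default_token (assumed behavior).
        tokens.extend(vocab.get(ch, default_token) for ch in seq.lower())
        offsets.append(len(tokens))
    return tokens, offsets


def split_sizes(n_records, train_prop, val_prop):
    """Turn split proportions into record counts; the test split takes the remainder."""
    n_train = round(n_records * train_prop)
    n_val = round(n_records * val_prop)
    return n_train, n_val, n_records - n_train - n_val


vocab = {'a': 0, 'g': 1, 'c': 2, 't': 3}
tokens, offsets = encode_records(["ACGT", "GGN", "TTAC"], vocab, vocab['a'])

# With n records there are n + 1 offsets, hence the "- 1".
n_records = len(offsets) - 1
second_record = tokens[offsets[1]:offsets[2]]
train, val, test = split_sizes(1000, 0.95, 0.025)
```

Storing one flat array plus offsets avoids padding variable-length sequences, which is one plausible reason for such a layout in a large FASTA corpus.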
## Requirements
- Python 3.10 or later
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Contact
For any questions, feel free to reach out:
- **Marcel Rød** - roed@stanford.edu
- **Miha Krajnc** - miha.krajnc@cs.stanford.edu