rnaseqdata

Name	rnaseqdata
Version	0.0.8
home_page	https://github.com/Tiezhengyuan/ernav2_seqdata
Summary	New Data type known as SeqData for RNA-Seq data analysis
upload_time	2024-04-01 22:33:22
maintainer	None
docs_url	None
author	Tiezheng Yuan
requires_python	None
license	None
keywords	pypi cicd python
requirements	ddt pytest numpy pandas

# ernav2_seqdata
SeqData is a new data type designed for RNA-seq data analysis. It integrates data from various sources and dimensions:

<img src="static/mrnaseq_data.png" width="900" height="250">

RNA expression is measured by the read counts of transcripts. A typical mRNA-seq bioinformatics pipeline determines the read counts (RC) of transcripts. The RCs are typically stored as a 2-D table, with samples in rows and transcripts (or genes) in columns, or the reverse. The RC table is then normalized, for example to FPM or FPKM, by some normalization method. Next, confounding factors among samples are removed using a tool such as DESeq2 or edgeR. The data may also be transformed into other tables (for example, log-transformed) or partitioned into subsets. Bioinformaticians must manage all of these data sets during statistical analysis.
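
The steps above can be sketched with plain pandas. This is only an illustration and does not use the SeqData API: the sample names, gene names, counts, and the filtering threshold are all made up, and FPM is computed here simply as counts scaled to one million per sample.

```python
import numpy as np
import pandas as pd

# Hypothetical read counts: samples in rows, genes in columns.
rc = pd.DataFrame(
    [[10, 30, 60], [5, 5, 40]],
    index=["sample1", "sample2"],
    columns=["geneA", "geneB", "geneC"],
)

# FPM: scale each sample (row) so that its counts sum to one million.
fpm = rc.div(rc.sum(axis=1), axis=0) * 1e6

# Log transform, with a pseudocount of 1 to avoid log(0).
log_fpm = np.log2(fpm + 1)

# Partition: keep genes whose mean FPM exceeds an arbitrary threshold.
subset = fpm.loc[:, fpm.mean(axis=0) > 150_000]

print(fpm)
print(log_fpm)
print(subset)
```

Even this small case produces four related tables (`rc`, `fpm`, `log_fpm`, `subset`) that have to be tracked together.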

<img src="static/SeqData.png" width="600" height="400">

Biologists may care more about which results of an mRNA-seq analysis are significant and what that significance reveals. In that case, sample information, patient information, or features of samples (for example, single cells) should be considered. Moreover, aside from the transcript ID or gene ID, other annotations can be integrated, for example genomic annotations such as chromosome locus, or protein annotations such as domain identification. These annotation data may not be used in statistical processing, but they are needed for further study.
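
As a rough illustration of this kind of integration, the sketch below attaches sample information and gene annotations to a count table with plain pandas. It does not use the SeqData API, and the column names (`condition`, `chromosome`) and all values are hypothetical.

```python
import pandas as pd

# Hypothetical read counts: samples in rows, genes in columns.
rc = pd.DataFrame(
    [[10, 30], [5, 40]],
    index=["sample1", "sample2"],
    columns=["geneA", "geneB"],
)

# Hypothetical sample information, indexed by sample name.
samples = pd.DataFrame(
    {"condition": ["control", "treated"]},
    index=["sample1", "sample2"],
)

# Hypothetical gene annotations, indexed by gene ID.
annot = pd.DataFrame(
    {"chromosome": ["chr1", "chrX"]},
    index=["geneA", "geneB"],
)

# Attach sample information to each row of counts.
by_sample = rc.join(samples)

# Attach annotations to each gene: transpose so that genes are in rows.
by_gene = rc.T.join(annot)

# Sample information can also drive the analysis, e.g. mean count per condition.
mean_by_condition = rc.groupby(samples["condition"]).mean()

print(by_sample)
print(by_gene)
print(mean_by_condition)
```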

SeqData is a tree structure. The root contains phenotype and annotation data. Each node contains various attributes, including X (an m x n matrix) and var (statistical aggregations). Nodes inherit the attributes of the root node. The data of child nodes is determined by that of their parent nodes.

<img src="static/SeqData_data_structure.png" width="350" height="300">
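
To make the tree idea concrete, here is a minimal pure-Python sketch. It is not the package's implementation: the class names `Root` and `Node`, the `derive` method, and the contents of `var` are hypothetical and chosen only to mirror the description above.

```python
class Root:
    """Hypothetical root: holds data shared by every node in the tree."""

    def __init__(self, phenotypes=None, annotations=None):
        self.phenotypes = phenotypes or {}    # keyed by sample name
        self.annotations = annotations or {}  # keyed by transcript or gene ID


class Node:
    """Hypothetical node: an m x n matrix X plus aggregations in var."""

    def __init__(self, name, X, root, parent=None):
        self.name = name
        self.X = X            # list of m rows, each with n values
        self.root = root      # every node points to the same root
        self.parent = parent
        self.children = []
        # var: one example of a statistical aggregation (per-column mean).
        m = len(X)
        self.var = {"mean": [sum(col) / m for col in zip(*X)]}

    def derive(self, name, func):
        """Create a child node whose X is computed from this node's X."""
        child_X = [[func(v) for v in row] for row in self.X]
        child = Node(name, child_X, self.root, parent=self)
        self.children.append(child)
        return child


root = Root(
    phenotypes={"sample1": {"condition": "control"}},
    annotations={"geneA": {"chromosome": "chr1"}},
)
raw = Node("raw", [[1, 3], [3, 5]], root)
doubled = raw.derive("doubled", lambda v: v * 2)

print(doubled.X)
print(doubled.root is raw.root)
print(raw.var["mean"])
```

The child's matrix is computed from its parent's, and both nodes reach the same phenotypes and annotations through the shared root.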

## Installation
The package can be installed with pip. It is available on the [Python Package Index](https://pypi.org/manage/project/rnaseqdata/releases/).
```
pip install --upgrade rnaseqdata
```

## Development
```
git clone git@github.com:Tiezhengyuan/ernav2_seqdata.git
cd ernav2_seqdata
```

Create a virtual environment
```
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
```


Unit testing
```
pytest tests/unittests
```

## Quick tutorial
In Python 3:

```
from rnaseqdata import RootData, SeqData
import numpy as np
import pandas as pd
```

Create SeqData

```
root = RootData()
c = SeqData(root)
c.put_data('test', np.eye(3), root)
c.to_df('test')
```

          0    1    2
     0  1.0  0.0  0.0
     1  0.0  1.0  0.0
     2  0.0  0.0  1.0

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Tiezhengyuan/ernav2_seqdata",
    "name": "rnaseqdata",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "pypi, cicd, python",
    "author": "Tiezheng Yuan",
    "author_email": "tiezhengyuan@hotmail.com",
    "download_url": "https://files.pythonhosted.org/packages/ec/69/54812b98888d4db4ab2828dc8c60f0ea645166a4e25130d27a549493c052/rnaseqdata-0.0.8.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "New Data type known as SeqData for RNA-Seq data analysis",
    "version": "0.0.8",
    "project_urls": {
        "Homepage": "https://github.com/Tiezhengyuan/ernav2_seqdata"
    },
    "split_keywords": [
        "pypi",
        " cicd",
        " python"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9e82c6074abebed877b5df96336271b76fe83860f0360d4d5f72ab868616d72c",
                "md5": "c70757d8d7c4d58444a2869a12f9e65f",
                "sha256": "f4a27344e70ad1feaee0472fb741e00f0cd201af694bfc214964c7ac155ac410"
            },
            "downloads": -1,
            "filename": "rnaseqdata-0.0.8-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c70757d8d7c4d58444a2869a12f9e65f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 20135,
            "upload_time": "2024-04-01T22:33:20",
            "upload_time_iso_8601": "2024-04-01T22:33:20.351247Z",
            "url": "https://files.pythonhosted.org/packages/9e/82/c6074abebed877b5df96336271b76fe83860f0360d4d5f72ab868616d72c/rnaseqdata-0.0.8-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ec6954812b98888d4db4ab2828dc8c60f0ea645166a4e25130d27a549493c052",
                "md5": "2d706a9de14c411002e34303f6a4b909",
                "sha256": "b693c31050cb04ceb32118ea46a3a10d1839e2d60f9632c8bde5a074f4c2344c"
            },
            "downloads": -1,
            "filename": "rnaseqdata-0.0.8.tar.gz",
            "has_sig": false,
            "md5_digest": "2d706a9de14c411002e34303f6a4b909",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 19752,
            "upload_time": "2024-04-01T22:33:22",
            "upload_time_iso_8601": "2024-04-01T22:33:22.030630Z",
            "url": "https://files.pythonhosted.org/packages/ec/69/54812b98888d4db4ab2828dc8c60f0ea645166a4e25130d27a549493c052/rnaseqdata-0.0.8.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-01 22:33:22",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Tiezhengyuan",
    "github_project": "ernav2_seqdata",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "ddt",
            "specs": []
        },
        {
            "name": "pytest",
            "specs": []
        },
        {
            "name": "numpy",
            "specs": []
        },
        {
            "name": "pandas",
            "specs": []
        }
    ],
    "lcname": "rnaseqdata"
}
