Name | vdata |
Version | 0.2.3 |
home_page | None |
Summary | Annotated multivariate observation of timestamped data |
upload_time | 2024-09-02 14:51:54 |
maintainer | None |
docs_url | None |
author | Matteo Bouvier |
requires_python | <4.0,>=3.10 |
license | LICENSE |
# 🗂 VData
**VData** is used for storing and manipulating multivariate observations of timestamped data.
![The VData structure](docs/images/vdata_diagram.png)
It extends the [AnnData](https://anndata.readthedocs.io/en/latest/) object by adding the **time** dimension.
**Example**: The VData object efficiently stores information about cells (**observations**) whose gene
expression (**variables**) is measured over multiple **time points**. It is built around layers (`.layers`). Each layer
is a 3D matrix of `obs` x `var` x `time points`. Around those layers, DataFrames describe variables and
time points, while custom TemporalDataFrames describe observations.
The **uns** dictionary is used to store additional unstructured data.
More generally, VData objects can be used to store any timestamped datasets where annotation of observations and
variables is required.
## 🌟 Features
- complete Python reimplementation based on [h5py](https://docs.h5py.org/en/latest)
- very fast loading of any dataset, since backed data is read on demand
- memory-efficient data manipulation (<1GB) even for datasets of hundreds of GB
- explicit handling of timestamped data, especially suited for simulated single-cell datasets
- complete compatibility with the [scverse](https://scverse.org/) ecosystem
## 👁 Overview
### General
The `vdata` library exposes the actual **VData** object alongside the **TemporalDataFrame** object, which extends
the common `pandas.DataFrame` with a third `time` axis.
**VData** objects can be created from in-RAM objects such as `AnnData`, `TemporalDataFrame`, `pandas.DataFrame` or
mappings of `<layer name>`:`DataFrame`.
It is also possible to load data from a `VData` or an `AnnData` saved as an
[hdf5](https://www.hdfgroup.org/solutions/hdf5/) file or in `csv` format.
> 🔵 **Note**
> An important distinction with `AnnData` is that when a **VData** is backed on (read from) an hdf5 file, the *whole*
> object is loaded only on demand and in small chunks of data. As a result, VData objects will always consume small
> amounts of RAM and will be very fast to read.
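
The on-demand loading described in the note can be illustrated with a minimal, standard-library-only sketch. This is not the VData or h5py API: the flat binary file, the `ROW_FORMAT` layout and the `write_rows` / `read_rows` helpers are all hypothetical stand-ins for an hdf5-backed layer. The point is only that a reader can seek to the requested rows and leave the rest of the file on disk.

```python
import os
import struct
import tempfile

# Hypothetical on-disk layout: each observation is 4 little-endian float64 values.
ROW_FORMAT = "<4d"
ROW_SIZE = struct.calcsize(ROW_FORMAT)  # 32 bytes per row


def write_rows(path, rows):
    """Write fixed-size binary rows to disk (stand-in for an hdf5 dataset)."""
    with open(path, "wb") as handle:
        for row in rows:
            handle.write(struct.pack(ROW_FORMAT, *row))


def read_rows(path, start, stop):
    """Read only rows [start, stop) by seeking; other rows are never loaded."""
    with open(path, "rb") as handle:
        handle.seek(start * ROW_SIZE)
        raw = handle.read((stop - start) * ROW_SIZE)
    return [
        struct.unpack_from(ROW_FORMAT, raw, offset)
        for offset in range(0, len(raw), ROW_SIZE)
    ]


# Store 1000 observations, then load a chunk of 3 without reading the others.
path = os.path.join(tempfile.mkdtemp(), "layer.bin")
write_rows(path, [(float(i), i + 0.5, i * 2.0, -float(i)) for i in range(1000)])
chunk = read_rows(path, 10, 13)
print(len(chunk), chunk[0])
```

Memory use here depends on the size of the requested chunk rather than on the size of the file, which is the property the note attributes to backed VData objects.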
### Layers and data annotation
The bulk of the data is stored in `TemporalDataFrames`, themselves stacked up in the **layers** dictionary. Data is
thus represented as `observations` x `variables` x `time points` dataframes. Observation indices can either be unique
at each time point or strictly the same (e.g. to store simulated data where a single cell can be recorded multiple
times).
![TemporalDataFrames, one with unique observations and one with identical observations at all timepoints](docs/images/TDF_diagram.png)
Three additional dataframes are used for annotating the observations (**obs**), variables (**var**) and timepoints
(**timepoints**).
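
As a rough illustration of the `observations` x `variables` x `time points` layout, here is a plain-Python sketch. `ToyTemporalFrame` is a hypothetical class written for this example only; it is not the `TemporalDataFrame` API, and the time point labels (`"0h"`, `"1h"`) and cell names are made up. It shows the two cases mentioned above: observation indices shared across time points versus indices unique to each time point.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTemporalFrame:
    """Hypothetical stand-in for a TemporalDataFrame: one table of rows per time point."""

    variables: list
    # time point -> {observation index -> list of values, one per variable}
    rows: dict = field(default_factory=dict)

    def add(self, timepoint, obs, values):
        """Store one observation at one time point."""
        if len(values) != len(self.variables):
            raise ValueError("expected one value per variable")
        self.rows.setdefault(timepoint, {})[obs] = list(values)

    def shape(self):
        """(number of time points, observations per time point, number of variables)."""
        return len(self.rows), [len(t) for t in self.rows.values()], len(self.variables)

    def shares_observations(self):
        """True when every time point holds exactly the same observation indices."""
        index_sets = [set(t) for t in self.rows.values()]
        return all(s == index_sets[0] for s in index_sets)


# Simulated data: the same cells are recorded at every time point.
simulated = ToyTemporalFrame(["gene_a", "gene_b"])
for tp in ("0h", "1h"):
    simulated.add(tp, "cell_1", [1.0, 2.0])
    simulated.add(tp, "cell_2", [3.0, 4.0])

# Measured data: each time point has its own, unique cells.
measured = ToyTemporalFrame(["gene_a", "gene_b"])
measured.add("0h", "cell_1", [1.0, 2.0])
measured.add("1h", "cell_3", [0.5, 0.1])

print(simulated.shape(), simulated.shares_observations())
print(measured.shape(), measured.shares_observations())
```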
### Multi-dimension annotation
There are two additional mappings for storing multi-dimensional annotations (i.e. annotations that require more than
one column to be stored). These are the `obsm` and `varm` mappings, which contain TemporalDataFrames and pandas
DataFrames, respectively.
> 🟢 **Example**
> You can store PCA or UMAP coordinates in obsm.
### Pairwise annotation
The last two mappings (`obsp` and `varp`) contain pairwise annotations: data in square matrices of `obs` x `obs`
or `var` x `var`.
> 🟢 **Example**
> You can store distance values between observations in obsp.
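
To make the shape of a pairwise annotation concrete, the sketch below builds an `obs` x `obs` distance matrix from per-observation coordinates, using only the standard library. The `coords` values, the cell names and the `pairwise_distances` helper are assumptions made for this example; how such a matrix is actually assigned to `obsp` in VData is not shown here.

```python
import math

# Hypothetical 2D coordinates per observation (e.g. an embedding such as PCA or UMAP).
coords = {
    "cell_1": (0.0, 0.0),
    "cell_2": (3.0, 4.0),
    "cell_3": (6.0, 8.0),
}
obs_names = list(coords)


def pairwise_distances(points, names):
    """Return a square matrix (list of lists) of Euclidean distances, ordered like names."""
    return [[math.dist(points[a], points[b]) for b in names] for a in names]


distances = pairwise_distances(coords, obs_names)

# The matrix is square (obs x obs), symmetric, and has a zero diagonal.
for name, row in zip(obs_names, distances):
    print(name, row)
```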
## 📀 Installation
VData requires Python 3.10+
### pip installation (stable)
```shell
pip install vdata
```
### using git (latest)
```shell
git clone git@github.com:Vidium/vdata.git
```
## 📑 Documentation
See the complete documentation at [INCOMING].
Read the VData article at https://www.biorxiv.org/content/10.1101/2023.08.29.555297
## 🖋 Citation
You can cite the **VData** pre-print as:
> VData: Temporally annotated data manipulation and storage
>
> Matteo Bouvier, Arnaud Bonnaffoux
>
> bioRxiv 2023.08.29.555297; doi: https://doi.org/10.1101/2023.08.29.555297
Raw data
{
"_id": null,
"home_page": null,
"name": "vdata",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.10",
"maintainer_email": null,
"keywords": null,
"author": "Matteo Bouvier",
"author_email": "m.bouvier@vidium-solutions.com",
"download_url": "https://files.pythonhosted.org/packages/d3/2d/406a8c5f4f72a3d78d9a0f1db49adb645e0bdd704bf3dce98503f1664d1c/vdata-0.2.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "LICENSE",
"summary": "Annotated multivariate observation of timestamped data",
"version": "0.2.3",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "6559c36d68f46680910a0869b873b23739018dc9700befb8f59d8c2835c20f9a",
"md5": "eae97c93654a9c04ecdb5f9321b68c2c",
"sha256": "1256a118dcc8521ac0ddaadc06f756481f1143198277e81aa7226de441029080"
},
"downloads": -1,
"filename": "vdata-0.2.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "eae97c93654a9c04ecdb5f9321b68c2c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.10",
"size": 104010,
"upload_time": "2024-09-02T14:51:52",
"upload_time_iso_8601": "2024-09-02T14:51:52.102260Z",
"url": "https://files.pythonhosted.org/packages/65/59/c36d68f46680910a0869b873b23739018dc9700befb8f59d8c2835c20f9a/vdata-0.2.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d32d406a8c5f4f72a3d78d9a0f1db49adb645e0bdd704bf3dce98503f1664d1c",
"md5": "06457929b62e029389d1aba2304953d5",
"sha256": "a9a2cb1961c3762d461b2af4a2d0fa010b503e43b3fe979f1b85b7a9999fbee1"
},
"downloads": -1,
"filename": "vdata-0.2.3.tar.gz",
"has_sig": false,
"md5_digest": "06457929b62e029389d1aba2304953d5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.10",
"size": 78814,
"upload_time": "2024-09-02T14:51:54",
"upload_time_iso_8601": "2024-09-02T14:51:54.463874Z",
"url": "https://files.pythonhosted.org/packages/d3/2d/406a8c5f4f72a3d78d9a0f1db49adb645e0bdd704bf3dce98503f1664d1c/vdata-0.2.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-02 14:51:54",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "vdata"
}