vdata

- Name: vdata
- Version: 0.2.3
- Summary: Annotated multivariate observation of timestamped data
- Upload time: 2024-09-02 14:51:54
- Author: Matteo Bouvier
- Requires Python: >=3.10, <4.0
- License: LICENSE

# 🗂 VData

**VData** is used for storing and manipulating multivariate observations of timestamped data.

![The VData structure](docs/images/vdata_diagram.png)

It extends the [AnnData](https://anndata.readthedocs.io/en/latest/) object by adding the **time** dimension.

**Example**: A VData object efficiently stores information about cells (**observations**) whose gene
expression (**variables**) is measured over multiple **time points**. It is built around layers (`.layers`). Each layer
is a 3D matrix of `obs` x `var` x `time points`. Around those layers, DataFrames describe the variables and
time points, while custom TemporalDataFrames describe the observations.
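
As a rough illustration of this layout, the sketch below uses plain NumPy and pandas rather than the vdata API. It pictures one layer as a dense 3D array with one annotation table per axis, which assumes the same observations are present at every time point. All names and values are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dimensions: 4 cells, 3 genes, 2 time points.
n_obs, n_var, n_time = 4, 3, 2

# One layer: a 3D array of shape (obs, var, time points).
layer = np.arange(n_obs * n_var * n_time, dtype=float).reshape(n_obs, n_var, n_time)

# Annotation tables, one per axis (illustrative stand-ins only).
obs = pd.DataFrame(
    {"cell_type": ["A", "A", "B", "B"]},
    index=[f"cell_{i}" for i in range(n_obs)],
)
var = pd.DataFrame(
    {"gene_family": ["x", "y", "x"]},
    index=[f"gene_{j}" for j in range(n_var)],
)
timepoints = pd.DataFrame({"value": ["0h", "5h"]})

# Expression of every gene in every cell at the second time point.
snapshot = pd.DataFrame(layer[:, :, 1], index=obs.index, columns=var.index)
print(snapshot.shape)
```

Slicing along the last axis yields an ordinary `obs` x `var` table for a single time point.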

The **uns** dictionary is used to store additional unstructured data.

More generally, VData objects can be used to store any timestamped datasets where annotation of observations and
variables is required.

## 🌟 Features

- complete Python reimplementation based on [h5py](https://docs.h5py.org/en/latest)
- very fast loading of any dataset
- memory-efficient data manipulation (<1GB), even for datasets of hundreds of GB
- explicit handling of timestamped data, especially suited for simulated single-cell datasets
- complete compatibility with the [scverse](https://scverse.org/) ecosystem

## 👁 Overview

### General

The `vdata` library exposes the **VData** object itself alongside the **TemporalDataFrame** object, which extends
the common `pandas.DataFrame` with a third `time` axis.

**VData** objects can be created from in-RAM objects such as `AnnData`, `TemporalDataFrame`, `pandas.DataFrame` or 
mappings of `<layer name>`:`DataFrame`. 

It is also possible to load data from a `VData` or an `AnnData` saved as an
[hdf5](https://www.hdfgroup.org/solutions/hdf5/) file or in `csv` format.

> 🔵 **Note**
> An important distinction from `AnnData` is that when a **VData** is backed on (read from) an hdf5 file, the *whole*
> object is loaded only on demand, in small chunks of data. As a result, backed VData objects consume little RAM
> and are very fast to read.
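
The general idea of on-demand reading can be sketched with h5py directly. This is not how vdata is implemented internally, only an illustration of the underlying HDF5 behavior. The file name, dataset name and chunk shape are assumptions chosen for the example.

```python
import os
import tempfile

import h5py
import numpy as np

# Write a toy dataset to a temporary HDF5 file (hypothetical dataset name "layer").
path = os.path.join(tempfile.mkdtemp(), "toy.h5")
with h5py.File(path, "w") as f:
    data = np.arange(1000 * 50).reshape(1000, 50)
    f.create_dataset("layer", data=data, chunks=(100, 50))

# Re-open read-only: getting the dataset handle does not read the values.
with h5py.File(path, "r") as f:
    dset = f["layer"]
    shape = dset.shape          # metadata only
    block = dset[200:210, :5]   # only the chunks covering this slice are read

print(shape, block.shape)
```

Only the requested slice ends up in memory as a NumPy array, while the rest of the dataset stays on disk.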

### Layers and data annotation

The bulk of the data is stored in `TemporalDataFrames`, themselves stacked up in the **layers** dictionary. Data is
thus represented as `observations` x `variables` x `time points` dataframes. Observation indices can either be unique
at each time point or strictly the same at all time points (e.g. to store simulated data where a single cell can be
recorded multiple times).

![TemporalDataFrames, one with unique observations and one with identical observations at all timepoints](docs/images/TDF_diagram.png)
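
The two indexing patterns can be mimicked with a pandas `MultiIndex` of (time point, observation) pairs. This is a plain pandas sketch, not the TemporalDataFrame API. The helper `same_obs_at_all_timepoints` and all labels are hypothetical.

```python
import pandas as pd

# Case 1: unique observations at each time point.
unique_obs = pd.DataFrame(
    {"gene_0": [1.0, 2.0, 3.0, 4.0]},
    index=pd.MultiIndex.from_tuples(
        [("0h", "cell_0"), ("0h", "cell_1"), ("5h", "cell_2"), ("5h", "cell_3")],
        names=["timepoint", "obs"],
    ),
)

# Case 2: the same observations recorded at every time point (e.g. simulations).
repeated_obs = pd.DataFrame(
    {"gene_0": [1.0, 2.0, 1.5, 2.5]},
    index=pd.MultiIndex.from_tuples(
        [("0h", "cell_0"), ("0h", "cell_1"), ("5h", "cell_0"), ("5h", "cell_1")],
        names=["timepoint", "obs"],
    ),
)


def same_obs_at_all_timepoints(df: pd.DataFrame) -> bool:
    """Return True if every time point holds exactly the same observation labels."""
    labels_per_time: dict[str, set[str]] = {}
    for timepoint, obs_label in df.index:
        labels_per_time.setdefault(timepoint, set()).add(obs_label)
    return len({frozenset(labels) for labels in labels_per_time.values()}) == 1


print(same_obs_at_all_timepoints(unique_obs))    # False
print(same_obs_at_all_timepoints(repeated_obs))  # True
```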

Three additional dataframes are used for annotating the observations (**obs**), variables (**var**) and timepoints
(**timepoints**).

### Multi-dimension annotation

There are two additional mappings for storing multi-dimensional annotations (i.e. annotations that require more than
one column). These are the `obsm` and `varm` mappings, which contain TemporalDataFrames and pandas DataFrames
respectively.

> 🟢 **Example**
> You can store PCA or UMAP coordinates in obsm.

### Pairwise annotation

The last two mappings (`obsp` and `varp`) contain pairwise annotations: data in square matrices of `obs` x `obs`
or `var` x `var`.

> 🟢 **Example** 
> You can store distance values between observations in obsp.
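
A pairwise annotation of this kind can be sketched with NumPy broadcasting. The dict below only stands in for an `obsp`-like mapping and is not the vdata API. The key `"euclidean"` and the toy values are assumptions.

```python
import numpy as np
import pandas as pd

obs_names = ["cell_0", "cell_1", "cell_2"]
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])  # toy obs x var values

# Euclidean distance between every pair of observations via broadcasting.
diff = X[:, None, :] - X[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Plain dict standing in for an obsp-like mapping (not the vdata object).
obsp = {"euclidean": pd.DataFrame(dist, index=obs_names, columns=obs_names)}
print(obsp["euclidean"].loc["cell_0", "cell_1"])  # 5.0
```

The matrix is square (`obs` x `obs`), symmetric, and has zeros on its diagonal.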

## 📀 Installation

VData requires Python 3.10 or later (the package metadata specifies `>=3.10,<4.0`).

### pip installation (stable)

```shell
pip install vdata
```

### using git (latest)

```shell
git clone git@github.com:Vidium/vdata.git
```

## 📑 Documentation

See the complete documentation at [INCOMING].

Read the VData article at https://www.biorxiv.org/content/10.1101/2023.08.29.555297

## 🖋 Citation

You can cite the **VData** preprint as:

> VData: Temporally annotated data manipulation and storage
> 
> Matteo Bouvier, Arnaud Bonnaffoux
> 
> bioRxiv 2023.08.29.555297; doi: https://doi.org/10.1101/2023.08.29.555297 

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "vdata",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.10",
    "maintainer_email": null,
    "keywords": null,
    "author": "Matteo Bouvier",
    "author_email": "m.bouvier@vidium-solutions.com",
    "download_url": "https://files.pythonhosted.org/packages/d3/2d/406a8c5f4f72a3d78d9a0f1db49adb645e0bdd704bf3dce98503f1664d1c/vdata-0.2.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "LICENSE",
    "summary": "Annotated multivariate observation of timestamped data",
    "version": "0.2.3",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6559c36d68f46680910a0869b873b23739018dc9700befb8f59d8c2835c20f9a",
                "md5": "eae97c93654a9c04ecdb5f9321b68c2c",
                "sha256": "1256a118dcc8521ac0ddaadc06f756481f1143198277e81aa7226de441029080"
            },
            "downloads": -1,
            "filename": "vdata-0.2.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "eae97c93654a9c04ecdb5f9321b68c2c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.10",
            "size": 104010,
            "upload_time": "2024-09-02T14:51:52",
            "upload_time_iso_8601": "2024-09-02T14:51:52.102260Z",
            "url": "https://files.pythonhosted.org/packages/65/59/c36d68f46680910a0869b873b23739018dc9700befb8f59d8c2835c20f9a/vdata-0.2.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d32d406a8c5f4f72a3d78d9a0f1db49adb645e0bdd704bf3dce98503f1664d1c",
                "md5": "06457929b62e029389d1aba2304953d5",
                "sha256": "a9a2cb1961c3762d461b2af4a2d0fa010b503e43b3fe979f1b85b7a9999fbee1"
            },
            "downloads": -1,
            "filename": "vdata-0.2.3.tar.gz",
            "has_sig": false,
            "md5_digest": "06457929b62e029389d1aba2304953d5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.10",
            "size": 78814,
            "upload_time": "2024-09-02T14:51:54",
            "upload_time_iso_8601": "2024-09-02T14:51:54.463874Z",
            "url": "https://files.pythonhosted.org/packages/d3/2d/406a8c5f4f72a3d78d9a0f1db49adb645e0bdd704bf3dce98503f1664d1c/vdata-0.2.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-02 14:51:54",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "vdata"
}
        