inferelator-velocity


| Field | Value |
|---|---|
| Name | inferelator-velocity |
| Version | 1.2.0 |
| Home page | https://github.com/flatironinstitute/inferelator-velocity |
| Summary | Inferelator-Velocity Calculates Dynamic Latent Parameters |
| Upload time | 2024-08-20 16:43:46 |
| Maintainer | Chris Jackson |
| Author | Chris Jackson |
| Docs URL | None |
| Requires Python | None |
| License | None |
| Requirements | numpy, scipy, pandas, scikit-learn, anndata, scanpy, joblib, inferelator, tqdm, matplotlib, leidenalg, scself |
| Travis-CI | No Travis |
| Coveralls test coverage | No coveralls |

# inferelator-velocity

[![PyPI version](https://badge.fury.io/py/inferelator-velocity.svg)](https://badge.fury.io/py/inferelator-velocity)
[![CI](https://github.com/flatironinstitute/inferelator-velocity/actions/workflows/python-package.yml/badge.svg)](https://github.com/flatironinstitute/inferelator-velocity/actions/workflows/python-package.yml/)
[![codecov](https://codecov.io/gh/flatironinstitute/inferelator-velocity/branch/main/graph/badge.svg)](https://codecov.io/gh/flatironinstitute/inferelator-velocity)

This is a package that calculates dynamic (time-dependent) latent parameters from 
single-cell expression data and associated experimental metadata or bulk RNA-seq data.
It is designed to create data that is compatible with the 
[inferelator](https://github.com/flatironinstitute/inferelator) or 
[supirfactor-dynamical](https://github.com/GreshamLab/supirfactor-dynamical) packages.

### Installation

Install this package with pip, the standard Python package manager: `python -m pip install inferelator_velocity`.
It depends on standard Python scientific computing packages (e.g. scipy, numpy, scikit-learn, pandas),
and on the AnnData data container package.

If you intend to use large sparse matrices (as is common for single-cell data), it is advisable to install
the Intel Math Kernel Library (e.g. with `conda install mkl`) and the Python `sparse_dot_mkl` package
(`python -m pip install sparse_dot_mkl`) to accelerate sparse matrix operations.
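
Because `sparse_dot_mkl` is optional, it can be useful to confirm that the interpreter can see it before working with large matrices. The following is a minimal sketch using only the standard library; the helper name `has_module` is hypothetical and is not part of inferelator-velocity.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


# Report which of the relevant packages are visible to this interpreter
for pkg in ("inferelator_velocity", "sparse_dot_mkl"):
    status = "available" if has_module(pkg) else "not installed"
    print(f"{pkg}: {status}")
```

Checking with `find_spec` avoids importing the package, so the check stays cheap even when the MKL libraries are slow to load.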

### Usage

#### Assigning genes to new time-dependent transcriptional programs

Load single-cell data into an [AnnData](https://anndata.readthedocs.io/en/latest/) object.
Call `program_select` on the raw, unprocessed integer count data, setting `n_programs` to
the expected number of distinct time-dependent transcriptional programs.

```
import anndata as ad
from inferelator_velocity import program_select

adata = ad.read_h5ad(FILE_NAME)  # FILE_NAME: path to an .h5ad file

program_select(
    adata,          # Anndata object
    layer='counts', # Layer with unprocessed integer count data
    n_programs=2,   # Number of transcriptional programs expected
    verbose=True    # Print additional status messages
)
```

This function will return the same anndata object with new attributes:

```
.var['leiden']: Leiden cluster ID
.var['programs']: Program ID
.uns['programs']: {
    'metric': Metric name,
    'leiden_correlation': Absolute value of spearman rho
        between PC1 of each leiden cluster,
    'metric_genes': Gene labels for distance matrix,
    '{metric}_distance': Distance matrix for {metric},
    'cluster_program_map': Dict mapping gene clusters to gene programs,
    'program_PCs_variance_ratio': Variance explained by program PCs,
    'n_comps': Number of PCs selected by molecular crossvalidation,
    'molecular_cv_loss': Loss values for molecular crossvalidation
}
```
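
The `cluster_program_map` entry connects the two `.var` columns: each Leiden cluster label maps to a program label. The sketch below shows that relationship with plain Python containers. The gene names, the cluster labels, and the assumption that both kinds of label are strings are illustrative only and are not taken from the package.

```python
# Hypothetical stand-ins for adata.var['leiden'] and
# adata.uns['programs']['cluster_program_map']
gene_clusters = {"GENE_A": "0", "GENE_B": "1", "GENE_C": "2", "GENE_D": "1"}
cluster_program_map = {"0": "0", "1": "1", "2": "0"}


def genes_by_program(gene_clusters, cluster_program_map):
    """Group gene names by the program that their cluster maps to."""
    grouped = {}
    for gene, cluster in gene_clusters.items():
        program = cluster_program_map[cluster]
        grouped.setdefault(program, []).append(gene)
    return grouped


programs = genes_by_program(gene_clusters, cluster_program_map)
print(programs)
# {'0': ['GENE_A', 'GENE_C'], '1': ['GENE_B', 'GENE_D']}
```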

#### Assigning genes to existing time-dependent transcriptional programs

Call `assign_genes_to_programs` on an anndata object which `program_select` has already
been run on. This will assign any transcripts to the existing programs based on
mutual information. It is advisable to pass `default_program`, identifying the
transcriptional program to assign transcripts that have low mutual information with
all identified programs (these transcripts are often noise-driven and are best assigned
to whichever program best represents experimental wall clock time).

```
import anndata as ad
from inferelator_velocity import assign_genes_to_programs

adata = ad.read_h5ad(FILE_NAME)  # FILE_NAME: path to an .h5ad file

adata.var['programs'] = assign_genes_to_programs(
    adata,                      # Anndata object
    layer='counts',             # Layer with unprocessed integer count data
    default_program='0',        # 'Default' transcriptional program for low-MI transcripts
    default_threshold=0.1,      # Threshold for low-MI assignment in bits
    verbose=True                # Print additional status message
)
```

This function will return program labels for all transcripts without making
changes to the anndata object; they must be explicitly assigned to an attribute.
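
The roles of `default_program` and `default_threshold` can be illustrated without the package. The sketch below assumes that each transcript already has a mutual information score (in bits) against every program, and that a transcript falls back to the default program when its best score is below the threshold. The scores and the function name `assign_with_default` are hypothetical, and the package's own logic may differ in detail.

```python
def assign_with_default(mi_scores, default_program, default_threshold):
    """Pick the highest-MI program per gene, or the default if all MI is low."""
    labels = {}
    for gene, scores in mi_scores.items():
        best_program = max(scores, key=scores.get)
        if scores[best_program] < default_threshold:
            labels[gene] = default_program
        else:
            labels[gene] = best_program
    return labels


# Hypothetical mutual information (bits) between each gene and each program
mi_scores = {
    "GENE_A": {"0": 0.45, "1": 0.05},
    "GENE_B": {"0": 0.02, "1": 0.30},
    "GENE_C": {"0": 0.03, "1": 0.04},  # low MI with every program
}

labels = assign_with_default(mi_scores, default_program="0", default_threshold=0.1)
print(labels)
# {'GENE_A': '0', 'GENE_B': '1', 'GENE_C': '0'}
```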

#### Assigning time values to individual observations

Call `program_times` on an anndata object which `program_select` has already
been run on. This will embed observations into a low-dimensional space, different
for each transcriptional program, find user-defined anchoring points with real-world
time values, and project cells onto that real-world time trajectory.

```
import anndata as ad
from inferelator_velocity import program_times

adata = ad.read_h5ad(FILE_NAME)  # FILE_NAME: path to an .h5ad file

# Dict that maps programs to experimental or inferred cell groups
# which are stored in a column of the `adata.obs` attribute 

time_metadata = {
    '0': 'Experiment_Obs_Column',
    '1': 'Cell_Cycle_Obs_Column'
}

# Dict that orders cell groups and defines the average time value
# for each group. Each entry is of the format
# {'CLUSTER_ID': ('NEXT_CLUSTER_ID', time_at_first_centroid, time_at_next_centroid)}
# and the overall trajectory may be linear or circular

time_order = {
    '0': {
        '1': ('2', 0, 20),
        '2': ('3', 20, 40),
        '3': ('4', 40, 60)
    },
    '1': {
        'M-G1': ('G1', 7, 22.5),
        'G1': ('S', 22.5, 39.5),
        'S': ('G2', 39.5, 56.5),
        'G2': ('M', 56.5, 77.5), 
        'M': ('M-G1', 77.5, 95)
    }
}

# Optional dict to identify programs where times should wrap
# because the trajectory is circular (like the cell cycle)

time_wrapping = {
    '0': None,
    '1': 88.0
}

program_times(
    adata,                      # Anndata object
    time_metadata,              # Group metadata columns in obs
    time_order,                 # Group ordering and anchoring times
    layer='counts',             # Layer with unprocessed integer count data
    wrap_time=time_wrapping,    # Program wrap times for circular trajectories
    verbose=True                # Print additional status message
)
```

This function will return the same anndata object with the time values and
projections for each transcriptional program stored in new attributes
(shown here for program `0`):

```
.obs['program_0_time']: Assigned time value
.obsm['program_0_pca']: Low-dimensional projection values
```
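
Because each `time_order` entry names the next group, the entries for a program form a chain that either ends (linear) or returns to its start (circular). A small, package-independent sketch can walk that chain and report the total time it spans. The helper `walk_time_order` is hypothetical, and it assumes the first key of each dict is the starting group.

```python
def walk_time_order(order):
    """Follow {'GROUP': ('NEXT_GROUP', t_start, t_end)} links from the first key.

    Returns (groups visited in order, whether the chain is circular,
    total time spanned by all segments).
    """
    start = next(iter(order))
    path, total, current = [], 0.0, start
    while current in order and current not in path:
        next_group, t_start, t_end = order[current]
        path.append(current)
        total += t_end - t_start
        current = next_group
    is_circular = current == start
    if not is_circular and current not in path:
        path.append(current)  # terminal group of a linear trajectory
    return path, is_circular, total


time_order = {
    '0': {'1': ('2', 0, 20), '2': ('3', 20, 40), '3': ('4', 40, 60)},
    '1': {
        'M-G1': ('G1', 7, 22.5),
        'G1': ('S', 22.5, 39.5),
        'S': ('G2', 39.5, 56.5),
        'G2': ('M', 56.5, 77.5),
        'M': ('M-G1', 77.5, 95),
    },
}

for program, order in time_order.items():
    print(program, walk_time_order(order))
# 0 (['1', '2', '3', '4'], False, 60.0)
# 1 (['M-G1', 'G1', 'S', 'G2', 'M'], True, 88.0)
```

For the cell-cycle example the spanned time is 88.0, which matches the wrap time passed for program `'1'`.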

#### Embedding k-nearest neighbors graph

Call `global_graph` on an anndata object. The data provided to this function
should be standardized. The noise2self algorithm will select `k` and `n_pcs`
for the k-NN graph.

```
import anndata as ad
import scanpy as sc
from inferelator_velocity import global_graph

adata = ad.read_h5ad(FILE_NAME)  # FILE_NAME: path to an .h5ad file

sc.pp.normalize_total(adata)
sc.pp.log1p(adata)

global_graph(
    adata,          # Anndata object
    layer="X",      # Layer with standardized float count data
    verbose=True    # Print additional status message
)
```

This function will return the same anndata object with a k-NN graph
added to attributes.

```
.obsp['noise2self_distance_graph']: k-NN graph
.uns['noise2self']: {
    'npcs': Number of principal components used to build distance graph,
    'neighbors': Number of neighbors (k) used to build distance graph
}
```

#### Estimating RNA velocity

Call `calc_velocity` on the expression matrix, per-observation time values, and
k-NN graph taken from an anndata object. The expression data should be standardized
to depth but not otherwise transformed, so that the velocity units are interpretable.
Denoising count data before calling this function may or may not help. The k-NN graph
comes from `global_graph` and the time values from `program_times`.

```
import anndata as ad
import scanpy as sc
from inferelator_velocity import calc_velocity

adata = ad.read_h5ad(FILE_NAME)  # FILE_NAME: path to an .h5ad file
sc.pp.normalize_total(adata)

adata.layers['velocity'] = calc_velocity(
    adata.X,                                # Standardized float count data
    adata.obs['program_0_time'].values,     # Assigned time values
    adata.obsp['noise2self_distance_graph'],  # k-NN graph from global_graph
    wrap_time=None                          # Wrap times for circular trajectories
)
```

This function will return RNA rate of change for all transcripts without making
changes to the anndata object; they must be explicitly assigned to an attribute.
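
One way to picture a rate of change computed from a neighbor graph and per-observation times is as a local regression: for each observation, fit expression against time over its neighbors and take the slope. The sketch below does this for a single gene in plain Python. It illustrates the idea under that assumption and is not the package's implementation; `local_slope` and the toy values are hypothetical.

```python
def local_slope(expression, times, neighbors):
    """Least-squares slope of expression vs. time over one neighborhood."""
    ts = [times[i] for i in neighbors]
    ys = [expression[i] for i in neighbors]
    t_mean = sum(ts) / len(ts)
    y_mean = sum(ys) / len(ys)
    denom = sum((t - t_mean) ** 2 for t in ts)
    if denom == 0:
        # All neighbors share one time value, so no slope can be fit; report 0
        return 0.0
    numer = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    return numer / denom


# Toy data: one gene whose expression rises 2 units per unit of time
times = [0.0, 1.0, 2.0, 3.0, 4.0]
expression = [1.0, 3.0, 5.0, 7.0, 9.0]

# Hypothetical neighbor indices for each observation (including itself)
neighbor_lists = [[0, 1, 2], [0, 1, 2], [1, 2, 3], [2, 3, 4], [2, 3, 4]]

velocity = [local_slope(expression, times, nb) for nb in neighbor_lists]
print(velocity)
# [2.0, 2.0, 2.0, 2.0, 2.0]
```

The result has one value per observation for the gene, in expression units per unit of time, which is why the input should be depth-standardized but not log-transformed.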

#### Bounded estimate of RNA decay

Call `calc_decay_sliding_windows` on expression and velocity data from an anndata object.
This requires times from `program_times` and velocities from `calc_velocity`.

```
import anndata as ad
import numpy as np
from inferelator_velocity import calc_decay_sliding_windows

adata = ad.read_h5ad(FILE_NAME)  # FILE_NAME: path to an .h5ad file

_decay_bound = calc_decay_sliding_windows(
    adata.X,                            # Standardized float count data
    adata.layers['velocity'],           # Velocity data
    adata.obs['program_0_time'].values, # Assigned time values
    centers=np.arange(0, 60),           # Centers for sliding window
    width=1.                            # Width of each window
)

adata.varm['decay_rate'] = np.array(_decay_bound[0]).T
adata.varm['decay_rate_standard_error'] = np.array(_decay_bound[1]).T
```

This function will return decay rate, standard error of decay rate, estimate
of maximum transcription, and the centers for each sliding window.
They must be explicitly assigned to an attribute.
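
The `centers` and `width` arguments define the time windows over which decay is estimated. As a rough sketch of how observations might be grouped, assuming each window covers `center ± width / 2` with inclusive bounds (the package's exact boundary handling is not stated here), the hypothetical helper below lists the observation indices that fall in each window.

```python
def window_members(times, centers, width):
    """Map each window center to the indices of observations inside it."""
    half = width / 2
    return {
        c: [i for i, t in enumerate(times) if c - half <= t <= c + half]
        for c in centers
    }


# Toy per-observation time values, standing in for adata.obs['program_0_time']
times = [0.2, 0.4, 1.1, 1.6, 2.9]

members = window_members(times, centers=[0, 1, 2, 3], width=1.0)
print(members)
# {0: [0, 1], 1: [2], 2: [3], 3: [4]}
```

Windows with few or no observations give little information for a per-window estimate, so it can be worth checking window occupancy before choosing `centers` and `width`.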

#### Denoising data

Call `denoise` on an anndata object. This requires a graph from `global_graph`.

```
import anndata as ad
from inferelator_velocity import denoise

adata = ad.read_h5ad(FILE_NAME)  # FILE_NAME: path to an .h5ad file

denoise(
    adata,      # Anndata object
    layer='X'   # Layer with data to be denoised
)
```

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/flatironinstitute/inferelator-velocity",
    "name": "inferelator-velocity",
    "maintainer": "Chris Jackson",
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": "cj59@nyu.edu",
    "keywords": null,
    "author": "Chris Jackson",
    "author_email": "cj59@nyu.edu",
    "download_url": "https://files.pythonhosted.org/packages/e4/52/fb495b41b3e24a7437d0835880375365f0126b6059ad1f96d098efbfad63/inferelator_velocity-1.2.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Inferelator-Velocity Calcualtes Dynamic Latent Parameters",
    "version": "1.2.0",
    "project_urls": {
        "Homepage": "https://github.com/flatironinstitute/inferelator-velocity"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "06888f45beabec52785bae93ef913345ee55c6d5a5368de50a3005dd3df5e2c9",
                "md5": "9a4efe84adc861f6cc4be1b59fa38066",
                "sha256": "523b59c16d5edc9ad693772a96db9de826fce8f3347452445fb1a69781e8ad3c"
            },
            "downloads": -1,
            "filename": "inferelator_velocity-1.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9a4efe84adc861f6cc4be1b59fa38066",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 53013,
            "upload_time": "2024-08-20T16:43:44",
            "upload_time_iso_8601": "2024-08-20T16:43:44.882999Z",
            "url": "https://files.pythonhosted.org/packages/06/88/8f45beabec52785bae93ef913345ee55c6d5a5368de50a3005dd3df5e2c9/inferelator_velocity-1.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e452fb495b41b3e24a7437d0835880375365f0126b6059ad1f96d098efbfad63",
                "md5": "ba117f06babf963a6ee384ff4375ee12",
                "sha256": "b0a7ac07a00f20ee0baf618f283d4838eee719c4b726633116db0ebbfce296e7"
            },
            "downloads": -1,
            "filename": "inferelator_velocity-1.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "ba117f06babf963a6ee384ff4375ee12",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 44349,
            "upload_time": "2024-08-20T16:43:46",
            "upload_time_iso_8601": "2024-08-20T16:43:46.094248Z",
            "url": "https://files.pythonhosted.org/packages/e4/52/fb495b41b3e24a7437d0835880375365f0126b6059ad1f96d098efbfad63/inferelator_velocity-1.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-20 16:43:46",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "flatironinstitute",
    "github_project": "inferelator-velocity",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "numpy",
            "specs": []
        },
        {
            "name": "scipy",
            "specs": []
        },
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": []
        },
        {
            "name": "anndata",
            "specs": [
                [
                    ">=",
                    "0.8"
                ]
            ]
        },
        {
            "name": "scanpy",
            "specs": []
        },
        {
            "name": "joblib",
            "specs": []
        },
        {
            "name": "inferelator",
            "specs": []
        },
        {
            "name": "tqdm",
            "specs": []
        },
        {
            "name": "matplotlib",
            "specs": []
        },
        {
            "name": "leidenalg",
            "specs": []
        },
        {
            "name": "scself",
            "specs": []
        }
    ],
    "lcname": "inferelator-velocity"
}
        