griddify


| Field | Value |
| --- | --- |
| Name | griddify |
| Version | 0.0.2 |
| Home page | https://github.com/ersilia-os/griddify |
| Summary | Griddify high-dimensional tabular data for easy visualization and deep learning |
| Upload time | 2024-07-16 12:42:11 |
| Author | Miquel Duran-Frigola |
| Requires Python | >=3.6 |
| License | MIT |
| Keywords | data-visualization |

# Griddify
Redistribute tabular data into a grid for easy visualization and image-based deep learning. This library is greatly inspired by the excellent [MolMap](https://github.com/shenwanxiang/bidd-molmap) library.

## Installation

```bash
git clone https://github.com/ersilia-os/griddify.git
cd griddify
pip install -e .
```

Note that you may have to install a C++ compiler. You can just use conda for that:

```bash
conda install -c conda-forge cxx-compiler
```

## Step by step

### Get a multidimensional dataset and preprocess it

In this example, we will use a dataset of 200 physicochemical [descriptors](https://www.rdkit.org/docs/source/rdkit.Chem.Descriptors.html) calculated for about 10k compounds. You can get these data with the following command.

```python
from griddify import datasets

data = datasets.get_compound_descriptors()
```

It is important that you preprocess your data (impute missing values, normalize, etc.). We provide functionality to do so.

```python
from griddify import Preprocessing

pp = Preprocessing()
pp.fit(data)
data = pp.transform(data)
```
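
As a rough illustration of what this kind of preprocessing typically involves, the dependency-free sketch below imputes missing values with the column median and then standardizes each column. It is not griddify's implementation: the helper name `impute_and_standardize`, the median imputation, and the z-scoring are all assumptions made for the example.

```python
import math
import statistics


def impute_and_standardize(rows):
    """Median-impute missing values (None or NaN), then z-score each column.

    Hypothetical helper for illustration only; griddify's Preprocessing
    class may use different strategies.
    """
    def is_missing(v):
        return v is None or math.isnan(v)

    out = [list(r) for r in rows]
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if not is_missing(r[j])]
        median = statistics.median(observed)
        col = [median if is_missing(r[j]) else r[j] for r in rows]
        mean = statistics.fmean(col)
        std = statistics.pstdev(col)
        for i, v in enumerate(col):
            # Constant columns carry no information; map them to zero.
            out[i][j] = (v - mean) / std if std > 0 else 0.0
    return out


# Three samples, two features; one value is missing.
X = [[1.0, 10.0], [2.0, float("nan")], [3.0, 30.0]]
Z = impute_and_standardize(X)
```

After this step every column has zero mean and unit variance, so no single feature dominates the distance calculations that follow.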

### Create a 2D cloud of data features

Start by calculating distances between features.

```python
from griddify import FeatureDistances

fd = FeatureDistances(metric="cosine").calculate(data)
```
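
Note that distances are computed between features (columns), not between samples (rows). The plain-Python sketch below shows the idea for cosine distance, defined as 1 minus cosine similarity. The helper `feature_cosine_distances` is hypothetical and is not part of griddify's API.

```python
import math


def feature_cosine_distances(rows):
    """Return the feature-by-feature cosine distance matrix.

    Assumes no feature is all zeros (its norm would be zero).
    """
    columns = list(zip(*rows))  # transpose: one tuple per feature
    norms = [math.sqrt(sum(v * v for v in col)) for col in columns]
    n = len(columns)
    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            dot = sum(x * y for x, y in zip(columns[a], columns[b]))
            d = 1.0 - dot / (norms[a] * norms[b])
            dist[a][b] = dist[b][a] = d
    return dist


# Three samples, three features: feature 1 is a multiple of feature 0,
# and feature 2 is orthogonal to feature 0.
X = [[1.0, 2.0, 0.0], [0.0, 0.0, 1.0], [1.0, 2.0, 0.0]]
D = feature_cosine_distances(X)
```

The result is a square matrix with one row and one column per feature, which is the kind of input a 2D projection method can work from.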

You can now obtain a 2D cloud of your data features. By default, [UMAP](https://umap-learn.readthedocs.io/en/latest/) is used.

```python
from griddify import Tabular2Cloud

tc = Tabular2Cloud()
tc.fit(fd)
Xc = tc.transform(fd)
```

It is always good to inspect the resulting projection. The cloud contains one point per feature in your dataset.

```python
from griddify.plots import cloud_plot

cloud_plot(Xc)
```

### Rearrange the 2D cloud onto a grid

Distribute cloud points on a grid using a [linear assignment](https://github.com/gatagat/lap) algorithm.

```python
from griddify import Cloud2Grid

cg = Cloud2Grid()
cg.fit(Xc)
Xg = cg.transform(Xc)
```
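
To make the assignment step concrete, the toy sketch below solves a linear assignment problem by brute force: it tries every one-to-one matching of points to grid cells and keeps the one with the smallest total squared distance. This is only feasible for a handful of points; dedicated solvers reach the same optimum far more efficiently. The function name `assign_to_grid` and the unit-square grid layout are assumptions for illustration, not griddify's internals.

```python
from itertools import permutations


def assign_to_grid(points, side):
    """Assign each 2D point to a distinct cell of a side x side grid,
    minimizing the total squared Euclidean distance (brute force).

    Points are assumed to be already scaled to the unit square.
    """
    cells = [(r, c) for r in range(side) for c in range(side)]
    # Cell centers in the unit square, e.g. side=2 gives 0.25 and 0.75.
    centers = [((c + 0.5) / side, (r + 0.5) / side) for r, c in cells]

    def cost(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    best, best_cost = None, float("inf")
    for perm in permutations(range(len(cells)), len(points)):
        total = sum(cost(p, centers[k]) for p, k in zip(points, perm))
        if total < best_cost:
            best, best_cost = perm, total
    return [cells[k] for k in best]


# Four points in the unit square, each close to a different corner.
cloud = [(0.1, 0.1), (0.9, 0.2), (0.2, 0.8), (0.8, 0.9)]
grid = assign_to_grid(cloud, side=2)
```

Because every point receives its own cell, no two features can collide on the grid, which is what distinguishes this step from simply rounding coordinates.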

You can check the rearrangement with an arrows plot.

```python
from griddify.plots import arrows_plot

arrows_plot(Xc, Xg)
```

For the next steps, it is more convenient to get the mappings as integers. The following method also returns the size of the grid.

```python
mappings, side = cg.get_mappings(Xc)
```
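
As a sketch of what integer mappings and a grid size could look like, the example below assumes the grid is the smallest square that can hold all features and that grid positions are cell centers in the unit square. Both are assumptions, as are the helper names `grid_side` and `to_integer_mappings`; the values actually returned by `get_mappings` may be organized differently.

```python
import math


def grid_side(n_features):
    """Side length of the smallest square grid with at least n_features cells."""
    return math.isqrt(n_features - 1) + 1 if n_features > 0 else 0


def to_integer_mappings(unit_coords, side):
    """Convert (x, y) coordinates in [0, 1) to integer (row, col) pairs."""
    return [(int(y * side), int(x * side)) for x, y in unit_coords]


side = grid_side(200)  # the 200 descriptors used in this example
cells = to_integer_mappings([(0.25, 0.25), (0.75, 0.25)], side=2)
```

Under these assumptions, 200 features need a 15 x 15 grid, which leaves 25 cells empty.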

### Rearrange your flat data points into grids

Let's go back to the original tabular data. We want to transform the input data, where each sample is represented as a one-dimensional array, into output data where each sample is represented as an image (i.e. a two-dimensional grid). Please ensure that the data are normalized or scaled.

```python
from griddify import Flat2Grid

fg = Flat2Grid(mappings, side)
Xi = fg.transform(data)
```
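
Conceptually, this transformation writes each feature value into its assigned grid cell. The sketch below assumes the mappings are a list of integer `(row, col)` pairs, one per feature, and that unused cells are filled with zero. The helper `flat_to_grid` is hypothetical and the format griddify uses may differ.

```python
def flat_to_grid(sample, mappings, side, fill=0.0):
    """Place each feature value of a flat sample at its (row, col) grid cell."""
    grid = [[fill] * side for _ in range(side)]
    for value, (row, col) in zip(sample, mappings):
        grid[row][col] = value
    return grid


# Three features mapped onto a 2 x 2 grid; the unused cell keeps the fill value.
example_mappings = [(0, 0), (1, 1), (0, 1)]
image = flat_to_grid([0.5, -1.2, 2.0], example_mappings, side=2)
```

Applying the same mappings to every sample yields a stack of equally sized images, with each feature always occupying the same pixel.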

Explore one sample.

```python
from griddify.plots import grid_plot

grid_plot(Xi[0])
```

## Full pipeline

You can run the full pipeline described above in only a few lines of code.

```python
from griddify import datasets
from griddify import Griddify

data = datasets.get_compound_descriptors()

gf = Griddify(preprocess=True)
gf.fit(data)
Xi = gf.transform(data)
```

You can find more examples as Jupyter Notebooks in the [notebooks](notebooks) folder.

## Learn more

The [Ersilia Open Source Initiative](https://ersilia.io) is on a mission to strengthen research capacity in low-income countries. Please reach out to us if you want to contribute: [hello@ersilia.io](mailto:hello@ersilia.io)

            

        