# Griddify
Redistribute tabular data into a grid for easy visualization and image-based deep learning. This library is greatly inspired by the excellent [MolMap](https://github.com/shenwanxiang/bidd-molmap) library.
## Installation
```bash
git clone https://github.com/ersilia-os/griddify.git
cd griddify
pip install -e .
```
Note that you may have to install a C++ compiler. You can just use conda for that:
```bash
conda install -c conda-forge cxx-compiler
```
## Step by step
### Get a multidimensional dataset and preprocess it
In this example, we will use a dataset of 200 physicochemical [descriptors](https://www.rdkit.org/docs/source/rdkit.Chem.Descriptors.html) calculated for about 10k compounds. You can get these data with the following command.
```python
from griddify import datasets
data = datasets.get_compound_descriptors()
```
It is important that you preprocess your data (impute missing values, normalize, etc.). We provide functionality to do so.
```python
from griddify import Preprocessing
pp = Preprocessing()
pp.fit(data)
data = pp.transform(data)
```
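To make "impute missing values, normalize" concrete, here is a minimal standard-library sketch of one common recipe: median imputation followed by z-score scaling, applied column by column. It is an illustration only; the function name `impute_and_scale` is hypothetical, and the `Preprocessing` class may use different choices internally.

```python
import math
import statistics


def impute_and_scale(rows):
    """Median-impute missing values (None or NaN), then z-score each column.

    Hypothetical helper for illustration; not part of griddify.
    """
    n_cols = len(rows[0])
    columns = []
    for j in range(n_cols):
        col = [row[j] for row in rows]
        # Keep only observed values to estimate the median.
        observed = [v for v in col if v is not None and not math.isnan(v)]
        median = statistics.median(observed)
        filled = [median if (v is None or math.isnan(v)) else v for v in col]
        mean = statistics.mean(filled)
        std = statistics.pstdev(filled)
        # Constant columns carry no signal; map them to zeros instead of dividing by 0.
        columns.append([(v - mean) / std if std > 0 else 0.0 for v in filled])
    # Transpose back so that each inner list is one sample again.
    return [list(sample) for sample in zip(*columns)]


toy = [
    [1.0, 10.0, 5.0],
    [2.0, None, 5.0],
    [3.0, 30.0, 5.0],
]
scaled = impute_and_scale(toy)
```

After this step every column has zero mean, and every non-constant column has unit variance, which keeps features on a comparable scale for the distance calculation that follows.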
### Create a 2D cloud of data features
Start by calculating distances between features.
```python
from griddify import FeatureDistances
fd = FeatureDistances(metric="cosine").calculate(data)
```
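Note that distances are computed between features (columns), not between samples (rows). The sketch below shows what a cosine distance matrix over columns looks like in plain Python. The helper names are hypothetical and this is not necessarily how `FeatureDistances` is implemented; it also assumes that no feature is all zeros.

```python
import math


def cosine_distance(u, v):
    """Cosine distance: 1 minus the cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def feature_distance_matrix(rows):
    """Pairwise cosine distances between the columns of a row-major table."""
    features = list(zip(*rows))  # one tuple per feature (column)
    n = len(features)
    return [
        [cosine_distance(features[i], features[j]) for j in range(n)]
        for i in range(n)
    ]


# Three samples, three features. Feature 1 is twice feature 0,
# and feature 2 is orthogonal to feature 0.
toy = [
    [1.0, 2.0, 1.0],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, -1.0],
]
dist = feature_distance_matrix(toy)
```

The result is a square matrix with one row and one column per feature: proportional features are at distance close to 0, and orthogonal features are at distance 1.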
You can now obtain a 2D cloud of your data features. By default, [UMAP](https://umap-learn.readthedocs.io/en/latest/) is used.
```python
from griddify import Tabular2Cloud
tc = Tabular2Cloud()
tc.fit(fd)
Xc = tc.transform(fd)
```
It is always good to inspect the resulting projection. The cloud contains one point per feature in your dataset.
```python
from griddify.plots import cloud_plot
cloud_plot(Xc)
```
### Rearrange the 2D cloud onto a grid
Distribute cloud points on a grid using a [linear assignment](https://github.com/gatagat/lap) algorithm.
```python
from griddify import Cloud2Grid
cg = Cloud2Grid()
cg.fit(Xc)
Xg = cg.transform(Xc)
```
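As an illustration of the underlying idea, the sketch below solves a tiny assignment problem by brute force: every cloud point gets its own grid cell, and the total squared distance between points and their cells is minimized. All names are hypothetical, and exhaustive search is only feasible for a handful of points; dedicated linear assignment solvers such as the one linked above solve the same problem efficiently. The choice of a square grid with the smallest sufficient side is also an assumption of this sketch.

```python
import itertools
import math


def grid_side(n_points):
    """Smallest side of a square grid with at least n_points cells."""
    side = math.isqrt(n_points)
    return side if side * side >= n_points else side + 1


def rescale(values, upper):
    """Linearly rescale values to the interval [0, upper]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span * upper if span > 0 else 0.0 for v in values]


def assign_to_grid(cloud):
    """Give each 2D point a unique (x, y) grid cell, minimizing total squared distance."""
    n = len(cloud)
    side = grid_side(n)
    # Stretch the cloud over the grid so that both share the same coordinate range.
    xs = rescale([x for x, _ in cloud], side - 1)
    ys = rescale([y for _, y in cloud], side - 1)
    cells = [(gx, gy) for gy in range(side) for gx in range(side)]
    best_cost, best_cells = math.inf, None
    # Exhaustive search over all one-to-one assignments (exponential cost).
    for candidate in itertools.permutations(cells, n):
        cost = sum(
            (x - gx) ** 2 + (y - gy) ** 2
            for x, y, (gx, gy) in zip(xs, ys, candidate)
        )
        if cost < best_cost:
            best_cost, best_cells = cost, candidate
    return list(best_cells), side


cloud = [(0.1, 0.2), (0.9, 0.1), (0.2, 0.8), (1.0, 1.0)]
cells, side = assign_to_grid(cloud)
```

In this toy case each point ends up in the grid corner closest to it, so neighbors in the cloud remain neighbors on the grid.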
You can check the rearrangement with an arrows plot.
```python
from griddify.plots import arrows_plot
arrows_plot(Xc, Xg)
```
To continue with the next steps, it is actually more convenient to get mappings as integers. The following method gives you the size of the grid as well.
```python
mappings, side = cg.get_mappings(Xc)
```
### Rearrange your flat data points into grids
Let's go back to the original tabular data. We want to transform the input data, where each sample is represented as a one-dimensional array, into output data where each sample is represented as an image (i.e. a two-dimensional grid). Please ensure that the data are normalized or scaled.
```python
from griddify import Flat2Grid
fg = Flat2Grid(mappings, side)
Xi = fg.transform(data)
```
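Conceptually, this step writes each feature value into the grid cell assigned to that feature. The sketch below assumes that the mappings are a list of integer `(row, col)` pairs, one per feature, and that unused cells are filled with zeros. Both are assumptions made for illustration; the actual format returned by `get_mappings` and the behavior of `Flat2Grid` may differ.

```python
def flat_to_grid(sample, mappings, side, fill=0.0):
    """Place each feature value of a flat sample into its assigned grid cell.

    Hypothetical helper: `mappings[i]` is assumed to be the (row, col)
    cell of feature i on a side x side grid.
    """
    grid = [[fill] * side for _ in range(side)]
    for value, (row, col) in zip(sample, mappings):
        grid[row][col] = value
    return grid


# Three features on a 2 x 2 grid; the cell (1, 0) stays empty.
mappings = [(0, 0), (1, 1), (0, 1)]
side = 2
sample = [0.5, -1.2, 2.0]
image = flat_to_grid(sample, mappings, side)
# image == [[0.5, 2.0], [0.0, -1.2]]
```

Applying the same mappings to every sample yields images that are directly comparable, because a given cell always holds the same feature.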
Explore one sample.
```python
from griddify.plots import grid_plot
grid_plot(Xi[0])
```
## Full pipeline
You can run the full pipeline described above in only a few lines of code.
```python
from griddify import datasets
from griddify import Griddify
data = datasets.get_compound_descriptors()
gf = Griddify(preprocess=True)
gf.fit(data)
Xi = gf.transform(data)
```
You can find more examples as Jupyter Notebooks in the [notebooks](notebooks) folder.
## Learn more
The [Ersilia Open Source Initiative](https://ersilia.io) is on a mission to strengthen research capacity in low-income countries. Please reach out to us if you want to contribute: [hello@ersilia.io](mailto:hello@ersilia.io)
Raw data
{
"_id": null,
"home_page": "https://github.com/ersilia-os/griddify",
"name": "griddify",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": null,
"keywords": "data-visualization",
"author": "Miquel Duran-Frigola",
"author_email": "miquel@ersilia.io",
"download_url": "https://files.pythonhosted.org/packages/18/b2/36c593e650b7de612c27c45d8dd2e36dd0faa92f5ed52f04d60ec1a1cf9c/griddify-0.0.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Griddify high-dimensional tabular data for easy visualization and deep learning",
"version": "0.0.2",
"project_urls": {
"Homepage": "https://github.com/ersilia-os/griddify",
"Source Code": "https://github.com/ersilia-os/griddify"
},
"split_keywords": [
"data-visualization"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "4f9624c314535a9376b5e91660ffc36e148d3416f32f038d8637ce0a147ce7d5",
"md5": "4160772dda1ed27277c71a8272dfdd06",
"sha256": "95dc6efa546ae7cedd9a54aa870e5c8b75e09e9d049e39e415abc5b2908b349c"
},
"downloads": -1,
"filename": "griddify-0.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "4160772dda1ed27277c71a8272dfdd06",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 20836,
"upload_time": "2024-07-16T12:41:30",
"upload_time_iso_8601": "2024-07-16T12:41:30.545562Z",
"url": "https://files.pythonhosted.org/packages/4f/96/24c314535a9376b5e91660ffc36e148d3416f32f038d8637ce0a147ce7d5/griddify-0.0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "18b236c593e650b7de612c27c45d8dd2e36dd0faa92f5ed52f04d60ec1a1cf9c",
"md5": "4b87422ecaecbbac56211394ec5f7246",
"sha256": "c6f875b1994d9041da87c14ed3d92b294840f9f2c1a5436164ad8eb9e659b51b"
},
"downloads": -1,
"filename": "griddify-0.0.2.tar.gz",
"has_sig": false,
"md5_digest": "4b87422ecaecbbac56211394ec5f7246",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 20965,
"upload_time": "2024-07-16T12:42:11",
"upload_time_iso_8601": "2024-07-16T12:42:11.545768Z",
"url": "https://files.pythonhosted.org/packages/18/b2/36c593e650b7de612c27c45d8dd2e36dd0faa92f5ed52f04d60ec1a1cf9c/griddify-0.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-16 12:42:11",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "ersilia-os",
"github_project": "griddify",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "griddify"
}