melinda


Name: melinda
Version: 0.1.1
Home page: https://github.com/HSE-LAMBDA/LINDA
Summary: Synthetic data generation.
Upload time: 2024-08-25 18:54:47
Maintainer: None
Docs URL: None
Author: Mikhail Hushchyn
Requires Python: <4.0,>=3.8
License: MIT
Keywords: synthetic data, augmentation, generative models, tabular data
Requirements: No requirements were recorded.
Travis-CI: No Travis.
Coveralls test coverage: No coveralls.
# Welcome to LINDA

``MELINDA`` is a Python library for generating synthetic tabular data.
It fits a generative model to your real data to learn its
statistical properties, then samples new synthetic rows
from the fitted model.

## Installation
```bash
git clone https://github.com/hse-cs/LINDA.git
cd LINDA
pip install -e .
```
or, from inside the cloned repository, with Poetry:
```bash
poetry install
```

## Basic usage
The following snippet builds a small example dataset, fits a generative model to it, and samples synthetic data.
```python
import numpy as np
import pandas as pd
from melinda.models import ProbaformsSynthesizer
from probaforms.models import CVAE

# generate an example of real data
n = 100
data_real = pd.DataFrame()
data_real['col_1'] = np.random.rand(n)
data_real['col_2'] = np.random.rand(n)
data_real['col_3'] = [str(i) for i in np.random.randint(0, 10, n)]
data_real['col_4'] = [str(i) for i in np.random.randint(0, 5, n)]

num_cols = ['col_1', 'col_2']
cat_cols = ['col_3', 'col_4']
lab_cols = None

# fit a generative model
model = CVAE(latent_dim=10, hidden=(10,), lr=0.001, n_epochs=10)
gen = ProbaformsSynthesizer(model, num_cols, cat_cols, lab_cols, cat_transform='OneHotEncoder')
gen.fit(data_real)

# sample synthetic data
data_synthetic = gen.sample(n_samples=10)
data_synthetic.head()
```
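
A quick way to sanity-check sampled data is to compare simple per-column statistics of the real and synthetic tables. The sketch below uses only pandas and NumPy. `compare_columns` is a hypothetical helper, not part of melinda, and the "synthetic" frame is a random stand-in so the block runs without a trained model. It assumes the sampled table is a DataFrame with the same column names as the input, which the snippet above suggests but does not state.

```python
import numpy as np
import pandas as pd


def compare_columns(real, synthetic, num_cols, cat_cols):
    """Summarise how far a synthetic table is from a real one, per column.

    Hypothetical helper for illustration; not part of melinda.
    """
    rows = []
    for col in num_cols:
        # Absolute difference between the column means.
        gap = abs(real[col].mean() - synthetic[col].mean())
        rows.append({"column": col, "kind": "numeric", "gap": gap})
    for col in cat_cols:
        p = real[col].value_counts(normalize=True)
        q = synthetic[col].value_counts(normalize=True)
        # Align on the union of categories; missing ones get frequency 0.
        p, q = p.align(q, fill_value=0.0)
        # Total variation distance between the category frequencies.
        gap = 0.5 * (p - q).abs().sum()
        rows.append({"column": col, "kind": "categorical", "gap": gap})
    return pd.DataFrame(rows)


rng = np.random.default_rng(0)
real = pd.DataFrame({
    "col_1": rng.random(100),
    "col_3": rng.integers(0, 10, 100).astype(str),
})
# Stand-in for the sampled table: random data with the same schema.
fake = pd.DataFrame({
    "col_1": rng.random(50),
    "col_3": rng.integers(0, 10, 50).astype(str),
})
report = compare_columns(real, fake, ["col_1"], ["col_3"])
print(report)
```

Both gaps are 0 when the two tables are identical; the categorical gap is at most 1. Matching means and category frequencies is a coarse check only and says nothing about correlations between columns.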

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/HSE-LAMBDA/LINDA",
    "name": "melinda",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.8",
    "maintainer_email": null,
    "keywords": "synthetic data, augmentation, generative models, tabular data",
    "author": "Mikhail Hushchyn",
    "author_email": "hushchyn.mikhail@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/57/1f/2e8644b7aa72b3041868c4129e7ac51ee479f4b33aac111d483eeff9120c/melinda-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Synthetic data generation.",
    "version": "0.1.1",
    "project_urls": {
        "Homepage": "https://github.com/HSE-LAMBDA/LINDA",
        "Repository": "https://github.com/HSE-LAMBDA/LINDA"
    },
    "split_keywords": [
        "synthetic data",
        " augmentation",
        " generative models",
        " tabular data"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "adffb2c3049b925429f776efe0306eeae9bc801bd88c2b8bb6ede389575be764",
                "md5": "4c64205214f79bf07e1a3ee2465b4637",
                "sha256": "53b9214565e2a28804f0763e4226494d9bf2acdd8e7abb25e0c4607081cdb85f"
            },
            "downloads": -1,
            "filename": "melinda-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4c64205214f79bf07e1a3ee2465b4637",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.8",
            "size": 7924,
            "upload_time": "2024-08-25T18:54:45",
            "upload_time_iso_8601": "2024-08-25T18:54:45.870965Z",
            "url": "https://files.pythonhosted.org/packages/ad/ff/b2c3049b925429f776efe0306eeae9bc801bd88c2b8bb6ede389575be764/melinda-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "571f2e8644b7aa72b3041868c4129e7ac51ee479f4b33aac111d483eeff9120c",
                "md5": "34a75ead1682d122301291479367ad9e",
                "sha256": "833a8197415302f222918a1e5e20fb7cfad648644f06f61b085125f2a8cf8a6f"
            },
            "downloads": -1,
            "filename": "melinda-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "34a75ead1682d122301291479367ad9e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.8",
            "size": 6550,
            "upload_time": "2024-08-25T18:54:47",
            "upload_time_iso_8601": "2024-08-25T18:54:47.313841Z",
            "url": "https://files.pythonhosted.org/packages/57/1f/2e8644b7aa72b3041868c4129e7ac51ee479f4b33aac111d483eeff9120c/melinda-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-25 18:54:47",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "HSE-LAMBDA",
    "github_project": "LINDA",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "melinda"
}
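
The `digests` entries in the record above can be used to check a downloaded file before installing it. A minimal sketch using only the standard library follows; `sha256_of` and `matches_digest` are hypothetical helper names chosen for illustration, and the file path is whatever location you saved the archive to.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large archives do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_digest(path, expected_hex):
    """True if the file's SHA-256 equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, `expected_hex` would be the `sha256` value listed next to `melinda-0.1.1.tar.gz` in the record.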
        