# DataGen-kuma
DataGen-kuma is a library for generating test data.
Given a Pandas DataFrame, it creates similar data with the same schema.
# How It Works
DataGen-kuma takes a DataFrame as input and generates random test data.
Internally, it computes statistical metrics for each data type found in the input.
It then uses these metrics to produce similar data appropriate for each data type.
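As a rough illustration of the first step, the sketch below groups columns by data type using pandas dtype checks. The helper name `classify_column` and the returned labels are hypothetical and are not taken from the DataGen-kuma source; the library's actual classification rules may differ.

```python
import pandas as pd
from pandas.api import types as ptypes


def classify_column(series: pd.Series) -> str:
    """Hypothetical helper: map a column to one of five type labels."""
    # Check categorical first so its category values do not affect later checks
    if isinstance(series.dtype, pd.CategoricalDtype):
        return "category"
    # Check bool before numeric, because pandas treats bool as a numeric dtype
    if ptypes.is_bool_dtype(series):
        return "boolean"
    if ptypes.is_numeric_dtype(series):
        return "numeric"
    if ptypes.is_datetime64_any_dtype(series):
        return "datetime"
    # Anything else (for example free-form strings) falls through
    return "etc"
```

Once each column has a label, per-type statistics (a density estimate, value frequencies, or a date range) can be computed for it.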
## Data Classification and Generation
- Numeric: Numeric data. Generates random values using Kernel Density Estimation (KDE), with `gaussian_kde` from `scipy.stats` as the density estimator.
- Category: Categorical data. Measures the frequency of each value and generates values according to these frequencies.
- Datetime: Date data following the ISO-8601 standard. Converts the values to Pandas Timestamps and generates random values within the date range of the input.
- Boolean: Boolean data. Measures the frequency of each value and generates values according to these frequencies.
- ETC: Any data type not listed above. Generates data by randomly sampling from the given values with replacement.
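The strategies in the list above can be sketched with NumPy, pandas, and SciPy as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the function names are hypothetical, and drawing datetimes uniformly between the minimum and maximum is one plausible reading of "within the given date range".

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)


def sample_numeric(values: pd.Series, n: int) -> np.ndarray:
    """Fit a Gaussian KDE to the observed values and draw n new samples."""
    kde = gaussian_kde(values.dropna().to_numpy(dtype=float))
    # resample returns shape (dims, n); a 1-D input has a single dimension
    return kde.resample(n)[0]


def sample_by_frequency(values: pd.Series, n: int) -> np.ndarray:
    """Draw n values with probabilities equal to the observed frequencies."""
    freqs = values.value_counts(normalize=True)
    return rng.choice(freqs.index.to_numpy(), size=n, p=freqs.to_numpy())


def sample_datetime(values: pd.Series, n: int) -> pd.DatetimeIndex:
    """Draw n timestamps uniformly between the earliest and latest input."""
    ts = pd.to_datetime(values)
    # Timestamp.value is the number of nanoseconds since the Unix epoch
    start, end = ts.min().value, ts.max().value
    return pd.to_datetime(rng.integers(start, end, size=n, endpoint=True))


def sample_etc(values: pd.Series, n: int) -> np.ndarray:
    """Draw n of the given values at random, with replacement."""
    return rng.choice(values.to_numpy(), size=n, replace=True)
```

The frequency-based sampler covers both the Category and Boolean cases, since both are described as reproducing the observed value frequencies.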
# Usage
The following example assumes you already have a Pandas DataFrame named `df`.
It generates 100,000 rows of data.
Iterating over the resulting object gives access to each generated row.
```python
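from datagen_kuma.datagen import DataGen

# Generate 100,000 rows modeled on the existing DataFrame df
datagen = DataGen(df=df, count=100_000)
# Iterating yields an (index, row) pair for each generated row
for idx, row in datagen:
    print(idx, row)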
```
To retrieve the generated DataFrame, use the following:
```python
generated_df = datagen.dataframe
```
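If you do not have a DataFrame at hand, a small input with mixed column types could be built along these lines to serve as `df` in the snippets above. The column names and values are made up for illustration, and how DataGen-kuma classifies each of these columns has not been verified here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 200  # number of source rows; an arbitrary choice for this sketch

df = pd.DataFrame({
    # Continuous numeric column
    "price": rng.normal(100.0, 15.0, size=n),
    # Explicit categorical column
    "grade": pd.Categorical(rng.choice(["A", "B", "C"], size=n)),
    # One timestamp per day starting on 2024-01-01
    "created_at": pd.date_range("2024-01-01", periods=n, freq="D"),
    # Boolean column that is True roughly 70% of the time
    "is_active": rng.random(n) < 0.7,
    # Plain string column
    "note": rng.choice(["ok", "retry", "n/a"], size=n),
})
```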
Raw data
{
"_id": null,
"home_page": "https://github.com/develinu/datagen.git",
"name": "datagen-kuma",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": null,
"keywords": "data generator datagen_kuma pandas dataframe fake",
"author": "devinu",
"author_email": "iwlee.dev@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/1f/18/6d551ddea2e71777257344307f44dde27873326f2c8f5d7d4131ab62b22f/datagen_kuma-0.0.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "DataGen is a library for generating test data.",
"version": "0.0.2",
"project_urls": {
"Homepage": "https://github.com/develinu/datagen.git"
},
"split_keywords": [
"data",
"generator",
"datagen_kuma",
"pandas",
"dataframe",
"fake"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "7b2b99c39acfe200e5d0a5b0b440e2f1adb39bcbaf101eaedc7f10b966fe7755",
"md5": "1c1bb41ed2bacf46b8d017aaeb7ffead",
"sha256": "eae1fa6d0126b09085d616ef2a54e49a9357a6fa889f55bf61d9b7cedd4abe04"
},
"downloads": -1,
"filename": "datagen_kuma-0.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1c1bb41ed2bacf46b8d017aaeb7ffead",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 5050,
"upload_time": "2024-05-29T01:39:54",
"upload_time_iso_8601": "2024-05-29T01:39:54.135966Z",
"url": "https://files.pythonhosted.org/packages/7b/2b/99c39acfe200e5d0a5b0b440e2f1adb39bcbaf101eaedc7f10b966fe7755/datagen_kuma-0.0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "1f186d551ddea2e71777257344307f44dde27873326f2c8f5d7d4131ab62b22f",
"md5": "70cf8e929147893b497847854be3cac9",
"sha256": "96cf0aaea116c61ce5a6f3329c3b5a291910136e349621203e71c4871bab3814"
},
"downloads": -1,
"filename": "datagen_kuma-0.0.2.tar.gz",
"has_sig": false,
"md5_digest": "70cf8e929147893b497847854be3cac9",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 4692,
"upload_time": "2024-05-29T01:39:55",
"upload_time_iso_8601": "2024-05-29T01:39:55.692794Z",
"url": "https://files.pythonhosted.org/packages/1f/18/6d551ddea2e71777257344307f44dde27873326f2c8f5d7d4131ab62b22f/datagen_kuma-0.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-29 01:39:55",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "develinu",
"github_project": "datagen",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "asttokens",
"specs": [
[
"==",
"2.4.1"
]
]
},
{
"name": "colorama",
"specs": [
[
"==",
"0.4.6"
]
]
},
{
"name": "executing",
"specs": [
[
"==",
"2.0.1"
]
]
},
{
"name": "icecream",
"specs": [
[
"==",
"2.1.3"
]
]
},
{
"name": "numpy",
"specs": [
[
"==",
"1.26.4"
]
]
},
{
"name": "pandas",
"specs": [
[
"==",
"2.2.2"
]
]
},
{
"name": "Pygments",
"specs": [
[
"==",
"2.18.0"
]
]
},
{
"name": "python-dateutil",
"specs": [
[
"==",
"2.9.0.post0"
]
]
},
{
"name": "pytz",
"specs": [
[
"==",
"2024.1"
]
]
},
{
"name": "scipy",
"specs": [
[
"==",
"1.13.1"
]
]
},
{
"name": "six",
"specs": [
[
"==",
"1.16.0"
]
]
},
{
"name": "tzdata",
"specs": [
[
"==",
"2024.1"
]
]
},
{
"name": "uv",
"specs": [
[
"==",
"0.2.4"
]
]
}
],
"lcname": "datagen-kuma"
}