# datazets


[Documentation](https://erdogant.github.io/datazets/pages/html/Documentation.html#)

* ``datazets`` is a Python package to import well-known example data sets.
#
**Star this repo if you like it! ⭐️**
#
```bash
pip install datazets
```
#### Import datazets
```python
# Import library
import datazets as dz
# Import data set
df = dz.get('titanic')
```
#### Data sets:
| Dataset Name           | Shape          | Type              | Description |
|------------------------|----------------|-------------------|-------------|
| meta                   | (1472, 20)     | Continuous (time) | Stock price of Meta |
| bitcoin                | (2522, 2)      | Continuous (time) | Bitcoin price history data for time series and price prediction |
| iris                   | (150, 3)       | Continuous        | Classic flower classification dataset with iris species measurements with coordinates |
| gas_prices             | (6556, 2)      | Mixed (time)      | Historical gas prices by region for trend analysis |
| ads                    | (10000, 10)    | Discrete          | Data on online ads, covering click-through rates and targeting information |
| sprinkler              | (1000, 4)      | Discrete          | Synthetic dataset with binary variables for rain and sprinkler probability illustration |
| random_discrete        | (1000, 5)      | Discrete          | Synthetic dataset with random discrete variables, useful for probability modeling |
| malicious_urls         | (387588, 2)    | Text              | URLs labeled as malicious or benign, useful in cybersecurity |
| malicious_phish        | (651191, 4)    | Text              | URLs labeled as malicious or benign, defacement, phishing, malware (cybersecurity) |
| stormofswords          | (352, 3)       | Network           | Character data from *A Storm of Swords*, with relationships, traits, and alliance info |
| bigbang                | (9, 3)         | Network           | Data on *The Big Bang Theory* episodes and characters |
| energy                 | (68, 3)        | Network           | Data on building energy consumption |
| auto_mpg               | (392, 8)       | Mixed             | Data on cars with features for predicting miles per gallon |
| breast_cancer          | (569, 30)      | Mixed             | Dataset for breast cancer diagnosis prediction using tumor cell features |
| cancer                 | (4674, 9)      | Mixed             | Cancer patient data for classification and prediction of diagnosis outcome with coordinates |
| census_income          | (32561, 15)    | Mixed             | US Census data with various demographic and economic factors for income prediction |
| elections_rus          | (94487, 23)    | Mixed             | Russian election data with demographic and political attributes |
| elections_usa          | (24611, 8)     | Mixed             | US election data with demographic and political attributes |
| fifa                   | (128, 27)      | Mixed             | FIFA player stats including attributes like skill, position, country, and performance |
| marketing_retail       | (999, 8)       | Mixed             | Retail customer data for behavior and segmentation analysis |
| predictive_maintenance | (10000, 14)    | Mixed             | Industrial equipment data for predictive maintenance |
| student                | (649, 33)      | Mixed             | Data on student performance with socio-demographic and academic factors |
| surfspots              | (9413, 4)      | Mixed (latlon)    | Information on global surf spots, with details on location and wave characteristics |
| tips                   | (244, 7)       | Mixed             | Restaurant tipping data with variables on meal size, day, and tip amount |
| titanic                | (891, 12)      | Mixed             | Titanic passenger data with demographic, class, and survival information |
| waterpump              | (59400, 41)    | Mixed             | Water pump data with features for predicting functionality and maintenance needs |
| cat_and_dog            | None           | Image             | Images of cats and dogs for classification and object recognition |
| digits                 | (1083, 65)     | Image             | Handwritten digit images (8x8 pixels) for recognition and classification |
| faces                  | (400, 4097)    | Image             | Images of faces used in facial recognition and feature analysis |
| flowers                | None           | Image             | Various flower images for classification and image recognition |
| img_peaks1             | (930, 930, 3)  | Image             | Synthetic peak images for image processing and analysis |
| img_peaks2             | (125, 496, 3)  | Image             | Additional synthetic peak images for image processing |
| mnist                  | (1797, 65)     | Image             | MNIST handwritten digit images (28x28 pixels) for classification tasks |
| scenes                 | None           | Image             | Scene images for scene classification tasks |
| southern_nebula        | None           | Image             | Images of the Southern Nebula, suitable for astronomical analysis |
| blobs                  | Custom         | Continuous        | Synthetic data of datapoints in blob shape |
| moons                  | Custom         | Continuous        | Synthetic data of datapoints in moon shape |
| circles                | Custom         | Continuous        | Synthetic data of datapoints in circle shape |
| anisotropic            | Custom         | Continuous        | Synthetic data of datapoints with anisotropic shape |
| globular               | Custom         | Continuous        | Synthetic data of datapoints with globular shape |
| uniform                | Custom         | Continuous        | Synthetic data with uniform shape |
| densities              | Custom         | Continuous        | Synthetic data with different densities |

#### Example:
```python
import datazets as dz
# Import a data set by name
df = dz.get(data='titanic')
```
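A misspelled data set name is an easy mistake to make. The sketch below checks a name against a short list copied by hand from the table in this README before handing it to a loader. The helper `load_checked` and the set `KNOWN_DATASETS` are hypothetical names introduced for illustration only; they are not part of ``datazets``.

```python
from typing import Any, Callable

# A few names copied by hand from the table in this README (not queried
# from the package); extend the set as needed.
KNOWN_DATASETS = {"titanic", "tips", "iris", "bitcoin", "sprinkler", "energy"}


def load_checked(name: str, loader: Callable[[str], Any]) -> Any:
    """Hypothetical helper: call loader(name) only for a known name."""
    if name not in KNOWN_DATASETS:
        options = ", ".join(sorted(KNOWN_DATASETS))
        raise ValueError(f"Unknown data set '{name}'. Known names: {options}")
    return loader(name)
```

With the package installed, `load_checked('titanic', dz.get)` would pass the name straight through to `dz.get`, while a typo such as `'titanik'` raises a `ValueError` before any loading is attempted.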
```python
import datazets as dz
# Import a comma-separated file from a URL
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data'
df = dz.get(url=url, sep=',')
```
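The effect of a separator can be shown without any download. The sketch below uses only the Python standard library and assumes that `sep` plays the role of the field delimiter, as it does in common CSV readers; this is not confirmed for ``datazets`` here. The rows are made up for the illustration and are not taken from the file at the URL above, and `parse_delimited` is a hypothetical helper.

```python
import csv
import io

# Made-up rows in a comma-separated layout (illustration only).
raw = "age,workclass,hours\n39,State-gov,40\n50,Self-emp,13\n"


def parse_delimited(text, sep=","):
    """Split delimited text into a list of dictionaries keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=sep)
    return list(reader)


rows = parse_delimited(raw, sep=",")
print(rows[0]["workclass"])
```

Passing the wrong separator (for example `sep=';'` on this text) leaves every row as a single unsplit field, which is a quick way to spot a mismatched delimiter.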
### Maintainer
* Erdogan Taskesen, github: [erdogant](https://github.com/erdogant)
### Contribute
* All kinds of contributions are welcome!
* If you wish to buy me a [coffee](https://www.buymeacoffee.com/erdogant) for this work, it is much appreciated :)
### Licence
See [LICENSE](LICENSE) for details.
Raw data
{
"_id": null,
"home_page": "https://github.com/erdogant/datazets",
"name": "datazets",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3",
"maintainer_email": null,
"keywords": null,
"author": "Erdogan Taskesen",
"author_email": "erdogant@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/16/58/d629173d37b4b704656b82299568d182ea12fcb8ecf0eff1468d0703dbe0/datazets-1.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Datazets is a python package to import well known example data sets.",
"version": "1.1.0",
"project_urls": {
"Download": "https://github.com/erdogant/datazets/archive/1.1.0.tar.gz",
"Homepage": "https://github.com/erdogant/datazets"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "4471b7012dee713198a598c836da8c446792ab39235714a389ac4db2417a6bca",
"md5": "1f7e487a7702a029283ea85bc51c9e73",
"sha256": "edf21e39c480edcd80c0b1fc4b36f9546c5097a1aa7f9276e97f7e990f1424de"
},
"downloads": -1,
"filename": "datazets-1.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1f7e487a7702a029283ea85bc51c9e73",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3",
"size": 14409,
"upload_time": "2025-01-17T13:41:39",
"upload_time_iso_8601": "2025-01-17T13:41:39.024412Z",
"url": "https://files.pythonhosted.org/packages/44/71/b7012dee713198a598c836da8c446792ab39235714a389ac4db2417a6bca/datazets-1.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "1658d629173d37b4b704656b82299568d182ea12fcb8ecf0eff1468d0703dbe0",
"md5": "88b149fd27fd4da6e02563e5edd5a7aa",
"sha256": "27962c727f0c02f370153f183a81fc7e0b33277b95047324c60cae06bad15f99"
},
"downloads": -1,
"filename": "datazets-1.1.0.tar.gz",
"has_sig": false,
"md5_digest": "88b149fd27fd4da6e02563e5edd5a7aa",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3",
"size": 14949,
"upload_time": "2025-01-17T13:41:40",
"upload_time_iso_8601": "2025-01-17T13:41:40.219539Z",
"url": "https://files.pythonhosted.org/packages/16/58/d629173d37b4b704656b82299568d182ea12fcb8ecf0eff1468d0703dbe0/datazets-1.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-17 13:41:40",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "erdogant",
"github_project": "datazets",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "pandas",
"specs": []
},
{
"name": "numpy",
"specs": []
},
{
"name": "requests",
"specs": []
}
],
"lcname": "datazets"
}