datazets


* Name: datazets
* Version: 1.1.0
* Home page: https://github.com/erdogant/datazets
* Summary: Datazets is a Python package to import well-known example data sets.
* Upload time: 2025-01-17 13:41:40
* Maintainer: None
* Docs URL: None
* Author: Erdogan Taskesen
* Requires Python: >=3
* License: None
* Requirements: pandas, numpy, requests
* Travis-CI: No Travis.
* Coveralls test coverage: No coveralls.

# datazets

[![Python](https://img.shields.io/pypi/pyversions/datazets)](https://img.shields.io/pypi/pyversions/datazets)
[![Pypi](https://img.shields.io/pypi/v/datazets)](https://pypi.org/project/datazets/)
[![Docs](https://img.shields.io/badge/Sphinx-Docs-Green)](https://erdogant.github.io/datazets/)
[![LOC](https://sloc.xyz/github/erdogant/datazets/?category=code)](https://github.com/erdogant/datazets/)
[![Downloads](https://static.pepy.tech/personalized-badge/datazets?period=month&units=international_system&left_color=grey&right_color=brightgreen&left_text=PyPI%20downloads/month)](https://pepy.tech/project/datazets)
[![Downloads](https://static.pepy.tech/personalized-badge/datazets?period=total&units=international_system&left_color=grey&right_color=brightgreen&left_text=Downloads)](https://pepy.tech/project/datazets)
[![License](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/erdogant/datazets/blob/master/LICENSE)
[![Forks](https://img.shields.io/github/forks/erdogant/datazets.svg)](https://github.com/erdogant/datazets/network)
[![Issues](https://img.shields.io/github/issues/erdogant/datazets.svg)](https://github.com/erdogant/datazets/issues)
[![Project Status](http://www.repostatus.org/badges/latest/active.svg)](http://www.repostatus.org/#active)
![GitHub Repo stars](https://img.shields.io/github/stars/erdogant/datazets)
![GitHub repo size](https://img.shields.io/github/repo-size/erdogant/datazets)
[![Donate](https://img.shields.io/badge/Support%20this%20project-grey.svg?logo=github%20sponsors)](https://erdogant.github.io/datazets/pages/html/Documentation.html#)

* ``datazets`` is a Python package to import well-known example data sets.

**Star this repo if you like it! ⭐️**

#### Install datazets from PyPI
```bash
pip install datazets
```

#### Import datazets
```python
# Import the library
import datazets as dz

# Import an example data set by name
df = dz.get('titanic')
```

#### Data sets:


| Dataset Name           | Shape         | Type             | Description |
|------------------------|---------------|------------------|-------------|
| meta                   | (1472, 20)    | Continuous, time | Stock price of Meta |
| bitcoin                | (2522, 2)     | Continuous, time | Bitcoin price history data for time series and price prediction |
| iris                   | (150, 3)      | Continuous       | Classic flower classification dataset with iris species measurements with coordinates |
| gas_prices             | (6556, 2)     | Mixed, time      | Historical gas prices by region for trend analysis |
| ads                    | (10000, 10)   | Discrete         | Data on online ads, covering click-through rates and targeting information |
| sprinkler              | (1000, 4)     | Discrete         | Synthetic dataset with binary variables for rain and sprinkler probability illustration |
| random_discrete        | (1000, 5)     | Discrete         | Synthetic dataset with random discrete variables, useful for probability modeling |
| malicious_urls         | (387588, 2)   | Text             | URLs labeled as malicious or benign, useful in cybersecurity |
| malicious_phish        | (651191, 4)   | Text             | URLs labeled as malicious or benign, defacement, phishing, malware (cybersecurity) |
| stormofswords          | (352, 3)      | Network          | Character data from *A Storm of Swords*, with relationships, traits, and alliance info |
| bigbang                | (9, 3)        | Network          | Data on *The Big Bang Theory* episodes and characters |
| energy                 | (68, 3)       | Network          | Data on building energy consumption |
| auto_mpg               | (392, 8)      | Mixed            | Data on cars with features for predicting miles per gallon |
| breast_cancer          | (569, 30)     | Mixed            | Dataset for breast cancer diagnosis prediction using tumor cell features |
| cancer                 | (4674, 9)     | Mixed            | Cancer patient data for classification and prediction of diagnosis outcome with coordinates |
| census_income          | (32561, 15)   | Mixed            | US Census data with various demographic and economic factors for income prediction |
| elections_rus          | (94487, 23)   | Mixed            | Russian election data with demographic and political attributes |
| elections_usa          | (24611, 8)    | Mixed            | US election data with demographic and political attributes |
| fifa                   | (128, 27)     | Mixed            | FIFA player stats including attributes like skill, position, country, and performance |
| marketing_retail       | (999, 8)      | Mixed            | Retail customer data for behavior and segmentation analysis |
| predictive_maintenance | (10000, 14)   | Mixed            | Industrial equipment data for predictive maintenance |
| student                | (649, 33)     | Mixed            | Data on student performance with socio-demographic and academic factors |
| surfspots              | (9413, 4)     | Mixed, latlon    | Information on global surf spots, with details on location and wave characteristics |
| tips                   | (244, 7)      | Mixed            | Restaurant tipping data with variables on meal size, day, and tip amount |
| titanic                | (891, 12)     | Mixed            | Titanic passenger data with demographic, class, and survival information |
| waterpump              | (59400, 41)   | Mixed            | Water pump data with features for predicting functionality and maintenance needs |
| cat_and_dog            | None          | Image            | Images of cats and dogs for classification and object recognition |
| digits                 | (1083, 65)    | Image            | Handwritten digit images (8x8 pixels) for recognition and classification |
| faces                  | (400, 4097)   | Image            | Images of faces used in facial recognition and feature analysis |
| flowers                | None          | Image            | Various flower images for classification and image recognition |
| img_peaks1             | (930, 930, 3) | Image            | Synthetic peak images for image processing and analysis |
| img_peaks2             | (125, 496, 3) | Image            | Additional synthetic peak images for image processing |
| mnist                  | (1797, 65)    | Image            | MNIST handwritten digit images (28x28 pixels) for classification tasks |
| scenes                 | None          | Image            | Scene images for scene classification tasks |
| southern_nebula        | None          | Image            | Images of the Southern Nebula, suitable for astronomical analysis |
| blobs                  | Custom        | Continuous       | Synthetic data of datapoints in blob shape |
| moons                  | Custom        | Continuous       | Synthetic data of datapoints in moon shape |
| circles                | Custom        | Continuous       | Synthetic data of datapoints in circle shape |
| anisotropic            | Custom        | Continuous       | Synthetic data of datapoints with anisotropic shape |
| globular               | Custom        | Continuous       | Synthetic data of datapoints with globular shape |
| uniform                | Custom        | Continuous       | Synthetic data with uniform shape |
| densities              | Custom        | Continuous       | Synthetic data with different densities |
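
The shapes listed in the table can double as a quick sanity check after loading a data set. The sketch below is only an illustration: `DOCUMENTED_SHAPES` and `matches_documented_shape` are hypothetical names that are not part of datazets, and the dictionary holds just a few shapes copied from the table.

```python
# Hypothetical helper, not part of datazets: compare a loaded shape
# against the shape documented in the table above.
DOCUMENTED_SHAPES = {
    "titanic": (891, 12),
    "tips": (244, 7),
    "auto_mpg": (392, 8),
    "student": (649, 33),
}


def matches_documented_shape(name, shape):
    """Return True if `shape` equals the documented shape for `name`."""
    if name not in DOCUMENTED_SHAPES:
        raise KeyError(f"No documented shape recorded for {name!r}")
    return tuple(shape) == DOCUMENTED_SHAPES[name]


# A literal tuple stands in for `dz.get('titanic').shape`,
# so the sketch runs without downloading anything.
print(matches_documented_shape("titanic", (891, 12)))
```

With datazets installed, the same check would presumably read `matches_documented_shape('titanic', dz.get('titanic').shape)`, assuming the returned object exposes a `shape` attribute as a pandas DataFrame does.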



#### Example:

```python
import datazets as dz

# Import an example data set by name
df = dz.get(data='titanic')
```


```python
import datazets as dz

# Import a data set from a URL
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data'
df = dz.get(url=url, sep=',')
```
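
The two examples above call `dz.get` either with `data=` for a named data set or with `url=` and `sep=` for a remote file. A small wrapper can choose between the two call styles from a single string. This is a minimal sketch: `build_get_kwargs` is a hypothetical helper, not part of datazets, and it uses only the keyword names shown in the examples above.

```python
def build_get_kwargs(source, sep=","):
    """Map a data set name or a URL to keyword arguments for dz.get.

    Hypothetical helper: it only builds the arguments and never calls datazets.
    """
    if source.startswith(("http://", "https://")):
        # Remote file: pass the URL and the column separator.
        return {"url": source, "sep": sep}
    # Otherwise treat the string as the name of an example data set.
    return {"data": source}


print(build_get_kwargs("titanic"))
print(build_get_kwargs(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
))
```

With datazets installed, the result could then be passed on as `df = dz.get(**build_get_kwargs('titanic'))`.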

### Maintainer
* Erdogan Taskesen, github: [erdogant](https://github.com/erdogant)

### Contribute
* All kinds of contributions are welcome!
* If you would like to support this work, you can buy me a [coffee](https://www.buymeacoffee.com/erdogant); it is much appreciated :)

### License
See [LICENSE](https://github.com/erdogant/datazets/blob/master/LICENSE) for details.

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/erdogant/datazets",
    "name": "datazets",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3",
    "maintainer_email": null,
    "keywords": null,
    "author": "Erdogan Taskesen",
    "author_email": "erdogant@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/16/58/d629173d37b4b704656b82299568d182ea12fcb8ecf0eff1468d0703dbe0/datazets-1.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Datazets is a python package to import well known example data sets.",
    "version": "1.1.0",
    "project_urls": {
        "Download": "https://github.com/erdogant/datazets/archive/1.1.0.tar.gz",
        "Homepage": "https://github.com/erdogant/datazets"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4471b7012dee713198a598c836da8c446792ab39235714a389ac4db2417a6bca",
                "md5": "1f7e487a7702a029283ea85bc51c9e73",
                "sha256": "edf21e39c480edcd80c0b1fc4b36f9546c5097a1aa7f9276e97f7e990f1424de"
            },
            "downloads": -1,
            "filename": "datazets-1.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1f7e487a7702a029283ea85bc51c9e73",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3",
            "size": 14409,
            "upload_time": "2025-01-17T13:41:39",
            "upload_time_iso_8601": "2025-01-17T13:41:39.024412Z",
            "url": "https://files.pythonhosted.org/packages/44/71/b7012dee713198a598c836da8c446792ab39235714a389ac4db2417a6bca/datazets-1.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1658d629173d37b4b704656b82299568d182ea12fcb8ecf0eff1468d0703dbe0",
                "md5": "88b149fd27fd4da6e02563e5edd5a7aa",
                "sha256": "27962c727f0c02f370153f183a81fc7e0b33277b95047324c60cae06bad15f99"
            },
            "downloads": -1,
            "filename": "datazets-1.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "88b149fd27fd4da6e02563e5edd5a7aa",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3",
            "size": 14949,
            "upload_time": "2025-01-17T13:41:40",
            "upload_time_iso_8601": "2025-01-17T13:41:40.219539Z",
            "url": "https://files.pythonhosted.org/packages/16/58/d629173d37b4b704656b82299568d182ea12fcb8ecf0eff1468d0703dbe0/datazets-1.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-17 13:41:40",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "erdogant",
    "github_project": "datazets",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "pandas",
            "specs": []
        },
        {
            "name": "numpy",
            "specs": []
        },
        {
            "name": "requests",
            "specs": []
        }
    ],
    "lcname": "datazets"
}
        