| Field | Value |
|---|---|
| Name | ucimlrepo |
| Version | 0.0.7 |
| home_page | https://github.com/uci-ml-repo/ucimlrepo |
| Summary | Package to easily import datasets from the UC Irvine Machine Learning Repository into scripts and notebooks. |
| upload_time | 2024-05-21 06:06:41 |
| maintainer | None |
| docs_url | None |
| author | Philip Truong |
| requires_python | >=3.7 |
| license | None |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# `ucimlrepo` package
Package to easily import datasets from the UC Irvine Machine Learning Repository into scripts and notebooks.
**Current Version: 0.0.7**
## Installation
In a Jupyter notebook, install with the command

```
!pip3 install -U ucimlrepo
```

Restart the kernel and import the module `ucimlrepo`. From a terminal, run the same command without the leading `!`.
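To confirm which release is active in the environment after installing and restarting, the standard library's `importlib.metadata` can report the installed distribution version. This is a minimal sketch: the helper name `installed_version` is hypothetical, and `importlib.metadata` requires Python 3.8+, whereas the package itself declares `>=3.7`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "ucimlrepo") -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment/kernel
        return None


if __name__ == "__main__":
    # Prints the installed version string, or None if ucimlrepo is not installed
    print(installed_version())
```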
## Example Usage
```python
from ucimlrepo import fetch_ucirepo, list_available_datasets

# check which datasets can be imported
list_available_datasets()

# import dataset
heart_disease = fetch_ucirepo(id=45)
# alternatively: fetch_ucirepo(name='Heart Disease')

# access data
X = heart_disease.data.features
y = heart_disease.data.targets
# train model e.g. sklearn.linear_model.LinearRegression().fit(X, y)

# access metadata
print(heart_disease.metadata.uci_id)
print(heart_disease.metadata.num_instances)
print(heart_disease.metadata.additional_info.summary)

# access variable info in tabular format
print(heart_disease.variables)
```
## `fetch_ucirepo`
Loads a dataset from the UCI ML Repository, including the dataframes and metadata information.
### Parameters
Provide either a dataset ID or a dataset name as a keyword (named) argument, but not both.
- **`id`**: Dataset ID in the UCI ML Repository
- **`name`**: Dataset name, or a substring of the name
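The rule that exactly one of `id` or `name` may be given can be expressed as a small argument check. The sketch below is hypothetical: `resolve_lookup` is not part of `ucimlrepo`, it only mirrors the documented constraint and argument types, and raising `ValueError` (including when neither argument is given) is an assumption made for illustration.

```python
from typing import Optional, Tuple, Union


def resolve_lookup(
    id: Optional[int] = None, name: Optional[str] = None
) -> Tuple[str, Union[int, str]]:
    """Return ("id", value) or ("name", value), enforcing exactly one keyword argument.

    Hypothetical helper mirroring the documented fetch_ucirepo rule; not library code.
    The parameter `id` shadows the builtin only to match the documented keyword.
    """
    if id is not None and name is not None:
        raise ValueError("Specify either id or name, not both")
    if id is not None:
        # IDs are integers, e.g. 45 for Heart Disease
        if not isinstance(id, int):
            raise ValueError("id must be an integer")
        return ("id", id)
    if name is not None:
        # A name may be the full dataset name or a substring of it
        if not isinstance(name, str):
            raise ValueError("name must be a string")
        return ("name", name)
    raise ValueError("Provide either id or name")
```

For example, `resolve_lookup(id=45)` returns `("id", 45)`, while `resolve_lookup(id=45, name="Heart Disease")` raises `ValueError`.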
### Returns
- **`dataset`**
  - **`data`**: Contains dataset matrices as **pandas** dataframes
    - `ids`: Dataframe of ID columns
    - `features`: Dataframe of feature columns
    - `targets`: Dataframe of target columns
    - `original`: Dataframe consisting of all IDs, features, and targets
    - `headers`: List of all variable names/headers
  - **`metadata`**: Contains metadata information about the dataset
    - See the Metadata section below for details
  - **`variables`**: Contains variable details presented in a tabular/dataframe format
    - `name`: Variable name
    - `role`: Whether the variable is an ID, feature, or target
    - `type`: Data type, e.g. categorical, integer, continuous
    - `demographic`: Indicates whether the variable represents demographic data
    - `description`: Short description of the variable
    - `units`: Variable units for non-categorical data
    - `missing_values`: Whether there are missing values in the variable's column
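Because `variables` records each column's `role`, the `ids`, `features`, and `targets` frames can be understood as column selections from `original`. The pandas sketch below rebuilds that split on a small made-up table; the toy column names and values are hypothetical, and the role labels `"ID"`, `"Feature"`, and `"Target"` are an assumption about how the `role` column is spelled.

```python
import pandas as pd

# Hypothetical stand-ins for dataset.variables and dataset.data.original
variables = pd.DataFrame(
    {
        "name": ["record_id", "age", "chol", "num"],
        "role": ["ID", "Feature", "Feature", "Target"],
    }
)
original = pd.DataFrame(
    {
        "record_id": [1, 2, 3],
        "age": [50, 61, 44],
        "chol": [210, 245, 198],
        "num": [0, 1, 0],
    }
)


def columns_with_role(variables: pd.DataFrame, role: str) -> list:
    """Return variable names whose role equals `role`, in table order."""
    return variables.loc[variables["role"] == role, "name"].tolist()


# Selecting with a list of names always yields a DataFrame, even for one column
ids = original[columns_with_role(variables, "ID")]
features = original[columns_with_role(variables, "Feature")]
targets = original[columns_with_role(variables, "Target")]
headers = list(original.columns)  # analogous to data.headers
```

Selecting by role rather than by position keeps the split correct even if the column order in `original` changes.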
## `list_available_datasets`
Prints a list of datasets that can be imported via `fetch_ucirepo`.
### Parameters
- **`filter`**: Optional keyword argument to filter available datasets based on a category
  - Valid filters: `aim-ahead`
- **`search`**: Optional keyword argument to search for datasets whose name contains the search query
### Returns
None; the list is printed rather than returned.
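The `search` argument is described as matching datasets whose name contains the query. The sketch below illustrates that substring semantics over a small, made-up in-memory catalog. The real function queries the UCI repository and prints its results; `search_catalog` is a hypothetical helper, and the case-insensitive comparison is an assumption, not documented behavior.

```python
from typing import List, Optional, Sequence, Tuple

Entry = Tuple[int, str]  # (dataset ID, dataset name)


def search_catalog(catalog: Sequence[Entry], search: Optional[str] = None) -> List[Entry]:
    """Return entries whose name contains `search` (case-insensitive by assumption)."""
    if not search:
        # No query given: every entry is returned
        return list(catalog)
    query = search.lower()
    return [(uci_id, name) for uci_id, name in catalog if query in name.lower()]


# Illustrative catalog entries only
catalog = [(45, "Heart Disease"), (53, "Iris")]

for uci_id, name in search_catalog(catalog, search="heart"):
    print(f"{name}: {uci_id}")
```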
## Metadata
- `uci_id`: Unique dataset identifier for UCI repository
- `name`: Dataset name
- `abstract`: Short description of the dataset
- `area`: Subject area e.g. life science, business
- `task`: Associated machine learning tasks e.g. classification, regression
- `characteristics`: Dataset types e.g. multivariate, sequential
- `num_instances`: Number of rows or samples
- `num_features`: Number of feature columns
- `feature_types`: Data types of features
- `target_col`: Name of target column(s)
- `index_col`: Name of index column(s)
- `has_missing_values`: Whether the dataset contains missing values
- `missing_values_symbol`: Indicates what symbol represents the missing entries (if the dataset has missing values)
- `year_of_dataset_creation`: Year the dataset was created
- `dataset_doi`: DOI registered for the dataset, linking to its UCI repository page
- `creators`: List of dataset creator names
- `intro_paper`: Information about the dataset's published introductory paper
- `repository_url`: Link to dataset webpage on the UCI repository
- `data_url`: Link to raw data file
- `additional_info`: Descriptive free text about the dataset
  - `summary`: General summary
  - `purpose`: For what purpose was the dataset created?
  - `funding`: Who funded the creation of the dataset?
  - `instances_represent`: What do the instances in this dataset represent?
  - `recommended_data_splits`: Are there recommended data splits?
  - `sensitive_data`: Does the dataset contain data that might be considered sensitive in any way?
  - `preprocessing_description`: Was there any data preprocessing performed?
  - `variable_info`: Additional free text description for variables
  - `citation`: Citation requests/acknowledgements
- `external_url`: URL to an external dataset page. This field exists only for linked datasets, i.e. datasets not hosted by UCI
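The example usage reads nested metadata with attribute syntax (`heart_disease.metadata.additional_info.summary`), and some fields, such as `external_url`, exist only for certain datasets. The sketch below shows that access pattern with a hypothetical dict subclass `AttrDict`; it is not the library's own class, the toy metadata values are made up, and returning `None` for absent keys is a design choice of the sketch.

```python
from typing import Any


class AttrDict(dict):
    """Dict that also allows attribute access; nested dicts are wrapped on read."""

    def __getattr__(self, key: str) -> Any:
        # Only called when normal attribute lookup fails (e.g. not a dict method)
        value = self.get(key)  # None for absent keys such as external_url
        return AttrDict(value) if isinstance(value, dict) else value


# Hypothetical metadata shaped like a few of the fields listed above
metadata = AttrDict(
    {
        "uci_id": 45,
        "name": "Heart Disease",
        "additional_info": {"summary": "Example summary text."},
    }
)

print(metadata.uci_id)
print(metadata.additional_info.summary)
if metadata.external_url is None:
    print("No external_url field: dataset treated as hosted by UCI")
```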
## Links
- [UCI Machine Learning Repository home page](https://archive.ics.uci.edu/)
- [PyPI repository for this package](https://pypi.org/project/ucimlrepo)
- [Submit an issue](https://github.com/uci-ml-repo/ucimlrepo-feedback/issues)
## Raw data

```json
{
  "_id": null,
  "home_page": "https://github.com/uci-ml-repo/ucimlrepo",
  "name": "ucimlrepo",
  "maintainer": null,
  "docs_url": null,
  "requires_python": ">=3.7",
  "maintainer_email": null,
  "keywords": null,
  "author": "Philip Truong",
  "author_email": "Philip Truong <ucirepository@gmail.com>",
  "download_url": "https://files.pythonhosted.org/packages/87/7c/f5a400cc99a5365d153609ebf803084f78b4638b0f7925aa31d9abb62b8e/ucimlrepo-0.0.7.tar.gz",
  "platform": null,
  "bugtrack_url": null,
  "license": null,
  "summary": "Package to easily import datasets from the UC Irvine Machine Learning Repository into scripts and notebooks.",
  "version": "0.0.7",
  "project_urls": {
    "Bug Tracker": "https://github.com/uci-ml-repo/ucimlrepo/issues",
    "Homepage": "https://github.com/uci-ml-repo/ucimlrepo/tree/main"
  },
  "split_keywords": [],
  "urls": [
    {
      "comment_text": "",
      "digests": {
        "blake2b_256": "3b071252560194df2b4fad1cb3c46081b948331c63eb1bb0b97620d508d12a53",
        "md5": "0d2573e037a2139365385e8588dbde52",
        "sha256": "0a5ce7e21d7ec850a0da4427c47f9dd96fcc6532f1c7e95dcec63eeb40f08026"
      },
      "downloads": -1,
      "filename": "ucimlrepo-0.0.7-py3-none-any.whl",
      "has_sig": false,
      "md5_digest": "0d2573e037a2139365385e8588dbde52",
      "packagetype": "bdist_wheel",
      "python_version": "py3",
      "requires_python": ">=3.7",
      "size": 8041,
      "upload_time": "2024-05-21T06:06:39",
      "upload_time_iso_8601": "2024-05-21T06:06:39.826389Z",
      "url": "https://files.pythonhosted.org/packages/3b/07/1252560194df2b4fad1cb3c46081b948331c63eb1bb0b97620d508d12a53/ucimlrepo-0.0.7-py3-none-any.whl",
      "yanked": false,
      "yanked_reason": null
    },
    {
      "comment_text": "",
      "digests": {
        "blake2b_256": "877cf5a400cc99a5365d153609ebf803084f78b4638b0f7925aa31d9abb62b8e",
        "md5": "e4d228c4b01fcea87d2a3a13afa877ef",
        "sha256": "4cff3f9e814367dd60956da999ace473197237b9fce4c07e9a689e77b4ffb59a"
      },
      "downloads": -1,
      "filename": "ucimlrepo-0.0.7.tar.gz",
      "has_sig": false,
      "md5_digest": "e4d228c4b01fcea87d2a3a13afa877ef",
      "packagetype": "sdist",
      "python_version": "source",
      "requires_python": ">=3.7",
      "size": 9369,
      "upload_time": "2024-05-21T06:06:41",
      "upload_time_iso_8601": "2024-05-21T06:06:41.465579Z",
      "url": "https://files.pythonhosted.org/packages/87/7c/f5a400cc99a5365d153609ebf803084f78b4638b0f7925aa31d9abb62b8e/ucimlrepo-0.0.7.tar.gz",
      "yanked": false,
      "yanked_reason": null
    }
  ],
  "upload_time": "2024-05-21 06:06:41",
  "github": true,
  "gitlab": false,
  "bitbucket": false,
  "codeberg": false,
  "github_user": "uci-ml-repo",
  "github_project": "ucimlrepo",
  "travis_ci": false,
  "coveralls": false,
  "github_actions": false,
  "lcname": "ucimlrepo"
}
```
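The `digests` entries in the raw data (`sha256`, `md5`, `blake2b_256`) allow a downloaded distribution file to be checked before installation. This is a minimal sketch using `hashlib`; the helper names and the local file path are hypothetical, and the expected value is the `sha256` recorded for `ucimlrepo-0.0.7-py3-none-any.whl`.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: Path, expected_sha256: str) -> bool:
    """Compare a file's SHA-256 digest with an expected hex string (case-insensitive)."""
    return file_sha256(path) == expected_sha256.lower()


# sha256 recorded for the wheel; the local path below is hypothetical
EXPECTED_WHEEL_SHA256 = "0a5ce7e21d7ec850a0da4427c47f9dd96fcc6532f1c7e95dcec63eeb40f08026"
wheel = Path("ucimlrepo-0.0.7-py3-none-any.whl")
if wheel.exists():
    print("sha256 OK" if matches_digest(wheel, EXPECTED_WHEEL_SHA256) else "sha256 MISMATCH")
```

pip can enforce the same check at install time in hash-checking mode, using `--require-hashes` with `--hash=sha256:<digest>` entries in a requirements file.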