ucimlrepo


- Name: ucimlrepo
- Version: 0.0.7
- Home page: https://github.com/uci-ml-repo/ucimlrepo
- Upload time: 2024-05-21 06:06:41
- Author: Philip Truong
- Requires Python: >=3.7

# `ucimlrepo` package
Package to easily import datasets from the UC Irvine Machine Learning Repository into scripts and notebooks. 
<br>
**Current Version: 0.0.7**

## Installation
In a Jupyter notebook, install with the command 

    !pip3 install -U ucimlrepo 
    
Restart the kernel and import the module `ucimlrepo`.
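
After restarting the kernel, one way to confirm that the install succeeded is to query the installed version with the standard library. This is only a minimal sketch: `installed_version` is a hypothetical helper name, not part of `ucimlrepo`, and `importlib.metadata` assumes Python 3.8 or later.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None if the package cannot be found.
print(installed_version("ucimlrepo"))
```

A result of `None` suggests that the package was installed into a different environment than the one the notebook kernel is using.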

## Example Usage

    from ucimlrepo import fetch_ucirepo, list_available_datasets

    # check which datasets can be imported
    list_available_datasets()
    
    # import dataset
    heart_disease = fetch_ucirepo(id=45)
    # alternatively: fetch_ucirepo(name='Heart Disease')
    
    # access data
    X = heart_disease.data.features
    y = heart_disease.data.targets
    # train model e.g. sklearn.linear_model.LinearRegression().fit(X, y)
    
    # access metadata
    print(heart_disease.metadata.uci_id)
    print(heart_disease.metadata.num_instances)
    print(heart_disease.metadata.additional_info.summary)
    
    # access variable info in tabular format
    print(heart_disease.variables)



## `fetch_ucirepo`
Loads a dataset from the UCI ML Repository, including the dataframes and metadata information.

### Parameters
Provide either a dataset ID or a dataset name as a keyword (named) argument. The function does not accept both at once.
- **`id`**: Dataset ID for UCI ML Repository
- **`name`**: Dataset name, or substring of name
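
The "one or the other" rule can be checked up front, before any request is made. The sketch below assumes a hypothetical wrapper named `fetch_one`; it is not part of `ucimlrepo`, and how the library itself reacts to invalid arguments is not documented here.

```python
from typing import Any, Optional


def fetch_one(id: Optional[int] = None, name: Optional[str] = None) -> Any:
    """Fetch a dataset by exactly one of `id` or `name` (hypothetical wrapper).

    The parameter names mirror the keyword arguments of `fetch_ucirepo`.
    """
    # True when both are missing or both are given; either case is rejected.
    if (id is None) == (name is None):
        raise ValueError("Pass exactly one of `id` or `name`.")

    # Imported lazily so the argument check works even without the package installed.
    from ucimlrepo import fetch_ucirepo

    if id is not None:
        return fetch_ucirepo(id=id)
    return fetch_ucirepo(name=name)
```

For example, `fetch_one(id=45)` would load the Heart Disease dataset used above, while `fetch_one(id=45, name='Heart Disease')` raises `ValueError` before any network access.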

### Returns
- **`dataset`**
	- **`data`**: Contains dataset matrices as **pandas** dataframes
		- `ids`: Dataframe of ID columns
		- `features`: Dataframe of feature columns
		- `targets`: Dataframe of target columns
		- `original`: Dataframe consisting of all IDs, features, and targets
		- `headers`: List of all variable names/headers
	- **`metadata`**: Contains metadata information about the dataset
		- See Metadata section below for details
	- **`variables`**: Contains variable details presented in a tabular/dataframe format
		- `name`: Variable name
		- `role`: Whether the variable is an ID, feature, or target
		- `type`: Data type e.g. categorical, integer, continuous
		- `demographic`: Indicates whether the variable represents demographic data
		- `description`: Short description of variable
		- `units`: Variable units for non-categorical data
		- `missing_values`: Whether there are missing values in the variable's column
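
The `role` column of `variables` can be used to look up which columns are IDs, features, or targets. The following is a rough sketch that works on plain row dictionaries: `names_by_role` is a hypothetical helper, the sample rows are illustrative stand-ins, and the exact spelling and casing of role values is an assumption, which is why the comparison ignores case.

```python
from typing import Dict, Iterable, List


def names_by_role(records: Iterable[Dict[str, object]], role: str) -> List[str]:
    """Return the names of variables whose `role` matches, ignoring case."""
    wanted = role.strip().lower()
    return [
        str(rec["name"])
        for rec in records
        if str(rec.get("role", "")).strip().lower() == wanted
    ]


# Stand-in rows shaped like the `variables` table; the values are illustrative only.
example_rows = [
    {"name": "age", "role": "Feature", "type": "Integer"},
    {"name": "sex", "role": "Feature", "type": "Categorical"},
    {"name": "num", "role": "Target", "type": "Integer"},
]

print(names_by_role(example_rows, "feature"))  # ['age', 'sex']
print(names_by_role(example_rows, "target"))   # ['num']
```

With a fetched dataset, and assuming `variables` is a pandas dataframe, the rows could be produced with `dataset.variables.to_dict("records")`.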
   

## `list_available_datasets`
Prints a list of datasets that can be imported via `fetch_ucirepo`.
### Parameters
- **`filter`**: Optional keyword argument to filter available datasets based on a category
	- Valid filters: `aim-ahead`
- **`search`**: Optional keyword argument to search datasets whose name contains the search query
### Returns
None. The list is printed, not returned.
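
Because the listing is printed rather than returned, using it programmatically means capturing standard output. Here is a minimal sketch with the standard library; `capture_printed` and `fake_listing` are hypothetical names, and `fake_listing` only stands in for a function that prints and returns nothing.

```python
import contextlib
import io
from typing import Any, Callable


def capture_printed(func: Callable[..., Any], *args: Any, **kwargs: Any) -> str:
    """Call `func` and return everything it printed to stdout as one string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()


def fake_listing(search: str = "") -> None:
    """Stand-in for a function that prints and returns None (not the real API)."""
    print(f"datasets matching {search!r}")


text = capture_printed(fake_listing, search="heart")
print(text.strip())  # datasets matching 'heart'
```

With the package installed and network access available, the same helper could be called as `capture_printed(list_available_datasets, search="heart")` or `capture_printed(list_available_datasets, filter="aim-ahead")`.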


## Metadata 
- `uci_id`: Unique dataset identifier for UCI repository 
- `name`
- `abstract`: Short description of dataset
- `area`: Subject area e.g. life science, business
- `task`: Associated machine learning tasks e.g. classification, regression
- `characteristics`: Dataset types e.g. multivariate, sequential
- `num_instances`: Number of rows or samples
- `num_features`: Number of feature columns
- `feature_types`: Data types of features
- `target_col`: Name of target column(s)
- `index_col`: Name of index column(s)
- `has_missing_values`: Whether the dataset contains missing values
- `missing_values_symbol`: Indicates what symbol represents the missing entries (if the dataset has missing values)
- `year_of_dataset_creation`
- `dataset_doi`: DOI registered for dataset that links to UCI repo dataset page
- `creators`: List of dataset creator names
- `intro_paper`: Information about dataset's published introductory paper
- `repository_url`: Link to dataset webpage on the UCI repository
- `data_url`: Link to raw data file
- `additional_info`: Descriptive free text about dataset
	- `summary`: General summary 
	- `purpose`: For what purpose was the dataset created?
	- `funding`: Who funded the creation of the dataset?
	- `instances_represent`: What do the instances in this dataset represent?
	- `recommended_data_splits`: Are there recommended data splits?
	- `sensitive_data`: Does the dataset contain data that might be considered sensitive in any way?
	- `preprocessing_description`: Was there any data preprocessing performed?
	- `variable_info`: Additional free text description for variables
	- `citation`: Citation Requests/Acknowledgements
- `external_url`: URL to external dataset page. This field only exists for linked datasets, i.e. datasets not hosted by UCI
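
Since some fields, such as `external_url`, exist only for certain datasets, it can help to read metadata defensively. The sketch below assumes attribute-style access as in the usage example; `summarize_metadata` is a hypothetical helper, and the stand-in object holds illustrative values rather than a real API response.

```python
from types import SimpleNamespace
from typing import Any, Dict


def summarize_metadata(metadata: Any) -> Dict[str, Any]:
    """Collect a few metadata fields, using None for any that are absent."""
    additional = getattr(metadata, "additional_info", None)
    return {
        "uci_id": getattr(metadata, "uci_id", None),
        "name": getattr(metadata, "name", None),
        "has_missing_values": getattr(metadata, "has_missing_values", None),
        # Only linked datasets (not hosted by UCI) carry this field.
        "external_url": getattr(metadata, "external_url", None),
        "summary": getattr(additional, "summary", None),
    }


# Stand-in object; a real one would come from fetch_ucirepo(...).metadata.
example = SimpleNamespace(
    uci_id=45,
    name="Heart Disease",
    additional_info=SimpleNamespace(summary="..."),
)

print(summarize_metadata(example)["external_url"])  # None
```

Returning `None` for absent fields keeps downstream code from failing on datasets that lack optional entries.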


## Links
- [UCI Machine Learning Repository home page](https://archive.ics.uci.edu/)
- [PyPI page for this package](https://pypi.org/project/ucimlrepo)
- [Submit an issue](https://github.com/uci-ml-repo/ucimlrepo-feedback/issues)

            
