# Cross Validation Package
Python package for plug and play cross validation techniques.
If you like the idea or find this repo useful in your work, please leave a star to support this personal project.
* Cross Validation methods:
    * [K-fold](#k-fold);
    * [Leave One Out (LOO)](#leave-one-out-loo);
    * [Leave One Subject Out (LOSO)](#leave-one-subject-out-loso).
At the moment the package is not available using `pip install <PACKAGE-NAME>`.
To install from the source code, click **[here](#installation)**.
Each method returns the confusion matrix and some performance metrics for each iteration and for the overall result.
The performance metrics are:
* Balanced Accuracy;
* F1 Score;
* Matthews Correlation Coefficient.
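As a rough illustration of what these three metrics measure, the sketch below computes them from the four counts of a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical and not part of this package, and the formulas are the standard binary definitions; the sketch assumes both classes appear in the data, so the denominators are non-zero.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Compute the three metrics from binary confusion-matrix counts.

    Hypothetical helper for illustration only; assumes both classes occur.
    """
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    balanced_accuracy = (sensitivity + specificity) / 2
    f1 = 2 * tp / (2 * tp + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"balanced_accuracy": balanced_accuracy, "f1": f1, "mcc": mcc}

# Made-up counts, only to show the call.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
print(metrics)
```

Balanced accuracy and the Matthews correlation coefficient both use all four cells of the confusion matrix, which makes them less sensitive to class imbalance than plain accuracy.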
## K-fold
K-fold cross-validation partitions the dataset into k subsets (folds). At each iteration, one of the k folds is the test set and the remaining k-1 folds form the training set.
The value of k can be chosen according to the amount of available data. Increasing k enlarges the training set and shrinks the test set of each iteration.
Typically, k is set between 5 and 10, which is a good trade-off between a robust validation and computational time.
After a k-fold cross-validation, every sample in the dataset has been tested once, so it is possible to generate a confusion matrix and compute performance metrics to assess the generalization capabilities of your model.
![k-fold-cv-image](images/k-fold-cross-validation.png)
***K-fold cross-validation concept illustration.** Each row represents one iteration of the cross-validation: the blue subsets form the training set and the orange subset is the test set for the i-th iteration.
At the end, every subset has been tested once, so the predictions can be compared with the true outputs of all the instances.*
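The partitioning described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the package's implementation: the helper `kfold_indices` is hypothetical, it uses contiguous folds, and it does not shuffle the data.

```python
def kfold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds.

    Hypothetical helper for illustration; no shuffling is performed.
    """
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

folds = list(kfold_indices(n_samples=10, k=5))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

With 10 samples, k=5 gives training sets of 8 samples and test sets of 2, while k=10 gives training sets of 9 samples and test sets of 1, which is the effect of increasing k mentioned above.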
### Example
```python
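from sklearn.ensemble import RandomForestClassifier
from cross_validation.cross_validation import kfold

# X (features) and y (labels) are your own dataset.
clf = RandomForestClassifier()
cm, perf = kfold(clf, X, y, verbose=True)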
```
## Leave One Out (LOO)
Leave-one-out (LOO) is a special case of k-fold in which k equals the number of data points in the dataset.
This method is suited to datasets with few samples: each model is trained on all the data points except one, which keeps the training set as large as possible, and after the training phase only the held-out point is evaluated by the model.
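To make the procedure concrete, here is a minimal leave-one-out loop on a toy one-dimensional dataset. The function `loo_predictions`, the 1-nearest-neighbour rule and the data are all assumptions chosen for illustration; they stand in for whatever classifier you would pass to the package.

```python
def loo_predictions(X, y):
    """Predict each sample from a 1-nearest-neighbour rule fitted on all others.

    Hypothetical helper for illustration only.
    """
    y_pred = []
    for i in range(len(X)):
        # Training set: every index except the held-out sample i.
        train_idx = [j for j in range(len(X)) if j != i]
        nearest = min(train_idx, key=lambda j: abs(X[j] - X[i]))
        y_pred.append(y[nearest])
    return y_pred

# Toy data: two well-separated groups of 1-D points.
X = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]
y = [0, 0, 0, 1, 1, 1]
y_pred = loo_predictions(X, y)
print(y_pred)
```

The loop runs once per sample, so the cost grows linearly with the dataset size, which is one reason LOO is mostly practical on small datasets.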
### Example
```python
from cross_validation.cross_validation import leave_one_out
clf = RandomForestClassifier()
[cm, perf] = leave_one_out(clf, X, y, verbose=True)
```
## Leave One Subject Out (LOSO)
This method can be seen as a variant of leave-one-out cross-validation. Instead of holding out a single example, it holds out all the examples that belong to a specific subject. The other subjects' instances are used to train the learning algorithm.
The main advantage of LOSO is the removal of subject bias, because all the instances of the held-out subject are in the test set and none of them is used for training.
This cross-validation technique is widely used in the biomedical field, where the main task is often to predict a disease or a condition of a patient using data from other patients.
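The subject-wise split can be sketched as follows. The helper `leave_one_subject_out_indices` and the example labels are hypothetical and only illustrate how the rows are grouped; the sketch assumes `subject_ids` holds one subject label per row of the dataset.

```python
def leave_one_subject_out_indices(subject_ids):
    """Yield (subject, train_idx, test_idx), holding out one subject at a time.

    Hypothetical helper for illustration only.
    """
    # dict.fromkeys removes duplicates while keeping first-seen order.
    for subject in dict.fromkeys(subject_ids):
        test_idx = [i for i, s in enumerate(subject_ids) if s == subject]
        train_idx = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train_idx, test_idx

# One label per row: rows 0-1 belong to s1, row 2 to s2, rows 3-5 to s3.
subject_ids = ["s1", "s1", "s2", "s3", "s3", "s3"]
splits = list(leave_one_subject_out_indices(subject_ids))
for subject, train_idx, test_idx in splits:
    print(subject, "train:", train_idx, "test:", test_idx)
```

The number of iterations equals the number of distinct subjects, and no subject ever contributes rows to both the training and the test set of the same iteration.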
### Example
```python
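from sklearn.ensemble import RandomForestClassifier
from cross_validation.cross_validation import leave_one_subject_out

# X, y are your own dataset; subject_ids identifies the subject of each sample.
clf = RandomForestClassifier()
cm, perf = leave_one_subject_out(clf, X, y, subject_ids, verbose=True)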
```
## Installation
To install from the source code, run one of the following commands in your terminal window:
```
pip install git+<repository-link>
```
or
```
python -m pip install git+<repository-link>
```
or
```
python3 -m pip install git+<repository-link>
```
Raw data
{
"_id": null,
"home_page": "https://github.com/matteo-serafino/cross-validation.git",
"name": "cross-validation-package",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.6.2",
"maintainer_email": "",
"keywords": "cross-validation",
"author": "Matteo Serafino",
"author_email": "matteo.serafino1@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/6e/88/1032d5509016cbe953020beade33497301b5f2082adb983ea3af73317413/cross-validation-package-1.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "",
"version": "1.0.0",
"split_keywords": [
"cross-validation"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "0b0eaa7c54d1650c182a64c820d26370b97b72ebcd96d8e87cfeb992b7894277",
"md5": "91896cecf9ab861333427c540aadc530",
"sha256": "0fabb0690a8f830ce4da0d0bce354fe705722376df4710b87da2f15dea4bf80d"
},
"downloads": -1,
"filename": "cross_validation_package-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "91896cecf9ab861333427c540aadc530",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6.2",
"size": 9191,
"upload_time": "2023-04-10T07:18:54",
"upload_time_iso_8601": "2023-04-10T07:18:54.931807Z",
"url": "https://files.pythonhosted.org/packages/0b/0e/aa7c54d1650c182a64c820d26370b97b72ebcd96d8e87cfeb992b7894277/cross_validation_package-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "6e881032d5509016cbe953020beade33497301b5f2082adb983ea3af73317413",
"md5": "ced6ad1796199598e65042ad21feea24",
"sha256": "b81e684c9e5d5a0e39d44c8257bf0a814934d09953ce4a86bb0abd780a1830f0"
},
"downloads": -1,
"filename": "cross-validation-package-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "ced6ad1796199598e65042ad21feea24",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6.2",
"size": 8372,
"upload_time": "2023-04-10T07:18:56",
"upload_time_iso_8601": "2023-04-10T07:18:56.796919Z",
"url": "https://files.pythonhosted.org/packages/6e/88/1032d5509016cbe953020beade33497301b5f2082adb983ea3af73317413/cross-validation-package-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-04-10 07:18:56",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "matteo-serafino",
"github_project": "cross-validation.git",
"lcname": "cross-validation-package"
}