# Ivers
This project provides tools for splitting datasets while preserving endpoint distributions, and introduces two novel temporal split techniques: 'leaky' and 'all for free' splits. Both are summarized under Features below.
**Note**: This library was used to generate the data splits in the paper [PlaceHolder](https://github.com/IversOhlsson/ivers).
## Features
- **Temporal Leaky**: Allows for forward-leakage in your data to simulate real-world scenarios where future data might influence the model subtly.
- **Temporal AllForFree**: Provides a stricter temporal separation, ensuring that the training data is entirely independent of the test set, suitable for rigorous testing of model predictions over time.
- **Temporal Fold Split**: Implements a novel approach that successively increases the training set size across multiple folds, following the temporal order of the data.
- **Stratified Endpoint Split**: Maintains a consistent distribution of data across different categories or endpoints in your datasets. This is especially useful where endpoint distributions are critical, such as in cheminformatics and bioinformatics.
- **Cross-Validation Support**: Integrates capabilities to ensure that each cross-validation split maintains endpoint distribution, ideal for developing models that are generalizable across varied data conditions.
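
The fold and stratification ideas listed above can be sketched in plain Python. This is only an illustration under stated assumptions, not the ivers implementation or its API: the function names `expanding_temporal_folds` and `stratified_endpoint_split`, the dictionary record layout, and the choice to stratify on which endpoints have a measured value are all hypothetical.

```python
import random
from collections import defaultdict
from datetime import date


def expanding_temporal_folds(rows, date_key, n_folds):
    """Yield (train, test) pairs whose training set grows fold by fold."""
    ordered = sorted(rows, key=lambda r: r[date_key])
    # Cut the timeline into n_folds + 1 consecutive, roughly equal chunks.
    bounds = [round(i * len(ordered) / (n_folds + 1)) for i in range(n_folds + 2)]
    for k in range(1, n_folds + 1):
        train = ordered[: bounds[k]]               # everything before the cut
        test = ordered[bounds[k] : bounds[k + 1]]  # the next chunk in time
        yield train, test


def stratified_endpoint_split(rows, endpoints, test_fraction, seed=0):
    """Split rows so each endpoint-availability pattern keeps its share."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        # Assumed stratum: which endpoints have a measured (non-None) value.
        pattern = tuple(row.get(e) is not None for e in endpoints)
        groups[pattern].append(row)
    train, test = [], []
    for members in groups.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Made-up example data: 12 monthly records with two partially measured endpoints.
rows = [
    {
        "date": date(2020, i + 1, 1),
        "logD": 1.0 + i if i < 8 else None,
        "solubility": 0.1 * i if i >= 4 else None,
    }
    for i in range(12)
]

for train, test in expanding_temporal_folds(rows, "date", n_folds=3):
    print(len(train), len(test))

train, test = stratified_endpoint_split(rows, ["logD", "solubility"], 0.25)
print(len(train), len(test))
```

In each temporal fold every training record predates every test record, and the stratified split draws the same fraction from each endpoint-availability group, so no group is left out of the test set by chance.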
## Integration with Chemprop
- Setting the `chemprop` variable to `true` makes the library generate features and train-test splits in a form that can be used directly with Chemprop.
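
As a rough picture of what "Chemprop-compatible" output can look like, the sketch below writes a split as a CSV with a header row, the SMILES string in the first column, and one column per target, leaving missing values blank. It assumes that layout is what Chemprop expects for your version, and it does not reproduce what ivers writes when `chemprop` is enabled: the helper `write_chemprop_csv` and the column names are hypothetical.

```python
import csv
import io


def write_chemprop_csv(rows, smiles_key, target_keys, handle):
    """Write rows as CSV: SMILES first, then one column per target."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow([smiles_key, *target_keys])
    for row in rows:
        # Unmeasured endpoints (None) become empty cells.
        targets = ["" if row.get(k) is None else row[k] for k in target_keys]
        writer.writerow([row[smiles_key], *targets])


# Made-up training split with two endpoints, one of them partially measured.
train_rows = [
    {"smiles": "CCO", "logD": -0.31, "solubility": None},
    {"smiles": "c1ccccc1", "logD": 2.13, "solubility": 0.5},
]

buffer = io.StringIO()  # swap for open("train.csv", "w", newline="") to write a file
write_chemprop_csv(train_rows, "smiles", ["logD", "solubility"], buffer)
print(buffer.getvalue())
```

Writing the training and test splits to separate files in this way keeps the split decision outside the modelling tool, so the same files can be reused across experiments.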
## Getting Started or Contributing
To get started with this library, clone the repository and install the required dependencies:
```bash
git clone https://github.com/IversOhlsson/ivers.git
cd ivers
pip install -r requirements.txt
```
## Installation via pip
You can also install the package via pip:
```bash
pip install ivers
```
We welcome contributions! Feel free to open issues or pull requests on our GitHub repository.
## Guide
## Reference
When using this library, please cite the following paper:
```
@article{Ivers_1,
title={PlaceHolder},
author={PlaceHolder},
journal={PlaceHolder},
volume={PlaceHolder},
number={PlaceHolder},
pages={PlaceHolder},
year={PlaceHolder},
publisher={PlaceHolder}
}
```
Raw data
{
"_id": null,
"home_page": "http://github.com/iversohlsson/ivers",
"name": "ivers",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "chemprop chemistry data science dataset splitting stratification temporal splits ivers",
"author": "Philip Ivers Ohlsson",
"author_email": "philip.iversohlsson@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/93/c4/faa97ea68fbdf92f3aef25ba7fd401cc92665ffb6d2ed2154e4ba09b98b3/ivers-0.1.13.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Python package to stratify split datasets based on endpoint distributions, also 2 different temporal splits. Chemprop compatible.",
"version": "0.1.13",
"project_urls": {
"Documentation": "http://github.com/iversohlsson/ivers/docs/_build/html/index.html",
"Homepage": "http://github.com/iversohlsson/ivers",
"Source": "http://github.com/iversohlsson/ivers"
},
"split_keywords": [
"chemprop",
"chemistry",
"data",
"science",
"dataset",
"splitting",
"stratification",
"temporal",
"splits",
"ivers"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "a83f47a30e97c4c53520b69e4512f0e7271502469a34dde77b1f8facdadbc2c7",
"md5": "332ac25a579907df5514b6d3171acf07",
"sha256": "0c7a8599ba1f8bd6e601d4b88b486e22defbfd32d73e4c879c972dfe446e4b99"
},
"downloads": -1,
"filename": "ivers-0.1.13-py3-none-any.whl",
"has_sig": false,
"md5_digest": "332ac25a579907df5514b6d3171acf07",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 18204,
"upload_time": "2024-06-23T17:35:49",
"upload_time_iso_8601": "2024-06-23T17:35:49.104710Z",
"url": "https://files.pythonhosted.org/packages/a8/3f/47a30e97c4c53520b69e4512f0e7271502469a34dde77b1f8facdadbc2c7/ivers-0.1.13-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "93c4faa97ea68fbdf92f3aef25ba7fd401cc92665ffb6d2ed2154e4ba09b98b3",
"md5": "d8b4ce961758d7036fe742b6bbe2a27e",
"sha256": "49f8dbda1ac37d3be665ea5044f2c3b9acb9507eb8054eedb91ecb494929c85c"
},
"downloads": -1,
"filename": "ivers-0.1.13.tar.gz",
"has_sig": false,
"md5_digest": "d8b4ce961758d7036fe742b6bbe2a27e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 14922,
"upload_time": "2024-06-23T17:35:51",
"upload_time_iso_8601": "2024-06-23T17:35:51.113730Z",
"url": "https://files.pythonhosted.org/packages/93/c4/faa97ea68fbdf92f3aef25ba7fd401cc92665ffb6d2ed2154e4ba09b98b3/ivers-0.1.13.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-06-23 17:35:51",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "iversohlsson",
"github_project": "ivers",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "pandas",
"specs": []
},
{
"name": "scikit-learn",
"specs": []
},
{
"name": "numpy",
"specs": []
},
{
"name": "logging",
"specs": []
},
{
"name": "typing",
"specs": []
},
{
"name": "datetime",
"specs": []
},
{
"name": "unittest",
"specs": []
}
],
"lcname": "ivers"
}