# Ivers
Ivers provides tools for splitting datasets while preserving endpoint distributions, and introduces two novel temporal split techniques: 'Leaky' and 'All for Free'. The splits are intended to reflect realistic usage scenarios and to support rigorous model evaluation. The library was used to generate the data splits in the research described in the [linked paper](https://github.com/IversOhlsson/ivers).
## Features
- **Temporal Leaky**: Simulates real-world scenarios by allowing forward leakage in the data, which may subtly influence future models.
- **Temporal AllForFree**: Enforces strict temporal separation, keeping the training data completely independent of the test set, which suits the assessment of long-term model predictions.
- **Temporal Fold Split**: Increases the training set size progressively across multiple folds while respecting temporal order, which helps assess model robustness over time.
- **Stratified Endpoint Split**: Splits the data in a stratified manner so that endpoint distributions stay consistent across the different categories in a dataset, which is useful in fields such as cheminformatics and bioinformatics.
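
To make the Temporal Fold Split idea concrete, here is a minimal standard-library sketch of an expanding-window split. It is not the Ivers implementation: the function name `expanding_temporal_folds`, its parameters, and the record layout are all hypothetical, and the library's own functions may behave differently.

```python
from datetime import date


def expanding_temporal_folds(records, date_key, n_folds):
    """Yield (train, test) lists where train grows with each fold.

    Hypothetical helper, not part of Ivers. Records are sorted by date and
    cut into n_folds + 1 consecutive blocks; fold i trains on every block
    before block i and tests on block i.
    """
    if n_folds < 1:
        raise ValueError("n_folds must be at least 1")
    ordered = sorted(records, key=lambda r: r[date_key])
    n_blocks = n_folds + 1
    if len(ordered) < n_blocks:
        raise ValueError("need at least n_folds + 1 records")
    # Block boundaries, as even as integer division allows.
    bounds = [len(ordered) * i // n_blocks for i in range(n_blocks + 1)]
    for i in range(1, n_blocks):
        train = ordered[:bounds[i]]
        test = ordered[bounds[i]:bounds[i + 1]]
        yield train, test


# Illustrative records only; the dates carry no real meaning.
records = [{"id": k, "date": date(2020, 1, 1 + k)} for k in range(8)]
folds = list(expanding_temporal_folds(records, "date", n_folds=3))
for train, test in folds:
    print(len(train), len(test))
```

Each test block lies strictly after its training data in the sorted order, so later folds reuse earlier test blocks as training data. Records sharing the same date can land on both sides of a boundary in this sketch.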
## Code Functions
The library includes several functions tailored for different splitting strategies:
- `stratify_endpoint`, `stratify_split_and_cv`: Generate train/test and cross-validation splits that respect the endpoint distribution.
- `leaky_endpoint_split`, `allforone_endpoint_split`: Generate a single train/test split using the corresponding temporal strategy.
- `allforone_folds_endpoint_split`, `leaky_folds_endpoint_split`: Generate multiple temporal folds in which the training set grows from fold to fold.
- `balanced_scaffold_cv`: Performs balanced scaffold cross-validation, which improves how representative each split is of the data.
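
The signatures of these functions are not documented here, so the following sketch does not call them. It only illustrates, with the standard library, one way a split can respect endpoint distribution: rows are grouped by which endpoints they have measurements for, and each group is split in the same proportion. The function `stratified_endpoint_split`, its parameters, and the endpoint names `logD` and `solubility` are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_endpoint_split(rows, endpoints, test_fraction=0.2, seed=0):
    """Split rows so every endpoint-availability pattern appears in train
    and test in roughly the same proportion (hypothetical helper)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        # Stratum key: the endpoints that have a measured value in this row.
        key = tuple(ep for ep in endpoints if row.get(ep) is not None)
        strata[key].append(row)
    train, test = [], []
    for key in sorted(strata):
        members = strata[key]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Made-up rows: 10 with only logD, 10 with only solubility, 10 with both.
rows = []
for i in range(10):
    rows.append({"id": f"a{i}", "logD": 1.0, "solubility": None})
for i in range(10):
    rows.append({"id": f"b{i}", "logD": None, "solubility": 2.0})
for i in range(10):
    rows.append({"id": f"c{i}", "logD": 1.5, "solubility": 2.5})

train, test = stratified_endpoint_split(rows, ["logD", "solubility"])
print(len(train), len(test))
```

Stratifying on the availability pattern keeps sparse endpoints from ending up entirely in one subset. A split that also balances the measured values themselves would need binning on top of this.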
## Integration with Chemprop
- Enabling the `chemprop` configuration makes the library generate splits in a form that the Chemprop framework can use directly.
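
The exact output produced by the `chemprop` configuration is not described here. As a rough sketch of what Chemprop-style input looks like, the snippet below writes a train and a test split to CSV files with the SMILES column first and one column per target, leaving missing targets blank. The helper `write_split_csv`, the file names, the column names, and the values are all assumptions, and the layout Ivers actually emits may differ.

```python
import csv
import tempfile
from pathlib import Path


def write_split_csv(rows, path, smiles_column, target_columns):
    """Write rows as a CSV with SMILES first, then one column per target.

    Hypothetical helper, not part of Ivers or Chemprop.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([smiles_column, *target_columns])
        for row in rows:
            # Missing targets become empty cells.
            targets = ["" if row.get(t) is None else row[t] for t in target_columns]
            writer.writerow([row[smiles_column], *targets])


# Made-up example rows; the target values are placeholders, not measurements.
train_rows = [{"smiles": "CCO", "logD": -0.3}, {"smiles": "c1ccccc1", "logD": 2.1}]
test_rows = [{"smiles": "CC(=O)O", "logD": None}]

out_dir = Path(tempfile.mkdtemp())
write_split_csv(train_rows, out_dir / "train.csv", "smiles", ["logD"])
write_split_csv(test_rows, out_dir / "test.csv", "smiles", ["logD"])
```

Writing each subset to its own file keeps the split decision outside the training tool, so the same files can be reused across runs.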
## Getting Started or Contributing
To begin using Ivers, clone the repository and set up the necessary dependencies:
```bash
git clone https://github.com/IversOhlsson/ivers.git
cd ivers
pip install -r requirements.txt
```
## Installation via pip
You can also install the package via pip:
```bash
pip install ivers
```
We welcome contributions! Feel free to open issues or pull requests on our GitHub repository.
## Guide
## Reference
When using this library, please cite the following paper:
```
@article{Ivers_1,
title={PlaceHolder},
author={PlaceHolder},
journal={PlaceHolder},
volume={PlaceHolder},
number={PlaceHolder},
pages={PlaceHolder},
year={PlaceHolder},
publisher={PlaceHolder}
}
```
Raw data
{
"_id": null,
"home_page": "http://github.com/iversohlsson/ivers",
"name": "ivers",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "Philip Ivers Ohlsson",
"author_email": "philip.iversohlsson@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/64/b8/d28771e887c32c5bab8b59362ce64b260bfc96ed345bf972c8379435666c/ivers-0.2.2.tar.gz",
"platform": null,
"description": "\r\nHere's the updated documentation encapsulated in a code block for clarity:\r\n\r\nvbnet\r\nCopy code\r\n# Ivers\r\n\r\nIvers offers a suite of tools designed for managing data splits while maintaining endpoint distributions, and introduces two novel temporal split techniques: 'Leaky' and 'All for Free'. This library ensures that data splits are suitable for realistic scenarios and rigorous testing needs in various applications. It was utilized to generate data splits in the research outlined in the [linked paper](https://github.com/IversOhlsson/ivers).\r\n\r\n## Features\r\n - **Temporal Leaky**: Simulates real-world scenarios by allowing forward-leakage in data, which might subtly influence future models.\r\n - **Temporal AllForFree**: Ensures strict temporal separation, keeping training data completely independent of the test set\u00e2\u20ac\u201dideal for accurate long-term model predictions.\r\n - **Temporal Fold Split**: Progressively increases the training set size across multiple folds, adhering to the temporal sequence, enhancing model robustness over time.\r\n - **Stratified Endpoint Split**: Introduces a stratified approach to splitting, crucial for consistent endpoint distribution across different categories in datasets\u00e2\u20ac\u201dbeneficial in fields like cheminformatics and bioinformatics.\r\n\r\n\r\n## Code Functions\r\nThe library includes several functions tailored for different splitting strategies:\r\n\r\n- `stratify_endpoint`, `stratify_split_and_cv`: These functions generate train/test and cross-validation splits that respect endpoint distribution.\r\n- `leaky_endpoint_split`, `allforone_endpoint_split`: Used for generating a single train/test split with respective temporal dynamics.\r\n- `allforone_folds_endpoint_split`, `leaky_folds_endpoint_split`: Enable multiple sectional splits, increasing training data size consistently.\r\n- `balanced_scaffold_cv`: Supports balanced scaffold cross-validation, enhancing data 
representativeness in splits.\r\n\r\n\r\n## Integration with Chemprop\r\n\r\n- Activating the `chemprop` configuration allows the library to generate splits that are directly compatible with the Chemprop framework, facilitating seamless integration and usage.\r\n\r\n## Getting Started or Contributing\r\n\r\nTo begin using Ivers, clone the repository and set up the necessary dependencies:\r\n\r\n```bash\r\ngit clone https://github.com/IversOhlsson/ivers.git\r\ncd ivers\r\npip install -r requirements.txt\r\n```\r\n\r\n## Installation via pip\r\nYou can also install the package via pip:\r\n```bash\r\npip install ivers\r\n```\r\nWe welcome contributions! Feel free to open issues or pull requests on our GitHub repository.\r\n\r\n## Guide\r\n\r\n## Reference\r\nwhen using this library, please cite the following paper:\r\n```\r\n@article{Ivers_1,\r\n title={PlaceHolder},\r\n author={PlaceHolder},\r\n journal={PlaceHolder},\r\n volume={PlaceHolder},\r\n number={PlaceHolder},\r\n pages={PlaceHolder},\r\n year={PlaceHolder},\r\n publisher={PlaceHolder}\r\n}\r\n```\r\n",
"bugtrack_url": null,
"license": null,
"summary": "Python package to stratify split datasets based on endpoint distributions",
"version": "0.2.2",
"project_urls": {
"Homepage": "http://github.com/iversohlsson/ivers"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "aa402057d587be10adbecc9288a9309a6fd0686ab531c5bc6c32f3a072aed6dd",
"md5": "15f416094bec3d435faac11d9f24dc03",
"sha256": "f917f6e918da74cd9c059d9e12e4a277fef0831d2d24b0be370edcc4f68ee425"
},
"downloads": -1,
"filename": "ivers-0.2.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "15f416094bec3d435faac11d9f24dc03",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 23269,
"upload_time": "2024-09-12T22:48:24",
"upload_time_iso_8601": "2024-09-12T22:48:24.870811Z",
"url": "https://files.pythonhosted.org/packages/aa/40/2057d587be10adbecc9288a9309a6fd0686ab531c5bc6c32f3a072aed6dd/ivers-0.2.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "64b8d28771e887c32c5bab8b59362ce64b260bfc96ed345bf972c8379435666c",
"md5": "9851890624aa19605cf7ad894cad5146",
"sha256": "79d7e9c0543c255402ba52380ef1ec8f0625e90eeada7bc14637c51d21812fa7"
},
"downloads": -1,
"filename": "ivers-0.2.2.tar.gz",
"has_sig": false,
"md5_digest": "9851890624aa19605cf7ad894cad5146",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 18042,
"upload_time": "2024-09-12T22:48:26",
"upload_time_iso_8601": "2024-09-12T22:48:26.355949Z",
"url": "https://files.pythonhosted.org/packages/64/b8/d28771e887c32c5bab8b59362ce64b260bfc96ed345bf972c8379435666c/ivers-0.2.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-12 22:48:26",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "iversohlsson",
"github_project": "ivers",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "pandas",
"specs": []
},
{
"name": "scikit-learn",
"specs": []
},
{
"name": "numpy",
"specs": []
},
{
"name": "logging",
"specs": []
},
{
"name": "typing",
"specs": []
},
{
"name": "datetime",
"specs": []
},
{
"name": "unittest",
"specs": []
}
],
"lcname": "ivers"
}