ivers


Name: ivers
Version: 0.1.13
Home page: http://github.com/iversohlsson/ivers
Summary: Python package for stratified dataset splitting based on endpoint distributions, plus two different temporal splits. Chemprop compatible.
Upload time: 2024-06-23 17:35:51
Maintainer: None
Docs URL: None
Author: Philip Ivers Ohlsson
Requires Python: None
License: None
Keywords: chemprop chemistry data science dataset splitting stratification temporal splits ivers
Requirements: pandas, scikit-learn, numpy, logging, typing, datetime, unittest

# Ivers


This project offers tools for splitting datasets while preserving endpoint distributions, and presents two novel temporal split techniques: 'leaky' and 'all for free' splits. See the feature list below.

**Note**: This library was used in this paper [PlaceHolder](https://github.com/IversOhlsson/ivers) to generate the data splits.

## Features
  - **Temporal Leaky**: Allows for forward-leakage in your data to simulate real-world scenarios where future data might influence the model subtly.
  - **Temporal AllForFree**: Provides a stricter temporal separation, ensuring that the training data is entirely independent of the test set, suitable for rigorous testing of model predictions over time.
  - **Temporal Fold Split**: Implements a novel approach that successively increases the training set size across multiple folds, following the temporal order of the data.
  - **Stratified Endpoint Split**: Our library introduces a stratified endpoint split, crucial for maintaining a consistent distribution of data across different categories or endpoints in your datasets. It is especially useful where endpoint distributions are critical, such as in cheminformatics and bioinformatics.
  - **Cross-Validation Support**: Integrates capabilities to ensure that each cross-validation split maintains endpoint distribution, ideal for developing models that are generalizable across varied data conditions.
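
The ideas behind the stratified endpoint split and the temporal fold split can be illustrated without the library. The following is a minimal, dependency-free sketch and not the `ivers` API: the function names `stratified_endpoint_split` and `expanding_temporal_folds`, the record layout, and the choice to stratify on which endpoints have a measured value are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from datetime import date


def stratified_endpoint_split(records, endpoints, test_fraction=0.2, seed=0):
    """Split records so that each endpoint-availability pattern appears in
    train and test in roughly the same proportion.

    Hypothetical helper for illustration; not part of the ivers API.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        # Stratum = which endpoints have a measured (non-None) value.
        pattern = tuple(rec.get(ep) is not None for ep in endpoints)
        groups[pattern].append(rec)

    train, test = [], []
    for members in groups.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


def expanding_temporal_folds(records, date_key, n_folds):
    """Yield (train, test) pairs in which the training set grows over time
    and every test record is later than every training record.

    Hypothetical helper for illustration; not part of the ivers API.
    Assumes len(records) >= n_folds + 1 and distinct dates at the cut points.
    """
    ordered = sorted(records, key=lambda rec: rec[date_key])
    chunk = len(ordered) // (n_folds + 1)
    for fold in range(1, n_folds + 1):
        cut = fold * chunk
        # The last fold takes any remainder so that no record is dropped.
        end = len(ordered) if fold == n_folds else cut + chunk
        yield ordered[:cut], ordered[cut:end]


# Toy data: endpoint "a" is always measured, "b" only for even ids.
records = [
    {"id": i, "date": date(2020, 1, 1 + i), "a": 1.0,
     "b": 2.0 if i % 2 == 0 else None}
    for i in range(10)
]

train, test = stratified_endpoint_split(records, ["a", "b"], test_fraction=0.2)
folds = list(expanding_temporal_folds(records, "date", n_folds=4))
```

In this toy run each of the two availability patterns contributes the same share of records to the test set, and the training set of each temporal fold contains only records dated before that fold's test records.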

## Integration with Chemprop

- By setting the `chemprop` variable to `true`, the library will generate splits compatible with the Chemprop library. This ensures that the features and train-test splits are generated in a way that can easily be used with Chemprop.
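
Chemprop reads CSV files that have a header row, a SMILES column, and one column per target. As a rough sketch of what Chemprop-ready split files can look like, the snippet below writes a train and a test file in that layout using only the standard library. The helper `write_chemprop_csv`, the file names, the column names, and the toy values are assumptions for illustration; this section does not document which files `ivers` writes when `chemprop` is enabled.

```python
import csv
import tempfile
from pathlib import Path


def write_chemprop_csv(rows, path, smiles_col, target_cols):
    """Write rows as CSV with the SMILES column first, then one column per
    target. Missing targets (None) become empty cells.

    Hypothetical helper for illustration; not part of the ivers API.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([smiles_col, *target_cols])
        for row in rows:
            values = ["" if row.get(t) is None else row[t] for t in target_cols]
            writer.writerow([row[smiles_col], *values])


# Toy rows with made-up target values, not real measurements.
train_rows = [
    {"smiles": "CCO", "logS": 1.1, "logD": None},
    {"smiles": "c1ccccc1", "logS": -1.6, "logD": 2.1},
]
test_rows = [{"smiles": "CC(=O)O", "logS": 1.2, "logD": -0.2}]

out_dir = Path(tempfile.mkdtemp())
write_chemprop_csv(train_rows, out_dir / "train.csv", "smiles", ["logS", "logD"])
write_chemprop_csv(test_rows, out_dir / "test.csv", "smiles", ["logS", "logD"])
train_text = (out_dir / "train.csv").read_text()
```

Keeping the train and test sets in separate files preserves the split exactly as it was generated, instead of leaving the downstream tool to re-split the data.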

## Getting Started or Contributing

To get started with this library, clone the repository and install the required dependencies:

```bash
git clone https://github.com/IversOhlsson/ivers.git
cd ivers
pip install -r requirements.txt
```

## Installation via pip
You can also install the package via pip:
```bash
pip install ivers
```
We welcome contributions! Feel free to open issues or pull requests on our GitHub repository.

## Guide

## Reference
When using this library, please cite the following paper:
```
@article{Ivers_1,
  title={PlaceHolder},
  author={PlaceHolder},
  journal={PlaceHolder},
  volume={PlaceHolder},
  number={PlaceHolder},
  pages={PlaceHolder},
  year={PlaceHolder},
  publisher={PlaceHolder}
}
```

            

Raw data

            {
    "_id": null,
    "home_page": "http://github.com/iversohlsson/ivers",
    "name": "ivers",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "chemprop chemistry data science dataset splitting stratification temporal splits ivers",
    "author": "Philip Ivers Ohlsson",
    "author_email": "philip.iversohlsson@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/93/c4/faa97ea68fbdf92f3aef25ba7fd401cc92665ffb6d2ed2154e4ba09b98b3/ivers-0.1.13.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Python package to stratify split datasets based on endpoint distributions, also 2 different temporal splits. Chemprop compatible.",
    "version": "0.1.13",
    "project_urls": {
        "Documentation": "http://github.com/iversohlsson/ivers/docs/_build/html/index.html",
        "Homepage": "http://github.com/iversohlsson/ivers",
        "Source": "http://github.com/iversohlsson/ivers"
    },
    "split_keywords": [
        "chemprop",
        "chemistry",
        "data",
        "science",
        "dataset",
        "splitting",
        "stratification",
        "temporal",
        "splits",
        "ivers"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a83f47a30e97c4c53520b69e4512f0e7271502469a34dde77b1f8facdadbc2c7",
                "md5": "332ac25a579907df5514b6d3171acf07",
                "sha256": "0c7a8599ba1f8bd6e601d4b88b486e22defbfd32d73e4c879c972dfe446e4b99"
            },
            "downloads": -1,
            "filename": "ivers-0.1.13-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "332ac25a579907df5514b6d3171acf07",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 18204,
            "upload_time": "2024-06-23T17:35:49",
            "upload_time_iso_8601": "2024-06-23T17:35:49.104710Z",
            "url": "https://files.pythonhosted.org/packages/a8/3f/47a30e97c4c53520b69e4512f0e7271502469a34dde77b1f8facdadbc2c7/ivers-0.1.13-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "93c4faa97ea68fbdf92f3aef25ba7fd401cc92665ffb6d2ed2154e4ba09b98b3",
                "md5": "d8b4ce961758d7036fe742b6bbe2a27e",
                "sha256": "49f8dbda1ac37d3be665ea5044f2c3b9acb9507eb8054eedb91ecb494929c85c"
            },
            "downloads": -1,
            "filename": "ivers-0.1.13.tar.gz",
            "has_sig": false,
            "md5_digest": "d8b4ce961758d7036fe742b6bbe2a27e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 14922,
            "upload_time": "2024-06-23T17:35:51",
            "upload_time_iso_8601": "2024-06-23T17:35:51.113730Z",
            "url": "https://files.pythonhosted.org/packages/93/c4/faa97ea68fbdf92f3aef25ba7fd401cc92665ffb6d2ed2154e4ba09b98b3/ivers-0.1.13.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-06-23 17:35:51",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "iversohlsson",
    "github_project": "ivers",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "pandas",
            "specs": []
        },
        {
            "name": "scikit-learn",
            "specs": []
        },
        {
            "name": "numpy",
            "specs": []
        },
        {
            "name": "logging",
            "specs": []
        },
        {
            "name": "typing",
            "specs": []
        },
        {
            "name": "datetime",
            "specs": []
        },
        {
            "name": "unittest",
            "specs": []
        }
    ],
    "lcname": "ivers"
}
        