dnattend

Name	dnattend
Version	0.2.3
Summary	AutoML classifier for predicting patient non-attendance (DNA)
upload_time	2022-12-05 17:03:58
docs_url	None
author	Stephen Richer
requires_python	>=3.7
license	MIT License Copyright (c) 2022 NHSX Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
keywords	dnattend

# DNAttend - ML framework for predicting patient non-attendance

## Train, test and validate a CatBoost Classifier for predicting patient non-attendance (DNA)

[![status: experimental](https://github.com/GIScience/badges/raw/master/status/experimental.svg)](https://github.com/GIScience/badges#experimental)
![build: status](https://github.com/nhsx/dna-risk-predict/actions/workflows/tests.yaml/badge.svg)

## Table of contents

  * [Installation](#installation)
  * [Workflow](#workflow)
  * [Usage](#usage)
    * [Generate Example Data](#generate-example-data)
    * [Train Model](#train-model)
    * [Evaluate Model](#evaluate-model)
    * [Refit Model with All Data](#refit-model-with-all-data)
    * [Generate Predictions](#generate-predictions)
  * [Example Workflow Verification](#example-workflow-verification)
  * [Configuration](#configuration)
  * [Further Documentation](#further-documentation)
  * [Contributing](#contributing)
  * [License](#license)
  * [Contact](#contact)


## Installation
Installation is possible via `pip` as shown below.

Unix/macOS
```bash
python3 -m pip install dnattend
```

Windows
```bash
py -m pip install dnattend
```

#### Install within a Virtual Environment (optional)
<details>
<summary><strong>Unix/macOS</strong></summary>

```bash
python3 -m venv dnattend
source dnattend/bin/activate
python3 -m pip install dnattend
```
</details>

<details>
<summary><strong>Windows</strong></summary>

```bash
py -m venv dnattend
dnattend/Scripts/Activate.ps1
py -m pip install dnattend
```

If running scripts is disabled on your system then run the following command before activating your environment.

```bash
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
```
</details>


## Workflow

![workflow](https://github.com/nhsx/dna-risk-predict/blob/main/README_files/DNApredictSimpleFlowchart.png?raw=true)
 <br> *Overview of DNAttend workflow*

Refer to the [additional documentation](./README_files/docs.md) for further details of the underlying classifier framework.

## Usage
The following sections document the built-in example workflow provided.
It is recommended that users follow this workflow to verify proper installation.

### Generate Example Data
The `simulate` sub-command generates suitably formatted input data for testing functionality.
It also writes an example config file in YAML format.
Both of these output files can serve as templates for building real-world models.

```bash
dnattend simulate --config config.yaml DNAttend-example.csv
```

### Train Model
DNAttend trains two models independently: a baseline logistic regression model and a CatBoost model.
The baseline model is a simple model that acts as a reference for assessing the performance improvement offered by CatBoost.
Refer to the [additional documentation](./README_files/docs.md) for further details of the model workflow.

```bash
dnattend train config.yaml
```

### Evaluate Model
Following initial training, the `dnattend test` command can be used to assess performance of both the logistic regression and CatBoost models against the hold-out testing data set.
Refer to the [additional documentation](./README_files/docs.md) for example output visualisation and performance metrics.

```bash
dnattend test config.yaml
```
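As an illustration of how two models can be compared on a hold-out set, the sketch below computes ROC AUC in plain Python. This is not DNAttend's implementation, and ROC AUC is only one common metric; which metrics `dnattend test` actually reports is described in the additional documentation. The function name `roc_auc` and all labels and scores are hypothetical toy values.

```python
def roc_auc(y_true, scores, positive=1):
    """ROC AUC as the chance that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("Both classes must be present to compute ROC AUC.")
    # Count pairwise wins; a tied pair contributes half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy hold-out labels (1 = DNA, 0 = Attend) and predicted DNA probabilities.
y_true = [1, 0, 1, 1, 0, 0]
baseline_scores = [0.6, 0.5, 0.4, 0.7, 0.3, 0.6]   # hypothetical logistic output
candidate_scores = [0.8, 0.2, 0.6, 0.9, 0.1, 0.7]  # hypothetical CatBoost output

auc_baseline = roc_auc(y_true, baseline_scores)
auc_candidate = roc_auc(y_true, candidate_scores)
print(f"baseline AUC:  {auc_baseline:.3f}")
print(f"candidate AUC: {auc_candidate:.3f}")
```

A higher value for the candidate than for the baseline on the same hold-out data is the kind of evidence that would support choosing the more complex model.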

### Refit Model with All Data
The previous steps have trained two models: a baseline logistic regression model and a more advanced CatBoost model.
Following parameterisation and assessment of model performance, a final model can be retrained using the entire data set.
The user may build a logistic regression or CatBoost model depending on the performance metrics.
This choice must be specified by the user in the `finalModel:` option of the configuration file.

```bash
dnattend retrain config.yaml
```

### Generate Predictions
The trained model is now ready to be used.
Predictions should be made with the `predict` sub-command, which ensures the tuned decision threshold is correctly applied when assigning classes.
The output of `predict` includes the decision class (i.e. `Attend` or `DNA`) and the underlying probabilities of these classes.
The output results of this example can be found [here](./README_files/example-data-predictions.csv).

```bash
dnattend predict --verify DNAttend-example.csv catboost-final.pkl > FinalPredictions.csv
```

**Note: the `--verify` flag is only required when running the example workflow ([see below](#example-workflow-verification)).**
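To make the role of a tuned decision threshold concrete, the sketch below picks the threshold that maximises F1 on toy validation data and then uses it to assign classes. It is a minimal illustration under assumptions, not DNAttend's code: the function names, the candidate grid, the `>=` comparison and all values are hypothetical.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class, computed from confusion counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_threshold(y_true, p_dna, candidates):
    """Return (threshold, F1) with the highest F1; ties keep the first candidate."""
    best_t, best_f1 = None, -1.0
    for t in candidates:
        y_pred = [1 if p >= t else 0 for p in p_dna]
        score = f1_score(y_true, y_pred)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


def assign_class(p_dna, threshold):
    """Map a DNA probability to a decision class."""
    return "DNA" if p_dna >= threshold else "Attend"


# Toy validation data: 1 = DNA, 0 = Attend (illustrative values only).
y_val = [0, 0, 1, 0, 1, 1]
p_val = [0.05, 0.20, 0.30, 0.35, 0.60, 0.80]
candidates = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9

best_t, best_f1 = tune_threshold(y_val, p_val, candidates)
print(f"tuned threshold: {best_t}, F1: {best_f1:.3f}")

# The same probability can receive a different class under the default 0.5.
print(assign_class(0.35, best_t), assign_class(0.35, 0.5))
```

The final line shows why the threshold matters: calling a model's raw predict method with an implicit 0.5 cut-off could assign a different class from the one chosen with the tuned value.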

## Example Workflow Verification
Following initial installation, it is recommended that users run the example workflow, as described, to verify that the pipeline is functioning as expected.
The `--verify` flag of `dnattend predict`, as shown above, checks the results against the expected output and reports whether they match.
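The idea of checking results against an expected output can be sketched as a row-by-row comparison of two CSV texts. This is only an illustration of the concept and not how `--verify` is implemented; the function name, column names and values below are hypothetical.

```python
import csv
import io


def predictions_match(produced_csv, expected_csv):
    """Return True when both CSV texts contain identical rows."""
    produced = list(csv.reader(io.StringIO(produced_csv)))
    expected = list(csv.reader(io.StringIO(expected_csv)))
    return produced == expected


# Hypothetical column names and values, for illustration only.
expected = "class,Attend,DNA\nAttend,0.9,0.1\nDNA,0.3,0.7\n"
produced_ok = "class,Attend,DNA\nAttend,0.9,0.1\nDNA,0.3,0.7\n"
produced_bad = "class,Attend,DNA\nAttend,0.9,0.1\nAttend,0.6,0.4\n"

for name, text in [("ok", produced_ok), ("bad", produced_bad)]:
    status = "matches" if predictions_match(text, expected) else "differs from"
    print(f"{name}: output {status} the expected result")
```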

## Configuration
DNAttend uses a single configuration file, in YAML, which documents all model parameters and ensures reproducibility of the analysis.
The `dnattend simulate` command writes an example documented configuration file that the user can use as a template.
A copy of this file is shown below and available to download [here](./README_files/config.yaml).

```YAML
input: DNAttend-example.csv    # Path to input data (Mandatory).
target: status                 # Column name of target (Mandatory).
DNAclass: 1                    # Value of target corresponding to DNA.
out: .                         # Output directory to save results.
finalModel: catboost           # Method to train final model (catboost or logistic).
catCols:                       # Column names of categorical features.
    - day
    - priority
    - speciality
    - consultationMedia
    - site
boolCols:                      # Column names of boolean features.
    - firstAppointment
numericCols:                   # Column names of numeric features.
    - age
train_size: 0.7                # Proportion of data for training.
test_size: 0.15                # Proportion of data for testing.
val_size: 0.15                 # Proportion of data for validation.
tuneThresholdBy: f1            # Metric to tune decision threshold (f1 or roc).
cvFolds: 5                     # Hyper-tuning cross-validations.
catboostIterations: 100        # Hyper-tuning CatBoost iterations.
hypertuneIterations: 5         # Hyper-tuning parameter samples.
evalIterations: 10_000         # Upper-limit over-fit iterations.
earlyStoppingRounds: 10        # Over-fit detection early stopping rounds.
seed: 42                       # Seed to ensure workflow reproducibility.
```
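The three split proportions above sum to 1, and the seed fixes the outcome of any random step. The sketch below shows one way such settings could be checked and applied to produce a reproducible three-way split. It assumes a simple shuffled, unstratified split; how DNAttend actually partitions the data is not described here, and the function names are hypothetical.

```python
import random

# Values copied from the example configuration.
config = {"train_size": 0.7, "test_size": 0.15, "val_size": 0.15, "seed": 42}


def check_split_sizes(cfg, tol=1e-9):
    """Raise ValueError unless the three proportions sum to 1."""
    total = cfg["train_size"] + cfg["test_size"] + cfg["val_size"]
    if abs(total - 1.0) > tol:
        raise ValueError(f"Split sizes sum to {total}, expected 1.0")


def three_way_split(rows, cfg):
    """Shuffle with the configured seed, then slice into train/test/validation."""
    rows = list(rows)
    random.Random(cfg["seed"]).shuffle(rows)
    n_train = round(len(rows) * cfg["train_size"])
    n_test = round(len(rows) * cfg["test_size"])
    train = rows[:n_train]
    test = rows[n_train:n_train + n_test]
    val = rows[n_train + n_test:]
    return train, test, val


check_split_sizes(config)
train, test, val = three_way_split(range(100), config)
print(len(train), len(test), len(val))

# Re-running with the same seed yields the same partition.
assert three_way_split(range(100), config) == (train, test, val)
```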


## Further Documentation
Refer to the [additional documentation](./README_files/docs.md) for further technical details of the modeling framework and visualisations from the example data set.


## Contributing
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.

1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request


## License
Distributed under the MIT License. _See [LICENSE](./LICENSE) for more information._


## Contact
If you have any other questions please contact the author, [Stephen Richer](mailto:stephen.richer@proton.me?subject=[GitHub]%20dnattend).

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "dnattend",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "dnattend",
    "author": "Stephen Richer",
    "author_email": "stephen.richer@proton.me",
    "download_url": "https://files.pythonhosted.org/packages/5a/f1/31460878ea99d9c1c5259238e54824675427eed15e1fe89ddf744218239d/dnattend-0.2.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License  Copyright (c) 2022 NHSX  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
    "summary": "AutoML classifier for predicting patient non-attendance (DNA)",
    "version": "0.2.3",
    "split_keywords": [
        "dnattend"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "657768f461e2b4a34932a2acfbb8b62a",
                "sha256": "46c6fca7b5d0745500a2cd9ca7c7a6423e1272810ea922e3b20548ab4d96c276"
            },
            "downloads": -1,
            "filename": "dnattend-0.2.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "657768f461e2b4a34932a2acfbb8b62a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 17782,
            "upload_time": "2022-12-05T17:03:49",
            "upload_time_iso_8601": "2022-12-05T17:03:49.790694Z",
            "url": "https://files.pythonhosted.org/packages/72/6a/fa27b4acd0c62980d21cfb41fe1badfc424ce4b8dffbfc30c090e250c64f/dnattend-0.2.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "1d0011d0dfb2aae3332c20773e954140",
                "sha256": "5f5eb1cb9c2948f9a9977823a8954bf5f36fbe01c70c8084c512666ef2686c35"
            },
            "downloads": -1,
            "filename": "dnattend-0.2.3.tar.gz",
            "has_sig": false,
            "md5_digest": "1d0011d0dfb2aae3332c20773e954140",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 18128,
            "upload_time": "2022-12-05T17:03:58",
            "upload_time_iso_8601": "2022-12-05T17:03:58.903344Z",
            "url": "https://files.pythonhosted.org/packages/5a/f1/31460878ea99d9c1c5259238e54824675427eed15e1fe89ddf744218239d/dnattend-0.2.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2022-12-05 17:03:58",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "lcname": "dnattend"
}
