nhssynth


| Field | Value |
| --- | --- |
| Name | nhssynth |
| Version | 0.9.0 |
| home_page | https://github.com/nhsengland/NHSSynth |
| Summary | Synthetic data generation pipeline leveraging a Differentially Private Variational Auto Encoder assessed using a variety of metrics |
| upload_time | 2023-10-18 10:43:51 |
| maintainer | NHSE TDAU |
| docs_url | None |
| author | HarrisonWilde |
| requires_python | `>=3.9, !=2.7.*, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*, !=3.6.*, !=3.7.*, !=3.8.*` |
| license | MIT |
| keywords | synthetic data, privacy, fairness, machine learning |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

<!-- PROJECT SHIELDS -->
<div align="center">

<!-- ![Coverage](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/HarrisonWilde/1ab4eefed81ec381e29f7d4feb9856bc/raw/coverage.json) -->
![Tests Passing](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/HarrisonWilde/1ab4eefed81ec381e29f7d4feb9856bc/raw/tests.json)
![Lines of Code](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/HarrisonWilde/1ab4eefed81ec381e29f7d4feb9856bc/raw/loc.json)
![Percentage Comments](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/HarrisonWilde/1ab4eefed81ec381e29f7d4feb9856bc/raw/comments.json)
[![Snyk Package Health](https://snyk.io/advisor/python/nhssynth/badge.svg)](https://snyk.io/advisor/python/nhssynth)

</div>
<div align="center">

[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/nhssynth)](https://www.python.org/downloads/release/python-3113/)
[![PyPI - Package Status](https://img.shields.io/pypi/status/nhssynth)](https://pypi.org/project/nhssynth/)
[![PyPI - Latest Release](https://img.shields.io/pypi/v/nhssynth)](https://pypi.org/project/nhssynth/)
[![PyPI - Wheel](https://img.shields.io/pypi/wheel/nhssynth)](https://pypi.org/project/nhssynth/)
[![PyPI - License](https://img.shields.io/pypi/l/nhssynth)](https://github.com/nhsengland/nhssynth/blob/main/LICENSE)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000)](https://github.com/psf/black)
[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1)](https://pycqa.github.io/isort/)

</div>

<!-- PROJECT LOGO -->
<div align="center">
  <a href="https://nhsengland.github.io/NHSSynth">
    <img src="docs/assets/NHS.svg" alt="Logo" width="200" height="100">
  </a>
  <p align="center">
    <a href="https://nhsengland.github.io/NHSSynth"><strong>Explore the docs »</strong></a>
    <br /><br />
  </p>
</div>

# NHS Synth

## About

This repository currently consists of a Python package alongside research and investigative materials covering the effectiveness of the package and synthetic data more generally when applied to NHS use cases. See the internal [project description](https://nhsengland.github.io/nhsx-internship-projects/synthetic-data-exploration-vae/) for more information.

## Getting Started

### Project Structure

- The main package and codebase are found in [`src/nhssynth`](src/nhssynth/) (see [Usage](#usage) below for more information)
- Accompanying materials are available in the [`docs`](docs/) folder:
  - The components used to create the GitHub Pages [documentation site](https://nhsengland.github.io/NHSSynth/)
  - A [report](docs/reports/report.pdf) summarising the previous iteration of this project
  - A [model card](docs/model_card.md) providing more information about the VAE with Differential Privacy
- Numerous exemplar configurations are found in [`config`](config/)
- Empty [`data`](data/) and [`experiments`](experiments/) folders are provided; these are the default locations for inputs and outputs when running the project using the provided [CLI](src/nhssynth/cli/) module
- Pre-processing notebooks for specific datasets used to assess the approach, along with other non-core code, can be found in [`auxiliary`](auxiliary/)

### Installation

For general usage, we recommend installing the package via `pip install nhssynth` in an environment running a supported Python version. You can then `import` the package's [modules](src/nhssynth/modules/) and use them in your projects, or interact with the package directly via the [CLI](src/nhssynth/cli/), which is accessed using the `nhssynth` command (see [Usage](#usage) for more information).
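
As a quick check before installing, the sketch below tests whether an interpreter meets the `>=3.9` lower bound given in the package's `requires_python` metadata. The helper name `is_supported` is hypothetical and is not part of `nhssynth`.

```python
import sys

# Lower bound taken from the package metadata (requires_python ">=3.9").
MINIMUM_PYTHON = (3, 9)


def is_supported(version=None):
    """Return True if a (major, minor, ...) version tuple meets the lower bound.

    Defaults to the version of the interpreter running this script.
    """
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MINIMUM_PYTHON


print("Supported interpreter:", is_supported())
```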

#### Secure Mode

Note that to train a generator in *secure mode* (see the [documentation](https://nhsengland.github.io/NHSSynth/secure_mode/) for details) you will need to install the PyTorch extension package [`csprng`](https://github.com/pytorch/csprng) separately. Currently this package's dependencies are not compatible with recent versions of PyTorch (the authors plan to rectify this; watch this space), so you will need to install it manually. We recommend following the instructions below:

```bash
git clone git@github.com:pytorch/csprng.git
cd csprng
git branch release "v0.2.2-rc1"
git checkout release
python setup.py install
```
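
After the manual install, you may want to confirm that the extension can be found before attempting a secure-mode run. The sketch below assumes the extension is importable as `torchcsprng`; that import name is an assumption here and should be checked against the `csprng` project. The helper `is_importable` is hypothetical.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located (without importing it)."""
    return importlib.util.find_spec(module_name) is not None


# "torchcsprng" is the assumed import name of the csprng extension.
if is_importable("torchcsprng"):
    print("csprng extension found")
else:
    print("csprng extension not found; secure mode may be unavailable")
```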

#### Advanced Installation

If you intend to contribute to or work with the codebase directly, or if you want to reproduce the results of this project, follow the steps below:

1. Clone the repo
2. Ensure one of the required versions of Python is installed
3. Install [`poetry`](https://python-poetry.org/docs/#installation) and either:
    - Skip to step four (and have `poetry` control the installation's virtual environment in its [own way](https://python-poetry.org/docs/managing-environments/))
    - Change `poetry`'s configuration to manage your own virtual environments:
      
      ```bash
      poetry config virtualenvs.create false
      poetry config virtualenvs.in-project false
      ```

      You can now instantiate a virtual environment in the usual way (e.g. via `python -m venv nhssynth`) and activate it via `source nhssynth/bin/activate` before moving to the next step

4. Install the project dependencies with `poetry install` (add optional flags: `--with dev` when developing and [testing](tests/) the package, `--with aux` to work with the [auxiliary notebooks](auxiliary/), `--with docs` to work with the [documentation](docs/))
5. You can then interact with the package in one of two ways:
    - Via the [CLI](src/nhssynth/cli/) module, which is accessed using the `nhssynth` command, e.g.
      
      ```bash
      poetry run nhssynth ...
      ```
      
      *Note that you can omit the `poetry run` part and just type `nhssynth` if you followed the optional steps above to manage and activate your own virtual environment, or if you have executed `poetry shell` beforehand.*
    
    - Through directly importing parts of the package to use in an existing project (`from nhssynth.modules... import ...`).

### Usage

#### CLI

This package comprises a set of modules that can be run using the `CLI` individually, as part of a pipeline, or via a configuration file. These options are available via the aforementioned `(poetry run) nhssynth` command:

```
nhssynth <module name> --<args>
nhssynth pipeline --<args>
nhssynth config -c <name> --<overrides>
```

To see the modules that are available and their corresponding arguments, run `nhssynth --help` and `nhssynth <module name> --help` respectively.

Configuration files can be used to run the pipeline or a chosen set of modules. They should be placed in the [`config`](config/) folder and their layout should match that of the examples provided. They are run as in the last of the three cases above, by providing their filename (without the `.yaml` extension). You can also override any of the arguments provided in the configuration file by passing them as arguments on the command line.
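
The override behaviour described above amounts to a simple precedence rule: a value given on the command line wins over the value in the configuration file. The sketch below illustrates that rule only; it is not the package's implementation, and the function `resolve_options` and the `dataset` key are hypothetical.

```python
def resolve_options(file_options, cli_overrides):
    """Merge options so that explicitly given command-line values take precedence.

    A value of None in cli_overrides means "not provided on the command line".
    """
    resolved = dict(file_options)
    for key, value in cli_overrides.items():
        if value is not None:
            resolved[key] = value
    return resolved


# Hypothetical values: the file sets both options, the command line overrides one.
file_options = {"seed": 1, "dataset": "example"}
cli_overrides = {"seed": 42, "dataset": None}
resolved = resolve_options(file_options, cli_overrides)
print(resolved)
```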

To ensure reproducibility, you should always specify a `--seed` value and provide the `--save-config` flag to dump the exact configuration specified / inferred for the run (missing options will be populated in the outputted config, so it may be larger than one you would specify yourself). This configuration file can then be used in the future to reproduce the exact same run or shared with others to run the same configuration on their dataset, etc.
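
A minimal sketch of assembling such a reproducible invocation from Python is shown below. It only builds and prints the command; it does not run it. It assumes that `--seed` takes an integer value and that `--save-config` is a bare flag, as the text above suggests. The helper `build_config_command` and the configuration name `my_experiment` are hypothetical.

```python
import shlex


def build_config_command(config_name, seed, use_poetry=False):
    """Build the argument list for `nhssynth config -c <name>` with a fixed seed.

    Set use_poetry=True to prefix the command with `poetry run`.
    """
    argv = ["nhssynth", "config", "-c", config_name,
            "--seed", str(seed), "--save-config"]
    if use_poetry:
        argv = ["poetry", "run"] + argv
    return argv


command = build_config_command("my_experiment", seed=42)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run` if you want to launch the run from a script.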

#### Python API

Alternatively, you may want to import parts of the package into your own project or notebook. There is a minimum working example of this [in the auxiliary folder](auxiliary/mwe.ipynb). You can learn more about the API and structure of the package and its modules in the docs to reuse components as you see fit.
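
To see what is available to import, one option is to list the submodules of `nhssynth.modules` using only the standard library. The sketch below makes no assumptions about the names it will find and returns an empty list if the package is not installed. The helper `list_submodules` is hypothetical.

```python
import importlib
import pkgutil


def list_submodules(package_name="nhssynth.modules"):
    """Return the sorted names of a package's direct submodules.

    Returns an empty list if the package cannot be imported or is not a package.
    """
    try:
        package = importlib.import_module(package_name)
    except ImportError:
        return []
    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return []
    return sorted(info.name for info in pkgutil.iter_modules(search_path))


print(list_submodules())
```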

### Package Structure

The figure below shows the structure and workflow of the package and its modules.

![](docs/modules.png)

View a visualisation of the codebase [here](https://mango-dune-07a8b7110.1.azurestaticapps.net/?repo=nhsengland%2Fnhssynth)!

### Roadmap

See the [open issues](https://github.com/nhsengland/NHSSynth/issues) for a list of proposed features (and known bugs). Our [milestones](https://github.com/nhsengland/NHSSynth/milestones) represent longer-term goals for the project.

### Contributing

Contributions are welcome! We encourage you to first raise an issue with your proposed contribution to enable discussion with the maintainers. After that, please follow these steps:

1. Fork the project
2. Create your branch (`git checkout -b <yourusername>/<featurename>`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin <yourusername>/<featurename>`)
5. Open a PR and we will try to get it merged!

_See [CONTRIBUTING.md](./CONTRIBUTING.md) for detailed guidance._

Thanks to everyone who has contributed so far!

<div align="center">
<a href="https://github.com/nhsengland/nhssynth/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=nhsengland/nhssynth" />
</a>
</div>

This codebase builds on previous NHSX Analytics Unit PhD internships contextualising and investigating the potential use of Variational Auto Encoders (VAEs) for synthetic data generation. These were undertaken by Dominic Danks and David Brind.

### License

Distributed under the MIT License. _See [LICENSE](./LICENSE) for more information._

### Contact

This project is under active development by [@HarrisonWilde](https://github.com/HarrisonWilde). For feature requests and bugs, please [raise an issue](https://github.com/nhsengland/NHSSynth/issues/new/choose); for security concerns, please open a [draft security advisory](https://github.com/nhsengland/NHSSynth/security/advisories/new). Alternatively, contact [NHS England TDAU](mailto:england.tdau@nhs.net).

To find out more about the [Analytics Unit](https://www.nhsx.nhs.uk/key-tools-and-info/nhsx-analytics-unit/) visit our [project website](https://nhsengland.github.io/AnalyticsUnit/projects.html) or get in touch at [england.tdau@nhs.net](mailto:england.tdau@nhs.net).

<!-- ### Acknowledgements -->

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/nhsengland/NHSSynth",
    "name": "nhssynth",
    "maintainer": "NHSE TDAU",
    "docs_url": null,
    "requires_python": ">=3.9, !=2.7.*, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*, !=3.6.*, !=3.7.*, !=3.8.*",
    "maintainer_email": "england.tdau@nhs.net",
    "keywords": "synthetic data,privacy,fairness,machine learning",
    "author": "HarrisonWilde",
    "author_email": "harrisondwilde@outlook.com",
    "download_url": "https://files.pythonhosted.org/packages/93/19/37f3e461ede06fd340b81072f9282464b14c0ecc2aa693201ef16f1e9e48/nhssynth-0.9.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Synthetic data generation pipeline leveraging a Differentially Private Variational Auto Encoder assessed using a variety of metrics",
    "version": "0.9.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/nhsengland/NHSSynth/issues",
        "Docs": "https://nhsengland.github.io/NHSSynth",
        "Homepage": "https://github.com/nhsengland/NHSSynth",
        "Repository": "https://github.com/nhsengland/NHSSynth"
    },
    "split_keywords": [
        "synthetic data",
        "privacy",
        "fairness",
        "machine learning"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ed9760d49947882db1c6904d604c64a5cd7c8c49daf3d06ae952f841fe6fbee8",
                "md5": "dd2c131fc8a80d8cf4e3dc65a035fda2",
                "sha256": "68aabf04755f915af6807330bead0ea583a5a20c0a7a13c7b479850047a9c57f"
            },
            "downloads": -1,
            "filename": "nhssynth-0.9.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "dd2c131fc8a80d8cf4e3dc65a035fda2",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9, !=2.7.*, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*, !=3.6.*, !=3.7.*, !=3.8.*",
            "size": 77325,
            "upload_time": "2023-10-18T10:43:49",
            "upload_time_iso_8601": "2023-10-18T10:43:49.692080Z",
            "url": "https://files.pythonhosted.org/packages/ed/97/60d49947882db1c6904d604c64a5cd7c8c49daf3d06ae952f841fe6fbee8/nhssynth-0.9.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "931937f3e461ede06fd340b81072f9282464b14c0ecc2aa693201ef16f1e9e48",
                "md5": "0e6f0907e16fd6f46b4d78f14a7a214d",
                "sha256": "2f63fd542b208cdaea5ab33fafead59dc2bc04c02c0bc4945a4b1b7491613c48"
            },
            "downloads": -1,
            "filename": "nhssynth-0.9.0.tar.gz",
            "has_sig": false,
            "md5_digest": "0e6f0907e16fd6f46b4d78f14a7a214d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9, !=2.7.*, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*, !=3.6.*, !=3.7.*, !=3.8.*",
            "size": 61709,
            "upload_time": "2023-10-18T10:43:51",
            "upload_time_iso_8601": "2023-10-18T10:43:51.367241Z",
            "url": "https://files.pythonhosted.org/packages/93/19/37f3e461ede06fd340b81072f9282464b14c0ecc2aa693201ef16f1e9e48/nhssynth-0.9.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-10-18 10:43:51",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "nhsengland",
    "github_project": "NHSSynth",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "tox": true,
    "lcname": "nhssynth"
}
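
The `sha256` values in the `digests` entries above can be used to check a downloaded file. The sketch below hashes a local file and compares the result with the wheel digest listed in the raw data. The helper names and the local file path in the commented-out usage line are hypothetical.

```python
import hashlib

# sha256 digest of nhssynth-0.9.0-py3-none-any.whl, copied from the raw data above.
EXPECTED_WHEEL_SHA256 = "68aabf04755f915af6807330bead0ea583a5a20c0a7a13c7b479850047a9c57f"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Return True if the file's SHA-256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.lower()


# Example usage (hypothetical local path):
# matches_digest("nhssynth-0.9.0-py3-none-any.whl", EXPECTED_WHEEL_SHA256)
```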
        