PBLES

- Name: PBLES
- Version: 0.0.2
- Home page: https://github.com/martinkuhn94/PBLES.git
- Summary: Private Bi-LSTM Event Log Synthesizer (PBLES)
- Upload time: 2024-09-13 14:21:24
- Author: Martin Kuhn
- Requires Python: >=3.9
- Keywords: event log synthetization, differential privacy, bi-lstm, synthetic data generation
- Requirements: pandas, numpy, tensorflow, scipy, keras, pm4py, scikit-learn, tensorflow_privacy, openpyxl

# PBLES (Private Bi-LSTM Event Log Synthesizer)

## Overview

PBLES (Private Bi-LSTM Event Log Synthesizer) is a tool designed to generate process-oriented synthetic healthcare data.
It addresses the privacy concerns in healthcare data sharing by integrating differential privacy techniques.
This can make it easier for researchers to share synthetic data with stakeholders,
facilitating AI and process mining research in healthcare. However, legal compliance, such as adherence to GDPR or
other similar regulations, must be confirmed before sharing data, even if strong differential privacy guarantees are used.

## Features

- **Process-Oriented Data Generation:** Handles the complexity of healthcare data processes.
- **Multiple Perspectives:** Considers various perspectives of healthcare data, not just control-flow.
- **Differential Privacy:** Ensures privacy by incorporating differential privacy techniques.

## Installation

To install PBLES, first clone the repository and change into its directory:

```bash
git clone https://github.com/martinkuhn94/PBLES.git
cd PBLES
```

Then, install the required dependencies:

```bash
pip install -r requirements.txt
```

## Usage

### Training the Model 
In this example, the stacked layers are configured with 32, 16 and 8 LSTM units respectively, and the embedding dimension is 16. The model trains for 3 epochs with a batch size of 16. The number of clusters for numerical attributes is set to 10 and, to speed up training, only the top 50% quantile of traces by length is considered. The noise multiplier is set to 0.0, which means that the model is trained without differential privacy. To train the model with differential privacy, set the noise multiplier to a value greater than 0.0. The epsilon value can be retrieved after training.
```python
import pm4py
from PBLES.event_log_dp_lstm import EventLogDpLstm

# Read Event Log
path = "Sepsis_Cases_Event_Log.xes"
event_log = pm4py.read_xes(path)

# Train Model
pbles_model = EventLogDpLstm(lstm_units=32, embedding_output_dims=16, epochs=3, batch_size=16,
                             max_clusters=10, trace_quantile=0.5, noise_multiplier=0.0)

pbles_model.fit(event_log)
pbles_model.save_model("models/DP_Bi_LSTM_e=inf_Sepsis_Cases_Event_Log_test")

# Print Epsilon to verify Privacy Guarantees
print(pbles_model.epsilon)
```
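
To illustrate why a noise multiplier of 0.0 means training without differential privacy, here is a minimal sketch of the general DP-SGD idea: each per-example gradient is clipped to a maximum L2 norm, the clipped gradients are summed, and Gaussian noise with standard deviation `noise_multiplier * l2_norm_clip` is added before averaging. This is not PBLES code. The function names, the clipping norm of 1.0 and the toy gradients are assumptions made for illustration only.

```python
import math
import random


def clip_gradient(grad, l2_norm_clip):
    """Scale a gradient vector so its L2 norm is at most l2_norm_clip."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, l2_norm_clip / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, l2_norm_clip, noise_multiplier, rng):
    """Clip each per-example gradient, sum them, add Gaussian noise, then average."""
    clipped = [clip_gradient(g, l2_norm_clip) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # The noise scale is proportional to the noise multiplier;
    # a multiplier of 0.0 therefore adds no noise at all.
    stddev = noise_multiplier * l2_norm_clip
    noisy = [s + rng.gauss(0.0, stddev) for s in summed]
    return [v / len(per_example_grads) for v in noisy]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # hypothetical per-example gradients

no_noise = private_mean_gradient(grads, 1.0, 0.0, rng)
with_noise = private_mean_gradient(grads, 1.0, 1.1, rng)

print("noise_multiplier=0.0:", no_noise)
print("noise_multiplier=1.1:", with_noise)
```

With a multiplier of 0.0 the result is just the average of the clipped gradients, so no finite epsilon can be reported. A larger multiplier adds more noise, which yields a smaller epsilon (a stronger guarantee), usually at some cost in model quality.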

### Sampling Event Logs 
To sample synthetic event logs from a trained model, use the following example. The sample size is set to 160, and the batch size is set to 16. The synthetic event log is saved as an XES file and, for quick inspection, as an XLSX file.
Pretrained models can be found in the "models" folder.
```python
import pm4py
from PBLES.event_log_dp_lstm import EventLogDpLstm

# Load Model
pbles_model = EventLogDpLstm()
pbles_model.load("models/DP_Bi_LSTM_e=inf_Sepsis_Case")

# Sample
event_log = pbles_model.sample(sample_size=160, batch_size=16)
event_log_xes = pm4py.convert_to_event_log(event_log)

# Save as XES File
xes_filename = "Synthetic_Sepsis_Case_Event_Log.xes"
pm4py.write_xes(event_log_xes, xes_filename)

# Save as XLSX File for quick inspection
df = pm4py.convert_to_dataframe(event_log_xes)
df['time:timestamp'] = df['time:timestamp'].astype(str)
df.to_excel("Synthetic_Sepsis_Case_Event_Log.xlsx", index=False)
```

## Future Work
Future work will focus on enhancing the algorithm and making it available on PyPI.

## Contribution

We welcome contributions from the community. If you have any suggestions or issues, please create a GitHub issue or a pull request. 


## License
This project is licensed under the GPL-3.0 License - see the [LICENSE](LICENSE) file for details.


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/martinkuhn94/PBLES.git",
    "name": "PBLES",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "Event Log Synthetization, Differential Privacy, Bi-LSTM, Synthetic Data Generation",
    "author": "Martin Kuhn",
    "author_email": "martin.kuhn@dfki.de",
    "download_url": "https://files.pythonhosted.org/packages/52/d3/32d43a35ce313278330f98253d13c1f114d91bb70850c11de4abf1bbca17/PBLES-0.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Private Bi-LSTM Event Log Synthesizer (PBLES)",
    "version": "0.0.2",
    "project_urls": {
        "Homepage": "https://github.com/martinkuhn94/PBLES.git"
    },
    "split_keywords": [
        "event log synthetization",
        " differential privacy",
        " bi-lstm",
        " synthetic data generation"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "081a10e45a091c51a0ef0589ef569095cee63749a584afccddc71f00c4987dc7",
                "md5": "c008ec79ec6fec6f07cd82a6c63a065f",
                "sha256": "929f01100c2b54022dbe8c73000b0e2d3b61f99c76163f71adfef0760a19381a"
            },
            "downloads": -1,
            "filename": "PBLES-0.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c008ec79ec6fec6f07cd82a6c63a065f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 27138,
            "upload_time": "2024-09-13T14:21:22",
            "upload_time_iso_8601": "2024-09-13T14:21:22.217760Z",
            "url": "https://files.pythonhosted.org/packages/08/1a/10e45a091c51a0ef0589ef569095cee63749a584afccddc71f00c4987dc7/PBLES-0.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "52d332d43a35ce313278330f98253d13c1f114d91bb70850c11de4abf1bbca17",
                "md5": "c518853ef0a469dc2c9998032470fcb9",
                "sha256": "d4d187acb7e6ccb3bffc955bb501184dcd6e2b6dabd8d417be4e629b45611085"
            },
            "downloads": -1,
            "filename": "PBLES-0.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "c518853ef0a469dc2c9998032470fcb9",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 25008,
            "upload_time": "2024-09-13T14:21:24",
            "upload_time_iso_8601": "2024-09-13T14:21:24.119770Z",
            "url": "https://files.pythonhosted.org/packages/52/d3/32d43a35ce313278330f98253d13c1f114d91bb70850c11de4abf1bbca17/PBLES-0.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-13 14:21:24",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "martinkuhn94",
    "github_project": "PBLES",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "pandas",
            "specs": [
                [
                    "==",
                    "1.5.3"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "==",
                    "1.23.5"
                ]
            ]
        },
        {
            "name": "tensorflow",
            "specs": [
                [
                    "==",
                    "2.14.0"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    "==",
                    "1.12.0"
                ]
            ]
        },
        {
            "name": "keras",
            "specs": [
                [
                    "==",
                    "2.14.0"
                ]
            ]
        },
        {
            "name": "pm4py",
            "specs": [
                [
                    "==",
                    "2.5.2"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    "==",
                    "1.4.1.post1"
                ]
            ]
        },
        {
            "name": "tensorflow_privacy",
            "specs": [
                [
                    "==",
                    "0.9.0"
                ]
            ]
        },
        {
            "name": "openpyxl",
            "specs": [
                [
                    "==",
                    "3.1.2"
                ]
            ]
        }
    ],
    "lcname": "pbles"
}
        