* Name: ssdtm
* Version: 0.1.3
* Summary: A package that can generate low-fidelity synthetic CDISC SDTM data based on intelligent sequence generators
* Upload time: 2024-08-03 19:50:32
* Author: Akshay Chougule
* Requires Python: >=3.7
* License: MIT

# Synthetic SDTM (ssdtm)

This library provides a collection of functions to create synthetic CDISC SDTM data. The data is largely produced by intelligent sequence generators powered by domain knowledge.

## Background

Dummy, or low-fidelity synthetic, SDTM data is valuable in several scenarios. A few use cases are listed below:

### Testing and Validation of Systems:
* System Configuration: Using dummy data allows for thorough testing and configuration of data management systems before real data is collected, ensuring that systems are correctly set up and can handle the expected data formats and volumes.
* Software Validation: Dummy data is essential for validating the software tools used for data capture, processing, and analysis, ensuring they work correctly under various scenarios and edge cases.

### Training and Education:
* Staff Training: Dummy data provides a safe and realistic way to train clinical staff, data managers, and statisticians on data entry, management, and analysis processes without risking patient confidentiality or data integrity.
* Protocol Familiarization: Dummy data can help the study team familiarize themselves with the study protocols and data collection methods, improving overall preparedness and efficiency.

### Protocol Development and Refinement:
* CRF and Protocol Testing: Dummy data can be used to test and refine clinical trial protocols and case report forms (CRFs) before actual patient data is collected, identifying potential issues and making necessary adjustments early in the process.
* Scenario Simulation: Simulating various scenarios using fake data helps in identifying and mitigating risks, ensuring the protocol is robust and ready for real-world application.

### Quality Control:
* Error Detection: By using dummy data, potential data entry errors, inconsistencies, and system flaws can be identified and corrected before the actual trial begins, enhancing data quality and reliability.
* Process Optimization: Dummy data allows data collection and processing workflows to be optimized, ensuring they are efficient and capable of handling real data smoothly.

### Regulatory Compliance:
* Compliance Testing: Testing with dummy data first helps ensure that all data handling and processing systems comply with regulatory standards and guidelines, reducing the risk of non-compliance during the actual trial.

### Confidentiality and Security:
* Safe Testing Environment: Using fake data protects patient confidentiality and adheres to privacy regulations during system testing and staff training, minimizing the risk of data breaches and ethical issues.
* Security Assessment: Dummy data can be used to test the security measures of data management systems, ensuring they are robust enough to protect sensitive patient information when real data is collected.

### Shorter study startup time
* Test and validate the data pipelines: Having access to realistic dummy data makes it possible to test and validate the data entry and data transfer pipelines before the First-Patient-In milestone of a study, which shortens study startup time.


* Free software: MIT license


## Tutorial


### How to install

```sh
$ pip install ssdtm
```
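
To confirm from Python that the install succeeded, one option is to query the installed distribution metadata. The sketch below uses only the standard library's `importlib.metadata` (available from Python 3.8, so it will not work on 3.7 even though the package declares `>=3.7`). The helper name `installed_version` is hypothetical and is not part of ssdtm.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical check: report whether ssdtm is available in this environment.
version = installed_version("ssdtm")
if version is None:
    print("ssdtm is not installed")
else:
    print(f"ssdtm {version} is installed")
```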

### Basic Usage

```python
import ssdtm as sd

# Generate synthetic single-domain (adverse events) data for 5 patients
ae = sd.get_adverse_events(5)

# Generate synthetic single-domain (concomitant medication) data for 5 patients
cm = sd.get_conmeds(5)

# Generate synthetic single-domain (demographics) data for 5 patients
dm = sd.get_demographics(5)

# Generate synthetic single-domain (exposure) data for 5 patients
ex = sd.get_exposure(5)

# Generate lab analytes dataset for 8 patients, where each patient has data for 4 visits.
lb = sd.get_lab_analytes(8, 4)

# Generate vital signs dataset for 8 patients, where each patient has data for 4 visits.
vs = sd.get_vital_signs(8, 4)

# Generates CDISC SDTM data for 6 domains (ae, cm, dm, ex, lb, and vs)
data = sd.get_sdtm_data(8,4)
# Then you can access individual domain-specific dataframes as follows
data['cm']
data['dm']
data['vs']

# This generates and saves the SDTM data for 6 common SDTM domains in the local directory
sd.save_sdtm_data(8,4)

# Generate response dataset for 8 patients, assuming 5 visits per patient.
rs = sd.get_response(8)

# Generate tumor identification dataset for 8 patients, where each patient can have 1 to 5 tumors.
tu = sd.get_tumor_identification(8)

# Generate tumor results dataset for 8 patients, where each patient can have 1 to 5 tumors.
tr = sd.get_tumor_results(8)

# Generates CDISC SDTM data for 6 generic domains (ae, cm, dm, ex, lb, and vs) and additional therapeutic area specific domains (e.g. for 'oncology' we would have rs, tu and tr)
data = sd.get_sdtm_data(8,4, 'oncology')
# Then you can access individual domain-specific dataframes as follows
data['cm']
data['dm']
data['vs']
# And TA-specific individual domain dataframes as follows
data['rs']
data['tu']
data['tr']

# This generates and saves the SDTM data for 6 common SDTM domains and 3 therapeutic area specific domains in the local directory
sd.save_sdtm_data(8,4, 'oncology')

```
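
When the generated data feeds a pipeline test, a quick sanity check on the returned dictionary can be useful. The sketch below is not part of ssdtm: the helper names `missing_domains` and `orphan_subjects` are hypothetical, and it assumes each domain carries a `USUBJID` column (the standard SDTM subject identifier), which this README does not confirm for ssdtm's output. It works on plain lists of row dictionaries so that it runs without ssdtm installed.

```python
from typing import Dict, Iterable, List, Mapping, Optional, Set

# Domain keys as listed in the usage example above.
GENERIC_DOMAINS = ("ae", "cm", "dm", "ex", "lb", "vs")
TA_DOMAINS = {"oncology": ("rs", "tu", "tr")}


def missing_domains(data: Mapping[str, object],
                    therapeutic_area: Optional[str] = None) -> List[str]:
    """Return the expected domain keys that are absent from `data`."""
    expected = list(GENERIC_DOMAINS)
    if therapeutic_area is not None:
        expected.extend(TA_DOMAINS.get(therapeutic_area, ()))
    return [domain for domain in expected if domain not in data]


def orphan_subjects(records: Mapping[str, Iterable[Mapping[str, object]]],
                    id_column: str = "USUBJID") -> Dict[str, Set[object]]:
    """For each non-DM domain, return subject IDs that do not appear in DM.

    `id_column` is an assumption about the generated column names.
    """
    known = {row[id_column] for row in records.get("dm", ())}
    orphans: Dict[str, Set[object]] = {}
    for domain, rows in records.items():
        if domain == "dm":
            continue
        extra = {row[id_column] for row in rows} - known
        if extra:
            orphans[domain] = extra
    return orphans


# Toy stand-in for generated data: one list of row dictionaries per domain.
toy = {
    "dm": [{"USUBJID": "S-001"}, {"USUBJID": "S-002"}],
    "ae": [{"USUBJID": "S-001"}, {"USUBJID": "S-003"}],
    "vs": [{"USUBJID": "S-002"}],
}
print(missing_domains(toy))   # ['cm', 'ex', 'lb']
print(orphan_subjects(toy))   # {'ae': {'S-003'}}
```

If the dataframes returned by `sd.get_sdtm_data` are pandas DataFrames (an assumption), they could be converted first with something like `{name: df.to_dict("records") for name, df in data.items()}` before being passed to `orphan_subjects`.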

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "ssdtm",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": null,
    "author": "Akshay Chougule",
    "author_email": "akshay6023@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/a7/22/aadc6ebfe79c1235658dc45da4c4d69fd32863c7f7fb45b7d2c1f6858e87/ssdtm-0.1.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A package that can generate low-fidelity synthetic CDISC SDTM data based on intelligent sequence generators",
    "version": "0.1.3",
    "project_urls": {
        "source": "https://github.com/AksChougule/gen-sdtm"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "501ea3fcb23e4f662bf18e53e283e95fb12fc3cb7399d6d56a331980ca04bf5c",
                "md5": "178252227551a6f9153c1772e4fb6420",
                "sha256": "b1b570bd56c11388e00e71cf958ce4a47c53df32fa5462e822db77fff9a175ed"
            },
            "downloads": -1,
            "filename": "ssdtm-0.1.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "178252227551a6f9153c1772e4fb6420",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 6696,
            "upload_time": "2024-08-03T19:50:31",
            "upload_time_iso_8601": "2024-08-03T19:50:31.343865Z",
            "url": "https://files.pythonhosted.org/packages/50/1e/a3fcb23e4f662bf18e53e283e95fb12fc3cb7399d6d56a331980ca04bf5c/ssdtm-0.1.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a722aadc6ebfe79c1235658dc45da4c4d69fd32863c7f7fb45b7d2c1f6858e87",
                "md5": "9303a41b6abf848200ca57c4d883848e",
                "sha256": "2221fae5c0dc635a4792895795389c76244399ff2d26a4afed4a024b5dd2ba38"
            },
            "downloads": -1,
            "filename": "ssdtm-0.1.3.tar.gz",
            "has_sig": false,
            "md5_digest": "9303a41b6abf848200ca57c4d883848e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 8641,
            "upload_time": "2024-08-03T19:50:32",
            "upload_time_iso_8601": "2024-08-03T19:50:32.778816Z",
            "url": "https://files.pythonhosted.org/packages/a7/22/aadc6ebfe79c1235658dc45da4c4d69fd32863c7f7fb45b7d2c1f6858e87/ssdtm-0.1.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-03 19:50:32",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "AksChougule",
    "github_project": "gen-sdtm",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "ssdtm"
}
        