wisconsinsc-cleaner


Name: wisconsinsc-cleaner
Version: 0.0.2
Summary: A tool to clean and uniform annotation files from Wisconsin Sleep Cohort (WSC), distributed by NSRR
Upload time: 2024-02-22 09:15:39
Requires Python: >=3.8
Keywords: data storage, data cleaning, medical records

# WisconsinSC_cleaner
A tool to clean and standardize annotation files from the Wisconsin Sleep Cohort (WSC), distributed by NSRR [here](https://sleepdata.org/datasets/wsc).

## Motivation
The Wisconsin Sleep Cohort is a great resource for studying the longitudinal history of sleep disorders in a large population.

However, this long time span comes at the price of issues and inconsistencies in the annotation of events, which make researchers' lives more difficult than necessary.
The WSC dataset also uses two different annotation formats (Twin and Gamma) that are slightly incompatible with each other.
For example, obstructive apneas may be formatted as: 'Obs Apnea', 'OBS Apnea', ' Obs Apnea ', 'Obst Apnea', 'OA', 'Apnea', 'Obst. Apnea'.
Other issues relate to missing information in the annotation columns, inconsistent time formats (most are 24h, some am/pm) and other unnecessary clutter that complicates automatic parsing.
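
To illustrate the time-format inconsistency, here is a minimal sketch of how mixed 24h and am/pm timestamps could be brought to a single 24h form. It is not the code used by `wsc_clean.py`: the function name `to_24h` and the exact input layouts (`hh:mm:ss` with an optional `AM`/`PM` marker, no fractional seconds) are assumptions made for illustration only.

```python
from datetime import datetime


def to_24h(raw: str) -> str:
    """Return a 24h `HH:MM:SS` string from a 24h or am/pm timestamp.

    Hypothetical helper: the real WSC files may use other layouts.
    """
    text = raw.strip().upper()
    if text.endswith(("AM", "PM")):
        # Put exactly one space before the AM/PM marker so that both
        # "11:05:30PM" and "11:05:30 pm" match the same format string.
        text = text[:-2].strip() + " " + text[-2:]
        parsed = datetime.strptime(text, "%I:%M:%S %p")
    else:
        parsed = datetime.strptime(text, "%H:%M:%S")
    return parsed.strftime("%H:%M:%S")


print(to_24h("11:05:30 pm"))
print(to_24h("23:05:30"))
```

Note that `%p` parsing depends on the active locale; the sketch assumes an English or C locale.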

## How to use this script
Clone the repo or download the `wsc_clean.py` and `mappings.txt` files, then run the script from the command line:

`python wsc_clean.py <your_dataset_polysomnography_folder>`

The package can also be installed from PyPI:

`pip install wisconsinsc_cleaner`

and executed as:

`wsc_clean <your_dataset_polysomnography_folder>`

## Content of this repo
A single Python script (no installation needed) parses all the annotation files and produces another set of annotation files with the suffix `.uniform.txt`.
The mapping of annotations is available in the `mappings.txt` file in the form `A|B|C` (see <https://zzz.bwh.harvard.edu/luna/ref/annotations/#remap> for details), meaning that every instance of `B` or `C` will be mapped to `A`. If a mapping does not exist, the original value is returned with the prefix `misc:`.
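
The remapping rule described above can be sketched as follows. This is an illustration, not the code in `wsc_clean.py`: the function names `load_mappings` and `remap`, the sample canonical label `obstructive_apnea`, and the choices to match case-insensitively, strip surrounding whitespace, and skip `#` comment lines are all assumptions.

```python
from typing import Dict, Iterable


def load_mappings(lines: Iterable[str]) -> Dict[str, str]:
    """Build an alias -> canonical lookup from `A|B|C` lines."""
    lookup: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank and comment lines are skipped
        canonical, *aliases = [part.strip() for part in line.split("|")]
        # The canonical label maps to itself; every alias maps to it.
        for label in (canonical, *aliases):
            lookup[label.lower()] = canonical
    return lookup


def remap(label: str, lookup: Dict[str, str]) -> str:
    """Return the canonical label, or `misc:<label>` when unmapped."""
    cleaned = label.strip()
    return lookup.get(cleaned.lower(), f"misc:{cleaned}")


# Hypothetical mapping line; the real entries live in mappings.txt.
lookup = load_mappings(["obstructive_apnea|Obs Apnea|Obst Apnea|OA|Obst. Apnea"])
print(remap(" OBS Apnea ", lookup))
print(remap("Unknown event", lookup))
```

Storing lowercase keys keeps the lookup a single dictionary access per annotation, at the cost of treating labels that differ only in case as the same event.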

An extra text file includes all lines that were not mapped.

If a recording uses the Twin format (allscore.txt files), the output is kept as one file.
If it uses the Gamma format (log.txt files), sleep stages and event scoring are merged with the log into a single output.

The code neither removes existing annotations nor modifies the original files. However, some redundant information in Gamma logs is ignored (see the [Known Issues](./KNOWN_ISSUES.md) file).

The script is built entirely on the Python standard library and was tested on Python 3.8.
The script is not optimized for efficiency and parses recordings sequentially. Parsing 2570 recordings takes less than 10 minutes.

## Format of the output
The `.uniform.txt` file will have a columnar format (comma separated values) with a header:

| Timestamp   | EventKey | Duration | Param1    | Param2    | Param3    |
|-------------|----------|----------|-----------|-----------|-----------|
| hh:mm:ss.ms | String   | seconds  | see below | see below | see below |

The Duration and Param[1-3] values depend on the type of event. Each entry below lists the values in the order EventKey, Duration, Param1, Param2, Param3.

### Sleep stages, position and miscellanea
These events need no extra information beyond the event itself: Duration is set to -1 and all Params to 0.
The actual duration is defined by the next event of the same type.

### Sensor gain in Gamma files
gain:sensor_affected, -1, new gain value, channel affected, 0

### Respiratory events
event_key, Duration of the event in seconds, SpO2 minimum of the event [%], 0, 0

### Oxygen desaturations
event_key, Duration of the event in seconds, SpO2 minimum of the event [%], SpO2 drop [%], 0

### Leg movements, arousals, EKG events, snore and any other without additional parameters
event_key, Duration of the event in seconds, 0, 0, 0
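
As a hedged example of consuming this output, the sketch below reads a `.uniform.txt`-style table with the standard `csv` module. The header names are taken from the table above, but the exact header spelling in real files, the event keys (`stage_n2`, `apnea_obstructive`, `desaturation`) and the sample rows are invented for illustration; `read_uniform` is a hypothetical helper, not part of the package.

```python
import csv
import io

# Made-up sample rows following the column layout described above.
SAMPLE = """Timestamp,EventKey,Duration,Param1,Param2,Param3
22:31:05.00,stage_n2,-1,0,0,0
22:34:12.50,apnea_obstructive,14.5,91,0,0
22:34:30.00,desaturation,20.0,89,4,0
"""


def read_uniform(handle):
    """Parse uniform annotations into a list of dictionaries."""
    events = []
    for row in csv.DictReader(handle):
        events.append({
            "timestamp": row["Timestamp"],
            "event": row["EventKey"],
            "duration": float(row["Duration"]),
            # Params are kept as strings: their meaning (and type)
            # depends on the event, e.g. a channel name for gain events.
            "params": [row["Param1"], row["Param2"], row["Param3"]],
        })
    return events


events = read_uniform(io.StringIO(SAMPLE))
# Duration -1 marks events whose length is set by the next event of
# the same type, so only non-negative durations are explicit.
timed = [e for e in events if e["duration"] >= 0]
print([e["event"] for e in timed])
```

For a real file, the same function could be called with `open(path, newline="")` in place of the `io.StringIO` wrapper.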

## Known issues
See [Known Issues](./KNOWN_ISSUES.md) file.

## Contributing
Please report any issues or bugs you encounter through the GitHub issue tracker.

If you feel generous and this library helped your project:

[![Buy me a coffee][buymeacoffee-shield]][buymeacoffee]

[buymeacoffee]: https://www.buymeacoffee.com/u2Vb3kO
[buymeacoffee-shield]: https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "wisconsinsc-cleaner",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "data storage,data cleaning,medical records",
    "author": "",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/2f/89/e276845ce5cc96fc1f3465a612ebf7076b9adc32b81b33e14c0a1deb8c68/wisconsinsc_cleaner-0.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "A tool to clean and uniform annotation files from Wisconsin Sleep Cohort (WSC), distributed by NSRR",
    "version": "0.0.2",
    "project_urls": {
        "Bug Tracker": "https://github.com/LucaCerina/WisconsinSC_cleaner/issues",
        "Homepage": "https://github.com/LucaCerina/WisconsinSC_cleaner"
    },
    "split_keywords": [
        "data storage",
        "data cleaning",
        "medical records"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "dbe225e6b755b8ee136e46b2ea62bb4add5f5ab9e7b82e93eca0e0f2030b6fa4",
                "md5": "0d33ef165b4686cec020e6dc71e16a16",
                "sha256": "481572c2ac59080b8dcebeb1adf999315ec69a6d733daba5d0ab66b068e4e0a4"
            },
            "downloads": -1,
            "filename": "wisconsinsc_cleaner-0.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0d33ef165b4686cec020e6dc71e16a16",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 13090,
            "upload_time": "2024-02-22T09:15:38",
            "upload_time_iso_8601": "2024-02-22T09:15:38.331009Z",
            "url": "https://files.pythonhosted.org/packages/db/e2/25e6b755b8ee136e46b2ea62bb4add5f5ab9e7b82e93eca0e0f2030b6fa4/wisconsinsc_cleaner-0.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2f89e276845ce5cc96fc1f3465a612ebf7076b9adc32b81b33e14c0a1deb8c68",
                "md5": "422f1d578f2446c218250c67d6f240df",
                "sha256": "2cc9a7e2f2490c82bbe4d5d44860a1471a2e720942c7c445dde3b4ee2c865116"
            },
            "downloads": -1,
            "filename": "wisconsinsc_cleaner-0.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "422f1d578f2446c218250c67d6f240df",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 14246,
            "upload_time": "2024-02-22T09:15:39",
            "upload_time_iso_8601": "2024-02-22T09:15:39.891760Z",
            "url": "https://files.pythonhosted.org/packages/2f/89/e276845ce5cc96fc1f3465a612ebf7076b9adc32b81b33e14c0a1deb8c68/wisconsinsc_cleaner-0.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-22 09:15:39",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "LucaCerina",
    "github_project": "WisconsinSC_cleaner",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "wisconsinsc-cleaner"
}
        