# bci-dataset
Python library for organizing multiple EEG datasets using HDF.
Supports EEGLAB data!

*This library was created as a tool for combining datasets across the major BCI paradigms, e.g. for deep learning.
## Installation
```
pip install bci-dataset
```
## How to Use
### Add EEG Data
#### Supported Formats
+ EEGLAB (.set)
  + Epoching (epoch splitting) in EEGLAB is required.
+ NumPy (ndarray)
#### Commonality
```python
from bci_dataset import DatasetUpdater

fpath = "./dataset.hdf"
fs = 500  # sampling rate
updater = DatasetUpdater(fpath, fs=fs)
updater.remove_hdf()  # delete the HDF file if it already exists
```
#### Add EEGLAB Data
```python
import numpy as np

labels = ["left", "right"]
eeglab_list = ["./sample.set"]  # path list of EEGLAB files

# add EEGLAB (.set) files
for fp in eeglab_list:
    updater.add_eeglab(fp, labels)
```
#### Add NumPy Data
```python
# dummy data
dummy_data = np.ones((12, 6000))  # channels × samples
dummy_indexes = [0, 1000, 2000, 3000, 4000, 5000]  # start index of each trial
dummy_labels = ["left", "right"] * 3  # label of each trial
dummy_size = 500  # samples per trial

updater.add_numpy(dummy_data, dummy_indexes, dummy_labels, dummy_size)
```
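For reference, the `(indexes, size)` pair describes how trials are carved out of the continuous array: each trial spans `size` samples starting at one of the given indexes. A minimal NumPy sketch of that slicing (for illustration only, not part of the library):

```python
import numpy as np

data = np.arange(12 * 6000).reshape(12, 6000)  # channels × samples
indexes = [0, 1000, 2000, 3000, 4000, 5000]    # start index of each trial
size = 500                                     # samples per trial

# each trial is data[:, start:start+size] → shape (12, 500)
trials = np.stack([data[:, s:s + size] for s in indexes])
print(trials.shape)  # (6, 12, 500)
```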
### Apply Preprocessing
If the "preprocess" method is executed again with the same group name, the existing group with that name is deleted before preprocessing runs.
```python
from sklearn.preprocessing import StandardScaler

def prepro_func(bx: np.ndarray):
    """Preprocessing example. bx: ch × samples."""
    x = bx[12:15, :]  # select a subset of channels
    return StandardScaler().fit_transform(x.T).T

updater.preprocess("custom", prepro_func)
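Any callable that takes and returns a ch × samples array should work here. As another illustration, a hypothetical common-average-reference function (an assumption for demonstration, not something shipped with the library) that could be registered the same way, e.g. as `updater.preprocess("car", prepro_car)`:

```python
import numpy as np

def prepro_car(bx: np.ndarray) -> np.ndarray:
    """Common average reference: subtract the mean across channels
    from every channel. bx is ch × samples, same contract as above."""
    return bx - bx.mean(axis=0, keepdims=True)

x = np.random.default_rng(0).normal(size=(12, 500))
y = prepro_car(x)  # per-sample channel mean of y is now ~0
```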
### Contents of HDF
Note that "dataset" in the tree below refers to an HDF5 dataset (the HDF dataset class).
```
hdf file
├ origin : group / raw data
│ ├ 1 : dataset
│ ├ 2 : dataset
│ ├ 3 : dataset
│ ├ 4 : dataset
│ ├ 5 : dataset
│ └ …
└ prepro : group / data after preprocessing
  ├ custom : group / "custom" is any group name
  │ ├ 1 : dataset
  │ ├ 2 : dataset
  │ ├ 3 : dataset
  │ ├ 4 : dataset
  │ ├ 5 : dataset
  │ └ …
  └ custom2 : group
    └ …omit (1,2,3,4,…)
```
+ Check the contents with software such as HDFView.
+ Use "h5py" or similar to read the HDF file.
```python
import h5py

with h5py.File(fpath, "r") as h5:
    fs = h5["prepro/custom"].attrs["fs"]
    dataset_size = h5["prepro/custom"].attrs["count"]
    dataset79 = h5["prepro/custom/79"][()]  # ch × samples
    dataset79_label = h5["prepro/custom/79"].attrs["label"]
```
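Assuming the datasets in a group are numbered 1 through `count` as in the tree above, the whole group can be collected into arrays for a training loop with plain h5py. A sketch (the file-building half is only a stand-in for a file produced by the library):

```python
import h5py
import numpy as np

# build a tiny file mimicking the layout above (demonstration only)
with h5py.File("demo.hdf", "w") as h5:
    grp = h5.create_group("prepro/custom")
    grp.attrs["fs"] = 500
    grp.attrs["count"] = 3
    for i in range(1, 4):
        ds = grp.create_dataset(str(i), data=np.zeros((3, 500)))
        ds.attrs["label"] = "left" if i % 2 else "right"

# collect every trial and its label
with h5py.File("demo.hdf", "r") as h5:
    grp = h5["prepro/custom"]
    n = int(grp.attrs["count"])
    X = np.stack([grp[str(i)][()] for i in range(1, n + 1)])
    y = [grp[str(i)].attrs["label"] for i in range(1, n + 1)]

print(X.shape)  # (3, 3, 500)
```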
### Merge Dataset
To merge, "dataset_name" must be set on each source.
If the channel order differs between datasets, it can be aligned by specifying ch_indexes.

**The source's preprocessing groups are not inherited; in other words, preprocess() must be executed again after the merge.**
Example: Merge source1 and source2 datasets
```python
target = DatasetUpdater("new_dataset.h5", fs=fs)
target.remove_hdf()  # reset hdf
s1 = DatasetUpdater("source1.h5", fs=fs, dataset_name="source1")
s2 = DatasetUpdater("source2.h5", fs=fs, dataset_name="source2")
s1_ch_indexes = [1, 60, 10, 5]  # channel indexes to use
target.merge_hdf(s1, ch_indexes=s1_ch_indexes)
target.merge_hdf(s2)
```
## Pull requests / Issues
If you need anything, feel free to open a pull request or an issue.