chopin2

Name	chopin2
Version	1.0.9.post1
home_page	http://github.com/cumbof/chopin2
Summary	Supervised Classification with Hyperdimensional Computing
upload_time	2024-06-14 20:50:29
maintainer	None
docs_url	None
author	Fabio Cumbo
requires_python	None
license	LICENSE
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# chopin2
Supervised **C**lassification with **H**yperdimensional C**o**m**p**ut**in**g.

![Conda](https://img.shields.io/conda/dn/conda-forge/chopin2?label=chopin2%20on%20Conda)

#### Originally forked from [https://github.com/moimani/HD-Permutaion](https://github.com/moimani/HD-Permutaion)

This repository includes Python 3.8 utilities for building a Hyperdimensional Computing (HD) classification model based on the architecture
originally introduced in [https://doi.org/10.1109/DAC.2018.8465708](https://doi.org/10.1109/DAC.2018.8465708).
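To make the architecture more concrete, here is a minimal, dependency-free sketch of the general HD classification scheme. Feature values are quantized into level hypervectors. Each level hypervector is rotated by its feature position, and the results are bundled (summed) into one hypervector per sample. A class prototype is the sum of its samples, and a query is assigned to the most cosine-similar prototype. This is an illustration under stated assumptions, not chopin2's implementation. The function names, the bit-flipping scheme used to build correlated level hypervectors, and the min/max quantization are all hypothetical choices.

```python
import math
import random
from typing import Dict, Hashable, List, Sequence

Hypervector = List[int]


def make_level_hypervectors(levels: int, dim: int, rng: random.Random) -> List[Hypervector]:
    """Bipolar level hypervectors (assumed scheme): adjacent levels are similar,
    while the lowest and highest levels are nearly orthogonal."""
    if levels < 2:
        raise ValueError("at least two levels are required")
    base = [rng.choice((-1, 1)) for _ in range(dim)]
    order = list(range(dim))
    rng.shuffle(order)                    # order in which components get flipped
    step = dim // (2 * (levels - 1))      # roughly dim/2 flips across all levels
    hvs = [base]
    for i in range(1, levels):
        hv = hvs[-1][:]
        for j in order[(i - 1) * step : i * step]:
            hv[j] = -hv[j]
        hvs.append(hv)
    return hvs


def quantize(value: float, lo: float, hi: float, levels: int) -> int:
    """Map a feature value in [lo, hi] (with lo < hi) to a level index in [0, levels - 1]."""
    idx = round((value - lo) / (hi - lo) * (levels - 1))
    return min(max(idx, 0), levels - 1)


def rotate(hv: Hypervector, k: int) -> Hypervector:
    """Cyclic shift by k positions: the permutation that encodes feature order."""
    k %= len(hv)
    return hv[-k:] + hv[:-k]


def encode(x: Sequence[float], level_hvs: List[Hypervector], lo: float, hi: float) -> List[int]:
    """Bundle (sum) the rotated level hypervector of every feature into one hypervector."""
    out = [0] * len(level_hvs[0])
    for pos, value in enumerate(x):
        hv = rotate(level_hvs[quantize(value, lo, hi, len(level_hvs))], pos)
        for j, v in enumerate(hv):
            out[j] += v
    return out


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    norm = math.sqrt(sum(v * v for v in a)) * math.sqrt(sum(v * v for v in b))
    return sum(p * q for p, q in zip(a, b)) / norm if norm else 0.0


def train(X, y, level_hvs, lo, hi) -> Dict[Hashable, List[int]]:
    """Single-pass training: each class prototype bundles the encodings of its samples."""
    protos: Dict[Hashable, List[int]] = {}
    for x, label in zip(X, y):
        enc = encode(x, level_hvs, lo, hi)
        proto = protos.setdefault(label, [0] * len(enc))
        for j, v in enumerate(enc):
            proto[j] += v
    return protos


def predict(x, protos, level_hvs, lo, hi) -> Hashable:
    """Return the label whose prototype is most cosine-similar to the encoded query."""
    enc = encode(x, level_hvs, lo, hi)
    return max(protos, key=lambda label: cosine(enc, protos[label]))


if __name__ == "__main__":
    rng = random.Random(42)
    level_hvs = make_level_hypervectors(levels=10, dim=1000, rng=rng)
    X = [[0.0] * 10, [0.1] * 10, [0.9] * 10, [1.0] * 10]  # toy data with 10 features
    y = ["low", "low", "high", "high"]
    protos = train(X, y, level_hvs, lo=0.0, hi=1.0)
    print(predict([0.05] * 10, protos, level_hvs, 0.0, 1.0))  # low
    print(predict([0.95] * 10, protos, level_hvs, 0.0, 1.0))  # high
```

Using ±1 components keeps the similarity between two random hypervectors close to zero in high dimensions. This is what lets the distant levels, and the differently rotated features, interfere only weakly once they are bundled.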

The `src/generators` folder contains two Python 3.8 scripts that create training and test datasets from randomly selected samples of:
- BRCA, KIRP, and THCA DNA-Methylation data from the paper [Classification of Large DNA Methylation Datasets for Identifying Cancer Drivers](https://doi.org/10.1016/j.bdr.2018.02.005) by Fabrizio Celli, Fabio Cumbo, and Emanuel Weitschek;
- Gene-expression quantification and Methylation Beta Value experiments provided by [OpenGDC](https://github.com/cumbof/OpenGDC/) for all 33 tumor types of the TCGA program.
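Both the generator scripts and the `--psplit_training` and `--seed` options described below split samples at random into a training set and a test set. A seeded percentage split of this kind can be sketched as follows. `split_by_percentage` is a hypothetical helper written for illustration and does not come from `src/generators` or chopin2. Fixing the seed is what makes the split reproducible.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_by_percentage(samples: Sequence[T], percent_training: float,
                        seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the samples with a fixed seed, then cut it at percent_training %."""
    if not 0 < percent_training < 100:
        raise ValueError("percent_training must be strictly between 0 and 100")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # same seed -> same order -> same split
    cut = round(len(shuffled) * percent_training / 100)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    training, test = split_by_percentage([f"sample_{i}" for i in range(10)], 80, seed=0)
    print(len(training), len(test))  # 8 2
```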

Due to their size, the datasets are not included in this repository, but they can be retrieved from:
- [ftp://bioinformatics.iasi.cnr.it/public/bigbiocl_dna-meth_data/](ftp://bioinformatics.iasi.cnr.it/public/bigbiocl_dna-meth_data/)
- [http://geco.deib.polimi.it/opengdc/](http://geco.deib.polimi.it/opengdc/) and [https://github.com/cumbof/OpenGDC/](https://github.com/cumbof/OpenGDC/)

The `isolet` dataset comes from the original forked repository and is kept only as a simple toy dataset for testing purposes.

### Install

We distribute `chopin2` as a Python 3.8 package that can be installed with `pip` or `conda`, and as a Docker image.

Use one of the following commands to get started with `chopin2`:

```
# Install chopin2 with pip
pip install chopin2

# Install chopin2 with conda
conda install -c conda-forge chopin2

# Build and run the Docker image (from a directory containing the chopin2 Dockerfile, e.g. the repository root)
docker build -t chopin2 .
docker run -it chopin2
```
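After installing with `pip` or `conda`, you can check from Python which `chopin2` version is present by using the standard-library `importlib.metadata` module, which has been available since Python 3.8. This is a generic packaging check rather than a chopin2 feature. The CLI's own `-v, --version` flag serves the same purpose. `installed_version` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str = "chopin2") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version else "chopin2 is not installed in this environment")
```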

Please note that `chopin2` is also available as a Galaxy tool. Its wrapper is available in the official Galaxy ToolShed at [https://toolshed.g2.bx.psu.edu/view/fabio/chopin2](https://toolshed.g2.bx.psu.edu/view/fabio/chopin2)

### Usage

Once installed, `chopin2` is ready to use.

For example, the following command runs `chopin2` on the `isolet` dataset:
```
chopin2 --dimensionality 10000 \
        --levels 100 \
        --retrain 10 \
        --pickle ../dataset/isolet/isolet.pkl \
        --psplit_training 80 \
        --dump \
        --nproc 4 \
        --verbose
```
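When the same run has to be repeated with different settings, for instance to compare several `--dimensionality` values, the command can be assembled from Python. The sketch below uses only the flags documented in this README. `build_chopin2_command` and `run_chopin2` are hypothetical helper names. By default the commands are only printed, not executed.

```python
import shlex
import subprocess
from typing import Dict, List, Optional, Union

# True = switch without a value (e.g. --dump); None or False = omit the flag.
FlagValue = Optional[Union[bool, int, str]]


def build_chopin2_command(options: Dict[str, FlagValue]) -> List[str]:
    """Turn a {flag_name: value} mapping into a chopin2 argument list."""
    cmd = ["chopin2"]
    for name, value in options.items():
        if value is None or value is False:
            continue                      # flag not requested
        cmd.append(f"--{name}")
        if value is not True:
            cmd.append(str(value))        # flag that takes a value (0 is kept)
    return cmd


def run_chopin2(options: Dict[str, FlagValue], dry_run: bool = True) -> List[str]:
    """Print the command (dry run) or execute it, failing on a non-zero exit code."""
    cmd = build_chopin2_command(options)
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    for dim in (1000, 5000, 10000):
        run_chopin2({
            "dimensionality": dim, "levels": 100, "retrain": 10,
            "pickle": "../dataset/isolet/isolet.pkl", "psplit_training": 80,
            "dump": True, "nproc": 4, "verbose": True,
        })
```

Passing an argument list instead of a single string to `subprocess.run` avoids shell quoting issues with paths.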

To run it on Apache Spark, the Spark-specific arguments must also be specified:
```
chopin2 --dimensionality 10000 \
        --levels 100 \
        --retrain 10 \
        --pickle ../dataset/isolet/isolet.pkl \
        --psplit_training 80 \
        --dump \
        --spark \
        --slices 10 \
        --master local \
        --memory 2048m \
        --verbose
```

List of standard arguments:
```
--dimensionality    -- Dimensionality of the HD model (default 10000)
--levels            -- Number of level hypervectors (default 2)
--retrain           -- Number of retraining iterations (default 0)
--stop              -- Stop retraining if the error rate does not change (default False)
--dataset           -- Path to the dataset file
--fieldsep          -- Field separator (default ",")
--psplit_training   -- Percentage of observations that will be used to train the model. 
                       The remaining percentage will be used to test the classification model
--crossv_k          -- Number of folds for cross validation.
                       HD models are cross-validated if --crossv_k is greater than 1
--seed              -- Seed for reproducing the random sampling of the observations in the dataset
                       used to build both the training and test sets (default 0)
--pickle            -- Path to the pickle file. If specified, "--dataset", "--fieldsep", and "--training" parameters are not used
--dump              -- Build summary and log files (default False)
--cleanup           -- Delete the classification model as soon as it produces the prediction accuracy (default False)
--keep_levels       -- Do not delete the level hypervectors. It works in conjunction with --cleanup only (default True)
--nproc             -- Number of parallel jobs for the creation of the HD model.
                       This argument is ignored if --spark is enabled (default 1)
--verbose           -- Print results in real time (default False)
--cite              -- Print references and exit
-v, --version       -- Print the current chopin2.py version and exit
```
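`--retrain` and `--stop` control an iterative refinement of the class prototypes. This README does not spell out chopin2's exact update, so the sketch assumes a retraining rule commonly used in HD classification. It works like a perceptron: every misclassified training hypervector is added to its true class prototype and subtracted from the wrongly predicted one. The sketch operates on already encoded hypervectors. Its `stop` flag mirrors the documented `--stop` behaviour of halting once the error rate no longer changes. All names are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def classify(hv: Vector, protos: Dict[Hashable, Vector]) -> Hashable:
    return max(protos, key=lambda label: cosine(hv, protos[label]))


def retrain(encoded: Sequence[Vector], labels: Sequence[Hashable],
            protos: Dict[Hashable, Vector], iterations: int,
            stop: bool = False) -> List[float]:
    """Perceptron-style retraining (assumed rule); returns the error rate of each iteration."""
    history: List[float] = []
    for _ in range(iterations):
        errors = 0
        for hv, label in zip(encoded, labels):
            guess = classify(hv, protos)
            if guess != label:
                errors += 1
                # Pull the true class towards the sample, push the wrong class away.
                protos[label] = [p + v for p, v in zip(protos[label], hv)]
                protos[guess] = [p - v for p, v in zip(protos[guess], hv)]
        history.append(errors / len(labels))
        if stop and len(history) > 1 and history[-1] == history[-2]:
            break  # analogous to --stop: the error rate did not change
    return history


if __name__ == "__main__":
    protos = {"a": [1.0, 1.0], "b": [0.0, 0.0]}  # deliberately poor starting prototypes
    print(retrain([[1.0, 0.0], [0.0, 1.0]], ["a", "b"], protos, iterations=10, stop=True))
    # [0.5, 0.0, 0.0]
```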

List of arguments to enable backward variable selection:
```
--features                     -- Path to a file with a single column containing the whole set or a subset of features
--select_features              -- This triggers the backward variable selection method for the identification of the most significant features.
                                  Warning: computationally intensive!
--group_min                    -- Minimum number of features among those specified with the --features argument (default 1)
--accuracy_threshold           -- Stop the execution if the best accuracy achieved during the previous group of runs is lower than this number (default 60.0)
--accuracy_uncertainty_perc    -- Take a run into account even if its accuracy is lower than the best accuracy achieved in the same group minus its "accuracy_uncertainty_perc" percent
```
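The backward variable selection described above can be pictured as greedy backward elimination. In each group of runs, every feature in the current set is removed in turn, the model is re-evaluated, and the removal that keeps the highest accuracy is applied. The sketch below assumes that reading. `evaluate` is a hypothetical stand-in for training and testing an HD model on a feature subset. The sketch models `--group_min` and `--accuracy_threshold` only and leaves out the `--accuracy_uncertainty_perc` tolerance. The number of evaluations grows quadratically with the number of features, which is consistent with the warning that the method is computationally intensive.

```python
from typing import Callable, FrozenSet, Sequence, Tuple


def backward_selection(features: Sequence[str],
                       evaluate: Callable[[FrozenSet[str]], float],
                       group_min: int = 1,
                       accuracy_threshold: float = 60.0) -> Tuple[FrozenSet[str], float]:
    """Greedy backward elimination: drop one feature per group of runs."""
    current = frozenset(features)
    best_acc = evaluate(current)
    while len(current) > group_min:
        # One group of runs: evaluate every subset with exactly one feature removed.
        runs = [(evaluate(current - {f}), f) for f in sorted(current)]
        group_best, dropped = max(runs)
        if group_best < accuracy_threshold:
            break  # analogous to --accuracy_threshold
        current, best_acc = current - {dropped}, group_best
    return current, best_acc


if __name__ == "__main__":
    informative = {"f1", "f2"}  # toy scoring: informative features help, the others add noise

    def toy_accuracy(subset: FrozenSet[str]) -> float:
        return 50 + 25 * len(subset & informative) - len(subset - informative)

    selected, accuracy = backward_selection(["f1", "f2", "f3", "f4", "f5"], toy_accuracy,
                                            group_min=1, accuracy_threshold=80.0)
    print(sorted(selected), accuracy)  # ['f1', 'f2'] 100
```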

List of arguments for running the classifier in an Apache Spark distributed environment:
```
--spark     -- Build the classification model in an Apache Spark distributed environment
--slices    -- Number of slices in case --spark argument is enabled. 
               This argument is ignored if --gpu is enabled
--master    -- Master node address
--memory    -- Executor memory
```
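HD training lends itself to this kind of distribution because bundling is a plain sum. The training data can be split into independent slices, and the partial class sums of the slices can be merged afterwards. The following dependency-free sketch illustrates that map/reduce idea on already encoded hypervectors. The helper names are hypothetical, and this is not chopin2's Spark code.

```python
from functools import reduce
from typing import Dict, Hashable, List, Sequence, Tuple

Sample = Tuple[List[int], Hashable]       # (encoded hypervector, class label)
Prototypes = Dict[Hashable, List[int]]


def partial_prototypes(chunk: Sequence[Sample]) -> Prototypes:
    """Map step: bundle the encoded samples of one slice into per-class partial sums."""
    out: Prototypes = {}
    for hv, label in chunk:
        acc = out.setdefault(label, [0] * len(hv))
        for j, v in enumerate(hv):
            acc[j] += v
    return out


def merge(a: Prototypes, b: Prototypes) -> Prototypes:
    """Reduce step: element-wise sum of two partial results (associative and commutative)."""
    merged = {label: hv[:] for label, hv in a.items()}
    for label, hv in b.items():
        if label in merged:
            merged[label] = [x + y for x, y in zip(merged[label], hv)]
        else:
            merged[label] = hv[:]
    return merged


def sliced_training(samples: Sequence[Sample], slices: int) -> Prototypes:
    """Split the samples into `slices` chunks, bundle each chunk, then merge the partials."""
    chunks = [samples[i::slices] for i in range(slices)]
    return reduce(merge, (partial_prototypes(c) for c in chunks), {})


if __name__ == "__main__":
    data = [([1, -1, 1], "x"), ([1, 1, -1], "y"), ([-1, 1, 1], "x"), ([1, 1, 1], "y")]
    print(sliced_training(data, slices=3) == partial_prototypes(data))  # True
```

With PySpark, the same pattern maps roughly onto `SparkContext.parallelize(samples, numSlices)`, followed by a per-partition bundling step such as `mapPartitions` and a final `reduce(merge)`.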

List of arguments for the execution of the classifier on NVIDIA-powered GPUs:
```
--gpu       -- Build the classification model on an NVIDIA-powered GPU.
               This argument is ignored if --spark is specified
--tblock    -- Number of threads per block in case --gpu argument is enabled. 
               This argument is ignored if --spark is enabled
```
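In CUDA, a threads-per-block value such as `--tblock` is normally paired with a grid size obtained by ceiling division, so that enough threads are launched to cover every element. The helper below is a generic sketch that assumes one thread per hypervector component. How chopin2 actually maps its work onto threads is not described here. The 1024 cap reflects the per-block thread limit of current NVIDIA GPUs, and `launch_config` is a hypothetical name.

```python
from typing import Tuple

MAX_THREADS_PER_BLOCK = 1024  # per-block limit on current NVIDIA GPUs


def launch_config(n_elements: int, tblock: int) -> Tuple[int, int]:
    """Return (blocks_per_grid, threads_per_block) launching at least n_elements threads."""
    if n_elements <= 0:
        raise ValueError("n_elements must be positive")
    if not 0 < tblock <= MAX_THREADS_PER_BLOCK:
        raise ValueError(f"tblock must be in 1..{MAX_THREADS_PER_BLOCK}")
    blocks = (n_elements + tblock - 1) // tblock  # ceiling division
    return blocks, tblock


if __name__ == "__main__":
    print(launch_config(10000, 32))  # (313, 32): 313 * 32 = 10016 >= 10000
```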

### Credits

Please credit our work in your manuscript by citing:

> Fabio Cumbo, Eleonora Cappelli, and Emanuel Weitschek, "A brain-inspired hyperdimensional computing approach for classifying massive DNA methylation data of cancer", MDPI Algorithms, 2020 [https://doi.org/10.3390/a13090233](https://doi.org/10.3390/a13090233)

> Fabio Cumbo, Emanuel Weitschek, and Daniel Blankenberg, "hdlib: A Python library for designing Vector-Symbolic Architectures", Journal of Open Source Software, 2023 [https://doi.org/10.21105/joss.05704](https://doi.org/10.21105/joss.05704)

Do not forget to also cite the following paper, which inspired this work:

> Mohsen Imani, Chenyu Huang, Deqian Kong, and Tajana Rosing, "Hierarchical Hyperdimensional Computing for Energy Efficient Classification", IEEE/ACM Design Automation Conference (DAC), 2018 [https://doi.org/10.1109/DAC.2018.8465708](https://doi.org/10.1109/DAC.2018.8465708)
