# TACaPe: Transformer-based Anti-Cancer Peptide Classification and Generation
TACaPe (Transformer-based Anti-Cancer Peptide Classification and Generation) is a command-line tool
to train transformer-based models for anticancer peptide classification and generation. It was built
on top of TensorFlow and uses an auto-regressive algorithm for peptide design, whose results can
be filtered using an optional classification model.
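
The generate-then-filter idea can be pictured with a small, self-contained sketch. It is not TACaPe's implementation: `toy_next_residue` and `toy_activity_probability` are hypothetical stand-ins for the trained generator and classifier, and the end token `$` is an assumption made only for this illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
END = "$"  # hypothetical end-of-sequence token (assumption)


def toy_next_residue(prefix, rng):
    """Stand-in for a trained generator (assumption, not TACaPe's model)."""
    # The chance of stopping grows with the prefix length; a real model
    # would predict the next residue from the prefix content instead.
    if rng.random() < len(prefix) / 30:
        return END
    return rng.choice(AMINO_ACIDS)


def generate_peptide(rng, max_length=50):
    """Auto-regressive loop: append one residue at a time until END."""
    peptide = ""
    while len(peptide) < max_length:
        token = toy_next_residue(peptide, rng)
        if token == END:
            break
        peptide += token
    return peptide


def toy_activity_probability(peptide):
    """Stand-in for a trained classifier: fraction of K/R residues."""
    if not peptide:
        return 0.0
    return sum(peptide.count(aa) for aa in "KR") / len(peptide)


def generate_and_filter(n, threshold, seed=0):
    """Generate n candidates and keep those scoring at or above threshold."""
    rng = random.Random(seed)
    candidates = [generate_peptide(rng) for _ in range(n)]
    scored = [(p, toy_activity_probability(p)) for p in candidates]
    return [(p, score) for p, score in scored if score >= threshold]


if __name__ == "__main__":
    for peptide, score in generate_and_filter(n=20, threshold=0.2):
        print(f"{score:.2f}\t{peptide}")
```

In the real tool, the generator and the classifier are the models produced by `tacape-train-generator` and `tacape-train-classifier`, and the filtering step only applies when a classifier is supplied.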
## Setup
### Installing from PyPI using `pip`
```
$ pip install tacape
```
### Installing from GitHub 
```
$ git clone https://github.com/omixlab/anticancer-peptide
$ cd anticancer-peptide
```
#### Using `pip`
```
$ pip install -r requirements.txt -e .
```
#### Using `conda`
```
$ conda env create
$ conda activate anticancer-peptide
```
## Usage
### `tacape-train-classifier`
Trains a classification model for anticancer peptides.
```
$ tacape-train-classifier -h
/\__  _\ /\  __ \   /\  ___\   /\  __ \   /\  == \ /\  ___\   
\/_/\ \/ \ \  __ \  \ \ \____  \ \  __ \  \ \  _-/ \ \  __\   
   \ \_\  \ \_\ \_\  \ \_____\  \ \_\ \_\  \ \_\    \ \_____\ 
    \/_/   \/_/\/_/   \/_____/   \/_/\/_/   \/_/     \/_____/ 
usage: TACaPe: Model Training [-h] --positive-train POSITIVE_TRAIN --negative-train NEGATIVE_TRAIN --positive-test POSITIVE_TEST
                              --negative-test NEGATIVE_TEST [--format {text,fasta}] --output OUTPUT [--epochs EPOCHS]
optional arguments:
  -h, --help            show this help message and exit
  --positive-train POSITIVE_TRAIN
                        Input file containing positive peptides for training
  --negative-train NEGATIVE_TRAIN
                        Input file containing negative peptides for training
  --positive-test POSITIVE_TEST
                        Input file containing positive peptides for testing
  --negative-test NEGATIVE_TEST
                        Input file containing negative peptides for testing
  --format {text,fasta}
                        [optional] Input file format (default: text)
  --output OUTPUT       Path prefix of the output files
  --epochs EPOCHS       [optional] Number of epochs to be used during training (default: 30)
```
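
The `--format` option accepts `text` or `fasta`. As a rough illustration of how the two layouts differ, the sketch below parses both. It assumes that `text` means one peptide per line, which the help message does not state explicitly, and `read_peptides` is a hypothetical helper rather than part of the TACaPe API.

```python
def read_peptides(lines, fmt="text"):
    """Return peptide sequences from an iterable of lines.

    Assumption: "text" holds one peptide per line; "fasta" uses '>' header
    lines followed by one or more sequence lines.
    """
    if fmt == "text":
        return [line.strip() for line in lines if line.strip()]
    if fmt == "fasta":
        peptides, current = [], []
        for line in lines:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if current:
                    peptides.append("".join(current))
                current = []
            elif line:
                current.append(line)
        if current:
            peptides.append("".join(current))
        return peptides
    raise ValueError(f"unsupported format: {fmt!r}")


if __name__ == "__main__":
    print(read_peptides(["FLPK\n", "GLW\n"]))
    print(read_peptides([">pep1\n", "FLP\n", "K\n", ">pep2\n", "GLW\n"], "fasta"))
```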
### `tacape-predict`
Runs a classification model for anticancer peptide prediction on an input file.
```
$ tacape-predict -h
/\__  _\ /\  __ \   /\  ___\   /\  __ \   /\  == \ /\  ___\   
\/_/\ \/ \ \  __ \  \ \ \____  \ \  __ \  \ \  _-/ \ \  __\   
   \ \_\  \ \_\ \_\  \ \_____\  \ \_\ \_\  \ \_\    \ \_____\ 
    \/_/   \/_/\/_/   \/_____/   \/_/\/_/   \/_/     \/_____/ 
usage: TACaPe: Predict [-h] --input INPUT [--format {text,fasta}] --classifier-prefix CLASSIFIER_PREFIX --output OUTPUT
optional arguments:
  -h, --help            show this help message and exit
  --input INPUT         Input file
  --format {text,fasta}
                        [optional] Input file format (default: text)
  --classifier-prefix CLASSIFIER_PREFIX
                        [optional] Path to the file prefix of the trained classification model
  --output OUTPUT       Path to the output CSV file
```
### `tacape-train-generator`
Trains an auto-regressive generative model for anticancer peptides.
```
$ tacape-train-generator -h
/\__  _\ /\  __ \   /\  ___\   /\  __ \   /\  == \ /\  ___\   
\/_/\ \/ \ \  __ \  \ \ \____  \ \  __ \  \ \  _-/ \ \  __\   
   \ \_\  \ \_\ \_\  \ \_____\  \ \_\ \_\  \ \_\    \ \_____\ 
    \/_/   \/_/\/_/   \/_____/   \/_/\/_/   \/_/     \/_____/ 
usage: TACaPe: Generative Model Training [-h] --positive-train POSITIVE_TRAIN --positive-test POSITIVE_TEST [--format {text,fasta}] --output
                              OUTPUT [--epochs EPOCHS]
optional arguments:
  -h, --help            show this help message and exit
  --positive-train POSITIVE_TRAIN
                        Input file containing positive peptides for training
  --positive-test POSITIVE_TEST
                        Input file containing positive peptides for testing
  --format {text,fasta}
                        [optional] Input file format (default: text)
  --output OUTPUT       Path prefix of the output files containing the trained model
  --epochs EPOCHS       [optional] Number of epochs to be used during training (default: 30)
```
### `tacape-generate`
Generates a set of peptides with potential anticancer activity from a trained generative model. If a classification
model is provided, it will be used to filter the generated sequences and compute a probability of activity.
```
$ tacape-generate -h
/\__  _\ /\  __ \   /\  ___\   /\  __ \   /\  == \ /\  ___\   
\/_/\ \/ \ \  __ \  \ \ \____  \ \  __ \  \ \  _-/ \ \  __\   
   \ \_\  \ \_\ \_\  \ \_____\  \ \_\ \_\  \ \_\    \ \_____\ 
    \/_/   \/_/\/_/   \/_____/   \/_/\/_/   \/_/     \/_____/ 
usage: TACaPe: Generate [-h] --generator-prefix GENERATOR_PREFIX [--classifier-prefix CLASSIFIER_PREFIX]
                        [--number-of-sequences NUMBER_OF_SEQUENCES] [--temperature TEMPERATURE] [--threshold THRESHOLD] --output
                        OUTPUT
optional arguments:
  -h, --help            show this help message and exit
  --generator-prefix GENERATOR_PREFIX
                        Path to the file prefix of the trained generative model
  --classifier-prefix CLASSIFIER_PREFIX
                        [optional] Path to the file prefix of the trained classification model
  --number-of-sequences NUMBER_OF_SEQUENCES
                        [optional] Number of sequences to be generated (default: 1000)
  --temperature TEMPERATURE
                        [optional] Temperature used for logit scaling when sampling aminoacids during auto-regressive generation
                        (default: 1.0)
  --threshold THRESHOLD
                        [optional] Classification probability threshold (default: 0.5)
  --output OUTPUT       Path to the output CSV file
```
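
To give a feel for what `--temperature` does, here is a minimal sketch of temperature-scaled sampling using only the Python standard library. It shows the general technique (divide the logits by the temperature, apply softmax, then sample); the function names are hypothetical and this is not taken from TACaPe's source.

```python
import math
import random


def temperature_probabilities(logits, temperature=1.0):
    """Softmax over logits divided by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature=1.0, rng=random):
    """Draw one index according to the temperature-scaled distribution."""
    probs = temperature_probabilities(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]
    for t in (0.5, 1.0, 2.0):
        rounded = [round(p, 3) for p in temperature_probabilities(logits, t)]
        print(t, rounded)
```

Temperatures below 1.0 concentrate probability on the highest-scoring residues, which tends to give more conservative sequences; temperatures above 1.0 flatten the distribution and increase diversity.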
## Example: generating sequences from the AntiCP2 dataset
### Creating a peptide classifier for 100 epochs
```
$ tacape-train-classifier \
    --positive-train data/raw/anti_cp/anticp2_main_internal_positive.txt \
    --negative-train data/raw/anti_cp/anticp2_main_internal_negative.txt \
    --positive-test data/raw/anti_cp/anticp2_main_validation_positive.txt \
    --negative-test data/raw/anti_cp/anticp2_main_validation_negative.txt \
    --output data/models/classifier \
    --epochs 100
```
### Run the predictive model on the validation dataset
```
$ tacape-predict \
    --input data/raw/anti_cp/anticp2_main_validation_positive.txt \
    --format text \
    --classifier-prefix data/models/classifier \
    --output data/models/internal_results.csv
```
### Creating a peptide generator for 100 epochs
```
$ tacape-train-generator \
    --positive-train data/raw/anti_cp/anticp2_main_internal_positive.txt \
    --positive-test data/raw/anti_cp/anticp2_main_validation_positive.txt \
    --output data/models/generator \
    --epochs 100
```
### Run the generative model to generate 100 sequences
```
$ tacape-generate \
    --generator-prefix data/models/generator \
    --classifier-prefix data/models/classifier \
    --number-of-sequences 100 \
    --output data/models/generated.csv
```
### Convert generated peptides to FASTA
```
$ tacape-csv-to-fasta \
    --input data/models/generated.csv \
    --output data/models/generated.fasta
```
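
For readers curious about what such a conversion involves, the following is a minimal sketch using Python's standard `csv` module. It is not the code behind `tacape-csv-to-fasta`: the layout of the generated CSV is not documented here, so the `sequence` column name and the `peptide_N` record headers are assumptions made for illustration.

```python
import csv
import io


def csv_to_fasta(csv_text, sequence_column="sequence"):
    """Convert CSV text into FASTA text, one record per row.

    Assumption: the CSV has a header row and a column named
    `sequence_column`; adjust it to match the real file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for index, row in enumerate(reader, start=1):
        records.append(f">peptide_{index}\n{row[sequence_column]}\n")
    return "".join(records)


if __name__ == "__main__":
    example = "sequence,probability\nFLPK,0.9\nGLW,0.7\n"
    print(csv_to_fasta(example), end="")
```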
 
            
         
        Raw data
        
            {
    "_id": null,
    "home_page": "https://github.com/omixlab/anticancer-peptide",
    "name": "tacape",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "bioinformatics machine-learning data science drug discovery QSAR",
    "author": "Isadora Leitzke Guidotti, Frederico Schmitt Kremer",
    "author_email": "fred.s.kremer@gmail.com",
    "download_url": "",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "TACaPe: Transformed-based Anti-Cancer Peptide Classification and Generation",
    "version": "0.0.6",
    "project_urls": {
        "Homepage": "https://github.com/omixlab/anticancer-peptide"
    },
    "split_keywords": [
        "bioinformatics",
        "machine-learning",
        "data",
        "science",
        "drug",
        "discovery",
        "qsar"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "714774798ad33c381990d0da57728237758b381f159dafac93e848a7b3ae94b8",
                "md5": "f8baf36ec77f3e869de718e79fb638ef",
                "sha256": "61c69b1dcec0cd371cf80b6598e7de63d9220e429ad54e7e48333bb073360eeb"
            },
            "downloads": -1,
            "filename": "tacape-0.0.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f8baf36ec77f3e869de718e79fb638ef",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 11592,
            "upload_time": "2023-07-28T18:19:09",
            "upload_time_iso_8601": "2023-07-28T18:19:09.694030Z",
            "url": "https://files.pythonhosted.org/packages/71/47/74798ad33c381990d0da57728237758b381f159dafac93e848a7b3ae94b8/tacape-0.0.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-07-28 18:19:09",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "omixlab",
    "github_project": "anticancer-peptide",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "tacape"
}