# TACaPe: Transformer-based Anti-Cancer Peptide Classification and Generation
TACaPe (Transformer-based Anti-Cancer Peptide Classification and Generation) is a command-line tool
to train transformer-based models for anticancer peptide classification and generation. It is built
on top of TensorFlow and uses an auto-regressive algorithm for peptide design, whose results can
optionally be filtered with a classification model.
## Setup
### Installing from PyPI using `pip`
```
$ pip install tacape
```
### Installing from GitHub
```
$ git clone https://github.com/omixlab/anticancer-peptide
$ cd anticancer-peptide
```
#### Using `pip`
```
$ pip install -r requirements.txt -e .
```
#### Using `conda`
```
$ conda env create
$ conda activate anticancer-peptide
```
## Usage
### `tacape-train-classifier`
Trains a classification model that predicts the anticancer activity of peptides.
```
$ tacape-train-classifier -h
/\__  _\ /\  __ \   /\  ___\   /\  __ \   /\  == \ /\  ___\
\/_/\ \/ \ \  __ \  \ \ \____  \ \  __ \  \ \  _-/ \ \  __\
   \ \_\  \ \_\ \_\  \ \_____\  \ \_\ \_\  \ \_\    \ \_____\
    \/_/   \/_/\/_/   \/_____/   \/_/\/_/   \/_/     \/_____/
usage: TACaPe: Model Training [-h] --positive-train POSITIVE_TRAIN --negative-train NEGATIVE_TRAIN --positive-test POSITIVE_TEST
--negative-test NEGATIVE_TEST [--format {text,fasta}] --output OUTPUT [--epochs EPOCHS]
optional arguments:
-h, --help show this help message and exit
--positive-train POSITIVE_TRAIN
Input file containing positive peptides for training
--negative-train NEGATIVE_TRAIN
Input file containing negative peptides for training
--positive-test POSITIVE_TEST
Input file containing positive peptides for testing
--negative-test NEGATIVE_TEST
Input file containing negative peptides for testing
--format {text,fasta}
[optional] Input file format (default: text)
--output OUTPUT Path prefix of the output files
--epochs EPOCHS [optional] Number of epochs to be used during training (default: 30)
```
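The `--format` option accepts `text` or `fasta`. The README does not describe the `text` layout, so the sketch below assumes one peptide per line and shows how both formats could be read into a plain list of sequences. The helper name `read_peptides` is hypothetical and is not part of the TACaPe API.

```python
from pathlib import Path


def read_peptides(path: str, fmt: str = "text") -> list[str]:
    """Read peptide sequences from a file (hypothetical helper, not TACaPe's API).

    Assumes `text` means one sequence per line and `fasta` means standard
    FASTA records whose sequence may span several lines.
    """
    lines = Path(path).read_text().splitlines()
    if fmt == "text":
        # One sequence per line; skip blank lines and normalise to upper case.
        return [line.strip().upper() for line in lines if line.strip()]
    if fmt == "fasta":
        sequences: list[str] = []
        current: list[str] = []
        for line in lines:
            line = line.strip()
            if line.startswith(">"):
                # A header line starts a new record; flush the previous one.
                if current:
                    sequences.append("".join(current).upper())
                current = []
            elif line:
                # Sequence lines are concatenated until the next header.
                current.append(line)
        if current:
            sequences.append("".join(current).upper())
        return sequences
    raise ValueError(f"unsupported format: {fmt!r}")
```

Both branches return the same shape (a list of strings), which is what lets a single `--format` switch drive either parser.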
### `tacape-predict`
Runs a trained classification model to predict the anticancer activity of the peptides in an input file.
```
$ tacape-predict -h
usage: TACaPe: Predict [-h] --input INPUT [--format {text,fasta}] --classifier-prefix CLASSIFIER_PREFIX --output OUTPUT
optional arguments:
-h, --help show this help message and exit
--input INPUT Input file
--format {text,fasta}
[optional] Input file format (default: text)
--classifier-prefix CLASSIFIER_PREFIX
[optional] Path to the file prefix of the trained classification model
--output OUTPUT Path to the output CSV file
```
### `tacape-train-generator`
Trains an auto-regressive generative model of anticancer peptides.
```
$ tacape-train-generator -h
usage: TACaPe: Generative Model Training [-h] --positive-train POSITIVE_TRAIN --positive-test POSITIVE_TEST [--format {text,fasta}] --output
OUTPUT [--epochs EPOCHS]
optional arguments:
-h, --help show this help message and exit
--positive-train POSITIVE_TRAIN
Input file containing positive peptides for training
--positive-test POSITIVE_TEST
Input file containing positive peptides for testing
--format {text,fasta}
[optional] Input file format (default: text)
--output OUTPUT Path prefix of the output files containing the trained model
--epochs EPOCHS [optional] Number of epochs to be used during training (default: 30)
```
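An auto-regressive generator is typically trained with next-token prediction: each sequence is framed by start and end markers, and the model learns to predict token *t+1* from tokens *1..t*. The sketch below shows that data preparation under assumed conventions (`^` as a start token, `$` as an end token, the 20 standard amino acids, and 0 as padding); TACaPe's actual tokenizer and special tokens may differ.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
START, END = "^", "$"  # hypothetical special tokens
# Index 0 is reserved for padding, so real tokens start at 1.
VOCAB = {tok: i for i, tok in enumerate([START, END, *AMINO_ACIDS], start=1)}


def encode(peptide: str) -> list[int]:
    """Map a peptide to token ids wrapped in start/end markers."""
    return [VOCAB[START], *(VOCAB[aa] for aa in peptide), VOCAB[END]]


def next_token_pairs(peptide: str, max_len: int) -> tuple[list[int], list[int]]:
    """Build (input, target) id sequences shifted by one position.

    The target at position t is the token that follows input position t
    (teacher forcing). Both sequences are right-padded with 0 to `max_len`.
    """
    ids = encode(peptide)
    inputs, targets = ids[:-1], ids[1:]
    if len(inputs) > max_len:
        raise ValueError("peptide is longer than max_len allows")
    pad = [0] * (max_len - len(inputs))
    return inputs + pad, targets + pad
```

Padding with 0 lets peptides of different lengths share one batch; a masked loss would normally ignore the padded positions.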
### `tacape-generate`
Generates a set of peptides with potential anticancer activity from a trained generative model. If a classification
model is provided, it is used to filter the generated sequences and to compute each sequence's probability of activity.
```
$ tacape-generate -h
usage: TACaPe: Generate [-h] --generator-prefix GENERATOR_PREFIX [--classifier-prefix CLASSIFIER_PREFIX]
[--number-of-sequences NUMBER_OF_SEQUENCES] [--temperature TEMPERATURE] [--threshold THRESHOLD] --output
OUTPUT
optional arguments:
-h, --help show this help message and exit
--generator-prefix GENERATOR_PREFIX
Path to the file prefix of the trained generative model
--classifier-prefix CLASSIFIER_PREFIX
[optional] Path to the file prefix of the trained classification model
--number-of-sequences NUMBER_OF_SEQUENCES
[optional] Number of sequences to be generated (default: 1000)
--temperature TEMPERATURE
[optional] Temperature used for logit scaling when sampling aminoacids during auto-regressive generation
(default: 1.0)
--threshold THRESHOLD
[optional] Classification probability threshold (default: 0.5)
--output OUTPUT Path to the output CSV file
```
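`--temperature` scales the logits before sampling each amino acid: values below 1.0 sharpen the distribution toward the most likely residue, and values above 1.0 flatten it. `--threshold` keeps only sequences whose predicted probability of activity reaches the cutoff. The NumPy sketch below illustrates both ideas with made-up logits and scores; it is a generic illustration rather than TACaPe's implementation, the function names are hypothetical, and the inclusive `>=` comparison is an assumption.

```python
import numpy as np


def temperature_softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Convert logits to probabilities after dividing them by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = logits / temperature
    scaled = scaled - scaled.max()  # subtract the max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()


def sample_token(logits: np.ndarray, temperature: float, rng: np.random.Generator) -> int:
    """Draw one token index from the temperature-scaled distribution."""
    probs = temperature_softmax(logits, temperature)
    return int(rng.choice(len(probs), p=probs))


def filter_by_threshold(scored: dict[str, float], threshold: float = 0.5) -> dict[str, float]:
    """Keep sequences whose predicted probability is >= threshold (assumed inclusive)."""
    return {seq: p for seq, p in scored.items() if p >= threshold}


rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.1])
token = sample_token(logits, temperature=0.8, rng=rng)
kept = filter_by_threshold({"GLFDIVKKVV": 0.91, "AAAAAAAA": 0.12}, threshold=0.5)
```

As the temperature approaches 0 the distribution approaches a one-hot vector on the highest logit (greedy decoding), which trades diversity of the generated peptides for likelihood under the model.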
## Example: generating sequences from the AntiCP2 dataset
### Creating a peptide classifier for 100 epochs
```
$ tacape-train-classifier \
--positive-train data/raw/anti_cp/anticp2_main_internal_positive.txt \
--negative-train data/raw/anti_cp/anticp2_main_internal_negative.txt \
--positive-test data/raw/anti_cp/anticp2_main_validation_positive.txt \
--negative-test data/raw/anti_cp/anticp2_main_validation_negative.txt \
--output data/models/classifier \
--epochs 100
```
### Run the predictive model on the validation dataset
```
$ tacape-predict \
--input data/raw/anti_cp/anticp2_main_validation_positive.txt \
--format text \
--classifier-prefix data/models/classifier \
--output data/models/internal_results.csv
```
### Creating a peptide generator for 100 epochs
```
$ tacape-train-generator \
--positive-train data/raw/anti_cp/anticp2_main_internal_positive.txt \
--positive-test data/raw/anti_cp/anticp2_main_validation_positive.txt \
--output data/models/generator \
--epochs 100
```
### Run the generative model to generate 100 sequences
```
$ tacape-generate \
--generator-prefix data/models/generator \
--classifier-prefix data/models/classifier \
--number-of-sequences 100 \
--output data/models/generated.csv
```
### Convert generated peptides to FASTA
```
$ tacape-csv-to-fasta \
--input data/models/generated.csv \
--output data/models/generated.fasta
```
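`tacape-csv-to-fasta` writes the generated peptides as FASTA records. The README does not document the CSV columns, so the standard-library sketch below assumes the sequences live in a column named `sequence` (hypothetical) and names each record `peptide_<n>` (also hypothetical); it only illustrates the kind of conversion the command performs.

```python
import csv
import io


def csv_to_fasta(csv_text: str, column: str = "sequence", width: int = 60) -> str:
    """Convert CSV text to FASTA text (hypothetical helper, not TACaPe's code).

    Assumes the peptide sequences are stored in `column`; each record is
    named `peptide_<n>` and sequence lines are wrapped at `width` characters.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for n, row in enumerate(reader, start=1):
        seq = row[column].strip()
        # Wrap long sequences into fixed-width lines, as FASTA files often do.
        lines = [seq[i:i + width] for i in range(0, len(seq), width)]
        records.append("\n".join([f">peptide_{n}", *lines]))
    return "\n".join(records) + ("\n" if records else "")
```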
Raw data
{
"_id": null,
"home_page": "https://github.com/omixlab/anticancer-peptide",
"name": "tacape",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "bioinformatics machine-learning data science drug discovery QSAR",
"author": "Isadora Leitzke Guidotti, Frederico Schmitt Kremer",
"author_email": "fred.s.kremer@gmail.com",
"download_url": "",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "TACaPe: Transformed-based Anti-Cancer Peptide Classification and Generation",
"version": "0.0.6",
"project_urls": {
"Homepage": "https://github.com/omixlab/anticancer-peptide"
},
"split_keywords": [
"bioinformatics",
"machine-learning",
"data",
"science",
"drug",
"discovery",
"qsar"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "714774798ad33c381990d0da57728237758b381f159dafac93e848a7b3ae94b8",
"md5": "f8baf36ec77f3e869de718e79fb638ef",
"sha256": "61c69b1dcec0cd371cf80b6598e7de63d9220e429ad54e7e48333bb073360eeb"
},
"downloads": -1,
"filename": "tacape-0.0.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f8baf36ec77f3e869de718e79fb638ef",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 11592,
"upload_time": "2023-07-28T18:19:09",
"upload_time_iso_8601": "2023-07-28T18:19:09.694030Z",
"url": "https://files.pythonhosted.org/packages/71/47/74798ad33c381990d0da57728237758b381f159dafac93e848a7b3ae94b8/tacape-0.0.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-07-28 18:19:09",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "omixlab",
"github_project": "anticancer-peptide",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "tacape"
}