epicore

Name	epicore
Version	0.1.6
home_page	None
Summary	Compute core epitopes from multiple overlapping peptides.
upload_time	2025-07-30 07:34:21
maintainer	None
docs_url	None
author	None
requires_python	>=3.12
license	MIT license
keywords	peptides epitopes core epitopes

# epicore
This tool is an adaptation of [plateau](https://plateau.bcp.fu-berlin.de/).

## General purpose
The tool can be used to identify and quantify shared consensus epitopes. 

## Installation
[![Install with pip](https://img.shields.io/badge/install%20with-pip-brightgreen?style=flat-square)](https://test.pypi.org/project/epicore/)
```bash
pip install epicore
```

![Install with bioconda](https://img.shields.io/badge/install%20with-bioconda-blue?style=flat-square)
```bash
conda install bioconda::epicore
```

## How to use
To compute the consensus epitopes, enter the following command:
```bash
epicore --reference_proteome <PROTEOME_FILE> --out_dir <OUT_DIR> generate-epicore-csv --min_epi_length <MIN_EPI_LENGTH> --min_overlap <MIN_OVERLAP> --max_step_size <MAX_STEP_SIZE> --seq_column <SEQ_COLUMN> --protacc_column <PROTACC_COLUMN> --delimiter <DELIMITER> [--intensity_column <INTENSITY_COLUMN> --start_column <START_COLUMN> --end_column <END_COLUMN> --mod_pattern <MOD_PATTERN> --report --html] --evidence_file <EVIDENCE_FILE>
```
Replace ```EVIDENCE_FILE``` with the path to your evidence file and ```PROTEOME_FILE``` with the path to the proteome FASTA file that was used to generate the evidence file. You can find more detailed information about the input data [here](#input).  

To visualize the landscape of a protein you can use the following command:
```bash
epicore --reference_proteome <PROTEOME_FILE> --out_dir <OUT_DIR> plot-landscape --epicore_csv <EPICORE_RESULT> --protacc <PROTACC>
```
Replace ```EPICORE_RESULT``` with the file epicore_result.csv, which can be generated by using the generate-epicore-csv command. 

### Input 
The description of each parameter can be found in the table below. Parameters enclosed in square brackets are optional. Parameters highlighted with 🟢 are necessary for the plot-landscape command. Parameters highlighted with 🔴 are necessary for the generate-epicore-csv command. The tool supports any search engine output that contains a peptide sequence column and a protein accession column. 
| Parameter | Description |
| --- | --- |
| 🔴 max_step_size | Defines the maximal step size between two peptides that are still grouped into the same epitope. If the start positions of two peptides differ by more than that number, the peptides are only grouped together if they overlap by at least min_overlap amino acids.|
| 🔴 min_overlap | Defines the minimal overlap between two peptides required to group them into the same epitope if their start positions differ by more than max_step_size.|
| 🔴 min_epi_length | Defines the minimum length of a core epitope. If the whole sequence of an epitope is shorter than the minimum epitope length, the core is defined as the whole sequence.| 
| 🔴 seq_column | Defines the column header in the input evidence file that contains the peptide sequences. |
| 🔴 protacc_column | Defines the column header in the input evidence file that contains the protein accessions of proteins that contain the peptide of the row. |
| [start_column] | This is an optional parameter. It defines the column header in the input evidence file that contains the start position of the peptide in the different proteins. Setting this parameter reduces the runtime. |
| [end_column] | This is an optional parameter. It defines the column header in the input evidence file that contains the end position of the peptide in the different proteins. Setting this parameter reduces the runtime.|
| [intensity_column] | This is an optional parameter. It defines the column header in the input evidence file of the column that contains the intensity of a peptide sequence. |
| 🔴 out_dir | Defines the directory in which the results will be saved. |
| [mod_pattern] | Defines how modifications of a peptide are separated from the sequence in the sequence column. Provide a comma-separated string, where the element before the comma marks the start of a modification and the element after the comma marks its end. In AAAPAIM/+15.99\SY, for example, the modification is enclosed by / and \ , so the mod_pattern parameter should be ```/,\```. All parts of a sequence inside () and [] are interpreted as modifications by default. If your input file uses these delimiters, you do not need to provide a mod_pattern parameter.|
| 🔴 delimiter | Defines the delimiter that separates multiple values in one cell in the input evidence file. |
| [report] | If set, a [report](#reporthtml) is generated.|
| [html] | If set, an HTML version of the generated plots is produced.|
| 🟢 protacc | Defines the proteins for which the core epitopes and landscape should be visualized. Separate multiple accessions with commas. |

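The effect of mod_pattern can be illustrated with a minimal sketch. It is not the code epicore uses internally: the function name `strip_modifications` is hypothetical, and the sketch assumes that modifications are simply removed from the sequence and are never nested.

```python
import re


def strip_modifications(sequence, mod_pattern=None):
    """Remove modification annotations from a peptide sequence (illustrative only)."""
    # Per the table above, text inside () and [] is treated as a modification by default.
    patterns = [r"\(.*?\)", r"\[.*?\]"]
    if mod_pattern is not None:
        # mod_pattern is "<start>,<end>", for example "/,\" for AAAPAIM/+15.99\SY.
        start, end = mod_pattern.split(",")
        patterns.append(re.escape(start) + r".*?" + re.escape(end))
    for pattern in patterns:
        sequence = re.sub(pattern, "", sequence)
    return sequence


print(strip_modifications("AAAPAIM/+15.99\\SY", "/,\\"))
print(strip_modifications("AAAPAIM(Oxidation)SY"))
```

Both calls print the unmodified sequence AAAPAIMSY.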
#### evidence file
The evidence file is the output file of a search engine. The following file types are supported: csv, tsv, xlsx.
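
To make the roles of seq_column, protacc_column and delimiter concrete, here is a small sketch that reads a csv evidence table with the Python standard library. The column names `sequence` and `accessions`, the `;` delimiter and the function `read_evidence` are assumptions made for this example; they are not taken from epicore.

```python
import csv
import io


def read_evidence(handle, seq_column, protacc_column, delimiter, field_sep=","):
    """Return (peptide, [accessions]) pairs from a delimited evidence table."""
    rows = []
    for row in csv.DictReader(handle, delimiter=field_sep):
        # One cell may hold several accessions, separated by `delimiter`.
        accessions = [acc.strip() for acc in row[protacc_column].split(delimiter)]
        rows.append((row[seq_column], accessions))
    return rows


# Hypothetical evidence content; a real file would be opened with open(path, newline="").
example = io.StringIO("sequence,accessions\nAAKL,P1;P2\nQRS,P3\n")
evidence = read_evidence(example, "sequence", "accessions", ";")
print(evidence)
```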

#### proteome file
The proteome file should contain the proteome used for the identification of the peptide sequences. The file should follow the FASTA format. 
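
When start_column and end_column are not provided, the peptide positions have to be looked up in the proteome, which is presumably why setting them reduces the runtime. The sketch below shows one straightforward way to do such a lookup. The helper names are hypothetical, the accession is assumed to be the first whitespace-separated token of the FASTA header, and positions are 0-based and inclusive; epicore itself may use different conventions.

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict mapping accession to protein sequence."""
    proteome, accession = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # Assumption: the accession is the first token of the header line.
            accession = line[1:].split()[0]
            proteome[accession] = ""
        elif line and accession is not None:
            proteome[accession] += line
    return proteome


def find_occurrences(peptide, protein_sequence):
    """Return all (start, end) positions of the peptide, 0-based and inclusive."""
    positions = []
    start = protein_sequence.find(peptide)
    while start != -1:
        positions.append((start, start + len(peptide) - 1))
        start = protein_sequence.find(peptide, start + 1)
    return positions


# Hypothetical two-line protein record.
fasta = [">sp|P1|TEST_HUMAN example protein", "MKTAYIAK", "QRQMKTA"]
proteome = read_fasta(fasta)
print(find_occurrences("MKTA", proteome["sp|P1|TEST_HUMAN"]))
```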


### Output files
The generate-epicore-csv command results in three csv files ([epitopes.csv](#epitopescsv), [epicore_result.csv](#epicore_resultcsv), [pep_cores_mapping.csv](#pep_cores_mappingcsv)), two plots ([epitope_intensity_hist.svg](#epitope_intensity_histsvg), [length_distributions.svg](#length_distributionssvg)) and one optional html report. 

The plot-landscape command results in protein landscape visualizations. One example can be found [here](#landscape-visualization). The number of plots is defined by the number of accessions provided in the params.yaml file.

#### epitopes.csv
The csv contains one epitope per row. 
| column | description |
| --- | --- |
| whole_epitopes | The sequence of the entire epitope. |
| consensus_epitopes | The sequence of the core epitope. |
| landscape | The landscape of the epitope. |
| grouped_peptides_sequence | A list containing the peptide sequences that contribute to the epitope. |
| relative_core_intensity | The relative core intensity of the epitope. The relative intensity of an epitope is the intensity of the epitope divided by the sum of all intensities in the provided evidence file.|
| core_epitopes_intensity | The total core intensity of the epitope. The intensity of an epitope is computed as the sum of the intensities of peptides that contribute to that epitope.|
| accession | A list containing the accessions of proteins in which the epitope occurs. |


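The two intensity columns follow directly from the definitions in the table above. The sketch below works through them on made-up numbers; the function name and the input dictionaries are hypothetical and only serve to illustrate the arithmetic.

```python
def epitope_intensities(peptide_intensity, epitope_peptides):
    """Return {epitope: (intensity, relative_intensity)} from peptide intensities."""
    # Denominator: the sum of all intensities in the evidence file.
    total = sum(peptide_intensity.values())
    result = {}
    for epitope, peptides in epitope_peptides.items():
        # Intensity of an epitope: sum over its contributing peptides.
        intensity = sum(peptide_intensity[p] for p in peptides)
        result[epitope] = (intensity, intensity / total)
    return result


# Made-up example values.
peptide_intensity = {"AAK": 10.0, "AKL": 30.0, "QRS": 60.0}
epitope_peptides = {"AAKL": ["AAK", "AKL"], "QRS": ["QRS"]}
intensities = epitope_intensities(peptide_intensity, epitope_peptides)
print(intensities)
```

In this example the epitope AAKL has an intensity of 40.0 and a relative intensity of 0.4.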
#### epicore_result.csv
The csv contains one protein per row. The different columns contain the following information: 
| column | description |
| --- | --- |
| accession | The protein accession. |
| sequence | A list of sequences of peptides mapped to the protein. |
| start | A list containing the start positions of the peptides in the protein. | 
| end | A list containing the end positions of the peptides in the protein. | 
| grouped peptides start | The start positions of all peptides grouped together to epitopes. |
| grouped peptides end | The end positions of all peptides grouped together to epitopes. | 
| grouped peptides sequence | The peptide sequences that contribute to the same epitope grouped together. |
| sequence group mapping | A list mapping each peptide onto its epitope.| 
| core_epitopes_intensity | A list containing the intensity of each epitope. The intensity of an epitope is computed as the sum of the intensities of peptides that contribute to that epitope.|
| relative_core_intensity | A list containing the relative intensity of each epitope. The relative intensity of an epitope is the intensity of the epitope divided by the sum of all intensities in the provided evidence file. |
| landscape | A list containing the landscapes of each epitope. | 
| whole epitopes | A list containing the whole epitopes. | 
| core epitopes | A list containing the core epitopes. | 
| core epitopes start | A list containing the start positions of the cores in the protein. |
| core epitopes end | A list containing the end positions of the cores in the protein. |

#### pep_cores_mapping.csv
The pep_cores_mapping.csv contains all the information from the initial evidence file. In addition there are the following columns:
| column | description |
| --- | --- |
| entire_epitope_sequence | A list of all sequences of epitopes to which the peptide of the row contributes.|
| core_epitope_sequence | A list of all core sequences of epitopes to which the peptide of the row contributes.|
| proteome_occurrence | A list containing protein accessions and sequence positions at which the core epitope occurs in the proteome. |
| consensus_epitope_intensity | A list containing the intensity of each consensus epitope. The intensity of an epitope is computed as the sum of the intensities of peptides that contribute to that epitope. |
| relative_consensus_epitope_intensity | A list containing the relative intensity of each consensus epitope. The relative intensity of an epitope is the intensity of the epitope divided by the sum of all intensities in the provided evidence file.|

#### epitope_intensity_hist.svg
The plot visualizes how many peptides contribute to a core epitope.
![An example epitope_intensity_hist plot](epitope_intensity_hist.svg)
#### length_distributions.svg
The plot visualizes the length distribution of the original peptides and the computed core epitopes. 
![An example length_distributions plot](length_distributions.svg)
#### report.html
The report file summarizes some of the results. Among other things it includes two histograms visualizing the peptide and epitope length distribution and shows the ten epitopes with the highest number of mapped peptides.

#### landscape visualization
An example landscape visualization of a protein generated with the plot-landscape command:
![An example landscape of the protein sp|P62736|ACTA_HUMAN](landscape_example.png)
The height indicates how many peptides are mapped to a position in the proteome. The different colors indicate different epitopes. Lighter areas of a color indicate how many peptides are associated with the epitope. The more intense region indicates the core epitope. 


## Workflow
1. Identification of the location of all peptides in the proteome.
2. Group peptides whose start position does not differ by more than max_step_size amino acids or whose overlap is larger than min_overlap. max_step_size and min_overlap are parameters that can be specified by the user.
3. Identify epitope sequences as the sequence of each peptide group.
4. For each peptide sequence, identify the core epitope sequence. The core epitope sequence is defined as the sequence region that has the highest peptide mapping count while having a minimum length of min_epi_length amino acids.

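Steps 2 to 4 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not epicore's implementation: positions are 0-based and inclusive, each peptide is compared only with the previous peptide in start order, an overlap of at least min_overlap is accepted, and the core is found by lowering the count threshold until the covered region reaches min_epi_length. All function names are hypothetical.

```python
def group_peptides(peptides, max_step_size, min_overlap):
    """Group (start, end) peptide positions on one protein (step 2)."""
    groups = []
    for start, end in sorted(peptides):
        if groups:
            prev_start, prev_end = groups[-1][-1]
            step = start - prev_start
            overlap = prev_end - start + 1
            if step <= max_step_size or overlap >= min_overlap:
                groups[-1].append((start, end))
                continue
        groups.append([(start, end)])
    return groups


def landscape(group):
    """Count how many peptides cover each position of the epitope (step 3)."""
    epi_start = min(start for start, _ in group)
    epi_end = max(end for _, end in group)
    counts = [0] * (epi_end - epi_start + 1)
    for start, end in group:
        for pos in range(start, end + 1):
            counts[pos - epi_start] += 1
    return epi_start, counts


def core_region(counts, min_epi_length):
    """Return (first, last) indices of the core within the landscape (step 4)."""
    if len(counts) <= min_epi_length:
        return 0, len(counts) - 1  # whole epitope is the core
    for threshold in sorted(set(counts), reverse=True):
        covered = [i for i, count in enumerate(counts) if count >= threshold]
        first, last = covered[0], covered[-1]
        if last - first + 1 >= min_epi_length:
            return first, last
    return 0, len(counts) - 1


# Made-up peptide positions on a single protein.
peptides = [(2, 10), (3, 11), (4, 12), (15, 20)]
groups = group_peptides(peptides, max_step_size=5, min_overlap=11)
epi_start, counts = landscape(groups[0])
first, last = core_region(counts, min_epi_length=5)
print(groups)
print(counts)
print(epi_start + first, epi_start + last)
```

With these inputs the first three peptides form one epitope spanning positions 2 to 12, and its core covers positions 4 to 10, where all three peptides overlap.
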
## Citation
Epicore is an adaptation of the tool developed by Álvaro-Benito et al.[1].<br>
[1] Álvaro-Benito, Miguel, et al. "Quantification of HLA-DM-dependent major histocompatibility complex of class II immunopeptidomes by the peptide landscape antigenic epitope alignment utility." Frontiers in immunology 9 (2018): 872.

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "epicore",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.12",
    "maintainer_email": null,
    "keywords": "peptides, epitopes, core epitopes",
    "author": null,
    "author_email": "Jana Hoffmann <epicore_jana@family-hoffmann.de>",
    "download_url": "https://files.pythonhosted.org/packages/3e/53/bac5a548c4258241a095c798259271bab8ab3eed63639c11455b7a4c3947/epicore-0.1.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT license",
    "summary": "Compute core epitopes from multiple overlapping peptides.",
    "version": "0.1.6",
    "project_urls": {
        "Repository": "https://github.com/AG-Walz/epicore"
    },
    "split_keywords": [
        "peptides",
        " epitopes",
        " core epitopes"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "04804c843a25a210f3a45af5d590b9690cf0c0d4b6e28a615a6ee047664aeef0",
                "md5": "33021a68fcf07dea1c225ec875926490",
                "sha256": "7e6747d2142931cf6b8ddb848f123405c4ca4ff25b09ff876933d892ae2728c4"
            },
            "downloads": -1,
            "filename": "epicore-0.1.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "33021a68fcf07dea1c225ec875926490",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.12",
            "size": 24813,
            "upload_time": "2025-07-30T07:34:20",
            "upload_time_iso_8601": "2025-07-30T07:34:20.054394Z",
            "url": "https://files.pythonhosted.org/packages/04/80/4c843a25a210f3a45af5d590b9690cf0c0d4b6e28a615a6ee047664aeef0/epicore-0.1.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "3e53bac5a548c4258241a095c798259271bab8ab3eed63639c11455b7a4c3947",
                "md5": "a754e282410fa0c7a393a6f797b84075",
                "sha256": "0835d7a18a3f948495aeabffd1d8539099337146f7979febfe4891fc58c4755d"
            },
            "downloads": -1,
            "filename": "epicore-0.1.6.tar.gz",
            "has_sig": false,
            "md5_digest": "a754e282410fa0c7a393a6f797b84075",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.12",
            "size": 21242,
            "upload_time": "2025-07-30T07:34:21",
            "upload_time_iso_8601": "2025-07-30T07:34:21.422699Z",
            "url": "https://files.pythonhosted.org/packages/3e/53/bac5a548c4258241a095c798259271bab8ab3eed63639c11455b7a4c3947/epicore-0.1.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-30 07:34:21",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "AG-Walz",
    "github_project": "epicore",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "epicore"
}
