# _De novo_ peptide sequencing with InstaNovo
[![PyPI version](https://badge.fury.io/py/instanovo.svg)](https://badge.fury.io/py/instanovo)
<!-- [![Tests Status](./reports/junit/tests-badge.svg?dummy=8484744)](./reports/junit/report.html) -->
<!-- [![Coverage Status](./reports/coverage/coverage-badge.svg?dummy=8484744)](./reports/coverage/index.html) -->
<a target="_blank" href="https://colab.research.google.com/github/instadeepai/InstaNovo/blob/main/notebooks/getting_started_with_instanovo.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> </a>
<a target="_blank" href="https://kaggle.com/kernels/welcome?src=https://github.com/instadeepai/InstaNovo/blob/main/notebooks/getting_started_with_instanovo.ipynb">
<img src="https://kaggle.com/static/images/open-in-kaggle.svg" alt="Open In Kaggle"/> </a>
This is the official code repository for InstaNovo. It contains the code for training and inference
of InstaNovo and InstaNovo+. InstaNovo is a transformer neural network that translates
fragment ion peaks into the sequence of amino acids that make up the studied peptide(s). InstaNovo+,
inspired by human intuition, is a multinomial diffusion model that further improves performance by
iteratively refining predicted sequences.
![Graphical Abstract](https://raw.githubusercontent.com/instadeepai/InstaNovo/main/docs/assets/graphical_abstract.jpeg)
**Links:**
- bioRxiv:
[https://www.biorxiv.org/content/10.1101/2023.08.30.555055v3](https://www.biorxiv.org/content/10.1101/2023.08.30.555055v3)
- documentation:
[https://instadeepai.github.io/InstaNovo/](https://instadeepai.github.io/InstaNovo/)
**Developed by:**
- [InstaDeep](https://www.instadeep.com/)
- [The Department of Biotechnology and Biomedicine](https://orbit.dtu.dk/en/organisations/department-of-biotechnology-and-biomedicine) -
[Technical University of Denmark](https://www.dtu.dk/)
## Usage
### Installation
To use InstaNovo, install the package with `pip`:
```bash
pip install instanovo
```
It is recommended to install InstaNovo in a fresh environment, such as Conda or PyEnv. For example,
if you have
[conda](https://docs.conda.io/en/latest/)/[miniconda](https://docs.conda.io/projects/miniconda/en/latest/)
installed:
```bash
conda env create -f environment.yml
conda activate instanovo
```
Note: InstaNovo is built for Python >= 3.10, < 3.12 and tested on Linux and Windows.
### Training
To train auto-regressive InstaNovo using Hydra configs (see `--hydra-help` for more information):
```bash
usage: python -m instanovo.transformer.train [--config-name CONFIG_NAME]
Config options:
config-name Name of Hydra config in `/configs/`
Defaults to `instanovo_acpt`
```
Note: data is expected to be saved as Polars `.ipc` format. See section on data conversion.
To update the InstaNovo model config, modify the config file under
[configs/instanovo/base.yaml](https://github.com/instadeepai/InstaNovo/blob/main/configs/instanovo/base.yaml).
### Prediction
To get _de novo_ predictions from InstaNovo:
```bash
Usage: python -m instanovo.transformer.predict [--config-name CONFIG_NAME] data_path=path/to/data.mgf model_path=path/to/model.ckpt output_path=path/to/output.csv denovo=True
Predict with the model.
Options:
data_path Path to dataset to be evaluated. Must be specified
in config or cli. Allows `.mgf`, `.mzxml`, a directory,
or an `.ipc` file. Glob notation is supported: eg.:
`./experiment/*.mgf`
model_path Path to model to be used. Must be specified
in config or cli. Model must be a `.ckpt` output by the
training script.
output_path Path to output csv file.
config-name Name of Hydra config in `/configs/inference/`
Defaults to `default`
```
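When predictions are launched from a script rather than typed by hand, the documented overrides can be assembled programmatically. The following is a minimal sketch that assumes only the module path and the `key=value` overrides shown above; the helper name `build_predict_command` is hypothetical and not part of InstaNovo.

```python
import shlex
import sys


def build_predict_command(data_path, model_path, output_path=None,
                          denovo=True, config_name=None):
    """Assemble the documented predict invocation as an argv list (hypothetical helper)."""
    cmd = [sys.executable, "-m", "instanovo.transformer.predict"]
    if config_name is not None:
        cmd += ["--config-name", config_name]
    cmd += [f"data_path={data_path}", f"model_path={model_path}"]
    if output_path is not None:
        cmd.append(f"output_path={output_path}")
    cmd.append(f"denovo={denovo}")
    return cmd


cmd = build_predict_command(
    "./experiment/*.mgf", "path/to/model.ckpt", "path/to/output.csv"
)
# shlex.join quotes the glob so the printed line can be pasted into a POSIX shell.
print(shlex.join(cmd))
# To run it: subprocess.run(cmd, check=True). A list argument bypasses the shell.
```

Passing the command as a list keeps the glob pattern intact, so it is InstaNovo rather than the shell that resolves it.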
To evaluate InstaNovo performance on an annotated dataset:
```bash
Usage: python -m instanovo.transformer.predict [--config-name CONFIG_NAME] data_path=path/to/data.mgf model_path=path/to/model.ckpt denovo=False
Predict with the model.
Options:
data_path Path to dataset to be evaluated. Must be specified
in config or cli. Allows `.mgf`, `.mzxml`, a directory,
or an `.ipc` file. Glob notation is supported: eg.:
`./experiment/*.mgf`
model_path Path to model to be used. Must be specified
in config or cli. Model must be a `.ckpt` output by the
training script.
config-name Name of Hydra config in `/configs/inference/`
Defaults to `default`
```
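The configuration file for inference may be found under
[/configs/inference/default.yaml](./configs/inference/default.yaml).

Note: the `denovo` flag controls whether metrics are calculated. With `denovo=False`, predictions
are evaluated against the annotations in the dataset; with `denovo=True`, no metrics are computed.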
### Spectrum Data Class
InstaNovo introduces a Spectrum Data Class: [SpectrumDataFrame](./instanovo/utils/data_handler.py).
This class acts as an interface between many common formats used for storing mass spectrometry data,
including `.mgf`, `.mzml`, `.mzxml`, and `.csv`. This class also supports reading directly from
HuggingFace, Pandas, and Polars.
When using InstaNovo, these formats are natively supported and automatically converted to the
internal SpectrumDataFrame supported by InstaNovo for training and inference. Any data path may be
specified using [glob notation](<https://en.wikipedia.org/wiki/Glob_(programming)>). For example, you
could use the following command to get _de novo_ predictions from all the files in the folder
`./experiment`:
```bash
python -m instanovo.transformer.predict data_path=./experiment/*.mgf
```
Alternatively, a list of files may be specified in the
[inference config](./configs/inference/default.yaml).
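To preview which files a pattern would select before starting a long run, the pattern can be expanded with Python's standard `glob` module. This is a sketch of generic glob matching, not of InstaNovo's internal file resolution, which may differ in details such as ordering; the helper `preview_matches` is hypothetical.

```python
import glob
import os
import tempfile


def preview_matches(pattern):
    """Return the sorted list of paths that a glob pattern matches."""
    return sorted(glob.glob(pattern))


# Build a throwaway folder to demonstrate the matching.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("run1.mgf", "run2.mgf", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    matched = [
        os.path.basename(path)
        for path in preview_matches(os.path.join(tmp, "*.mgf"))
    ]

print(matched)  # only the two .mgf files are selected
```

Some shells (zsh, for instance) reject an unquoted pattern that matches nothing, so quoting the override, as in `'data_path=./experiment/*.mgf'`, is the safer habit on the command line.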
The SpectrumDataFrame also allows for loading of much larger datasets in a lazy way. To do this, the
data is loaded and stored as [`.parquet`](https://docs.pola.rs/user-guide/io/parquet/) files in a
temporary directory. Alternatively, the data may be saved permanently in the native `.parquet`
format for optimal loading.
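The idea behind lazy loading can be illustrated without InstaNovo at all: spectra are written to chunk files once, then read back one chunk at a time so that only a single chunk is held in memory. The sketch below uses JSON files from the standard library purely for illustration (InstaNovo stores `.parquet`); `write_chunks` and `iter_rows_lazily` are hypothetical names, not InstaNovo functions.

```python
import json
import os
import tempfile


def write_chunks(rows, folder, chunk_size):
    """Write rows to numbered chunk files of at most `chunk_size` rows each."""
    paths = []
    for start in range(0, len(rows), chunk_size):
        path = os.path.join(folder, f"chunk_{start // chunk_size:04d}.json")
        with open(path, "w") as fh:
            json.dump(rows[start:start + chunk_size], fh)
        paths.append(path)
    return paths


def iter_rows_lazily(paths):
    """Yield rows one at a time, holding only the current chunk in memory."""
    for path in paths:
        with open(path) as fh:
            chunk = json.load(fh)
        yield from chunk


rows = [{"precursor_mz": 300.0 + i, "precursor_charge": 2} for i in range(5)]
with tempfile.TemporaryDirectory() as tmp:
    chunk_paths = write_chunks(rows, tmp, chunk_size=2)
    loaded = list(iter_rows_lazily(chunk_paths))

print(len(chunk_paths), len(loaded))
```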
**Example usage:**
Converting mgf files to the native format:
```python
from instanovo.utils import SpectrumDataFrame
# Convert mgf files to native parquet:
sdf = SpectrumDataFrame.load("/path/to/data.mgf", lazy=False, is_annotated=True)
sdf.save("path/to/parquet/folder", partition="train", chunk_size=1e6)
```
Loading the native format in shuffle mode:
```python
# Load a native parquet dataset:
sdf = SpectrumDataFrame.load("path/to/parquet/folder", partition="train", shuffle=True, lazy=True, is_annotated=True)
```
Using the loaded SpectrumDataFrame in a PyTorch DataLoader:
```python
from instanovo.transformer.dataset import SpectrumDataset
from torch.utils.data import DataLoader
ds = SpectrumDataset(sdf)
# Note: shuffling and workers are handled by the SpectrumDataFrame
dl = DataLoader(
ds,
collate_fn=SpectrumDataset.collate_batch,
shuffle=False,
num_workers=0,
)
```
Some more examples using the SpectrumDataFrame:
```python
from instanovo.utils import SpectrumDataFrame
sdf = SpectrumDataFrame.load("/path/to/experiment/*.mzml", lazy=True)
# Keep only rows with a precursor charge of at most 2:
sdf.filter_rows(lambda row: row["precursor_charge"] <= 2)
# Sample a subset of the data:
sdf.sample_subset(fraction=0.5, seed=42)
# Convert to pandas
df = sdf.to_pandas() # Returns a pd.DataFrame
# Convert to polars LazyFrame
lazy_df = sdf.to_polars(return_lazy=True) # Returns a pl.LazyFrame
# Save as an `.mgf` file
sdf.write_mgf("path/to/output.mgf")
```
**Additional Features:**
- The SpectrumDataFrame supports lazy loading with asynchronous prefetching, mitigating wait times
between files.
- Filtering and sampling may be performed non-destructively, as they are applied when files are loaded.
- A two-fold shuffling strategy is introduced to optimise sampling during training (shuffling files
and shuffling within files).
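The two-fold shuffling idea can be sketched in a few lines of plain Python. This is an illustration of the general strategy named above (shuffle the file order, then shuffle the rows inside each file), not InstaNovo's implementation; the function name `two_fold_shuffle` and the in-memory `files` mapping are assumptions made for the example.

```python
import random


def two_fold_shuffle(files, seed=0):
    """Yield rows after shuffling the file order and then the rows within each file.

    `files` maps a file name to its list of rows. Only one file's rows are
    reordered at a time, so the whole dataset never has to be shuffled at once.
    """
    rng = random.Random(seed)
    names = list(files)
    rng.shuffle(names)              # first fold: order of the files
    for name in names:
        rows = list(files[name])    # copy so the input is left untouched
        rng.shuffle(rows)           # second fold: order within the file
        yield from rows


files = {"a.parquet": [1, 2, 3], "b.parquet": [4, 5, 6], "c.parquet": [7, 8, 9]}
epoch = list(two_fold_shuffle(files, seed=42))
print(epoch)
```

Rows from the same file stay adjacent in the output, which is the trade-off of this scheme: it is cheaper than a global shuffle but mixes the data less thoroughly.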
### Using your own datasets
To use your own datasets, you simply need to tabulate your data in either
[Pandas](https://pandas.pydata.org/) or [Polars](https://www.pola.rs/) with the following schema:
The dataset is tabular, where each row corresponds to a labelled MS2 spectrum.
- `sequence (string)` \
The target peptide sequence including post-translational modifications
- `modified_sequence (string) [legacy]` \
The target peptide sequence including post-translational modifications
- `precursor_mz (float64)` \
The mass-to-charge of the precursor (from MS1)
- `precursor_charge (int64)` \
The charge of the precursor (from MS1)
- `mz_array (list[float64])` \
The mass-to-charge values of the MS2 spectrum
- `intensity_array (list[float32])` \
The intensity values of the MS2 spectrum
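Before tabulating a dataset, it can help to check each record against the schema. The following sketch is a hypothetical helper, not an InstaNovo function. It names the charge column `precursor_charge`, matching the nine species example table in this section, and it assumes that `mz_array` and `intensity_array` must have equal length because peaks are (m/z, intensity) pairs.

```python
REQUIRED_COLUMNS = {
    "precursor_mz": float,
    "precursor_charge": int,
    "mz_array": list,
    "intensity_array": list,
}


def validate_row(row, annotated=True):
    """Return a list of problems found in one spectrum row (empty if none)."""
    problems = []
    expected = dict(REQUIRED_COLUMNS)
    if annotated:
        expected["sequence"] = str  # only needed for training and evaluation
    for column, kind in expected.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], kind):
            problems.append(f"{column} should be {kind.__name__}")
    # Assumption: every m/z value is paired with exactly one intensity value.
    if not problems and len(row["mz_array"]) != len(row["intensity_array"]):
        problems.append("mz_array and intensity_array differ in length")
    return problems


row = {
    "sequence": "IGEYK",
    "precursor_mz": 305.165,
    "precursor_charge": 2,
    # Shortened peak lists, for illustration only.
    "mz_array": [107.07023, 110.071236, 111.11693],
    "intensity_array": [1055.4957, 2251.3171, 35508.96],
}
print(validate_row(row))
```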
For example, the DataFrame for the
[nine species benchmark](https://huggingface.co/datasets/InstaDeepAI/ms_ninespecies_benchmark)
dataset (introduced in [Tran _et al._ 2017](https://www.pnas.org/doi/full/10.1073/pnas.1705691114))
looks as follows:
| | sequence | modified_sequence | precursor_mz | precursor_charge | mz_array | intensity_array |
| --: | :------------------------- | :------------------------- | -----------: | ---------------: | :----------------------------------- | :---------------------------------- |
| 0 | GRVEGMEAR | GRVEGMEAR | 335.502 | 3 | [102.05527 104.052956 113.07079 ...] | [ 767.38837 2324.8787 598.8512 ...] |
| 1 | IGEYK | IGEYK | 305.165 | 2 | [107.07023 110.071236 111.11693 ...] | [ 1055.4957 2251.3171 35508.96 ...] |
| 2 | GVSREEIQR | GVSREEIQR | 358.528 | 3 | [103.039444 109.59844 112.08704 ...] | [801.19995 460.65268 808.3431 ...] |
| 3 | SSYHADEQVNEASK | SSYHADEQVNEASK | 522.234 | 3 | [101.07095 102.0552 110.07163 ...] | [ 989.45154 2332.653 1170.6191 ...] |
| 4 | DTFNTSSTSN(+.98)STSSSSSNSK | DTFNTSSTSN(+.98)STSSSSSNSK | 676.282 | 3 | [119.82458 120.08073 120.2038 ...] | [ 487.86942 4806.1377 516.8846 ...] |
For _de novo_ prediction, the `sequence` column is not required.
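For annotated data, a quick consistency check is that `precursor_mz` and `precursor_charge` agree with the `sequence`. The sketch below applies the standard relation m/z = (M + z × m_proton) / z to two unmodified rows from the table above. It is not part of InstaNovo, it relies on a hand-typed table of monoisotopic residue masses, and it ignores modifications such as `(+.98)`.

```python
# Monoisotopic residue masses (Da) for the amino acids used below.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "V": 99.06841, "I": 113.08406,
    "E": 129.04259, "K": 128.09496, "M": 131.04049, "R": 156.10111,
    "Y": 163.06333,
}
WATER = 18.01056   # added once per peptide for the terminal H and OH
PROTON = 1.007276  # mass of each charge carrier


def theoretical_precursor_mz(sequence, charge):
    """Compute the m/z of an unmodified peptide at the given charge."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral_mass + charge * PROTON) / charge


for sequence, charge, observed in [("IGEYK", 2, 305.165), ("GRVEGMEAR", 3, 335.502)]:
    mz = theoretical_precursor_mz(sequence, charge)
    # The last column reports whether theory and table agree to within 0.01.
    print(sequence, round(mz, 3), abs(mz - observed) < 0.01)
```

A large disagreement usually points to a wrong charge, a missing modification mass, or a mislabelled spectrum.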
We also provide a conversion script for converting to native SpectrumDataFrame (sdf) format:
```bash
usage: python -m instanovo.utils.convert_to_sdf source target [-h] [--is_annotated IS_ANNOTATED] [--name NAME] [--partition {train,valid,test}] [--shard_size SHARD_SIZE] [--max_charge MAX_CHARGE]
positional arguments:
source source file(s)
target target folder to save data shards
options:
-h, --help show this help message and exit
--is_annotated IS_ANNOTATED
whether dataset is annotated
--name NAME name of saved dataset
--partition {train,valid,test}
partition of saved dataset
--shard_size SHARD_SIZE
length of saved data shards
--max_charge MAX_CHARGE
maximum charge to filter out
```
_Note: the target path should be a folder._
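What the `--shard_size` and `--max_charge` options describe can be sketched in plain Python. This is an illustration of the described behaviour (drop spectra above a charge threshold, then split the rest into fixed-length shards), not the conversion script's actual code. Whether the threshold is inclusive is an assumption, and `shard_rows` is a hypothetical name.

```python
def shard_rows(rows, shard_size, max_charge):
    """Drop rows above `max_charge`, then split the rest into fixed-length shards."""
    # Assumption: rows with a charge equal to max_charge are kept.
    kept = [row for row in rows if row["precursor_charge"] <= max_charge]
    return [kept[i:i + shard_size] for i in range(0, len(kept), shard_size)]


rows = [{"precursor_charge": z} for z in (1, 2, 3, 4, 2, 5, 3)]
shards = shard_rows(rows, shard_size=2, max_charge=3)
print([len(shard) for shard in shards])  # the last shard may be shorter
```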
<!-- ## Roadmap
This code repo is currently under construction. -->
**ToDo:**
- Multi-GPU support
## License
Code is licensed under the Apache License, Version 2.0 (see [LICENSE](LICENSE.md)).

The model checkpoints are licensed under Creative Commons Non-Commercial
([CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/)).
## BibTeX entry and citation info
```bibtex
@article{eloff_kalogeropoulos_2024_instanovo,
title = {De novo peptide sequencing with InstaNovo: Accurate, database-free peptide identification for large scale proteomics experiments},
author = {Kevin Eloff and Konstantinos Kalogeropoulos and Oliver Morell and Amandla Mabona and Jakob Berg Jespersen and Wesley Williams and Sam van Beljouw and Marcin Skwark and Andreas Hougaard Laustsen and Stan J. J. Brouns and Anne Ljungars and Erwin Marten Schoof and Jeroen Van Goey and Ulrich auf dem Keller and Karim Beguir and Nicolas Lopez Carranza and Timothy Patrick Jenkins},
year = {2024},
doi = {10.1101/2023.08.30.555055},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/10.1101/2023.08.30.555055v3},
journal = {bioRxiv}
}
```