calisp

Name: calisp
Version: 3.1
Home page: https://github.com/kinestetika/Calisp
Summary: Isotope analysis of proteomics data
Upload time: 2025-01-20 20:08:06
Author: Marc Strous
Requires Python: >=3.6
License: MIT
Keywords: proteomics, isotope, mass-spectrometry, 13c

## Calisp.py, version 3.1

Calisp.py (Calgary approach to isotopes in proteomics) is a program that estimates isotopic composition (e.g. 13C/12C,
delta13C, 15N/14N etc) of peptides from proteomics mass spectrometry data. Input data consist of mzML files and 
files with peptide spectrum matches.

Calisp was originally developed in Java. This newer Python version is much more concise: it consists of only six .py
files and ~1,000 lines of code, compared to about fifty files and ~10,000 lines of code for the Java program. In addition,
the Java program depends on another program, mcl, whereas calisp.py is pure Python and easy to install (see below).
The conciseness of the code makes the Python version more transparent, easier to maintain and easier to develop further.
I consider calisp.py the successor of the Java version. One of the reasons for the shift to Python is the possibility
of developing machine learning approaches to filter out noisy isotopic patterns more effectively. Future work will explore
that possibility.

Benchmarking of Calisp.py has been completed and it works well. Benchmarking procedures and outcomes are shared in the
"benchmarking" folder. Parsing of .mzid files still needs to be implemented, but Thermo's spectrum protein match file format
is supported.

Calisp.py depends on numpy, scipy, pandas, tqdm, [pymzml](https://pymzml.readthedocs.io/en/latest/intro.html), pyarrow. 
These will be installed automatically by the pip command below. 
Calisp outputs the data as a Pandas DataFrame saved in (binary) [feather](https://arrow.apache.org/docs/python/feather.html) format.
Each row contains a single isotopic pattern, with column definitions listed below.
Calisp version 3.1 provides two small utility scripts to further process those data:

calisp_filter_patterns      filters out noisy patterns and converts the .feather file to a .csv file
calisp_compute_medians      reads the .csv file and outputs summary statistics for each protein or bin

See below for more details on these scripts. It is possible that you need more data wrangling than provided by the scripts.
In that case, you can for example explore and visualize the results in a [Jupyter notebook](https://jupyter.org/). To get
started you can explore the code of the two scripts as well as the two benchmarking notebooks, which are in the benchmarking folder.

Compared to previous versions of calisp, the workflow has been simplified. Unlike the Java version, Calisp.py itself neither
filters out isotopic patterns nor adds up isotopic patterns to reduce noise. It simply estimates the ratio for
the target isotopes (e.g. 13C/12C) for every isotopic pattern it can subsample. It estimates this ratio in two ways: based on
neutron abundance and using fast Fourier transforms (FFT). The former applies to stable isotope probing experiments. The latter
applies to natural abundances, or to isotope probing experiments with very little added label (e.g. using substrates with <1%
additional 13C). The motivation for omitting filtering from the main calisp program is that keeping all subsampled isotopic
patterns, including bad ones, will enable training of machine learning classifiers. Also, because it was shown that the
median provides better estimates for species in microbial communities than the mean, adding up isotopic patterns to improve
precision has lost its purpose. There is more power (and sensitivity) in numbers.
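
To make the role of the isotope ratio concrete, the sketch below predicts the relative peak intensities of an isotopic
pattern from a peptide's element counts and per-element isotope abundances, by repeatedly convolving the abundance vectors.
It illustrates only the forward model that an estimator has to invert. It is not calisp's fitting code, and it uses direct
convolution in plain Python where an FFT would compute the same product faster. The function names and the toy abundances
are made up for illustration.

```python
def convolve(a, b, n_peaks):
    """Multiply two isotope distributions, keeping the first n_peaks terms."""
    out = [0.0] * n_peaks
    for i, x in enumerate(a[:n_peaks]):
        if x == 0.0:
            continue
        for j, y in enumerate(b[:n_peaks - i]):
            out[i + j] += x * y
    return out


def isotope_pattern(element_counts, abundances, n_peaks=20):
    """Predict relative intensities of the +0, +1, +2, ... neutron peaks.

    element_counts: number of atoms per element, e.g. {"C": 50, "N": 13}
    abundances: per element, fractions of atoms with +0, +1, +2, ... neutrons
    """
    pattern = [1.0] + [0.0] * (n_peaks - 1)
    for element, count in element_counts.items():
        base = list(abundances[element])
        # Raise the single-atom distribution to the power `count`
        # by repeated squaring.
        while count > 0:
            if count & 1:
                pattern = convolve(pattern, base, n_peaks)
            base = convolve(base, base, n_peaks)
            count >>= 1
    return pattern


# Toy example (illustrative values): two carbon atoms, 10% heavy isotope.
print(isotope_pattern({"C": 2}, {"C": [0.9, 0.1]}, n_peaks=4))
```

With realistic input, `abundances` would hold the rows of the isotope abundance matrix described further below.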

Because no data are filtered out and no isotopic patterns get added up, calisp.py analyzes at least ten times as many
isotopic patterns as the Java version. As a result, calisp.py is about ten times slower: it takes about 5-10 min
per .mzML file on a desktop computer.

## Installation

>python -m virtualenv calisp

>source calisp/bin/activate

>pip install --upgrade calisp

If you would like to explore calisp results in Jupyter notebooks, run the following command instead:

>pip install --upgrade calisp jupyter matplotlib jinja2

## Usage

>source calisp/bin/activate

>calisp.py --spectrum_file X --peptide_file Y --output_file Z

 List of all possible arguments:
 ```
    --spectrum_file             path to .mzML file or dir with .mzML files
    --peptide_file              path to .peptideSpectrumMatch file or dir with .PeptideSpectrumMatch files
    --output_file               dir where calisp.py will save results files
    --threads                   # of threads used, default 4
    --isotope                   13C, 14C, 15N, 17O, 18O, 2H, 3H, 33S, 34S or 36S, default 13C
    --bin_delimiter             character that separates the bin ID from the remainder of protein IDs, default '_'
    --mass_accuracy             accuracy of peak m/z identifications, default 10 ppm
    --isotope_abundance_matrix  path to file with isotope matrix, a default file is included with calisp
    --compute_clumps            use only if you want to compute clumpiness
 ```
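
The `--mass_accuracy` value is expressed in parts per million, so the absolute tolerance scales with m/z. As a quick
illustration, the window around a peak can be computed as follows. The helper name is hypothetical, and how calisp applies
the tolerance internally is not shown here.

```python
def mz_window(mz, ppm=10.0):
    """Return the (low, high) m/z bounds for a relative tolerance in ppm."""
    tolerance = mz * ppm / 1e6
    return mz - tolerance, mz + tolerance


low, high = mz_window(500.0, ppm=10.0)
print(f"{low:.4f} - {high:.4f}")  # a 10 ppm window at m/z 500 is +/- 0.005
```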

## Column names of the Pandas DataFrame created by calisp.py:

In the saved dataframe, each row contains one isotopic pattern, defined by the following columns:
```
experiment             filename of the peptide spectrum match (psm) file
ms_run                 filename of the .mzML file
bins                   bin/mag ids, separated by commas. Calisp expects the protein ids in the psm 
                       file to consist of two parts, separated by a delimiter (_ by default). The 
                       first part is the bin/mag id, the second part the protein id
proteins               the ids of the proteins associated with the pattern (without the bin id)
peptide                the amino acid sequence of the peptide
peptide_mass           the mass of the peptide
C                       # of carbon atoms in the peptide
N                       # of nitrogen atoms in the peptide
O                       # of oxygen atoms in the peptide
H                       # of hydrogen atoms in the peptide
S                       # of sulfur atoms in the peptide
psm_id                  psm id
psm_mz                  psm m over z
psm_charge              psm charge
psm_neutrons            number of neutrons inferred from custom 'neutron' modifications 
psm_rank                rank of the psm
psm_precursor_id        id of the ms1 spectrum that was the source of the psm 
psm_precursor_mz        mass over charge of the precursor of the psm
pattern_charge          charge of the pattern
pattern_precursor_id    id of the ms1 spectrum that was the source of the pattern
pattern_total_intensity total intensity of the pattern
pattern_peak_count      # of peaks in the pattern
pattern_median_peak_spacing median mass difference between a pattern's peaks
spectrum_mass_irregularity  a measure for the standard deviation in the mass difference between a
                            pattern's peaks
ratio_na                the estimated isotope ratio inferred from neutron abundance (sip 
                        experiments) 
ratio_fft               the estimated isotope ratio inferred by the fft method (natural 
                        isotope abundances)
error_fft               the remaining error after fitting the pattern with fft
error_clumpy            the remaining error after fitting the pattern with the clumpy carbon 
                        method
flag_peptide_contains_sulfur true if peptide contains sulfur
flag_peptide_has_modifications true if peptide has modifications
flag_peptide_assigned_to_multiple_bins true if peptide is associated with multiple proteins from 
                                       different bins/mags
flag_peptide_assigned_to_multiple_proteins true if peptide is associated with multiple proteins
flag_peptide_mass_and_elements_undefined true if peptide has unknown mass and elemental 
                                         composition
flag_psm_has_low_confidence true if psm was flagged as having low confidence (peptide identity 
                            uncertain)
flag_psm_is_ambiguous   true if psm could not be assigned with certainty
flag_pattern_is_contaminated true if multiple patterns have one or more shared peaks
flag_pattern_is_wobbly true if pattern_median_peak_spacing exceeds a threshold
flag_peak_at_minus_one_pos  true if a peak was detected immediately before the monoisotopic peak,
                            could indicate overlap with another pattern
i0 - i19                the intensities of the first 20 peaks of the pattern  
m0 - m19                the masses of the first 20 peaks of the pattern
c1 - c6                 contributions of clumps of 1-6 carbon to ratio_na. These are the
                        outcomes of the clumpy carbon model. These results are only
                        meaningful if the biomass was labeled to saturation.
```
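
The `bins` and `proteins` columns follow from splitting each protein id at the bin delimiter. The sketch below assumes
the split happens at the first occurrence of the delimiter, which the description above does not state explicitly. If your
bin ids themselves contain the delimiter, check the calisp source or choose a different `--bin_delimiter`. The function
name and the example ids are hypothetical.

```python
def split_protein_id(protein_id, bin_delimiter="_"):
    """Split 'BIN<delimiter>PROTEIN' into (bin_id, protein_id).

    Assumption: the first occurrence of the delimiter separates the two parts.
    In this sketch, ids without a delimiter get an empty bin id.
    """
    bin_id, found, remainder = protein_id.partition(bin_delimiter)
    if not found:
        return "", protein_id
    return bin_id, remainder


print(split_protein_id("MAG7_Q72K23"))       # ('MAG7', 'Q72K23')
print(split_protein_id("MAG7|Q72K23", "|"))  # ('MAG7', 'Q72K23')
```
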
## Using a custom isotope abundance matrix

When estimating isotopic content for nitrogen, oxygen, hydrogen and sulfur, the estimates will be strongly
affected by the isotope abundances of carbon. For example, biomass is often slightly depleted in 13C compared to
the inorganic reference (Vienna Pee Dee Belemnite). However, Calisp treats the 13C content given in the isotope matrix as
correct and compensates for the difference by reducing the content of the target isotope, which may result in
a negative estimate for the target isotope (which is physically impossible). To overcome this issue,
you can provide a custom isotope matrix with the actual or estimated 13C content of your samples (or other changes
you may wish to make). The isotope matrix file should be formatted as follows, with elements on rows and isotopes
(+0, +1, +2, ... extra neutrons) on columns:
```
0.988943414833479 0.011056585166521 0.0         0.0 0.0    0.0 0.0 # C
0.996323567       0.003676433       0.0         0.0 0.0    0.0 0.0 # N
0.997574195       0.00038           0.002045805 0.0 0.0    0.0 0.0 # O
0.99988           0.00012           0.0         0.0 0.0    0.0 0.0 # H
0.9493            0.0076            0.0429      0.0 0.0002 0.0 0.0 # S
# VPDB standard 13C/12C = 0.0111802 in Isodat software
# see also https://www.webelements.com/sulfur/isotopes.html
# see also http://iupac.org/publications/pac/pdf/2003/pdf/7506x0683.pdf
```
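
Before passing a custom matrix to `--isotope_abundance_matrix`, it can help to check that every row sums to one and that
the carbon row gives the 13C/12C ratio you intend. The following is a minimal sketch of such a check. It assumes
whitespace-separated values with `#` starting a comment, as in the example above, and it is not the parser calisp itself
uses. The function name is made up.

```python
def read_isotope_matrix(text):
    """Parse matrix text into a list of rows of floats, ignoring '#' comments."""
    rows = []
    for line in text.splitlines():
        values = line.split("#", 1)[0].split()
        if values:
            rows.append([float(v) for v in values])
    return rows


example = """\
0.988943414833479 0.011056585166521 0.0         0.0 0.0    0.0 0.0 # C
0.996323567       0.003676433       0.0         0.0 0.0    0.0 0.0 # N
0.997574195       0.00038           0.002045805 0.0 0.0    0.0 0.0 # O
0.99988           0.00012           0.0         0.0 0.0    0.0 0.0 # H
0.9493            0.0076            0.0429      0.0 0.0002 0.0 0.0 # S
# VPDB standard 13C/12C = 0.0111802 in Isodat software
"""

matrix = read_isotope_matrix(example)
for element, row in zip("CNOHS", matrix):
    # Each row holds the isotope fractions of one element, so it should sum to 1.
    assert abs(sum(row) - 1.0) < 1e-9, element

carbon = matrix[0]
print(round(carbon[1] / carbon[0], 7))  # 13C/12C ratio implied by the carbon row
```

For the default carbon row this reproduces the VPDB ratio quoted in the file's comment.
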
## calisp_filter_patterns

Use this script after running calisp to filter the detected patterns and to convert your data from binary .feather
format to a .csv file that can be imported into a spreadsheet program.

Example usage: calisp_filter_patterns --SIF --result_file [calisp result file or dir]

You need to specify --SIP or --SIF.

Arguments:
```
    --result_file       The .feather file generated by calisp, or a dir containing .feather files.
    --SIF               Apply benchmarked filters for stable isotope fingerprinting data.
    --SIP               Apply benchmarked filters for stable isotope probing data.
    --CRAP              Remove proteins in the CRAP database.
    --PROTEIN           If you plan to do a per-protein analysis, you may want to filter out patterns
                        assigned to more than one protein
    --flags             (Expert use:) Specify a custom list of flags for filtering patterns. Options:
                            flag_psm_has_low_confidence*
                            flag_psm_is_ambiguous*
                            flag_pattern_is_contaminated*$
                            flag_pattern_is_wobbly*
                            flag_peptide_assigned_to_multiple_proteins*
                            flag_peptide_assigned_to_multiple_bins*$
                            flag_peptide_mass_and_elements_undefined*$
                            flag_peptide_has_modifications
                            flag_peptide_contains_sulfur
                        Flags with * are default when using --SIP. Flags with $ are default when
                        using --SIF
    --max_fft_error     (Expert use:) Specify a custom maximum value for the allowable fft error.
                        This only makes sense for SIF data. The benchmarked default when using
                        --SIF is 0.001.
```
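
If you prefer to filter in a notebook rather than with the script, the same idea can be expressed directly in pandas:
drop every pattern for which any of the chosen flags is true and, for SIF data, every pattern whose `error_fft` exceeds
the maximum. The sketch below is not the script's implementation. It uses the flags marked with `$` above and the 0.001
default, and it assumes the flag columns are stored as booleans. The function name and the toy DataFrame are made up;
with real data you would load the table with `pandas.read_feather`.

```python
import pandas as pd

SIF_FLAGS = [
    "flag_pattern_is_contaminated",
    "flag_peptide_assigned_to_multiple_bins",
    "flag_peptide_mass_and_elements_undefined",
]


def filter_patterns(patterns, flags=SIF_FLAGS, max_fft_error=0.001):
    """Keep rows where none of `flags` is set and error_fft is small enough."""
    flagged = patterns[list(flags)].any(axis=1)
    keep = ~flagged & (patterns["error_fft"] <= max_fft_error)
    return patterns.loc[keep]


# Toy data standing in for a calisp result table (values are illustrative).
toy = pd.DataFrame({
    "peptide": ["AAK", "GGR", "LLK"],
    "error_fft": [0.0004, 0.0002, 0.0100],
    "flag_pattern_is_contaminated": [False, True, False],
    "flag_peptide_assigned_to_multiple_bins": [False, False, False],
    "flag_peptide_mass_and_elements_undefined": [False, False, False],
})
print(filter_patterns(toy)["peptide"].tolist())  # ['AAK']
```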

## calisp_compute_medians

Use this script after running calisp_filter_patterns to compute medians and other statistics. These will be written to
a stats.csv file that can be imported into a spreadsheet program.

Example usage: calisp_compute_medians --SIF --result_file [.filtered.csv file or dir with .filtered.csv files]

You need to specify --SIP or --SIF.

Arguments:
```
    --result_file               The .filtered.csv file generated by calisp_filter_patterns, or
                                a dir containing such files.
    --SIF                       Compute delta values for stable isotope fingerprinting data.
    --SIP                       Compute fractions (e.g. 13C/12C) for stable isotope probing data.
    --PROTEIN                   Only use this to calculate stats per protein instead of per bin
    --isotope                   13C, 14C, 15N, 17O, 18O, 2H, 3H, 33S, 34S or 36S, default 13C
                                (only needed for delta calculations for SIF)
    --isotope_abundance_matrix  Path to file with isotope matrix, a default file is included with
                                calisp (only needed for delta calculations, SIF)
    --vocabulary_file           Use this if you want to replace bin names, protein names or
                                reassign patterns to bins or proteins (see below)
```
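
For readers who want to reproduce the summary step by hand, the sketch below groups ratios by bin, takes the median, and
converts it to a delta value with the usual definition: delta = (R_sample / R_reference - 1) * 1000, in per mil. It assumes
the values are 13C/12C ratios, that the filtered .csv keeps the `bins` and `ratio_fft` column names listed earlier, and that
the reference is the VPDB ratio quoted in the isotope matrix comments. Whether calisp_compute_medians performs exactly
these steps is not documented here. The function names and the toy rows are made up for illustration.

```python
from collections import defaultdict
from statistics import median

VPDB_13C_12C = 0.0111802  # reference ratio quoted in the isotope matrix file


def delta_permil(ratio, reference_ratio=VPDB_13C_12C):
    """Convert an isotope ratio to a delta value in per mil."""
    return (ratio / reference_ratio - 1.0) * 1000.0


def median_delta_per_bin(rows, ratio_column="ratio_fft"):
    """rows: iterable of dicts with a 'bins' key and a ratio column."""
    ratios = defaultdict(list)
    for row in rows:
        ratios[row["bins"]].append(float(row[ratio_column]))
    return {bin_id: delta_permil(median(values)) for bin_id, values in ratios.items()}


# Toy rows, e.g. as yielded by csv.DictReader on a filtered .csv file.
rows = [
    {"bins": "Azoarcus", "ratio_fft": "0.01090"},
    {"bins": "Azoarcus", "ratio_fft": "0.01095"},
    {"bins": "Azoarcus", "ratio_fft": "0.01500"},  # outlier; the median stays at 0.01095
    {"bins": "Nitrosomonas", "ratio_fft": "0.0111802"},
]
for bin_id, delta in median_delta_per_bin(rows).items():
    print(f"{bin_id}: {delta:.1f}")
```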

## The Vocabulary File

The file should consist of rows, with each row defining a change to the data. Each row should have four values, separated by
single tabs. The first and third values should be "bins", "proteins", "experiment" or "ms_run". Here are a few examples:

### Example 1

In this example, changes are purely cosmetic: we replace abstract bin identifiers with a more meaningful taxonomy for
visualization:
```
bins	bin_245	bins	Prochlorococcus
...
```

### Example 2

In this example, we have two bins that are both Prochlorococcus, and we want to plot the assimilation of 13C by
Prochlorococcus as a whole. Using the following vocabulary lines, both bins will be
renamed to Prochlorococcus and will thus be treated as one:

```
bins	bin_245	bins	Prochlorococcus
bins	bin_395	bins	Prochlorococcus
```

### Example 3

We have a dataset in which bin identifiers were never added to the protein ids, but we still want to compute median values
by bin. We add a line for each protein, assigning it to a bin:

```
proteins	Q72K23	bins	Azoarcus
proteins	Q72H45	bins	Azoarcus
proteins	Q72KG1	bins	Azoarcus
...
proteins	XD0001	bins	Nitrosomonas
proteins	XD0002	bins	Nitrosomonas
proteins	XD0003	bins	Nitrosomonas
...
```

### Example 4

We have analysed multiple replicates for each experiment and want to aggregate the results across the replicates:

```
ms_run	run_a1	ms_run	time_point_1
ms_run	run_a2	ms_run	time_point_1
...
ms_run	run_b1	ms_run	time_point_2
ms_run	run_b2	ms_run	time_point_2
```
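
To catch formatting mistakes before running calisp_compute_medians, a vocabulary file can be checked with a few lines of
Python. The sketch below only enforces the rules stated above (four tab-separated values, with the first and third taken
from the four allowed column names). It is a hypothetical helper, not part of calisp. The field names in the returned
tuples reflect how the examples read, and skipping blank and `...` lines is a convenience of this sketch.

```python
ALLOWED_COLUMNS = {"bins", "proteins", "experiment", "ms_run"}


def parse_vocabulary(text):
    """Return a list of (source_column, old_value, target_column, new_value)."""
    entries = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.strip() == "...":
            continue
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(f"line {line_number}: expected 4 tab-separated values")
        source, old, target, new = (field.strip() for field in fields)
        if source not in ALLOWED_COLUMNS or target not in ALLOWED_COLUMNS:
            raise ValueError(f"line {line_number}: unknown column name")
        entries.append((source, old, target, new))
    return entries


vocabulary = "bins\tbin_245\tbins\tProchlorococcus\nproteins\tQ72K23\tbins\tAzoarcus\n"
for entry in parse_vocabulary(vocabulary):
    print(entry)
```

A row that uses spaces instead of tabs raises a `ValueError` here, which is the most common mistake when a vocabulary file is edited by hand.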

## Please cite

Calisp was developed using [PyCharm community edition](https://www.jetbrains.com/pycharm/).

Kleiner M, Dong X, Hinzke T, Wippler J, Thorson E, Mayer B, Strous M (2018) A metaproteomics method to determine 
carbon sources and assimilation pathways of species in microbial communities. Proceedings of the National Academy 
of Sciences 115 (24), E5576-E5584. 
doi: [https://doi.org/10.1073/pnas.1722325115](https://doi.org/10.1073/pnas.1722325115)

Kleiner M, Kouris A, Jensen M, Liu Y, McCalder J, Strous M (2023) Ultra-sensitive Protein-SIP to quantify activity
and substrate uptake in microbiomes with stable isotopes. Microbiome 11, 24.
doi: [https://doi.org/10.1186/s40168-022-01454-1](https://doi.org/10.1186/s40168-022-01454-1)

M Kösters, J Leufken, S Schulze, K Sugimoto, J Klein, R P Zahedi, M Hippler, S A Leidel, C Fufezan; pymzML v2.0: 
introducing a highly compressed and seekable gzip format, Bioinformatics, 
doi: [https://doi.org/10.1093/bioinformatics/bty046](https://doi.org/10.1093/bioinformatics/bty046)

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/kinestetika/Calisp",
    "name": "calisp",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": "proteomics isotope mass-spectrometry 13C",
    "author": "Marc Strous",
    "author_email": "mstrous@ucalgary.ca",
    "download_url": "https://files.pythonhosted.org/packages/34/9a/fbbefcdac3fb2e5a0d63c1bc512da8c77217ce6d1d015a72febab655aeef/calisp-3.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Isotope analysis of proteomics data",
    "version": "3.1",
    "project_urls": {
        "Homepage": "https://github.com/kinestetika/Calisp",
        "Source": "https://github.com/kinestetika/Calisp"
    },
    "split_keywords": [
        "proteomics",
        "isotope",
        "mass-spectrometry",
        "13c"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6e07a216573726e853d965584c4dce7557c43dbbf2a9a68a9e8da1a0307de601",
                "md5": "089b03cbb7f5300c895a4f635081f880",
                "sha256": "be9fe050229792d26a875a71f684e047c8fefc3bdf3da09dbb44749ca08cea7f"
            },
            "downloads": -1,
            "filename": "calisp-3.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "089b03cbb7f5300c895a4f635081f880",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 38913,
            "upload_time": "2025-01-20T20:08:05",
            "upload_time_iso_8601": "2025-01-20T20:08:05.017396Z",
            "url": "https://files.pythonhosted.org/packages/6e/07/a216573726e853d965584c4dce7557c43dbbf2a9a68a9e8da1a0307de601/calisp-3.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "349afbbefcdac3fb2e5a0d63c1bc512da8c77217ce6d1d015a72febab655aeef",
                "md5": "082974eabf9da29002a0b4d7ec0f2397",
                "sha256": "a2c1a7a6d01bd1326bf1046d2d0f604f7ce130c45be2b2a245b57a887f4016a0"
            },
            "downloads": -1,
            "filename": "calisp-3.1.tar.gz",
            "has_sig": false,
            "md5_digest": "082974eabf9da29002a0b4d7ec0f2397",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 42187,
            "upload_time": "2025-01-20T20:08:06",
            "upload_time_iso_8601": "2025-01-20T20:08:06.868583Z",
            "url": "https://files.pythonhosted.org/packages/34/9a/fbbefcdac3fb2e5a0d63c1bc512da8c77217ce6d1d015a72febab655aeef/calisp-3.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-20 20:08:06",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "kinestetika",
    "github_project": "Calisp",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "calisp"
}
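
The vocabulary file described in the README text above is plain tab-separated text, so it can be generated with a short script instead of being typed by hand. The following is a minimal sketch and is not part of calisp: the function name `format_vocabulary` and the validation rules are assumptions made for illustration. Only the four-column, tab-separated layout and the allowed first and third values are taken from the README.

```python
# Hypothetical helper, not part of calisp: builds vocabulary-file text.
# Field names below are the ones listed in the README.
ALLOWED_FIELDS = {"bins", "proteins", "experiment", "ms_run"}


def format_vocabulary(rows):
    """Return vocabulary-file text for an iterable of 4-tuples.

    Each tuple is (field, existing_name, target_field, new_name).
    """
    lines = []
    for row in rows:
        if len(row) != 4:
            raise ValueError(f"expected 4 values, got {len(row)}: {row!r}")
        field, _existing, target_field, _new = row
        if field not in ALLOWED_FIELDS or target_field not in ALLOWED_FIELDS:
            raise ValueError(f"unknown field name in row: {row!r}")
        if any("\t" in value or "\n" in value for value in row):
            raise ValueError(f"values must not contain tabs or newlines: {row!r}")
        # Exactly one tab between values, one row per line.
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Merge two bins into one name, as in Example 2 of the README.
    text = format_vocabulary([
        ("bins", "bin_245", "bins", "Prochlorococcus"),
        ("bins", "bin_395", "bins", "Prochlorococcus"),
    ])
    print(text, end="")
```

The returned text can be written to a file of your choice, and that path passed with the `--vocabulary_file` option listed in the help text. Building the rows in code avoids the most likely formatting mistake, which is separating values with spaces instead of a single tab.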
        