# xi-mzidentml-converter
![python-app](https://github.com/Rappsilber-Laboratory/xi-mzidentml-converter/actions/workflows/python-app.yml/badge.svg)
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
xi-mzidentml-converter processes mzIdentML 1.2.0 and 1.3.0 files with the primary aim of extracting crosslink information.
It has three use cases:
1. to validate mzIdentML files against the criteria given here: https://www.ebi.ac.uk/pride/markdownpage/crosslinking
2. to extract information on crosslinked residue pairs and output it in a form more easily used by modelling software
3. to populate the database that is accessed by [xiview-api](https://github.com/Rappsilber-Laboratory/xiview-api)
It uses the pyteomics library (https://pyteomics.readthedocs.io/en/latest/index.html) as the underlying parser for mzIdentML.
Results are written into a relational database (PostgreSQL or SQLite) using sqlalchemy.
## Requirements:
python3.10
pipenv
sqlite3 for validation and residue pair extraction; PostgreSQL or sqlite3 for creating the xiview-api database
(the instructions below use PostgreSQL)
## Installation
Clone the git repository and set up the Python environment, or install via PyPI:
```
git clone https://github.com/Rappsilber-Laboratory/xi-mzidentml-converter.git
cd xi-mzidentml-converter
pipenv install --python 3.10
```
PyPI project: https://pypi.org/project/xi-mzidentml-converter/
PyPI instructions: https://packaging.python.org/en/latest/tutorials/installing-packages/
## Usage
process_dataset.py is the entry point; running it with the -h option lists the available options.
```
python process_dataset.py -h
```
### 1. Validate a dataset
Run process_dataset.py with the -v option to validate a dataset. The argument is the path to a specific mzIdentML file
or to a directory containing multiple mzIdentML files, in which case all of them will be validated. To pass, all the peak list files
referenced must be in the same directory as the mzIdentML file(s). The converter creates an SQLite database in the
temporary folder for use in the validation process; the temporary folder can be specified with the -t option.
Examples:
```
python process_dataset.py -v ~/mydata
```
```
python process_dataset.py -v ~/mydata/mymzid.mzid -t ~/mytempdir
```
The result is written to the console. If the data fails validation but the error message is not informative,
please open an issue on the github repository: https://github.com/Rappsilber-Laboratory/xi-mzidentml-converter/issues
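When validation is one step in a larger pipeline, the commands above can be driven from Python. The following is a minimal sketch, not part of the converter: `build_validate_command` and `run_validation` are hypothetical helper names, and the sketch assumes `process_dataset.py` is in the current working directory. It uses only the -v and -t options described above.

```python
import os
import subprocess
import sys


def build_validate_command(target, tmpdir=None, script="process_dataset.py"):
    """Build the argument list for a validation run (hypothetical helper)."""
    # Expand "~" ourselves because no shell is involved.
    cmd = [sys.executable, script, "-v", os.path.expanduser(target)]
    if tmpdir is not None:
        # -t sets the temporary folder that holds the SQLite database.
        cmd += ["-t", os.path.expanduser(tmpdir)]
    return cmd


def run_validation(target, tmpdir=None):
    """Run the validation and return everything written to the console."""
    completed = subprocess.run(
        build_validate_command(target, tmpdir),
        capture_output=True,
        text=True,
    )
    return completed.stdout + completed.stderr
```

For example, `print(run_validation("~/mydata/mymzid.mzid", "~/mytempdir"))` would mirror the second command line example.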
### 2. Extract summary of crosslinked residue pairs
Run process_dataset.py with the --seqsandresiduepairs option to extract a summary of search sequences and
crosslinked residue pairs. The output is JSON, which is written to the console. The argument is the path to an mzIdentML
file or a directory containing multiple mzIdentML files, in which case all of them will be processed.
Examples:
```
python process_dataset.py --seqsandresiduepairs ~/mydata -t ~/mytempdir
```
```
python process_dataset.py --seqsandresiduepairs ~/mydata/mymzid.mzid
```
It can also be accessed programmatically by using the
`json_sequences_and_residue_pairs(filepath, tmpdir)` function in process_dataset.py.
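A minimal sketch of programmatic use follows. It assumes the function returns JSON text (other return values are passed through unchanged), and it takes the function as a parameter so that no particular import path is assumed. `load_summary` is a hypothetical helper name, not part of the converter.

```python
import json
import tempfile


def load_summary(mzid_path, extractor):
    """Call extractor(filepath, tmpdir) and return the decoded summary.

    `extractor` is expected to be json_sequences_and_residue_pairs; it is
    passed in so that this sketch does not depend on how it is imported.
    """
    # A throwaway temporary directory plays the role of the -t option.
    with tempfile.TemporaryDirectory() as tmpdir:
        result = extractor(mzid_path, tmpdir)
    # Assumption: the result is JSON text; anything else is returned as-is.
    if isinstance(result, (str, bytes)):
        return json.loads(result)
    return result
```

When run from the repository root, a call might look like `load_summary("mymzid.mzid", json_sequences_and_residue_pairs)` after `from process_dataset import json_sequences_and_residue_pairs`; that import line is an assumption based on the file name.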
### 3. Populate the xiview-api database
#### Create the database
```
sudo su postgres
psql
create database xiview;
create user xiadmin with login password 'your_password_here';
grant all privileges on database xiview to xiadmin;
```
Find the pg_hba.conf file in the PostgreSQL installation directory and add a line to allow the xiadmin role to access the database,
e.g.
```
sudo nano /etc/postgresql/13/main/pg_hba.conf
```
then add the line:
`local xiview xiadmin md5`
then restart postgresql:
```
sudo service postgresql restart
```
#### Configure the python environment for the file parser
Edit the file xi-mzidentml-converter/config/database.ini to point to your PostgreSQL database,
e.g. so that its content is:
```
[postgresql]
host=localhost
database=xiview
user=xiadmin
password=your_password_here
port=5432
```
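To check the file before running the converter, its values can be read back with Python's standard `configparser`. This is a sketch only: how the converter itself reads database.ini is not shown here, and `postgres_url` is a hypothetical helper that assembles a SQLAlchemy-style connection URL from the `[postgresql]` section.

```python
from configparser import ConfigParser
from urllib.parse import quote_plus


def postgres_url(ini_text, section="postgresql"):
    """Build a SQLAlchemy-style connection URL from database.ini text."""
    # Disable interpolation so a '%' in the password is read literally.
    parser = ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    cfg = parser[section]
    # Quote the credentials so characters such as '@' or '/' stay valid.
    user = quote_plus(cfg["user"])
    password = quote_plus(cfg["password"])
    return (
        f"postgresql://{user}:{password}"
        f"@{cfg['host']}:{cfg['port']}/{cfg['database']}"
    )
```

Reading the file with `open("config/database.ini").read()` and passing the text to `postgres_url` raises a `KeyError` if the section or any of the five keys is missing, which makes typos in the file easy to spot.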
#### Create the database schema
Run create_db_schema.py to create the database tables:
```
python database/create_db_schema.py
```
#### Populate the database
To parse a test dataset:
```
python process_dataset.py -d ~/PXD038060
```
The command line options that populate the database are -d, -f and -p; only one of them can be used at a time.
The -d option is the directory to process files from,
the -f option is the path to an FTP directory containing mzIdentML files, and
the -p option is a ProteomeXchange identifier or a list of ProteomeXchange identifiers separated by spaces.
The -i option is the project identifier to use in the database. It will default to the PXD accession or the
name of the directory containing the mzIdentML file.
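The rule that only one of -d, -f and -p may be given can be checked before the converter is launched. The sketch below is illustrative and not part of the converter: `build_populate_command` is a hypothetical helper that assumes `process_dataset.py` is in the current working directory and uses only the options described above.

```python
import sys


def build_populate_command(directory=None, ftp=None, pxd_ids=None,
                           identifier=None, script="process_dataset.py"):
    """Build the argument list for a database-populating run."""
    sources = {"-d": directory, "-f": ftp, "-p": pxd_ids}
    chosen = [(flag, value) for flag, value in sources.items() if value]
    if len(chosen) != 1:
        raise ValueError("use exactly one of -d, -f or -p")
    flag, value = chosen[0]
    cmd = [sys.executable, script, flag]
    # -p accepts several identifiers separated by spaces, so each
    # identifier is passed as its own argument.
    cmd += list(value) if isinstance(value, (list, tuple)) else [value]
    if identifier:
        # -i overrides the default project identifier.
        cmd += ["-i", identifier]
    return cmd
```

The resulting list can be passed to `subprocess.run`; for instance, `build_populate_command(pxd_ids=["PXD038060"])` corresponds to `python process_dataset.py -p PXD038060`.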
## To run tests
Make sure the right database user is available:
```
psql -p 5432 -c "create role ximzid_unittests with password 'ximzid_unittests';"
psql -p 5432 -c 'alter role ximzid_unittests with login;'
psql -p 5432 -c 'alter role ximzid_unittests with createdb;'
psql -p 5432 -c 'GRANT pg_signal_backend TO ximzid_unittests;'
```
Run the tests:
```
pipenv run pytest
```
Raw data
{
"_id": null,
"home_page": "https://github.com/PRIDE-Archive/xi-mzidentml-converter",
"name": "xi-mzidentml-converter",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "crosslinking python proteomics",
"author": "Colin Combe, Lars Kolbowski, Suresh Hewapathirana",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/f6/02/4d88fea0f08b88af55aef28ddf6f8a04261e80ccedf9568934640b6f6dbf/xi_mzidentml_converter-0.3.5.tar.gz",
"platform": "any",
"bugtrack_url": null,
"license": "'Apache 2.0",
"summary": "xi-mzidentml-converter uses pyteomics (https://pyteomics.readthedocs.io/en/latest/index.html) to parse mzIdentML files (v1.2.0) and extract crosslink information. Results are written to a relational database (PostgreSQL or SQLite) using sqlalchemy.",
"version": "0.3.5",
"project_urls": {
"Homepage": "https://github.com/PRIDE-Archive/xi-mzidentml-converter"
},
"split_keywords": [
"crosslinking",
"python",
"proteomics"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "25c65f7cb985ff9eeaf54dbc391f7d2e9585a5feb5a4be4386c71034c15affd9",
"md5": "d0ce878cf7ecb891ea45c91c72b40170",
"sha256": "f77287cfe4f7af57bd2af9617ee32a8ccd439ef819ef3791169bcfe8a553f5bb"
},
"downloads": -1,
"filename": "xi_mzidentml_converter-0.3.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d0ce878cf7ecb891ea45c91c72b40170",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 87823,
"upload_time": "2024-11-07T14:51:25",
"upload_time_iso_8601": "2024-11-07T14:51:25.847723Z",
"url": "https://files.pythonhosted.org/packages/25/c6/5f7cb985ff9eeaf54dbc391f7d2e9585a5feb5a4be4386c71034c15affd9/xi_mzidentml_converter-0.3.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f6024d88fea0f08b88af55aef28ddf6f8a04261e80ccedf9568934640b6f6dbf",
"md5": "f69d60d42e8954aceef63bff65ab8f49",
"sha256": "f9bcf6e450bcfa8e6b6d31eaf0bdbbeeb31bc0a00855b014f4a95e849de7d174"
},
"downloads": -1,
"filename": "xi_mzidentml_converter-0.3.5.tar.gz",
"has_sig": false,
"md5_digest": "f69d60d42e8954aceef63bff65ab8f49",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 70684,
"upload_time": "2024-11-07T14:51:27",
"upload_time_iso_8601": "2024-11-07T14:51:27.375002Z",
"url": "https://files.pythonhosted.org/packages/f6/02/4d88fea0f08b88af55aef28ddf6f8a04261e80ccedf9568934640b6f6dbf/xi_mzidentml_converter-0.3.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-07 14:51:27",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "PRIDE-Archive",
"github_project": "xi-mzidentml-converter",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "xi-mzidentml-converter"
}