Name | broad-babel |
Version | 0.1.27 |
home_page | None |
Summary | A translator of Broad and JUMP ids to more conventional names. |
upload_time | 2024-10-08 20:12:17 |
maintainer | None |
docs_url | None |
author | Alan Munoz |
requires_python | <3.12,>=3.10 |
license | None |
# Broad_Babel
Minimal name translator for the [JUMP](https://jump-cellpainting.broadinstitute.org/) consortium.
## Installation
```bash
pip install broad-babel
```
## Broad sample to standard
You can fetch a single value. Note that only ORF datasets have an associated broad_id by default.
```python
from broad_babel.query import broad_to_standard
broad_to_standard("ccsbBroad304_99994")
# 'LacZ'
```
If you provide multiple strings, it returns a dictionary.
```python
broad_to_standard(("ccsbBroad304_09930", "ccsbBroad304_16164"))
# {'ccsbBroad304_09930': 'SCIMP', 'ccsbBroad304_16164': 'NAP1L5'}
```
## Wildcard search
You can also use [sqlite](https://docs.python.org/3/library/sqlite3.html) bindings. For instance, to get all the samples whose perturbation type starts with "poscon", you can use:
```python
from broad_babel.query import run_query
run_query(query="poscon%", input_column="pert_type", output_columns="JCP2022,standard_key,plate_type,pert_type", operator="LIKE")
# [(None, 'LRRMQNGSYOUANY-OMCISZLKSA-N', 'compound', 'poscon_cp'),
# (None, 'DHMTURDWPRKSOA-RUZDIDTESA-N', 'compound', 'poscon_diverse'),
# ...
# ('JCP2022_913605', 'CDK2', 'orf', 'poscon_orf'),
# ('JCP2022_913622', 'CLK1', 'orf', 'poscon_cp')]
```
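To illustrate what the `LIKE` operator and the `%` wildcard do at the SQL level, here is a standalone sketch using Python's built-in `sqlite3` module. It does not touch the broad_babel database or show how `run_query` is implemented; the table name `babel` and the made-up third row are assumptions for illustration only.

```python
import sqlite3

# Toy in-memory table; the table name "babel" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE babel (JCP2022 TEXT, standard_key TEXT, plate_type TEXT, pert_type TEXT)"
)
conn.executemany(
    "INSERT INTO babel VALUES (?, ?, ?, ?)",
    [
        ("JCP2022_913605", "CDK2", "orf", "poscon_orf"),
        ("JCP2022_913622", "CLK1", "orf", "poscon_cp"),
        ("JCP2022_000001", "EXAMPLE", "orf", "negcon"),  # made-up row
    ],
)

# "%" matches any run of characters, so "poscon%" matches every value with that prefix.
rows = conn.execute(
    "SELECT JCP2022, pert_type FROM babel WHERE pert_type LIKE ? ORDER BY JCP2022",
    ("poscon%",),
).fetchall()
conn.close()

print(rows)
```

The `negcon` row is filtered out because it does not start with `poscon`.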
## Make mappers for quick renaming
This is very useful when you need to map from a long list of perturbation names. The following example shows how to map all the perturbations in the compound plate from JCP id to perturbation type.
```python
from broad_babel.query import get_mapper
mapper = get_mapper(query="compound", input_column="plate_type", output_columns="JCP2022,pert_type")
```
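Once a mapper exists, renaming a list of ids is a plain lookup. The sketch below assumes the mapper behaves like a dictionary from JCP2022 id to perturbation type; it uses a hand-written stand-in with hypothetical ids rather than calling `get_mapper`.

```python
# Stand-in for the object returned by get_mapper, assumed to behave like a dict
# from JCP2022 id to pert_type. The ids below are hypothetical.
mapper = {"JCP2022_000001": "trt", "JCP2022_000002": "negcon"}

ids = ["JCP2022_000002", "JCP2022_000001", "JCP2022_999999"]

# dict.get avoids a KeyError for ids that are missing from the mapper.
pert_types = [mapper.get(jcp_id, "unknown") for jcp_id in ids]

print(pert_types)
```

Using a default such as `"unknown"` makes missing ids visible instead of stopping the renaming halfway.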
## Export database as csv
```python
from broad_babel.query import export_csv
export_csv("./output.csv")
```
## Custom querying
The available fields are:
- standard_key: Gene symbol (e.g., CDK2) for gene-related perturbations, and InChIKey for compound perturbations
- JCP2022: Identifier from the JUMP dataset
- plate_type: Dataset of origin for a given entry
- NCBI_Gene_ID: NCBI identifier, only applicable to ORF and CRISPR
- broad_sample: Internal Broad ID
- pert_type: Type of perturbation, options are trt (treatment), control, negcon (Negative Control), poscon_cp (Positive Control, Compound Probe), poscon_diverse, poscon_orf, and poscon (Positive Control).
You can fetch any field using another (note that the output is a list of tuples):
```python
from broad_babel.query import run_query

run_query(query="JCP2022_915119", input_column="JCP2022", output_columns="broad_sample")
# [('ccsbBroad304_16164',)]
```
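Because each row comes back as a tuple, even a single-column query needs one unpacking step. A minimal sketch, using the result shown above as literal data so it runs without the package:

```python
# Result copied from the example above: one tuple per row, one element per output column.
rows = [("ccsbBroad304_16164",)]

# Single output column: take the first element of each tuple.
broad_samples = [row[0] for row in rows]

# Exactly one expected match: unpacking raises ValueError if there are zero or several rows.
(only_row,) = rows
broad_sample = only_row[0]

print(broad_samples, broad_sample)
```

The strict unpacking is a design choice: it fails loudly when an id unexpectedly matches more than one entry.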
It is also possible to use fuzzy querying by changing the operator argument and adding "%" to our key. For example, to get the genes in the "orf" dataset whose names start with "RBP" (some of which are retinol-binding proteins), we can do:
```python
[x[:2] for x in run_query(
    "RBP%",
    input_column="standard_key",
    output_columns="standard_key,JCP2022,plate_type",
    operator="LIKE",
) if x[2] == "orf"]
# [('RBP7', 'JCP2022_904406'), ('RBPJ', 'JCP2022_906023'), ('RBP4', 'JCP2022_906415'),
# ('RBPMS', 'JCP2022_902435'), ('RBP2', 'JCP2022_914559'), ('RBP2', 'JCP2022_906413'),
# ('RBP3', 'JCP2022_906414'), ('RBP1', 'JCP2022_910341')]
```
Note that we also got RBPMS here, which is actually RNA-binding protein with multiple splicing, so use this with caution.
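One way to narrow such results is to post-filter the names with a stricter pattern. The sketch below keeps only symbols of the form "RBP" followed by digits, using the output above as literal data. The pattern is a naming heuristic chosen for this example, not a guarantee of protein family membership.

```python
import re

# Output copied from the query above.
hits = [
    ("RBP7", "JCP2022_904406"), ("RBPJ", "JCP2022_906023"), ("RBP4", "JCP2022_906415"),
    ("RBPMS", "JCP2022_902435"), ("RBP2", "JCP2022_914559"), ("RBP2", "JCP2022_906413"),
    ("RBP3", "JCP2022_906414"), ("RBP1", "JCP2022_910341"),
]

# Keep only symbols that are exactly "RBP" followed by one or more digits.
numbered = [(name, jcp) for name, jcp in hits if re.fullmatch(r"RBP\d+", name)]

print([name for name, _ in numbered])
```

This drops RBPMS and RBPJ while keeping both RBP2 entries, which carry different JCP2022 ids.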
## Additional documentation
Metadata sources and additional documentation are available [here](./docs).
Note that Babel only contains metadata for JUMP compounds and genes, and may not contain sample information from other projects (e.g., LINCS). A more comprehensive table for mapping "broad ids" to standard chemical ids (e.g., SMILES, InChIKey) can be found [here](https://repo-hub.broadinstitute.org/repurposing#download-data).
Raw data
{
"_id": null,
"home_page": null,
"name": "broad-babel",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.12,>=3.10",
"maintainer_email": null,
"keywords": null,
"author": "Alan Munoz",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/42/37/228b22cfbce8add192015473ca9ad404a4d3b5017f68171edb69f93d2b0b/broad_babel-0.1.27.tar.gz",
"platform": null,
"description": "# Broad_Babel\n\nMinimal name translator of [JUMP](https://jump-cellpainting.broadinstitute.org/) consortium.\n\n## Installation\n\n```bash\npip install broad-babel\n```\n\n## Broad sample to standard \nYou can fetch a single value. Note that only ORF datasets have an associated broad_id by default.\n```python\nfrom broad_babel.query import broad_to_standard\n\nbroad_to_standard(\"ccsbBroad304_99994\") \n# 'LacZ'\n```\nIf you provide multiple strings it will return dictionary.\n\n```python\nbroad_to_standard((\"ccsbBroad304_09930\", \"ccsbBroad304_16164\")) \n\n# {'ccsbBroad304_09930': 'SCIMP', 'ccsbBroad304_16164': 'NAP1L5'}\n```\n\n## Wildcard search\nYou can also use [sqlite](https://docs.python.org/3/library/sqlite3.html) bindings. For instance, to get all the samples that start as \"poscon\" you can use:\n\n```python\nfrom broad_babel.query import run_query\nrun_query(query=\"poscon%\", input_column=\"pert_type\", output_columns=\"JCP2022,standard_key,plate_type,pert_type\", operator=\"LIKE\")\n\n# [(None, 'LRRMQNGSYOUANY-OMCISZLKSA-N', 'compound', 'poscon_cp'),\n# (None, 'DHMTURDWPRKSOA-RUZDIDTESA-N', 'compound', 'poscon_diverse'),\n# ...\n# ('JCP2022_913605', 'CDK2', 'orf', 'poscon_orf'),\n# ('JCP2022_913622', 'CLK1', 'orf', 'poscon_cp')]\n```\n\n## Make mappers for quick renaming\n\nThis is very useful when you need to map from a long list of perturbation names. 
The following example shows how to map all the perturbations in the compound plate from JCP id to perturbation type.\n```python\nfrom broad_babel.query import get_mapper\n\nmapper = get_mapper(query=\"compound\", input_column=\"plate_type\", output_columns=\"JCP2022,pert_type\")\n```\n\n\n## Export database as csv\n```python\nfrom broad_babel.query import export_csv\n\nexport_csv(\"./output.csv\")\n```\n\n## Custom querying\nThe available fields are:\n- standard_key: Gene Entrez id for gene-related perturbations, and InChIKey for compound perturbations\n- JCP2022: Identifier from the JUMP dataset\n- plate_type: Dataset of origin for a given entry\n- NCBI_Gene_ID: NCBI identifier, only applicable to ORF and CRISPR\n- broad_sample: Internal Broad ID\n- pert_type: Type of perturbation, options are trt (treatment), control, negcon (Negative Control), poscon_cp (Positive Control, Compound Probe), poscon_diverse, poscon_orf, and poscon (Positive Control).\n\nYou can fetch any field using another (note that the output is a list of tuples)\n\n```python\nrun_query(query=\"JCP2022_915119\", input_column=\"JCP2022\", output_columns=\"broad_sample\")\n# [('ccsbBroad304_16164',)]\n```\n\nIt is also possible to use fuzzy querying by changing the operator argument and adding \"%\" to out key. 
For example, to get the genes in the \"orf\" dataset whose name start with \"RBP\"(some of which are retinol-binding proteins) we can do:\n\n```python\n[x[:2] for x in run_query(\n \"RBP%\",\n input_column=\"standard_key\",\n output_columns=\"standard_key,JCP2022,plate_type\",\n operator=\"LIKE\",\n ) if x[2]==\"orf\"]\n\n# [('RBP7', 'JCP2022_904406'), ('RBPJ', 'JCP2022_906023'), ('RBP4', 'JCP2022_906415'),\n# ('RBPMS', 'JCP2022_902435'), ('RBP2', 'JCP2022_914559'), ('RBP2', 'JCP2022_906413'),\n# ('RBP3', 'JCP2022_906414'), ('RBP1', 'JCP2022_910341')]\n```\nNote that we also got RBPMS here, which is actually RNA-binding protein with multiple splicing, so use this with caution.\n\n## Additional documentation\nMetadata sources and additional documentation is available [here](./docs). \n\nNote that Babel only contains metadata of JUMP compounds and genes, and may not contain sample information from other projects (e.g., LINCS). A more comprehensive table to map \"broad ids\" to standard chemical ids (e.g., SMILES, InChiKey) can be found [here](https://repo-hub.broadinstitute.org/repurposing#download-data). \n",
"bugtrack_url": null,
"license": null,
"summary": "A translator of Broad and JUMP ids to more conventional names.",
"version": "0.1.27",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "aae63235993165b272d3f2f18a3e5ae34e49414f6b9b0af538b0c7d816846066",
"md5": "75ed365095399c68f6cc23e01bade750",
"sha256": "67f756c80894883d3d0559fc0a39ed7de0d2b7e1e855d2fd7a3d528bd526b479"
},
"downloads": -1,
"filename": "broad_babel-0.1.27-py3-none-any.whl",
"has_sig": false,
"md5_digest": "75ed365095399c68f6cc23e01bade750",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.12,>=3.10",
"size": 5698,
"upload_time": "2024-10-08T20:12:16",
"upload_time_iso_8601": "2024-10-08T20:12:16.483105Z",
"url": "https://files.pythonhosted.org/packages/aa/e6/3235993165b272d3f2f18a3e5ae34e49414f6b9b0af538b0c7d816846066/broad_babel-0.1.27-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "4237228b22cfbce8add192015473ca9ad404a4d3b5017f68171edb69f93d2b0b",
"md5": "6f19e5f67f04536ca6fd888e0184ca37",
"sha256": "c14803ec1ad99fe8791b71501516f22b064ad51aad6135ed528d57abe25ad267"
},
"downloads": -1,
"filename": "broad_babel-0.1.27.tar.gz",
"has_sig": false,
"md5_digest": "6f19e5f67f04536ca6fd888e0184ca37",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.12,>=3.10",
"size": 5103,
"upload_time": "2024-10-08T20:12:17",
"upload_time_iso_8601": "2024-10-08T20:12:17.951079Z",
"url": "https://files.pythonhosted.org/packages/42/37/228b22cfbce8add192015473ca9ad404a4d3b5017f68171edb69f93d2b0b/broad_babel-0.1.27.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-08 20:12:17",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "broad-babel"
}