broad-babel

Name	broad-babel
Version	0.1.27
home_page	None
Summary	A translator of Broad and JUMP ids to more conventional names.
upload_time	2024-10-08 20:12:17
maintainer	None
docs_url	None
author	Alan Munoz
requires_python	<3.12,>=3.10
license	None

# Broad_Babel

A minimal name translator for identifiers used by the [JUMP](https://jump-cellpainting.broadinstitute.org/) consortium.

## Installation

```bash
pip install broad-babel
```

## Broad sample to standard 
You can fetch a single value. Note that only ORF datasets have an associated broad_id by default.
```python
from broad_babel.query import broad_to_standard

broad_to_standard("ccsbBroad304_99994") 
# 'LacZ'
```
If you provide multiple strings, it returns a dictionary that maps each input to its standard name.

```python
broad_to_standard(("ccsbBroad304_09930", "ccsbBroad304_16164")) 

# {'ccsbBroad304_09930': 'SCIMP', 'ccsbBroad304_16164': 'NAP1L5'}
```

## Wildcard search
You can also use [sqlite](https://docs.python.org/3/library/sqlite3.html) bindings. For instance, to get all the samples whose perturbation type starts with "poscon", you can use:

```python
from broad_babel.query import run_query
run_query(
    query="poscon%",
    input_column="pert_type",
    output_columns="JCP2022,standard_key,plate_type,pert_type",
    operator="LIKE",
)

# [(None, 'LRRMQNGSYOUANY-OMCISZLKSA-N', 'compound', 'poscon_cp'),
#  (None, 'DHMTURDWPRKSOA-RUZDIDTESA-N', 'compound', 'poscon_diverse'),
#  ...
#  ('JCP2022_913605', 'CDK2', 'orf', 'poscon_orf'),
#  ('JCP2022_913622', 'CLK1', 'orf', 'poscon_cp')]
```
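To make the wildcard semantics concrete, here is a minimal sketch of how a `LIKE` pattern with `%` behaves in SQLite. It uses only the standard-library `sqlite3` module and an in-memory table; the table name `babel`, its schema, and the placeholder row are assumptions for illustration and are not taken from the package's actual database.

```python
import sqlite3

# Hypothetical stand-in table; the real database layout may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE babel (JCP2022 TEXT, standard_key TEXT, plate_type TEXT, pert_type TEXT)"
)
rows = [
    # Two rows copied from the example output above.
    ("JCP2022_913605", "CDK2", "orf", "poscon_orf"),
    ("JCP2022_913622", "CLK1", "orf", "poscon_cp"),
    # Made-up placeholder row that should NOT match "poscon%".
    ("EXAMPLE_ID", "EXAMPLE_GENE", "orf", "negcon"),
]
conn.executemany("INSERT INTO babel VALUES (?, ?, ?, ?)", rows)


def like_query(pattern):
    """Return (JCP2022, pert_type) pairs whose pert_type matches a LIKE pattern."""
    cursor = conn.execute(
        "SELECT JCP2022, pert_type FROM babel WHERE pert_type LIKE ? ORDER BY JCP2022",
        (pattern,),  # bound parameter, so the pattern is never spliced into the SQL
    )
    return cursor.fetchall()


matches = like_query("poscon%")
print(matches)
```

In SQLite, `%` matches any run of characters and `_` matches exactly one character, so a pattern such as `poscon_cp` is also a wildcard pattern and not a strictly literal string.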

## Make mappers for quick renaming

Mappers are useful when you need to rename a long list of perturbations. The following example builds a mapper from JCP2022 id to perturbation type for all the perturbations in the compound plate.
```python
from broad_babel.query import get_mapper

mapper = get_mapper(query="compound", input_column="plate_type", output_columns="JCP2022,pert_type")
```
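As a rough sketch of how such a mapper might be applied, the snippet below assumes the returned object behaves like a dictionary keyed by the first output column. The entries are stand-ins copied from the wildcard search output above (they are not compound-plate entries), and `JCP2022_000000` is a hypothetical id used only to show what happens with an unmapped key.

```python
# Stand-in for the object returned by get_mapper (assumed to be dict-like).
mapper = {
    "JCP2022_913605": "poscon_orf",
    "JCP2022_913622": "poscon_cp",
}

# Ids to rename; the last one is hypothetical and absent from the mapper.
ids = ["JCP2022_913622", "JCP2022_913605", "JCP2022_000000"]

# dict.get with a default keeps unmapped ids visible instead of raising KeyError.
pert_types = [mapper.get(jcp_id, "unknown") for jcp_id in ids]
print(pert_types)
```

Using an explicit default makes missing ids easy to spot when renaming long lists.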


## Export database as csv
```python
from broad_babel.query import export_csv

export_csv("./output.csv")
```

## Custom querying
The available fields are:
- standard_key: Gene name for gene-related perturbations (e.g., `CDK2`, as in the examples in this document), and InChIKey for compound perturbations
- JCP2022: Identifier from the JUMP dataset
- plate_type: Dataset of origin for a given entry
- NCBI_Gene_ID: NCBI identifier, only applicable to ORF and CRISPR
- broad_sample: Internal Broad ID
- pert_type: Type of perturbation, options are trt (treatment), control, negcon (Negative Control), poscon_cp (Positive Control, Compound Probe), poscon_diverse, poscon_orf, and poscon (Positive Control).

You can fetch any field using any other field as the key. Note that the output is a list of tuples, one per matching row:

```python
run_query(query="JCP2022_915119", input_column="JCP2022", output_columns="broad_sample")
# [('ccsbBroad304_16164',)]
```
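Because the result is a list of tuples, a little unpacking is usually needed. The sketch below works on results copied from the examples in this document, not on a live query, and assumes that columns come back in the order requested in `output_columns`, as they do in those examples.

```python
# Single-column result copied from the example above.
rows = [("ccsbBroad304_16164",)]

# Flatten one-element tuples into a plain list of values.
broad_samples = [row[0] for row in rows]
print(broad_samples)

# Multi-column result copied from the wildcard search example.
output_columns = "JCP2022,standard_key,plate_type,pert_type"
multi_rows = [("JCP2022_913605", "CDK2", "orf", "poscon_orf")]

# Pair each value with its column name (assumes the requested column order is kept).
records = [dict(zip(output_columns.split(","), row)) for row in multi_rows]
print(records[0]["standard_key"])
```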

It is also possible to use fuzzy querying by changing the `operator` argument and adding "%" to our key. For example, to get the genes in the "orf" dataset whose names start with "RBP" (some of which are retinol-binding proteins), we can do:

```python
[x[:2] for x in run_query(
    "RBP%",
    input_column="standard_key",
    output_columns="standard_key,JCP2022,plate_type",
    operator="LIKE",
    ) if x[2]=="orf"]

# [('RBP7', 'JCP2022_904406'), ('RBPJ', 'JCP2022_906023'), ('RBP4', 'JCP2022_906415'),
# ('RBPMS', 'JCP2022_902435'), ('RBP2', 'JCP2022_914559'), ('RBP2', 'JCP2022_906413'),
# ('RBP3', 'JCP2022_906414'), ('RBP1', 'JCP2022_910341')]
```
Note that we also got RBPMS here, which is actually RNA-binding protein with multiple splicing, so use this with caution.
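One way to tighten such a result after the fact is to filter it with a stricter regular expression. The sketch below works on the output copied from above and keeps only names of the form `RBP` followed by digits. That naming pattern is an assumption made for illustration and is no guarantee of protein function, so the remaining entries should still be checked.

```python
import re

# Output copied from the fuzzy query above.
results = [
    ("RBP7", "JCP2022_904406"), ("RBPJ", "JCP2022_906023"), ("RBP4", "JCP2022_906415"),
    ("RBPMS", "JCP2022_902435"), ("RBP2", "JCP2022_914559"), ("RBP2", "JCP2022_906413"),
    ("RBP3", "JCP2022_906414"), ("RBP1", "JCP2022_910341"),
]

# Assumed naming convention: "RBP" followed only by digits.
pattern = re.compile(r"RBP\d+")

# fullmatch requires the whole name to fit the pattern, which drops RBPJ and RBPMS.
filtered = [(gene, jcp_id) for gene, jcp_id in results if pattern.fullmatch(gene)]
print(filtered)
```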

## Additional documentation
Metadata sources and additional documentation are available [here](./docs).

Note that Babel only contains metadata for JUMP compounds and genes, and may not contain sample information from other projects (e.g., LINCS). A more comprehensive table for mapping "broad ids" to standard chemical ids (e.g., SMILES, InChIKey) can be found [here](https://repo-hub.broadinstitute.org/repurposing#download-data).

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "broad-babel",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.12,>=3.10",
    "maintainer_email": null,
    "keywords": null,
    "author": "Alan Munoz",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/42/37/228b22cfbce8add192015473ca9ad404a4d3b5017f68171edb69f93d2b0b/broad_babel-0.1.27.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A translator of Broad and JUMP ids to more conventional names.",
    "version": "0.1.27",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "aae63235993165b272d3f2f18a3e5ae34e49414f6b9b0af538b0c7d816846066",
                "md5": "75ed365095399c68f6cc23e01bade750",
                "sha256": "67f756c80894883d3d0559fc0a39ed7de0d2b7e1e855d2fd7a3d528bd526b479"
            },
            "downloads": -1,
            "filename": "broad_babel-0.1.27-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "75ed365095399c68f6cc23e01bade750",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.12,>=3.10",
            "size": 5698,
            "upload_time": "2024-10-08T20:12:16",
            "upload_time_iso_8601": "2024-10-08T20:12:16.483105Z",
            "url": "https://files.pythonhosted.org/packages/aa/e6/3235993165b272d3f2f18a3e5ae34e49414f6b9b0af538b0c7d816846066/broad_babel-0.1.27-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4237228b22cfbce8add192015473ca9ad404a4d3b5017f68171edb69f93d2b0b",
                "md5": "6f19e5f67f04536ca6fd888e0184ca37",
                "sha256": "c14803ec1ad99fe8791b71501516f22b064ad51aad6135ed528d57abe25ad267"
            },
            "downloads": -1,
            "filename": "broad_babel-0.1.27.tar.gz",
            "has_sig": false,
            "md5_digest": "6f19e5f67f04536ca6fd888e0184ca37",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.12,>=3.10",
            "size": 5103,
            "upload_time": "2024-10-08T20:12:17",
            "upload_time_iso_8601": "2024-10-08T20:12:17.951079Z",
            "url": "https://files.pythonhosted.org/packages/42/37/228b22cfbce8add192015473ca9ad404a4d3b5017f68171edb69f93d2b0b/broad_babel-0.1.27.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-08 20:12:17",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "broad-babel"
}
