:Name: pycellbase
:Version: 6.3.0
:Home page: https://github.com/opencb/cellbase/tree/develop/clients/python
:Summary: Python client for CellBase
:Upload time: 2024-10-15 17:20:29
:Author: Daniel Perez-Gil
:License: Apache Software License
:Keywords: opencb, cellbase, bioinformatics, database

.. contents::

PyCellBase
==========

- PyCellBase is a Python package that provides programmatic access to the comprehensive RESTful web service API implemented for the `CellBase`_ database, offering easy, lightweight, fast and intuitive access to it.
- The package gives access to relevant biological information in a user-friendly way, without the need to install a local database.
- Data is served by a high-availability cluster, and queries have been tuned for real-time performance.
- PyCellBase offers the convenience of an object-oriented scripting language and makes it easy to integrate the results into other Python applications.
- More information about this package is available in the `Python client`_ section of the `CellBase Wiki`_.

Installation
------------

Cloning
```````
PyCellBase can be cloned to your local machine by running the following in a terminal::

   $ git clone https://github.com/opencb/cellbase.git

Once you have downloaded the project you can install the library::

   $ cd cellbase/clients/python
   $ python setup.py install

PyPI
````
PyCellBase is available on PyPI and can be installed with pip::

   $ pip install pycellbase

REST client library
-------------------

PyCellBase consumes the RESTful web services provided by `CellBase`_, giving simple and fast access to the database.
A series of clients and methods has been implemented to retrieve specific resources for the main features.

Getting started
```````````````
The first step is to import the module and initialize the **CellBaseClient**:

.. code-block:: python

    >>> from pycellbase.cbclient import CellBaseClient
    >>> cbc = CellBaseClient()

The second step is to create the **specific client** for the data we want to query (in this example we want to obtain information for a gene):

.. code-block:: python

   >>> gc = cbc.get_gene_client()

You can now query the CellBase RESTful service by providing a **query ID**:

.. code-block:: python

    >>> tfbs_responses = gc.get_tfbs('BRCA1')  # Obtaining TFBS for this gene

Responses are returned as **JSON**-formatted data, so fields can be accessed by key:

.. code-block:: python

    >>> tfbs_responses = gc.get_tfbs('BRCA1')
    >>> tfbs_responses[0]['result'][0]['tfName']
    'E2F4'

    >>> transcript_responses = gc.get_transcript('BRCA1')
    >>> 'Number of transcripts: %d' % (len(transcript_responses[0]['result']))
    'Number of transcripts: 27'

    >>> for tfbs_response in gc.get_tfbs('BRCA1,BRCA2,LDLR'):
    ...     print('Number of TFBS for "%s": %d' % (tfbs_response['id'], len(tfbs_response['result'])))
    Number of TFBS for "BRCA1": 175
    Number of TFBS for "BRCA2": 43
    Number of TFBS for "LDLR": 141
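
Because each response is a plain dictionary, summarising results needs no special tooling. The sketch below counts results per query ID. Note that ``mock_responses`` is invented data that only mirrors the ``id``/``result`` layout shown above, and ``count_results`` is a hypothetical helper, not part of PyCellBase.

.. code-block:: python

    def count_results(responses):
        """Map each query ID to the number of entries in its 'result' list."""
        return {response['id']: len(response['result']) for response in responses}


    # Invented stand-in for what a call such as gc.get_tfbs('BRCA1,BRCA2') returns
    mock_responses = [
        {'id': 'BRCA1', 'result': [{'tfName': 'E2F4'}, {'tfName': 'CTCF'}]},
        {'id': 'BRCA2', 'result': [{'tfName': 'E2F4'}]},
    ]

    counts = count_results(mock_responses)
    print(counts)

With a live client, the same helper could presumably be applied directly to the list returned by a query method.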

Several IDs can be queried at once, either as **comma-separated IDs** or as a **list of IDs**:

.. code-block:: python

    >>> tfbs_responses = gc.get_tfbs('BRCA1')
    >>> len(tfbs_responses)
    1

    >>> tfbs_responses = gc.get_tfbs('BRCA1,BRCA2')
    >>> len(tfbs_responses)
    2

    >>> tfbs_responses = gc.get_tfbs(['BRCA1', 'BRCA2'])
    >>> len(tfbs_responses)
    2
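
Both forms carry the same information. As an illustration only (this is not PyCellBase's own code), a hypothetical helper such as ``normalize_ids`` shows how a list could be reduced to the comma-separated form before it is sent to the service:

.. code-block:: python

    def normalize_ids(query_ids):
        """Return the IDs as a single comma-separated string."""
        if isinstance(query_ids, str):
            # Already a (possibly comma-separated) string: leave it untouched
            return query_ids
        return ','.join(str(query_id) for query_id in query_ids)


    print(normalize_ids(['BRCA1', 'BRCA2']))
    print(normalize_ids('BRCA1,BRCA2'))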

If a resource is available in the CellBase web services but has no corresponding method in this Python package, the CellBaseClient can be used to build the URL of interest and query the RESTful service directly:

.. code-block:: python

    >>> tfbs_responses = cbc.get(category='feature', subcategory='gene', query_id='BRCA1', resource='tfbs')
    >>> tfbs_responses[0]['result'][0]['tfName']
    'E2F4'
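
To make the role of each argument concrete, the following sketch composes a URL from the same parts. It is an assumption-laden illustration: ``build_url`` is a hypothetical function, and the ``webservices/rest`` prefix and the order of the path segments are assumed rather than taken from the PyCellBase source.

.. code-block:: python

    from urllib.parse import urlencode


    def build_url(host, version, species, category, subcategory,
                  query_id, resource, options=None):
        """Compose a REST URL from its parts (assumed layout, not the library's)."""
        url = '/'.join([host.rstrip('/'), 'webservices', 'rest', version, species,
                        category, subcategory, query_id, resource])
        if options:
            # Extra options become the query string, e.g. ?limit=100
            url += '?' + urlencode(options)
        return url


    url = build_url('http://bioinfo.hpc.cam.ac.uk/cellbase', 'v4', 'hsapiens',
                    'feature', 'gene', 'BRCA1', 'tfbs', {'limit': 100})
    print(url)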

Optional **filters and extra options** can be added as key-value parameters (each value can be a comma-separated string or a list):

.. code-block:: python

    >>> tfbs_responses = gc.get_tfbs('BRCA1')
    >>> len(tfbs_responses[0]['result'])
    175

    >>> tfbs_responses = gc.get_tfbs('BRCA1', include='name,id')  # Return only name and id
    >>> len(tfbs_responses[0]['result'])
    175

    >>> tfbs_responses = gc.get_tfbs('BRCA1', include=['name', 'id'])  # Return only name and id
    >>> len(tfbs_responses[0]['result'])
    175

    >>> tfbs_responses = gc.get_tfbs('BRCA1', **{'include': 'name,id'})  # Return only name and id
    >>> len(tfbs_responses[0]['result'])
    175

    >>> tfbs_responses = gc.get_tfbs('BRCA1', limit=100)  # Limit to 100 results
    >>> len(tfbs_responses[0]['result'])
    100

    >>> tfbs_responses = gc.get_tfbs('BRCA1', skip=100)  # Skip first 100 results
    >>> len(tfbs_responses[0]['result'])
    75
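
The ``limit`` and ``skip`` options lend themselves to paging through large result sets. The sketch below shows one possible paging loop. ``iter_pages`` and ``fake_fetch`` are hypothetical names, and the fake fetcher simply slices a local list so that the example runs without contacting the service.

.. code-block:: python

    def iter_pages(fetch, page_size):
        """Yield successive pages until a short or empty page is returned."""
        skip = 0
        while True:
            page = fetch(limit=page_size, skip=skip)
            if page:
                yield page
            if len(page) < page_size:
                break
            skip += page_size


    # Stand-in for the service: 175 fake records, sliced the way limit/skip would
    records = list(range(175))


    def fake_fetch(limit, skip):
        return records[skip:skip + limit]


    page_sizes = [len(page) for page in iter_pages(fake_fetch, 100)]
    print(page_sizes)

With a real client, ``fetch`` might wrap a call such as ``gc.get_tfbs('BRCA1', limit=limit, skip=skip)[0]['result']``, assuming both options can be combined in a single request.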

What can I ask for?
```````````````````
The best way to find out which data can be retrieved with each client is to check either the `RESTful web services`_ section of the CellBase Wiki or the `CellBase web services`_ page.

Configuration
`````````````

The configuration stores the REST service hosts, the API version and the species.

Getting the **default configuration**:

.. code-block:: python

    >>> from pycellbase.cbconfig import ConfigClient
    >>> ConfigClient().get_default_configuration()
    {'version': 'v4',
     'species': 'hsapiens',
     'rest': {'hosts': ['http://bioinfo.hpc.cam.ac.uk:80/cellbase']}}


Showing the configuration parameters currently in use:

.. code-block:: python

    >>> cbc.show_configuration()
    {'host': 'bioinfo.hpc.cam.ac.uk:80/cellbase',
     'version': 'v4',
     'species': 'hsapiens'}

A **custom configuration** can be passed to CellBaseClient using a **ConfigClient object**. JSON and YAML files are supported:

.. code-block:: python

    >>> from pycellbase.cbconfig import ConfigClient
    >>> from pycellbase.cbclient import CellBaseClient

    >>> cc = ConfigClient('config.json')
    >>> cbc = CellBaseClient(cc)

A **custom configuration** can also be passed as a dictionary:

.. code-block:: python

    >>> from pycellbase.cbconfig import ConfigClient
    >>> from pycellbase.cbclient import CellBaseClient

    >>> custom_config = {'rest': {'hosts': ['bioinfo.hpc.cam.ac.uk:80/cellbase']}, 'version': 'v4', 'species': 'hsapiens'}
    >>> cc = ConfigClient(custom_config)
    >>> cbc = CellBaseClient(cc)
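
The same dictionary can be written to disk to produce a configuration file. This minimal sketch uses only the standard library and assumes that a JSON configuration file has the same keys as the dictionary above. The temporary directory is there purely to keep the example self-contained.

.. code-block:: python

    import json
    import os
    import tempfile

    custom_config = {
        'rest': {'hosts': ['bioinfo.hpc.cam.ac.uk:80/cellbase']},
        'version': 'v4',
        'species': 'hsapiens',
    }

    # Write the configuration to a JSON file (temporary directory for the demo)
    config_path = os.path.join(tempfile.mkdtemp(), 'config.json')
    with open(config_path, 'w') as handle:
        json.dump(custom_config, handle, indent=4)

    # Read it back to confirm the file holds the same structure
    with open(config_path) as handle:
        loaded_config = json.load(handle)
    print(loaded_config == custom_config)

Under that assumption, the resulting path could then be passed to ``ConfigClient`` in place of ``'config.json'``.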

If you want to change the configuration **on the fly** you can directly modify the ConfigClient object:

.. code-block:: python

    >>> cc = ConfigClient()
    >>> cbc = CellBaseClient(cc)

    >>> cbc.show_configuration()['version']
    'v4'
    >>> cc.version = 'v3'
    >>> cbc.show_configuration()['version']
    'v3'

Use case
````````
A use case in which PyCellBase is used to obtain multiple kinds of data from different sources can be found in this `Jupyter Notebook`_.

Command-line tools
------------------

A command-line interface, cbtools.py, provides several tools that simplify and speed up frequently performed bioinformatics tasks.
These tools use the REST client library and post-process its output to make it easier to analyse.

ID converter
````````````

This tool annotates genomic features with all their associated IDs, drawing on 74 different sources for human, including the most common databases such as Ensembl, NCBI, RefSeq, Reactome, OMIM, PDB, miRBase and UniProt.
In addition, it supports heterogeneous input files with IDs from different sources.

.. code-block:: bash

    $ cbtools.py xref file_with_ids.vcf > output.txt

HGVS calculator
```````````````

This tool annotates variants with their associated HGVS names.
Given a variant (in the format “chromosome:position:reference:alternate”), this tool returns all the associated HGVS names for many different types of reference sequence.

.. code-block:: bash

    $ cbtools.py hgvs 19:45411941:T:C

A file with multiple variants can also be used.

.. code-block:: bash

    $ cbtools.py hgvs file_with_variants.txt > output.txt
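
Before handing variants to the tool, it can help to check that each one follows the “chromosome:position:reference:alternate” format. The following sketch is a hypothetical validator written for illustration. ``Variant`` and ``parse_variant`` are not part of cbtools.py or PyCellBase.

.. code-block:: python

    from collections import namedtuple

    Variant = namedtuple('Variant',
                         ['chromosome', 'position', 'reference', 'alternate'])


    def parse_variant(text):
        """Split 'chromosome:position:reference:alternate' into a Variant."""
        fields = text.strip().split(':')
        if len(fields) != 4:
            raise ValueError('expected 4 colon-separated fields, got %d'
                             % len(fields))
        chromosome, position, reference, alternate = fields
        # int() raises ValueError if the position is not a whole number
        return Variant(chromosome, int(position), reference, alternate)


    variant = parse_variant('19:45411941:T:C')
    print(variant)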

VCF annotator
`````````````

This tool takes a VCF file as input and returns it with its variants annotated with a broad range of information such as consequence types, population frequencies, overlapping sequence repeats, cytobands, gene expression, conservation scores, clinical significance (ClinVar, COSMIC, diseases and drugs), functional scores and more.

.. code-block:: bash

    $ cbtools.py annotation input.vcf > output.vcf



.. _CellBase: https://github.com/opencb/cellbase
.. _CellBase Wiki: https://github.com/opencb/cellbase/wiki
.. _Python client: https://github.com/opencb/cellbase/wiki/Python-client
.. _RESTful web services: https://github.com/opencb/cellbase/wiki/RESTful-web-services
.. _CellBase web services: http://bioinfo.hpc.cam.ac.uk/cellbase/webservices/
.. _Jupyter Notebook: http://nbviewer.jupyter.org/github/opencb/cellbase/blob/develop/clients/python/use_case.ipynb

            
