rcsbsearchapi


- Name: rcsbsearchapi
- Version: 2.0.0
- Home page: https://github.com/rcsb/py-rcsbsearchapi
- Summary: Python package interface for the RCSB PDB search API service
- Upload time: 2024-10-04 20:05:18
- Author: Dennis Piehl
- License: BSD 3-Clause
- Requirements: requests, tqdm

[![PyPi Release](https://img.shields.io/pypi/v/rcsbsearchapi.svg)](https://pypi.org/project/rcsbsearchapi/)
[![Build Status](https://dev.azure.com/rcsb/RCSB%20PDB%20Python%20Projects/_apis/build/status/rcsb.py-rcsbsearchapi?branchName=master)](https://dev.azure.com/rcsb/RCSB%20PDB%20Python%20Projects/_build/latest?definitionId=39&branchName=master)
[![Documentation Status](https://readthedocs.org/projects/rcsbsearchapi/badge/?version=latest)](https://rcsbsearchapi.readthedocs.io/en/latest/?badge=latest)
<a href="https://colab.research.google.com/github/rcsb/py-rcsbsearchapi/blob/master/notebooks/quickstart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# rcsbsearchapi

Python interface for the RCSB PDB Search API.

This package requires Python 3.7 or later.

# Quickstart

## Installation

Get it from PyPI:

    pip install rcsbsearchapi

Or download it from [GitHub](https://github.com/rcsb/py-rcsbsearchapi).
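
To confirm that the install succeeded, one option is to read the installed version from the package metadata. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and `importlib.metadata` requires Python 3.8 or later.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("rcsbsearchapi")
    print(found if found else "rcsbsearchapi is not installed")
```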

## Getting Started
Full documentation is available on [Read the Docs](https://rcsbsearchapi.readthedocs.io/en/latest/index.html).

### Basic Query Construction

#### Full-text search
To perform a "full-text" search for structures associated with the term "Hemoglobin", you can create a `TextQuery`:

```python
from rcsbsearchapi import TextQuery

# Search for structures associated with the phrase "Hemoglobin"
query = TextQuery(value="Hemoglobin")

# Execute the query by running it as a function
results = query()

# Results are returned as an iterator of result identifiers.
for rid in results:
    print(rid)
```
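
For orientation, the RCSB PDB Search API accepts queries as JSON documents. The sketch below builds a request body of the kind the API docs describe for a full-text search. `full_text_payload` is a hypothetical helper written for illustration, not part of `rcsbsearchapi`, and the request the package actually sends may carry additional fields.

```python
import json


def full_text_payload(value: str, return_type: str = "entry") -> dict:
    """Build a Search API style request body for a full-text search."""
    return {
        "query": {
            "type": "terminal",        # a single, non-grouped query node
            "service": "full_text",    # full-text search service
            "parameters": {"value": value},
        },
        "return_type": return_type,    # kind of identifier to return
    }


payload = full_text_payload("Hemoglobin")
print(json.dumps(payload, indent=2))
```

No network request is made here; the block only shows the shape of the data.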

#### Attribute search
To perform a search for specific structure or chemical attributes, you can create an `AttributeQuery`.

```python
from rcsbsearchapi import AttributeQuery

# Construct a query searching for structures from humans
query = AttributeQuery(
    attribute="rcsb_entity_source_organism.scientific_name",
    operator="exact_match",  # Other operators include "contains_phrase", "exists", and more
    value="Homo sapiens"
)

# Execute query and construct a list from results
results = list(query())
print(results)
```

Refer to the [Search Attributes](https://search.rcsb.org/structure-search-attributes.html) and [Chemical Attributes](https://search.rcsb.org/chemical-search-attributes.html) documentation for a full list of attributes and applicable operators.
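
An attribute query pairs an attribute name with an operator and, for most operators, a value. As a rough illustration of how those three pieces fit together, the sketch below builds a Search API style terminal node from them. `attribute_node` is a hypothetical helper, not a function of `rcsbsearchapi`; it assumes the `text` service used for structure attributes (the API docs list `text_chem` for chemical attributes).

```python
from typing import Any, Optional


def attribute_node(attribute: str, operator: str, value: Optional[Any] = None) -> dict:
    """Build a Search API style terminal node for a structure attribute."""
    parameters = {"attribute": attribute, "operator": operator}
    if value is not None:
        # Operators such as "exists" take no value, so it is only added when given
        parameters["value"] = value
    return {"type": "terminal", "service": "text", "parameters": parameters}


human = attribute_node(
    "rcsb_entity_source_organism.scientific_name", "exact_match", "Homo sapiens"
)
has_resolution = attribute_node("rcsb_entry_info.resolution_combined", "exists")
print(human)
print(has_resolution)
```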

You can also construct attribute queries with comparison operators using the `rcsb_attributes` object, which additionally supports tab completion of attribute names:

```python
from rcsbsearchapi import rcsb_attributes as attrs

# Search for structures from humans
query = attrs.rcsb_entity_source_organism.scientific_name == "Homo sapiens"

# Run query and construct a list from results
results = list(query())
print(results)
```
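
The reason `attrs.<name> == value` yields a query rather than a boolean is Python operator overloading. The toy classes below sketch that general technique; `Attr` and `Terminal` here are simplified stand-ins written for illustration, and the choice of operator names per comparison is an assumption, not the package's actual implementation.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Terminal:
    """A single attribute comparison, ready to be serialized or executed."""
    attribute: str
    operator: str
    value: Any


class Attr:
    """Toy attribute whose comparison operators build Terminal objects."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: Any) -> Terminal:  # type: ignore[override]
        # Assumption for this sketch: strings use "exact_match", other values "equals"
        operator = "exact_match" if isinstance(other, str) else "equals"
        return Terminal(self.name, operator, other)

    def __lt__(self, other: Any) -> Terminal:
        return Terminal(self.name, "less", other)

    def __gt__(self, other: Any) -> Terminal:
        return Terminal(self.name, "greater", other)


organism = Attr("rcsb_entity_source_organism.scientific_name")
resolution = Attr("rcsb_entry_info.resolution_combined")

print(organism == "Homo sapiens")
print(resolution < 2.0)
```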

#### Grouping sub-queries

You can combine multiple queries using Python's bitwise operators: `&` for AND and `|` for OR.

```python
from rcsbsearchapi import rcsb_attributes as attrs

# Query for human epidermal growth factor receptor (EGFR) structures (UniProt ID P00533)
#  with investigational or experimental drugs bound
q1 = attrs.rcsb_polymer_entity_container_identifiers.reference_sequence_identifiers.database_accession == "P00533"
q2 = attrs.rcsb_entity_source_organism.scientific_name == "Homo sapiens"
q3 = attrs.drugbank_info.drug_groups == "investigational"
q4 = attrs.drugbank_info.drug_groups == "experimental"

# Structures matching UniProt ID P00533 AND from humans
#  AND (investigational OR experimental drug group)
query = q1 & q2 & (q3 | q4)

# Execute query and print first 10 ids
results = list(query())
print(results[:10])
```
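
To make the grouping concrete, the sketch below shows how `&` and `|` can assemble a nested tree of `group` and `terminal` nodes in the style of the Search API's JSON. The classes `Query`, `Leaf`, and `Group` are hypothetical toys for illustration only; they do not reproduce the package's internals, and the flattening of same-operator groups is a design choice of this sketch.

```python
from typing import List


class Query:
    """Toy query node that serializes to a Search API style dict."""

    def to_dict(self) -> dict:
        raise NotImplementedError

    def __and__(self, other: "Query") -> "Group":
        return Group("and", [self, other])

    def __or__(self, other: "Query") -> "Group":
        return Group("or", [self, other])


class Leaf(Query):
    """An exact-match comparison on one attribute."""

    def __init__(self, attribute: str, value: str) -> None:
        self.attribute = attribute
        self.value = value

    def to_dict(self) -> dict:
        return {
            "type": "terminal",
            "service": "text",
            "parameters": {
                "attribute": self.attribute,
                "operator": "exact_match",
                "value": self.value,
            },
        }


class Group(Query):
    """A set of nodes joined by a logical operator ("and" or "or")."""

    def __init__(self, logical_operator: str, nodes: List[Query]) -> None:
        flat: List[Query] = []
        for node in nodes:
            # Merge children that use the same operator: (a & b) & c -> and[a, b, c]
            if isinstance(node, Group) and node.logical_operator == logical_operator:
                flat.extend(node.nodes)
            else:
                flat.append(node)
        self.logical_operator = logical_operator
        self.nodes = flat

    def to_dict(self) -> dict:
        return {
            "type": "group",
            "logical_operator": self.logical_operator,
            "nodes": [node.to_dict() for node in self.nodes],
        }


q1 = Leaf("rcsb_polymer_entity_container_identifiers.reference_sequence_identifiers.database_accession", "P00533")
q2 = Leaf("rcsb_entity_source_organism.scientific_name", "Homo sapiens")
q3 = Leaf("drugbank_info.drug_groups", "investigational")
q4 = Leaf("drugbank_info.drug_groups", "experimental")

tree = (q1 & q2 & (q3 | q4)).to_dict()
print(tree["logical_operator"], len(tree["nodes"]))
```

Note that in Python `&` binds more tightly than `|`, and both bind more tightly than `==`, so comparisons written inline need their own parentheses, as in `(a == 1) & (b == 2)`.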

These examples are in `operator` syntax. You can also make queries in `fluent` syntax. Learn more about both syntaxes and implementation details in [Constructing and Executing Queries](query_construction.md#constructing-and-executing-queries).

### Supported Search Services
The supported search service types are listed in the table below. For more details on their usage, see [Search Service Types](query_construction.md#search-service-types).

|Search service                    |QueryType                 |
|----------------------------------|--------------------------|
|Full-text                         |`TextQuery()`             |
|Attribute (structure or chemical) |`AttributeQuery()`        |
|Sequence similarity               |`SequenceQuery()`         |
|Sequence motif                    |`SequenceMotifQuery()`    |
|Structure similarity              |`StructSimilarityQuery()` |
|Structure motif                   |`StructMotifQuery()`      |
|Chemical similarity               |`ChemSimilarityQuery()`   |

Learn more about available search services on the [RCSB PDB Search API docs](https://search.rcsb.org/#search-services).
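
Each query class in the table corresponds to a `service` name used in the Search API's JSON requests. The lookup below is a reference sketch based on the service names given in the RCSB PDB Search API docs; `SERVICE_NAMES` and `services_for` are hypothetical names introduced here and are not part of the package.

```python
from typing import Dict, Tuple

# Query class name -> Search API service name(s), per the Search API docs
SERVICE_NAMES: Dict[str, Tuple[str, ...]] = {
    "TextQuery": ("full_text",),
    "AttributeQuery": ("text", "text_chem"),  # structure and chemical attributes
    "SequenceQuery": ("sequence",),
    "SequenceMotifQuery": ("seqmotif",),
    "StructSimilarityQuery": ("structure",),
    "StructMotifQuery": ("strucmotif",),
    "ChemSimilarityQuery": ("chemical",),
}


def services_for(query_class: str) -> Tuple[str, ...]:
    """Return the service name(s) for a query class, or an empty tuple if unknown."""
    return SERVICE_NAMES.get(query_class, ())


for name in SERVICE_NAMES:
    print(f"{name}: {', '.join(services_for(name))}")
```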

## Jupyter Notebooks
A runnable Jupyter notebook is available in [notebooks/quickstart.ipynb](https://github.com/rcsb/py-rcsbsearchapi/blob/master/notebooks/quickstart.ipynb), and can also be run online using Google Colab:
<a href="https://colab.research.google.com/github/rcsb/py-rcsbsearchapi/blob/master/notebooks/quickstart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

An additional COVID-19-related example is in [notebooks/covid.ipynb](https://github.com/rcsb/py-rcsbsearchapi/blob/master/notebooks/covid.ipynb):
<a href="https://colab.research.google.com/github/rcsb/py-rcsbsearchapi/blob/master/notebooks/covid.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## Supported Features

The following checklist shows the status of current and planned features.

- [x] Structure and chemical attribute search
  - [x] Attribute Comparison operations
  - [x] Query set operations
  - [x] Attribute `contains`, `in_` (fluent only)
- [x] Option to include computed structure models (CSMs) in search
- [x] Sequence search
- [x] Sequence motif search
- [x] Structure similarity search
- [x] Structure motif search
- [x] Chemical similarity search
- [ ] Rich results using the Data API

Contributions are welcome for unchecked items!

## License

Code is licensed under the BSD 3-clause license. See [LICENSE](LICENSE) for details.

## Citing rcsbsearchapi

Please cite the rcsbsearchapi package by URL:

> https://rcsbsearchapi.readthedocs.io

You should also cite the RCSB PDB service this package utilizes:

> Yana Rose, Jose M. Duarte, Robert Lowe, Joan Segura, Chunxiao Bi, Charmi
> Bhikadiya, Li Chen, Alexander S. Rose, Sebastian Bittrich, Stephen K. Burley,
> John D. Westbrook. RCSB Protein Data Bank: Architectural Advances Towards
> Integrated Searching and Efficient Access to Macromolecular Structure Data
> from the PDB Archive, Journal of Molecular Biology, 2020.
> DOI: [10.1016/j.jmb.2020.11.003](https://doi.org/10.1016/j.jmb.2020.11.003)

## Attributions

The source code for this project was originally written by [Spencer Bliven](https://github.com/sbliven) and forked from [sbliven/rcsbsearch](https://github.com/sbliven/rcsbsearch). We would like to express our tremendous gratitude for his generous efforts in designing such a comprehensive public utility Python package for interacting with the RCSB PDB search API.

## Developers

For information about building and developing `rcsbsearchapi`, see
[CONTRIBUTING.md](CONTRIBUTING.md).

            
