# <img src="https://github.com/user-attachments/assets/248d3e32-7644-46b2-bf18-b5248c9e6305" height="160"/> *rcsb-api*: Python Toolkit for Accessing RCSB.org APIs
Python interface for RCSB Protein Data Bank API services at [RCSB.org](https://www.rcsb.org/).
## Installation
This package requires Python 3.8 or later.
Get it from PyPI:

    pip install rcsb-api

Or, download from [GitHub](https://github.com/rcsb/py-rcsb-api/) and install locally:

    git clone https://github.com/rcsb/py-rcsb-api.git
    cd py-rcsb-api
    pip install .

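To confirm that the installation worked, a quick check along the following lines may help. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of `rcsb-api`.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not provided by rcsb-api.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# The package requires Python 3.8 or later.
python_ok = sys.version_info >= (3, 8)
print("Python 3.8+:", python_ok)
print("rcsb-api version:", installed_version("rcsb-api") or "not installed")
```

If the second line reports "not installed", the most likely cause is that `pip` installed into a different environment than the interpreter running the check.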
## Getting Started
Full documentation is available at [readthedocs](https://rcsbapi.readthedocs.io/en/latest/).
The [RCSB PDB Search API](https://search.rcsb.org) supports RESTful requests according to a defined [schema](https://search.rcsb.org/redoc/index.html). This package provides an `rcsbapi.search` module that simplifies generating complex search queries.
The [RCSB PDB Data API](https://data.rcsb.org) supports requests using [GraphQL](https://graphql.org/), a language for API queries. This package provides an `rcsbapi.data` module that simplifies generating queries in GraphQL syntax.
### Search API
The `rcsbapi.search` module supports all available [Advanced Search](https://www.rcsb.org/search/advanced) services, as listed below. For more details on their usage, see [Search Service Types](https://rcsbapi.readthedocs.io/en/latest/search_api/query_construction.html#search-service-types).
|Search service |QueryType |
|----------------------------------|--------------------------|
|Full-text |`TextQuery()` |
|Attribute (structure or chemical) |`AttributeQuery()` |
|Sequence similarity |`SeqSimilarityQuery()` |
|Sequence motif |`SeqMotifQuery()` |
|Structure similarity |`StructSimilarityQuery()` |
|Structure motif |`StructMotifQuery()` |
|Chemical similarity |`ChemSimilarityQuery()` |
#### Search API Examples
To perform a search for all structures from humans associated with the term "Hemoglobin", you can combine a "full-text" query (`TextQuery`) with an "attribute" query (`AttributeQuery`):
```python
from rcsbapi.search import AttributeQuery, TextQuery
from rcsbapi.search import search_attributes as attrs
# Construct a "full-text" sub-query for structures associated with the term "Hemoglobin"
q1 = TextQuery(value="Hemoglobin")
# Construct an "attribute" sub-query to search for structures from humans
q2 = AttributeQuery(
    attribute="rcsb_entity_source_organism.scientific_name",
    operator="exact_match",  # Other operators include "contains_phrase", "exists", and more
    value="Homo sapiens"
)
# OR, build the same sub-query with a Python comparison operator on the attribute object:
q2 = attrs.rcsb_entity_source_organism.scientific_name == "Homo sapiens"
# Combine the sub-queries (can sub-group using parentheses and standard operators, "&", "|", etc.)
query = q1 & q2
# Fetch the results by iterating over the query execution
for rId in query():
    print(rId)
# OR, capture them into a variable
results = list(query())
```
These examples are in `operator` syntax. You can also make queries in `fluent` syntax. Learn more about both syntaxes and implementation details in [Query Syntax and Execution](https://rcsbapi.readthedocs.io/en/latest/search_api/query_construction.html#query-syntax-and-execution).
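For orientation, the combined query above corresponds roughly to a JSON request body like the one built below. This is a hand-written sketch based on a reading of the Search API schema, not the package's actual output: the helper functions `terminal` and `group` are hypothetical, and the service names (`full_text`, `text`) and `return_type` value should be checked against the schema before relying on them. The package may also add fields that are omitted here.

```python
import json


def terminal(service, **parameters):
    """Build a leaf node for one search service (hypothetical helper)."""
    return {"type": "terminal", "service": service, "parameters": parameters}


def group(logical_operator, *nodes):
    """Combine nodes with "and" or "or" (hypothetical helper)."""
    return {"type": "group", "logical_operator": logical_operator, "nodes": list(nodes)}


# Assumed service names: "full_text" for TextQuery, "text" for AttributeQuery.
q1 = terminal("full_text", value="Hemoglobin")
q2 = terminal(
    "text",
    attribute="rcsb_entity_source_organism.scientific_name",
    operator="exact_match",
    value="Homo sapiens",
)

# `q1 & q2` in the package maps onto an "and" group of the two nodes.
payload = {"query": group("and", q1, q2), "return_type": "entry"}
print(json.dumps(payload, indent=2))
```

Writing this structure by hand becomes tedious once groups are nested, which is the bookkeeping the `&` and `|` operators take care of.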
### Data API
The `rcsbapi.data` module allows you to easily construct GraphQL queries to the RCSB.org Data API.
This is done by specifying the following input:
- "input_type": the data hierarchy level you are starting from (e.g., "entry" or "polymer_entity"). See the full list [here](https://rcsbapi.readthedocs.io/en/latest/data_api/query_construction.html#input-type).
- "input_ids": the list of IDs for which to fetch data (corresponding to the specified "input_type")
- "return_data_list": the list of data items ("fields") to retrieve. (Available fields can be explored [here](https://data.rcsb.org/data-attributes.html) or via the [GraphiQL editor's Documentation Explorer panel](https://data.rcsb.org/graphql/index.html).)
#### Data API Examples
This is a [simple query](https://data.rcsb.org/graphql/index.html?query=%7B%0A%20%20entry(entry_id%3A%20%224HHB%22)%20%7B%0A%20%20%20%20exptl%20%7B%0A%20%20%20%20%20%20method%0A%20%20%20%20%7D%0A%20%20%7D%0A%7D) requesting the experimental method of a structure with PDB ID 4HHB (Hemoglobin).
The query must be executed using the `.exec()` method, which will return the JSON response as well as store the response as an attribute of the `DataQuery` object. From the object, you can access the Data API response, get an interactive editor link, or access the arguments used to create the query.
The package can automatically build queries based on the "input_type" and the path segments passed into "return_data_list". If you are using this package in code intended for long-term use, fully qualified paths are recommended. When autocompletion is used, a WARNING message is printed as a reminder.
```python
from rcsbapi.data import DataQuery as Query
query = Query(
    input_type="entries",
    input_ids=["4HHB"],
    return_data_list=["exptl.method"]
)
print(query.exec())
```
Data is returned in JSON format:
```json
{
  "data": {
    "entries": [
      {
        "rcsb_id": "4HHB",
        "exptl": [
          {
            "method": "X-RAY DIFFRACTION"
          }
        ]
      }
    ]
  }
}
```
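Because the response nests lists inside dictionaries, pulling out a single field takes a little traversal. The sketch below walks a dotted path through the response shown above, which is pasted in as a string so the example runs on its own. The `collect` helper is hypothetical and not part of `rcsb-api`.

```python
import json

# The response shown above; in practice this would be the value returned by query.exec().
response = json.loads("""
{"data": {"entries": [{"rcsb_id": "4HHB", "exptl": [{"method": "X-RAY DIFFRACTION"}]}]}}
""")


def collect(node, path):
    """Return every value reached by following `path`, descending into lists on the way.

    Hypothetical helper for illustration; `path` is a list of keys.
    """
    if isinstance(node, list):
        found = []
        for item in node:
            found.extend(collect(item, path))
        return found
    if not path:
        return [node]
    if not isinstance(node, dict) or path[0] not in node:
        return []
    return collect(node[path[0]], path[1:])


methods = collect(response, "data.entries.exptl.method".split("."))
print(methods)
```

The result is always a list, since one entry can report several experimental methods and one query can cover several entries.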
Here is a [more complex query](https://data.rcsb.org/graphql/index.html?query=%7B%0A%20%20polymer_entities(entity_ids%3A%5B%222CPK_1%22%2C%223WHM_1%22%2C%222D5Z_1%22%5D)%20%7B%0A%20%20%20%20rcsb_id%0A%20%20%20%20rcsb_entity_source_organism%20%7B%0A%20%20%20%20%20%20ncbi_taxonomy_id%0A%20%20%20%20%20%20ncbi_scientific_name%0A%20%20%20%20%7D%0A%20%20%20%20rcsb_cluster_membership%20%7B%0A%20%20%20%20%20%20cluster_id%0A%20%20%20%20%20%20identity%0A%20%20%20%20%7D%0A%20%20%7D%0A%7D). Note that periods separate the levels of a path in `return_data_list`, so a requested field can be qualified by its parent fields. Also note that multiple return data items and multiple IDs can be requested in one query.
```python
from rcsbapi.data import DataQuery as Query
query = Query(
    input_type="polymer_entities",
    input_ids=["2CPK_1", "3WHM_1", "2D5Z_1"],
    return_data_list=[
        "polymer_entities.rcsb_id",
        "rcsb_entity_source_organism.ncbi_taxonomy_id",
        "rcsb_entity_source_organism.ncbi_scientific_name",
        "cluster_id",
        "identity"
    ]
)
print(query.exec())
```
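To see how dotted paths relate to GraphQL nesting, the sketch below assembles the text of the linked query by hand from fully qualified paths. It is an illustration only and not how the package builds queries internally: `build_graphql` is a hypothetical helper, and the field and argument names are taken from the linked GraphiQL query.

```python
import json


def build_graphql(input_type, id_argument, input_ids, return_data_list):
    """Render dotted field paths as a nested GraphQL query (hypothetical helper)."""
    # Merge the paths into a tree, e.g. {"exptl": {"method": {}}}.
    tree = {}
    for path in return_data_list:
        node = tree
        for segment in path.split("."):
            node = node.setdefault(segment, {})

    def render(node, depth):
        lines = []
        indent = "  " * depth
        for name, children in node.items():
            if children:
                lines.append(f"{indent}{name} {{")
                lines.extend(render(children, depth + 1))
                lines.append(f"{indent}}}")
            else:
                lines.append(f"{indent}{name}")
        return lines

    ids = json.dumps(input_ids)  # a list of strings is written the same way in GraphQL
    body = "\n".join(render(tree, 2))
    return f"{{\n  {input_type}({id_argument}: {ids}) {{\n{body}\n  }}\n}}"


query_text = build_graphql(
    "polymer_entities",
    "entity_ids",
    ["2CPK_1", "3WHM_1", "2D5Z_1"],
    [
        "rcsb_id",
        "rcsb_entity_source_organism.ncbi_taxonomy_id",
        "rcsb_entity_source_organism.ncbi_scientific_name",
        "rcsb_cluster_membership.cluster_id",
        "rcsb_cluster_membership.identity",
    ],
)
print(query_text)
```

Here `cluster_id` and `identity` are spelled out under `rcsb_cluster_membership`, which is the fully qualified form of the short paths used in the example above.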
## Jupyter Notebooks
Several Jupyter notebooks with example use cases and workflows for all package modules are provided under [notebooks](notebooks/).
For example, one notebook using both Search and Data API packages for a COVID-19 related example is available in [notebooks/search_data_workflow.ipynb](notebooks/search_data_workflow.ipynb) or online through Google Colab <a href="https://colab.research.google.com/github/rcsb/py-rcsb-api/blob/master/notebooks/search_data_workflow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>.
## Citing
Please cite the ``rcsb-api`` package with the following reference:
> Dennis W. Piehl, Brinda Vallat, Ivana Truong, Habiba Morsy, Rusham Bhatt,
> Santiago Blaumann, Pratyoy Biswas, Yana Rose, Sebastian Bittrich, Jose M. Duarte,
> Joan Segura, Chunxiao Bi, Douglas Myers-Turnbull, Brian P. Hudson, Christine Zardecki,
> Stephen K. Burley. rcsb-api: Python Toolkit for Streamlining Access to RCSB Protein
> Data Bank APIs, Journal of Molecular Biology, 2025.
> DOI: [10.1016/j.jmb.2025.168970](https://doi.org/10.1016/j.jmb.2025.168970)
You should also cite the RCSB.org API services this package utilizes:
> Yana Rose, Jose M. Duarte, Robert Lowe, Joan Segura, Chunxiao Bi, Charmi
> Bhikadiya, Li Chen, Alexander S. Rose, Sebastian Bittrich, Stephen K. Burley,
> John D. Westbrook. RCSB Protein Data Bank: Architectural Advances Towards
> Integrated Searching and Efficient Access to Macromolecular Structure Data
> from the PDB Archive, Journal of Molecular Biology, 2020.
> DOI: [10.1016/j.jmb.2020.11.003](https://doi.org/10.1016/j.jmb.2020.11.003)
## Documentation and Support
Please refer to the [readthedocs page](https://rcsbapi.readthedocs.io/en/latest/index.html) to learn more about package usage and other available features as well as to see more examples.
If you experience any issues installing or using the package, please submit an issue on [GitHub](https://github.com/rcsb/py-rcsb-api/issues) and we will try to respond in a timely manner.
Raw data
{
"_id": null,
"home_page": "https://github.com/rcsb/py-rcsb-api",
"name": "rcsb-api",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.8",
"maintainer_email": null,
"keywords": null,
"author": "Dennis Piehl",
"author_email": "dennis.piehl@rcsb.org",
"download_url": "https://files.pythonhosted.org/packages/1a/5a/f515a135ae2d3f626e0dc5f0afb3d4987091765304b2fe4b529d848a3ef5/rcsb_api-1.3.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Python package interface for RCSB.org API services",
"version": "1.3.0",
"project_urls": {
"Homepage": "https://github.com/rcsb/py-rcsb-api"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "1a5af515a135ae2d3f626e0dc5f0afb3d4987091765304b2fe4b529d848a3ef5",
"md5": "f8d6b6ecfae32f014aface43a91eb237",
"sha256": "827f9faff30e1fa9565d5ae7fca1faa6ce5b7f32cff65f30b597711a03f2ec9d"
},
"downloads": -1,
"filename": "rcsb_api-1.3.0.tar.gz",
"has_sig": false,
"md5_digest": "f8d6b6ecfae32f014aface43a91eb237",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.8",
"size": 83267,
"upload_time": "2025-07-23T20:25:53",
"upload_time_iso_8601": "2025-07-23T20:25:53.095940Z",
"url": "https://files.pythonhosted.org/packages/1a/5a/f515a135ae2d3f626e0dc5f0afb3d4987091765304b2fe4b529d848a3ef5/rcsb_api-1.3.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-23 20:25:53",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "rcsb",
"github_project": "py-rcsb-api",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "requests",
"specs": [
[
">=",
"2.0.0"
]
]
},
{
"name": "rustworkx",
"specs": []
},
{
"name": "graphql-core",
"specs": []
},
{
"name": "tqdm",
"specs": []
}
],
"tox": true,
"lcname": "rcsb-api"
}