rcsb.exdb


| Field | Value |
| --- | --- |
| Name | rcsb.exdb |
| Version | 1.27 |
| Home page | https://github.com/rcsb/py-rcsb_exdb |
| Summary | RCSB Python ExDB data extraction and loading workflows |
| Upload time | 2025-01-23 20:10:39 |
| Maintainer | None |
| Docs URL | None |
| Author | John Westbrook |
| Requires Python | None |
| License | Apache 2.0 |
| Requirements | OpenEye-toolkits, numpy, jsonschema, rcsb.utils.io, rcsb.db, rcsb.utils.chem, rcsb.utils.chemref, rcsb.utils.citation, rcsb.utils.config, rcsb.utils.ec, rcsb.utils.go, rcsb.utils.seq, rcsb.utils.seqalign, rcsb.utils.targets, rcsb.utils.struct, rcsb.utils.taxonomy, rcsb.utils.dictionary, rcsb.workflow, statistics |
| Travis-CI | No |
| Coveralls test coverage | No |

# py-rcsb_exdb

[![Build Status](https://dev.azure.com/rcsb/RCSB%20PDB%20Python%20Projects/_apis/build/status/rcsb.py-rcsb_exdb?branchName=master)](https://dev.azure.com/rcsb/RCSB%20PDB%20Python%20Projects/_build/latest?definitionId=18&branchName=master)

RCSB exchange database extraction and loading workflow tools

## Introduction

This module contains a collection of utility classes for extracting data from
the RCSB exchange database and subsequently reloading processed or integrated data.

### Installation

Download the library source software from the project repository:

```bash

git clone --recurse-submodules https://github.com/rcsb/py-rcsb_exdb.git

```

Optionally, run the test suite (Python versions 2.7 and 3.7) using
[setuptools](https://setuptools.readthedocs.io/en/latest/) or
[tox](http://tox.readthedocs.io/en/latest/example/platform.html):

```bash
python setup.py test

# or simply run

tox
```

Installation is via the program [pip](https://pypi.python.org/pypi/pip).  To run tests
from the source tree, the package must be installed in editable mode (i.e. -e):

```bash
pip install -r requirements.txt   # OR:   pip install -i https://pypi.anaconda.org/OpenEye/simple OpenEye-toolkits

pip install -e .
```
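After installing, it can be useful to confirm that the distribution is visible to the interpreter. The following is a minimal sketch, assuming Python 3.8 or later (for `importlib.metadata`) and assuming the distribution is registered under the name `rcsb.exdb`, as listed in the package metadata. The helper name `installed_version` is hypothetical and not part of this package.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "rcsb.exdb" is the distribution name assumed from the package metadata.
    version = installed_version("rcsb.exdb")
    print("rcsb.exdb:", version if version else "not installed")
```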

#### Installing in Ubuntu Linux (tested in 18.04)

You will need a few packages before `pip install .` can work:

```bash

sudo apt install flex bison

```
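Before running `pip install .`, one way to check that the build tools named above are on the `PATH` is a short script like the sketch below. The helper name `missing_tools` is hypothetical; it only looks up executables with `shutil.which` and does not verify their versions.

```python
import shutil


def missing_tools(names):
    """Return the subset of tool names that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # flex and bison are the packages installed by the apt command above.
    missing = missing_tools(["flex", "bison"])
    if missing:
        print("Missing build tools:", ", ".join(missing))
    else:
        print("All build tools found.")
```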

#### Installing on macOS

Using and developing this package on macOS requires a number of packages that are not
distributed as part of the base macOS operating system.
The following steps provide one approach to creating the development environment for this
package.  First, install the Apple [XCode](https://developer.apple.com/xcode/) package and associated command-line tools.
This will provide essential compilers and supporting tools.  The [HomeBrew](https://brew.sh/) package
manager provides further access to a variety of common open source services and tools.
Follow the instructions provided at the [HomeBrew](https://brew.sh/) site to
install this system.   Once HomeBrew is installed, you can further install the
[MongoDB](https://docs.mongodb.com/manual/tutorial/install-mongodb-on-os-x/) packages which
are required to support the ExDB tools.  HomeBrew also provides a variety of options for
managing [Python virtual environments](https://gist.github.com/Geoyi/f55ed54d24cc9ff1c14bd95fac21c042).
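
Once MongoDB is installed and started, a quick reachability check can save debugging time later. The sketch below assumes the server runs locally on MongoDB's default port 27017; adjust the host and port if your configuration differs. The helper name `is_port_open` is hypothetical, and the check only confirms that something accepts TCP connections on that port, not that it is a working MongoDB instance.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 27017 is the MongoDB default port; a local server is an assumption here.
    reachable = is_port_open("127.0.0.1", 27017)
    print("MongoDB port reachable:", reachable)
```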

### Command Line Interfaces

A convenience CLI `exdb_exec_cli` is provided for performing update and loading operations.

```bash
exdb_exec_cli --help

usage: exdb_exec_cli [-h] [--data_set_id DATA_SET_ID] [--full] [--etl_chemref]
                     [--etl_tree_node_lists] [--config_path CONFIG_PATH]
                     [--config_name CONFIG_NAME] [--db_type DB_TYPE]
                     [--read_back_check] [--num_proc NUM_PROC]
                     [--chunk_size CHUNK_SIZE]
                     [--document_limit DOCUMENT_LIMIT] [--debug] [--mock]
                     [--cache_path CACHE_PATH] [--rebuild_cache]

optional arguments:
  -h, --help            show this help message and exit
  --data_set_id DATA_SET_ID
                        Data set identifier (default= 2019_14 for current
                        week)
  --full                Fresh full load in a new tables/collections (Default)
  --etl_chemref         ETL integrated chemical reference data
  --etl_tree_node_lists
                        ETL tree node lists
  --config_path CONFIG_PATH
                        Path to configuration options file
  --config_name CONFIG_NAME
                        Configuration section name
  --db_type DB_TYPE     Database server type (default=mongo)
  --read_back_check     Perform read back check on all documents
  --num_proc NUM_PROC   Number of processes to execute (default=2)
  --chunk_size CHUNK_SIZE
                        Number of files loaded per process
  --document_limit DOCUMENT_LIMIT
                        Load document limit for testing
  --debug               Turn on verbose logging
  --mock                Use MOCK repository configuration for testing
  --cache_path CACHE_PATH
                        Top cache path for external and local resource files
  --rebuild_cache       Rebuild cached files from remote resources
________________________________________________________________________________

```
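
To make the option list easier to reason about, the sketch below declares an `argparse` parser that mirrors the options in the help text above. It is an illustration only, not the package's actual parser: the types, the `store_true` actions, and the use of `None` for unstated defaults are assumptions, and only the defaults printed in the help output (`mongo`, `2`) are taken from the source.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options listed in the exdb_exec_cli help text."""
    parser = argparse.ArgumentParser(prog="exdb_exec_cli")
    parser.add_argument("--data_set_id", default=None, help="Data set identifier")
    parser.add_argument("--full", action="store_true", help="Fresh full load")
    parser.add_argument("--etl_chemref", action="store_true",
                        help="ETL integrated chemical reference data")
    parser.add_argument("--etl_tree_node_lists", action="store_true",
                        help="ETL tree node lists")
    parser.add_argument("--config_path", default=None,
                        help="Path to configuration options file")
    parser.add_argument("--config_name", default=None,
                        help="Configuration section name")
    parser.add_argument("--db_type", default="mongo", help="Database server type")
    parser.add_argument("--read_back_check", action="store_true",
                        help="Perform read back check on all documents")
    parser.add_argument("--num_proc", type=int, default=2,
                        help="Number of processes to execute")
    parser.add_argument("--chunk_size", type=int, default=None,
                        help="Number of files loaded per process")
    parser.add_argument("--document_limit", type=int, default=None,
                        help="Load document limit for testing")
    parser.add_argument("--debug", action="store_true", help="Turn on verbose logging")
    parser.add_argument("--mock", action="store_true",
                        help="Use MOCK repository configuration for testing")
    parser.add_argument("--cache_path", default=None,
                        help="Top cache path for external and local resource files")
    parser.add_argument("--rebuild_cache", action="store_true",
                        help="Rebuild cached files from remote resources")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--mock", "--etl_tree_node_lists"])
    print(args.db_type, args.num_proc, args.mock)
```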

For example, to construct and load the tree node list data collections, the following
command may be used:

```bash
exdb_exec_cli --mock --full --etl_tree_node_lists --rebuild_cache \
              --cache_path ./CACHE  \
              --config_path ./rcsb/mock-data/config/dbload-setup-example.yml \
              --config_name site_info_configuration >& LOGTREE
```
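
The same invocation can be driven from Python, which is convenient in scheduled jobs. The sketch below assumes `exdb_exec_cli` is on the `PATH` after installation; the helper names `build_tree_node_command` and `run_and_log` are hypothetical. It uses only the flags shown in the example above and writes both standard output and standard error to one log file, which is what the `>& LOGTREE` redirection does in the shell.

```python
import subprocess


def build_tree_node_command(cache_path, config_path, config_name):
    """Assemble the argument list for the tree node list ETL example."""
    return [
        "exdb_exec_cli", "--mock", "--full",
        "--etl_tree_node_lists", "--rebuild_cache",
        "--cache_path", cache_path,
        "--config_path", config_path,
        "--config_name", config_name,
    ]


def run_and_log(command, log_path):
    """Run a command, sending stdout and stderr to log_path; return the exit code."""
    with open(log_path, "w") as log_file:
        completed = subprocess.run(
            command, stdout=log_file, stderr=subprocess.STDOUT, check=False
        )
    return completed.returncode


if __name__ == "__main__":
    cmd = build_tree_node_command(
        "./CACHE",
        "./rcsb/mock-data/config/dbload-setup-example.yml",
        "site_info_configuration",
    )
    print("exit code:", run_and_log(cmd, "LOGTREE"))
```

Passing the arguments as a list avoids shell quoting issues, and `check=False` leaves the decision about how to handle a non-zero exit code to the caller.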

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/rcsb/py-rcsb_exdb",
    "name": "rcsb.exdb",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": null,
    "author": "John Westbrook",
    "author_email": "john.westbrook@rcsb.org",
    "download_url": "https://files.pythonhosted.org/packages/8f/27/a925e29ea732256676e7b0a3f488de552471a7feff8fd5827dc00ef5f5be/rcsb_exdb-1.27.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache 2.0",
    "summary": "RCSB Python ExDB data extraction and loading workflows",
    "version": "1.27",
    "project_urls": {
        "Homepage": "https://github.com/rcsb/py-rcsb_exdb"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "8f27a925e29ea732256676e7b0a3f488de552471a7feff8fd5827dc00ef5f5be",
                "md5": "9871f39b3b523b676aa7195826aa504d",
                "sha256": "ed407b32e48e809aad0d099d637ef26b727c808a414ca978c6d6bbb9370027b5"
            },
            "downloads": -1,
            "filename": "rcsb_exdb-1.27.tar.gz",
            "has_sig": false,
            "md5_digest": "9871f39b3b523b676aa7195826aa504d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 92837,
            "upload_time": "2025-01-23T20:10:39",
            "upload_time_iso_8601": "2025-01-23T20:10:39.587245Z",
            "url": "https://files.pythonhosted.org/packages/8f/27/a925e29ea732256676e7b0a3f488de552471a7feff8fd5827dc00ef5f5be/rcsb_exdb-1.27.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-23 20:10:39",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "rcsb",
    "github_project": "py-rcsb_exdb",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "OpenEye-toolkits",
            "specs": [
                [
                    ">=",
                    "2024.1.1"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": []
        },
        {
            "name": "jsonschema",
            "specs": [
                [
                    ">=",
                    "2.6.0"
                ]
            ]
        },
        {
            "name": "rcsb.utils.io",
            "specs": [
                [
                    ">=",
                    "1.48"
                ]
            ]
        },
        {
            "name": "rcsb.db",
            "specs": [
                [
                    ">=",
                    "1.725"
                ]
            ]
        },
        {
            "name": "rcsb.utils.chem",
            "specs": [
                [
                    ">=",
                    "0.79"
                ]
            ]
        },
        {
            "name": "rcsb.utils.chemref",
            "specs": [
                [
                    ">=",
                    "0.91"
                ]
            ]
        },
        {
            "name": "rcsb.utils.citation",
            "specs": [
                [
                    ">=",
                    "0.22"
                ]
            ]
        },
        {
            "name": "rcsb.utils.config",
            "specs": [
                [
                    ">=",
                    "0.40"
                ]
            ]
        },
        {
            "name": "rcsb.utils.ec",
            "specs": [
                [
                    ">=",
                    "0.25"
                ]
            ]
        },
        {
            "name": "rcsb.utils.go",
            "specs": [
                [
                    ">=",
                    "0.18"
                ]
            ]
        },
        {
            "name": "rcsb.utils.seq",
            "specs": [
                [
                    ">=",
                    "0.82"
                ]
            ]
        },
        {
            "name": "rcsb.utils.seqalign",
            "specs": [
                [
                    ">=",
                    "0.31"
                ]
            ]
        },
        {
            "name": "rcsb.utils.targets",
            "specs": [
                [
                    ">=",
                    "0.82"
                ]
            ]
        },
        {
            "name": "rcsb.utils.struct",
            "specs": [
                [
                    ">=",
                    "0.47"
                ]
            ]
        },
        {
            "name": "rcsb.utils.taxonomy",
            "specs": [
                [
                    ">=",
                    "0.43"
                ]
            ]
        },
        {
            "name": "rcsb.utils.dictionary",
            "specs": [
                [
                    ">=",
                    "1.27"
                ]
            ]
        },
        {
            "name": "rcsb.workflow",
            "specs": [
                [
                    ">=",
                    "0.46"
                ]
            ]
        },
        {
            "name": "statistics",
            "specs": []
        }
    ],
    "tox": true,
    "lcname": "rcsb.exdb"
}
        