scregistry


Name: scregistry
Version: 0.3.1
Summary: API for accessing the Shared Cloud Registry (scregistry) specification for sharing data in and across clouds
upload_time: 2023-06-28 17:27:19
docs_url: None
requires_python: >=3.8
license: The MIT License (MIT) Copyright © 2023 The Johns Hopkins University Applied Physics Laboratory LLC Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
keywords: cloud, registry, aws
requirements: No requirements were recorded.
# Shared Cloud Registry (scregistry) Tool
This tool retrieves the file registry for a specific ID entry in a catalog stored in a bucket. It can also search across all data registries listed in the bucket registry list.

## Use Case
Suppose there is a mission on S3 that follows the HelioCloud 'Shared Cloud Registry' Specification, and you want to obtain specific files from this mission.

### Initial Setup and Global Catalog
First, install the tool if it is not already installed (for example, with `pip install scregistry`). Then import it in a script or interactive shell. You will likely want to search the global catalog first to find the specific bucket and catalog that contain the file registry you need.

```python
import scregistry

# Create a CatalogRegistry object. By default it pulls from the HelioCloud global catalog;
# if an environment variable has been set for another global catalog, it pulls from there instead.
cr = scregistry.CatalogRegistry()

# Print out the entire global catalog
print(cr.get_catalog())

# Print the name + region of all global catalog entries.
# If we know roughly what the overarching bucket is called,
# this helps us find the exact name we need for the mission we want.
# Otherwise, other methods must be used to find the bucket of interest.
print(cr.get_entries())
```

### Finding and Requesting the File Registry
At this point, you should have found the bucket containing the data registry of interest. Next, you will want to search the bucket-specific catalog (data registry) for the ID representing the mission you want to obtain data for.

```python
# With the bucket name we have obtained (possibly by using cr.get_endpoint(name, region_prefix=''))
bucket_name = 'a-bucket-name'
# If this is not a public bucket, you may need to pass access_key or other boto S3 client parameters to get the data.
# cache_folder is only used if cache is True and defaults to `bucket_name + '_cache'`.
fr = scregistry.FileRegistry(bucket_name, cache_folder=None, cache=True)

# Print out the entire local catalog (data registry)
print(fr.get_catalog())

# To find the specific ID, we can also list the ID + title of each entry
print(fr.get_entries())

# With the ID, we can now request the file registry.
# If successful, this returns a pandas DataFrame of the file registry;
# if cache was set to True at initialization, it also saves
# the downloaded file registry.
fr_id = 'an_id_from_data_registry_catalog'
start_date = '2007-02-01T00:00:00Z'  # An ISO 8601 time that falls within the mission/file registry
end_date = None  # An ISO 8601 time, or None to get all file registry data after start_date
myfiles = fr.request_file_registry(fr_id, start_date=start_date, end_date=end_date, overwrite=False)
```
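
Because the request takes ISO 8601 strings, it can help to check them before calling the registry. The following is a minimal sketch using only the standard library; `parse_iso8601_utc` and `check_window` are hypothetical helper names and are not part of scregistry.

```python
from datetime import datetime, timezone


def parse_iso8601_utc(stamp):
    """Parse an ISO 8601 string such as '2007-02-01T00:00:00Z' into an aware UTC datetime."""
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 on,
    # so rewrite it as an explicit offset to stay compatible with Python 3.8+.
    if stamp.endswith("Z"):
        stamp = stamp[:-1] + "+00:00"
    parsed = datetime.fromisoformat(stamp)
    if parsed.tzinfo is None:
        # Assumption: a timestamp without an offset is meant as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def check_window(start_date, end_date=None):
    """Raise ValueError if either string is malformed or the window is reversed."""
    start = parse_iso8601_utc(start_date)
    if end_date is not None and parse_iso8601_utc(end_date) < start:
        raise ValueError("end_date is earlier than start_date")
    return start_date, end_date


# The strings are returned unchanged, so they can be passed straight on.
start_date, end_date = check_window("2007-02-01T00:00:00Z", None)
```

A malformed string raises `ValueError` locally, before any request is made to the bucket.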

### Streaming Data from the File Registry
You now have a pandas DataFrame with startdate, key, and filesize for all the files of the mission within your specified start and end dates. From here, you can use the key to stream some of the data through EC2, a Lambda, or other processing methods.
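
Before streaming, it is often worth checking how much data the selection covers. The sketch below assumes only the three column names mentioned above (`startdate`, `key`, `filesize`) and works on plain row dictionaries, so it runs without pandas; `summarize_registry` is a hypothetical helper and the sample rows are made up for illustration.

```python
def summarize_registry(rows):
    """Return (file_count, total_bytes, first_start, last_start) for registry rows."""
    count = 0
    total_bytes = 0
    starts = []
    for row in rows:
        count += 1
        total_bytes += int(row["filesize"])
        starts.append(row["startdate"])
    if not starts:
        return 0, 0, None, None
    return count, total_bytes, min(starts), max(starts)


# Made-up rows in the assumed layout; the keys are placeholders, not real objects.
sample_rows = [
    {"startdate": "2007-02-01T00:00:00Z", "key": "s3://example-bucket/a.cdf", "filesize": 100},
    {"startdate": "2007-02-02T00:00:00Z", "key": "s3://example-bucket/b.cdf", "filesize": 250},
]
print(summarize_registry(sample_rows))

# With a real DataFrame, the rows could be obtained with:
# summarize_registry(myfiles.to_dict("records"))
```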

This tool also offers a simple function for streaming the data once the file registry is obtained:

```python
# `myfiles` is the DataFrame returned by request_file_registry() above
scregistry.FileRegistry.stream(myfiles, lambda bfile, startdate, filesize: print(len(bfile.read()), filesize))
```
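
For anything beyond a one-line check, a named callback may be easier to read than a lambda. This sketch keeps the `(bfile, startdate, filesize)` signature from the example above and assumes `bfile` is a binary file-like object with a `read()` method; `make_size_checker` is a hypothetical helper, and an in-memory `io.BytesIO` stands in for a streamed file.

```python
import io


def make_size_checker(results):
    """Build a callback that records whether each file matches its listed size."""
    def check_size(bfile, startdate, filesize):
        nbytes = len(bfile.read())
        results.append((startdate, nbytes, nbytes == filesize))
    return check_size


results = []
callback = make_size_checker(results)

# Stand-in for one streamed file: 16 bytes of zeros with a listed size of 16.
callback(io.BytesIO(b"\x00" * 16), "2007-02-01T00:00:00Z", 16)
print(results)

# With a real file registry, the callback would replace the lambda:
# scregistry.FileRegistry.stream(myfiles, callback)
```

Collecting results in a list instead of printing makes it possible to inspect size mismatches after the stream finishes.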

### Searching the Entire Catalog
As an alternative to manually searching, you can use the EntireCatalogSearch class to find a catalog entry:

```python
search = scregistry.EntireCatalogSearch()
top_search_result = search.search_by_keywords(['vector', 'mission', 'useful'])[0]
# Prints out the top result with all the catalog info, including id, loc, startdate, etc.
print(top_search_result)
```
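
To make the idea of a keyword search concrete, here is a minimal sketch of ranking catalog entries by how many keywords they mention. It is an illustration only and is not taken from the scregistry source; `rank_by_keywords`, the entry fields, and the sample entries are all assumptions.

```python
def rank_by_keywords(entries, keywords):
    """Return entries that mention at least one keyword, best match first."""
    lowered = [keyword.lower() for keyword in keywords]
    scored = []
    for entry in entries:
        # Search across every field value of the entry, case-insensitively.
        text = " ".join(str(value) for value in entry.values()).lower()
        score = sum(1 for keyword in lowered if keyword in text)
        if score:
            scored.append((score, entry))
    # sorted() is stable, so ties keep their original catalog order.
    scored = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored]


# Made-up entries for illustration.
sample_entries = [
    {"id": "a", "title": "Vector magnetic field"},
    {"id": "b", "title": "Useful vector mission data"},
    {"id": "c", "title": "Unrelated"},
]
matches = rank_by_keywords(sample_entries, ["vector", "mission", "useful"])
print(matches[0] if matches else "no match")
```

Checking that the result list is non-empty before taking element `[0]` avoids an `IndexError` when nothing matches.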

### Terse example for an SDO fetch of the filelist for all the 94A EUV images (1,624,900 files)
```python
import scregistry

dataid = "aia_0094"
s3bucket = "s3://gov-nasa-hdrl-data1/"
fr = scregistry.FileRegistry(s3bucket)
mySDOlist = fr.request_file_registry(dataid,
                                     start_date=fr.get_entry(dataid)['start'],
                                     stop_date=fr.get_entry(dataid)['stop'])
```

### Terse example for an MMS fetch of the filelist for all of a specific MMS item (64,383 files)
```python
import scregistry

dataid = "mms1_feeps_brst_electron"
s3bucket = "s3://helio-public/"
fr = scregistry.FileRegistry(s3bucket)
myMMSlist = fr.request_file_registry(dataid,
                                     start_date=fr.get_entry(dataid)['start'],
                                     stop_date=fr.get_entry(dataid)['stop'])
```

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "scregistry",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "cloud,registry,AWS",
    "author": "",
    "author_email": "Johns Hopkins University Applied Physics Laboratory LLC <sandy.antunes@jhuapl.edu>",
    "download_url": "https://files.pythonhosted.org/packages/46/56/a0b07eb9e66619e9714aa5cbbd7b116efe75a4420c311e5b558304e2ee86/scregistry-0.3.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "The MIT License (MIT) Copyright \u00a9 2023 The Johns Hopkins University Applied Physics Laboratory LLC  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \u201cSoftware\u201d), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \u201cAS IS\u201d, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
    "summary": "API for accessing the Shared Cloud Registry (scregistry) specification for sharing data in and across clouds",
    "version": "0.3.1",
    "project_urls": {
        "Documentation": "https://heliocloud.org",
        "Homepage": "https://heliocloud.org",
        "Repository": "https://gitlab.smce.nasa.gov"
    },
    "split_keywords": [
        "cloud",
        "registry",
        "aws"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5eebb7a7cb642cd0ca65b2ef5274e293787123b43f00bd1938df56336f49726b",
                "md5": "aca456a7cc9ccbcfb316fb141ef084cc",
                "sha256": "dc69fecf1cf49c7a4514f4c355a4f47caf71b73c437e7b3be5043c47b6c45042"
            },
            "downloads": -1,
            "filename": "scregistry-0.3.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "aca456a7cc9ccbcfb316fb141ef084cc",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 11365,
            "upload_time": "2023-06-28T17:27:17",
            "upload_time_iso_8601": "2023-06-28T17:27:17.641228Z",
            "url": "https://files.pythonhosted.org/packages/5e/eb/b7a7cb642cd0ca65b2ef5274e293787123b43f00bd1938df56336f49726b/scregistry-0.3.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4656a0b07eb9e66619e9714aa5cbbd7b116efe75a4420c311e5b558304e2ee86",
                "md5": "a6a185dbe08b9a4799421b72b03b7a76",
                "sha256": "fec0338b8d7028328e4a4bc96eb1dbee2521f632d2c1e7b92499d1a3762aaf05"
            },
            "downloads": -1,
            "filename": "scregistry-0.3.1.tar.gz",
            "has_sig": false,
            "md5_digest": "a6a185dbe08b9a4799421b72b03b7a76",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 187950,
            "upload_time": "2023-06-28T17:27:19",
            "upload_time_iso_8601": "2023-06-28T17:27:19.489473Z",
            "url": "https://files.pythonhosted.org/packages/46/56/a0b07eb9e66619e9714aa5cbbd7b116efe75a4420c311e5b558304e2ee86/scregistry-0.3.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-06-28 17:27:19",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "scregistry"
}
        