jupyter-s3contents

Name: jupyter-s3contents
Version: 0.10.2
Summary: S3 Contents Manager for Jupyter
Author: Xin Fu <xin@imfing.com>
Requires Python: >=3.7
Upload time: 2023-05-04 12:44:09
Keywords: aws, gcs, google cloud storage, jupyter, jupyterlab, minio, notebooks, s3
Source: https://github.com/imfing/jupyter-s3contents
<p align="center">
    <img src="https://raw.githubusercontent.com/danielfrg/s3contents/main/docs/logo.png" width="450px">
</p>

<p align="center">
    <a href="https://pypi.org/project/s3contents/">
        <img src="https://img.shields.io/pypi/v/s3contents.svg">
    </a>
    <a href="https://github.com/danielfrg/s3contents/actions/workflows/test.yml">
        <img src="https://github.com/danielfrg/s3contents/workflows/test/badge.svg">
    </a>
    <a href="https://codecov.io/gh/danielfrg/s3contents?branch=main">
        <img src="https://codecov.io/gh/danielfrg/s3contents/branch/main/graph/badge.svg">
    </a>
    <a href="http://github.com/danielfrg/s3contents/blob/main/LICENSE.txt">
        <img src="https://img.shields.io/:license-Apache%202-blue.svg">
    </a>
</p>

# S3Contents - Jupyter Notebooks in S3

A transparent, drop-in replacement for Jupyter's standard filesystem-backed storage system.
With this implementation of a
[Jupyter Contents Manager](https://jupyter-server.readthedocs.io/en/latest/developers/contents.html)
you can save all your notebooks, files, and directory structure directly to an
S3/GCS bucket on AWS/GCP, or to a self-hosted S3 API-compatible service such as [MinIO](http://minio.io).

## Installation

```shell
pip install s3contents
```

Install with GCS dependencies:

```shell
pip install s3contents[gcs]
```
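
Note that on shells such as zsh, the square brackets are interpreted by the shell, so you may need to quote the requirement:

```shell
pip install "s3contents[gcs]"
```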

## s3contents vs X

While there are other implementations of an S3-backed Jupyter Contents Manager, such as
[s3nb](https://github.com/monetate/s3nb) or [s3drive](https://github.com/stitchfix/s3drive),
s3contents is the only one tested against recent versions of Jupyter.
It also supports more authentication methods and Google Cloud Storage.

It aims to be a fully tested implementation and is based on [PGContents](https://github.com/quantopian/pgcontents).

## Configuration

Create a `jupyter_notebook_config.py` file in one of the
[Jupyter config directories](https://jupyter.readthedocs.io/en/latest/use/jupyter-directories.html#id1),
for example `~/.jupyter/jupyter_notebook_config.py`.
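
You can list the directories Jupyter searches for configuration with:

```shell
jupyter --paths
```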

**Jupyter Notebook Classic**: If you plan to use the classic Jupyter Notebook
interface, change `ServerApp` to `NotebookApp` in all the examples on this page, as sketched below.
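
A minimal sketch of the classic-notebook variant of the base configuration:

```python
from s3contents import S3ContentsManager

c = get_config()

# Classic Notebook reads NotebookApp settings instead of ServerApp
c.NotebookApp.contents_manager_class = S3ContentsManager
c.S3ContentsManager.bucket = "<S3 bucket name>"
```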

## AWS S3

```python
from s3contents import S3ContentsManager

c = get_config()

# Tell Jupyter to use S3ContentsManager
c.ServerApp.contents_manager_class = S3ContentsManager
c.S3ContentsManager.bucket = "<S3 bucket name>"

# Fix JupyterLab dialog issues
c.ServerApp.root_dir = ""
```

### Authentication

Additionally, you can configure one of several authentication methods:

Access and secret keys:

```python
c.S3ContentsManager.access_key_id = "<AWS Access Key ID / IAM Access Key ID>"
c.S3ContentsManager.secret_access_key = "<AWS Secret Access Key / IAM Secret Access Key>"
```

Session token:

```python
c.S3ContentsManager.session_token = "<AWS Session Token / IAM Session Token>"
```

### AWS EC2 role auth setup

It is also possible to use IAM role-based access to the S3 bucket from an Amazon EC2 instance or another AWS resource.

To do that, leave the authentication options (`access_key_id`, `secret_access_key`) at their default of `None`,
and ensure that the EC2 instance has an IAM role that grants sufficient permissions (read and write) on the bucket.
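
A minimal sketch under that assumption (no explicit credentials, so boto3 falls back to the instance's role):

```python
from s3contents import S3ContentsManager

c = get_config()

c.ServerApp.contents_manager_class = S3ContentsManager
c.S3ContentsManager.bucket = "<S3 bucket name>"
# access_key_id and secret_access_key are left at their default of None,
# so credentials are resolved from the EC2 instance's IAM role.
```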

### Optional settings

```python
# A prefix in the S3 buckets to use as the root of the Jupyter file system
c.S3ContentsManager.prefix = "this/is/a/prefix/on/the/s3/bucket"

# Server-Side Encryption
c.S3ContentsManager.sse = "AES256"

# Authentication signature version
c.S3ContentsManager.signature_version = "s3v4"

# Hook for AWS key rotation; see the "AWS key refresh" section below
c.S3ContentsManager.init_s3_hook = init_function
```

### AWS key refresh

The optional `init_s3_hook` configuration can be used to enable AWS key rotation (described [here](https://dev.to/li_chastina/auto-refresh-aws-tokens-using-iam-role-and-boto3-2cjf) and [here](https://www.owenrumney.co.uk/2019/01/15/implementing-refreshingawscredentials-python/)) as follows:

```python
import boto3
from botocore.credentials import RefreshableCredentials
from botocore.session import get_session
from configparser import ConfigParser

from s3contents import S3ContentsManager

def refresh_external_credentials():
    # Read the freshly rotated keys from the shared credentials file
    config = ConfigParser()
    config.read("/home/jovyan/.aws/credentials")
    return {
        "access_key": config["default"]["aws_access_key_id"],
        "secret_key": config["default"]["aws_secret_access_key"],
        "token": config["default"]["aws_session_token"],
        "expiry_time": config["default"]["aws_expiration"],
    }

session_credentials = RefreshableCredentials.create_from_metadata(
    metadata=refresh_external_credentials(),
    refresh_using=refresh_external_credentials,
    method="custom-refreshing-key-file-reader",
)

def make_key_refresh_boto3(this_s3contents_instance):
    refresh_session = get_session()  # from botocore.session
    refresh_session._credentials = session_credentials
    my_s3_session = boto3.Session(botocore_session=refresh_session)
    this_s3contents_instance.boto3_session = my_s3_session

# Tell Jupyter to use S3ContentsManager
c.ServerApp.contents_manager_class = S3ContentsManager

c.S3ContentsManager.init_s3_hook = make_key_refresh_boto3
```
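
The hook above reads a non-standard `aws_expiration` key, so it assumes that whatever rotates your keys also writes an expiry timestamp into the credentials file. A hypothetical sketch of the expected file layout:

```ini
# /home/jovyan/.aws/credentials -- hypothetical contents written by a key-rotation job
[default]
aws_access_key_id = <rotated access key>
aws_secret_access_key = <rotated secret key>
aws_session_token = <rotated session token>
# ISO 8601 timestamp, consumed as "expiry_time" by refresh_external_credentials()
aws_expiration = 2023-05-04T12:44:09Z
```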

### MinIO playground example

You can test this using the [`play.minio.io:9000`](https://play.minio.io:9000) playground.

Just be sure to create the bucket first, as sketched below.
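
For example, with the AWS CLI and the public playground credentials used in the configuration that follows (any S3 client works):

```shell
export AWS_ACCESS_KEY_ID="Q3AM3UQ867SPQQA43P2F"
export AWS_SECRET_ACCESS_KEY="zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG"
aws --endpoint-url https://play.minio.io:9000 s3 mb s3://s3contents-demo
```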

```python
from s3contents import S3ContentsManager

c = get_config()

# Tell Jupyter to use S3ContentsManager
c.ServerApp.contents_manager_class = S3ContentsManager
c.S3ContentsManager.access_key_id = "Q3AM3UQ867SPQQA43P2F"
c.S3ContentsManager.secret_access_key = "zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG"
c.S3ContentsManager.endpoint_url = "https://play.minio.io:9000"
c.S3ContentsManager.bucket = "s3contents-demo"
c.S3ContentsManager.prefix = "notebooks/test"
```

## Access local files

To access local files as well as remote files in S3, you can use [hybridcontents](https://github.com/viaduct-ai/hybridcontents).

Install it:

```shell
pip install hybridcontents
```

Use a configuration similar to this:

```python
from s3contents import S3ContentsManager
from hybridcontents import HybridContentsManager
from notebook.services.contents.largefilemanager import LargeFileManager

c = get_config()

c.ServerApp.contents_manager_class = HybridContentsManager

c.HybridContentsManager.manager_classes = {
    # Associate the root directory with an S3ContentsManager.
    # This manager will receive all requests that don't fall under any of the
    # other managers.
    "": S3ContentsManager,
    # Associate /local_directory with a LargeFileManager.
    "local_directory": LargeFileManager,
}

c.HybridContentsManager.manager_kwargs = {
    # Args for root S3ContentsManager.
    "": {
        "access_key_id": "<AWS Access Key ID / IAM Access Key ID>",
        "secret_access_key": "<AWS Secret Access Key / IAM Secret Access Key>",
        "bucket": "<S3 bucket name>",
    },
    # Args for the LargeFileManager mapped to /local_directory
    "local_directory": {
        "root_dir": "/Users/danielfrg/Downloads",
    },
}
```

## GCP - Google Cloud Storage

Install the extra dependencies with:

```shell
pip install s3contents[gcs]
```

```python
from s3contents.gcs import GCSContentsManager

c = get_config()

c.ServerApp.contents_manager_class = GCSContentsManager
c.GCSContentsManager.project = "<your-project>"
c.GCSContentsManager.token = "~/.config/gcloud/application_default_credentials.json"
c.GCSContentsManager.bucket = "<GCP bucket name>"
```

Note that the path `~/.config/gcloud/application_default_credentials.json` assumes
a POSIX system and that you have already generated application default credentials with the `gcloud` CLI.
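
If that credentials file does not exist yet, it can be created with:

```shell
gcloud auth application-default login
```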

## Other configuration

### File Save Hooks

If you want to use pre/post file-save hooks, here are some examples.

A `pre_save_hook` is written in exactly the same way as for the standard contents manager,
operating on the in-memory notebook model before it is committed to the object store.

```python
def scrub_output_pre_save(model, **kwargs):
    """
    Scrub output before saving notebooks
    """

    # only run on notebooks
    if model["type"] != "notebook":
        return

    # only run on nbformat v4
    if model["content"]["nbformat"] != 4:
        return

    for cell in model["content"]["cells"]:
        if cell["cell_type"] != "code":
            continue
        cell["outputs"] = []
        cell["execution_count"] = None

c.S3ContentsManager.pre_save_hook = scrub_output_pre_save
```

A `post_save_hook` instead operates on the file already in object storage;
because of this, it is useful to use the file methods on the `contents_manager`
for data manipulation.
In addition, you must use the following function signature (unique to `s3contents`):

```python
import os

import nbformat

def make_html_post_save(model, s3_path, contents_manager, **kwargs):
    """
    Convert notebooks to HTML after saving, via nbconvert
    """
    from nbconvert import HTMLExporter

    if model["type"] != "notebook":
        return

    content, _format = contents_manager.fs.read(s3_path, format="text")
    my_notebook = nbformat.reads(content, as_version=4)

    html_exporter = HTMLExporter()
    html_exporter.template_name = "classic"

    (body, resources) = html_exporter.from_notebook_node(my_notebook)

    base, ext = os.path.splitext(s3_path)
    contents_manager.fs.write(path=(base + ".html"), content=body, format=_format)

c.S3ContentsManager.post_save_hook = make_html_post_save
```

            
