            <img width="616" alt="scrapbook logo" src="https://user-images.githubusercontent.com/836375/52512549-31260f00-2bba-11e9-9556-515ba5ff0b4b.png">

# scrapbook

<!---(binder links generated at https://mybinder.readthedocs.io/en/latest/howto/badges.html and compressed at https://tinyurl.com) -->

[![Travis Build Status](https://travis-ci.org/nteract/scrapbook.svg?branch=main)](https://travis-ci.org/nteract/scrapbook)
[![image](https://codecov.io/github/nteract/scrapbook/coverage.svg?branch=main)](https://codecov.io/github/nteract/scrapbook?branch=main)
[![Documentation Status](https://readthedocs.org/projects/nteract-scrapbook/badge/?version=latest)](https://nteract-scrapbook.readthedocs.io/en/latest/?badge=latest)
[![badge](https://tinyurl.com/y3moqkmc)](https://mybinder.org/v2/gh/nteract/scrapbook/main?filepath=binder%2Freglue_highlight_dates.ipynb)
[![badge](https://tinyurl.com/ybk8qa3j)](https://mybinder.org/v2/gh/nteract/scrapbook/main?filepath=binder%2FResultsDemo.ipynb)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black)

The **scrapbook** library records a notebook’s data values and generated visual
content as "scraps". Recorded scraps can be read at a future time.

[See the scrapbook documentation](https://nteract-scrapbook.readthedocs.io/) for
more information on how to use scrapbook.

## Use Cases

Notebook users may wish to record data produced during a notebook's execution.
This recorded data, **scraps**, can be used at a later time or passed in a
workflow to another notebook as input.

Namely, scrapbook lets you:

- **persist** data and visual content displays in a notebook as scraps
- **recall** any persisted scrap of data
- **summarize collections** of notebooks

## Python Version Support

This library's long-term support target is Python 3.5+. Python 2.7 was
supported until Python 2 reached end-of-life in 2020; since that date, only
3.x versions are maintained, and this release requires Python >= 3.5.

## Installation

Install using `pip`:

```bash
pip install scrapbook
```

For installing optional IO dependencies, you can specify individual store bundles,
like `s3` or `azure`:

```bash
pip install scrapbook[s3]
```

or use `all`:

```bash
pip install scrapbook[all]
```

---

## Models and Terminology

Scrapbook defines the following items:

- **scraps**: serializable data values and visualizations such as strings, lists of
  objects, pandas dataframes, charts, images, or data references.
- **notebook**: a wrapped nbformat notebook object with extra methods for interacting
  with scraps.
- **scrapbook**: a collection of notebooks with an interface for asking questions of
  the collection.
- **encoders**: a registered translator of data to/from notebook
  storage formats.

### `scrap` model

The `scrap` model houses a few key attributes in a tuple (sketched in code after this list), including:

- **name**: The name of the scrap
- **data**: Any data captured by the scrapbook api call
- **encoder**: The name of the encoder used to encode/decode data to/from the notebook
- **display**: Any display data used by IPython to display visual content
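
As a concrete illustration, the model's shape can be mirrored with a Python namedtuple; the field names come from the list above, but treat this class as a sketch rather than scrapbook's internal definition:

```python
from collections import namedtuple

# Minimal sketch of the scrap model; fields mirror the list above.
Scrap = namedtuple("Scrap", ["name", "data", "encoder", "display"])

# e.g. a JSON-encoded data scrap with no display content
scrap = Scrap(name="number", data=123, encoder="json", display=None)
```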

---

## API

Scrapbook adds a few basic api commands that enable saving and retrieving data,
including:

- `glue` to persist scraps with or without _display output_
- `read_notebook` reads one notebook
- `scraps` provides a searchable dictionary of all scraps by name
- `reglue` which copies a scrap from another notebook to the current notebook
- `read_notebooks` reads many notebooks from a given path
- `scraps_report` displays a report about collected scraps
- `papermill_dataframe` and `papermill_metrics` for backward compatibility
  for two deprecated papermill features

The following sections provide more detail on these api commands.

### `glue` to persist scraps

Records a `scrap` (data or display value) in the given notebook cell.

The `scrap` (recorded value) can be retrieved during later inspection of the
output notebook.

```python
"""glue example for recording data values"""
import scrapbook as sb

sb.glue("hello", "world")
sb.glue("number", 123)
sb.glue("some_list", [1, 3, 5])
sb.glue("some_dict", {"a": 1, "b": 2})
sb.glue("non_json", df, 'arrow')
```

The scrapbook library can be used later to recover `scraps` from the output
notebook:

```python
# read a notebook and get previously recorded scraps
nb = sb.read_notebook('notebook.ipynb')
nb.scraps
```
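
Individual scraps can be looked up by name from that mapping; for example, assuming the `glue` calls shown above:

```python
nb.scraps["number"]       # the full scrap tuple recorded under this name
nb.scraps["number"].data  # => 123
```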

**scrapbook** infers the storage format from the value's type, matching it
against the registered data encoders. Alternatively, the inferred encoding
format can be overridden by setting the `encoder` argument to the registered
name (e.g. `"json"`) of a particular encoder.
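
For example (a small sketch; the encoder names reference the Encoders section later in this document):

```python
sb.glue("inferred", {"a": 1})           # encoder inferred from the value type
sb.glue("as_text", 42, encoder="text")  # force the registered `text` encoder
```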

This data is persisted by generating a display output with a special media type
identifying the content encoding format and data. These outputs are not always
visible in notebook rendering but still exist in the document. Scrapbook can
then rehydrate the data associated with the notebook in the future by reading
these cell outputs.
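
To make that concrete, a persisted scrap ends up in the notebook JSON looking roughly like the sketch below; the exact media-type naming scheme and payload fields are assumptions for illustration, not a documented contract:

```python
# Rough shape of a saved cell output carrying a scrap (values illustrative)
output = {
    "output_type": "display_data",
    "data": {
        # assumed media-type scheme: application/scrapbook.scrap.<encoder>+json
        "application/scrapbook.scrap.json+json": {"name": "number", "data": 123},
    },
    "metadata": {},
}
```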

#### With _display output_

To display a named scrap with visible display outputs, you need to indicate that
the scrap is directly renderable.

This can be done by toggling the `display` argument.

```python
# record a UI message along with the input string
sb.glue("hello", "Hello World", display=True)
```

The call will save the data and the display attributes of the Scrap object,
making it visible as well as encoding the original data. This leans on the
`IPython.core.formatters.format_display_data` function to translate the data
object into a display and metadata dict for the notebook kernel to parse.
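
For reference, that IPython helper can be called directly; it returns a pair of dicts keyed by media type:

```python
from IPython.core.formatters import format_display_data

# Translate an object into (display data, metadata) mimebundle dicts,
# e.g. data == {'text/plain': "'Hello World'"} for a plain string.
data, metadata = format_display_data("Hello World")
```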

Another pattern that can be used is to specify that **only the display data**
should be saved, and not the original object. This is achieved by setting
the encoder to be `display`.

```python
# record an image without the original input object
sb.glue("sharable_png",
  IPython.display.Image(filename="sharable.png"),
  encoder='display'
)
```

Finally, the media types that are generated can be controlled by passing
a list, tuple, or dict object as the `display` argument.

```python
sb.glue("media_as_text_only",
  media_obj,
  encoder='display',
  display=('text/plain',) # This passes [text/plain] to format_display_data's include argument
)

sb.glue("media_without_text",
  media_obj,
  encoder='display',
  display={'exclude': 'text/plain'} # forward to format_display_data's kwargs
)
```

Like data scraps, these can be retrieved at a later time by accessing the scrap's
`display` attribute, though usually one will just use the Notebook's `reglue` method
(described below).
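
As a sketch, recovering the display payload from the image scrap glued above might look like:

```python
nb = sb.read_notebook('notebook.ipynb')

# The captured output for the scrap; for the PNG example this mimebundle
# would include an 'image/png' entry.
png_display = nb.scraps["sharable_png"].display
```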

### `read_notebook` reads one notebook

Reads a Notebook object from the location specified by `path`.
You've already seen this function used in the api call examples above;
essentially, it provides a thin wrapper over an `nbformat` NotebookNode
with the ability to extract scrapbook scraps.

```python
nb = sb.read_notebook('notebook.ipynb')
```

This Notebook object adheres to the [nbformat's json schema](https://github.com/jupyter/nbformat/blob/master/nbformat/v4/nbformat.v4.schema.json),
allowing for access to its required fields.

```python
nb.cells # The cells from the notebook
nb.metadata
nb.nbformat
nb.nbformat_minor
```

There are a few additional methods provided, most of which are outlined in more
detail below:

```python
nb.scraps
nb.reglue
```

The abstraction also makes saved content available as a dataframe referencing each
key and source. More of these methods will be made available in later versions.

```python
# Produces a data frame with ["name", "data", "encoder", "display", "filename"] as columns
nb.scrap_dataframe # Warning: This might be a large object if data or display is large
```

The Notebook object also has a few legacy functions for backwards compatibility
with papermill's Notebook object model. As a result, it can be used to read
papermill execution statistics as well as scrapbook abstractions:

```python
nb.cell_timing # List of cell execution timings in cell order
nb.execution_counts # List of cell execution counts in cell order
nb.papermill_metrics # Dataframe of cell execution counts and times
nb.papermill_record_dataframe # Dataframe of notebook records (scraps with only data)
nb.parameter_dataframe # Dataframe of notebook parameters
nb.papermill_dataframe # Dataframe of notebook parameters and cell scraps
```

The notebook reader relies on [papermill's registered iorw](https://papermill.readthedocs.io/en/latest/reference/papermill-io.html)
to enable access to a variety of sources such as -- but not limited to -- S3,
Azure, and Google Cloud.
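
In practice this means remote paths can be passed straight to the reader, for example (assuming the matching optional IO dependencies from the Installation section are installed):

```python
# read a notebook directly from S3 via papermill's registered iorw handlers
nb = sb.read_notebook('s3://bucket/key/to/notebook.ipynb')
```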

### `scraps` provides a name -> scrap lookup

The `scraps` method allows for access to all of the scraps in a particular notebook.

```python
nb = sb.read_notebook('notebook.ipynb')
nb.scraps # Prints a dict of all scraps by name
```

This object has a few additional methods as well for convenient conversion and
execution.

```python
nb.scraps.data_scraps # Filters to only scraps with `data` associated
nb.scraps.data_dict # Maps `data_scraps` to a `name` -> `data` dict
nb.scraps.display_scraps # Filters to only scraps with `display` associated
nb.scraps.display_dict # Maps `display_scraps` to a `name` -> `display` dict
nb.scraps.dataframe # Generates a dataframe with ["name", "data", "encoder", "display"] as columns
```

These methods allow for simple use-cases to not require digging through model
abstractions.
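
For instance, all recorded data can be pulled out as a plain dict in one call; the values shown assume only the simple `glue` examples from earlier:

```python
nb = sb.read_notebook('notebook.ipynb')
nb.scraps.data_dict
# => {'hello': 'world', 'number': 123, 'some_list': [1, 3, 5], 'some_dict': {'a': 1, 'b': 2}}
```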

### `reglue` copies a scrap into the current notebook

Using `reglue`, one can take any scrap glued into one notebook and glue it into
the current one.

```python
nb = sb.read_notebook('notebook.ipynb')
nb.reglue("table_scrap") # This copies both data and displays
```

Any data or display information will be copied verbatim into the currently
executing notebook as though the user called `glue` again on the original source.

It's also possible to rename the scrap in the process.

```python
nb.reglue("table_scrap", "old_table_scrap")
```

Finally, if one wishes to attempt a reglue without checking for existence first,
`raise_on_missing` can be set to `False` to just display a message on failure.

```python
nb.reglue("maybe_missing", raise_on_missing=False)
# => "No scrap found with name 'maybe_missing' in this notebook"
```

### `read_notebooks` reads many notebooks

Reads all notebooks located in a given `path` into a Scrapbook object.

```python
# create a scrapbook named `book`
book = sb.read_notebooks('path/to/notebook/collection/')
# get the underlying notebooks as a list
book.notebooks # Or `book.values`
```

The path reuses [papermill's registered `iorw`](https://papermill.readthedocs.io/en/latest/reference/papermill-io.html)
to list and read files from various sources, so that non-local URLs can load data.

```python
# create a scrapbook named `book`
book = sb.read_notebooks('s3://bucket/key/prefix/to/notebook/collection/')
```

The Scrapbook (`book` in this example) can be used to recall all scraps across
the collection of notebooks:

```python
book.notebook_scraps # Dict of shape `notebook` -> (`name` -> `scrap`)
book.scraps # merged dict of shape `name` -> `scrap`
```
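
For example, a single named scrap can then be recalled from the merged view (a sketch; behavior on name collisions across notebooks is not specified here):

```python
book.scraps["hello"].data  # => 'world'
```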

### `scraps_report` displays a report about collected scraps

The Scrapbook collection can be used to generate a `scraps_report` on all the
scraps from the collection, rendered as markdown-structured output.

```python
book.scraps_report()
```

This display can filter on scrap and notebook names, as well as enable or disable
an overall header for the display.

```python
book.scraps_report(
  scrap_names=["scrap1", "scrap2"],
  notebook_names=["result1"], # matches `/notebook/collections/result1.ipynb` pathed notebooks
  header=False
)
```

By default the report will only populate with visual elements. To also
report on data elements, set `include_data` to `True`.

```python
book.scraps_report(include_data=True)
```

### papermill support

Finally, the Scrapbook provides two backwards-compatible features for deprecated
`papermill` capabilities:

```python
book.papermill_dataframe
book.papermill_metrics
```

## Encoders

Encoders map key names to Encoder objects registered
against the `encoders.registry` object. To register a new data encoder,
simply call:

```python
from scrapbook.encoders import registry as encoder_registry
# add encoder to the registry
encoder_registry.register("custom_encoder_name", MyCustomEncoder())
```

The encoder class must implement two methods, `encode` and `decode`:

```python
import json

class MyCustomEncoder(object):
    def encode(self, scrap):
        # scrap.data is any type, usually specific to the encoder name.
        # Return a `Scrap` whose `data` is one of
        # [None, list, dict, *six.integer_types, *six.string_types].
        # `scrap` is a namedtuple, so `_replace` returns an updated copy.
        return scrap._replace(data=json.dumps(scrap.data))

    def decode(self, scrap):
        # scrap.data is one of [None, list, dict, *six.integer_types, *six.string_types].
        # Return a `Scrap` whose `data` may be any type, usually specific
        # to the encoder name.
        return scrap._replace(data=json.loads(scrap.data))
```

An encoder like this transforms scraps into a JSON string representing their
contents or location, and loads those strings back into the original data objects.

### `text`

A basic string storage format that saves data as Python strings.

```python
sb.glue("hello", "world", "text")
```

### `json`

```python
sb.glue("foo_json", {"foo": "bar", "baz": 1}, "json")
```

### `pandas`

```python
sb.glue("pandas_df",pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]}), "pandas")
```

## papermill's deprecated `record` feature

**scrapbook** provides a robust and flexible recording schema. This library replaces [papermill](https://papermill.readthedocs.io)'s existing
`record` functionality.

[Documentation for papermill `record`](https://papermill.readthedocs.io/en/latest/usage-recording.html?#recording-values-to-the-notebook) exists on ReadTheDocs.
In brief, the deprecated `record` function worked as follows:

`pm.record(name, value)`: enables values to be saved
with the notebook [[API documentation]](https://papermill.readthedocs.io/en/latest/reference/papermill.html#papermill.api.record)

```python
pm.record("hello", "world")
pm.record("number", 123)
pm.record("some_list", [1, 3, 5])
pm.record("some_dict", {"a": 1, "b": 2})
```

`pm.read_notebook(notebook)`: pandas could then be used to recover the recorded
values by reading the output notebook into a dataframe. For example:

```python
nb = pm.read_notebook('notebook.ipynb')
nb.dataframe
```

### Rationale for Papermill `record` deprecation

Papermill's `record` function was deprecated due to these limitations and challenges:

- The `record` function didn't follow papermill's pattern of linear execution
  of a notebook. It was awkward to describe `record` as an additional
  feature of papermill; it really felt like describing a second, less
  developed library.
- Recording / reading required data translation to JSON for everything. This is
  a tedious, painful process for dataframes.
- Reading recorded values into a dataframe would result in unintuitive dataframe
  shapes.
- Less modularity and flexibility than other papermill components, where custom
  operators can be registered.

To overcome these limitations in papermill, the decision was made to create
**scrapbook**.



            
