emblaze


- Name: emblaze
- Version: 0.10.6
- Home page: https://github.com/cmudig/emblaze
- Summary: Interactive Jupyter notebook widget for visually comparing embeddings
- Upload time: 2024-03-18 17:22:04
- Author: venkatesh-sivaraman
- License: BSD
- Keywords: jupyter, widgets, ipython

# Emblaze - Interactive Embedding Comparison

Emblaze is a Jupyter notebook widget for **visually comparing embeddings** using animated scatter plots. It bundles an easy-to-use Python API for performing dimensionality reduction on multiple sets of embedding data (including aligning the results for easier comparison), and a full-featured interactive platform for probing and comparing embeddings that runs within a Jupyter notebook cell. [Read the documentation >](https://dig.cmu.edu/emblaze/emblaze)

![](https://raw.githubusercontent.com/cmudig/emblaze/main/examples/screenshots/cover_art.png)

## Installation

**Compatibility Note:** This widget has been tested with Python >= 3.7. If you are using JupyterLab, make sure you are running version 3.0 or higher. The widget does not currently support the VS Code interactive notebook environment.

Install Emblaze using `pip`:

```bash
pip install emblaze
```

The widget should work out of the box when you run `jupyter lab` (see example code below).

_Jupyter Notebook note:_ If you are using Jupyter Notebook 5.2 or earlier, you may also need to enable
the nbextension:

```bash
jupyter nbextension enable --py --sys-prefix emblaze
```

### Running as a Standalone App

It's most convenient to use Emblaze as a Jupyter widget, but it can also be run as a standalone web application for small-scale hosting purposes. To do so, run the Flask server using:

```bash
python -m emblaze.server
```

See "Standalone App Development" below to learn how to import data into the standalone tool.

## Examples

Please see `examples/example.ipynb` to try using the Emblaze widget on the Boston housing prices or MNIST (TensorFlow import required) datasets.

**Example 1: Multiple projections of the same embedding dataset.** This can reveal areas of variation in the dimensionality reduction process, since t-SNE and UMAP are randomized algorithms and produce a different layout on each run.

```python
import emblaze
from emblaze.utils import Field, ProjectionTechnique

# X is an n x k array, Y is a length-n array
X, Y = ...

# Represent the high-dimensional embedding
emb = emblaze.Embedding({Field.POSITION: X, Field.COLOR: Y})
# Compute nearest neighbors in the high-D space (for display)
emb.compute_neighbors(metric='cosine')

# Generate UMAP 2D representations - you can pass UMAP parameters to project()
variants = emblaze.EmbeddingSet([
    emb.project(method=ProjectionTechnique.UMAP) for _ in range(10)
])
# Compute neighbors again (to indicate that we want to compare projections)
variants.compute_neighbors(metric='euclidean')

w = emblaze.Viewer(embeddings=variants)
w
```
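
To make "areas of variation" concrete, the sketch below scores how much each point's nearest-neighbor set changes between two 2D layouts, using plain NumPy. It is only an illustration of the idea and not Emblaze's own implementation; the helper names `knn_indices` and `neighbor_overlap` are hypothetical, and the two layouts are synthetic stand-ins for two projection runs.

```python
import numpy as np


def knn_indices(points, k):
    """Return the indices of the k nearest neighbors (Euclidean) of each row."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    return np.argsort(dists, axis=1)[:, :k]


def neighbor_overlap(layout_a, layout_b, k=5):
    """Per-point Jaccard similarity between the k-NN sets of two layouts."""
    nn_a = knn_indices(layout_a, k)
    nn_b = knn_indices(layout_b, k)
    scores = np.empty(len(layout_a))
    for i, (a, b) in enumerate(zip(nn_a, nn_b)):
        set_a, set_b = set(a.tolist()), set(b.tolist())
        scores[i] = len(set_a & set_b) / len(set_a | set_b)
    return scores


# Two stand-in 2D layouts of the same 200 points: the second is a noisy copy
# of the first, mimicking run-to-run variation in a randomized projection.
rng = np.random.default_rng(0)
layout_1 = rng.normal(size=(200, 2))
layout_2 = layout_1 + rng.normal(scale=0.05, size=(200, 2))

overlap = neighbor_overlap(layout_1, layout_2, k=5)
print("mean neighbor overlap:", overlap.mean())
print("least stable points:", np.argsort(overlap)[:5])
```

Points with a low score keep few of their neighbors from one layout to the next, which is the kind of instability the animated comparison in the viewer is meant to surface.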

**Example 2: Multiple embeddings of the same data from different models.** This is useful to see how different models embed data differently.

```python
import emblaze
from emblaze.utils import Field, ProjectionTechnique

# Xs is a list of n x k arrays corresponding to different embedding spaces
Xs = ...
# Y is a length-n array of labels for color-coding
Y = ...
# List of strings representing the name of each embedding space (e.g.
# "Google News", "Wikipedia", "Twitter"). Omit to use generic names
embedding_names = [...]

# Make high-dimensional embedding objects
embeddings = emblaze.EmbeddingSet([
    emblaze.Embedding({Field.POSITION: X, Field.COLOR: Y}, label=emb_name)
    for X, emb_name in zip(Xs, embedding_names)
])
embeddings.compute_neighbors(metric='cosine')

# Make aligned UMAP
reduced = embeddings.project(method=ProjectionTechnique.ALIGNED_UMAP)

w = emblaze.Viewer(embeddings=reduced)
w
```
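
Example 2 relies on row `i` of every array in `Xs` describing the same item, with one label in `Y` per row. A minimal sketch of placeholder inputs with that shape, plus a sanity check to run before building the `EmbeddingSet`, might look like the following. The random data, the model names, and the `check_inputs` helper are all assumptions made for illustration and are not part of Emblaze.

```python
import numpy as np

rng = np.random.default_rng(42)
n_points = 500

# Three stand-in embedding spaces. Row i of every array is assumed to
# describe the same item.
Xs = [rng.normal(size=(n_points, 50)) for _ in range(3)]
# One label per item, shared across all embedding spaces
Y = rng.integers(0, 5, size=n_points)
embedding_names = ["Model A", "Model B", "Model C"]


def check_inputs(Xs, Y, names):
    """Raise ValueError if the inputs do not line up row-for-row."""
    if len(Xs) != len(names):
        raise ValueError("need exactly one name per embedding space")
    for X, name in zip(Xs, names):
        if X.ndim != 2:
            raise ValueError(f"{name}: expected a 2D array, got {X.ndim}D")
        if X.shape[0] != len(Y):
            raise ValueError(f"{name}: {X.shape[0]} rows but {len(Y)} labels")


check_inputs(Xs, Y, embedding_names)
```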

**Example 3: Visualizing image data with image thumbnails.** The viewer will display image previews for each point as well as its nearest neighbors. (For text data, you can use `TextThumbnails` to show small pieces of text next to the points.)

```python
# images is an n x 100 x 100 x 3 numpy array of 100x100 RGB images (values from 0-255)
images = ...
thumbnails = emblaze.ImageThumbnails(images)
w = emblaze.Viewer(embeddings=embeddings, thumbnails=thumbnails)
w
```
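
The comment in Example 3 describes an `n x 100 x 100 x 3` array with values from 0 to 255. Image data often arrives as floats in `[0, 1]` or as single-channel grayscale instead, so a conversion step along these lines may be needed first. This is a sketch under that assumption: `to_uint8_rgb` is a hypothetical helper, and whether `ImageThumbnails` also accepts other dtypes or channel layouts is not covered here.

```python
import numpy as np


def to_uint8_rgb(images):
    """Convert float images in [0, 1] to uint8 RGB with values 0-255.

    Accepts n x H x W (grayscale) or n x H x W x 3 (RGB) arrays.
    """
    images = np.asarray(images, dtype=np.float64)
    if images.ndim == 3:
        # Grayscale: repeat the single channel three times
        images = np.repeat(images[..., None], 3, axis=-1)
    if images.ndim != 4 or images.shape[-1] != 3:
        raise ValueError(f"unexpected image array shape {images.shape}")
    return np.clip(np.rint(images * 255), 0, 255).astype(np.uint8)


# Stand-in data: 10 random grayscale images of 100x100 pixels
rng = np.random.default_rng(0)
gray = rng.random(size=(10, 100, 100))
images = to_uint8_rgb(gray)
print(images.shape, images.dtype)
```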

You can also visualize embeddings with multimodal labels (i.e. where some points have text labels and others have image labels) by initializing an `emblaze.CombinedThumbnails` instance with a list of other `Thumbnails` objects to combine.

See the [documentation](https://dig.cmu.edu/emblaze/emblaze) for more details on defining and configuring comparisons with Emblaze.

---

## Development Installation

Clone the repository, then install dependencies:

```bash
pip install -r requirements.txt
```

Install the Python package. This will also build the JS packages.

```bash
pip install -e .
```

Run the following commands if you use **JupyterLab**:

```bash
jupyter labextension install @jupyter-widgets/jupyterlab-manager --no-build
jupyter labextension install .
```

Run the following commands if you use **Jupyter Notebook**:

```bash
jupyter nbextension install --sys-prefix --symlink --overwrite --py emblaze
jupyter nbextension enable --sys-prefix --py emblaze
```

Note that the `--symlink` flag doesn't work on Windows, so there you will have to run
the `install` command every time you rebuild the extension. Some installations
may need a different flag in place of `--sys-prefix`; those flags are not covered here.

### How to see your changes

Open JupyterLab in watch mode with `jupyter lab --watch`. Then, in a separate terminal, watch the source directory for changes with `npm run watch`. After changing JavaScript code, wait for the build to finish, then refresh your browser. After changing Python code, restart the notebook kernel for the changes to take effect.

### Standalone App Development

To develop using the standalone app, run `npm run watch:standalone` in a separate terminal from the Flask server to continuously build the frontend. You will need to reload the page to see your changes.

The standalone application serves datasets stored at the data path that is printed when the Flask server starts (should be something like `.../lib/python3.9/site-packages/emblaze/data` for the pip-installed version, or `.../emblaze/emblaze/data` for a local repository). You can add your own datasets by building an `EmbeddingSet` and (optionally) a `Thumbnails` object, then saving the results to files in the data directory:

```python
import json
import os

dataset_name = "my-dataset"
data_dir = ... # data directory printed by flask server

embeddings = ... # EmbeddingSet object
thumbnails = ... # (Text|Image)Thumbnails object

# exist_ok=True lets the export be re-run without failing on an existing folder
os.makedirs(os.path.join(data_dir, dataset_name), exist_ok=True)
with open(os.path.join(data_dir, dataset_name, "data.json"), "w") as file:
    json.dump(embeddings.to_json(), file)
with open(os.path.join(data_dir, dataset_name, "thumbnails.json"), "w") as file:
    json.dump(thumbnails.to_json(), file)
```
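
Since the `Thumbnails` object is optional, the export can be wrapped in a small function that only writes `thumbnails.json` when thumbnails are provided. The sketch below is one way to do that; `save_dataset` and `FakeExportable` are hypothetical names, and the stand-in class only mimics the `to_json()` method used above so the example can run against a temporary directory without Emblaze installed.

```python
import json
import os
import tempfile


def save_dataset(data_dir, dataset_name, embeddings, thumbnails=None):
    """Write data.json (and thumbnails.json, if given) for one dataset."""
    dataset_dir = os.path.join(data_dir, dataset_name)
    os.makedirs(dataset_dir, exist_ok=True)  # safe to call on re-export
    with open(os.path.join(dataset_dir, "data.json"), "w") as file:
        json.dump(embeddings.to_json(), file)
    if thumbnails is not None:
        with open(os.path.join(dataset_dir, "thumbnails.json"), "w") as file:
            json.dump(thumbnails.to_json(), file)
    return dataset_dir


class FakeExportable:
    """Stand-in for an EmbeddingSet or Thumbnails object (illustration only)."""

    def __init__(self, payload):
        self.payload = payload

    def to_json(self):
        return self.payload


# Demonstrate against a temporary directory instead of the real data path
with tempfile.TemporaryDirectory() as tmp_dir:
    out_dir = save_dataset(tmp_dir, "my-dataset", FakeExportable({"data": []}))
    written = sorted(os.listdir(out_dir))
    print(written)
```

In real use, `data_dir` would be the data path printed by the Flask server, and `embeddings` and `thumbnails` would be the actual Emblaze objects.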

### Building Documentation

Install pdoc3: `pip install pdoc3`

Build documentation:

```bash
pdoc --html --force --output-dir docs --template-dir docs/templates emblaze
```

### Deployment

First clean all npm build intermediates:

```bash
npm run clean
```

Bump the widget version in `emblaze/_version.py`, `emblaze/_frontend.py`, and `package.json` if applicable. Then build the notebook widgets and standalone app:

```bash
npm run build:all
```

Build the source distribution and wheel:

```bash
pip install --upgrade build
python -m build
```

Upload to PyPI (replace `<VERSION>` with the version number):

```bash
twine upload dist/emblaze-<VERSION>*
```
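
Before uploading, it can help to check which files the `dist/emblaze-<VERSION>*` pattern would pick up, especially if older builds are still sitting in `dist/`. The sketch below uses Python's `fnmatch` as an approximation of shell globbing. The two 0.10.6 file names are those of the published release; the 0.10.5 wheel is a made-up leftover for illustration.

```python
from fnmatch import fnmatch

version = "0.10.6"
pattern = f"dist/emblaze-{version}*"

# The 0.10.6 artifacts, plus a hypothetical older wheel left over in dist/
candidates = [
    "dist/emblaze-0.10.6-py2.py3-none-any.whl",
    "dist/emblaze-0.10.6.tar.gz",
    "dist/emblaze-0.10.5-py2.py3-none-any.whl",
]

to_upload = [path for path in candidates if fnmatch(path, pattern)]
print(to_upload)
```

Because the pattern ends in a bare `*`, it would also match a version that merely starts with the same digits (for example `0.10.60`), which is one more reason to clean `dist/` before building.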

### Development Notes

- Svelte transitions don't seem to work well as they force an expensive re-layout operation. Avoid using them during interactions.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/cmudig/emblaze",
    "name": "emblaze",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "Jupyter,Widgets,IPython",
    "author": "venkatesh-sivaraman",
    "author_email": "venkatesh.sivaraman.98@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/40/12/2039bde980fd016fb8efd3d2537c71cf59191837960ef396d1a24889bfb4/emblaze-0.10.6.tar.gz",
    "platform": "Linux",
    "bugtrack_url": null,
    "license": "BSD",
    "summary": "Interactive Jupyter notebook widget for visually comparing embeddings",
    "version": "0.10.6",
    "project_urls": {
        "Homepage": "https://github.com/cmudig/emblaze"
    },
    "split_keywords": [
        "jupyter",
        "widgets",
        "ipython"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1a464ea75f1c1cddccbf1b76740b7d6fbd9235d5c9ff0b01ef7050ce4b4fec26",
                "md5": "3c13858660ffe92578b24d58e328ba09",
                "sha256": "789e77a8dffb569bfd791090b4828e3560bb50b90d5081c1eec3b69830c13ad9"
            },
            "downloads": -1,
            "filename": "emblaze-0.10.6-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3c13858660ffe92578b24d58e328ba09",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": null,
            "size": 13900885,
            "upload_time": "2024-03-18T17:21:46",
            "upload_time_iso_8601": "2024-03-18T17:21:46.672599Z",
            "url": "https://files.pythonhosted.org/packages/1a/46/4ea75f1c1cddccbf1b76740b7d6fbd9235d5c9ff0b01ef7050ce4b4fec26/emblaze-0.10.6-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "40122039bde980fd016fb8efd3d2537c71cf59191837960ef396d1a24889bfb4",
                "md5": "0ca0778849d9f245e290d6408ca330c7",
                "sha256": "9a05b071567c7c4fbe99a773f4135e69ed2453eca1f689300d21ad6b80cc6bde"
            },
            "downloads": -1,
            "filename": "emblaze-0.10.6.tar.gz",
            "has_sig": false,
            "md5_digest": "0ca0778849d9f245e290d6408ca330c7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 15212957,
            "upload_time": "2024-03-18T17:22:04",
            "upload_time_iso_8601": "2024-03-18T17:22:04.109312Z",
            "url": "https://files.pythonhosted.org/packages/40/12/2039bde980fd016fb8efd3d2537c71cf59191837960ef396d1a24889bfb4/emblaze-0.10.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-18 17:22:04",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "cmudig",
    "github_project": "emblaze",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "emblaze"
}
        