maze-dataset

Name	maze-dataset
Version	1.1.0
home_page	https://github.com/understanding-search/maze-dataset
Summary	generating and working with datasets of mazes
upload_time	2024-09-10 19:33:49
maintainer	None
docs_url	None
author	Michael Ivanitskiy
requires_python	<4.0.0,>=3.10.6
license	None
keywords	maze mazes labyrinth dataset procedural pathfinding tokenization

[![PyPI](https://img.shields.io/pypi/v/maze-dataset)](https://pypi.org/project/maze-dataset/)
![PyPI - Downloads](https://img.shields.io/pypi/dm/maze-dataset)
[![Checks](https://github.com/understanding-search/maze-dataset/actions/workflows/checks.yml/badge.svg)](https://github.com/understanding-search/maze-dataset/actions/workflows/checks.yml)
[![Coverage](docs/coverage/coverage.svg)](docs/coverage/coverage.txt)
![code size, bytes](https://img.shields.io/github/languages/code-size/understanding-search/maze-dataset)
![GitHub commit activity](https://img.shields.io/github/commit-activity/t/understanding-search/maze-dataset)
![GitHub closed pull requests](https://img.shields.io/github/issues-pr-closed/understanding-search/maze-dataset)


# `maze-dataset`

This package provides utilities for generating, filtering, solving, visualizing, and processing mazes for training ML systems. It was built primarily for the [maze-transformer interpretability](https://github.com/understanding-search/maze-transformer) project. Our paper on it is available at http://arxiv.org/abs/2309.10498

This package includes a variety of [maze generation algorithms](maze_dataset/generation/generators.py), including randomized depth first search, Wilson's algorithm for uniform spanning trees, and percolation. Datasets can be filtered to select mazes of a certain length or complexity, remove duplicates, and satisfy custom properties. A variety of output formats for visualization and training ML models are provided.

|   |   |   |   |
|---|---|---|---|
| ![Maze generated via percolation](docs/assets/maze_perc.png) |  ![Maze generated via constrained randomized depth first search](docs/assets/maze_dfs_constrained.png)  |  ![Maze with random heatmap](docs/assets/mazeplot_heatmap.png)  |  ![MazePlot with solution](docs/assets/mazeplot_path.png)  |

# Installation
This package is [available on PyPI](https://pypi.org/project/maze-dataset/), and can be installed via
```
pip install maze-dataset
```

# Docs

The full hosted documentation is available at [https://understanding-search.github.io/maze-dataset/](https://understanding-search.github.io/maze-dataset/).

Additionally:

- our notebooks serve as a good starting point for understanding the package:
    - the [notebooks](https://understanding-search.github.io/maze-dataset/notebooks) page in the docs has links to the rendered notebooks
    - the [`notebooks`](https://github.com/understanding-search/maze-dataset/tree/main/notebooks) folder has the source notebooks
- combined, single page docs are available as:
    - [plain text](https://understanding-search.github.io/maze-dataset/combined/maze_dataset.txt)
    - [html](https://understanding-search.github.io/maze-dataset/combined/maze_dataset.html)
    - [github markdown](https://github.com/understanding-search/maze-dataset/tree/main/docs/combined/maze_dataset.md)
    - [pandoc markdown](https://github.com/understanding-search/maze-dataset/tree/main/docs/combined/maze_dataset.md)
- test coverage reports are available on the [coverage](https://understanding-search.github.io/maze-dataset/coverage) page or the [`coverage/`](https://github.com/understanding-search/maze-dataset/tree/main/docs/coverage) folder
- generation benchmark results are available on the [benchmarks](https://understanding-search.github.io/maze-dataset/benchmarks) page or the [`benchmarks/`](https://github.com/understanding-search/maze-dataset/tree/main/docs/benchmarks) folder

# Usage

## Creating a dataset

To create a `MazeDataset`, which inherits from `torch.utils.data.Dataset`, you first create a `MazeDatasetConfig`:

```python
from maze_dataset import MazeDataset, MazeDatasetConfig
from maze_dataset.generation import LatticeMazeGenerators
cfg: MazeDatasetConfig = MazeDatasetConfig(
    name="test", # name is only for you to keep track of things
    grid_n=5, # number of rows/columns in the lattice
    n_mazes=4, # number of mazes to generate
    maze_ctor=LatticeMazeGenerators.gen_dfs, # algorithm to generate the maze
    maze_ctor_kwargs=dict(do_forks=False), # additional parameters to pass to the maze generation algorithm
)
```

and then pass this config to the `MazeDataset.from_config` method:

```python
dataset: MazeDataset = MazeDataset.from_config(cfg)
```

This method can check whether a dataset with a matching config hash already exists in the expected location on your filesystem and load it if so. It can also generate the dataset on the fly if needed.
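
The load-or-generate pattern described above can be sketched with the standard library alone. This is only an illustration of the general idea, not how `MazeDataset.from_config` is implemented: the helpers `config_hash` and `load_or_generate`, the file naming scheme, and the JSON storage format are all assumptions made for this example.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable


def config_hash(cfg: dict) -> str:
    """Stable short hash of a JSON-serializable config (hypothetical helper)."""
    payload = json.dumps(cfg, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def load_or_generate(
    cfg: dict, base_path: Path, generate: Callable[[dict], list]
) -> tuple[list, bool]:
    """Return (data, loaded_from_disk), caching under a name derived from the hash."""
    path = base_path / f"{cfg['name']}-{config_hash(cfg)}.json"
    if path.exists():
        # a dataset with the same config hash was saved earlier: reuse it
        return json.loads(path.read_text()), True
    data = generate(cfg)
    base_path.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data))
    return data, False


with tempfile.TemporaryDirectory() as tmp:
    cfg = {"name": "test", "grid_n": 5, "n_mazes": 4}

    def make(c: dict) -> list:
        # stand-in for a real maze generator
        return [[i, c["grid_n"]] for i in range(c["n_mazes"])]

    _, first_loaded = load_or_generate(cfg, Path(tmp), make)
    _, second_loaded = load_or_generate(cfg, Path(tmp), make)
    print(first_loaded, second_loaded)  # False True
```

Hashing a canonical serialization of the config means that two configs with the same parameters map to the same file, regardless of key order.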

## Conversions to useful formats

The elements of the dataset are [`SolvedMaze`](maze_dataset/maze/lattice_maze.py) objects:
```python
>>> m = dataset[0]
>>> type(m)
maze_dataset.maze.lattice_maze.SolvedMaze
```

These can be converted to a variety of formats:
```python
# visual representation as ascii art
m.as_ascii()
# RGB image, optionally without solution or endpoints, suitable for CNNs
m.as_pixels()
# text format for autoregressive transformers
from maze_dataset.tokenization import MazeTokenizerModular, TokenizationMode
m.as_tokens(maze_tokenizer=MazeTokenizerModular(
    tokenization_mode=TokenizationMode.AOTP_UT_rasterized, max_grid_size=100,
))
# advanced visualization with many features
from maze_dataset.plotting import MazePlot
MazePlot(m).plot()  # `m` is the SolvedMaze obtained from `dataset[0]` above
```

![textual and visual output formats](docs/output_formats.png)
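
To give a feel for what an ASCII conversion involves, here is a small standalone sketch that draws a lattice maze on a character grid. It does not reproduce the output of `as_ascii()`: the function `render_ascii`, the passage-list input, and the choice of `#` for walls are assumptions made for this example, and the library's actual layout and characters may differ.

```python
Cell = tuple[int, int]


def render_ascii(grid_n: int, passages: list[tuple[Cell, Cell]]) -> str:
    """Draw a grid_n x grid_n maze on a (2 * grid_n + 1)-wide character canvas.

    Cell (r, c) sits at canvas position (2r + 1, 2c + 1); the wall between two
    adjacent cells sits midway between them. (Hypothetical helper.)
    """
    size = 2 * grid_n + 1
    canvas = [["#"] * size for _ in range(size)]
    for r in range(grid_n):
        for c in range(grid_n):
            canvas[2 * r + 1][2 * c + 1] = " "  # every cell is open floor
    for (r1, c1), (r2, c2) in passages:
        canvas[r1 + r2 + 1][c1 + c2 + 1] = " "  # knock out the wall between cells
    return "\n".join("".join(row) for row in canvas)


# a 2x2 maze whose passages form a "C"-shaped corridor
print(render_ascii(2, [((0, 0), (0, 1)), ((0, 1), (1, 1)), ((1, 1), (1, 0))]))
# #####
# #   #
# ### #
# #   #
# #####
```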


# Development

This project uses [Poetry](https://python-poetry.org/docs/#installation) for development. To install with dev requirements, run
```
poetry install --with dev
```

A makefile is included to simplify common development tasks:

- `make help` prints all available commands
- `make test` runs all tests
    - `make unit` runs the unit tests
    - `make test_notebooks` runs the notebook tests
- `make format` runs the formatters (black, pycln, and isort)
    - `make check-format` runs the formatters in check-only mode


# Citing

If you use this code in your research, please cite [our paper](http://arxiv.org/abs/2309.10498):

```
@misc{maze-dataset,
    title={A Configurable Library for Generating and Manipulating Maze Datasets}, 
    author={Michael Igorevich Ivanitskiy and Rusheb Shah and Alex F. Spies and Tilman Räuker and Dan Valentine and Can Rager and Lucia Quirke and Chris Mathwin and Guillaume Corlouer and Cecilia Diniz Behn and Samy Wu Fung},
    year={2023},
    eprint={2309.10498},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={http://arxiv.org/abs/2309.10498}
}
```

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/understanding-search/maze-dataset",
    "name": "maze-dataset",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0.0,>=3.10.6",
    "maintainer_email": null,
    "keywords": "maze, mazes, labyrinth, dataset, procedural, pathfinding, tokenization",
    "author": "Michael Ivanitskiy",
    "author_email": "mivanits@umich.edu",
    "download_url": "https://files.pythonhosted.org/packages/30/3a/a9da3aac3613c0e86881c622cd74f717a986f1e3ec20ca5882d4a699284c/maze_dataset-1.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "generating and working with datasets of mazes",
    "version": "1.1.0",
    "project_urls": {
        "Documentation": "https://understanding-search.github.io/maze-dataset/",
        "Homepage": "https://github.com/understanding-search/maze-dataset",
        "Repository": "https://github.com/understanding-search/maze-dataset"
    },
    "split_keywords": [
        "maze",
        " mazes",
        " labyrinth",
        " dataset",
        " procedural",
        " pathfinding",
        " tokenization"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f090131a844db406c2bc59b8d33eae6b2c579a0cf19c1a780223dfac11317257",
                "md5": "6cb4bb16799c3f8d5e74443c16482833",
                "sha256": "55f75bbc9fb64ec9b45de50f0caa7617673017d0a8e5f50e32433f3fc301f992"
            },
            "downloads": -1,
            "filename": "maze_dataset-1.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6cb4bb16799c3f8d5e74443c16482833",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0.0,>=3.10.6",
            "size": 90236,
            "upload_time": "2024-09-10T19:33:46",
            "upload_time_iso_8601": "2024-09-10T19:33:46.974365Z",
            "url": "https://files.pythonhosted.org/packages/f0/90/131a844db406c2bc59b8d33eae6b2c579a0cf19c1a780223dfac11317257/maze_dataset-1.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "303aa9da3aac3613c0e86881c622cd74f717a986f1e3ec20ca5882d4a699284c",
                "md5": "57e4dfbf64e000c040826ea671f27236",
                "sha256": "b4843115a57878d0ca7ceaadc31629507fb2215ff0a2fc4e918300265939810e"
            },
            "downloads": -1,
            "filename": "maze_dataset-1.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "57e4dfbf64e000c040826ea671f27236",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0.0,>=3.10.6",
            "size": 79632,
            "upload_time": "2024-09-10T19:33:49",
            "upload_time_iso_8601": "2024-09-10T19:33:49.042955Z",
            "url": "https://files.pythonhosted.org/packages/30/3a/a9da3aac3613c0e86881c622cd74f717a986f1e3ec20ca5882d4a699284c/maze_dataset-1.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-10 19:33:49",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "understanding-search",
    "github_project": "maze-dataset",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "maze-dataset"
}
