bocas


Namebocas JSON
Version 0.0.2 PyPI version JSON
download
home_pagehttps://github.com/lukewood/bocas
Summary
upload_time2023-02-01 23:49:49
maintainer
docs_urlNone
authorLuke Wood
requires_python
licenseApache License 2.0
keywords
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # Bocas

![Downloads](https://img.shields.io/pypi/dm/bocas.svg)
![Python](https://img.shields.io/badge/python-v3.7.0+-success.svg)
[![Contributions Welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/lukewood/bocas/issues)


`bocas` is an opinionated open source framework for organizing,
orchestrating, and ultimately publishing research experiments.

Some design highlights of `bocas` include:

-  the ability to cache artifacts between experiment runs
- the de-coupling of plot generation and training jobs
-  `bocas` augments the `ml-collections` library to allow you to describes an
array of experiments in a single config
- run all of the experiments with a single command
- gather artifacts from the experiments
- aggregate the results into plots, tables, and figures for use in your final report
- easily combine results from multiple experiments
- and more!

## Quick-Start

- [Oxford 102 flowers classification example](examples/oxford_102/)
- [Overview](#Overview)
- *(TODO) API documentation*

## Overview

Using `bocas` is easy!  
To get started, you need to be familiar with a few concepts.
This overview covers everything you need to know.

[To quickly jump right into things, check out the Oxford 102 flowers classification example.](examples/oxford_102/)

### Tasks & Tactics

In the mental model of `bocas` there exists *Tasks* and *Tactics*.  A *Task* is
something like: "classify images from MNIST", or "cluster samples into N classes", or
"perform generative learning in X style".  

A *Tactic* refers to the combination of all the details used to produce a
solution to a Task.  For example, one such Tactic for solving MNIST classification might
be to train a ResNet50V2 on data augmented with AugMix.  
Typically, to get a publishable result your paper will require you to have numerous
tactics to benchmark your novel tactic against.

Typically, a research work will have many Tasks: where
the overall goal of the paper is to benchmark a new Tactic's ability at solving a variety
of tasks.

`bocas` is structured around this idea: you will have at least one Task, and
each Task may be solved by numerous tactics.
As such, I recommend breaking your codebase down at the `Task` level, structuring your
paper's artifact with splits made on the `Task` level.  For example, a classification
paper might have the structure:

```
- tasks/
      - mnist/
            - ...
      - imagenet/
            - ...
```

### Code Structure

`bocas` provides an opinionated framework for generating

Keeping these concepts in mind, `bocas` recommends that you structure your code
into three levels:

- `library/` holds anything unique to your report/paper/publication.  This might include
  a new augmentation, a new `keras.Layer`, a new loss function, or a new metric.
- `tasks/` holds all of the tasks to benchmark your new technique on.
- `paper/` holds the `Latex` or `Markdown` code required to render your paper
- `paper/artifacts` subdirectory of `paper` that holds all of the artifacts produced by
  the `tasks`.  Typically when running a Task sweep you'll want to provide this directory
  to your scripts.

Your tasks should be structured as follows:

All code for a task should reside in `tasks/{task}/`, i.e. `tasks/oxford_102`.
You should create a `run.py` script.  This script must have a `run()` method that
accepts an `ml_collections.ConfigDict` as its first positional argument.  If you follow
the example in the [Oxford Flowers 102 example](examples/oxford_102/run.py), your
`run.py` file will support both independent run and mass-scale sweeps:

```python
def run(config):
    name = f'{config.optimizer}'
    train_ds, test_ds = tfds.load(
        "oxford102", as_supervised=True, split=["train", "test"]
    )
    model = keras_cv.models.ResNet50V2(
      include_rescaling=True,
      include_top=True,
      classes=102
    )
    model.compile(loss="mse", optimizer=config.optimizer)
    history = model.fit(train_ds, epochs=10)

    return bocas.Result(
        name=name,
        artifacts=[
            bocas.artifacts.KerasHistory(history, name="fit_history"),
        ],
    )
```

Once you are happy with the results from a single `run.py` run, create a `sweep.py`
config file.  In `sweep.py`, specify a `ml_collections.ConfigDict` containing
`bocas.Sweep` objects for any value you'd like to sweep oer.

```python
config = ml_collections.ConfigDict()

config.static_value = 'any-string-or-int-or-float-or-python-object'
config.optimizer = bocas.Sweep(['sgd', 'adam'])
```

Anytime a value of type `bocas.Sweep()` is encountered, the product of all
other defined `bocas.Sweep()` parameters is run with the addition of the new
values in that sweep.  

Be careful with this!  It is easy to create a lot of experiments:

```python
config = ml_collections.ConfigDict()
config.learning_rate = bocas.Sweep([x/100 for x in range(5, 21)])
config.optimizer = bocas.Sweep(['sgd', 'adam'])
config.model = bocas.Sweep(
  ['resnet50', 'resnet50v2', 'densenet101', 'efficientnet']
)
```

This configuration already contains `15 * 2 * 4` or `120` runs!  That is probably
way more than you'd like.  Try to define a few experiments that are all encompassing.
To accomplish this, run hyper parameter sweeps separately, and hardcode the values into
the final runs that are used to produce the charts.

After all of your runs are complete, create some charts and plots.  Save them to your
designated directory in your `paper/` directory so that they are rendered
into your updated paper.

I recommend writing a script to produce desired plots based on the artifacts that can
be run entirely separately from your experiments themselves.  Any example of this can
be found in the `oxford_102` example:

```python
# scripts/create_plots.py
results = bocas.Result.load_collection("artifacts/")

metrics_to_plot = {}

for experiment in results:
    metrics = experiment.get_artifact("fit_history").metrics

    metrics_to_plot[f"{experiment.name} Train"] = metrics["accuracy"]
    metrics_to_plot[f"{experiment.name} Validation"] = metrics["val_accuracy"]

luketils.visualization.line_plot(
    metrics_to_plot,
    path=f"{paper_dir}/results/combined-accuracy.png",
    title="Model Accuracy",
)
```

[Check out the full code in oxford_102.](examples/oxford_102/)

### Conclusions & Further Reading

Thats all it takes to get running with `bocas`.  Please check out the
[`examples/`](examples/) directory for more reading.  It contains a few more patterns
that might be useful in structuring your experiments.

## Limitations

:warning: right now `bocas` is under active development :warning:

While the API is relatively straightforward and simple, `bocas`
lacks support for multi-worker experiment runs.  This means that you will need to run
all of your experiments concurrently on a single machine.  If you are running 10-20
`fit()` loops to convergence, this will likely be an extremely expensive process.

Personally, I'd rather just wait for my experiments to run then fiddle with a ton of
infrastructure.  That being said, I mainly run small scale research.

If someone wants to contribute distributed runs, feel free!

## License

[Apache v2 License](LICENSE)

## Contributing

Contributions are more than welcome to `bocas`.  
Please see the GitHub issue tracker, and feel free to pick up any issue annotated
with [Contribution Welcome](https://github.com/lukewood/bocas/issues).

Additionally, bug reports are not only welcome but encouraged.  
Help me improve `bocas`!  
I made this project because I needed the tool.
I'm sure many others do as well.

## Thanks!

If you find this tool helpful, please toss a GitHub star on the repo and follow me on Twitter.

Thank you to all of our GitHub contributors:

<a href="https://github.com/lukewood/bocas/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=lukewood/bocas" />
</a>

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/lukewood/bocas",
    "name": "bocas",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "Luke Wood",
    "author_email": "lukewoodcs@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/69/bb/37012699d30e12805ebff93e66acb05389510501a7b275251ce55a343052/bocas-0.0.2.tar.gz",
    "platform": null,
    "description": "# Bocas\n\n![Downloads](https://img.shields.io/pypi/dm/bocas.svg)\n![Python](https://img.shields.io/badge/python-v3.7.0+-success.svg)\n[![Contributions Welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/lukewood/bocas/issues)\n\n\n`bocas` is an opinionated open source framework for organizing,\norchestrating, and ultimately publishing research experiments.\n\nSome design highlights of `bocas` include:\n\n-  the ability to cache artifacts between experiment runs\n- the de-coupling of plot generation and training jobs\n-  `bocas` augments the `ml-collections` library to allow you to describes an\narray of experiments in a single config\n- run all of the experiments with a single command\n- gather artifacts from the experiments\n- aggregate the results into plots, tables, and figures for use in your final report\n- easily combine results from multiple experiments\n- and more!\n\n## Quick-Start\n\n- [Oxford 102 flowers classification example](examples/oxford_102/)\n- [Overview](#Overview)\n- *(TODO) API documentation*\n\n## Overview\n\nUsing `bocas` is easy!  \nTo get started, you need to be familiar with a few concepts.\nThis overview covers everything you need to know.\n\n[To quickly jump right into things, check out the Oxford 102 flowers classification example.](examples/oxford_102/)\n\n### Tasks & Tactics\n\nIn the mental model of `bocas` there exists *Tasks* and *Tactics*.  A *Task* is\nsomething like: \"classify images from MNIST\", or \"cluster samples into N classes\", or\n\"perform generative learning in X style\".  \n\nA *Tactic* refers to the combination of all the details used to produce a\nsolution to a Task.  For example, one such Tactic for solving MNIST classification might\nbe to train a ResNet50V2 on data augmented with AugMix.  \nTypically, to get a publishable result your paper will require you to have numerous\ntactics to benchmark your novel tactic against.\n\nTypically, a research work will have many Tasks: where\nthe overall goal of the paper is to benchmark a new Tactic's ability at solving a variety\nof tasks.\n\n`bocas` is structured around this idea: you will have at least one Task, and\neach Task may be solved by numerous tactics.\nAs such, I recommend breaking your codebase down at the `Task` level, structuring your\npaper's artifact with splits made on the `Task` level.  For example, a classification\npaper might have the structure:\n\n```\n- tasks/\n      - mnist/\n            - ...\n      - imagenet/\n            - ...\n```\n\n### Code Structure\n\n`bocas` provides an opinionated framework for generating\n\nKeeping these concepts in mind, `bocas` recommends that you structure your code\ninto three levels:\n\n- `library/` holds anything unique to your report/paper/publication.  This might include\n  a new augmentation, a new `keras.Layer`, a new loss function, or a new metric.\n- `tasks/` holds all of the tasks to benchmark your new technique on.\n- `paper/` holds the `Latex` or `Markdown` code required to render your paper\n- `paper/artifacts` subdirectory of `paper` that holds all of the artifacts produced by\n  the `tasks`.  Typically when running a Task sweep you'll want to provide this directory\n  to your scripts.\n\nYour tasks should be structured as follows:\n\nAll code for a task should reside in `tasks/{task}/`, i.e. `tasks/oxford_102`.\nYou should create a `run.py` script.  This script must have a `run()` method that\naccepts an `ml_collections.ConfigDict` as its first positional argument.  If you follow\nthe example in the [Oxford Flowers 102 example](examples/oxford_102/run.py), your\n`run.py` file will support both independent run and mass-scale sweeps:\n\n```python\ndef run(config):\n    name = f'{config.optimizer}'\n    train_ds, test_ds = tfds.load(\n        \"oxford102\", as_supervised=True, split=[\"train\", \"test\"]\n    )\n    model = keras_cv.models.ResNet50V2(\n      include_rescaling=True,\n      include_top=True,\n      classes=102\n    )\n    model.compile(loss=\"mse\", optimizer=config.optimizer)\n    history = model.fit(train_ds, epochs=10)\n\n    return bocas.Result(\n        name=name,\n        artifacts=[\n            bocas.artifacts.KerasHistory(history, name=\"fit_history\"),\n        ],\n    )\n```\n\nOnce you are happy with the results from a single `run.py` run, create a `sweep.py`\nconfig file.  In `sweep.py`, specify a `ml_collections.ConfigDict` containing\n`bocas.Sweep` objects for any value you'd like to sweep oer.\n\n```python\nconfig = ml_collections.ConfigDict()\n\nconfig.static_value = 'any-string-or-int-or-float-or-python-object'\nconfig.optimizer = bocas.Sweep(['sgd', 'adam'])\n```\n\nAnytime a value of type `bocas.Sweep()` is encountered, the product of all\nother defined `bocas.Sweep()` parameters is run with the addition of the new\nvalues in that sweep.  \n\nBe careful with this!  It is easy to create a lot of experiments:\n\n```python\nconfig = ml_collections.ConfigDict()\nconfig.learning_rate = bocas.Sweep([x/100 for x in range(5, 21)])\nconfig.optimizer = bocas.Sweep(['sgd', 'adam'])\nconfig.model = bocas.Sweep(\n  ['resnet50', 'resnet50v2', 'densenet101', 'efficientnet']\n)\n```\n\nThis configuration already contains `15 * 2 * 4` or `120` runs!  That is probably\nway more than you'd like.  Try to define a few experiments that are all encompassing.\nTo accomplish this, run hyper parameter sweeps separately, and hardcode the values into\nthe final runs that are used to produce the charts.\n\nAfter all of your runs are complete, create some charts and plots.  Save them to your\ndesignated directory in your `paper/` directory so that they are rendered\ninto your updated paper.\n\nI recommend writing a script to produce desired plots based on the artifacts that can\nbe run entirely separately from your experiments themselves.  Any example of this can\nbe found in the `oxford_102` example:\n\n```python\n# scripts/create_plots.py\nresults = bocas.Result.load_collection(\"artifacts/\")\n\nmetrics_to_plot = {}\n\nfor experiment in results:\n    metrics = experiment.get_artifact(\"fit_history\").metrics\n\n    metrics_to_plot[f\"{experiment.name} Train\"] = metrics[\"accuracy\"]\n    metrics_to_plot[f\"{experiment.name} Validation\"] = metrics[\"val_accuracy\"]\n\nluketils.visualization.line_plot(\n    metrics_to_plot,\n    path=f\"{paper_dir}/results/combined-accuracy.png\",\n    title=\"Model Accuracy\",\n)\n```\n\n[Check out the full code in oxford_102.](examples/oxford_102/)\n\n### Conclusions & Further Reading\n\nThats all it takes to get running with `bocas`.  Please check out the\n[`examples/`](examples/) directory for more reading.  It contains a few more patterns\nthat might be useful in structuring your experiments.\n\n## Limitations\n\n:warning: right now `bocas` is under active development :warning:\n\nWhile the API is relatively straightforward and simple, `bocas`\nlacks support for multi-worker experiment runs.  This means that you will need to run\nall of your experiments concurrently on a single machine.  If you are running 10-20\n`fit()` loops to convergence, this will likely be an extremely expensive process.\n\nPersonally, I'd rather just wait for my experiments to run then fiddle with a ton of\ninfrastructure.  That being said, I mainly run small scale research.\n\nIf someone wants to contribute distributed runs, feel free!\n\n## License\n\n[Apache v2 License](LICENSE)\n\n## Contributing\n\nContributions are more than welcome to `bocas`.  \nPlease see the GitHub issue tracker, and feel free to pick up any issue annotated\nwith [Contribution Welcome](https://github.com/lukewood/bocas/issues).\n\nAdditionally, bug reports are not only welcome but encouraged.  \nHelp me improve `bocas`!  \nI made this project because I needed the tool.\nI'm sure many others do as well.\n\n## Thanks!\n\nIf you find this tool helpful, please toss a GitHub star on the repo and follow me on Twitter.\n\nThank you to all of our GitHub contributors:\n\n<a href=\"https://github.com/lukewood/bocas/graphs/contributors\">\n  <img src=\"https://contrib.rocks/image?repo=lukewood/bocas\" />\n</a>\n",
    "bugtrack_url": null,
    "license": "Apache License 2.0",
    "summary": "",
    "version": "0.0.2",
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1ef27a2cb6938f94b5eb1be7f2c9240bd3d355c3282a9634c29d4fe4784c2781",
                "md5": "fde67149d9598e6cea31d8d16baad1f1",
                "sha256": "34c7e13230a43ea166de58f6c4230615532bf5b2bd79c8367803d3300c88f457"
            },
            "downloads": -1,
            "filename": "bocas-0.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "fde67149d9598e6cea31d8d16baad1f1",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 8813,
            "upload_time": "2023-02-01T23:49:48",
            "upload_time_iso_8601": "2023-02-01T23:49:48.297456Z",
            "url": "https://files.pythonhosted.org/packages/1e/f2/7a2cb6938f94b5eb1be7f2c9240bd3d355c3282a9634c29d4fe4784c2781/bocas-0.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "69bb37012699d30e12805ebff93e66acb05389510501a7b275251ce55a343052",
                "md5": "5ece1078270004305373a66332bc876d",
                "sha256": "7ba905e343c276fb1bba4f32a153e18e67658201c647275d24e61f82bbd9ce94"
            },
            "downloads": -1,
            "filename": "bocas-0.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "5ece1078270004305373a66332bc876d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 11046,
            "upload_time": "2023-02-01T23:49:49",
            "upload_time_iso_8601": "2023-02-01T23:49:49.924376Z",
            "url": "https://files.pythonhosted.org/packages/69/bb/37012699d30e12805ebff93e66acb05389510501a7b275251ce55a343052/bocas-0.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-02-01 23:49:49",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "lukewood",
    "github_project": "bocas",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "bocas"
}
        
Elapsed time: 0.08267s